Introduction: Sequences and consequences


  • Cheryl A. Kerfeld

    Corresponding author
    1. Department of Plant and Microbial Biology, University of California, Berkeley, California
    2. Synthetic Biology Institute, University of California, Berkeley, California
    3. Department of Energy, Joint Genome Institute, Walnut Creek, California

  • This work is supported by the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 and by NSF (MCB0851094 and EF1105897).

Address for correspondence to: Cheryl A. Kerfeld, Department of Energy Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California 94598; E-mail:

Almost a decade ago, Bio2010: Transforming Life Sciences Education for Future Research Biologists [1] called for a reformulation of undergraduate life sciences majors' instruction, arguing that the typical curriculum no longer reflected modern research in biology. It advocated giving students the experience of real research, both to help them better understand biological principles and because experiencing “the power and beauty of creative inquiry” is the best way to engage students in learning about science.

Likewise, in the last decade, new sequencing technologies and the development of computational tools to explore genomic sequence data have transformed life sciences research. Genomes are being sequenced at an ever faster pace [2, 3], and the results increasingly inform research. This is reflected in the number of citations in the PubMed database for articles involving sequence analysis: there were fewer than 1,200 in 1992 and over 200,000 in 2011 (Fig. 1a). The ability to readily interpret genomic data required concomitant advances in computational methods for making sense of raw sequence data. The number of articles published in computational biology has increased over the last 20 years, with the greatest growth seen over the last decade (Fig. 1b). As shown in Fig. 1c, at least 45,295 articles related to bioinformatics were published in the last five years (dark gray portion of the pie graph). That is nearly double the 22,721 articles published in the previous five years (light gray portion of the pie graph). Collectively, these data and the tools to interpret them have enabled researchers in fields across the continuum of biological organization, from molecule to ecosystem, to situate their experimental results in a genomic context. In a manner analogous to the way the inventions of the telescope and the microscope opened up unseen worlds for scientific exploration, genomics and bioinformatics have created a new way to view life.

Figure 1.

Number of NCBI PubMed articles published per year indexed according to the National Library of Medicine's Medical Subject Headings (MeSH) terms. MeSH is a controlled vocabulary thesaurus with hierarchical organization used for describing the content of a bibliographic reference. (a) Sequence Analysis. New PubMed articles published per year for the last 20 years indexed with the MeSH term Sequence Analysis, which falls hierarchically under Genetic Techniques and encompasses DNA, RNA, and protein sequence analysis and annotation. (b) Computational Biology. New PubMed articles published per year for the last 20 years indexed with the MeSH term Computational Biology, which is the MeSH index term for bioinformatics. Computational Biology encompasses computational methods and computer-based techniques for solving biological problems. (c) Total number of PubMed articles published in the last 10 years using the MeSH index Computational Biology (68,016). In the last five-year period, from 2007 to 2011, the number of articles related to or using bioinformatics nearly doubled relative to the previous five-year period (2002–2006).

Genomics and bioinformatics have likewise created new opportunities for teaching and learning in the life sciences. Empowering students to work with real data and state-of-the-art computational tools can catalyze the reforms in undergraduate life sciences instruction called for by Bio2010 [1] and Vision and Change [4]. Methodologically, introducing bioinformatics into a course requires only a computer and an internet connection. Conceptually, bioinformatics starts with the chemical formula for an organic molecule that can be used to trace connections from predicted molecular behavior to organismal fitness. Accordingly, genomics and bioinformatics can be used to illustrate and interconnect concepts across the life sciences curriculum.

At the same time, genomics and bioinformatics can be used to give students an experience of research in the context of their courses. Work with bioinformatics tools and genomics data can be scaled to provide research experience to large numbers of students, because it can be carried out by students in parallel: each student uses the same set of tools but has their own unique sequence data set. This makes tractable the aim of giving all students a chance to build their understanding of the life sciences with bioinformatics tools and real data.

To realize the potential for the use of genomics and bioinformatics across the life sciences curriculum, the Genomics and Bioinformatics Education Program of the Department of Energy Joint Genome Institute (JGI) created IMG-ACT (Integrated Microbial Genomes Annotation Collaboration Toolkit). IMG-ACT is a fusion of a flexible rich text editor and web portal through which students explore microbial genome datasets and record their findings in an online notebook. IMG-ACT consolidates various databases and tools used in microbial genome analysis [5]. It is structured as a series of modules, which are essentially guides to different types of bioinformatic analysis. This modularity enables faculty to tailor the experience to their particular pedagogical goals. IMG-ACT is useful for novices and experts alike; beginners may rely on the tutorials that are only a click away, while experts appreciate the seamless workflow that enables them to compile the results of their bioinformatic research in an organized form. For the instructor, IMG-ACT enables both assigning and viewing student work online, making it feasible to involve large numbers of students in data analysis. The potential uses of IMG-ACT are manifold (Fig. 2). As of August 2012, 250 instructors at over 126 colleges and universities have used IMG-ACT with over 5,685 students.

Figure 2.

IMG-ACT serves as a hub to network courses, students, and research experience.

In this issue, BAMBED features four articles by faculty from diverse institutions recounting their use of genomics and bioinformatics in their courses. They capture some of the multiplicity of ways that genomics and bioinformatics can be used to realize specific pedagogical aims. Ditty et al. describe how IMG-ACT serves as the link between students at the University of St Thomas in St Paul, a Primarily Undergraduate Institution, and UC Davis, a research-intensive institution. This enabled not only the connection of bioinformatics and wet lab experiments, but also an integration of teaching and research at both institutions that led to a publication. At UCLA, IMG-ACT likewise plays a key role in a laboratory-based research project that is driven by peer-to-peer learning. The UCLA project illustrates how providing bioinformatics research experience in conjunction with a wet lab course enables students to realize that they may have a predilection for computational or for experimental work. Those who gravitate toward the computational prove to be excellent mentors for their peers, especially adept at communicating the “big picture” and sharing their enthusiasm for the research. At Austin College in Sherman, Texas, students from Biochemistry and Microbiology courses are working together in a model interdisciplinary collaboration to study amino acid biosynthesis in a bacterium from a remote branch of the tree of life. They are discovering first-hand that annotations are hypotheses that need to be tested in the laboratory and that nature has a seemingly endless number of variations on the textbook versions of metabolic pathways. At Salt Lake Community College, students also work in teams to study how genes are identified and organized in a halophile, a type of extremophile with a lifestyle of local interest. In this context, conceptual understanding is explicitly framed by considering the scientific method. Here too, communication among students is foregrounded, and those students who take a special interest in their genomics and bioinformatics coursework have the opportunity to build on it in an independent research project.

The underlying themes threading through all of these articles are collaboration and conceptual integration between courses, between schools, across levels of students, and between computational biology and wet lab research. In all cases students are working together with real data and tools to become knowledge producers. They are experiencing first-hand how algorithms translate the principles of biology into mathematical form [6]. Active learning with genomics and bioinformatics provides an authentic research experience and includes important lessons only available through working with real data and all of its ambiguities; for example, it demonstrates the fallibility of gene annotations and the need to think critically about the evidence that is returned from the internet.

The modern approach to understanding “what is life” has been transformed by “–omics” technologies from a reductionist focus on single molecules in isolation to an interdisciplinary integration of experimental and in silico data. The breadth of examples in these articles showcases the creativity of the faculty using these tools and demonstrates the power of genomics and bioinformatics for linking theory and modern practice across the life sciences curriculum. To keep pace with modern research methods and expand its versatility, IMG-ACT is under continual development. Opportunities to make explicit connections between genomic data, evolution, and the study of microbial communities are coming soon: a metagenome analysis toolkit and a module for exploring the evolutionary relationships between plant and cyanobacterial genes for the photosynthetic apparatus are being built. The articles in this issue are a snapshot of where we are now in using genomics and bioinformatics in various courses and types of institutions. They demonstrate how genomics and bioinformatics can be used in diverse curricular niches to help students forge interconnections from molecular function to the Darwinian definition of function, a curricular synthesis of consequence.


The author would like to thank James Bristow and Edwin Rubin, whose visionary leadership at the JGI enabled the creation of IMG-ACT. Likewise, Daniel Drell (DOE headquarters) and Jonathan Eisen (UC Davis) provided both inspiration and conceptual support. IMG-ACT was created by the JGI Informatics Team; David Hays and Rene Perrier, in particular, played key roles. The author also thanks the Integrated Microbial Genomes Team, especially Amy Chen, Konstantinos Mavrommatis and Ernest Szeto, for making possible the seamless integration of IMG EDU with IMG-ACT. The IMG-ACT system was in large part designed by faculty advisers who first met at the JGI in June 2007: Zhaohui Xu (Bowling Green State University), Sharyn Freyermuth (University of Missouri-Columbia), Kelynne Reed (Austin College), Jayna L. Ditty (The University of St. Thomas), Christopher Kvaal (St. Cloud State University), Cheryl Bailey (University of Nebraska), Sabine Heinhorst (University of Southern Mississippi), Kathleen Scott (University of South Florida), Robert Britton (Michigan State University), Erin Sanders (University of California, Los Angeles), Rick Johns (Northern Illinois University), A. Malcolm Campbell (Davidson College), Brad Goodner (Hiram College), and Stuart Gordon (Hiram College). Their creativity, commitment, and generosity with their time are the key reasons for the success of IMG-ACT. Seth Axen (JGI) and Jordan Moberg-Parker contributed figures and statistics used in this article. Finally, the author acknowledges Desiree Stanley, Edwin Kim, and especially Seth Axen, members of her team who have contributed to the creation and management of the IMG-ACT system.