Integration of bioinformatics into an undergraduate biology curriculum and the impact on development of mathematical skills

Authors


Abstract

The development of fields such as bioinformatics and genomics has created new challenges and opportunities for undergraduate biology curricula. Students preparing for careers in science, technology, and medicine need more intensive study of bioinformatics and more sophisticated training in the mathematics on which this field is based. In this study, we deliberately integrated bioinformatics instruction at multiple course levels into an existing biology curriculum. Students in an introductory biology course, intermediate lab courses, and advanced project-oriented courses all participated in new course components designed to sequentially introduce bioinformatics skills and knowledge, as well as computational approaches that are common to many bioinformatics applications. In each course, bioinformatics learning was embedded in an existing disciplinary instructional sequence, as opposed to having a single course where all bioinformatics learning occurs. We designed direct and indirect assessment tools to follow student progress through the course sequence. Our data show significant gains in both student confidence and ability in bioinformatics during individual courses and as course level increases. Despite evidence of substantial student learning in both bioinformatics and mathematics, students were skeptical about the link between learning bioinformatics and learning mathematics. While our approach resulted in substantial learning gains, student “buy-in” and engagement might be better in longer project-based activities that demand application of skills to research problems. Nevertheless, in situations where a concentrated focus on project-oriented bioinformatics is not possible or desirable, our approach of integrating multiple smaller components into an existing curriculum provides an alternative.

INTRODUCTION

The twentieth century saw the fractionation of the natural sciences in general, and biology in particular, into multiple disciplines that have allowed for a progressively more detailed and sophisticated understanding of the mechanisms of natural processes [1, 2]. However, this disciplinary emphasis has come at a cost; it has become increasingly difficult for scientists to communicate with each other across disciplinary boundaries [3]. In particular, many biologists lack a sophisticated understanding of the mathematical algorithms that power important computational tools they use [4]. Over the last decade, many scientists and educators have begun to emphasize the need for integration among the natural sciences and mathematics [5, 6]. In highly interdisciplinary fields such as genomics, the need for effective integration of biology and mathematics is especially acute. This level of integration demands that scientists of the future are able to synthesize ideas and approaches from different disciplines. This is also true for physicians, who face an increasingly quantitative and technologically complex profession [7, 8]. Moreover, an emphasis on synthesis in science has been appreciated as effective pedagogy: students learn better if they can apply familiar concepts to various problems introduced within the context of different disciplines [9–11]. By establishing connections between mathematics and biology, students' understanding of each discipline is improved.

Bioinformatics and genomics education provide an ideal opportunity to emphasize quantitative approaches to solving biological problems [12–14]. Bioinformatics, in particular, has grown out of the combined efforts of computational scientists and geneticists. In defining bioinformatics, the National Center for Biotechnology Information (NCBI) describes a focus in the field as “analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures” and notes the importance of using and developing “algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets” [15]. Our experience has suggested that many undergraduate life sciences majors possess relatively weak mathematical skills despite years of mathematics education at the high school level and during early college years. Some students may choose to major in the life sciences because they perceive it to be less quantitative and therefore more accessible [16]. These students often present with great enthusiasm for the study of biology and related disciplines. Therefore, bioinformatics (like ecology) provides an opportunity to bring math to students in a context that takes advantage of their intrinsic interests.

Jackson [13] has outlined the challenges and opportunities for undergraduate education in bioinformatics. Among his many suggestions are: (1) developing introductory biology courses that feature mathematical applications, (2) introducing substantially more mathematics into existing biology courses, (3) offering a computer-based course as an applied laboratory in bioinformatics, and (4) investing in faculty development and interdisciplinary collaboration. We implemented aspects of these suggestions while simultaneously improving undergraduate education in genomics by systematically introducing bioinformatics into existing undergraduate biology and biochemistry curricula at multiple levels. This approach is an adaptation of a similar approach of “bioinformatics across the curriculum” taken by the University of Wisconsin-La Crosse and Kalamazoo College [17, 18]; instead of creating specialized “bioinformatics courses” that will attract only a small number of students, quantitative approaches and bioinformatics are systematically woven into multiple biology and chemistry courses.

The project was designed and executed at Muhlenberg College, an independent private liberal arts college of ∼2,200 undergraduate students located in Allentown, PA. The college offers academic majors that include biochemistry, biology, chemistry, computer science, environmental science, neuroscience, and mathematics. The biology, biochemistry, and neuroscience majors require 1–2 semesters of calculus and completion of a three-course core Principles of Biology I-III (BIO 150, 151, and 152), in addition to cognates in chemistry and physics. The BIO 150/151/152 core sequence is organized around a “big to small” reductionist sequence: the first semester covers ecology, evolution and diversity, the second semester organismal biology, and the last semester cell and molecular biology. After completion of the core sequence, students are free to choose among various upper-level elective courses.

With this curriculum as a starting point, we identified target courses in which bioinformatics and genomics content could be introduced in stages throughout the course sequence. Our project had two interconnected goals: increasing students' knowledge of and ability to work in growing bioinformatics fields, in particular genomics, while increasing their quantitative literacy. We designed developmentally appropriate in silico and wet laboratory projects for courses at the introductory level (BIO 152, Principles of Biology III: Molecules and Cells), intermediate level (BIO 215, Genetics and BIO 220, Biochemistry), and advanced level (BIO 472, Genomes and Gene Evolution). For the introductory and intermediate level courses, bioinformatics and mathematics instruction were embedded into courses that did not emphasize these subjects over the whole of course content, but were delivered to a majority of life science majors. For the advanced course, bioinformatics and mathematics instruction were integral to the whole of course content and were delivered to a smaller population of self-selected students. These features of the curriculum allowed us to explore student learning and perception of mathematics and bioinformatics in different populations at different developmental levels.

PROJECT DESIGN

Introductory Biology

The first part of this project was expanding the bioinformatics experience in BIO 152, the third course in the core biology sequence. BIO 152 is offered every fall semester and consists of three components: lecture, recitation, and lab. Students enroll in one of two lecture sections of 35–75 students per section. Students concurrently enroll in a “wet” laboratory section of ∼16 students per section (three hours per week). The students also must enroll in a recitation section of 10–18 students per section (1 hour per week), taught by the lecture instructors. These recitation sections provide an opportunity for problem-solving, group projects, discussion, and peer-tutoring. During the project years of 2009–2011, the course enrolled between 90 and 125 students each Fall semester. We used the recitation sections as the primary component for integrating bioinformatics instruction into BIO 152. We employed Young [19] as a starting place to develop a series of directed recitation exercises that led students through the basics of databases (Genbank, Prosite), information retrieval (OMIM, PubMed, FASTA format, NCBI Structure/PDB Viewer) and local alignment (BLAST). In the course of this analysis, students were instructed about alignment scores and E values, realizing some of the quantitative goals of Jackson [13]. We created a local National Center for Biotechnology Information (NCBI) web portal, described more fully below, to improve the accessibility of NCBI utilities for students. The Fall 2011 BIO 152 cohort was introduced to the NCBI utilities as in previous years and then, in conjunction with introduction of the web portal, students were given a culminating assignment that required them to address a biological problem by drawing on information that can be found in the various NCBI databases. Thus we aimed to strike a balance between leading students to specific pedagogical goals and giving them the freedom to explore the databases.
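As a rough illustration of how alignment scores and E values are related, the sketch below uses the standard bit-score relation E = m × n × 2^(−S′), where m is the query length, n is the database length, and S′ is the bit score. The function name and the lengths are hypothetical, and real BLAST output uses effective (edge-corrected) lengths, so the numbers are indicative only.

```python
def expected_hits(bit_score: float, query_len: int, db_len: int) -> float:
    """Approximate E value from a bit score: E = m * n * 2**(-S')."""
    return query_len * db_len * 2.0 ** (-bit_score)


# Hypothetical search: a 300-residue query against a 100-million-residue database.
for score in (30, 50, 100):
    e_value = expected_hits(score, 300, 100_000_000)
    print(f"bit score {score:>3}: E ~ {e_value:.3g}")
```

Each additional bit of score halves the number of matches expected by chance, which is why high-scoring alignments report E values very close to zero.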

A microarray data analysis project was also added to complement the existing bioinformatics exercises. The BIO 152 microarray component began with an overview of the basics of two-color microarray technology [14]. In recitation sections, students worked with sample data drawn from existing microarray expression data sets, using one of several methods developed through the Genome Consortium for Active Teaching (GCAT) [14, 20]. The focus of the students' work was on analyzing and interpreting the data. They determined expression ratios, explored the value of log transformation and other data manipulations, and considered the variability observed in their simulated experiment [21]. All of these tasks could be accomplished with basic familiarity with Microsoft Excel, although instructors did provide specific instruction on how to enter formulas for basic calculations into spreadsheets. Thus the focus of the student experience in recitation was on the quantitative methods to evaluate data, rather than the technical details of how microarrays are constructed and probed (the basics of these concepts were covered in lecture sections). While the exercise was directed, to the extent that students were led toward quantitative aspects of the data analysis, it was also sufficiently open-ended to allow students to make choices about which data sets to consider and how to evaluate them. The original course materials are available at the Bioinformatics Education web page at Muhlenberg College [22].
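The value of log transformation can be seen in a small calculation. The sketch below is illustrative only: the function name and the red/green channel intensities are assumptions, not data from the course. It shows that a five-fold induction and a five-fold repression look very different as raw ratios but become symmetric after a log2 transformation.

```python
import math


def log2_ratio(red: float, green: float) -> float:
    """Log2 of the red/green expression ratio for one microarray spot."""
    return math.log2(red / green)


# Hypothetical spot intensities (red channel, green channel).
induced = (50_000, 10_000)    # five-fold higher in the red channel
repressed = (5_000, 25_000)   # five-fold lower in the red channel

for red, green in (induced, repressed):
    ratio = red / green
    print(f"ratio = {ratio:.2f}, log2 ratio = {log2_ratio(red, green):+.2f}")
```

On the raw scale the two spots have ratios of 5.0 and 0.2, whereas on the log2 scale they differ only in sign, so induction and repression of equal magnitude are treated equally.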

Bioinformatics Central Web Portal

An important component of bioinformatics education for BIO 152 and upper-level courses was the development of a new local web portal for NCBI, Bioinformatics Central at the Muhlenberg College Trexler Library [23]. We constructed a website to support student learning within individual Biology courses and to provide a common interface for students from one level of course to the next. The expectation was that a familiar utility would help students retain bioinformatics skills learned at lower course levels when they reencounter the need for these skills in intermediate and upper-level courses. As is the case for many new skills, students may master them at the introductory level, but then have difficulty recalling what they learned a year or two later when it is needed in an upper-level course. They may become confused when they return to a website many months after learning it only to discover the design of the webpage has changed (NCBI updates its webpages frequently). This resource provides a simple and direct avenue for basic bioinformatics investigations by students in BIO 152 recitations and is useful as a common interface as students move into different upper-level electives and independent research. The Bioinformatics Central web portal was developed in conjunction with Muhlenberg Library and hosted on their website [23]. This had the additional advantage of integrating the library's efforts to expand informatics with the needs of science students and of fostering local inter-departmental cooperation and innovation. The Bioinformatics Central web resource is actively maintained and modified as NCBI's web interface changes over time and is accessible to anyone on the internet.

Intermediate Biology (Biochemistry)

BIO 220 Biochemistry covers basic biochemical topics including structure and function of nucleic acids and proteins; an introduction to enzyme kinetics and regulation; and aspects of metabolism and signal transduction. During the study years of 2010 and 2011, BIO 220 enrolled a total of 43 and 33 students, respectively. The course employs a PCR-based laboratory as one way to stimulate students' thinking about nucleic acid polymerization and hybridization. Past experiments have included detecting genetic modification of plants or plant products through amplification of conserved regulatory sequences while the current version of the lab module involves identifying new transgenic plant lines, thereby inviting students to experience a common research application of PCR.

As part of this project, students further explored plant genetic engineering, particularly the implications of chromatin structure for gene expression, by assessing mRNA levels using qPCR. RNA was isolated from Arabidopsis plants, which students had previously genotyped for a transgene of interest, using an RNeasy Plant Mini Kit (Ambion). Students were provided with RNA from wild-type and previously established over-expression lines and asked to consider how these samples might be useful in the experimental design. Following reverse transcriptase reactions using a High Capacity RNA to cDNA Kit (Applied Biosystems), qPCR was carried out using TaqMan Gene Expression Assays At02221927_g1 (ADA2b) and At02270958_gH (ACT8) on a StepOnePlus system using Fast Taqman Master Mix (all from Applied Biosystems). The data were analyzed using the comparative ΔΔCT method [24, 25] to quantitate differences in expression levels for individual plants/transgenic lines relative to wild-type. We note that teaching laboratories with more limited resources could still capture the same educational goals using gel-based quantitative PCR techniques [26].
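The comparative ΔΔCT calculation can be sketched in a few lines. The example below assumes roughly 100% amplification efficiency for both assays, which is the usual premise of the method; the function names and all CT values are hypothetical and are not data from the course.

```python
def delta_delta_ct(ct_target_sample: float, ct_ref_sample: float,
                   ct_target_control: float, ct_ref_control: float) -> float:
    """Normalize the target to the reference gene, then to the control sample."""
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_control = ct_target_control - ct_ref_control
    return delta_ct_sample - delta_ct_control


def fold_change(ddct: float) -> float:
    """Relative expression, assuming product doubles every cycle."""
    return 2.0 ** (-ddct)


# Hypothetical CT values: ADA2b as target, ACT8 as the reference gene.
ddct = delta_delta_ct(
    ct_target_sample=22.0, ct_ref_sample=18.0,    # transgenic plant
    ct_target_control=25.0, ct_ref_control=18.0,  # wild-type plant
)
print(f"ddCT = {ddct:+.1f}, fold change = {fold_change(ddct):.1f}")
```

With these made-up values the transgenic sample crosses threshold three cycles earlier than wild type after normalization, which corresponds to an eight-fold higher expression level.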

Students evaluated their data based on their initial hypothesis, developed with the knowledge that the transgenes are under the control of a constitutive promoter known to be capable of driving high levels of expression in plant systems. They also reflected on the effects of genome structure on transcription: Why aren't all inserted transgenes expressed to the same level? What might be the effect of insertion at different genomic sites? Students needed to articulate the role of various controls including the reason for assaying the expression of a “housekeeping” gene in their samples along with the target of interest. They were also asked to explain the calculation of relative levels of gene expression between samples and evaluate whether data sets from multiple replicates in different lab sections are equally reliable. Taken together, these strategies helped to reinforce mathematical concepts and applications in the context of bioinformatics.

Intermediate Biology (Genetics)

BIO 215 Genetics is taught annually and covers Mendelian, molecular and population genetics, with an emphasis on the practice of genetics, especially gene mapping and analytical techniques. The laboratory experience focuses on a limited number of longitudinal experiments that allow students to explore different model systems and different genetic concepts. For example, students follow Drosophila populations over 10 weeks while modeling the effect of different evolutionary forces on allele frequency. The laboratory sequence is by its nature relatively quantitative, with repeated experiences in probability (mapping and allele frequencies), simple applied algebra (population biology), and graphing logarithmic functions (gel electrophoresis).

As part of this project, a new open-ended genomics project was introduced into the BIO 215 laboratory. Students were asked to use bioinformatics search tools at NCBI and the Yeast Genome Database (www.yeastgenome.org) to identify a defined yeast gene of interest to them. Many students chose genes that are homologous with human genes associated with disease. Students received instruction on parameters for oligonucleotide design for real-time qPCR and designed oligonucleotide pairs for amplification using SYBR green detection methods [27]. Students were then challenged to identify culture conditions that they expected to affect the expression of their gene of interest. Allowing students to identify their own targets and their own conditions helped to create a sense of student ownership of the project and a sense of engagement in hypothesis-driven exploratory research. BIO 215 students prepared total RNA using a RiboPure Yeast RNA kit (Ambion) as directed by the manufacturer, and synthesized cDNA using a High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems). Real-time qPCR was conducted on a StepOnePlus system (Applied Biosystems) using Fast SYBR Green Master Mix (Applied Biosystems). An internal reference of 18S rRNA target replicates was included in all experiments. Most of one instructional period was devoted to how to interpret melt curves and calculate ΔΔCT values for estimating comparative gene expression [24, 25], including the limitations of this approach. In the course of this analysis, students had to deal with logarithmic functions and graphing, as well as essential concepts in calculus (maximum rates of change and derivatives in the context of melt curves). Again, teaching laboratories could achieve this same goal without access to expensive real-time qPCR technology.
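The connection between melt curves and derivatives can be made concrete with a numerical sketch. The example below generates a synthetic, sigmoidal fluorescence curve (an assumption made purely for illustration, with a hypothetical melting temperature of 82 °C) and locates the temperature at which fluorescence falls fastest, that is, the peak of −dF/dT. Instrument software typically also smooths the signal, which is omitted here.

```python
import math


def melt_peak(temps, fluorescence):
    """Return the temperature at which -dF/dT is largest (central differences)."""
    best_temp, best_slope = None, float("-inf")
    for i in range(1, len(temps) - 1):
        d_f = fluorescence[i + 1] - fluorescence[i - 1]
        d_t = temps[i + 1] - temps[i - 1]
        neg_slope = -d_f / d_t
        if neg_slope > best_slope:
            best_temp, best_slope = temps[i], neg_slope
    return best_temp


# Synthetic melt curve: fluorescence drops sigmoidally around 82 degrees C.
temps = [70 + 0.5 * i for i in range(51)]                      # 70.0 to 95.0
signal = [1.0 / (1.0 + math.exp(t - 82.0)) for t in temps]

print("Estimated Tm:", melt_peak(temps, signal))
```

A single sharp peak in −dF/dT is consistent with one amplification product, while additional peaks would suggest primer dimers or nonspecific products.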

During the study years of 2010 and 2011, BIO 215 enrolled 23 and 19 students, respectively. Because BIO 215 and BIO 220 are offered at the same time, the two study populations do not overlap. Taken together, enrollments in both intermediate level classes constitute 89% of declared life sciences majors in both study years. Thus a large majority of life science majors were exposed to bioinformatics instruction at both the introductory and intermediate levels.

Advanced Specialized Course (Genomes and Gene Evolution)

Genomes and Gene Evolution (GGE) is an upper-level, seminar-style course that covers evolution and development (Evo-Devo), molecular evolution and systematics, and comparative, regulatory, and functional genomics. GGE focuses on the tools used for comparative evolutionary studies, such as global alignment, distance matrices, phylogenetic algorithms and tree-building, and for regulatory and functional genomics, especially microarrays. As such, the course involves the most intensive bioinformatics content of any course currently offered at Muhlenberg College. In the laboratory component of the course, students identified a gene family using NCBI tools including BLAST, and then performed an analysis of predicted evolutionary relationships among family members using CLUSTAL W [28] and phylogenetic tree-building using MEGA 4.0 [29]. We discussed different algorithms for tree-building, such as UPGMA and Neighbor-joining, comparing strengths and weaknesses of each approach, and computational techniques such as bootstrapping and rooting. Guest lectures by Clifton Kussmaul, a faculty member in the Math and Computer Science Department, provided critical expertise in computation and algorithms as they relate to bioinformatics search queries. The second portion of the laboratory involved a microarray experiment using yeast chips supplied and scanned by GCAT at Davidson College [14]. In these experiments students used two-color microarray analysis to compare gene expression between mutant and wild-type yeast strains, which forced them to consider experimental design and technical obstacles. Students also began their analysis with raw data (.tif files) from scans of the microarrays and conducted data analysis using MAGIC Tool [30], forcing them to consider many of the parameters and choices involved in this type of data analysis. They had to make decisions about log transformation of data, background subtraction, and other issues that require quantitative analysis of how data manipulation influences results. Thus the course is relatively math-intensive in comparison to other biology courses.
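To give a sense of how a distance-based tree-building algorithm operates, the following is a minimal sketch of UPGMA clustering. It is not the MEGA implementation: the function name, taxon labels, and distance matrix are hypothetical, and branch lengths are not computed. The algorithm repeatedly joins the two closest clusters and recomputes distances as size-weighted averages.

```python
from itertools import combinations


def upgma(names, matrix):
    """Return a nested-tuple tree; matrix[i][j] is the distance between taxa i and j."""
    # Each active cluster id maps to (subtree, number of leaves).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    dist = {(i, j): matrix[i][j] for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair, stored with i < j
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        new_dist = {}
        for k in clusters:
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # Average distance, weighted by the number of leaves in each cluster.
            new_dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        # Drop every distance that involves a cluster that no longer exists.
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    tree, _ = next(iter(clusters.values()))
    return tree


# Hypothetical pairwise distances among four sequences A-D.
names = ["A", "B", "C", "D"]
matrix = [
    [0, 2, 6, 10],
    [2, 0, 6, 10],
    [6, 6, 0, 10],
    [10, 10, 10, 0],
]
print(upgma(names, matrix))
```

UPGMA assumes a constant rate of change along all lineages, which is one of the weaknesses relative to Neighbor-joining that such a comparison would bring out.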

During the study years of 2010 and 2011, GGE rostered 10 students (8 Biology, 1 Neuroscience, and 1 Biochemistry majors) and 8 students (5 Biology and 3 Biochemistry majors), respectively. None of the students had a Math or Computer Science second major or minor, but a few had taken advanced math or computer science courses. While GGE counted as an elective in the Biology and Biochemistry majors, it was not a required course for either major.

Assessment of Student Attitudes and Learning Goals

We designed a survey of student attitudes, coupled with a direct assessment of student performance on specific bioinformatics and mathematical skills. Assessment items were designed with the entire curriculum in mind, but all skills being assessed were initially introduced into the curriculum at the introductory level (BIO 152). The survey portion consisted of 14 questions asking students to report on their confidence in their ability to perform and understanding of bioinformatics and mathematical skills, followed by 10 multiple-choice questions directly assessing their ability in both areas. Confidence questions were scored on a scale of 1–5, with 1 defined as “Strongly Disagree” and 5 defined as “Strongly Agree.” Mathematical manipulations were described in the context of bioinformatics applications, but were designed with the expectation that students with good basic math skills should be able to solve them, even if they were unfamiliar with the concepts and vocabulary of bioinformatics. The complete text of the Bioinformatics Assessment is available in the Supporting Information and also on the Muhlenberg College Biology website [22]. The assessment was conducted in the first week (Pre-assessment) and last week (Post-assessment) of class for all four courses in this study, from Fall 2009 through Spring 2011. Students were not informed of the assessment ahead of time, so they could not prepare for it. Therefore, the direct assessment portion of our survey captures knowledge and skills that students can recall and apply on demand. The assessment tool itself and the assessment plan were reviewed by the Muhlenberg College Institutional Review Board and deemed exempt from full-panel review. Data were analyzed by the Muhlenberg College Office of Assessment under the supervision of Dr. Kathy Harring, Associate Dean for Institutional Assessment. Significance of our data across multiple variables was evaluated by Multivariate ANOVA. Significance of pair-wise comparisons was evaluated by Chi-square analysis.
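For readers unfamiliar with this kind of pair-wise comparison, the sketch below shows a Pearson chi-square test on a 2 × 2 table of pre/post counts of correct and incorrect answers. It is an illustration under assumptions: the text does not specify how the comparisons were configured, the counts are hypothetical, and no continuity correction is applied.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square and p value (1 degree of freedom) for [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows are pre/post, columns are correct/incorrect.
chi2, p = chi_square_2x2(40, 80, 70, 40)
print(f"chi-square = {chi2:.2f}, p = {p:.2g}")
```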

In order to clarify further the data we obtained from our systematic learning assessments, we also solicited brief essays from 10 seniors who had taken the courses in this study at most or all levels and/or had performed genomics-related laboratory research. Seniors were asked to write brief, anonymous essays responding to the following prompts: “1. Looking back on the bioinformatics assignments in BIO 152 recitation, how did you feel about them at the time. Were they useful? Challenging? Trivial? Interesting? Boring? 2. Looking back on the bioinformatics assignments in BIO 152 with the perspective of all your later experiences, does your view of the bioinformatics assignments in BIO 152 change? If so, please describe how. 3. In your view, what would be the advantages and/or disadvantages to delaying bioinformatics education until upper-level courses?” We received seven complete anonymous essays, which were independently read by both study authors. We then looked for patterns in students' comments that would help contextualize and interpret the quantitative data amassed from the larger study. Student responses were very consistent among the essays, with the central themes of respondents being unambiguous.

The Bioinformatics Central website was pilot-tested with a small group of 10 BIO 220 students during the Spring of 2011 and implemented in BIO 152 during the Fall of 2011, after the website was complete and operational. The effectiveness of the Bioinformatics Central website was assessed using a brief on-line survey of student confidence in learning bioinformatics utilities. Students responded to these questions in the first week of the semester (pre-survey 1), two-thirds of the way through the semester before the introduction of the web portal and the culminating assignment (pre-survey 2), and in the final week of the semester (post-survey 3). Students were asked to rate their proficiency with six NCBI utilities, selecting from answer choices ranging from 0 “Never used it” to 5 “Extremely proficient”. In addition, students indicated what resources they employed to learn how to navigate these databases (e.g., NCBI website help, course instruction, library instruction, Google search). The complete text of this survey is available in the Supporting Information and also on the Muhlenberg College Biology website [22].

Finally, we embedded questions in routine course assessments from students in BIO 215 to help differentiate between different pedagogies as they relate to the goals of this work. Students were asked to rate different course components and laboratory activities, including the qPCR experiment that was part of this initiative, in terms of how they impacted their “overall experience in learning genetics,” “help in learning bioinformatics and genomics,” and “confidence in applying mathematics to biological systems.” For this smaller survey, students were asked to score each item on a simple three point scale: 0, “not at all”; 1, “somewhat”; 2, “a lot.” Data were evaluated by calculating the average score and the percentage of students who rated an activity as impacting on learning “a lot.”
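The two summary statistics described for this smaller survey are simple to compute. The sketch below uses a hypothetical list of ratings for a single activity; the function name and the data are illustrative only.

```python
def summarize(scores):
    """Return (average score, percentage of students answering 2, "a lot")."""
    average = sum(scores) / len(scores)
    pct_a_lot = 100.0 * sum(1 for s in scores if s == 2) / len(scores)
    return average, pct_a_lot


# Hypothetical ratings on the three-point scale (0, 1, 2) for one activity.
ratings = [2, 1, 2, 0, 2, 1, 1, 2]
average, pct_a_lot = summarize(ratings)
print(f"average = {average:.2f}, rated 'a lot' = {pct_a_lot:.0f}%")
```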

PROJECT OUTCOMES

Introductory Level Instruction

Students enter their sophomore year of study in biology with a low level of confidence in and familiarity with bioinformatics (Table I). Precourse evaluations on specific bioinformatics applications (items 4, 5, 7) were scored from 1.16 to 1.45, with literature informatics (item 1) slightly higher at 2.57. Therefore, most students strongly disagreed with statements that they were confident in their abilities to use BLAST and understand E scores. They also disagreed with the statement that they were confident in their abilities to use literature search tools, such as PubMed. In contrast, students reported confidence (3.75–4.25) in their basic mathematical skills (items 10–12). The difference between the bioinformatics and mathematics confidence ratings may reflect familiarity: all students have studied these specific topics in high school, while very few have been introduced to the language and skills of bioinformatics. When challenged with the assertion that studying bioinformatics would help strengthen their mathematical skills (item 14), students were unsure (3.17).

Table I. Student attitudes towards bioinformatics and math

| Item # | Statement text: I am confident in my ability to… | Introductory PRE | Introductory POST | Intermediate PRE | Intermediate POST | Advanced PRE | Advanced POST |
|---|---|---|---|---|---|---|---|
| 1 | Use the science reference tools PubMed, NCBI Bookshelf, and OMIM. | 2.57 | 3.99 | 4.03 | 4.17 | 3.89 | 4.00 |
| 4 | Use BLAST to make comparisons between a sequence and a database. | 1.45 | 3.83 | 3.00 | 4.37 | 3.78 | 4.70 |
| 5 | Know when to use BLASTN, BLASTP or BLASTX. | 1.16 | 3.36 | 2.67 | 3.90 | 2.78 | 4.30 |
| 7 | Interpret the significance of E scores when comparing a sequence to a database. | 1.24 | 3.56 | 2.94 | 4.32 | 2.78 | 4.80 |
| 10 | Work with mathematical ratios. | 4.25 | 4.25 | 4.25 | 4.35 | 4.22 | 4.20 |
| 11 | Calculate probabilities. | 4.02 | 4.18 | 4.17 | 4.30 | 3.89 | 3.40 |
| 12 | Calculate logarithms. | 3.75 | 4.21 | 3.97 | 3.90 | 3.67 | 3.40 |
| 14 | Studying bioinformatics helps strengthen my mathematical skills. | 3.17 | 3.23 | 3.31 | 3.13 | 3.33 | 3.20 |
| | Total number of students | 123 | 108 | 64 | 60 | 19 | 20 |

  1. Table shows average student-reported confidence on a scale of 1 (strongly disagree) to 5 (strongly agree).

  2. Underlining: Post scores significantly different from Pre scores at p < 0.05.

  3. Bold: student confidence that exceeds average 4.0 (agree).

As expected from the self-reported lack of familiarity with bioinformatics, students entering BIO 152 performed poorly on direct tests of their bioinformatics skills (Table II). Students answered questions about bioinformatics applications (items 15, 18–20) correctly only 18.2–39.5% of the time. However, students also performed poorly on basic mathematical tasks, in contrast to their self-reported confidence. When challenged with items that demanded a basic working understanding of ratios and logarithms (items 23–24), students answered correctly only 37.5% of the time. Therefore, self-reported confidence in math skills is not necessarily a good indicator of actual ability on demand or in a new context.

Table II. Student performance on direct assessment of skills

| Item # | Skill tested | Introductory PRE (% correct) | Introductory POST (% correct) | Intermediate PRE (% correct) | Intermediate POST (% correct) | Advanced PRE (% correct) | Advanced POST (% correct) |
|---|---|---|---|---|---|---|---|
| 15 | Bioinformatic utility used to identify primary research articles [PubMed] | 19.2 | 30.4 | 46.0 | 62.1 | 64.7 | 53.3 |
| 18 | Bioinformatic utility used to identify human protein most similar to purified peptide [BLASTP] | 39.5 | 71.3 | 76.2 | 77.6 | 81.2 | 80.0 |
| 19 | Meaning of E score = 0 in BLAST result [no chance of match being due to chance] | 19.3 | 54.9 | 45.8 | 58.6 | 56.3 | 86.7 |
| 20 | Difference between local and global alignment [local is faster] | 18.2 | 33.3 | 31.7 | 32.8 | 25.0 | 60.0 |
| 22 | Two color microarray experimental design and ratios [labeling of one sample in error] | 34.5 | 51.0 | 50.8 | 60.3 | 52.9 | 86.7 |
| 23 | Log2 calculation [four-fold induction is a log2 ratio of 2] | 37.5 | 45.1 | 46.0 | 56.9 | 62.5 | 100.0 |
| 24 | Ratios and magnitude of change [50,000/10,000 and 5000/25,000 are same magnitude of change] | 37.5 | 65.0 | 53.2 | 70.2 | 60.0 | 66.7 |
| | Total number of students | 123 | 108 | 64 | 60 | 19 | 20 |

  1. Underlining: Post scores significantly different from Pre scores at p < 0.05.

  2. Bold: student performance that exceeds 75% proficiency.

When students were reassessed at the end of BIO 152, their confidence in literature informatics and bioinformatics skills had increased significantly to 3.36–3.99 (Table I), with most students agreeing that they were confident in most bioinformatics applications (the one exception was item 5, knowing when to apply the different forms of BLAST, which was slightly lower than the others but still significantly different from preassessment). Student performance on bioinformatics skills increased significantly for all items (Table II), justifying their reported increased confidence. Performance did not reach our arbitrarily chosen 75% proficiency target on any individual item. Nonetheless, students demonstrated significant gains in both confidence and understanding upon the introduction of bioinformatics concepts and applications early in the curriculum.

Evaluation of student mathematical confidence and skills at the end of BIO 152 produced mixed results. Students did not report a significant increase in their already high level of confidence (Table I). When challenged with basic mathematics problems, students performed better on both logarithm and ratio problems (items 23–24, Table II); however, the modest difference in the logarithm problem did not pass the threshold for statistical significance. Even though we could measure an improvement in student performance on some basic mathematical skills, students remained skeptical of the claim that bioinformatics instruction is helpful to learning math (item 14, Table I). We estimated that 41% of students in BIO 152 were simultaneously enrolled in a math or physics course, so it is possible that the gains we measured were not due solely to math delivered as part of bioinformatics instruction. Because our survey was anonymous and we did not include an item about co-enrollment in a mathematics course, it was not possible for us to control for this variable. With that caveat, we conclude that students did show a modest improvement in math performance and that bioinformatics instruction was a relevant component of that increased performance. Nonetheless, that gain, and the role of bioinformatics education in advancing it, was not generally visible to students. This situation might be improved by employing pedagogies that more explicitly call students' attention to the link between what they are doing when they perform bioinformatics applications such as BLAST and the underlying mathematical structures.
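To illustrate why the gain on the logarithm problem can fall short of significance while the gain on the ratio problem does not, the pre/post percentages from Table II can be compared with a simple test of proportions. This is a minimal sketch under stated assumptions: it uses a pooled two-proportion z-test, which is not necessarily the test used in this study, and it assumes that every student counted in the "Total number of students" row answered each item. The function name is hypothetical.

```python
import math

def two_proportion_z_test(p1: float, n1: int, p2: float, n2: int) -> tuple:
    """Pooled two-proportion z-test; returns (z, two-sided p-value).

    Assumption: independent samples, which fits an anonymous survey
    where pre and post responses cannot be paired.
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Introductory course, Table II: PRE n = 123, POST n = 108 (assumed per-item n).
z_log, p_log = two_proportion_z_test(0.375, 123, 0.451, 108)      # item 23
z_ratio, p_ratio = two_proportion_z_test(0.375, 123, 0.650, 108)  # item 24

print(f"item 23 (log2):  z = {z_log:.2f}, p = {p_log:.3f}")
print(f"item 24 (ratio): z = {z_ratio:.2f}, p = {p_ratio:.5f}")
```

Under these assumptions, the item 23 difference does not reach p < 0.05 while the item 24 difference does, which agrees with the pattern described above.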

Assessment of the Bioinformatics Web Portal at the library supports the notion that this utility, coupled with course instruction, has a positive effect on student confidence in bioinformatics (Table III). As expected from other surveys, student familiarity with bioinformatics tools was very low entering the BIO 152 course (average scores of 0.1 to 1.2 on pre-survey 1). Midway through the semester students reported much higher proficiency (1.7–3.8 on pre-survey 2). At the end of the course, after the students were introduced to the Bioinformatics Web Portal, self-reported proficiency levels increased further, to an average of 3.0 to 3.9 (post-survey). By the end of the course a majority of students reported using coursework materials and library instruction as a resource for navigating bioinformatics utilities. The decline in reported use of library instruction mid-semester may reflect the fact that, in the initial survey, students were drawing on library instruction obtained as part of prior coursework; they did not meet with our science reference librarian in this course until Bioinformatics Central [23] was introduced, after pre-survey 2 was completed. Following the introduction of the Bioinformatics Central portal, two-thirds of students reported using it as a resource. Use of the NCBI help utilities also increased over the course, but did not reach a majority. While the proportion of students who reported using "trial and error" declined slightly after the Bioinformatics Web Portal was introduced, it remained above 50%. Overall, we conclude that the combination of course instruction and dedicated informatics utilities returns strong improvements in students' confidence in their abilities.

Table III. Student confidence on bioinformatics tasks

| | Presurvey 1 | Presurvey 2 | Postsurvey |
|---|---|---|---|
| Please rate your proficiency with each of the following NCBI databases (average student proficiency) | | | |
| BLAST | 0.2 | 3.3 | 3.5 |
| Bookshelf | 0.1 | 2.9 | 3.4 |
| Entrez | 0.1 | 1.7 | 3.0 |
| OMIM | 0.1 | 2.8 | 3.9 |
| PubMed | 1.2 | 3.5 | 3.9 |
| Structure | 0.1 | 3.8 | 3.9 |
| What resources did you employ to learn how to navigate NCBI databases? (percentage that used the resource) | | | |
| Google search | 9.3% | 13.8% | 27.2% |
| Library instruction | 41.3% | 23.1% | 58.0% |
| NCBI website help section | 0.0% | 26.2% | 42.0% |
| Information from coursework | 30.7% | 81.5% | 50.6% |
| “Trial and error”/self-taught | 22.7% | 75.4% | 71.6% |
| Bioinformatics Central @ Trexler Library | n/a | n/a | 66.7% |
| Total number of student respondents | 81 | 65 | 81 |

  1. Top portion of table shows average student-reported proficiency on a scale from 0 (never used it) to 1 (not at all proficient) to 5 (extremely proficient). Bottom portion shows percentages of respondents that used a particular resource to help learn how to use NCBI databases. Bioinformatics Central was not evaluated (n/a) in the first two surveys because students had not yet been introduced to the resource.

To obtain a better sense of student perceptions of learning bioinformatics at the introductory level, we invited seniors (generally about two years later) to write brief essays reflecting on their experience with bioinformatics across the curriculum. While the essays were anonymous, the students who responded included those with both strong and modest academic ability. In general, while some reported being intrigued by the types of information and databases available, students almost unanimously highlighted their lack of understanding of the utility and applications of bioinformatics while completing introductory coursework. Accordingly, students were not enthusiastic about the bioinformatics experience in BIO 152 at the time, although they consistently reported seeing the utility of bioinformatics approaches later on in their undergraduate careers. One student wrote, “As I recall, they [BIO 152 Bioinformatics assignments] weren't challenging, but at the same time they didn't seem boring or trivial, mainly because the subject matter revolved around ‘real life’ problems, like human diseases. At the time the assignments didn't seem very useful, and probably would have remained that way if I didn't begin to do research. As a sophomore it can be hard to see the big picture…” These data parallel those reported by faculty at Kalamazoo College, who note a marked distinction between introductory vs. advanced courses in terms of students' views about their potential use of bioinformatics tools in the future (outside of coursework) [18]. Nonetheless, students generally agreed that bioinformatics should be introduced early. As one student wrote, reflecting back on the introductory experience, “…the experience changes because I've had to use most of the tools covered in the assignment[s] to conduct my research. With this in mind, I wish that I had paid closer attention to what I was doing at the time.” When asked if bioinformatics instruction should be delayed, the student consensus was clear that this would not be best. One student wrote, “…using computer-based tools for biology was difficult in BIO 152 not because we didn't know enough but because it was so new. Delaying the introduction of bioinformatics would just put off getting acquainted with the programs.” Taking these essays and our quantitative survey data together, we suggest that introducing bioinformatics instruction early in the curriculum is effective. While students may not recognize the value at the time, they do so as they move through the curriculum in later years. We also note that student “buy-in” at the introductory level can probably be improved by melding bioinformatics instruction with wet-lab instruction and student projects [31]. If students learn to use tools in the context of their own genomics lab work, they are more likely to recognize the value at the time.

Intermediate Level Instruction

Consolidation of essential biological concepts introduced at the introductory level often occurs at the intermediate level. While there was some erosion in student confidence levels in bioinformatics and mathematical skills from the postassessment at the introductory level to the preassessment at the intermediate level, students were still significantly more confident than when entering the introductory course (Table I). Consistent with the measure of confidence, student performance on bioinformatics concepts and skills when beginning one of the intermediate courses was similar to performance on exiting the introductory course (Table II). While still modest, performance on mathematics skills on entering intermediate courses was likewise similar to performance on exiting the introductory course. These observations indicate that students are generally retaining the learning gains made during the introductory course.

As we observed for the introductory sequence, student confidence in bioinformatics increased significantly over the course (Table I). Furthermore, final student confidence levels at the intermediate level were slightly, but significantly, higher than at both the pre- and postassessments at the introductory level. As expected, student confidence in bioinformatics was bolstered by repeat encounters with the material. Student performance on literature informatics skills as measured by their knowledge of PubMed increased significantly over the semester (Table II, item 15). When comparing pre- and postassessments, the percentage of students answering bioinformatics questions correctly was similar, or increased to a degree that did not reach statistical significance (Table II, items 18–20). As we observed for introductory biology, modest gains in mathematical skills were evident over the intermediate course, with the increase to 70.2% on the ratio problem (item 24) being statistically significant (Table II). Students still fell short of a desired 75% proficiency, and as before it was difficult to disentangle the bioinformatics instruction from other student experiences. Nonetheless, student gains in mathematics skills were measurable at this level, and our interpretation of these data is that bioinformatics instruction was one component of this improvement. Similar to the results for introductory biology, students were initially skeptical, and remained skeptical, about the link between bioinformatics education and learning mathematics.

To help separate out different pedagogies within an intermediate course, we embedded additional questions in our routine course assessments. These questions were designed to have students reflect on which course components were most successful for general learning goals, for learning bioinformatics and genomics, and for building confidence in applying math to biological systems (Table IV). Students reported very high general learning scores for the self-designed bioinformatics yeast RT-PCR experiment that was introduced as part of this project (1.83 on a scale of 0-2). When asked about the impact of that project on learning bioinformatics and mathematics specifically, students still reported generally high scores (1.44 for both questions). However, students also rated more traditional pedagogies, such as lecture and homework problems, significantly higher (1.56 to 1.94). In particular, students clearly identified routine homework problems as the most significant factor in helping them learn bioinformatics and math applications in biology. Our interpretation of this result, combined with the consistent result that students do not link bioinformatics and math learning, is that students are most conscious that they are learning these skills when they are directly confronting mathematical manipulations as the focus of the exercise. While experiencing bioinformatics and performing mathematics in the context of the laboratory investigations are valued by students as a general learning tool, the link to learning mathematics remains invisible, “under the hood.”

Table IV. Student responses to pedagogy in BIO 215 Genetics

| Course component | Pedagogy | Index 0–2 | % rated learning “a lot” |
|---|---|---|---|
| Extent to which lab component contributed to overall experience in learning genetics | | | |
| Drosophila population genetics lab | longitudinal; exploratory; instructor-designed | 1.22 | 22 |
| Drosophila virtual gene mapping lab | instructor-designed, known answers | 1.33 | 44 |
| Human mtDNA sequencing lab | instructor-designed, exploratory | 1.50 | 56 |
| Yeast RT-PCR self-designed lab | exploratory; student-designed | 1.83 | 83 |
| Extent to which component contributed to learning about bioinformatics and genomics | | | |
| Course lectures | traditional; didactic | 1.72 | 72 |
| Homework | traditional; active | 1.83 | 83 |
| Primary literature presentations | instructor-chosen, student researched, oral | 1.50 | 56 |
| Human mtDNA sequencing lab | instructor-designed, exploratory | 1.33 | 33 |
| Yeast RT-PCR self-designed lab | exploratory; student-designed | 1.44 | 50 |
| Extent to which component helped confidence in applying math to biological systems | | | |
| Course lectures | traditional; didactic | 1.56 | 56 |
| Homework | traditional; active | 1.94 | 94 |
| Primary literature presentations | instructor-chosen, student researched, oral | 1.11 | 33 |
| Human mtDNA sequencing lab | instructor-designed, exploratory | 1.33 | 44 |
| Yeast RT-PCR self-designed lab | exploratory; student-designed | 1.44 | 50 |

  1. Table shows average student rating on a scale of 0 (not at all) to 2 (a lot), as well as the percentage of students identifying the component as contributing “a lot,” for different course components in BIO 215 during the Spring semester of 2011. Data are from 18 student participants.
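The two summary columns in Table IV are related: the index is a mean rating on the 0–2 scale, and the percentage counts only ratings of 2. The sketch below shows how both could be computed from individual ratings. The response lists are hypothetical, since per-student ratings are not reported; they were chosen to be consistent with the published values, assuming that all 18 students answered each question with a whole-number rating of 0, 1, or 2. The function name `summarize` is also an assumption.

```python
def summarize(ratings):
    """Return (mean index on the 0-2 scale, percent of ratings equal to 2)."""
    n = len(ratings)
    index = sum(ratings) / n
    pct_a_lot = 100 * sum(1 for r in ratings if r == 2) / n
    return index, pct_a_lot

# Hypothetical rating sets for 18 students (0 = not at all, 2 = a lot).
# These are illustrative reconstructions, not data from the study.
yeast_lab_general = [2] * 15 + [1] * 3   # consistent with the reported 1.83 and 83%
homework_math = [2] * 17 + [1] * 1       # consistent with the reported 1.94 and 94%

for label, ratings in [("Yeast RT-PCR lab (overall)", yeast_lab_general),
                       ("Homework (math confidence)", homework_math)]:
    index, pct = summarize(ratings)
    print(f"{label}: index {index:.2f}, {pct:.0f}% 'a lot'")
```

Because the percentage ignores intermediate ratings, two components can share an index while differing in the share of students who answered "a lot", as the two mtDNA sequencing lab rows with an index of 1.33 show.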

Advanced Instruction

BIO 472, Genomes and Gene Evolution (GGE), provided an opportunity to examine the outcomes for advanced students who had elected to take a course that focused explicitly on computational issues related to bioinformatics. Despite this difference, students entering BIO 472 displayed confidence and demonstrable skill in bioinformatics and mathematics that were similar to those of students entering intermediate level courses (Tables I and II). In a similar vein, this small population that elected to take an explicitly bioinformatics and math-heavy class was no more convinced that learning bioinformatics helps in learning math.

By the end of GGE, students displayed varying performance on the direct assessment of skills in bioinformatics and math. For some bioinformatics skills, such as literature searches and BLAST (items 15 and 18), no improvement was evident (the slight decrease for item 15 in Table II is not significant). We do note, however, that performance on the BLAST utility was already very strong, so a failure to see improvement is not necessarily a concern. In contrast, performance on bioinformatics skills that had remained modest in intermediate level courses, such as E scores, algorithm differences and microarray experiments (items 19, 20, 22), improved significantly (Table II). Likewise, assessment of math skills gave mixed results, with the logarithm problem (item 23) finally increasing to a perfect 100%, but the ratio problem (item 24) remaining mired at somewhat modest levels of performance. The difference between these two items is somewhat puzzling, since both tasks were emphasized to similar extents in applications studied and utilized by students in this class.

Taken together, these data suggest that students who study bioinformatics and math intensively in an advanced class have confidence outcomes that are similar to or higher than those in intermediate classes and generally perform noticeably better in both math and bioinformatics when their skills are tested directly. One possible interpretation of these data is that multiple rounds of reinforcement and/or student interest may be critical factors in achieving higher levels of student performance. Many students in GGE were encountering these concepts and math skills in a third course, and some were motivated to take the course based on a genuine interest in mathematical applications in biology. Nonetheless, even if students had such a genuine interest, they did not agree that studying bioinformatics necessarily strengthened their mathematical skills (Table I). Stronger student performance at this level might also reflect the more deliberate course focus on computational mathematical concepts in GGE, as opposed to the more basic calculation skills emphasized in introductory and intermediate level work.

CONCLUSIONS

We draw a few critical lessons from our study of the effectiveness of integrating bioinformatics education at multiple levels in a biology curriculum. First, introducing students early in their careers to bioinformatics concepts is effective and results in significant learning gains. We could measure concomitant improvement in both bioinformatics and mathematics skills in this population. Second, students generally retain this level of performance as they move on to intermediate course work. While skills do improve at this level too, the gains are more modest. Third, improvement to acceptable levels of overall performance was best realized in an advanced class that treats these skills and concepts as core material for the course. Finally, students' confidence in their bioinformatics knowledge and skills increased consistently throughout the curriculum, which correlates with the direct assessment of their skills as highlighted above. Students' views of their own quantitative literacy were illuminating. Self-reported confidence in mathematical skills involving ratios, probabilities, and logarithms was consistently high but was not necessarily a good indicator of students' abilities to solve new problems that required these fundamental skills. Students throughout the curriculum were also skeptical of the link between learning bioinformatics and improving their math skills. Instructors who want students to appreciate that link may need to identify methods of explicitly drawing attention to where the math is being learned. This goal, and better student buy-in, might be achieved if the bioinformatics lessons are coupled with wet-lab student genomics projects. Of course, the lack of student awareness may be acceptable if students are reaching desirable levels of performance in the end.

Acknowledgements

The authors would like to acknowledge their colleagues who served as co-PIs on the project: Keri Colabroy, Marten Edwards, and Clifton Kussmaul. They would also like to sincerely express their thanks to Rachel Hamelers and Jennifer Jarson of the Muhlenberg College Trexler Library and Amanda Heiberger, Muhlenberg '12, for their contributions to the development of the Bioinformatics Central web portal. Terry Woodin of the National Science Foundation provided immensely helpful feedback and suggestions on this project. They also greatly appreciate the participation of the students in BIO 152, BIO 215, BIO 220, and BIO 472. Finally, they thank Kathy Harring, Associate Dean for Institutional Assessment at Muhlenberg College, for assistance in analysis of survey data.
