Comparative modeling of proteins: A method for engaging students' interest in bioinformatics tools

Authors

  • Fernanda Badotti,

    1. Centro de Excelência em Bioinformática (CEBio), Fundação Oswaldo Cruz (FIOCRUZ-MG), Belo Horizonte, Minas Gerais, Brazil
  • Alan Sales Barbosa,

    1. Departamento de Fisiologia e Biofísica, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil
  • André Luiz Martins Reis,

    1. Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil
  • Ítalo Faria do Valle,

    1. Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil
  • Lara Ambrósio,

    1. Departamento de Microbiologia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil
  • Mainá Bitar

    Corresponding author
    1. Departamento de Bioquímica e Imunologia, Universidade Federal de Minas Gerais (UFMG), Belo Horizonte, Minas Gerais, Brazil
    • Address for correspondence: Laboratório de Genética Bioquímica da Universidade Federal de Minas Gerais, Instituto de Ciências Biológicas, Bloco K4/Sala 245, Avenida Antônio Carlos, 6627 Pampulha, 31270-901 Belo Horizonte, Minas Gerais, Brazil. E-mail: duonlumo@gmail.com.


Abstract

The huge increase in data produced in the genomic era has created a need to incorporate computers into the research process. Sequence generation and the subsequent storage, interpretation, and analysis of sequences are now entirely computer-dependent tasks. Universities all over the world have been challenged to find ways of encouraging students to acquire computational and bioinformatics skills from the undergraduate level onward in order to understand biological processes. The aim of this article is to report the experience of awakening students' interest in bioinformatics tools during a course focused on comparative modeling of proteins. The authors start by giving a full description of the course's academic environment and the students' backgrounds. Then they detail each class and present a general overview of the protein modeling protocol. The positive and negative aspects of the course are also reported, and some of the results generated in class and in projects outside the classroom are discussed. In the last section of the article, general perspectives on the course from the students' point of view are given. This work can serve as a guide for professors who teach subjects for which bioinformatics tools are useful and for universities that plan to incorporate bioinformatics into the curriculum. © 2013 by The International Union of Biochemistry and Molecular Biology, 42(1):68–78, 2014

Introduction

Ever since the invention of the microscope, we have seen biology through its lenses. Now, we are looking at biology through the lenses of bioinformatics. And this is a whole new world! In the near future, the study of biology without bioinformatics tools will certainly be unimaginable, much as happened when the microscope was popularized more than three centuries ago. In the last few years, universities all over the world have included bioinformatics in their curricula, encouraging biology students to acquire computational skills from the undergraduate level onward [1-4]. Nevertheless, despite such efforts to align undergraduate teaching with current scientific knowledge, there are still extensive gaps to be filled.

The incorporation of bioinformatics throughout the academic life of students has long been discussed [5-7]. The debate intensifies when the focus is on undergraduate students [7]. Even when bioinformatics is offered as a complementary subject in the curricula of life science undergraduates, the chosen approach varies. Some authors defend courses that present a wide overview of the field and report very successful examples [8, 9] of how this is a valuable way to introduce students to a variety of bioinformatics tools. Such courses explore a superficial layer of knowledge that can be deepened later, during a graduate program. In contrast, other authors claim that bioinformatics skills should be continuously assessed across the subjects that make up the curriculum [5, 10-12]. In this approach, students can be encouraged to use bioinformatics tools when developing projects for classes.

The Canadian Bioinformatics Workshop (Bioinformatics.ca/workshops), for example, has realized that both undergraduate and postgraduate students are seeking more than basic bioinformatics knowledge and are now looking for specific tools and data analysis. The students want to proceed with the examination of their data, and most of them have already acquired some knowledge from the online learning space. Brazas and Ouellette [4] claimed that “any bioinformatics continuing education programs need to stay aware of new developments in the online learning space in bioinformatics and continuously update its programming accordingly, as from experience, needs will change as the learning landscape changes.”

In this work, we report the experience of teaching comparative protein modeling to undergraduate, graduate, and postgraduate students from the Universidade Federal de Minas Gerais (UFMG), one of the most important federal universities in Brazil. We also focus on the transdisciplinary potential of this subject and its role as a way to revisit fundamental concepts of protein biochemistry and physical chemistry. We discuss evaluation methods and learning experiences beyond the classroom. Students' perspectives are presented in the last section of this article and illustrate the different realities they experienced during and after the classes, when they were able to apply what they had learned.

Students worked with different amino acid sequences, and the main goal was to generate a three-dimensional structure for each of these so-called target proteins and to explore the results to draw conclusions regarding protein function and biological features. As the step-by-step comparative modeling protocol was presented, the students were encouraged to think about the fundamental concepts underlying the techniques and analyses used to generate the results.

With few exceptions, every class started with a theoretical presentation of each methodological approach or algorithm (∼1 hour), followed by a practical exercise in which the students applied the technique and/or analyzed previously obtained results (∼1 hour). The program was composed of 15 classes of 2 hours each that covered 1) introduction and selection of the target protein; 2) study of the molecular characterization of the target protein; 3) definition of a structural template and the study of its characteristics; 4) sequence and secondary-structure alignment between target and template; 5) preparation of input files for comparative modeling; 6) generation and visualization of candidate structures; 7) inclusion of heteroatoms and disulfide bridges in candidate structures; 8) refinement of loop regions; 9) evaluation of candidate structures regarding stereochemical properties; 10) evaluation of candidate structures regarding energetical properties; 11) visualization of the best-ranked model and definition of structural elements; 12) introduction to molecular docking calculations; 13) introduction to molecular dynamics; 14) introduction to normal mode analysis; and 15) perspectives on comparative modeling (Table 1). In the following sections, the main topics covered in class are detailed to provide an overall description of the entire process. A workflow summarizing all steps is given in Fig. 1.

Table 1. Course program

Class | Short description of subject | Mainly
1 | Introduction and selection of the target protein | Theoretical
2 | Study of the molecular characterization of the target protein | Practical
3 | Definition of a structural template and the study of its characteristics | Practical
4 | Sequence and secondary-structure alignment between target and template | Practical
5 | Preparation of input files for comparative modeling | Practical
6 | Generation and visualization of candidate structures | Practical
7 | Inclusion of heteroatoms and disulfide bridges in candidate structures | Practical
8 | Refinement of loop regions | Practical
9 | Evaluation of candidate structures regarding stereochemical properties | Practical
10 | Evaluation of candidate structures regarding energetical properties | Practical
11 | Visualization of best-ranked model and definition of structural elements | Practical
12 | Introduction to molecular docking calculations | Theoretical
13 | Introduction to molecular dynamics | Theoretical
14 | Introduction to normal mode analysis | Theoretical
15 | Perspectives on comparative modeling | Theoretical

Figure 1.

Protein comparative modeling workflow. This is a schematic representation of a workflow for the protein comparative modeling protocol described and used in this work. This is a valuable educational tool for students and teachers.

Academic Environment

UFMG is one of the most important Brazilian universities, with almost 50,000 students and 3,000 professors, dating back to the 1800s (https://www.ufmg.br). The Bioinformatics Graduate Program at UFMG started in 2003 as one of the first in the country and is currently the best-ranked program in this field in Brazil, with a score of 6 out of a maximum of 7 given by the national evaluation committee of the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES). The program is located in the Biological Sciences Institute (ICB), together with 18 other graduate and 12 undergraduate programs that comprise more than 3,500 students (http://www.pgbioinfo.icb.ufmg.br).

The course entitled “Molecular Modeling of Proteins” is a nonrequired course offered by the Bioinformatics Graduate Program, open to all students and with no restrictions or prerequisites. The class included undergraduate, graduate (masters and PhD), and postgraduate students from several different programs in the university. Figure 2 shows the distribution of all 33 students enrolled in the class. Classes were held in a computer room equipped with 40 desktop computers running the Linux (Ubuntu 9.10) operating system.

Figure 2.

Profile of course audience. The main chart illustrates the distribution of students according to their academic level. For graduate students, there is additional information about the masters (left) or PhD (right) program in which they are enrolled.

Course Structure

The following sections describe methodological aspects of each class. We describe techniques and programs only briefly to keep this material concise. For more detailed information on comparative modeling, we suggest that the reader consult the Modeller online manual, which can be found at http://www.salilab.org/modeller/manual, and the scientific literature in the field (our group has recently submitted a work entitled “A Comparative Modeling Methodological Workflow: Theory and Practice” which is under review).

Identification of a Target Protein

In “real-life” research projects, several kinds of studies culminate in the identification of proteins with unknown structures. Examples include (but are not limited to) large-scale proteomics studies with protein sequence identification through mass spectrometry, studies of specific biological processes or pathways and the identification of novel proteins, genome sequencing and/or annotation, and studies of genes involved in a given phenotype and its mutations. When working in any of these fields, it is common for a biologist to identify a protein and wish to understand its function or, when the function is known, its molecular details. At this point, bioinformatics has an essential role both in the treatment of genomic, transcriptomic, and proteomic data generated by high-throughput experimental technologies and in the understanding of information gathered from traditional biology [13].

As this was a course for students with no previous experience in computational modeling, target proteins were generally not derived from previous projects of the students themselves (although there were exceptions); however, the examples had to be realistic applications of the technique. Therefore, before the first class, students were asked to consult their colleagues, teachers, advisors, and/or supervisors about possible protein molecules that they could structurally characterize. The sense of reality that comes from working on an actual scientific project (instead of a purely didactic one) is a valuable stimulus when dealing with practical exercises in bioinformatics. Additionally, this was an important experience that brought students closer to actual laboratory data and enabled closer contact between the students and research groups interested in having a protein modeled through this approach. In this sense, the class promoted a collaborative environment in the institute, in which students had the chance to be part of a real scientific research project and professors had the opportunity to have a different assessment of the proteins they work with.

The first class also included a review of the basic properties of proteins (such as amino acid characteristics, peptide bonds, secondary structure elements, structural motifs and domains, prosthetic groups, and ligands) and a brief introduction to comparative modeling with an overview of the underlying steps.

Molecular Characteristics of the Target Protein

A protein must fulfill some key requirements in order to be considered a good target. Such requirements include (i) having a known amino acid sequence, (ii) not having any previously characterized structure, (iii) having an available structural template, and (iv) having some known features to guide the comparative modeling.

When not previously known, the sequence of the target protein can be retrieved from public databases, such as the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov) or UniProt [14]. Once retrieved, this amino acid sequence is used in all subsequent steps. Students were then asked to search the Protein Data Bank (PDB [15]) with the Basic Local Alignment Search Tool (BLAST [16]) and to identify any experimentally solved structures for the target protein. This is an important step to prevent redundant structures from being generated for the chosen protein. When one or more previously characterized structures were available for the target protein, students were advised to look for another case study to follow the classes with. Otherwise, students could keep the originally chosen target protein.

Students were then advised to review the literature and public databases in search of important features of the target proteins that may be known, such as (i) biological function, (ii) prosthetic groups, (iii) possible ligands, (iv) oligomerization state, (v) subcellular localization, and (vi) secondary structure. The last two items can also be predicted using bioinformatics tools such as Psort (for subcellular localization [17]) and PsiPred (for secondary structure prediction [18]). Frequently, some (if not all) of these features are not known for a given target protein and must therefore be inferred from its structure. The less one knows about the target protein, the more important (and difficult) its comparative modeling becomes.

Template Identification

Once the target protein was defined, the next step was the identification of a structural template. To guide the search for such a template, students were advised to use BLAST against the PDB, where experimentally solved structures are deposited. Regarding the sequence alignment between the template and the target proteins, a good result should present (i) a high coverage of the target sequence by the template, (ii) high alignment scores, and (iii) high identity and similarity values. Additionally, the template structure should have a good resolution (of 3 Å or less, according to PDB) and preferably be an experimentally solved structure. After identifying the best template structure available, students analyzed its important characteristics, as previously done for the target protein.
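
The selection criteria above can be sketched as a small filter-and-sort step. This is only an illustration: the TemplateHit record, the rank_templates function, the hit names, and all numbers are hypothetical and do not come from the course material. The order in which the criteria are applied is likewise an assumption; in practice the weighting is a judgment call.

```python
from dataclasses import dataclass

@dataclass
class TemplateHit:
    pdb_id: str        # hypothetical identifier, not a real search result
    coverage: float    # fraction of the target sequence covered (0 to 1)
    identity: float    # fraction of identical aligned residues (0 to 1)
    resolution: float  # resolution of the structure, in angstroms

def rank_templates(hits, max_resolution=3.0):
    """Drop hits with resolution worse than the cutoff, then sort best-first."""
    usable = [h for h in hits if h.resolution <= max_resolution]
    # Higher coverage first, then higher identity; better resolution breaks ties.
    return sorted(usable, key=lambda h: (-h.coverage, -h.identity, h.resolution))

# Made-up values used only to exercise the function.
hits = [
    TemplateHit("tmplA", coverage=0.95, identity=0.42, resolution=2.1),
    TemplateHit("tmplB", coverage=0.95, identity=0.58, resolution=2.8),
    TemplateHit("tmplC", coverage=0.99, identity=0.61, resolution=3.6),
]
for hit in rank_templates(hits):
    print(hit.pdb_id, hit.coverage, hit.identity, hit.resolution)
```

Here the hypothetical hit tmplC is discarded despite its high coverage and identity, because its resolution is worse than the 3 Å cutoff mentioned above.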

Alignment Between the Target Protein and the Template Structure and Evaluation of Secondary Structure Features

The theory of sequence alignments is complex, and it is crucial to recognize its importance to comparative modeling, especially when using satisfaction of spatial restraints. To understand sequence alignments, one should be familiar with the physicochemical properties of amino acids. When a residue is replaced by another with similar properties, there is little or no effect on protein structure and function. On the other hand, when a residue is replaced by another with different properties, this may lead to significant effects on protein structure and function. Sequence alignment algorithms are based on this fact, as they are governed by scoring systems that reflect such amino acid characteristics.
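
To make the idea of a property-based scoring system concrete, the sketch below scores an existing alignment with a toy scheme. The four residue classes and the score values are illustrative assumptions chosen for teaching; they are not the substitution matrix of BLAST, Promals3D, or any other program.

```python
# Coarse physicochemical classes. This grouping is an illustrative assumption,
# not the scoring matrix of any particular alignment program.
CLASSES = {
    "hydrophobic": "AVLIMFWC",
    "polar": "STNQYGP",
    "positive": "KRH",
    "negative": "DE",
}
CLASS_OF = {aa: name for name, members in CLASSES.items() for aa in members}

def substitution_score(a, b):
    """Reward identity most, same-class substitutions less, others not at all."""
    if a == b:
        return 2
    if CLASS_OF[a] == CLASS_OF[b]:
        return 1
    return -1

def alignment_score(seq1, seq2, gap_penalty=-2):
    """Sum column scores for two aligned sequences of equal length ('-' = gap)."""
    if len(seq1) != len(seq2):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(seq1, seq2):
        if a == "-" or b == "-":
            total += gap_penalty
        else:
            total += substitution_score(a, b)
    return total

print(alignment_score("LKD", "IKE"))  # conservative substitutions only
print(alignment_score("LKD", "DK-"))  # one radical substitution and one gap
```

The first alignment scores higher because leucine/isoleucine and aspartate/glutamate are same-class substitutions, which is the intuition that real substitution matrices encode in a more refined way.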

Promals3D [19] is an alignment program that takes into account the secondary structure elements of amino acid sequences. Therefore, it is a good alternative when performing alignments for comparative modeling. DNATagger [20] is an interactive alignment colorizing tool that helps highlight physicochemical similarities between sequences. When analyzing sequence alignments, it is useful to consider the amino acid conservation at certain specific positions, for example, residues involved in disulfide bond formation, ion coordination, or active site residues. The template structure coordinate file usually contains information about such positions. At this point, it is worth calling the students' attention to the biological meaning of these features for the protein structure and function. Students should evaluate the sequence alignment, the distribution of gaps, the amino acid substitutions, and the template coverage. It is common that, for a given protein, only part of the sequence is covered by experimentally solved structures. In this case, it is still possible to generate a three-dimensional structure, but one should keep in mind that it will not represent the entire protein, but rather one structural domain. Students were also asked to evaluate the secondary structure elements of the target protein and to compare them with those of the template structure.
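
The quantities students are asked to inspect (identity, gaps, and template coverage) can be computed directly from the two gapped strings of a pairwise alignment. The following is a minimal sketch; the function name evaluate_alignment and the two short sequences are hypothetical, and coverage is defined here as the fraction of target residues aligned to a template residue, which is one reasonable convention among several.

```python
def evaluate_alignment(target, template):
    """Summarise a pairwise alignment given as two gapped strings ('-' = gap)."""
    if len(target) != len(template):
        raise ValueError("aligned sequences must have the same length")
    aligned = identical = target_gaps = template_gaps = 0
    for t, s in zip(target, template):
        if t == "-":
            target_gaps += 1          # deletion in the target
        elif s == "-":
            template_gaps += 1        # insertion in the target (no template residue)
        else:
            aligned += 1
            if t == s:
                identical += 1
    target_length = len(target.replace("-", ""))
    return {
        "identity": identical / aligned if aligned else 0.0,
        "coverage": aligned / target_length if target_length else 0.0,
        "target_gaps": target_gaps,
        "template_gaps": template_gaps,
    }

# Made-up sequences, used only to exercise the function.
summary = evaluate_alignment("MKCAV-LC", "MRC--GLC")
print(summary)
```

Columns counted under template_gaps are target residues with no structural information in the template, which is where loop modeling is usually needed.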

Preparation of Input Files for Comparative Modeling

It is of fundamental importance to spend time on the preparation of input files, especially when teaching students with no previous contact with comparative modeling. There are four input files that are required for comparative modeling with Modeller [21]: (i) the alignment file in pir format, (ii) the file containing the target protein sequence in fasta format, (iii) the coordinate file for the template structure in pdb format, and (iv) the script file containing instructions for comparative modeling. One observation is worth making here: although Modeller has a protocol that includes automated sequence alignment, it is more instructive to perform the alignment as a separate step and spend time analyzing and understanding it thoroughly. The script is usually the most complex input file and should be carefully explained in detail. It is often the students' first contact with a programming language and, ideally, should be written together with the students and explained item by item for better comprehension. The naming scheme is also important and should be reviewed, as all files must be named properly to allow Modeller to recognize inputs and distinguish proteins. This is in general a difficult class, as errors are very common while preparing Modeller input files for the first time. Individualized attention may be required to check, clarify, and fix each error message (one can find more details within the Modeller manual at http://www.salilab.org/modeller/manual).
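
Because formatting errors in the alignment file are a common source of trouble, it can help to generate the file rather than type it by hand. The sketch below writes PIR-style entries in the layout we understand Modeller to expect: a ">P1;code" line, a line of ten colon-separated fields, and the sequence terminated by "*". The helper pir_entry, the codes "tmpl" and "target", and the sequences are hypothetical; the optional fields are left blank here, and the exact field requirements should be checked against the Modeller manual.

```python
def pir_entry(code, sequence, kind="sequence",
              start="", start_chain="", end="", end_chain=""):
    """Return one PIR-style alignment entry as text.

    Assumed field order (verify against the Modeller manual): entry type,
    code, first residue, first chain, last residue, last chain, and four
    optional fields (protein name, source, resolution, R-factor) left blank.
    """
    fields = [kind, code, str(start), start_chain, str(end), end_chain,
              "", "", "", ""]
    header = ":".join(fields)
    return f">P1;{code}\n{header}\n{sequence}*\n"

# Hypothetical template (with coordinates) and target (sequence only).
alignment_text = (
    pir_entry("tmpl", "MRC--GLC", kind="structureX",
              start=1, start_chain="A", end=6, end_chain="A")
    + "\n"
    + pir_entry("target", "MKCAV-LC")
)
print(alignment_text)
```

The codes used after ">P1;" are the names the modeling script refers to, so keeping them short and consistent with the file names avoids many of the first-time errors mentioned above.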

Generation and Visualization of Candidate Structures

This is the most rewarding class for students, as it is the first in which they can visualize the obtained results. When multiple candidate structures are generated (in this case, students were asked to generate 100 structures), it is interesting to compare those structures and to observe the differences and similarities among them. One way to perform this analysis is to open and structurally align all structures in Pymol [22]. In general, structures are very similar, and discrepancies are often observed in the terminal regions. It is important to explain how this result can be correlated to protein flexibility, particularly in loop and terminal regions.
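
The observation that candidate structures differ mostly at the termini can also be quantified. The sketch below computes, for each residue, the root-mean-square deviation of its position around the mean position across all models. It assumes the models have already been superposed (for example in Pymol) and that each model is given as a list of alpha-carbon coordinates in the same residue order; the function name and the coordinates are made up for illustration.

```python
import math

def per_residue_fluctuation(models):
    """RMS deviation of each residue position around its mean across models.

    `models` is a list of models; each model is a list of (x, y, z) tuples,
    one per residue, already superposed on a common frame.
    """
    n_res = len(models[0])
    if any(len(m) != n_res for m in models):
        raise ValueError("all models must have the same number of residues")
    result = []
    for i in range(n_res):
        points = [m[i] for m in models]
        mean = [sum(p[k] for p in points) / len(points) for k in range(3)]
        msd = sum(
            sum((p[k] - mean[k]) ** 2 for k in range(3)) for p in points
        ) / len(points)
        result.append(math.sqrt(msd))
    return result

# Three toy models of a three-residue chain (made-up coordinates).
models = [
    [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)],
    [(2.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 1.0, 0.0)],
    [(4.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, -1.0, 0.0)],
]
print([round(v, 3) for v in per_residue_fluctuation(models)])
```

Residues with large values are those whose positions vary most between candidate structures, which in real sets of models tends to point at loops and termini.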

Another analysis to be performed during this class is the structural superposition between target and template. Students should then refer back to the sequence alignment and investigate the structural conservation of aligned residues that are important for protein structure and function (cysteine residues involved in disulfide bond formation, ion coordination residues, and active site residues). It is worth reserving some time for students to explore their own protein structures, as each student will face different challenges.

Inclusion of Heteroatoms and Disulfide Bridges in Candidate Structures

Although it would be more straightforward to include heteroatoms and disulfide bridges in the candidate structures from the first round of model generation, their importance may become clearer for students when they are included in a separate step. Cysteine residues involved in disulfide bridge formation are listed in the template coordinate file, as are possible heteroatoms. When dealing with heteroatoms, it is important to point out the differences between prosthetic groups, solvent molecules, and ligands. Prosthetic groups are essential components of protein structures and should thus be included in the generation of candidate structures. Ligands, on the other hand, are not integral to protein structure and can be considered later in docking simulations. Solvent molecules (usually water) may also be represented in the coordinate file.

Students were asked to investigate the sequence alignment, identify cysteine residues forming disulfide bridges in the template, and list the equivalent residues in the target protein. Information about the presence of disulfide bridges and heteroatoms can then be incorporated into the sequence alignment and Modeller script (http://www.salilab.org/modeller/manual). The new set of candidate structures should be further evaluated and compared with the previously generated set to assess differences in protein structure related to the presence of prosthetic groups and disulfide bridges, which can be visualized in Pymol.
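
Listing the target residues equivalent to the template's bridged cysteines is a bookkeeping task that can be sketched as follows. The two function names and the sequences are hypothetical, and the sketch assumes simple sequential residue numbering in both proteins (no insertion codes or numbering gaps, which real PDB files may contain).

```python
def map_template_to_target(target_aln, template_aln,
                           target_start=1, template_start=1):
    """Map template residue numbers to (target residue number, target residue)."""
    mapping = {}
    t_pos = target_start - 1
    s_pos = template_start - 1
    for t, s in zip(target_aln, template_aln):
        if t != "-":
            t_pos += 1
        if s != "-":
            s_pos += 1
        if t != "-" and s != "-":
            mapping[s_pos] = (t_pos, t)
    return mapping

def transfer_disulfides(template_bridges, mapping):
    """Keep only bridges whose two partners both align to cysteines in the target."""
    kept = []
    for a, b in template_bridges:
        if a in mapping and b in mapping:
            (ta, res_a), (tb, res_b) = mapping[a], mapping[b]
            if res_a == "C" and res_b == "C":
                kept.append((ta, tb))
    return kept

# Made-up alignment; the template is assumed to have a bridge between residues 3 and 6.
mapping = map_template_to_target("MKCAV-LC", "MRC--GLC")
print(transfer_disulfides([(3, 6)], mapping))
```

A bridge that is dropped by this check is a useful discussion point in class: either the alignment is wrong around that position or the target genuinely lacks the disulfide.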

Refinement of Loop Regions

This is a very important step. Students should identify structural regions for refinement. Loop regions represent an important case, as those can be the result of insertions in the target protein (regions that are absent in the template structure). In this class, it is interesting to give more details about the functional importance of loop regions and to evaluate the structural consequences of insertions/deletions in the protein sequence. It is also a class in which technical limitations of comparative modeling can be discussed. After applying the refinement protocol, students could evaluate the results and compare those with unrefined structures.

Evaluation of Candidate Structures Regarding Stereochemical Properties

In this class, students can evaluate the candidate structures regarding their stereochemical properties, as shown in Ramachandran plots. There are several programs that can be used to generate such plots. Procheck [23] is the most widely used, although it has to be properly installed locally. Rampage [24] is a good alternative, and it is available as a web server. At this point, one should consider the students' computational background. As multiple candidate structures should be evaluated, it is crucial to automate the process by using scripts that extract information from Ramachandran plots and generate quantitative outputs that allow a numerical ranking of candidate structures. If possible, students should be asked to install Procheck (or an equivalent program) and develop such scripts to automatically evaluate all candidate structures. When students are not familiar with any programming language, the template and at least one candidate structure should be analyzed, and all features of their Ramachandran plots should be explored and explained. All other candidate structures can then be ranked automatically with previously prepared scripts.
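
For students who want to see what lies behind a Ramachandran plot, the sketch below computes a backbone torsion angle from four atomic positions and ranks models by the fraction of residues falling in two broad rectangular regions. The rectangles are a deliberately crude assumption made for illustration; they are not the region definitions used by Procheck or Rampage, and the model names and angle values are made up.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees for four points.

    For phi use C(i-1), N(i), CA(i), C(i); for psi use N(i), CA(i), C(i), N(i+1).
    """
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def fraction_in_core(phi_psi):
    """Fraction of (phi, psi) pairs inside two crude, assumed 'core' rectangles."""
    def in_core(phi, psi):
        helix_like = -160 <= phi <= -20 and -120 <= psi <= 50
        sheet_like = -180 <= phi <= -45 and (psi >= 90 or psi <= -150)
        return helix_like or sheet_like
    return sum(in_core(phi, psi) for phi, psi in phi_psi) / len(phi_psi)

def rank_models(angles_by_model):
    """Model names sorted from highest to lowest core fraction."""
    return sorted(angles_by_model,
                  key=lambda name: fraction_in_core(angles_by_model[name]),
                  reverse=True)

# Made-up angle lists for two hypothetical models.
angles = {
    "model_01": [(-60.0, -45.0), (60.0, -90.0)],
    "model_02": [(-60.0, -45.0), (-120.0, 130.0)],
}
print(rank_models(angles))
```

In a real exercise the angles would come from the atomic coordinates of each candidate structure, and the ranking would rely on the region statistics reported by the validation program rather than on these rectangles.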

Evaluation of Candidate Structures Regarding Energetical Properties

This is another quality evaluation for candidate structures. Once again, the computational background of the students should be considered. When possible, scripts should be developed to evaluate all candidate structures automatically regarding their native-like energetical properties. ProSA [25] is one of the best programs to perform such an evaluation, and it is available both as a web server and in a local version. It produces both qualitative and quantitative results. Students should analyze both for the template structure and at least one candidate structure and compare them in terms of Z-score. Additionally, in the web server, the protein energy profile is plotted in terms of individual residues, and a structure is displayed with its residues colored according to their energy values. Those are all useful features of ProSA, and students should have time to investigate the results and draw conclusions. When students are not familiar with any programming language, the Z-score for all candidate structures should be calculated automatically with a previously prepared script.
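
The general idea of a Z-score can be shown with a few lines of code: a value is expressed as the number of standard deviations by which it differs from the mean of a reference distribution. The sketch below is the generic statistical definition only; it does not reproduce how ProSA builds its reference distribution or computes its energies, and the function name and numbers are made up.

```python
import statistics

def z_score(value, reference):
    """Standard score of `value` relative to a reference sample."""
    mean = statistics.mean(reference)
    sd = statistics.pstdev(reference)  # population standard deviation
    if sd == 0:
        raise ValueError("reference values must not all be identical")
    return (value - mean) / sd

# Made-up energies: a reference set and one model to be scored.
reference_energies = [-10.0, -20.0, -30.0]
model_energy = -40.0
print(round(z_score(model_energy, reference_energies), 3))
```

A strongly negative value here means the model energy lies well below the reference mean; when comparing real results, the template's Z-score gives a useful point of comparison for the candidate structures.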

Visualization of the Best Ranked Model and Definition of Structural Elements

It is important to discuss with students the protocol used to select, from the set of candidate structures, the one that best represents the structure of the target protein. Several parameters can be considered for this, including quantitative results from stereochemical and energy evaluations (Procheck and ProSA results, respectively). Once the structural model is selected among the candidate structures, students should carefully analyze its three-dimensional particularities and identify and highlight important residues, sites, motifs, domains, prosthetic groups, disulfide bridges, and so forth.
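
One simple way to combine two quantitative criteria is to rank the candidates separately by each criterion and add the ranks. The sketch below is an assumption for illustration, not the selection protocol used in the course: it supposes that each model has a stereochemical score where higher is better and an energy Z-score where lower is treated as better, and all names and numbers are made up.

```python
def combined_ranking(scores):
    """Order models by the sum of their ranks in two criteria.

    `scores` maps a model name to (stereo_score, energy_z), where a higher
    stereo_score and a lower energy_z are assumed to be better.
    """
    names = list(scores)
    by_stereo = sorted(names, key=lambda n: -scores[n][0])
    by_energy = sorted(names, key=lambda n: scores[n][1])
    rank_sum = {n: by_stereo.index(n) + by_energy.index(n) for n in names}
    # Ties in the rank sum are broken alphabetically to keep the output stable.
    return sorted(names, key=lambda n: (rank_sum[n], n))

# Made-up scores for three hypothetical candidate structures.
scores = {
    "model_01": (0.90, -6.1),
    "model_02": (0.94, -7.0),
    "model_03": (0.92, -5.2),
}
print(combined_ranking(scores))
```

Rank sums treat both criteria as equally important, which is itself a choice worth discussing with students before any model is selected.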

Molecular Docking Calculations

When a ligand is known or can be inferred from literature or structural analysis, docking calculations can be performed to analyze the binding between the ligand and the target protein. The oligomerization state of the target protein can also be constructed with molecular docking. Residues involved in protein–protein and/or protein–ligand interactions can be identified from literature data, active site analysis, structural observations, or computational predictions of surface accessible residues. Analysis of surface charges is also important in this step, as results can guide the definition of the protein binding properties. Protonation state and surface charges can be treated in the PDB2PQR server [26]. This is a good opportunity to discuss the influence of the surrounding pH on the protonation state of the protein and also the possible consequences of charge distribution for the interaction with other molecules. Once active residues for protein–protein and/or protein–ligand interactions are known, docking calculations may be performed using different programs. A good alternative is Haddock [27], which is a powerful tool also available as an easy-to-use web server.
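
The dependence of protonation on pH can be illustrated with the Henderson–Hasselbalch relation, in which the protonated fraction of a titratable group is 1 / (1 + 10^(pH − pKa)). The sketch below applies it to a single group; the pKa of 6.0 is an assumed textbook-style value for a histidine side chain in free solution, and actual pKa values inside a folded protein can differ considerably.

```python
def fraction_protonated(ph, pka):
    """Protonated fraction of a titratable group (Henderson-Hasselbalch)."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

HIS_MODEL_PKA = 6.0  # assumed model value, not a measured one

for ph in (5.0, 6.0, 7.4):
    print(ph, round(fraction_protonated(ph, HIS_MODEL_PKA), 3))
```

At a pH equal to the pKa, half of the groups are protonated; one pH unit above it, only about one in eleven is, which shows how a modest pH change can alter the surface charge relevant to docking.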

Molecular Dynamics

This is often only a theoretical class, as not all universities have a computational facility available for students to perform molecular dynamics simulations. Additionally, this is a complex task, and students usually need a long time to become familiar with the computational and physical background of this technique. Nevertheless, molecular dynamics is an important quality evaluation method for comparative modeling, as well as a powerful tool for the analysis of molecular stability, flexibility, and interaction. It is important that students absorb the idea that proteins (like any biological molecule) are not static and have intrinsic movements that are crucial for their function. This should be discussed from the first class and emphasized when necessary.

Normal Mode Analysis

As an alternative to molecular dynamics, normal mode analyses can be carried out to investigate large-scale molecular motions of proteins. It is fundamental that students understand the differences between molecular dynamics and normal modes. Among other important particularities, normal mode analyses are easier to perform and less time consuming than molecular dynamics. There are several programs devoted to calculating normal modes of proteins, but Nomad [28] is one of the most user-friendly web servers available. Normal mode motions can be further analyzed in Pymol, and this is usually a class where students are positively impressed by the results.

Perspectives on Comparative Modeling

This class is meant for students to draw conclusions from their results and analyses. A theoretical presentation with examples is usually stimulating and drives the students to think about their results from different perspectives. Real cases may be used as examples of applications for comparative modeling and additional techniques.

To illustrate results generated by students during the course, we have selected three different examples of challenges faced during protein modeling and strategies to overcome such challenges. These can be found in the Supporting Information.

Critical Assessment of Educational Results

Evaluation of Learning

We used a modified version of a previously developed questionnaire [12] to assess the students' previous knowledge about comparative modeling and to compare it with their mastery of the content after the course. The questionnaire was composed of 10 statements, the first four regarding the students' previous knowledge and the last six regarding their learning. Students assigned a score from 1 to 5 to each statement, with 1 equivalent to “strongly disagree” and 5 equivalent to “strongly agree.” We randomly selected 15 students to fill out the form anonymously. Results for this survey (Table 2) indicate that students were able to improve their theoretical knowledge and practical skills in comparative modeling during the course, although no formal statistical analysis was performed. In addition, the standard deviation values, which roughly represent the heterogeneity of the audience regarding mastery of each topic, suggest a tendency toward homogenization among students after the course.

Table 2. Assessment of learning

| Statement | Mean | SD |
| --- | --- | --- |
| Before the course | | |
| I was familiar with the concepts of computational modeling methods. | 2.27 | 1.33 |
| I was familiar with using computational modeling programs. | 1.93 | 0.96 |
| I was comfortable working on a Unix/Linux-based computer. | 2.93 | 1.49 |
| I was comfortable learning how to use computational modeling programs on my own. | 2.60 | 1.50 |
| After the course | | |
| I have a good conceptual understanding of homology modeling. | 4.33 | 0.49 |
| I feel comfortable doing homology modeling. | 4.13 | 0.64 |
| I have a good conceptual understanding of molecular dynamics simulations. | 3.53 | 0.74 |
| I am comfortable learning a new computational program on my own. | 3.67 | 1.11 |
| I am comfortable working on a Linux/Unix-based computer. | 3.87 | 1.19 |
| I am comfortable reading journal articles that discuss computational modeling techniques. | 3.80 | 0.94 |

Note: Students assigned a score to each statement about their knowledge of the subject before (first four statements) and after (last six statements) the course on a scale of 1 to 5 (1 = strongly disagree; 2 = disagree; 3 = neither agree nor disagree; 4 = agree; 5 = strongly agree).
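For instructors who want to run the same kind of survey, the summary statistics in Table 2 can be computed with a few lines of Python. The sketch below is only illustrative: the individual responses are not given in this article, so the list `hypothetical_scores` is invented, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the article does not state which estimator was used.

```python
from statistics import mean, stdev

# HYPOTHETICAL responses from 15 students to one statement (1-5 Likert scale).
# These are NOT the actual survey data, which are not reported in the article.
hypothetical_scores = [4] * 10 + [5] * 5

# Mean score for the statement.
statement_mean = mean(hypothetical_scores)

# Sample standard deviation (n - 1 denominator); an assumption, see lead-in.
statement_sd = stdev(hypothetical_scores)

print(f"Mean = {statement_mean:.2f}, SD = {statement_sd:.2f}")
```

A lower standard deviation for a statement means that the respondents' scores are closer to one another, which is the sense in which the text speaks of homogenization.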

Positive Aspects of the Course Structure

Although the observations made here were derived from the experience of a specific course on comparative modeling of proteins, the most important task was to present the potential of bioinformatics to students with no (or very little) previous contact with the field. This course represented a shift of perspective for most students, who left the classroom with a much more concrete understanding of how bioinformatics can be an extremely powerful tool to retrieve, organize, categorize, and analyze biological data.

When teaching bioinformatics to students with little or no computational background, it is important to demystify the use of computers and programs as scientific tools. During these classes, students became more confident in their ability to interact with web servers and locally installed programs. Most students also had their first contact with a Unix-based operating system. For this reason, it is helpful to select a system with a good graphical interface, which makes interaction simple for the first-time user. At the same time, students should become familiar with the command line. The chosen system was the Linux distribution Ubuntu 9.10, and students had a very pleasant experience with it.

The interaction between students with different backgrounds, coming from different scientific fields and at different academic levels (undergraduate, graduate, and postgraduate students), provided an interesting collaborative environment in which students could offer and receive help with different tasks.

Negative Aspects of the Course Structure

On two occasions during the course (in the middle and at the end), students were asked to contribute an evaluation, providing criticism and suggestions for further improvement of the course structure. The main point raised was that it would be of great advantage to have class monitors to help students as problems appear during classes (and perhaps after hours). In this course, there were no monitors, and therefore all questions were addressed directly to the professor. In addition, although students were able to help each other in several cases, more complicated problems were solved by the professor, sometimes delaying the class.

Although this was not tested in this course, it could be interesting to form groups of students with different backgrounds and academic levels to work together during the course. This can be a way of compensating for the lack of monitors and also a good strategy to encourage students to learn from each other, as each of them has different expertise. These groups could be maintained until the end of the course, when students present a seminar about their results (see the next section). This would also result in fewer seminars, so each work could be explored in more depth.

Projects Outside the Classroom

In addition to classes, students were encouraged to perform certain activities as extra-class experience. Some examples will be presented here to illustrate successful strategies.

After the sixth class, students were asked to search the specialized literature, choose and read scientific articles that use comparative modeling of proteins, and discuss their results and methodology. This was a good experience for students with no previous contact with comparative modeling, as they were encouraged to critically assess the work of other groups to gain a more comprehensive understanding of the technique. The resulting reviews were graded and considered as part of the evaluations.

Students were also encouraged to present the results obtained during the course as posters at scientific events. As an example, three students presented posters at the 8th International Conference of the Brazilian Association for Bioinformatics and Computational Biology (X-Meeting 2012). This can be a very interesting experience for those who have never presented a poster before. Students prepared their poster abstracts, which were reviewed by the professor (and all other authors) and submitted for evaluation by the scientific committee of X-Meeting. Posters were also prepared by students under the professor's supervision and presented during the conference. This activity was not graded and was not part of the student evaluations, but it represented an important opportunity for students to experience a scientific conference.

Finally, this work is also the result of an extra-class experience of great value. Five students were involved in writing and revising the manuscript, and for most of them, this was their first experience of writing an article. The group discussed the manuscript format and style during meetings. Revisions were carefully assessed and performed as a joint effort, and the submission was supervised by all authors. It was extremely rewarding and scientifically important for the students to construct this work together and to learn about the process of writing and revising an article.

Possible Evaluation Methods

A seminar presentation has proven to be an interesting method to evaluate students in this subject. As a complement, a detailed report presenting the results obtained and the methodology can also be considered. Although the report usually sets the guidelines for students, the presentation of a seminar about the target protein and the process of modeling its three-dimensional structure is generally a more complete evaluation method. Students can be instructed from the first class to organize their results as a scientific report, which will help in the production of the seminar. During the seminars, students were asked to grade all presentations (except their own) according to four different criteria: respect to time, respect to subject, clarity, and esthetics of presentation. The mean of these grades was considered as half of the total, and the other half was given by the professor according to the same criteria. This was an interesting method that involved all students in the seminars and raised productive discussions.
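The grading scheme described above can be sketched in a few lines of Python. This is only one possible reading of it: the 0 to 10 scale, the equal weighting of the four criteria, the function name `seminar_grade`, and the example scores are all assumptions made for illustration and are not specified in this article.

```python
from statistics import mean

# The four criteria named in the text.
CRITERIA = ("time", "subject", "clarity", "esthetics")


def seminar_grade(peer_grades, professor_grades):
    """Combine peer and professor grades, each counting for half of the total.

    peer_grades: list of dicts (criterion -> score), one per classmate;
                 the presenter's own grades are assumed to be excluded already.
    professor_grades: dict (criterion -> score).
    Assumption: the four criteria are weighted equally.
    """
    # Average the four criteria for each classmate, then average over classmates.
    peer_mean = mean(mean(g[c] for c in CRITERIA) for g in peer_grades)
    # Average the professor's scores over the same four criteria.
    professor_mean = mean(professor_grades[c] for c in CRITERIA)
    return 0.5 * peer_mean + 0.5 * professor_mean


# Hypothetical example on an assumed 0-10 scale.
peers = [
    {"time": 8, "subject": 9, "clarity": 7, "esthetics": 8},
    {"time": 10, "subject": 9, "clarity": 9, "esthetics": 8},
]
professor = {"time": 9, "subject": 9, "clarity": 8, "esthetics": 10}

final_grade = seminar_grade(peers, professor)
print(f"Final seminar grade: {final_grade:.2f}")
```

Keeping the peer and professor components separate until the last step makes it easy to change the 50/50 split if a different weighting is preferred.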

It is not only important to assess the students' ability to describe techniques and results but also to evaluate their comprehension of bioinformatics as a tool for the generation and analysis of scientific data.

One further comment is appropriate at this point. As the number of universities that offer a postgraduate course in bioinformatics is increasing worldwide, it is crucial that students have a first contact with related subjects at the undergraduate level. This experience, if positive, can reveal an entire field of biology to the student and make bioinformatics an option for a postgraduate degree, which could substantially increase the number of PhDs in the field.

Students' Perspectives

Student 1

“It was extremely interesting! In the beginning I felt lost, knowing very little of bioinformatics. As soon as I started working in the Laboratory, the concepts I had learned in the classroom suddenly made sense. Just last week I had to generate a structure for a protein in order to simulate the effect of a specific mutation and I was able to do it all by myself! This means I really learned the content of the subject. These classes were really important to me, especially because I intend to pursue a PhD degree in bioinformatics and there is nothing else on the subject at the undergraduate level.”

Student 2

“Everyone in the course learned at least the basics about proteins and modeling. As you said in class, a model made by bioinformatics is a snapshot of a femtosecond in time. I think in the future this will change and we will have a program that computes a model that takes into account different energy states on a given range. And we will be able to visualize such differences. Docking calculations will also change and be able to use these dynamic models to define the best fit between the protein and its ligand, somehow imitating reality. I would love to work in the developing of such program!”

Student 3

“For a long time, bioinformatics was like an oasis to me: beautiful but beyond reach. The possibilities offered in this field are amazing, but as it requires the domain of a computational language, at first I thought it wasn't for me. In this course, I was able to overcome this pre-assumption, because the step by step program made it easy to understand the process and we also had time to try everything ourselves. It is really gratifying to see the work we had in those two months transformed into a protein. Now, I feel capable and confident enough to model any protein and, suddenly, bioinformatics is little bit closer.”

Student 4

“Nowadays it's easy, accurate and each time cheaper to synthesize long DNA sequences. One of the most common applications for long, synthetic DNA constructs is the synthesis of genes that encode protein targets intended for expression. I'm a post doc student and I recently met the world of Synthetic Biology. My task was to synthesize a multi-epitope protein that could be used as a vaccine. The protein I created is totally new, so the use of bioinformatics tools was central since the construction of the nucleotide sequence until the study of the physicochemical and biological characteristics of my protein. From these information I could predict the behavior of my protein and adjust their structure until it was a good vaccine candidate, it was great to see how these tools can help to optimize the experimental work!”

Acknowledgements

The authors thank Dr. Glória Franco for invaluable intellectual contributions to this work. They also thank the Bioinformatics Graduate Program of UFMG and its Coordinator, Dr. Vasco Ariston, for allowing this course to happen and for providing the necessary infrastructure. They also acknowledge the financial support of the Brazilian funding agencies CAPES, CNPq, and Fapemig.
