Teaching expression proteomics: From the wet-lab to the laptop

Authors

  • Miguel C. Teixeira,

    1. IBB—Institute for Biotechnology and Bioengineering, Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Technical University of Lisbon, 1049-001 Lisboa, Portugal
    2. Department of Chemical and Biological Engineering, Instituto Superior Técnico, Technical University of Lisbon, 1049-001 Lisboa, Portugal
  • Pedro M. Santos,

    1. IBB—Institute for Biotechnology and Bioengineering, Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Technical University of Lisbon, 1049-001 Lisboa, Portugal
    Current affiliation:
    1. Molecular and Environmental Biology Centre, Department of Biology, University of Minho, Campus de Gualtar, 4710-057 Braga, Portugal
  • Catarina Rodrigues,

    1. IBB—Institute for Biotechnology and Bioengineering, Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Technical University of Lisbon, 1049-001 Lisboa, Portugal
  • Isabel Sá-Correia

    Corresponding author
    1. IBB—Institute for Biotechnology and Bioengineering, Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Technical University of Lisbon, 1049-001 Lisboa, Portugal
    2. Department of Chemical and Biological Engineering, Instituto Superior Técnico, Technical University of Lisbon, 1049-001 Lisboa, Portugal
    • IBB—Institute for Biotechnology and Bioengineering, Centre for Biological and Chemical Engineering, Instituto Superior Técnico, Technical University of Lisbon, 1049-001 Lisboa, Portugal
    • Tel.: 3518417233; Fax: 351218419199


Abstract

Expression proteomics has become, in recent years, a key genome-wide expression approach in fundamental and applied life sciences. This postgenomic technology aims at the quantitative analysis of all the proteins or protein forms (the so-called proteome) of a given organism in a given environmental and genetic context. It is a challenge to provide effective training in this area due to its demanding laboratory procedures and laborious computational data analysis. However, the effective training of undergraduates and postgraduates in this field is highly recommended to prepare them for the challenges of postgenomic research and of medical, industrial and other economic activities. Since 2004, the area of Biological Sciences at the Department of Chemical and Biological Engineering of Instituto Superior Técnico (IST) has been teaching Expression Proteomics to undergraduate and postgraduate students in three formats: 1) as modules of curricular units (CU), in particular of Functional Genomics and Bioinformatics (FGB), offered as a mandatory CU to students of the IST Biological Engineering or Biotechnology Master courses, or as an elective CU to other MSc courses with a biological component and to the MSc in Information Systems and Computer Engineering; the topic is also part of the PhD program in Biotechnology; 2) as mentored coaching, in which IST students join ongoing research programs at the Biological Sciences Research Group of IBB at IST; and 3) as intensive thematic courses open to the external community. In this article, the educational programs, teaching methodologies and tools that we have been using are outlined, from the wet-lab to the laptop. The current role of quantitative proteomics in biological research, with emphasis on microbial stress response and on biomedical and biotechnological applications, is addressed as a case study anchored in our group's research activities.

Teaching in the field of Expression Proteomics is an educational strategy for contemporary University training in fundamental and applied Life Sciences and in cross-disciplinary areas of Bioinformatics and Computational Biology.

Maintaining updated curricula in biology-related courses is a permanent challenge, at every education level. This is particularly the case when approaching new conceptual advances and technologies related to the postgenomics era. Since the release of the first prokaryotic [1] and eukaryotic [2] genome sequences, in 1995 and 1996, respectively, these advances and their applications have developed dramatically. One of the most interesting possibilities that arose from the knowledge of whole genome sequences is the global analysis of gene expression in a cell. This sort of analysis was performed for the first time in yeast only one year after the release of the genome sequence, making use of recently manufactured DNA microarrays [3]. This technology, accessible today to most Biological Sciences research labs, allows the monitoring of genome-wide gene expression at the mRNA (transcript) level, being thus known as transcriptomic analysis. However, the effectors of gene function are the encoded proteins. Thus, the possibility of analyzing global gene expression at the protein level is highly desired and was attempted 20 yr before the release of the first genome sequences [4, 5]. This analysis was based on the development of two-dimensional electrophoresis (2-DE) as a technique to separate, with high discrimination power, complex protein mixtures [6]. Based on this and other approaches, Expression proteomics has become a crucial tool in fundamental and applied research, with impact in human health, agriculture, environment and biotechnology.

Proteomics is the study of all PROTeins expressed by a genOME, the so-called Proteome [7]. The proteome of a cell is constantly changing, with adjustments occurring in the nature and concentration of proteins as well as in their post-translational modifications. After a perturbation (e.g., an environmental aggression), the proteome is altered in order to adapt the cell to the new environmental conditions. The quantitative comparison of the protein expression profile of biological samples, before and after the perturbation, should allow the identification of proteins whose concentration is affected and, consequently, provide insights into the adaptation mechanisms. This field of proteomic analysis is called expression proteomics (as well as quantitative proteomics or comparative proteomics). The most commonly used methodology in this kind of study involves proteome separation by two-dimensional electrophoresis (2-DE), quantification of the relative abundance of each protein species using specialized software and protein identification by mass spectrometry approaches. Previous reports on the teaching of Expression Proteomics have focused mostly on the use of either experimental [8] or computational [9, 10] approaches. This article describes an integrated view of the subject, from the wet-lab to the laptop, aimed at helping educators in the area of biological sciences to prepare their students to face the challenges of expression proteomics in its full extent. The training of undergraduate or graduate students of biology-related trans-disciplinary educational programs in this field is highly recommended to prepare them for the challenges that they will face as researchers, clinicians or biological, biomedical, or computer engineers in the postgenomic era.

TEACHING EXPRESSION PROTEOMICS

Teaching Expression Proteomics at Instituto Superior Técnico (IST), Technical University of Lisbon, at the MSc level (MSc in “Biological Engineering” or “Biotechnology”) is carried out as a specific module of 2 wk within the curricular unit “Functional Genomics and Bioinformatics” (3 hr of Theoretical classes + 1.5 hr of Computational laboratories, per week). This curricular unit is a mandatory discipline for around 60 students per year, as part of the Biological Engineering (40) or Biotechnology (20) curricula. These students, together with others that annually choose the discipline as an option, attend the theoretical classes and are divided into groups of 15 for computational classes and, when available, of 12 for wet laboratory classes. Educators selected to teach this module are those involved in this area of research, preferably having had hands-on experience in the field. In some cases, external researchers are invited to give a lecture on their knowledge and views on expression proteomics. The objective of this course is to prepare the students to conduct, in future practice, their own expression proteomics experiments. They are expected to be trained in every step, from experimental design to technical laboratory procedure, and from gel-image analysis to genome-wide expression data analysis. Students are given a specific biological question, which they are expected to answer by the end of the expression proteomics module based on their own data.

Lectures

This expression proteomics unit begins with lectures, focusing on five major topics: 1) from proteins to proteomes; 2) sample preparation and fractionation into sub-proteomes; 3) expression proteomics based on 2-DE; 4) mass spectrometry approaches; 5) examples and applications.

The first topic, “From proteins to proteomes,” puts proteomics into perspective in the context of the postgenomic era and provides a historical outline of how the field of proteomics appeared and has evolved in the past 30 yr. This first topic aims at providing the basics of protein biochemistry approaches, before and after the advent of genome sequencing, to ensure that students from different academic backgrounds have a common foundation upon which to take full advantage of the remaining course materials.

The second topic, “Sample preparation and fractionation into subproteomes,” focuses on the challenges involved in the preparation of protein solutions compatible with isoelectric focusing (IEF) and in the analysis of the proteome as a whole. As it is virtually impossible to separate the whole proteome with high resolution in a single 2-DE, prefractionation methods are described, such as those applied to the isolation of organelles [11], to hydrophobic protein extraction [12, 13], to sequential extraction of proteins using different extraction buffers containing progressively stronger solubilizing agents [14], and chromatographic procedures for sample enrichment [15]. These approaches and their advantages and limitations in quantitative expression proteomics are discussed with the students. The importance of extract purification and solubilization in buffers appropriate for isoelectric focusing [16] is also addressed.

The third topic, “Expression proteomics based on two-dimensional electrophoresis,” aims at describing the 2-DE procedure and protein level quantification, from its biochemical basis to its major applications and constraints. The basis of protein separation in the first step of 2-DE by isoelectric focusing (IEF) is discussed. This step is commonly carried out in polyacrylamide matrices containing an immobilized pH gradient (IPG) where the polyampholytic proteins migrate, under a strong electric field, along a pH gradient strip toward the pH at which their net charge is null. The second step of protein separation, according to their molecular mass by SDS-PAGE (sodium dodecylsulphate-polyacrylamide gel electrophoresis), is also covered (Fig. 1). Staining procedures to detect and quantify the separated proteins, their dynamic range (range of linearity between the spot intensity and the concentration of the protein), their reproducibility and their compatibility with protein identification methods are discussed. Methods for protein detection that require protein labeling before the 2-DE procedure are described, including in vivo protein radiolabeling using 35S-methionine or in vitro labeling of protein extracts with fluorescent dyes (e.g., using CyDyes for DIGE, Difference Gel Electrophoresis). Image capture and gel analysis steps and their constraints are also considered in class.
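
The principle behind the first dimension, migration toward the pH at which the net charge is null, can be illustrated with a short calculation. The following is a minimal sketch, not part of the course materials described here: it estimates the isoelectric point of an unfolded polypeptide from its sequence using the Henderson-Hasselbalch equation and a bisection search. The pKa values and the function names `net_charge` and `isoelectric_point` are illustrative assumptions; published pKa scales differ slightly, and real proteins deviate because of folding and post-translational modifications.

```python
# Illustrative side-chain and terminal pKa values (assumed; scales vary).
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0, "N_TERM": 9.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_TERM": 2.0}


def net_charge(sequence: str, ph: float) -> float:
    """Approximate net charge of an unfolded polypeptide at a given pH."""
    # Free amino terminus (positive when protonated).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["N_TERM"]))
    # Free carboxyl terminus (negative when deprotonated).
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["C_TERM"] - ph))
    for residue in sequence.upper():
        if residue in PKA_POSITIVE:
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[residue]))
        elif residue in PKA_NEGATIVE:
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[residue] - ph))
    return charge


def isoelectric_point(sequence: str, tolerance: float = 1e-4) -> float:
    """Find the pH at which the net charge is zero by bisection.

    The net charge decreases monotonically with pH, so the search
    interval can be halved until it is narrower than the tolerance.
    """
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid  # negatively charged: pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "KKKK", "ACDEFGHIK"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

In an IPG strip, a protein with a low estimated value would focus toward the acidic end and one with a high value toward the basic end, which is the behaviour students observe in the first dimension.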

Figure 1.

Schematic representation of the process of protein separation by two-dimensional electrophoresis. IEF, isoelectric focusing; SDS-PAGE, sodium dodecylsulphate-polyacrylamide gel electrophoresis.

The fourth topic, “MS approaches,” aims at highlighting the mass spectrometry techniques used together with, or as an alternative to, 2-DE in expression proteomics. Mass spectrometry (MS), the most widely used technique for the identification of proteins separated by 2D electrophoresis, is presented, from the step of spot removal from the gel, to in-gel protein digestion with proteases (commonly trypsin) and subsequent peptide mass fingerprinting (PMF). Peptide masses experimentally determined for a proteolytic digest by mass spectrometry are compared to predicted masses from in silico digestions of all proteins present in a given database. The importance of genome sequencing to the exploitation of this rapid, low-cost and high-throughput protein identification methodology is highlighted. By this time, students will be aware of the possibilities and constraints of the 2-DE-based expression proteomics procedure, to be followed, step-by-step, in the wet-lab classes (Fig. 2). Other approaches used for protein fractionation as alternatives to 2-DE, relying on protein separation by chromatographic techniques, are also exemplified. Mass spectrometry methods for quantitative proteomics are also outlined. The use of sample labeling with light or heavy isotopes for relative and absolute protein quantification by tandem MS is provided as an example of such modern approaches [17, 18].
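
The comparison of measured peptide masses with in silico digests can be made concrete with a small example. The sketch below is a simplified illustration under stated assumptions, not the algorithm of any particular search engine: it cleaves after K or R unless the next residue is P, allows no missed cleavages, ignores modifications, works with neutral monoisotopic masses rather than the m/z of ions, and scores each database entry simply by the number of observed masses it explains. The function names, the mass tolerance and the toy sequences are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONOISOTOPIC = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    seq = sequence.upper()
    peptides, start = [], 0
    for i, residue in enumerate(seq):
        next_residue = seq[i + 1] if i + 1 < len(seq) else ""
        if residue in ("K", "R") and next_residue != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONOISOTOPIC[aa] for aa in peptide) + WATER


def pmf_score(observed_masses, sequence, tolerance=0.2):
    """Count observed masses matching a predicted peptide within tolerance."""
    predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(mass - p) <= tolerance for p in predicted)
        for mass in observed_masses
    )


def rank_database(observed_masses, database, tolerance=0.2):
    """Rank database entries (name -> sequence) by number of matched masses."""
    scores = {
        name: pmf_score(observed_masses, seq, tolerance)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    toy_database = {"protein_A": "MAGKLLEER", "protein_B": "GGGGKAAAAR"}
    observed = [peptide_mass("MAGK"), peptide_mass("LLEER")]
    print(rank_database(observed, toy_database))
```

The example also shows why a sequenced genome matters: an entry can only be identified if its sequence is available for the in silico digest.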

Figure 2.

Schematic representation of the steps required for expression proteomic analysis using 2-DE.

The last topic, “Examples and applications,” aims to broaden the students' horizons by exploring the many possibilities that expression proteomics approaches open up for obtaining new biological knowledge. Examples include case studies based on our own experience in using Expression Proteomics to get mechanistic insights into the response and resistance to chemical stress in two microbial model systems: the yeast Saccharomyces cerevisiae [19, 20] and bacteria of Pseudomonas species [21, 22]. One of these case studies is then used as the basis for the laboratory and computational classes. Beyond our current documented research experience, important applications in the field of clinical proteomics are also outlined [23]. For example, the exploitation of expression proteomics to identify new disease biomarkers, with expected impact on diagnosis, on the evaluation and design of new pharmaceuticals [23] and on the follow-up of treatment progression, is considered with the students.

WET-LAB CLASSES

Teaching expression proteomics in laboratory classes to regular undergraduate or graduate courses can be a real challenge due to time and cost constraints. So far, we have only made such laboratory modules available in intensive one-week hands-on postgraduate courses on expression proteomics, taking advantage of their lower number of students and the intensive period of tutoring. However, it would be desirable to extend these wet-lab classes to normal MSc curricula. A typical lab class on this subject would follow the procedure delineated in Fig. 3. The procedure is exemplified here by the use of cell extracts obtained from Saccharomyces cerevisiae populations exposed or not to the agricultural fungicide mancozeb. The assessment of the changes occurring at the proteome level of model organisms, such as the eukaryotic experimental model Saccharomyces cerevisiae, may provide earlier and more sensitive biomarkers of a toxic response, which sometimes is only detected after years of chronic environmental exposure. This study in the field of Yeast Toxicoproteomics was recently published [19] and is the basis of the data analyzed as an example in these classes.

Figure 3.

2-DE procedure. 1, proteins resuspended in the IEF buffer; 4, sample and IPG strip application in the support for IPG strips; 5, placement of IPG strip holders in the IEF device; 7, SDS-PAGE for the second dimension; 8, gel stained for protein visualization; 9, image scanner for gel image densitometry.

The purpose of the wet-lab classes is to make the students aware of the technical challenges involved in this approach. Four groups, comprising three students each, work at the same time, preparing one 2-DE gel per group. Several samples can be used with the objective of showing the diversity of biological material that can be used and how results can be explored. In this example, two of the groups are expected to obtain protein extracts from unstressed cells, while the other two will obtain their protein extracts from mancozeb-stressed cell populations. Each group will prepare its protein extracts and run a single gel. At the end of the wet-lab class, each group will use the four gels produced in class, two replicates of each condition, for the comparative expression proteomics analysis using computational tools, as explored in the laptop classes. The laboratory procedure followed (Fig. 3) takes 2 days, and the equipment needed for such an experimental class is estimated to cost around 15,000 euros. However, after this initial investment, which some labs have already made for research purposes, the cost per gel of the procedure described in Fig. 3 will be below 40 euros, which may be compatible with some University budgets.

DRY-LAB CLASSES

Once high quality images are obtained, software packages are required for gel analysis. These packages are made available by a number of different companies, including BioRad, GE Healthcare, and Progenesis. Most of these companies provide free tutorial versions, together with demonstration gels, which are downloadable from the web and can be used in practical computational classes to train the students in their use. We have been teaching such classes as part of a course on “Functional genomics and bioinformatics,” which is mandatory for the MSc courses in “Biological engineering” or “Biotechnology” and is part of the Biotechnology PhD program of a number of students. Our experience shows that this hands-on approach is essential to highlight the major challenges involved in these analyses and to emphasize the importance of obtaining good, comparable gels in order to attain reproducible and statistically significant results.

Independent of the software, the steps followed in the comparative analysis of the gels are similar. In all cases, the gel images are classified as usable or not depending on the degree of saturation reached during the scanning process. Afterwards, one gel is selected as the reference gel and, then, gels are analyzed individually for:

  • (i) Spot detection: This is an important step in which the software user has to optimize parameters in order to discriminate adjacent spots and disregard impurities in the gel.
  • (ii) Spot matching: in which corresponding spots in each different gel are matched. This is also a crucial step that can usually be guided by the manual drawing of a few vectors throughout the gel area to take into account gel-to-gel distortion variation. Automatic matching will afterwards use these vectors to guide the process of matching all the spots in all the gels, as accurately as possible.
  • (iii) Background subtraction and normalization follow this step, to allow normalized quantification of spot intensity.

Following these steps, gels have to be grouped as multiple replicates of each experimental condition analyzed. The software can then analyze the changes in relative intensity of each protein spot in the different gels and calculate the statistical significance of the observed differences in spot intensity. Most software packages are also able to use these results to perform spot clustering based on intensity profiles, which corresponds to protein clustering based on expression profiles (Fig. 4). The proteins found to be of interest, for instance those that are upregulated or downregulated in cells exposed to a given stress agent, can then be identified by MS.
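
The logic of normalization and replicate comparison can be sketched independently of any commercial package. The example below is a minimal illustration under stated assumptions and does not reproduce the algorithm of any specific gel-analysis software: each spot volume is expressed as a fraction of the total spot volume of its gel, only spots matched in every gel are compared, and a log2 fold change and Welch's t statistic are computed per spot. The function names and the toy spot volumes are hypothetical; converting the t statistic into a p-value requires the t distribution, which is left out here.

```python
import math
from statistics import mean, variance


def normalize_gel(spot_volumes):
    """Express each spot volume as a fraction of the gel's total volume."""
    total = sum(spot_volumes.values())
    return {spot: volume / total for spot, volume in spot_volumes.items()}


def compare_conditions(control_gels, treated_gels):
    """Return {spot: (log2 fold change, Welch t statistic)} for matched spots.

    Each gel is a dict mapping a matched spot identifier to its
    background-subtracted volume (assumed positive). At least two
    replicate gels per condition are needed to estimate the variance.
    """
    control = [normalize_gel(gel) for gel in control_gels]
    treated = [normalize_gel(gel) for gel in treated_gels]
    # Keep only spots that were matched in every gel.
    shared = set.intersection(*(set(gel) for gel in control + treated))
    results = {}
    for spot in sorted(shared):
        c = [gel[spot] for gel in control]
        t = [gel[spot] for gel in treated]
        log2_fc = math.log2(mean(t) / mean(c))
        std_err = math.sqrt(variance(c) / len(c) + variance(t) / len(t))
        # The statistic is undefined when replicates show no variation.
        t_stat = (mean(t) - mean(c)) / std_err if std_err > 0 else None
        results[spot] = (log2_fc, t_stat)
    return results


if __name__ == "__main__":
    unstressed = [{"s1": 10, "s2": 90}, {"s1": 12, "s2": 88}]
    stressed = [{"s1": 40, "s2": 60}, {"s1": 38, "s2": 62}]
    for spot, (fc, t) in compare_conditions(unstressed, stressed).items():
        print(spot, round(fc, 2), None if t is None else round(t, 1))
```

With only two replicates per condition, as in the class design with four gels, such statistics have little power, which helps explain why good, comparable gels and sufficient replication matter.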

Figure 4.

Visualization of two of the steps leading to the comparative analysis of 2D gels using the SameSpots software (NonLinear Dynamics). (a) Representation of Spot Matching between two gels obtained by the students. (b) Dendrogram obtained upon protein clustering based on expression profile and, below, expression profile of one cluster of proteins whose content increases in cells changing from condition t1 to t2.

For classroom tutoring, a list of proteins whose expression is upregulated or downregulated in yeast cells exposed to mancozeb [19] is provided to students to proceed with the analysis of the biological meaning of the observed protein expression changes. The introduction of this Bioinformatics module has largely promoted active learning in our classrooms and enhanced student understanding of genome-wide expression analysis. Several computational tools are available online to facilitate this analysis and provide, in a semiautomatic way, the most relevant biological processes and underlying signaling mechanisms involved in the registered proteome-wide changes. As they are using a dataset from the yeast Saccharomyces cerevisiae, students are guided to group proteins according to:

  • 1) their description using GO terms. To do so, two computational tools are used: FatiGO (http://fatigo.bioinfo.cipf.es/), which orders the groups of proteins based on the number of proteins associated to each GO term; and GOToolBox (http://crfb.univ-mrs.fr/GOToolBox/index.php), which orders the groups of proteins according to statistical significance. This criterion is based on the over-representation of each GO term in the protein list compared with the whole proteome.
  • 2) the transcription factors predicted to regulate their expression, using the YEASTRACT database (www.yeastract.com). YEASTRACT is a repository of all documented regulatory associations in Saccharomyces cerevisiae and allows the students and researchers to predict the transcription regulatory networks underlying gene expression changes at a genomic scale [24, 25]. Although this analysis is more directly linked to gene expression analysis at the transcript/transcriptome level, it provides interesting suggestions on the regulatory network possibly involved in the observed proteome-wide changes (data obtained by the students are exemplified in Fig. 5).
  • 3) the physical and genetic interactions between them [BioGRID (http://www.thebiogrid.org/)].
  • 4) the metabolic pathways in which they participate [KEGG (http://www.genome.jp/kegg/)].
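
The over-representation criterion mentioned in the first item of the list above can be illustrated with a hypergeometric test. The sketch below is only an assumption about how such a ranking can be computed and is not claimed to be the exact procedure implemented by FatiGO or GOToolBox: for each GO term it computes the probability of finding at least the observed number of annotated proteins in a list of that size drawn at random from the proteome, then orders the terms by that probability. The function names, annotation sets and proteome size are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(list_hits, list_size, proteome_hits, proteome_size):
    """Hypergeometric probability of at least `list_hits` annotated proteins.

    list_hits     -- proteins in the list annotated with the GO term
    list_size     -- total proteins in the list
    proteome_hits -- proteins in the whole proteome annotated with the term
    proteome_size -- total proteins in the proteome
    """
    upper = min(list_size, proteome_hits)
    tail = sum(
        comb(proteome_hits, k)
        * comb(proteome_size - proteome_hits, list_size - k)
        for k in range(list_hits, upper + 1)
    )
    return tail / comb(proteome_size, list_size)


def rank_go_terms(protein_list, annotations, proteome_size):
    """Order GO terms by over-representation in the protein list.

    annotations maps each GO term to the set of all proteome proteins
    annotated with it. Returns (term, hits, p-value) tuples, most
    significant first.
    """
    selected = set(protein_list)
    rows = []
    for term, members in annotations.items():
        hits = len(selected & members)
        if hits == 0:
            continue  # term not represented in the list
        p_value = enrichment_p_value(
            hits, len(selected), len(members), proteome_size
        )
        rows.append((term, hits, p_value))
    return sorted(rows, key=lambda row: row[2])


if __name__ == "__main__":
    toy_annotations = {
        "response to oxidative stress": {"P1", "P2", "P3", "P4", "P5"},
        "translation": {f"R{i}" for i in range(1, 51)},
    }
    upregulated = ["P1", "P2", "P3", "R1"]
    for term, hits, p in rank_go_terms(upregulated, toy_annotations, 100):
        print(f"{term}: {hits} proteins, p = {p:.2e}")
```

Ordering by the number of hits alone and ordering by this probability can give different rankings, which is the distinction drawn above between the two tools: a large category may collect several hits by chance, whereas a small category with a few hits can be strongly over-represented.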
Figure 5.

Visualization of the transcription regulatory network presumed to underlie the increase in antioxidant protein expression (inside the white boxes) in yeast cells exposed to the fungicide mancozeb [19], obtained by the students using the YEASTRACT database [24, 25].

The selection of the dataset to be used in this dry-lab class is based on our own research expertise. It is, however, important to focus these classes on model organisms, among which Saccharomyces cerevisiae is a great example. Model organisms are the ones for which the amount of available information and associated computational tools makes it possible to use bioinformatics methods to solve biological problems, and they are thus well suited to making the students aware of the application of computational tools and approaches. Depending on the case study and on the educational level/purpose, students in our dry-lab classes are guided in this bioinformatics analysis to provide logical explanations for the registered proteomic changes, to draw hypotheses and to propose experimental approaches to validate them. Since the students are using data obtained in our own research programs, they are encouraged to compare their results and hypotheses with our paper on the subject [19]. This analysis in class is relatively superficial, but it prepares the students for a more detailed analysis to be carried out as mini-projects, as detailed in the section “Mentored coaching.” The students' performance during practical classes involving the use of bioinformatics tools applied to expression proteomics is assessed through a written report describing the approaches and main conclusions obtained in class. This report, together with others concerning the remaining computer laboratory classes of the course, represents 30% of the final grade of the curricular unit of Functional Genomics and Bioinformatics. The remaining 70% corresponds to the assessment of the mini-project.

E-LEARNING PLATFORM

The basic theoretical concepts and some experimental procedures underlying 2-DE-based Expression Proteomics have been introduced as advanced contents in an e-learning portal (in Portuguese) under development at IST: e-escola (www.e-escola.pt). In the scientific area of Biology, this e-learning facility aims at promoting scientific culture and learning in the domain of life sciences, with particular emphasis on microbiology and the modern approaches in molecular biology, genetic engineering, functional genomics, proteomics and bioinformatics. It is, as far as we know, the first e-learning gateway written in Portuguese to include an advanced and detailed subtopic in Proteomics (http://www.e-escola.pt/ftema.asp?canal=biologia&id=128), and it greatly facilitates the study of the subject by undergraduate and graduate students. It also provides an experimental procedure for 2-DE to be used in any lab with appropriate equipment (http://www.e-escola.pt/ftema.asp?id=116&canal=biologia). Despite the high degree of specificity and complexity of this particular subject, its related contents in “e-escola” were visited more than 800 times by 600 visitors from Portuguese speaking countries (Portugal and Brazil accounting for the majority of these visits, 534 and 135, respectively) in the last 3 mo. This is a small number compared with the 25,000 visits, registered during the same period, for the most popular content, “Microscopic observation of bacteria-gram coloration”, and reflects the advanced nature of the topic and the need to familiarize students, teachers and researchers with the emerging field of functional genomics, in particular with expression proteomics.

MENTORED COACHING

The best way to teach complex technological applications is on a personal basis, allowing the student to perform a full experiment in a professional environment, which is, of course, very difficult within the context of regular MSc or even PhD courses. However, we try to overcome this constraint by proposing Mini Research Projects to groups of MSc students (2nd cycle according to the Bologna curricular reformulation), as part of the curricular unit of Functional Genomics and Bioinformatics. In these projects, small groups are invited to follow 2-DE experiments ongoing in the research lab and to run their own computational and statistical analysis of the 2D gels obtained. In this case, they have the chance to use real-life gels (as opposed to demonstration gels) and the available 2D protein maps, to identify proteins whose expression changes significantly in the studied case and then to hypothesize the biological meaning of the results obtained. This analysis requires suitable computational tools and also extensive bibliographic search. Throughout this process, small groups of students are guided by a senior researcher and accompanied by the PhD student in whose project they become involved. They thus have the opportunity to discuss and learn from the research team about the value of the proteomic data obtained for the elucidation of specific biological issues. The results of the mini-project (focused on expression proteomics or on other important topics of functional genomics and bioinformatics) are presented as a written report and as an oral presentation in class. The assessment of these activities represents 70% of the final grade of the curricular unit of functional genomics and bioinformatics.

In an even more personalized fashion, MSc and PhD students are also put in contact with expression proteomics by working on a daily basis in the lab, when this is required for the fulfillment of their MSc or PhD course programs or dissertations. For PhD students, we usually encourage not only the use of the proteomic approaches already available in the lab, but also their further development and the optimization of new approaches (e.g., membrane proteomics and phosphoproteomics procedures) or the extension of existing procedures to other biological systems and issues, including opportunistic bacteria and higher eukaryotes (e.g., stem cells and plant proteomics).

CONCLUSIONS

The technological advances in the field of expression proteomics are changing the way biological research is planned and carried out. Beyond the available, new or ever improving experimental techniques, expression proteomics produces a tremendous amount of data and requires the coordinated activities of multidisciplinary teams with expertise in biological sciences and functional genomics, bioinformatics and computational biology. Educators are beginning to teach these subjects, at least from a theoretical perspective, to undergraduate and graduate students. When possible, these approaches are also taught in laboratory and computer classes, which allow the students to fully understand the potential and the current limitations of a trans-disciplinary functional genomics approach. Although a comprehensive quantitative study of the learning outcome of this teaching procedure was not carried out, student reaction to the expression proteomics module of the curricular unit of functional genomics and bioinformatics has been very positive. In particular, the possibility of being integrated in ongoing research projects, dealing with unpublished material and gaining first-hand insights into a specific research hot topic is considered by them challenging and a great opportunity to fully understand this approach from its various perspectives, from the experiment itself to the computational and data analysis point of view. Making the wet-lab classes available to all students attending the course, the major area for improvement in this educational program, is one of our current objectives.

Given the current implications and the promising potential of the Proteomics field in all areas of life sciences and clinical practice, teaching proteomics is instrumental in preparing a new generation of scientists for their future challenges. It is thus pivotal that educators adjust the curricula of their courses to live up to the requirements of modern biology, reflecting the inescapable transformation that postgenomic approaches have brought about in the way science is done and providing better solutions to the major challenges in the fields of medicine, biotechnology and agriculture.
