This work was supported by the National Science Foundation Major Research Instrumentation Program (Award no. 0079393) and the Howard Hughes Medical Institute University Award Program.
Proteomics is one of the important new disciplines to emerge from the genome sequencing projects of the last decade. In order to introduce our students to the techniques and promise of this emerging field, a capstone laboratory experience has been developed. The exercise involves multiple aspects of proteomics research including microbial culturing methods, two-dimensional gel electrophoresis techniques, matrix-assisted laser desorption-ionization time-of-flight mass spectrometry, and database mining. Over a 12-week semester, students design their own experiments and apply a proteomic approach to investigate the heat shock response in Escherichia coli. In the trial presented in this article, students successfully identified several major heat shock proteins. The laboratory outlined here can be readily adapted to explore a wide variety of responses in metabolic pathways or responses resulting from other environmental insults or stresses. Additionally, the laboratory can be modified to explore the proteomes of organelles, tissues, and other model organisms.
Proteomics is one of the many new disciplines to emerge from genome sequencing projects and the Human Genome Project. Some of these new areas of study include: genomics, which seeks to determine the structure and organization of a genome as well as variations between species; bioinformatics, which extracts or mines biological information from DNA sequence information; and functional and structural genomics, which shift the emphasis from mapping genomes to determining the biological function of open reading frames or the three-dimensional structures of proteins. Proteomics complements these other areas by striving to connect physiological processes to biological pathways, regulatory mechanisms, and signaling cascades through the identification and quantification of proteins expressed by a cell, the localization of proteins, specific protein-protein interactions, and post-translational modifications. The term proteomics is derived from proteome and originally described the systematic study of the protein complement expressed by the genome of a cell or tissue [1–5]. A typical approach for examining the proteome (Fig. 1) involves four basic steps: 1) separation of proteins by two-dimensional gel electrophoresis; 2) isolation and enzymatic cleavage of particular proteins of interest; 3) peptide analysis by either matrix-assisted laser desorption-ionization time-of-flight (MALDI-TOF) or electrospray ionization (ESI) mass spectrometry (MS); and 4) analysis of the peptide mass fingerprint results using DNA or protein databases to identify the isolated protein.
The field of proteomics provides information that cannot readily be gained through genomic or bioinformatic methods and therefore complements these methods. For example, proteomics reveals which proteins are expressed by the cell and how much of each is present. In addition, because proteomics directly examines proteins by mass spectrometry, insight can be obtained about the diverse collection of post-translational modifications (such as phosphorylation or glycosylation) involved in the regulatory mechanisms that modulate essential cellular processes. Moreover, the information gained through proteomics reflects the dynamic nature of the cell: the proteome can change over a matter of minutes, in contrast to the relatively fixed information in the genome. A cell's proteome reflects the particular stage of development or the current environmental condition that the cell or organism is experiencing. Therefore, proteomics can ultimately be used to identify the molecular nature of a particular disease by comparing healthy and diseased tissues and to target new areas for drug development.
Proteomics provides an opportunity for biochemistry educators to translate genome sequencing projects into laboratory experiences for students. Computational experiences have successfully brought genomics and bioinformatics to the classroom for undergraduates. The experience described here accomplishes the same goal by bringing proteomics to the undergraduate laboratory curriculum. Here, we describe a semester-long capstone laboratory in which students use a proteomics approach to investigate the heat shock response in Escherichia coli. Introducing students to the emerging field of proteomics offers a powerful way to combine the genome projects and bioinformatics with modern chemical and biochemical techniques, such as MALDI-TOF MS, isoelectric focusing (IEF), and gel electrophoresis. A capstone proteomics laboratory offers numerous pedagogical opportunities for preparing students to participate in research. First, the laboratory experience captures the breadth of a student's undergraduate experience by having the student call upon, synthesize, and integrate material, techniques, and concepts learned over the course of their education. The second pedagogical advantage is that proteomics allows a student to directly observe the dynamic nature of a cellular response by perturbing the cell's surroundings with an environmental “insult.” An advantage of examining the heat shock response is that the topic is well described in the literature, and the laboratory exercise allows students to consider cellular adaptations at the transcriptional, translational, and post-translational levels, a feature not typically found in a single laboratory exercise or project. Finally, this laboratory experience affords students the opportunity to systematically explore the primary literature, with a focus on a specific system, and then integrate that information into the design of their own experiments.
The heat shock response also leads naturally into discussions about protein folding and the relationship between structure and function.
BACKGROUND INFORMATION ON MOLECULAR CHAPERONES AND THE HEAT SHOCK RESPONSE
The response observed in the E. coli proteome is a phenomenon that is widespread among living cells. The E. coli heat shock system is considered by some to be a simplified form of the response observed in other organisms. When the bacteria are stressed with elevated temperatures (>42 °C), a set of proteins is synthesized in order to maintain the viability of the cell. These proteins are known as heat shock proteins (Hsps), many of which are molecular chaperones. In addition to molecular chaperones, other Hsps include proteases and regulatory factors [6–10]. The major E. coli molecular chaperones include GroEL (Hsp60), GroES (Hsp10), and DnaK (Hsp70). These proteins are well studied, and their role in the response to thermal stress is believed to be associated with their function as molecular chaperones, which aid the proper folding of misfolded or unfolded proteins.
Molecular chaperones are a family of proteins that exhibit broad substrate specificity by recognizing non-native proteins involved in intermediate stages of folding, degradation, or membrane translocation. The structure of the GroEL/GroES complex has been solved in the presence and absence of bound ATP, and these structures and other studies have provided insight into the allosteric changes that occur during the chaperonin folding cycle. Briefly, GroEL forms a 14-mer of identical subunits arranged into stacked rings, with hydrophobic binding sites that line the cavity of the complex. Misfolded or denatured proteins bind to the GroEL cavity, causing a conformational change that promotes the binding of ATP and subsequent binding of the co-chaperone GroES. These binding events cause a large conformational change that serves to seal the chaperone cavity. Although the precise mechanism of ATP hydrolysis and substrate release remains unclear, a single amino acid (Asp398) has been implicated in GroEL ATPase activity. Additionally, GroES appears to prevent the release of the polypeptide until folding is complete. Also unclear is the precise mechanism by which the GroEL/GroES complex facilitates protein folding. One suggestion is that protein folding is mediated by simply preventing aggregation of folding intermediates. Alternatively, a more complicated mechanism may be involved in which protein folding is actively assisted in the cavity of the GroEL/GroES complex.
The heat shock response is characterized by changes at the transcriptional, translational, and post-translational levels [12–14]. Within seconds of a shift in temperature, transcription of heat shock genes increases until the rates of protein synthesis of GroEL and DnaK are increased by 13-fold and 8-fold, respectively. Induction of the heat shock genes is modulated by a positive regulator, the σ32 subunit of RNA polymerase [6, 15, 16]. The σ32 subunit replaces the σ70 subunit in the core RNA polymerase complex, and under heat shock conditions, the σ32 subunit confers specificity to the heat shock promoters. The σ32 subunit is observed to have a transient increase in expression levels on heat shock. After heat shock, the σ32 subunit declines to a new steady-state level that is elevated from the initial expression level at 30 °C. At 42 °C, a 2-fold increase is observed for the major Hsps such as GroEL and DnaK, whereas a 10-fold increase is observed at 46 °C. Ultimately these Hsps account for 20% of the total cellular protein. Under heat shock conditions, a large population of cellular proteins begins to denature and is then recognized by molecular chaperones. Therefore, the increase in Hsp expression is thought to help the cell survive and manage the large number of denatured or misfolded cellular proteins that accumulate as a consequence of thermal stress.
This laboratory experience is set in the second semester of the senior year. Meeting 8 hours each week for 12 weeks, groups of two or three students (with a total class size of 12 students) are asked to explore the heat shock response in E. coli by designing their own experiments based on literature precedent. In addition to the experiment described here, students have also explored the difference in the E. coli proteome expression profile when cells are subjected to a temperature jump (30 to 46 °C over 3 min) versus a gradual temperature increase (30 to 46 °C over 60 min), as well as the proteome expression profile associated with a decrease in temperature (the cold shock response). Students have also explored changes in the E. coli proteome expression profile with respect to growth on rich versus minimal media, and growth on alternate carbon sources, such as acetate and pyruvate, versus growth on glucose.
Students do not have a structured laboratory meeting time; rather, students schedule their own laboratory time based on the requirements of the experiment. This flexibility gives students a taste of how researchers work and provides them with a real sense of ownership in their education. Of course, this scheduling feature is not necessary to adopt this laboratory. The first 2 weeks involve a series of group meetings with the instructor during which the students discuss the primary literature, develop their experimental designs, and detail the experimental logistics. Weeks 3–6 involve performing the heat shock experiment, preparing samples, and completing the first- and second-dimension analysis of the experiments. Weeks 7–10 involve identification, by MALDI-TOF MS, of the proteins that the students have targeted for study. Weeks 11–12 involve writing up detailed laboratory reports and presenting the experimental results.
INSTRUMENTATION: MALDI-TOF MS
MALDI-TOF MS is an important tool used to identify proteins through the generation of peptide mass “fingerprints” from protease digestion of a target protein [17–19]. Although trypsin is the most commonly used protease, other enzymes such as chymotrypsin, the endopeptidases Asp-N or Lys-C, or even a chemical reagent such as cyanogen bromide (CNBr) can be used to cleave the polypeptide backbone in a sequence-specific fashion. When the peptides generated by digestion of a protein are examined by mass spectrometry, a characteristic profile of masses, or peptide fingerprint, is observed (Scheme 1).
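The idea of a peptide mass fingerprint can be sketched in a few lines of Python. The example below is illustrative only and is not part of the laboratory protocol. It assumes the commonly stated trypsin rule (cleavage on the C-terminal side of Lys or Arg, except before Pro), uses standard monoisotopic residue masses, and ignores missed cleavages and modifications. The function names and the short demonstration sequence are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass added for the free N- and C-termini
PROTON = 1.00728   # mass of the charge-carrying proton in [M+H]+


def tryptic_digest(sequence):
    """Split a sequence after every K or R that is not followed by P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def mh_plus(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def fingerprint(sequence, min_mass=500.0):
    """Sorted list of [M+H]+ values at or above min_mass."""
    masses = (mh_plus(p) for p in tryptic_digest(sequence))
    return sorted(round(m, 4) for m in masses if m >= min_mass)


if __name__ == "__main__":
    demo = "AKRPGKLMSTRWWDEFGHK"  # made-up sequence, not a real protein
    for peptide in tryptic_digest(demo):
        print(peptide, round(mh_plus(peptide), 4))
```

A database search program performs essentially this digestion for every sequence in the database and then compares the resulting mass lists with the measured spectrum.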
To enhance mass resolution and accuracy, MALDI-TOF instruments typically use delayed extraction techniques to minimize the distribution of initial kinetic energies after laser excitation, as well as ion mirroring when the instrument is used with a reflector. The mass resolving power (m/Δm) of a linear instrument is in the range of 4,000, whereas a reflector instrument can have a resolving power of up to 10,000.
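For students, a short calculation can make the time-of-flight principle and the meaning of resolving power more concrete. The sketch below is an idealized model only: it treats the flight tube as a single field-free drift region, ignores delayed extraction and the reflector, and takes resolving power in its conventional sense of m/Δm. The 20,000 V default matches the accelerating voltage given in the methods, but the 1.3 m drift length is an assumed placeholder rather than a specification of our instrument, and the function names are hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per Da


def flight_time(mass_da, accel_volts=20000.0, drift_length_m=1.3, charge=1):
    """Idealized drift time in seconds for an ion of the given mass.

    Assumes all of the potential energy z*e*V is converted to kinetic
    energy before the ion enters a field-free tube (assumed length).
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_volts / mass_kg)
    return drift_length_m / velocity


def peak_width(mass_da, resolving_power):
    """Peak width (Da) implied by a resolving power defined as m / delta-m."""
    return mass_da / resolving_power


if __name__ == "__main__":
    for mass in (1000.0, 2000.0, 4000.0):
        microseconds = flight_time(mass) * 1e6
        print(f"{mass:7.1f} Da  drift time {microseconds:6.2f} us")
    print("peak width at 2000 Da, R = 10000:", peak_width(2000.0, 10000.0), "Da")
```

In this model the flight time scales with the square root of the mass, so quadrupling the mass doubles the drift time.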
Although MALDI-TOF is an integral tool for the identification of proteins, the on-site availability of this instrument is not a requirement to implement this laboratory experience. Mass spectrometer core facilities at research universities offer mass spectrometry services to outside users at reasonable prices (e.g. $27 at Kansas State University Biotechnology Core Facility to $58 at the University of Wisconsin Biotechnology Mass Spectrometry Facility). Our instrument was acquired with support provided by the National Science Foundation Major Research Instrumentation program. This laboratory project was part of the original proposal, and the instrument was acquired as part of a faculty consortium that uses MALDI-TOF MS as an integral tool in its research.
MATERIALS AND METHODS
Culturing of E. coli K-12 MG1655 cells—
The E. coli strain used in this study was the same one used in the E. coli genome sequencing project. The strain was provided by the E. coli Genetic Stock Center at Yale University (cgsc.biology.yale.edu). A starter culture flask containing 50 ml Luria-Bertani (LB) media was inoculated from a previously prepared glycerol stock of the strain and incubated for 12 h at 30 °C. At an A595 of 0.2, a 10-ml aliquot was used to inoculate a 4-l Fernbach flask containing 1.0 l of LB media. The culture was grown to an A595 of 0.8, and a 50-ml aliquot was harvested by centrifugation (5000 × g) at 4 °C. The supernatant was discarded, and the pellet was quick frozen and stored at −80 °C. The incubator temperature was increased to 46 °C, and once the target temperature was achieved, 50-ml aliquots were removed after intervals of 30 and 60 min and processed as described above.
Using an established protocol, the pelleted sample was warmed to 25 °C and washed three times with 500 μl of a 50-mM 4-(2-hydroxyethyl)-1-piperazineethanesulfonic acid (HEPES) solution to remove residual LB media. The pellet was dissolved in 250 μl of lysis buffer (8 M urea, 4% w/v CHAPS, 40 mM Tris, pH 7.8). The protein concentration of each sample was determined using the Warburg-Christian assay, by measuring the absorbance of each sample at 260 and 280 nm. The protein concentration is given by [protein] (mg/ml) = 1.55 × A280 − 0.757 × A260 [22–24].
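The absorbance formula above is easy to script so that students can process a set of readings consistently. The following Python sketch implements the stated equation directly. The dilution factor argument and the helper that converts a target protein mass (for example, the 1 mg loaded onto each strip) into a pipetting volume are conveniences assumed here for illustration, and the function names are hypothetical.

```python
def protein_mg_per_ml(a280, a260, dilution_factor=1.0):
    """Protein concentration (mg/ml) from absorbance at 280 and 260 nm.

    Uses [protein] = 1.55 * A280 - 0.757 * A260, then corrects for any
    dilution made before the reading (dilution_factor is an assumption;
    leave it at 1.0 for an undiluted sample).
    """
    return (1.55 * a280 - 0.757 * a260) * dilution_factor


def volume_for_mass_ul(target_mg, conc_mg_per_ml):
    """Volume of sample (microliters) that contains target_mg of protein."""
    if conc_mg_per_ml <= 0:
        raise ValueError("concentration must be positive")
    return target_mg / conc_mg_per_ml * 1000.0


if __name__ == "__main__":
    # Hypothetical readings taken on a 1:100 dilution of the lysate.
    conc = protein_mg_per_ml(a280=0.10, a260=0.05, dilution_factor=100)
    print(f"concentration: {conc:.2f} mg/ml")
    print(f"volume for 1 mg: {volume_for_mass_ul(1.0, conc):.1f} ul")
```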
First Dimension: IEF —
Continuing with the protocol as described, a PROTEAN IEF cell and 11-cm immobilized pH gradient (IPG) strips (pH 4.7–5.9) (Bio-Rad, Hercules, CA) were used for separation in the first dimension. For the gels stained with Novex colloidal blue (Invitrogen, Carlsbad, CA) presented here, 1 mg of total protein was used in the rehydration of the strip. Strips were placed in IEF cell wells with the desired amount of protein added to make 185 μl of rehydration buffer (8 M urea, 2% w/v CHAPS, 0.001% w/v bromphenol blue, and 0.2% w/v BioLytes [Bio-Rad]; 7 mg of D,L-dithiothreitol [DTT] was added just before use). The IPG strips were covered with mineral oil and actively rehydrated at 50 V for 12 h at 20 °C. Electrode wicks, prewetted in ddH2O, were added to the tray's electrodes and conditioned for 15 min at 250 V. The voltage was ramped rapidly over 2.5 h, and the strips were finally focused at 8,000 V for a total of 35,000 V·h. Once focusing of the sample was complete, the strips were covered with equilibration buffer I (6 M urea, 20% v/v glycerol, 1.5 M Tris·HCl, 2% w/v SDS, 130 mM DTT, pH 7.0) and placed on an orbital shaker for 10 min at 20 °C. The buffer was removed, equilibration buffer II (6 M urea, 20% v/v glycerol, 1.5 M Tris·HCl, 2% w/v SDS, 135 mM iodoacetamide, pH 7.8) was applied, and the strips were placed on an orbital shaker for 10 min at 20 °C. The strips were stored at −80 °C until use.
Second Dimension: SDS Gel Electrophoresis and Gel Staining—
As described previously [25–27], IPG strips were placed on Criterion Precast (Bio-Rad) 1-mm, 4–20% acrylamide gradient SDS-PAGE gels and run at 200 V in 25 mM Tris, 192 mM glycine, 0.1% w/v SDS, pH 8.3, buffer for 55 min. Gels were stained using Novex (Invitrogen) colloidal blue. Gels were imaged on a Fluor S-Multi Imager Scanner (PerkinElmer, Wellesley, MA) and analyzed using PDQuest two-dimensional gel analysis software (Research Triangle Park, NC). Matched sets were made of the gel time course, and proteins of interest were tentatively identified by isoelectric point (pI) and molecular weight in comparison with two-dimensional gel images from the SWISS-2-DPAGE database (us.expasy.org/ch2-D/). The 30 °C gel was selected as the standard gel, and the time points after the temperature increase were normalized using housekeeping proteins that did not appear to change significantly between the control and heat shock gels.
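The normalization step can be illustrated with a simple calculation. The sketch below is an assumption about how such a normalization might be done by hand and does not describe the algorithm used inside the gel analysis software. It scales a heat shock gel to the 30 °C standard gel using the median intensity ratio of the chosen housekeeping spots and then reports fold changes for the spots of interest. All spot names and intensities are hypothetical.

```python
from statistics import median


def scale_factor(reference_housekeeping, sample_housekeeping):
    """Median ratio (reference / sample) over paired housekeeping spots."""
    ratios = [ref / smp
              for ref, smp in zip(reference_housekeeping, sample_housekeeping)]
    return median(ratios)


def fold_changes(reference_spots, sample_spots, factor):
    """Fold change of each matched spot after scaling the sample gel."""
    return {
        name: sample_spots[name] * factor / reference_spots[name]
        for name in reference_spots
        if name in sample_spots
    }


if __name__ == "__main__":
    # Hypothetical integrated intensities (arbitrary units).
    housekeeping_30c = [1200.0, 850.0, 430.0]
    housekeeping_46c = [1000.0, 700.0, 360.0]
    factor = scale_factor(housekeeping_30c, housekeeping_46c)

    spots_30c = {"spot_1": 300.0, "spot_2": 150.0}
    spots_46c = {"spot_1": 2100.0, "spot_2": 140.0}
    for name, change in fold_changes(spots_30c, spots_46c, factor).items():
        print(f"{name}: {change:.1f}-fold relative to 30 C")
```

Using the median rather than the mean keeps a single poorly matched housekeeping spot from dominating the scale factor.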
Protein Excision, Digestion, and MALDI-TOF MS —
Proteins of interest were excised from the gels using sterile glass pipettes. The excised gel fragments were destained by washing the gel three times with 400 μl of 50% acetonitrile (ACN), 25 mM ammonium bicarbonate solution (pH 8.0). The gel pieces were then dehydrated with a 5-min rinse of 100% ACN. The gel fragments were lyophilized for 30 min, rehydrated, and incubated at 37 °C for 16–24 h in 50 μl of 25 mM ammonium bicarbonate containing 10 μg sequencing-grade trypsin. The fragments were washed twice and incubated at 37 °C for 30 min in 50% ACN/5% trifluoroacetic acid (TFA) solution. The supernatants were combined and lyophilized to dryness. The eluted peptide fragments were reconstituted in 3 μl of 50% ACN/0.1% TFA solution. A 0.5-μl sample was directly combined with 0.5 μl of freshly prepared matrix solution (50% ACN/0.1% TFA saturated with recrystallized α-cyano-4-hydroxycinnamic acid) and placed on a MALDI-TOF stage. The remaining sample volume was saved for subsequent C18 zip-tip treatment, if necessary, to remove contaminating salts. All peptide fingerprint data were acquired using a Voyager DE-Pro (Applied Biosystems, Foster City, CA) with reflectron, working in positive ion mode with an accelerating voltage of 20,000 V, a grid voltage of 76%, a 0.0–0.01% guide wire, and an extraction time of 150–300 ns, using a close external calibrant. When possible, trypsin autolysis products were used as an internal calibrant. MALDI-TOF mass spectra were processed with Voyager 5.0 software using noise reduction, resolution-based Gaussian smoothing, and de-isotoping programs. After de-isotoping, the peptide masses were collected and analyzed using Protein Prospector (v 4.0.4; prospector.ucsf.edu). The MS-Fit program, which serves to correlate parent peptide mass fingerprint data (not fragment masses) with proteins virtually digested in a sequence database, was used to fit the submitted data.
A wide range of options is offered to the user in MS-Fit to help refine the search, including the ability to filter the search by species, protein molecular weight, and pI, as well as by types of post-translational modification. To achieve the highest certainty of protein identification, however, the parent ion masses need to be determined as accurately as possible. Results were evaluated on the basis of molecular weight search (MOWSE) scores and compared against other available evidence, such as the pI and molecular weight of the spot excised from the gel.
RESULTS AND DISCUSSION
Using a typical proteomics approach (Fig. 1), we examined the effect of elevated temperatures on the proteome of E. coli. Specifically, we subjected cells to a temperature shift from 30 °C to 46 °C. This temperature shift was selected in order to maximize heat shock protein expression levels. Samples of the proteome were collected from the culture growing at 30 °C and at time points after the temperature shift to 46 °C. These samples were solubilized, and the proteins were separated using two-dimensional gel electrophoresis. The proteomes from the different temperatures and time points were compared. Individual protein spots that were found to have an increase in expression levels were tentatively identified on the basis of pI and molecular weight and through a comparison with gel maps available at the SWISS-2-D gel database. The target protein spots were then excised from the gel and subjected to trypsin digestion. The resulting peptide fragments were analyzed by MALDI-TOF MS. The peptide mass results from MALDI-TOF MS were then compared with the Swiss-Prot database using the Protein Prospector interface, and several target proteins were identified as heat shock proteins.
The following are examples of data acquired by students. Figure 2 shows the changes in the proteome expression pattern observed on two-dimensional electrophoresis gels from a pH range of 4.7–5.9, associated with a temperature change from 30 °C to 46 °C. After cells were exposed to elevated temperatures for 30–60 min, a dramatic increase in the expression of proteins was observed in the upper left region of the gel. As apparent from our gels, hundreds of different proteins are observed over this narrow pH range. To simplify our analysis, we focused on the expected location of GroES and the significant changes observed in the region estimated to be between pH 4.8 and 5.0 and 40–80 kDa (Fig. 3). This region is expected to contain the major heat shock proteins GroEL and DnaK. These expanded views clearly illustrate that a significant change in protein expression occurs over time in response to a shift in temperature.
By comparing our proteome expression profiles with the SWISS-2-DPAGE database [25–27], we made a preliminary identification of five different spots from this region. These preliminary identifications distinguished two different forms of GroEL, two different forms of DnaK, and one form of S1 peptide from the 30S ribosomal subunit. Consistent with reports in the literature and the SWISS-2-DPAGE database, the two forms of GroEL and DnaK apparently represent unphosphorylated and phosphorylated species. Both forms of GroEL and DnaK were present at all temperatures and times examined, with the putative phosphorylated forms having a higher molecular weight than the unphosphorylated form for both proteins. The putative unphosphorylated form also appears to have had a greater increase in expression relative to the increased expression of the phosphorylated forms. This result is consistent with observations made by Sherman and Goldberg, who observed an increase in expression of unphosphorylated forms of GroEL and DnaK with heat shock and suggested that both GroEL and DnaK are phosphorylation targets.
Another spot was preliminarily identified as the S1 peptide of the 30S ribosomal subunit. This spot, in contrast, was observed to have a decrease in expression after 30 min of exposure to an elevated temperature, followed by an increase in expression after 60 min. This peptide is also phosphorylated, which may be part of the adaptive response of the cell. The S1 peptide is thought to facilitate the initiation of protein translation by binding mRNA at the Shine-Dalgarno sequence. Although heat shock proteins have not been shown to directly interact with the S1 ribosomal peptide, the S6 and S2 ribosomal peptides are known in vivo substrates for GroEL before their incorporation into the 30S ribosomal subunit. Given the broad specificity exhibited by molecular chaperones, it is interesting to speculate that the S1 ribosomal peptide is also a potential substrate.
To confirm our preliminary identifications of these proteins, we examined each of the six spots using MALDI-TOF MS. Protein spots were excised from the gel and subjected to trypsin digestion. The resulting peptides were analyzed using MALDI-TOF MS. A peptide fingerprint obtained from one of the spots is illustrated in Fig. 4. The peptide masses were collected and searched against the Swiss-Prot database using the Protein Prospector MS-Fit program. This analysis identified the peptide fingerprint as containing DnaK with a MOWSE score of 31,460. Table I details the results obtained from our submission, listing the peptide masses obtained by MALDI-TOF MS (m/z submitted) and the expected masses for a virtual trypsin digestion (MH+ matched). Table I also gives a measure of the quality of the data obtained by MALDI-TOF MS by indicating the difference between the submitted and calculated masses (in parts per million). For this search, a mass tolerance of 100 ppm was used. Table I indicates the specific peptide sequences and amino acid numbers that were identified for DnaK and reveals that the peptide sequences that matched our mass spectrum cover 17% of the primary sequence of DnaK.
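The two quality measures discussed here, the mass error in parts per million and the sequence coverage, follow from simple arithmetic that students can reproduce. The Python sketch below is illustrative and is not the scoring procedure used by MS-Fit. It assumes that the mass error is calculated relative to the theoretical mass and that coverage is the percentage of residues falling within at least one matched peptide. The function names and example numbers are hypothetical.

```python
def ppm_error(observed, calculated):
    """Signed mass error in parts per million, relative to the calculated mass."""
    return (observed - calculated) / calculated * 1e6


def match_peaks(observed_masses, theoretical_masses, tolerance_ppm=100.0):
    """Pair each observed mass with theoretical masses within the tolerance."""
    matches = []
    for obs in observed_masses:
        for calc in theoretical_masses:
            error = ppm_error(obs, calc)
            if abs(error) <= tolerance_ppm:
                matches.append((obs, calc, error))
    return matches


def sequence_coverage(matched_ranges, protein_length):
    """Percent of residues covered by matched peptides.

    matched_ranges holds (start, end) residue numbers, 1-based and
    inclusive; overlapping peptides are only counted once.
    """
    covered = set()
    for start, end in matched_ranges:
        covered.update(range(start, end + 1))
    return 100.0 * len(covered) / protein_length


if __name__ == "__main__":
    # Hypothetical masses and residue ranges, for illustration only.
    observed = [1000.05, 1500.50, 2000.10]
    theoretical = [1000.00, 1500.00, 2000.00]
    for obs, calc, err in match_peaks(observed, theoretical):
        print(f"{obs:.2f} matches {calc:.2f} ({err:+.0f} ppm)")
    print(f"coverage: {sequence_coverage([(1, 10), (5, 20)], 100):.0f}%")
```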
MALDI-TOF MS results for the other spots are summarized in Table II. These results suggest that we may have identified the phosphorylated form of DnaK as well as GroES, with our peptide mass spectrum covering 51% of the primary sequence of GroES. A low MOWSE score of 4,395 was obtained for the putative unphosphorylated GroEL spot, and a very low MOWSE score of 320 was observed for the S1 ribosomal peptide. No peptide fingerprint was obtained for the phosphorylated form of GroEL or the unphosphorylated form of DnaK. Although we could not confirm the identity of the putative S1 protein by MALDI-TOF analysis, the changes we observe in the two-dimensional gels are interesting to consider because the protein translational machinery is integral to the thermoadaptive response of the cell.
In this particular laboratory experience, we examined the heat shock response in E. coli. This model system is ideal for students as an introduction to proteomics for several reasons. First, the heat shock response has dramatic effects on the proteome that can be readily observed using two-dimensional gel electrophoresis. Second, sample proteomes for E. coli are available for visual comparison with the SWISS-2-DPAGE database. Third, the heat shock response provides an excellent venue for teaching students to use primary literature on a focused topic. Finally, the heat shock system is an ideal introduction to discussions of issues relating to the thermostability of proteins and protein folding. Although a comprehensive quantitative assessment of this laboratory experience has not been undertaken, student response in course evaluations has been positive. Students remarked on the large degree of independence they were afforded and particularly appreciated how their experiments made a connection to the genome projects and the use of modern instrumentation.
Students observed dramatic changes in the E. coli proteome in response to a shift in temperature, as the cell adapted to its new environment. Several pieces of evidence led students to conclude that they had successfully identified at least two heat shock proteins, DnaK and GroES. This information includes pI, molecular weight, and MALDI-TOF MS. Two other proteins were tentatively identified by pI and molecular weight in the two-dimensional gel, consistent with GroEL and the S1 peptide; however, students were unable to confirm the identity of the two by mass spectrometry. Additional sequence information needs to be obtained from either post-source decay MS or MS/MS experiments to confirm the identity of these proteins.
Table I. Protein Prospector results for DnaK. Columns include start amino acid and end amino acid.
Table II. Summary of Protein Prospector results.