Identifying gel-separated proteins using in-gel digestion, mass spectrometry, and database searching

Consider the chemistry



Matrix-assisted laser desorption/ionization (MALDI) mass spectrometry is an important bioanalytical technique in drug discovery, proteomics, and research at the biology-chemistry interface. This is an especially powerful tool when combined with gel separation of proteins and database mining using the mass spectral data. Currently, few hands-on laboratory opportunities exist for undergraduate students to master this technique despite the usefulness of this technique in biological research. One reason for this lack of incorporation into the teaching curriculum is the relatively low number of published laboratory experiments that demonstrate how mass spectrometry can be incorporated into undergraduate laboratories. We present a simple experiment designed to introduce students to the analysis of gel separated proteins using mass spectrometry. In this experiment, students analyze one or more proteins using gel electrophoresis, followed by in-gel digestion, MALDI-time-of-flight (TOF) mass spectrometry and database mining. The experiment also demonstrates how erroneous results can be obtained if careful attention is not paid to all aspects of the experimental process. The data presented here can be used in a classroom or laboratory setting even if hands-on access to a MALDI-TOF mass spectrometer is not possible.

Over the past 10 years the field of proteomics, the study of all proteins in an organism at a given point in time, has become a major field of research. One of the main tools used in proteomic studies is mass spectrometry. Unfortunately, as is often the case, academic curricula have lagged behind the research community in their embrace of biological mass spectrometry. Few textbooks devote more than a few pages, if any, to this important bioanalytical technique. Consequently, most academic institutions do not provide classroom instruction in this area, and fewer still have incorporated biological mass spectrometry into their laboratory curricula. Electrospray ionization (ESI) [1, 2] and matrix-assisted laser desorption/ionization (MALDI) [3–5] allow large biological molecules, including proteins, oligonucleotides, oligosaccharides, and lipids, to be ionized and analyzed by mass spectrometers. These are clearly important advances for the biology and chemistry fields, as demonstrated by the 2002 Nobel Prize being awarded in part for the development of these techniques. To prepare for careers at the biology-chemistry interface, students need to be exposed to these techniques, preferably beginning at the undergraduate level.

One of the likely impediments to the incorporation of biological mass spectrometry into the undergraduate laboratory curriculum is the relatively small number of published laboratory procedures that demonstrate how this technique can be incorporated into the teaching laboratory. The publication of more laboratory experiments utilizing ESI and MALDI will hopefully foster the incorporation of these techniques into teaching laboratories. To this end, we have created several experiments that demonstrate how mass spectrometry can be applied to the analysis of peptides and proteins [6, 7]. MALDI mass spectrometry has found widespread use in biological research laboratories, but a survey of the literature identified only five examples where it has been incorporated into the teaching laboratory. One of these reports discusses an introductory laboratory experiment using MALDI designed to teach students about isotopes and molecular formulas [8], and one describes a laboratory where MALDI analysis of intact bacteria is used to generate a phenotypic profile [9]. The remaining three publications describe experiments where MALDI is used, following enzymatic digestion, to identify proteins. Two of these use in-solution digestion of standard proteins [10, 11]. Although this demonstrates the ability of MALDI mass spectrometry to identify proteins, in an actual research setting, pure proteins are almost never available for in-solution digestion. Rather, protein mixtures are first separated using gel electrophoresis, followed by in-gel digestion prior to analysis using MALDI mass spectrometry. The final laboratory exercise describes separating proteins using gel electrophoresis prior to MALDI analysis as part of a semester-long laboratory course [12]. The experiment presented here also uses gel separation prior to in-gel enzymatic digestion and subsequent analysis using MALDI mass spectrometry, but as a simple stand-alone laboratory exercise using easily available reagents. 
In addition, an example is presented of how database mining programs can lead to erroneous results if careful thought is not given to the chemical procedures used throughout the analysis. As mass spectrometers become easier to use, the opportunity for collecting good data increases, but so too does the potential for treating the instrument as a “black box” without fully understanding how to interpret the data, which could lead to incorrect results being reported. It is therefore important that students be properly trained in how to use mass spectrometry techniques and how to interpret the results.

In the experiment presented here, students are given a sample containing one or more unknown proteins and must determine the identity of each protein. They initially separate the proteins using sodium dodecyl sulfate polyacrylamide gel electrophoresis (SDS-PAGE), which allows them to determine a rough molecular weight for each protein. They excise individual protein bands, reduce and alkylate any disulphide bonds, and enzymatically digest the proteins using trypsin overnight. During the next laboratory meeting, students extract the tryptic fragments, remove contaminant salts and detergent, analyze the fragments using MALDI mass spectrometry and use the results to attempt to identify the proteins using protein database searching programs. A diagram of the analysis procedure is shown in Fig. 1.

Figure 1.

Schematic of protein identification procedure.


All reagents, unless otherwise noted, were purchased from Sigma-Aldrich. Stock protein samples were stored in the freezer at −20 °C. We have had success using apomyoglobin (horse), lysozyme (chicken), and cytochrome c (horse) in this laboratory, although other proteins could be used. For each protein, a 100 pmol/μL laboratory solution was prepared from stock using nanopure (18 MΩ-cm) water. The solutions were stored in 1.5 mL eppendorf tubes under the same conditions cited for the stock samples. Stock and laboratory solutions were stored for up to 3 years in the freezer (−20 °C), although longer-term storage could be accomplished by preparing the solutions, dividing them into appropriate aliquots, lyophilizing them, and storing the aliquots in the freezer. This would limit the amount of hydrolysis that can occur with proteins over time. Although we use commercially available “pure” proteins, this laboratory experiment could be done using proteins isolated from natural sources. In this case, protein purity should be as high as possible, and an estimate of the amount of protein should be obtained, possibly via a Bradford assay. This will allow ∼500 pmol of protein to be added to each gel well. Aside from this, no modifications to the protocol presented here are needed.

SDS-PAGE Separation

A 15% acrylamide gel was prepared by mixing 3.45 mL of nanopure water, 2.6 mL of 1.5 M tris(hydroxymethyl)aminomethane (Tris), 100 μL of 10% SDS, 3.75 mL of 40% acrylamide/bis-acrylamide, and 100 μL of 10% ammonium persulfate. Just prior to casting the gel, 10 μL of tetramethylethylenediamine was added to the mixture and the solution was swirled for a few seconds to ensure mixing. Using a 10 mL syringe, the solution was dispensed between two glass gel loading plates. A gel comb was inserted, and the gel was allowed to polymerize for 30 minutes. A stock protein loading buffer solution was prepared by mixing 100 mL of 60 mM Tris-HCl (pH 6.8) with 25 g glycerol, 2 g SDS, and 10 mg bromophenol blue. A 19.3 mg portion of 1,4-dithio-DL-threitol (DTT) was added to 1.0 mL of protein loading buffer, and 10 μL of this solution was mixed with 10 μL of each unknown protein sample in separate 1.5 mL tubes. Each solution was thoroughly mixed by repeated aspiration and dispensing. The samples were placed in a boiling water bath for 5 minutes. After boiling, 10 μL of each solution was added to a separate well in the gel using a 250 μL gel loading tip. A gel running buffer solution was made by dissolving 3.03 g of Tris base, 14.4 g of glycine, and 1.0 g of SDS in nanopure water and diluting to 1.0 L. The electrophoresis tank was filled with buffer solution and the gel was run at 200 volts for approximately an hour. Gel progress was monitored by observing the leading bromophenol blue line and the separation was stopped when this line reached the bottom edge of the gel. After electrophoresis, the gel was soaked in Coomassie blue staining solution and placed on a gently rocking platform for at least 15 minutes. The Coomassie blue solution was then recovered for reuse, and the gel was covered for an hour with destaining solution (540 mL of nanopure water, 360 mL of methanol, and 100 mL of acetic acid), after which the used destaining solution was discarded to waste. 
Once the gel was adequately destained, a digital image was captured and the desired bands were cut out. If desired, the gel can be kept in destaining solution overnight to yield better contrast between the stained bands and gel background.
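The quantities in this section can be checked with a few lines of arithmetic. The following Python sketch is only an illustration (the variable names are hypothetical, and the 10 μL of tetramethylethylenediamine is neglected in the total volume); it confirms that the recipe gives a 15% gel and that the loading scheme puts roughly 500 pmol of protein in each well.

```python
# Volumes (mL) of the gel components listed in the protocol.
# The 10 uL of TEMED is neglected here (assumption: its volume is negligible).
gel_components_ml = {
    "nanopure water": 3.45,
    "1.5 M Tris": 2.6,
    "10% SDS": 0.100,
    "40% acrylamide/bis-acrylamide": 3.75,
    "10% ammonium persulfate": 0.100,
}
ACRYLAMIDE_STOCK_PERCENT = 40.0

total_ml = sum(gel_components_ml.values())
acrylamide_percent = (
    ACRYLAMIDE_STOCK_PERCENT
    * gel_components_ml["40% acrylamide/bis-acrylamide"]
    / total_ml
)

# Sample loading: 10 uL of 100 pmol/uL protein is mixed with 10 uL of
# loading buffer, and 10 uL of that mixture is loaded into a well.
sample_conc_pmol_per_ul = 100.0
sample_ul = 10.0
buffer_ul = 10.0
loaded_ul = 10.0

diluted_conc = sample_conc_pmol_per_ul * sample_ul / (sample_ul + buffer_ul)
loaded_pmol = diluted_conc * loaded_ul

print(f"Final acrylamide concentration: {acrylamide_percent:.1f}%")
print(f"Protein loaded per well: {loaded_pmol:.0f} pmol")
```

The same pattern can be reused to scale the recipe for a different gel volume or a different stock protein concentration.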

In-Gel Protein Digestion

Individual gel bands were cut into pieces and placed into individual 1.5 mL eppendorf tubes. Enough 50:50 acetonitrile (ACN):100 mM ammonium acetate (∼50 μL) was added to just cover the gel pieces. The tubes were shaken for 15 minutes at room temperature and the liquid was removed and discarded to waste. Fifty microliters of freshly made 10 mM DTT in 100 mM ammonium acetate was added to each eppendorf tube and allowed to incubate at 50–55 °C for an hour. After this, 50 μL of fresh 50 mM iodoacetamide in 100 mM ammonium acetate was added, the tubes were completely wrapped in aluminum foil, and the tubes were shaken at room temperature in the dark for an hour. The solution was removed and the gels were washed twice for 10 minutes each with 50:50 ACN:100 mM ammonium acetate. The washes were removed from the gel pieces which were then dried using a lyophilizer/vacuum centrifuge unit. Once the gels were completely dried, 0.6 μg of modified sequencing-grade trypsin (Promega) was added to 6 μL of 0.01% SDS in 50 mM ammonium acetate and added directly onto the dried gel pieces, which were allowed to rehydrate for 10 minutes. Just enough 50 mM ammonium acetate was added to cover the gel pieces and the tubes were incubated at a constant temperature of 37 °C overnight.

Peptide Fragment Recovery

After digestion, the supernatant was transferred into a new 1.5 mL eppendorf tube. The gel pieces were covered by 50:50 ACN:0.5% trifluoroacetic acid (TFA) (∼50 μL) and shaken for 20 minutes. The liquid was removed and was added to the digestion solution from the previous step. This washing step was repeated and the liquid again pooled with the previous samples. The lyophilizer/vacuum centrifuge unit was used to concentrate the samples by reducing the solution volume to ∼10 μL, without letting the solution completely evaporate. Each sample was acidified by adding 0.5 μL of TFA. A C18 ZipTip (Millipore) with a 0.6 μL bed volume was activated and washed three times with 10 μL of ACN each, followed by two washings using 10 μL of 0.1% TFA in nanopure water. The sample was adsorbed onto the ZipTip by slowly aspirating and dispensing the sample solution 10 times. The ZipTip was washed three times with 5 μL each of 0.1% TFA. A 5 μL aliquot of 50:50 ACN:0.1% TFA was added to a new 1.5 mL tube and peptides were eluted into this solution by repeatedly aspirating the elution solution into the ZipTip and dispensing 10 times into the same tube.

MALDI-Time-of-Flight Analysis

A MALDI matrix was prepared by adding a saturating amount of α-cyano-4-hydroxycinnamic acid to 1 mL of 1:2 ACN:0.1% TFA. A 2 μL aliquot of the matrix solution was then added to each of the eluted peptide samples. After thorough mixing and centrifugation, a 1 μL aliquot was applied to a ground stainless steel MALDI target plate and allowed to dry. The target plate was loaded into the MALDI-time-of-flight (TOF) mass spectrometer. We used a Bruker Daltonics Ultraflex TOF/TOF mass spectrometer with a 50 Hz nitrogen laser, although any MALDI-TOF mass spectrometer should provide comparable data. The instrument was operated in positive ion reflectron mode with an acceleration voltage of 25.0 kV and calibrated using a pre-made calibration standard (Bruker Daltonics) containing angiotensin I and II, substance P, bombesin, ACTH (clip 1–17), ACTH (clip 18–39), and somatostatin. Signal was maximized by moving the laser around the sample spot and adjusting the laser attenuation. Multiple spectra were added until clearly distinguishable peaks were visible. Spectra were saved and analyzed using Bruker FlexAnalysis software. The Bruker Biotools software was used to send mass spectral data to the Mascot database search software using parameters described below.


Gloves should be worn to prevent skin contact with the reagents, proteins, and solvents. Goggles should be worn when working with the samples, which should be prepared in a well-ventilated area such as a hood. All waste should be collected and disposed of properly. Commercial mass spectrometers have built-in safety interlocks, so there is low risk associated with the operation of the instrument.


As the name implies, MALDI uses a matrix molecule co-mixed with the analyte sample. A wide range of matrices have been used, depending on what sort of molecule is being analyzed. In general, the matrix is a small molecule with an aromatic functional group. This allows the matrix, rather than the analyte molecule, to absorb the laser light. Thus, the matrix becomes excited rather than the analyte molecule. If enough vibrational energy is deposited into the matrix molecules, they will vibrate off of the sample target, carrying the analytes with them, leading to little or no fragmentation of the analyte molecules themselves. MALDI is therefore known as a “soft” ionization technique. During the MALDI process, a proton transfer occurs between the matrix and the analyte, resulting in a charged analyte particle that can be mass analyzed. For peptides and proteins, a single proton is usually transferred to the analyte molecule, yielding primarily protonated molecules with a single charge, (M + H)+. The MALDI ionization technique is most commonly used with a TOF mass analyzer. Most manufacturers offer several different configurations of MALDI-TOF mass spectrometers, with the main differences between models being mass accuracy and mass resolution. Even low-end, basic MALDI-TOF instruments provide sufficient resolution over the mass range generally observed for tryptic peptides (500–3000 amu) to allow monoisotopic masses to be determined for each tryptic peptide. For well-calibrated instruments, mass errors of less than 0.5 amu can be expected. These general trends should be kept in mind when entering data into the database searching software, as erroneous values can lead to poor results.
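To make the idea of a singly protonated, monoisotopic (M + H)+ ion concrete, the following Python sketch sums standard monoisotopic residue masses and adds one water and one proton. It is a minimal illustration, not the calculation performed by any particular instrument or search program; the function name `mh_plus` and the table name are hypothetical. Angiotensin II (sequence DRVYIHPF), one of the calibrants listed above, serves as the example.

```python
# Standard monoisotopic masses (Da) of amino acid residues (amino acid minus water).
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free N- and C-termini
PROTON = 1.00728   # the single charge carrier in (M + H)+


def mh_plus(peptide):
    """Return the monoisotopic m/z of the singly protonated peptide."""
    residues = sum(MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide)
    return residues + WATER + PROTON


angiotensin_ii = "DRVYIHPF"
print(f"(M + H)+ for {angiotensin_ii}: {mh_plus(angiotensin_ii):.2f}")
```

For a well-calibrated instrument, the observed peak for this calibrant would be expected to fall within the 0.5 amu window mentioned above.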

The mass spectrum resulting from the in-gel tryptic digest of apomyoglobin is shown in Fig. 2.

Figure 2.

MALDI-TOF mass spectrum of fragment products resulting from the tryptic digestion of apomyoglobin.

A list of all ion peaks with a signal-to-noise ratio of 10 or larger from the mass spectrum of fragment products resulting from the tryptic digestion of apomyoglobin is given in Table I.

Table I. MALDI-TOF signals for tryptic peptides from apomyoglobin
Ion peaks (m/z) observed in MALDI-TOF mass spectrum

A variety of database searching programs use mass spectrometry data to attempt to identify the protein being analyzed, including ProteinProspector [13], Mascot [14], and SEQUEST [15]. These all use empirical mass spectral data as well as user-selected parameters to compare proteins in the database with the results entered. The instrument we used for this experiment has software that directly interfaces with the Mascot software, although any database searching software should give comparable results. Both Mascot [16] and ProteinProspector [17] can be accessed via the web and allow anyone to perform a search. When the apomyoglobin mass spectral data are used to search the database, using the parameters shown in Table II, the top seven matches from the database are all apomyoglobin. A database Mowse score of greater than 75 was considered a likely protein match.

Table II. Parameters used for database searching
Database parameter    Value used
Mass tolerance        0.5 Da
Missed cleavages      2
Mass values           M + H
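The role of these parameters can be illustrated with a simplified sketch of peptide mass fingerprinting in Python. This is not how Mascot or any other search engine is actually implemented (real programs also score the matches statistically); it only shows an in-silico tryptic digest with up to two missed cleavages, calculation of (M + H)+ values, and matching against observed peaks within a 0.5 Da tolerance. The cleavage rule used (after K or R, but not before P) is the commonly assumed one, and the protein sequence and function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.01056, 1.00728


def mh_plus(peptide):
    """Monoisotopic m/z of the singly protonated peptide ("Mass values: M + H")."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def tryptic_peptides(sequence, missed_cleavages=2):
    """Cleave after K or R (not before P), allowing missed cleavages."""
    sites = [i + 1 for i, aa in enumerate(sequence[:-1])
             if aa in "KR" and sequence[i + 1] != "P"]
    bounds = [0] + sites + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(missed_cleavages + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


def match_peaks(observed_mz, sequence, tolerance_da=0.5, missed_cleavages=2):
    """Return (observed m/z, peptide) pairs that agree within the tolerance."""
    theoretical = {p: mh_plus(p) for p in tryptic_peptides(sequence, missed_cleavages)}
    return [(mz, pep) for mz in observed_mz
            for pep, mass in theoretical.items()
            if abs(mz - mass) <= tolerance_da]


# Hypothetical toy sequence and peak list, for illustration only.
toy_protein = "AAKGGRPLKVVR"
toy_peaks = [mh_plus("VVR") + 0.2, 500.0]
print(tryptic_peptides(toy_protein, missed_cleavages=0))
print(match_peaks(toy_peaks, toy_protein))
```

In the toy example, the peak offset by 0.2 Da from the calculated value for VVR is matched, whereas the peak at m/z 500.0 matches nothing; tightening or loosening `tolerance_da` changes which peaks count as matches.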

The ions from the apomyoglobin mass spectrum (Fig. 2) that match those predicted by the known sequence in the database are shown in underlined bold in Table I.

The mass spectrum from the tryptic digestion of cytochrome c is shown in Fig. 3.

Figure 3.

MALDI-TOF mass spectrum of fragment products resulting from the tryptic digestion of cytochrome c.

The list of all ion peaks with a signal-to-noise ratio of 10 or larger from the mass spectrum of the cytochrome c digest products is given in Table III. When the parameters given in Table II were used to compare the results from the cytochrome c digest with the database, the Mascot software identified cytochrome c as the most likely protein. The ions from the cytochrome c mass spectrum (Fig. 3) that match those predicted by the known sequence in the database are shown in underlined bold in Table III.

Table III. MALDI-TOF signals for tryptic peptides from cytochrome c
Ion peaks (m/z) observed in MALDI-TOF mass spectrum

Figure 4 shows the mass spectrum that results when the gel band containing lysozyme is digested and analyzed using MALDI-TOF.

Figure 4.

MALDI-TOF mass spectrum of fragment products resulting from the tryptic digestion of lysozyme.

The list of all ion peaks with a signal-to-noise ratio of 10 or larger from the mass spectrum of the lysozyme digest products is given in Table IV.

Table IV. MALDI-TOF signals for tryptic peptides from lysozyme
Ion peaks (m/z) observed in MALDI-TOF mass spectrum

When the mass spectral data for lysozyme is entered into the database searching software using the same parameters (Table II) that were used for the analysis of the apomyoglobin data, none of the top 25 matches correspond to lysozyme. All of the matches have very low correlation scores. Students initially assume that the protein must not be contained in the database. When they are assured that the protein they were given is indeed in the database they are usually confused, but a more careful look at the experimental procedure and the parameters used to search the database reveals why no suitable matches were found.

During the in-gel digestion protocol, disulphide bonds are reduced using DTT and the resulting free sulfhydryls are subsequently capped with iodoacetamide to prevent reformation of the disulphide bonds. This reaction targets not only the cysteines that were involved in disulphide bonds, but all cysteine residues. The amino acid sequence of lysozyme shows that there are eight cysteine residues, corresponding to 6.2% of all amino acids in the protein (a relatively large fraction), each of which can react with iodoacetamide to form a carbamidomethylated cysteine. Each carbamidomethyl group increases the peptide mass by about 57 Da, far more than the 0.5 Da mass tolerance used in the search, so cysteine-containing peptides cannot match the database unless the modification is specified. When this modification is added to the parameters used to search the database, the top 10 protein matches all correspond to lysozyme. The ions that are modified by the carbamidomethyl group are shown with an asterisk (*) in Table IV. This highlights to students the need to understand the experimental procedure used and not just blindly enter data into the computer and accept the results obtained. Apomyoglobin contains no cysteine residues and thus the database searching software identifies apomyoglobin even without incorporating the carbamidomethyl modification into the search parameters. Cytochrome c contains only two cysteine residues (1.9%) and thus also yields good database matches without considering the carbamidomethyl modification.
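The effect of the alkylation step on the search can be sketched numerically. The Python example below is an illustration under stated assumptions: the carbamidomethyl group is taken to add 57.02146 Da (monoisotopic) per cysteine, the chain lengths of 129 residues for lysozyme and 104 for cytochrome c are assumed values consistent with the percentages quoted above, and the three peptide sequences are illustrative examples that should be checked against the database entry before classroom use. All function names are hypothetical.

```python
CARBAMIDOMETHYL_SHIFT_DA = 57.02146  # assumed monoisotopic mass added per alkylated Cys
MASS_TOLERANCE_DA = 0.5              # search tolerance from Table II


def alkylation_shift(peptide):
    """Total mass added to a peptide when every cysteine is carbamidomethylated."""
    return peptide.count("C") * CARBAMIDOMETHYL_SHIFT_DA


def matches_without_modification(peptide, tolerance_da=MASS_TOLERANCE_DA):
    """True if the observed (alkylated) mass is still within tolerance of the
    unmodified theoretical mass, i.e. the peptide has no cysteine."""
    return alkylation_shift(peptide) <= tolerance_da


def cysteine_percent(n_cys, n_residues):
    """Percentage of residues in a protein that are cysteine."""
    return 100.0 * n_cys / n_residues


# Illustrative peptides: one without cysteine, two with one cysteine each.
for peptide in ["GTDVQAWIR", "CELAAAMK", "WWCNDGR"]:
    print(peptide,
          f"shift = {alkylation_shift(peptide):.2f} Da,",
          "matches unmodified search:", matches_without_modification(peptide))

# Cysteine content, using the assumed chain lengths noted above.
print(f"lysozyme: {cysteine_percent(8, 129):.1f}% Cys")
print(f"cytochrome c: {cysteine_percent(2, 104):.1f}% Cys")
```

Because a single alkylated cysteine shifts a peptide by more than one hundred times the search tolerance, a protein in which most tryptic peptides contain cysteine loses most of its matches when the modification is omitted, whereas a protein with few or no cysteines is barely affected.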

We recognize that the acquisition of a MALDI-TOF mass spectrometer represents a large instrumental investment. A basic unit capable of collecting data comparable to that presented here would cost ∼$100,000 to $125,000. Although we hope that the publication of more laboratory experiments will help drive more institutions to purchase MALDI-TOF instruments, either from internal funding sources or via national grant programs, we have also included sample spectra and data here that can be handed out to students to allow them to complete the analyses without hands-on access to an instrument. Alternatively, many larger universities have core mass spectrometry facilities that may allow samples to be sent from other local academic institutions for a small fee. A variety of commercial sources will also provide MALDI-TOF analysis for a fee.

Students are required to turn in a written laboratory report by the end of the term. Students are told that they must provide all documentation needed to conclusively identify their unknown protein, but are not given a formal report protocol to follow. This is done to force students to think about the data and make rational decisions about what information they should include rather than just following a prescribed set of instructions. They need to not only include data printouts, but also some description of what the data means and how it was used to identify their protein. Students are also instructed to provide about a page-long description of the experimental procedure they used, including all chemicals used in their analysis.

Although students do not complete a formal assessment of this single experiment, they do complete an assessment of the entire Bioanalytical Chemistry course, of which this laboratory is one part. Students rate the entire course on an A–F scale and have given this course an average grade of A (3.77 of 4.00). Open ended comments have been very positive regarding this laboratory including comments like “The laboratory experience was invaluable.” and “The experiments performed and even just the amount of time in laboratory were invaluable. I think the course is a vital step in preparing students for a future beyond St. Olaf…”


We have described a simple single-unit laboratory designed to teach students how to separate proteins using gel electrophoresis, followed by identification via in-gel digestion, MALDI-TOF mass spectrometric analysis and database mining. In addition to teaching students how to identify proteins, this experiment demonstrates that care must be taken when analyzing the data and that the chemical reactions used in the analysis must be understood to ensure an accurate result. This further stresses the need to include proteomic analysis techniques in the undergraduate curriculum so that when students are asked to perform these analyses in academic or industrial research laboratories, they have a fundamental understanding of the processes and can correctly interpret the resulting data. We hope that as more laboratory experiments utilizing biological mass spectrometry techniques are published more institutions will incorporate this important bioanalytical tool into their curricula. The spectra and peak lists presented here will allow instructors to incorporate biological mass spectrometry into their laboratory or course curriculum, even if they do not have access to a MALDI-TOF mass spectrometer.