Proteomics is a rapidly developing field of biochemical analysis that aims to characterize large numbers of proteins extracted from a cell, tissue, or organism, so that a global perspective of changes in protein expression can be obtained rapidly.1,2 Presently, the predominant method of analyzing proteomes involves separating the protein extract using two-dimensional polyacrylamide gel electrophoresis (2D-PAGE).3 After visualization, the protein spots are individually extracted and proteolytically digested (usually with trypsin). The resultant peptides are then analyzed using mass spectrometry (MS).4 The protein is identified either by matching the masses of several of the observed peptides to those predicted for a protein in a database (peptide mapping), or by obtaining sequence information from a single peptide by tandem mass spectrometry (MS/MS).5,6 Although a single 2D-PAGE separation can resolve thousands of proteins,7 the entire method is relatively labor-intensive and time-consuming, and has limited sensitivity and dynamic range.8
Emerging analytical methods for global proteomics have replaced 2D-PAGE with higher-throughput approaches that use high-performance liquid chromatography (LC)/MS to characterize a sizable fraction of a proteome in a single experiment. The use of LC/MS for proteome analysis is based on the premise that MS/MS data obtained for a single peptide are often sufficient for unique protein identification. Typically, these strategies use trypsin to convert the entire proteome sample into peptides, which are then separated by multidimensional liquid chromatography and identified by LC/MS/MS analysis.9 In this manner, the peptide(s) act as a surrogate for the identification of the protein.
Some effort has focused on using the high resolving power and mass measurement accuracy (MMA) achievable with Fourier transform ion cyclotron resonance (FTICR)-MS to uniquely identify a protein based on the mass measurement of a single peptide marker.10 In this approach, peptide identification is based on comparing the measured masses with those calculated for a tryptic digest of the proteins predicted by the genome sequence of the organism under study. This strategy benefits greatly from complete proteolytic digestion across the proteome, because the tryptic constraint can then be used as an aid in peptide identification and the number of possible peptides predicted from a genome is minimized. For example, a complete tryptic digest of all of the open reading frame products in yeast would ideally produce 331 438 polypeptides, with perhaps half of these being expressed at some detectable level at any given time. There are, however, over 14 million possible peptides that must be considered if the proteolysis is incomplete, requiring a much greater MMA for confident protein identification.
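The effect of the tryptic constraint on the size of the candidate list can be illustrated with a minimal in silico digestion sketch. It applies the rule of cleavage after Lys or Arg except when the next residue is Pro, and optionally allows missed cleavage sites. The function names, the `max_missed` parameter, and the toy sequence are hypothetical and introduced only for illustration; the sketch does not reproduce the yeast peptide counts quoted above.

```python
def fully_cleaved(seq):
    """Split a protein sequence after every K or R that is not followed by P."""
    fragments, start = [], 0
    for i, aa in enumerate(seq):
        is_last = i == len(seq) - 1
        if aa in "KR" and not is_last and seq[i + 1] != "P":
            fragments.append(seq[start:i + 1])
            start = i + 1
    fragments.append(seq[start:])  # C-terminal fragment
    return fragments


def digest(seq, max_missed=0):
    """List the peptides obtained when up to max_missed sites are left uncleaved."""
    frags = fully_cleaved(seq)
    peptides = []
    for i in range(len(frags)):
        # Join fragment i with up to max_missed following fragments.
        last = min(i + max_missed, len(frags) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(frags[i:j + 1]))
    return peptides


toy = "AAKCCRPDDKEER"  # hypothetical sequence, not a yeast protein
for allowed in (0, 1, 2):
    print(allowed, digest(toy, max_missed=allowed))
```

Even for this short toy sequence the number of candidate peptides grows with each additional missed cleavage that is allowed, which is the reason complete digestion relaxes the mass measurement accuracy needed for identification.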
Most proteomic studies today rely on the use of trypsin to produce peptides that are then analyzed by MS in some fashion. The presence of a tryptic cleavage site (Lys or Arg) at the C-terminus is often used to validate the identification of these peptides; however, a significant fraction of peptides containing missed cleavage sites is generally observed as well. We report here a strategy to rapidly measure the completeness of a tryptic digest of the soluble protein fraction of the yeast proteome. The method uses a lysine (Lys) auxotrophic strain of yeast and isotopically labeled Lys. This yeast strain is cultured in medium containing equivalent amounts of normal Lys (Lys-12C6 plus natural-abundance isotopic variants) and 13C-enriched Lys (Lys-13C6). Because the auxotrophic strain depends entirely on the Lys supplied in the medium, either isotopic form of Lys can be incorporated at any Lys site in each protein.
After harvesting the culture, the extracted soluble proteins are digested with trypsin under different conditions. Since trypsin cleaves after Lys or Arg residues (except when followed by a Pro residue), the observed peptides should contain either no Lys residue or exactly one if the digestion is complete, and peptides with one Lys residue will appear as isotopic doublets in the mass spectrum. Peptides containing two or more Lys residues (reflecting incomplete digestion) will appear as multiplets whose relative intensities follow a binomial distribution. Thus, the multiplet pattern of the peptides produced provides a means to assess trypsin efficiency under any digestion conditions.
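The binomial pattern referred to above can be written out explicitly. The symbols n, k, p, and z are introduced here for illustration and do not appear in the original text; the spacing uses the 13C/12C mass difference of about 1.00335 Da, i.e. about 6.020 Da per Lys-13C6 residue.

```latex
% n = number of Lys residues in the peptide
% k = number of those residues that are Lys-13C6 (k = 0, 1, ..., n)
% p = fraction of Lys-13C6 in the medium, z = charge state of the ion
I_k \;\propto\; \binom{n}{k}\, p^{k}\,(1-p)^{\,n-k},
\qquad
\Delta(m/z)_k \;\approx\; \frac{6.020\,k}{z}

% A peptide with n Lys residues therefore gives n + 1 peaks.
% For equal amounts of both forms (p = 1/2):
%   n = 1 : 1:1      (doublet, complete digestion)
%   n = 2 : 1:2:1    (triplet)
%   n = 3 : 1:3:3:1  (quartet)
```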
The materials used in all experiments were obtained from commercially available sources and used without further purification. Natural isotopic abundance Lys was obtained from Sigma (St. Louis, MO, USA) while the 13C-enriched Lys (i.e. Lys-13C6) was obtained from Cambridge Isotope Laboratories (Andover, MA, USA). Sequence-grade modified trypsin was obtained from Promega (Madison, WI, USA). Acetonitrile (HPLC grade) and glacial acetic acid (ACS reagent grade) were purchased from Aldrich (Milwaukee, WI, USA). Trifluoroacetic acid (TFA, HPLC grade) and dithiothreitol (DTT) were purchased from Sigma. Water was purified using a Barnstead Nanopure Infinity water purification system (Dubuque, IA, USA).
Saccharomyces cerevisiae strain FY834 (MATα, his3-Δ200, ura3-52, leu2Δ1, lys2-Δ202, trp1-Δ63, gal2+; American Type Culture Collection, Manassas, VA, USA) was grown in 500 mL of minimal medium (1.7 g/L YNB-AA/AS, 5 g/L (NH4)2SO4, 20 g/L dextrose), to which 50 mL of 10X dropout medium (minus Lys, Leu, His, and Trp) was added. The required amounts of Leu, His, and Trp were added to this medium as well as a mixture containing equal amounts of natural isotopic abundance and stable isotope enriched Lys (i.e. Lys-13C6). This medium was inoculated with a 5-mL aliquot of the yeast strain grown in YPD (1% yeast extract, 2% peptone, 2% dextrose). This culture was incubated at 30 °C with shaking (225 rpm) and the cells were harvested at an OD600 of 1.36 by centrifugation at 4000 rpm for 10 min. The cells were resuspended in 200 µL of PBS (0.1 M sodium phosphate, 0.15 M NaCl, pH 7.2) and lysed by vortexing in the presence of acid-washed 0.5 mm zirconium-silica beads for two cycles (60 s/cycle). The cell lysate was recovered and centrifuged at 10 000 rpm for 10 min to remove any cell debris.
Two different sets of conditions were used to prepare the soluble protein extract for tryptic digestion: (1) 1 mM DTT and boiling for 5 min, and (2) 6 M Gdn·HCl and boiling for 5 min. After protein denaturation and/or reduction of the disulfide bonds, buffer exchange into 0.1 M NH4HCO3, pH 8.2, was performed by size exclusion chromatography using PD-10 columns (Pierce, Rockford, IL, USA). The tryptic digestion was performed overnight at 37 °C using a trypsin/protein ratio of 1:50 (w/w). After digestion, the samples were dried and resuspended in water containing 0.2% acetic acid (HOAc) and 0.05% TFA to achieve a final concentration of about 1 mg/mL. The samples were centrifuged for 15 min at 14 000 rpm to remove any undissolved material.
The yeast peptide mixtures were analyzed by capillary reversed-phase LC coupled directly on-line with FTICR-MS. The tryptic peptides were separated using a 150 µm i.d. × 30 cm capillary column packed with 5-µm diameter C18 medium (POROS 20R2; Applied Biosystems, Framingham, MA, USA). The LC capillary was coupled to the mass spectrometer using an in-house manufactured electrospray ionization interface external to the magnetic field of the spectrometer. Ions were guided through the fringing fields of the 11.5 T superconducting magnet with the aid of four sets of rf-only quadrupoles. During the LC run, 500 ICR mass spectra were acquired. The duration of each scan was approximately 5 s and involved ion capture in the cell, excitation, detection, and finally ion ejection. Ion detection was set to collect 128k data points with a lower m/z limit of 586.
To evaluate the completeness of a tryptic digest of a yeast proteome sample, a Lys-auxotrophic strain of the organism was cultured in a minimal medium to which a mixture containing natural isotopic abundance Lys (Lys) and 13C-enriched Lys (Lys-13C6) was added. The ratio of Lys to Lys-13C6 was approximately 1:0.75, as determined from the mass spectrum of the mixture of these amino acids used in the cell culture medium, shown in Fig. 1. The number of isotopically distinct peaks originating from each peptide can be used to evaluate the efficiency of the tryptic digestion. (In the following, digestion at Arg is ignored in order to simplify the discussion, but the effects of this possibility on the conclusion are obvious. For example, some peptides containing a single Lys residue could represent a tryptic cleavage at Arg with a missed Lys cleavage site.)
For example, as shown in Fig. 2(A), a peptide resulting from digestion at Lys with no missed sites contains a single Lys and gives rise to a pair of peptide signals separated by 6 Th. The calculated abundances of the two isotopically distinct peptides, shown in the inset of this figure, are based on the ratio of Lys to Lys-13C6 added to the medium (see Fig. 1). In instances where a single missed cleavage site occurs, three isotopically distinct versions of the peptide were observed, as exemplified by Fig. 2(B). These peaks represent the peptide containing two Lys, one Lys plus one Lys-13C6, and two Lys-13C6 residues, respectively. In cases where two missed cleavage sites were present within a peptide, four isotopically distinct versions of the peptide were observed. An example of a spectrum with peaks representing such a peptide, containing three Lys, two Lys plus one Lys-13C6, one Lys plus two Lys-13C6, and three Lys-13C6 residues, is shown in Fig. 2(C).
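The calculated abundances described above follow from the Lys:Lys-13C6 ratio in the medium. The sketch below, a minimal illustration and not the authors' own calculation, computes the expected peak heights relative to the all-light peak and converts an observed number of peaks into a number of missed Lys sites (ignoring Arg, as in the discussion). The function names and default arguments are assumptions made for this example.

```python
from math import comb


def relative_intensities(n_lys, light=1.0, heavy=0.75):
    """Expected peak heights for a peptide with n_lys Lys residues,
    normalised to the peak that contains only natural-abundance Lys.

    light:heavy is the Lys:Lys-13C6 ratio in the medium (1:0.75 here).
    """
    r = heavy / light
    # Peak k contains k Lys-13C6 residues; binomial weight C(n, k) * r**k.
    return [comb(n_lys, k) * r**k for k in range(n_lys + 1)]


def missed_lys_sites(n_peaks):
    """Number of missed Lys cleavage sites implied by a multiplet of n_peaks.

    n_peaks = n_lys + 1, and a fully digested Lys peptide has n_lys = 1.
    """
    if n_peaks < 2:
        raise ValueError("a Lys-containing peptide shows at least two peaks")
    return n_peaks - 2


for n in (1, 2, 3):
    print(n, relative_intensities(n))
```

With the 1:0.75 ratio, a doublet is expected at 1 : 0.75, a triplet at 1 : 1.5 : 0.5625, and a quartet at 1 : 2.25 : 1.6875 : 0.421875, so the pattern differs from the symmetric 1:1, 1:2:1, and 1:3:3:1 distributions that an exactly equimolar mixture would give.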
Two tryptically digested yeast proteome samples were analyzed by LC/FTICR-MS. The first sample was denatured using 6 M Gdn·HCl and boiled for 5 min prior to desalting and the addition of trypsin. In the analysis of this mixture a total of 377 unique peptides containing at least a single Lys residue were observed. Of these, 328 peptides (∼88%) were the product of complete digestion (i.e. contained only one Lys residue), 44 (∼11.4%) contained one missed cleavage site (i.e. contained two Lys residues), and 5 contained two missed cleavage sites (<1%). The second sample was boiled for 5 min to denature the sample and 1 mM DTT was also added to reduce any disulfide bonds present prior to tryptic digestion. Analysis of this sample by LC/FTICR-MS revealed a total of 337 unique peptides containing at least one Lys residue. Of these, 316 peptides (∼94%) were the result of complete digestion (i.e. contained one Lys residue), 15 (∼5%) contained one missed cleavage site (i.e. contained two Lys residues), and 6 contained two missed cleavage sites (<2%). The results of both digestion conditions are summarized in Table 1.
| Digestion conditions | 0 missed cleavage sites | 1 missed cleavage site | 2 missed cleavage sites |
| --- | --- | --- | --- |
| Denatured | 328 (88%) | 44 (>11%) | 5 (<1%) |
| Denatured and reduced | 316 (94%) | 15 (<5%) | 6 (<2%) |
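The fractions in Table 1 follow from the peptide counts alone. As a minimal sketch, the snippet below recomputes them from the counts reported in the table; the dictionary layout and function name are choices made for this example, and the recomputed values can differ from the rounded figures quoted in the table by about a percentage point.

```python
# Peptide counts from Table 1: (0, 1, 2) missed cleavage sites.
counts = {
    "denatured": (328, 44, 5),
    "denatured and reduced": (316, 15, 6),
}


def percentages(row):
    """Percentage of Lys-containing peptides in each missed-cleavage class."""
    total = sum(row)
    return [100.0 * c / total for c in row]


for condition, row in counts.items():
    shares = ", ".join(f"{p:.1f}%" for p in percentages(row))
    print(f"{condition}: n = {sum(row)}; {shares}")
```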
Although the results presented in this study were obtained using a high-field FTICR mass spectrometer, the methodology is equally amenable to more conventional MS technology. In a similar experiment, a combined proteome sample extracted from the same auxotrophic yeast strain grown in separate cultures containing either natural isotopic abundance Lys or Lys-13C6 was analyzed by LC/MS using an LCQ ion-trap (Finnigan). As shown in Fig. 3, several pairs of completely digested peptides are present within the mass spectrum. These experiments demonstrate the efficacy of this method for evaluating proteolytic digestion efficiency with conventional MS technology.
Relatively few strategies exist that are capable of quantitatively determining trypsin digestion efficiency on a proteome-wide scale. While the extent of proteolytic digestion can be assayed by examining the loss of high molecular weight components by SDS-PAGE or LC/MS following tryptic digestion, neither method provides high-throughput, quantitative evidence detailing digestion efficiency. Quantitative evidence could be obtained by performing LC/MS/MS experiments; however, such an analysis would be extremely time-consuming and would require peptide identification. We have designed a protocol aimed at determining trypsin digestion efficiency without the need for peptide identification.
The study presented here focused on trypsin, but a similar strategy can be used to examine the digestion efficiency of other proteolytic enzymes or chemical agents. For example, the combination of the normal and Lys-13C6-labeled proteomes used in these experiments would be ideal for studying endoprotease Lys-C (with no complications from cleavage at Arg). With the numerous auxotrophic yeast strains available, suitable specific isotopically labeled proteomes can be generated to test the efficiency of most commonly used proteolytic enzymes.
It is possible to optimize the proteolysis conditions for a single protein standard, such as bovine serum albumin, by identifying the fragments by peptide mapping or MS/MS. Such a strategy, however, is impractical for the complex samples being analyzed in proteomic studies, due to the inability to accurately identify individual peptides based solely on their mass and/or the time required to conduct MS/MS studies. The ability to unambiguously determine the number of Lys residues within the peptides containing this residue in a single LC/MS experiment obviates the need to identify the peptides to assess the proteolysis efficiency. In addition, while the proteolysis efficiency may be determined for a protein standard, such conditions will not necessarily be effective considering the heterogeneity of a proteome sample.