Nanyang Environmental & Water Research Institute (NEWRI), The Advanced Environmental Biotechnology Centre, Nanyang Technological University, Singapore 637551.
Functional genomic approaches, such as proteomics, greatly enhance the value of genome sequences by providing a global level assessment of which genes are expressed, when genes are expressed and at what cellular levels gene products are synthesized. With over 1000 complete genome sequences of different microorganisms available, and DNA sequencing for environmental samples (metagenomes) producing vast amounts of gene sequence data, there is a real opportunity and a clear need to generate associated functional genomic data to learn about the source microorganisms. In contrast to the technological advances that have led to the accelerated rate and ease at which DNA sequence data can be generated, mass spectrometry based proteomics remains a technically sophisticated and exacting science. In recognition of the need to make proteomics more accessible to a growing number of environmental microbiologists so that the ‘functional genomics gap’ may be bridged, this review strives to demystify proteomic technologies and describe ways in which they have been applied, and more importantly, can be applied to study the physiology and ecology of extremophiles.
Proteomics is the term used to describe studies that examine the global protein complement of an organism, tissue or community. The proteome consists of all proteins expressed by an organism under a given set of conditions and therefore represents the functional complement of the genome (Wasinger et al., 1995; Wilkins et al., 1996; Goodlett and Yi, 2002). The proteome represents the product of global gene expression (transcription plus translation), protein stability, and protein processing and turnover. Proteomics therefore extends beyond genomic analyses, which only describe the theoretical capability of an organism or community, by providing a direct, global measure of which proteins are synthesized, when they are synthesized and what their cellular (or extracellular) abundance is (Pandey and Mann, 2000). Proteomic measurements are achieved through the use of highly sensitive instrumentation and application of powerful computational methods to produce high throughput qualitative (i.e. protein identity) and quantitative (i.e. protein abundance) data (Aebersold and Mann, 2003). As a result, cellular pathways and processes functioning in a cell or community can be determined, providing the means to probe the fundamental biology and adaptive responses of microorganisms (Washburn and Yates, 2000). By mimicking environmental growth conditions, proteomics can be used to infer cellular responses that are ecologically meaningful, thereby providing the basis for developing views and hypotheses about environmental microbial processes. Extending beyond laboratory-based manipulation of axenic cultures, environmental proteomics (metaproteomics) provides the means to assess proteins that are synthesized by microbial communities.
Linking metaproteomic data to environmental genomics (metagenomics) and geo-physico-chemical data provides a powerful means of inferring the roles of indigenous microbial communities in whole ecosystem function (Lauro et al., 2010; Schneider and Riedel, 2010).
Extremophiles are organisms that live under conditions that stretch well beyond those considered optimal for human life. From this anthropocentric view, extreme growth conditions for microorganisms are those that extend beyond a narrow band of relatively mild growth conditions. Extremophiles are therefore microorganisms that proliferate under relative extremes of temperature, pH, salinity, pressure, radiation or other growth limiting abiotic constraints (Cavicchioli and Thomas, 2004). Ecological, physiological and evolutionary studies of extremophiles are particularly valuable for providing insight into the physical limits at which life can occur, the origins of life and the potential for the discovery of extraterrestrial life (Cavicchioli, 2002). Extremophiles also offer significant potential as a natural resource for the discovery of novel enzymes and other bioactive compounds that have commercial value for use in a broad range of industries.
This review focuses on describing the value of proteomics to the field of extremophiles. The approach we have taken is to introduce important concepts and techniques in proteomics, where available describe proteomic applications to extremophiles, and provide sufficient depth of knowledge, particularly about quantitative proteomics, to enable readers to consider how proteomic approaches may be applied to their extremophiles of interest. To make proteomics more accessible to those less familiar, we have included coverage of basic principles of mass spectrometry (MS), sample fractionation, quantification and statistical analysis, as applied to proteomics of individual organisms and metaproteomics of environmental samples.
Application of proteomic techniques to extremophiles
Extremophile research is often orientated towards determining what cellular responses are involved in growth and survival. As a functional genomics approach, proteomics is well suited to providing qualitative and quantitative data for addressing these types of questions. However, biologists performing proteomics experiments for the first time are unlikely to have expertise with all the technologies (particularly MS) required for the work. In addition to understanding the principles of MS, protein preparation, fractionation and processing (e.g. labelling) procedures tailored to suit the capabilities of individual mass spectrometers (MSrs) also need to be adopted. Determining an appropriate experimental design can be facilitated by consulting staff responsible for the operation and implementation of MSrs – developing a healthy rapport can go a long way towards generating cost-effective, successful proteomics outcomes. Evaluating a generalized workflow can also help in determining an appropriate experimental design (Fig. 1).
The main area in which proteomics of extremophiles differs from other cell types is the preparation of the proteins for analysis. As a result of the abiotic extremes confronting extremophiles, their proteins have evolved a range of specific properties (Siddiqui and Thomas, 2008), necessitating protein preparation procedures to be assessed on a case-by-case basis. For example, proteins from hyperthermophiles are extremely thermostable and are highly resistant to chemical denaturants, organic solvents and proteolytic digestion, leading to problems with denaturing and digesting proteins. For Pyrococcus furiosus this has been overcome by using a detergent-based microwave assisted acid hydrolysis step coupled with an overnight trypsin digest (Lee et al., 2009). Acidophiles and alkaliphiles tend to have proteins with altered surface charges that will affect their migration in isoelectric focusing (IEF) (Antranikian, 2008; Shirai et al., 2008), potentially restricting the use of gel-based methods for protein separation (see Protein separation below). On the other hand, extracting and preparing the proteins from halophiles for analysis can be as simple as washing cell pellets in pure water followed by centrifugation to remove cell debris (Whitehead et al., 2006; Leuko et al., 2009). With these caveats in mind, it should be noted that most contemporary proteomic approaches are amenable to extremophile research. In the following sections of this review, we cover many of the proteomic approaches available and highlight where these approaches have been used with extremophiles.
Many different MS approaches have been developed to achieve peptide and protein identification. Most commonly, a ‘bottom-up’ approach is used, in which proteins are digested with a sequence-specific protease such as trypsin, which cleaves on the C-terminal side of arginine (R) and lysine (K) residues. The less common ‘top-down’ approach involves subjecting intact proteins to the MSr. While the top-down approach is useful in advanced downstream applications such as determining post-translational modifications and protein–protein interactions, it remains relatively low throughput and is currently less advanced than bottom-up proteomics. For these reasons the top-down approach is not covered in this review, and the reader is directed to a number of excellent papers describing top-down proteomics (Wang et al., 2005; Siuti and Kelleher, 2007; Han et al., 2008; Heck, 2008; Vellaichamy et al., 2010).
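The digestion step of the bottom-up approach can be illustrated in silico. The following is a minimal sketch, assuming the simple rule that trypsin cleaves on the C-terminal side of K and R with no missed cleavages; the optional exception for a following proline is a commonly applied convention that is not discussed in this review, and the function name and example sequence are hypothetical.

```python
def tryptic_digest(sequence, skip_before_proline=True):
    """Split a protein sequence into tryptic peptides (no missed cleavages)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        is_last = i == len(sequence) - 1
        if residue in "KR" and not is_last:
            # Assumption: no cleavage when the next residue is proline.
            if skip_before_proline and sequence[i + 1] == "P":
                continue
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Whatever remains after the last cleavage site is the C-terminal peptide.
    peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence, for illustration only.
example_peptides = tryptic_digest("MKTAYIAKQRPLSDER")
print(example_peptides)
```

Real search engines additionally allow for missed cleavages and modified residues, which this sketch ignores.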
The requirement to fractionate samples
High throughput proteomic analysis is often hampered by the complexity of samples. A culture lysate from a microorganism may contain several thousand proteins. When proteolytically digested the sample will contain tens of thousands of unique peptides, and when post-translational modifications are taken into account an even higher level of complexity may be present (Stasyk and Huber, 2004; Righetti et al., 2005). In addition to sample complexity, cellular proteins exhibit a wide range of concentrations. Highly abundant proteins (e.g. central metabolic proteins) may be present in concentrations up to seven orders of magnitude higher than low abundance proteins (Corthals et al., 2000; Issaq et al., 2005). Despite low abundance, some proteins (e.g. transcriptional regulatory proteins) may play crucial roles and are therefore important to detect. As high throughput MS relies on automated intensity based ion selection, the presence of peptides from high abundance (high MS intensity) proteins masks the selection of the low abundance (low intensity) peptides, and hence their detection. In order to circumvent problems with detection and increase overall proteome coverage, fractionation strategies can be used to reduce the complexity of samples before MS analysis.
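The masking effect of intensity based ion selection can be shown with a toy example. This is a sketch only, assuming a simple "top N most intense precursors" rule; the peptide names, intensities and the way the sample is split into fractions are all hypothetical.

```python
def select_top_n(precursors, n):
    """Return the names of the n most intense precursor ions in one survey scan."""
    ranked = sorted(precursors, key=lambda p: p["intensity"], reverse=True)
    return [p["peptide"] for p in ranked[:n]]


# Hypothetical survey scan: four abundant peptides and one low abundance
# peptide from a regulatory protein.
scan = [
    {"peptide": "abundant_1", "intensity": 9.0e7},
    {"peptide": "abundant_2", "intensity": 6.5e7},
    {"peptide": "abundant_3", "intensity": 4.0e7},
    {"peptide": "abundant_4", "intensity": 2.0e7},
    {"peptide": "regulator_1", "intensity": 3.0e3},
]

# Without fractionation, the low intensity peptide is never selected.
unfractionated = select_top_n(scan, n=3)

# Hypothetical fractionation: the same peptides spread over two simpler
# fractions, each analysed with the same top-3 rule.
fraction_a, fraction_b = scan[:3], scan[3:]
fractionated = select_top_n(fraction_a, n=3) + select_top_n(fraction_b, n=3)

print("regulator_1" in unfractionated, "regulator_1" in fractionated)
```

In this toy case the regulatory peptide is selected only after the sample has been split, which is the rationale for fractionating before MS analysis.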
When using separation procedures it is important to ensure that samples are treated in ways that render them generally free of detergents, salts and organic solvents. For example, the activity of the most commonly used proteolytic enzyme, trypsin, can be perturbed by high levels of such contaminants. Sample clean-up steps include acetone precipitation, protein dialysis, and cartridge-based or specialized pipette tip-based chromatography of peptide mixtures.
A useful form of fractionation is to separate microbial biomass into three sub-cellular fractions: the secreted fraction (secretome), which consists of the culture supernatant and contains extracellular proteins; the insoluble fraction, which consists of membranes, membrane proteins and large complexes; and the soluble fraction, which primarily contains cytoplasmic proteins. The combined analysis of these three fractions can greatly enhance proteome coverage and provide information about compartmentalization (e.g. membrane associated proteins) or proteins present in large complexes (e.g. ribosomes) (Williams et al., 2010a,b).
Researchers performing secretome studies of extremophilic archaea need to take into account that signal peptide sequences are generally poorly experimentally characterized in this domain of life (Pohlschroder et al., 2005; Saunders et al., 2006). Therefore, measures need to be included to ensure that fractionation is robust (e.g. inclusion of assays for cytoplasmic markers), so that secretome data can be interpreted with confidence (Saunders et al., 2006; Williams et al., 2010a). Secretome fractions can also contain surface proteins that are released by endogenous peptidases during growth or through cell death (Ellen et al., 2009). This issue may be minimized by harvesting cultures at an early stage of growth, although harvesting early decreases protein yield.
Studying the insoluble proteome of extremophiles will help to shed light on membrane and surface proteins that are directly involved in interaction with the environment. For example, the halophilic alkalithermophile Natranaerobius thermophilus is thought to adapt to multiple environmental extremes through the use of a large repertoire of Na+(K+)/H+ antiporters (Mesbah et al., 2009). Proteomic studies of insoluble fractions have revealed proteins involved in signal transduction, transport of various cellular substances and secreted proteins, and cell surface proteins in the psychrophilic methanogen, Methanococcoides burtonii (Burg et al., 2010; Williams et al., 2010a,b).
To purify, reduce the complexity or visualize individual proteins, cellular extracts or sub-cellular fractions can be separated using gel-based or gel-free methods before MS. Gel-based protein separation, when combined with reverse phase (RP) LC separation of digested peptides (see below) and MS, is often referred to as GeLC-MS or GeLC-MS/MS. Separation by molecular weight only (1 dimensional gel electrophoresis; 1DE) can help to remove compounds that inhibit peptide digestion or MS. Two dimensional gel electrophoresis (2DE) separates proteins based on their isoelectric point (pI) through IEF, and molecular weight through polyacrylamide gel electrophoresis (PAGE). The technique can further help to reduce sample complexity, and allows snapshot (one sample) and differential (comparison of more than one sample) protein profiles to be generated (Gorg et al., 2000; Gygi et al., 2000).
Following electrophoresis, individual proteins or groups of proteins are visualized by staining techniques, of which several are available for use. The staining techniques have different ranges and limits of detection and the choice of specific techniques should be considered based on requirements. Coomassie staining is inexpensive, easy to use, reliable and compatible with MS. However, it has low sensitivity (> 1 µg to ∼100 ng). Silver staining can detect proteins at lower abundance within a narrow range of concentrations (∼5 to ∼80 ng), but uses complicated methodology and can produce problems with MS. More recently several fluorescent stains have been developed, which are relatively simple to implement and have ranges of detection limits from ∼1 ng or less to > 1 µg. However, these stains use relatively expensive reagents and require specialized equipment for visualization and spot excision (Lopez et al., 2000; Patton, 2000; Patton et al., 2002; Wang et al., 2007).
Following staining and visualization, bands containing multiple (1DE) or individual (2DE) proteins are excised and digested in-gel, as described by Shevchenko and colleagues (2007), and peptides subjected to RP-LC separation before MS analysis (Wilm et al., 1996; Lasonder et al., 2002; Everley et al., 2004; Ting et al., 2010). When developing proteomic methods it is wise to test whether gel-based separation tends to help (e.g. removes contaminants thereby improving the quality of mass spectra) or hinders (e.g. reduces protein yield without improving the quality of mass spectra) protein identification and coverage.
Complex mixtures such as whole cell lysates can be challenging to analyse using 2DE, because of: (i) the large dynamic range in protein abundance, (ii) the presence of proteins that fall beyond the pI and mass thresholds of the technique (see Application of proteomic techniques to extremophiles), (iii) the presence of isoforms or co-migrating proteins that can complicate the interpretation of gel spots, (iv) the presence of membrane proteins or hydrophobic proteins that do not resolve effectively (see below) (Peng and Gygi, 2001; Wittke et al., 2004; Yan and Chen, 2005) or (v) difficulties in replicating protein profiles leading to a reduction in the number of spots that can be scored and used to gain accurate measures of protein quantity (Gorg et al., 2000; Wittke et al., 2004).
The separation of proteins using a gel-free approach is most commonly achieved by LC (Patterson, 1994; Aebersold and Goodlett, 2001; Griffin et al., 2001; Goodlett and Yi, 2002; Rabilloud, 2002; Yates and Snyder, 2004). Soluble proteins can be effectively separated by off-line methods before protein digestion using ion exchange columns (e.g. strong cation exchange, SCX) as have been used for the psychrophile M. burtonii (Goodchild et al., 2004a), or size exclusion columns, which have been used for the haloalkaliphile Natronomonas pharaonis (Konstantinidis et al., 2007). While chromatographic separation techniques are now largely automated and can achieve relatively rapid separations of complex mixtures, increasing the number of samples to process increases the time and cost of MS analysis. Analogous to gel-based separation, the processing of samples from extremophiles with proteins with skewed charge or pI (e.g. intracellular proteins from haloarchaea) will require careful consideration of the separation matrix and eluent gradients for each protein mixture of interest.
The application of off-line pre-digestion methods to insoluble proteins is more challenging than for soluble proteins. However, it is important to develop appropriate protocols to separate out high abundance soluble proteins that can contaminate insoluble fractions (Klein et al., 2005). Useful differential solubility fractionation procedures have been developed for the extremophile M. burtonii (Burg et al., 2010), and for Escherichia coli (Ramos et al., 2008).
Other approaches for reducing overall sample complexity at the protein level for extremophiles include subjecting cell extracts from the hyperthermophilic archaeon Sulfurisphaera sp. to extended high temperature incubations plus incubation with denaturing agents to enrich for proteins with high conformational stability (Prosinecki et al., 2006), and using filtration and reduced duration of protein digestion to enhance coverage of low molecular weight proteins for Halobacterium salinarum (Klein et al., 2007). The best coverage will inevitably be obtained by using multiple types of fractionation and separation procedures, as have been used for Haloferax volcanii (Kirkland et al., 2008) and M. burtonii (Burg et al., 2010; Williams et al., 2010a).
A fast, sensitive, reproducible and automated means of obtaining protein identities from peptide mixtures obtained following protein digestion is to perform LC-MS/MS (Link et al., 1999). Peptides are typically separated according to hydrophobicity on a RP-C18 column and eluted online into a MSr. Complex samples often require a second dimension of LC (e.g. SCX) before RP-LC (LC/LC-MS/MS) (Ye et al., 2000; Shen et al., 2004). SCX has been used in this manner to separate peptides before MS analysis for the thermoacidophile Sulfolobus solfataricus (Zaparty et al., 2010). Another secondary means of peptide separation is in-gel IEF, as has been used for S. solfataricus to separate peptides before LC-MS/MS analysis. The methodology is identical to protein separation by IEF in the first dimension of 2DE, although for peptide separation the IPG strip is cut into a number of fractions and the peptides are eluted using procedures similar to in-gel digestion (Chong et al., 2007). Whether this second dimension of separation is required is largely dependent on the characteristics of the sample and the MSr used.
Tandem mass spectrometry (MS/MS) and protein identification
After LC, separated peptides can be introduced into a MSr by electrospray ionization (ESI). Matrix-assisted laser desorption ionization (MALDI) can also be used but is not directly coupled with LC separation (Hillenkamp and Peter-Katalinic, 2007). For ESI, peptides are eluted online (e.g. from RP-LC separation) into the MSr where the mass-to-charge ratios (m/z) of the peptide ions are first measured by MS to determine the molecular mass of each peptide. The most intense peptide ions are automatically selected by instrument software for impact with a neutral gas (e.g. argon) in a collision cell, producing collision induced dissociation spectra. The m/z values of the peptide fragments are then measured, and in combination with the intact peptide mass, this tandem mass spectrum (Aebersold and Mann, 2003; Yates, 2004; Keller and Hettich, 2009) is used for determining peptide identity (Biemann, 1986; Hunt and Yates, 1986). Many MSrs are suitable for MS and MS/MS proteomics, each with its own particular strengths. The most commonly used mass analysers include quadrupoles, ion traps (IT), time of flight (TOF), TOF-TOF, quadrupole-TOF (QTOF), quadrupole-IT hybrids, IT-orbitrap hybrids and IT-Fourier transform ion-cyclotron resonance MSr (FTMS) hybrids (Aebersold and Mann, 2003; Yates, 2004).
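The relationship between the molecular mass of a peptide and the m/z measured for its ion can be made concrete with a short calculation. The sketch below assumes unmodified peptides, standard monoisotopic residue masses and protonated ions, as typically produced by ESI; the function names are illustrative only.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # one water is added for the free N- and C-termini
PROTON = 1.007276   # mass of the charge carrier


def peptide_mass(peptide):
    """Monoisotopic mass of a neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_to_charge(mass, charge):
    """m/z of a peptide ion carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


mass = peptide_mass("PEPTIDE")
print(round(mass, 4), round(mass_to_charge(mass, 2), 4))
```

Because the instrument measures m/z rather than mass, the charge state must be inferred (for example from isotope peak spacing) before the molecular mass can be recovered.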
Protein identification through database matching and de novo sequencing
Peptide and protein identification is dependent on the ability to match experimentally acquired intact mass and fragmentation patterns of peptides to theoretical mass and fragmentation patterns generated from protein sequences inferred from DNA sequence data. MS protein identification software packages, such as Mascot (Perkins et al., 1999), Sequest (Eng et al., 1994) and X!Tandem (Craig and Beavis, 2004), use algorithms to interpret MS/MS spectral data and produce matches to databases with a score that reflects the statistical significance of the match (Aebersold and Mann, 2003).
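The principle of matching an observed fragmentation pattern to theoretical patterns can be sketched with a simple shared peak count. This is not the scoring algorithm of Mascot, Sequest or X!Tandem, which use more elaborate statistical models; it assumes singly charged b- and y-ions, unmodified peptides, and a hypothetical fragment tolerance of 0.02 m/z units.

```python
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276


def fragment_mz(peptide):
    """Theoretical singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions = [sum(masses[:i]) + PROTON for i in range(1, len(masses))]
    y_ions = [sum(masses[i:]) + WATER + PROTON for i in range(1, len(masses))]
    return b_ions + y_ions


def shared_peak_count(observed_mz, peptide, tolerance=0.02):
    """Number of theoretical fragments found among the observed peaks."""
    return sum(
        any(abs(obs - theo) <= tolerance for obs in observed_mz)
        for theo in fragment_mz(peptide)
    )


def best_match(observed_mz, candidates):
    """Candidate peptide sequence with the highest shared peak count."""
    return max(candidates, key=lambda pep: shared_peak_count(observed_mz, pep))


# Hypothetical 'observed' spectrum: an ideal fragment set for PEPTIDE.
observed = fragment_mz("PEPTIDE")
print(best_match(observed, ["GASVLK", "PETPIDE", "PEPTIDE"]))
```

The candidate PETPIDE has the same intact mass as PEPTIDE and shares most fragment ions with it, which illustrates why fragment-level evidence, and not intact mass alone, is needed to distinguish closely related sequences.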
The spectral matching software cannot be used for extremophiles that do not have available genome sequence data. In contrast, de novo sequencing software packages, such as DeNoS (Savitski et al., 2005) and PEAKS (Ma et al., 2003), extract amino acid sequence information from mass spectra that can be matched to any database using MS-BLAST or an integrated protein database searching algorithm (Ma et al., 2003; VerBerkmoes et al., 2009). A method using 2DE in-gel guanidination followed by sulfonation of peptide N-termini has been developed for improving de novo identifications and has been applied to the heavy metal tolerant bacterium Shewanella oneidensis (Sergeant et al., 2009).
Proteomic methods using absolute or relative measures of changes in protein abundance are very useful for inferring physiological responses to environmental stimuli (Krijgsveld and Heck, 2004; MacCoss and Matthews, 2005). Thus quantitative proteomics represents an approach enabling an important question in extremophile research, ‘how does the organism survive under such extreme conditions’, to be explored.
Absolute quantification involves introducing a known quantity of a reference protein or peptide standard into each sample. By comparing the height of the MS peptide peaks against the standard, peptide concentrations can be determined. A spiked standard can also be used to normalize peak intensities across experiments (Gerber et al., 2003; Kirkpatrick et al., 2005). Absolute quantification is most commonly used in clinical research (examples include Brun et al., 2009; Williamson et al., 2011).
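The arithmetic behind quantification against a spiked standard is a simple proportion. The sketch below assumes that the analyte and the standard respond identically in the MSr (for example, an isotopically labelled version of the same peptide) and that the response is linear over the range measured; the function name, units and values are hypothetical.

```python
def absolute_amount(sample_peak, standard_peak, standard_amount_fmol):
    """Estimate the analyte amount from its peak relative to a spiked standard."""
    if standard_peak <= 0:
        raise ValueError("standard peak intensity must be positive")
    return sample_peak / standard_peak * standard_amount_fmol


# Hypothetical peak heights: the analyte peak is twice that of 50 fmol of standard.
estimate = absolute_amount(sample_peak=3.0e6, standard_peak=1.5e6,
                           standard_amount_fmol=50.0)
print(estimate)
```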
Relative quantification involves the comparison of two or more conditions (e.g. high vs. low temperature, treatment vs. control) and relative abundance can be determined from visual protein intensity, MS peak intensity or spectral counting (Fig. 2). Quantitative 2DE proteomics is assessed visually by densitometry of corresponding protein spot intensities. The relative peak intensity (peak area or peak height) of peptides from a survey scan [i.e. the first dimension of MS (MS1)], or of tag fragments in the second dimension of MS, can be correlated with the abundance of a peptide in a sample. Peak intensity quantification is used in label-free and stable isotope labelling quantification (see Stable isotope labelling quantitative proteomics below). Quantification by spectral counting uses the number of confidently identified peptides per protein as a proxy for protein abundance and is only used in label-free quantification.
The main advantage of the 2DE approach is the ability to separate and visualize up to thousands of proteins on a single polyacrylamide gel (Fey et al., 2000; Gorg et al., 2000). Labelling proteins with fluorescent dyes before electrophoresis (difference in-gel electrophoresis) can also enhance the use of 2DE for quantitative proteomics (Unlu et al., 1997; Marouga et al., 2005). However, complex mixtures of whole cell lysates from extremophiles that include large numbers of very acidic or basic proteins, and proteins with very high or low molecular weight can be difficult to analyse by 2DE (Peng and Gygi, 2001; Wittke et al., 2004; Yan and Chen, 2005). Other potential complications include a large non-linear dynamic range of spot intensities, isoforms and co-migrating proteins that confound interpretations of protein profiles, hydrophobic proteins that precipitate, and difficulties in obtaining reproducibility between biological and technical replicates (Gorg et al., 2000; Wittke et al., 2004). Moreover, as a result of advances in LC-utilizing methods (see below), the popularity of 2DE as an approach for quantitative proteomics has declined.
Label-free quantitative proteomics
In contrast to labelling approaches (see Stable isotope labelling quantitative proteomics below), effective label-free approaches are a recent development in quantitative proteomics. Label-free methods are implicitly gel-free and use either spectral features or spectral counting. Similar levels of biomass are required for label-free and stable isotope labelling (Hendrickson et al., 2006). Label-free approaches tend to be less expensive than labelling methods (as labelling reagents are not required), provide for assessment of a greater dynamic range of peptide abundance, and are not limited by the total number of test conditions being compared (Zybailov et al., 2005; Bantscheff et al., 2007; Nesvizhskii et al., 2007).
The spectral features approach quantifies proteins between two or more independent experiments by aligning peptide peaks from MS1 scans and comparing relative peak intensities of the same peptides across experiments (Bondarenko et al., 2002; Chelius and Bondarenko, 2002; Wang et al., 2003; Nesvizhskii et al., 2007). In order to ensure detailed, accurate and reproducible data, a high resolution MSr is required (e.g. a modern QTOF), and the same data acquisition protocol should be used for each sample (i.e. same columns, gradient and preferably temperature controlled) (America et al., 2006).
A challenge in spectral feature quantification lies in matching each detected peptide peak from one dataset to the same peptide peak in another dataset. The exact m/z and retention time of the peak may differ, usually because of technical drift in LC or MS instrumentation; these factors complicate the comparison of datasets, particularly if retention time drift is non-linear (America et al., 2006). Spiked chromatographic standards facilitate more accurate alignments and comparisons, and a relatively large number of replicates with minimized sample manipulation are required to provide an accurate assessment of abundance changes (Old et al., 2005; Bantscheff et al., 2007).
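One way in which spiked chromatographic standards can support alignment is to use them as anchor points for correcting retention time drift. The sketch below assumes piecewise-linear interpolation between standards detected in both runs, which is only one of several possible alignment models; the function name and the retention times are hypothetical.

```python
from bisect import bisect_right


def align_retention_time(rt, anchors):
    """Map a retention time from run B onto the time scale of run A.

    anchors: (rt_in_run_b, rt_in_run_a) pairs for spiked standards seen in
    both runs; at least two pairs with distinct run B times are required.
    Times outside the anchor range are extrapolated from the nearest segment.
    """
    anchors = sorted(anchors)
    run_b_times = [a[0] for a in anchors]
    # Index of the segment containing rt, clamped to the first/last segment.
    i = min(max(bisect_right(run_b_times, rt) - 1, 0), len(anchors) - 2)
    (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
    return y0 + (rt - x0) * (y1 - y0) / (x1 - x0)


# Hypothetical standards showing non-linear drift between two runs (minutes).
standards = [(10.0, 10.5), (20.0, 21.5), (30.0, 31.0)]
print(align_retention_time(15.0, standards), align_retention_time(25.0, standards))
```

After alignment, a peak in one dataset can be matched to a peak in another when both the m/z and the corrected retention time agree within chosen tolerances.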
Spectral counting requires large numbers of replicates to be effective (Old et al., 2005), and is affected by systematic variation (e.g. inconsistent loading and variations in peptide behaviour in chromatography). As peptide characteristics such as size, charge and hydrophobicity influence peptide ionization efficiency, and affect the success of downstream protein identification and quantification, spectral counting should be considered a semi-quantitative method (Bantscheff et al., 2007). Advances being made in spectral counting approaches (Rappsilber et al., 2002; Craig et al., 2005; Ishihama et al., 2005; Tang et al., 2006) are likely to benefit quantitative proteomic studies of extremophiles in the future. Moreover, in comparison with isotope labelling strategies, which require peptides to be present in sufficient abundance in all test conditions (see Stable isotope labelling quantitative proteomics below), spectral counting does not have this requirement and can therefore be used to evaluate proteins that are in low abundance in, or unique to, one test condition, as was shown for M. burtonii (Burg et al., 2010).
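A minimal sketch of relative quantification by spectral counting is given below. It assumes that counts are normalized to the total count of each run to correct for inconsistent loading, and that a pseudocount is added so that proteins detected in only one condition still yield a finite ratio; both choices, along with the protein names and counts, are illustrative assumptions and not a method taken from the studies cited above.

```python
import math


def log2_spectral_count_ratios(counts_a, counts_b, pseudocount=1.0):
    """log2(A/B) per protein from spectral counts in two conditions."""
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    ratios = {}
    for protein in sorted(set(counts_a) | set(counts_b)):
        # Proteins absent from one condition are given a count of zero,
        # and the pseudocount keeps the resulting ratio finite.
        a = (counts_a.get(protein, 0) + pseudocount) / total_a
        b = (counts_b.get(protein, 0) + pseudocount) / total_b
        ratios[protein] = math.log2(a / b)
    return ratios


# Hypothetical counts; P3 is detected only in condition B.
condition_a = {"P1": 30, "P2": 10}
condition_b = {"P1": 15, "P2": 10, "P3": 15}
ratios = log2_spectral_count_ratios(condition_a, condition_b)
print(ratios)
```

Because the result depends on the pseudocount and on how many spectra were acquired, such ratios are best treated as semi-quantitative and supported by replicates.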
Stable isotope labelling quantitative proteomics
The fundamental concept of stable isotope labelling is the creation of heavy and light isotopic protein or chemical label derivatives, resulting in detectable ion shifts in the mass spectra. Replacement of 12C with 13C, 14N with 15N, or 1H with 2H in proteins or labels can generate characteristic mass shifts without affecting the chemical or structural properties of proteins or peptides (Zhong et al., 2004; Yan and Chen, 2005). Corresponding heavy and light peptides or label fragments from the same MS analysis are quantified, with their ratio representing the relative abundance of the corresponding peptide (Nesvizhskii et al., 2007). Stable isotope labelling for quantitative proteomics was introduced in 1999 by three separate groups (Gygi et al., 1999b; Oda et al., 1999; Pasa-Tolic et al., 1999). The isotopes can be introduced into proteins or peptides either in vitro or in vivo (Fig. 3). Labelling in vitro requires the label to be covalently attached to proteins after biomass has been harvested, whereas labelling in vivo involves the incorporation of the label into proteins during cell growth. The common in vitro approaches include isotope-coded affinity tags (ICAT), isotope-coded protein labelling (ICPL), isobaric labelling [iTRAQ (isobaric tag for relative and absolute quantification) and TMT (tandem mass tags)] and 16O/18O digestion labelling.
In vitro labelling: ICAT and ICPL
In ICAT labelling, proteins are labelled at cysteine residues with a thiol-reactive group before trypsin digestion (Gygi et al., 1999b). After digestion, the peptide mixture is separated by SCX chromatography, followed by biotin-affinity purification in an avidin column to selectively isolate ICAT-labelled peptides. Cleavage of the linker releases the peptide containing the thiol-reactive group and isotopic tag from the biotin-affinity tag to reduce the overall size of the label. The peptides are further separated by RP chromatography before analysis by MS/MS (Gygi et al., 2002). Incorporation of the heavy (13C) vs. light (12C) ICAT tag confers a consistent 8 Da mass shift between the light vs. heavy peptide derivatives. Correlation of the intensity of heavy vs. light isotopic peptide peaks from the MS1 scan enables relative quantification (Gygi et al., 2002).
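The pairing of light and heavy peaks in an MS1 scan can be sketched as follows. The example assumes the 8 Da shift described above, a single label per peptide (one cysteine) and a known charge state, so that the expected spacing is 8/z in m/z units; peptides carrying several labels would shift by multiples of this value and are not handled. The peak list and function name are hypothetical.

```python
def find_isotope_pairs(peaks, mass_shift=8.0, charge=1, tolerance=0.01):
    """Find light/heavy peak pairs and their heavy:light intensity ratios.

    peaks: list of (mz, intensity) tuples from one MS1 scan.
    Returns a list of (light_mz, heavy_mz, ratio) tuples.
    """
    expected_spacing = mass_shift / charge
    pairs = []
    for light_mz, light_intensity in sorted(peaks):
        for heavy_mz, heavy_intensity in sorted(peaks):
            spacing = heavy_mz - light_mz
            if abs(spacing - expected_spacing) <= tolerance and light_intensity > 0:
                pairs.append((light_mz, heavy_mz, heavy_intensity / light_intensity))
    return pairs


# Hypothetical doubly charged peaks: one labelled pair and one unpaired peak.
ms1_peaks = [(500.25, 2.0e5), (504.25, 6.0e5), (612.80, 1.0e5)]
pairs = find_isotope_pairs(ms1_peaks, charge=2)
print(pairs)
```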
In the labelling system ICPL, tags differing in mass through the inclusion of 13C or deuterium atoms are attached to amino groups of peptides (Schmidt et al., 2005; Leroy et al., 2010). Advantages of ICPL over ICAT include the ability to label up to four samples, labelling all proteins (not just those containing cysteine), and labelling post-digestion, which increases the number of labelled peptides per protein (Leroy et al., 2010; Paradela et al., 2010). ICPL labelling can also be used with MSrs that have lower sensitivity than is required for iTRAQ labelling (see In vitro labelling: isobaric tags below) and can be used with MALDI MSrs. Studies using ICPL include the effects of simulated microgravity on the heavy metal resistant bacterium Cupriavidus metallidurans (Leroy et al., 2010), and the effects of environmental perturbations and nutrients on the haloarchaeon H. salinarum (Tebbe et al., 2009).
In vitro labelling: isobaric tags
Isobaric labelling involves attaching a module consisting of a reporter tag, balance group and peptide reactive group to N-termini of peptides and lysine amine side-chains after enzymatic digestion (Ross et al., 2004). The tags are isotopically balanced, rendering peptides identical during MS1. During collision induced dissociation the tag is fragmented, enabling the relative intensity of the reporter ions to be assessed in a single spectrum and used for quantification (Ross et al., 2004; Zieske, 2006; Chen et al., 2007; Choe et al., 2007; Fenselau, 2007; Dayon et al., 2008). iTRAQ systems are available for application to four or eight samples, and the TMT system can label up to six different samples (Dayon et al., 2008). However, the total number of proteins identified and quantified tends to decrease with the number of samples compared (Pichler et al., 2010). Improved validation and coverage can be obtained by utilizing biological replicates across multiple injections (technical replicates) and normalizing across samples with a pooled reference (Gan et al., 2007; Song et al., 2008; Ow et al., 2009). Accuracy can also be improved by incorporating known ratios of spiked proteins with analyses and using strong statistical error models for evaluating and minimizing variation and compression in peak intensities (Karp et al., 2010). Isobaric labelling appears better than ICAT for analysing membrane proteins, and has been applied successfully to several extremophiles (Bisle et al., 2006; Williams et al., 2010a,b).
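Reporter ion quantification against a pooled reference can be sketched as below. The example assumes a four-channel experiment in which one channel carries the pooled reference, and summarizes each protein by the median reporter ratio over its peptide spectra; the channel labels, intensities and the use of the median are illustrative assumptions, not a prescribed procedure from the cited studies.

```python
from statistics import median


def reporter_ratios(spectra, reference_channel):
    """Median ratio of each reporter channel to the pooled reference channel.

    spectra: list of dicts mapping channel label to reporter ion intensity,
    one dict per MS/MS spectrum assigned to the same protein.
    """
    usable = [s for s in spectra if s[reference_channel] > 0]
    channels = [c for c in usable[0] if c != reference_channel]
    return {
        channel: median(s[channel] / s[reference_channel] for s in usable)
        for channel in channels
    }


# Hypothetical reporter intensities for three spectra of one protein;
# channel "114" is taken to hold the pooled reference.
protein_spectra = [
    {"114": 100.0, "115": 200.0, "116": 50.0, "117": 100.0},
    {"114": 80.0, "115": 176.0, "116": 40.0, "117": 72.0},
    {"114": 120.0, "115": 228.0, "116": 66.0, "117": 120.0},
]
protein_ratios = reporter_ratios(protein_spectra, reference_channel="114")
print(protein_ratios)
```

Using the median rather than the mean limits the influence of individual spectra with distorted reporter intensities.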
The in vivo approach (also termed metabolic labelling) involves the incorporation of heavy (13C, 15N or 2H) or light (12C, 14N or 1H) isotopes from nutrients (e.g. 14N/15N derivatives of ammonia) into proteins during cell growth; the use of labelled amino acids is termed SILAC (Ong et al., 2002). Metabolic labelling enables complete incorporation of the isotope into proteins, does not require additional enzymatic or chemical labelling, and enables labelled biomass to be combined during the early stages of the experiment (just before or just after protein extraction), thereby maximizing consistency of treatments and minimizing the chance of introducing inter-sample variation caused by experimental error (Washburn et al., 2002a; Krijgsveld et al., 2003). A disadvantage is the need to grow cultures in the presence of labelled nutrients for at least seven rounds of cell division in order to achieve a theoretical incorporation of 99% (Ting et al., 2009), which can be assessed by measurement of atom percent excess (MacCoss et al., 2005). In addition, only pairwise comparisons can be performed, whereas iTRAQ allows up to eight simultaneous comparisons. A disadvantage specific to SILAC is that some isotopically labelled amino acids can be interconverted (e.g. arginine to proline) in vivo, causing difficulties for interpreting abundance data (Engen et al., 2002; Ong et al., 2003a,b). This can be minimized by using amino acids that are at the end of a metabolic pathway (e.g. lysine, leucine, methionine, tyrosine and valine) (Engen et al., 2002). The use of a suitable labelling compound is crucial for the efficacy of metabolic labelling. For example, labelled ammonia cannot be used if the microorganism fixes nitrogen (thereby fixing 14N from the atmosphere), and amino acids cannot be used if the microorganism is prototrophic and does not use exogenous amino acids for growth.
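The requirement for about seven rounds of cell division follows from simple dilution arithmetic, sketched below in Python. The sketch assumes an idealized case that is not stated in the cited work: the labelled nutrient is fully enriched, each division halves the pre-existing unlabelled protein pool, and protein turnover and recycling of unlabelled material are ignored. The function names are hypothetical.

```python
import math


def theoretical_incorporation(divisions):
    """Labelled fraction after a number of cell divisions.

    Idealized model: the unlabelled fraction halves at every division,
    so the labelled fraction is 1 - 0.5 ** divisions.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    return 1.0 - 0.5 ** divisions


def divisions_needed(target):
    """Smallest whole number of divisions reaching the target fraction."""
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in the range [0, 1)")
    return math.ceil(math.log(1.0 - target) / math.log(0.5))


for n in range(1, 9):
    print(f"{n} divisions: {theoretical_incorporation(n):.4f}")
```

Under these assumptions six divisions give a labelled fraction of about 0.984 and seven give about 0.992, which is why seven is the first whole number of divisions to exceed 99%. Measured incorporation (e.g. atom percent excess) may differ from this idealized value.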
In addition to processing samples competently using appropriate MSrs, the MS output needs to be computationally processed properly to obtain confident identification, quantification and statistical validation of proteins and their abundance levels (Venable et al., 2004). Concomitant with the technical advances occurring with MSrs, more requirements are being placed on the quality of experimental design, data acquisition, assignation of protein identities, database deposition and data processing (particularly normalization and statistical testing). Useful guides describing current acceptable practices can be found in the publishing requirements for Molecular and Cellular Proteomics (Carr et al., 2004; Celis, 2004) and Proteomics (Wilkins et al., 2006), and in publications by the Human Proteome Organisation (HUPO) (Kaiser, 2002; Taylor et al., 2006; 2007; 2008; Gibson et al., 2008).
A growing number of open source software packages for quantifying protein abundance are available, in addition to commercial software (examples are included in Table 1). Many software packages are specific to instrument types, file types or labelling strategy. This is due, in part, to the different combinations of labelling and separation techniques used, MSr types and manufacturers, and MSr configurations used, resulting in a large number of spectra types being generated (Cannataro, 2008). This is further complicated by the range of raw data file formats produced by MSrs, the diversity of approaches used by analytical programmes to perform protein identification and quantification and the variety of output formats generated during data processing and analysis (Keller et al., 2005).
Table 1. Examples of software packages available for quantitative proteomics.
In the environmental microbiology community it is well recognized that because a large proportion of environmental microorganisms have not yet been cultivated, their functional roles and interactions within microbial communities remain largely unknown (VerBerkmoes et al., 2009). As inferences about microbial physiology and ecology are biased by studies from a very limited number of cultivated species (Wilmes and Bond, 2006), approaches such as metaproteomics have been developed to enable studies to be performed on microbial communities from the environment (Schloss and Handelsman, 2003). Sampling microbial populations from the environment allows for a larger range of individual microorganisms and their interactions to be studied.
An in vivo labelling strategy that should find good application to studies of extremophiles is protein-based stable isotope probing (Protein-SIP) (Jehmlich et al., 2008), where labelled substrates (e.g. 13C benzene) are introduced to microbial communities and their incorporation into proteins is measured using MS. The approach enables the identification of active microorganisms and metabolic pathways, such as those involved in remediation of environmental contaminants (Jehmlich et al., 2009). For an individual microorganism, Protein-SIP may also prove useful for measuring nutrient and pathway utilization under comparative growth conditions (Jehmlich et al., 2010).
Metaproteomic coverage of individual microorganisms within a sample will be limited by the complexity of the community and relative abundance of individual proteins (VerBerkmoes et al., 2009). Coverage can be improved using high performance hybrid MS instruments such as FT-ICR-MS (Syka et al., 2004) and LTQ-FT-Orbitrap (Hu et al., 2005b), which enable high-mass accuracies to be achieved over large dynamic ranges of protein abundance down to the fmol level (Banfield et al., 2005).
Quantitative methods for evaluating metaproteomic data are in their infancy (Keller and Hettich, 2009), and semi-quantitative approaches (e.g. spectral counting) may prove useful (Morris et al., 2010). A capacity to perform quantification is likely to require MSr developments leading to increased dynamic range and speed of analysis (Keller and Hettich, 2009). Microchip techniques, which use polymer microfluidic devices coupled to LC-MS/MS, may also prove useful for metaproteomic studies in the future: they simplify sample preparation, improve protein identification and reduce the requirement for sample quantity, analysis time and cost (Ethier et al., 2006; Horvatovich et al., 2007; Srbek et al., 2007).
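As an illustration of what a semi-quantitative spectral counting estimate can look like, the Python sketch below computes the normalized spectral abundance factor (NSAF), in which each protein's spectral count is divided by its length and then scaled by the sum over all proteins. NSAF is chosen here only as a widely used example of spectral counting; it is an assumption of this sketch, not necessarily the measure used in the studies cited above, and the protein names, counts and lengths are invented.

```python
def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein.

    spectral_counts: {protein_id: number of matched MS/MS spectra}
    lengths: {protein_id: protein length in amino acids}
    Returns {protein_id: NSAF}; the values sum to 1 across proteins.
    """
    # Longer proteins yield more peptides, so counts are length-corrected.
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectra were counted for any protein")
    return {p: value / total for p, value in saf.items()}


# Invented example: equal spectral counts, different protein lengths.
counts = {"short_protein": 10, "long_protein": 10}
protein_lengths = {"short_protein": 100, "long_protein": 300}

abundance = nsaf(counts, protein_lengths)
```

With equal counts, the shorter protein receives the larger abundance estimate (0.75 versus 0.25 here), reflecting the length correction. Such values are relative within a sample and do not address the dynamic range limitations noted above.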
In metaproteomics, the ability to identify proteins is dependent upon the quality and extent of metagenome coverage (Keller and Hettich, 2009). The quality of genome annotation and protein predictions is important, with failure to predict open reading frames and identification of incorrect gene start sites precluding peptide identification (VerBerkmoes et al., 2009). When adequate metagenomic data are not available, de novo sequencing can be used. This has been successfully used for the temporal evaluation of proteins in a bacterial community following exposure to cadmium (Lacerda et al., 2007), and metaproteomic studies of microbiota in the human gastrointestinal tract (Klaassens et al., 2007; VerBerkmoes et al., 2009).
Similar to the way that genome sequence data of an individual microorganism enhance the capacity to identify proteins using MS, having matched metagenome and metaproteome data is not essential (Lacerda and Reardon, 2009), but greatly increases the chance to make confident protein identifications (VerBerkmoes et al., 2009). This is particularly the case if sample complexity is low and metagenome coverage is high, as was demonstrated for an acid mine drainage site (Ram et al., 2005) and the oxycline zone of an Antarctic meromictic lake, Ace Lake (Ng et al., 2010). In the acid mine drainage study, ∼48% of proteins from an individual member of the biofilm were identified (Ram et al., 2005). In the Antarctic study, ∼31% metaproteomic coverage was obtained, and in combination with nearly complete reconstruction of the genome for a psychrophilic green sulfur bacterium, insight was gained into the physiological traits that enable the bacterium to gain dominance under cold, nutrient- and oxygen-limited conditions with extremely varied annual light cycles (Ng et al., 2010).
With greatly improved (and cost-effective) capacity to perform DNA sequencing, and improvements in MS, increasingly larger scale metaproteogenomic studies are being performed. These include studies of membrane proteins from South Atlantic marine communities where comparative metaproteomics of different sampling sites, along a natural gradient in nutrient concentrations, revealed shifts in nutrient utilization and energy transduction capacity of the microbial populations (Morris et al., 2010). In a separate study of Ace Lake, an integrative analysis of metaproteogenomic data from samples taken at six depths of the lake enabled the identity and functional capacity of microorganisms to be determined (Lauro et al., 2010). From these analyses, the interactions between populations that fulfil nutrient cycling and that shape the evolution of the microbial communities were able to be determined (Lauro et al., 2010). Based on the metaproteogenomic analyses, this study also developed a mathematical model to describe the effects of environmental perturbations on the stability of the ecosystem. As extreme environments often have low microbial community complexity (Ram et al., 2005; Ng et al., 2010), which greatly enhances the ability to obtain useful protein coverage for representative species, there is excellent scope for applying metaproteomics to a broad range of extreme environments.
An enormous scope exists for learning about environmental microorganisms through the application of proteomics. Microbial proteomic studies range from single time and condition proteome snapshots of the protein complement of an individual microorganism, to multiplex quantitative analyses of cellular responses of a single microorganism, through to metaproteogenomic assessments of microbial communities representing diverse ecosystems, and the probing of active remediatory community members using Protein-SIP. Extremophiles colonize a large range of natural and artificially created habitats, having gained the specific mechanisms of adaptation required for colonization through billions of years of evolution. While modern microbiologists have learned that performing biological research means coming to terms with massive streams of DNA sequence data, fewer have developed an ability to generate and comprehend proteomic data. Proteomics is a multidisciplinary science requiring effective understanding and technical expertise in the chosen biology, and in MS and bioinformatics. Proteomic efforts for diverse extremophiles will become successful when biologists interface with mass spectrometrists who have a genuine interest in seeing MS data translated into real biological meaning. This path will likely necessitate a broad range of troubleshooting (e.g. sample preparation), personnel training (e.g. students trained in biology learning how to ‘fly’ an MSr), and fund raising (e.g. collaborative grants). Once a critical mass of expertise and a target-specific proteomic platform are established, a broad range of ecophysiological questions can be fruitfully addressed for essentially any extremophile of interest.
We acknowledge the reviewers of this manuscript for their useful critical appraisal of the original submission. Research performed in RC's laboratory is supported by the Australian Research Council.