Proteomics of extremophiles

Authors


E-mail r.cavicchioli@unsw.edu.au; Tel. (+61) 2 9385 3516; Fax (+61) 2 9385 2742.

Nanyang Environmental & Water Research Institute (NEWRI), The Advanced Environmental Biotechnology Centre, Nanyang Technological University, Singapore 637551.

Summary

Functional genomic approaches, such as proteomics, greatly enhance the value of genome sequences by providing a global level assessment of which genes are expressed, when genes are expressed and at what cellular levels gene products are synthesized. With over 1000 complete genome sequences of different microorganisms available, and DNA sequencing for environmental samples (metagenomes) producing vast amounts of gene sequence data, there is a real opportunity and a clear need to generate associated functional genomic data to learn about the source microorganisms. In contrast to the technological advances that have led to the accelerated rate and ease at which DNA sequence data can be generated, mass spectrometry based proteomics remains a technically sophisticated and exacting science. In recognition of the need to make proteomics more accessible to a growing number of environmental microbiologists so that the ‘functional genomics gap’ may be bridged, this review strives to demystify proteomic technologies and describe ways in which they have been applied, and more importantly, can be applied to study the physiology and ecology of extremophiles.

Introduction

Proteomics is the term used to describe studies that examine the global protein complement of an organism, tissue or community. The proteome consists of all proteins expressed by an organism under a given set of conditions and therefore represents the functional complement of the genome (Wasinger et al., 1995; Wilkins et al., 1996; Goodlett and Yi, 2002). The proteome represents the product of global gene expression (transcription plus translation), protein stability, and protein processing and turnover. Proteomics therefore extends beyond genomic analyses, which only describe the theoretical capability of an organism or community, by providing a direct, global measure of which proteins are synthesized, when they are synthesized and what their cellular (or extracellular) abundance is (Pandey and Mann, 2000). Proteomic measurements are achieved through the use of highly sensitive instrumentation and application of powerful computational methods to produce high throughput qualitative (i.e. protein identity) and quantitative (i.e. protein abundance) data (Aebersold and Mann, 2003). As a result, cellular pathways and processes functioning in a cell or community can be determined, providing the means to probe the fundamental biology and adaptive responses of microorganisms (Washburn and Yates, 2000). By mimicking environmental growth conditions, proteomics can be used to infer cellular responses that are ecologically meaningful, thereby providing the basis for developing views and hypotheses about environmental microbial processes. Extending beyond laboratory-based manipulation of axenic cultures, environmental proteomics (metaproteomics) provides the means to assess proteins that are synthesized by microbial communities. Linking metaproteomic data to environmental genomics (metagenomics) and geo-physico-chemical data provides a powerful means of inferring the roles of indigenous microbial communities in whole ecosystem function (Lauro et al., 2010; Schneider and Riedel, 2010).

Extremophiles are organisms that live under conditions that stretch well beyond those considered optimal for human life. From this anthropocentric view, extreme growth conditions for microorganisms are those that extend beyond a narrow band of relatively mild growth conditions. Extremophiles are therefore microorganisms that proliferate under relative extremes of temperature, pH, salinity, pressure, radiation or other growth limiting abiotic constraints (Cavicchioli and Thomas, 2004). Ecological, physiological and evolutionary studies of extremophiles are particularly valuable for providing insight into the physical limits at which life can occur, the origins of life and the potential for the discovery of extraterrestrial life (Cavicchioli, 2002). Extremophiles also offer significant potential as a natural resource for the discovery of novel enzymes and other bioactive compounds that have commercial value for use in a broad range of industries.

This review focuses on describing the value of proteomics to the field of extremophiles. The approach we have taken is to introduce important concepts and techniques in proteomics, where available describe proteomic applications to extremophiles, and provide sufficient depth of knowledge, particularly about quantitative proteomics, to enable readers to consider how proteomic approaches may be applied to their extremophiles of interest. To make proteomics more accessible to those less familiar, we have included coverage of basic principles of mass spectrometry (MS), sample fractionation, quantification and statistical analysis, as applied to proteomics of individual organisms and metaproteomics of environmental samples.

Application of proteomic techniques to extremophiles

Extremophile research is often orientated towards determining what cellular responses are involved in growth and survival. As a functional genomics approach, proteomics is well suited to providing qualitative and quantitative data for addressing these types of questions. However, biologists performing proteomics experiments for the first time are unlikely to have expertise with all the technologies (particularly MS) required for the work. In addition to understanding the principles of MS, protein preparation, fractionation and processing (e.g. labelling) procedures tailored to suit the capabilities of individual mass spectrometers (MSrs) also need to be adopted. Determining an appropriate experimental design can be facilitated by consulting staff responsible for the operation and implementation of MSrs – developing a healthy rapport can go a long way towards generating cost-effective, successful proteomics outcomes. Evaluating a generalized workflow can also help in determining an appropriate experimental design (Fig. 1).

Figure 1.

Generalized workflow and approaches for common proteomic analyses utilizing mass spectrometry. In the general proteomic workflow, proteins are first extracted from a sample. Those proteins can then either be directly denatured and digested to peptides or can undergo a variety of separations based on their physical and/or chemical properties. Following protein separation, samples can either be analysed via intact protein MS or, most commonly, be denatured and digested to peptides. Following digestion, peptides can be directly analysed by LC-ESI-MS/MS or by MALDI-MS/MS (not extensively covered in this review), or undergo additional secondary separations based on their physical and/or chemical properties before LC-MS/MS. If sufficient results are not achieved, additional protein or peptide separation strategies can be used, or more powerful instrumentation utilized. Red text and orange stars indicate points in the generalized workflow where labels can be introduced. Blue text indicates procedures. Italicized text indicates questions the researcher should consider. Dotted lines indicate potential procedural steps. *Membrane/insoluble samples require special processing (see Sub-cellular fractionation and Protein separation). 2DE can be utilized in differential proteomics in addition to its use as a means of protein separation.

The main area in which proteomics of extremophiles differs from other cell types is the preparation of the proteins for analysis. As a result of the abiotic extremes confronting extremophiles, their proteins have evolved a range of specific properties (Siddiqui and Thomas, 2008), necessitating protein preparation procedures to be assessed on a case-by-case basis. For example, proteins from hyperthermophiles are extremely thermostable and are highly resistant to chemical denaturants, organic solvents and proteolytic digestion, leading to problems with denaturing and digesting proteins. For Pyrococcus furiosus this has been overcome by using a detergent-based microwave assisted acid hydrolysis step coupled with an overnight trypsin digest (Lee et al., 2009). Acidophiles and alkaliphiles tend to have proteins with altered surface charges that will affect their migration in isoelectric focusing (IEF) (Antranikian, 2008; Shirai et al., 2008), potentially restricting the use of gel-based methods for protein separation (see Protein separation below). On the other hand, extracting and preparing the proteins from halophiles for analysis can be as simple as washing cell pellets in pure water followed by centrifugation to remove cell debris (Whitehead et al., 2006; Leuko et al., 2009). With these caveats in mind, it should be noted that most contemporary proteomic approaches are amenable to extremophile research. In the following sections of this review, we cover many of the proteomic approaches available and highlight where these approaches have been used with extremophiles.

MS and proteomics

The main analytical tool for determining the identity, abundance and modification state of proteins is MS (Pandey and Mann, 2000; Rabilloud, 2002; Wittke et al., 2004). Most commonly in proteomics research, the accurate mass of a group of peptides derived from sequence specific proteolysis of a protein is determined through MS. By comparing protein sequence databases to the mass spectrum of a peptide, a protein can be correctly identified (Aebersold and Goodlett, 2001). The availability of matching genome sequence data greatly facilitates protein identification, although cross-species matching can also yield identifications (Ostrowski et al., 2004). A large number of microbial genomes (> 1000) have been sequenced, including diverse types of extremophiles. The public availability of many of these (e.g. Integrated Microbial Genomes http://img.jgi.doe.gov/cgi-bin/pub/main.cgi; NCBI http://www.ncbi.nlm.nih.gov/sites/genome; JCVI CMR http://cmr.jcvi.org/tigr-scripts/CMR/CmrHomePage.cgi), and the low cost and rapid pace at which genomes can now be sequenced, has greatly reduced the impediment that the lack of genome sequence data once caused.

Many different MS approaches have been developed to achieve peptide and protein identification. Most commonly, a ‘bottom-up’ approach is used where proteins are digested with a sequence specific protease, such as trypsin, which cleaves on the C-terminal side of arginine (R) and lysine (K) residues. The less common ‘top-down’ approach involves introducing intact proteins into the MSr. While the top-down approach is useful in advanced downstream applications such as determining post-translational modifications and protein–protein interactions, it remains relatively low throughput and is currently less advanced than bottom-up proteomics. For these reasons the top-down approach is not covered in this review, and the reader is directed to a number of excellent papers describing top-down proteomics (Wang et al., 2005; Siuti and Kelleher, 2007; Han et al., 2008; Heck, 2008; Vellaichamy et al., 2010).
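To make the bottom-up digestion step concrete, the following is a minimal in silico sketch of tryptic cleavage. The function name, the `missed_cleavages` parameter and the rule that cleavage is suppressed before proline (a common software convention, not something stated in this review) are illustrative assumptions.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from cleavage C-terminal to K or R.

    Assumption: no cleavage when the next residue is proline
    (a widely used convention, not taken from the review).
    """
    seq = sequence.upper()
    if not seq:
        return []
    # Positions immediately after each cleavable residue.
    cut_sites = [0]
    for i, residue in enumerate(seq[:-1]):
        if residue in "KR" and seq[i + 1] != "P":
            cut_sites.append(i + 1)
    cut_sites.append(len(seq))

    peptides = []
    for start in range(len(cut_sites) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(cut_sites):
                break
            peptides.append(seq[cut_sites[start]:cut_sites[end]])
    return peptides


if __name__ == "__main__":
    print(tryptic_digest("MKAGRPLKDE"))
    print(tryptic_digest("MKAGRPLKDE", missed_cleavages=1))
```

Allowing missed cleavages may be relevant for proteins that resist proteolysis, such as those from hyperthermophiles, although suitable settings would need to be determined empirically.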

The requirement to fractionate samples

High throughput proteomic analysis is often hampered by the complexity of samples. A culture lysate from a microorganism may contain several thousand proteins. When proteolytically digested the sample will contain tens of thousands of unique peptides, and when post-translational modifications are taken into account an even higher level of complexity may be present (Stasyk and Huber, 2004; Righetti et al., 2005). In addition to sample complexity, cellular proteins exhibit a wide range of concentrations. Highly abundant proteins (e.g. central metabolic proteins) may be present in concentrations up to seven orders of magnitude higher than low abundance proteins (Corthals et al., 2000; Issaq et al., 2005). Despite low abundance, some proteins (e.g. transcriptional regulatory proteins) may play crucial roles and are therefore important to detect. As high throughput MS relies on automated intensity based ion selection, the presence of peptides from high abundance (high MS intensity) proteins masks the selection of the low abundance (low intensity) peptides, and hence their detection. In order to circumvent problems with detection and increase overall proteome coverage, fractionation strategies can be used to reduce the complexity of samples before MS analysis.

When using separation procedures it is important to ensure that samples are treated in ways that render them generally free of detergents, salts and organic solvents. For example, the activity of the most commonly used proteolytic enzyme, trypsin, can be perturbed by high levels of such contaminants. Sample clean-up steps include acetone precipitation, protein dialysis, and cartridge-based or specialized pipette tip-based chromatography of peptide mixtures.

Sub-cellular fractionation

A useful form of fractionation is to separate microbial biomass into three sub-cellular fractions: the secreted fraction (secretome), which consists of the culture supernatant and contains extracellular proteins; the insoluble fraction, which consists of membranes, membrane proteins and large complexes; and the soluble fraction, which primarily contains cytoplasmic proteins. The combined analysis of these three fractions can greatly enhance proteome coverage and provide information about compartmentalization (e.g. membrane associated proteins) or proteins present in large complexes (e.g. ribosomes) (Williams et al., 2010a,b).

The secretome of extremophiles holds particular interest for identifying enzymes of potential biotechnological value, such as proteases and cellulases (Miyazaki, 2005; Pulido et al., 2007; Blumer-Schuette et al., 2008). Secretome analyses from extremophiles have tended to use gel-based separations (Saunders et al., 2006; Ellen et al., 2009; Muddiman et al., 2010). However, there are no technical barriers to utilizing high throughput methods, such as LC/LC-MS/MS (Williams et al., 2010a) (see Protein separation below).

Researchers performing secretome studies of extremophilic archaea need to take into account that signal peptide sequences are generally poorly experimentally characterized in this domain of life (Pohlschroder et al., 2005; Saunders et al., 2006). Therefore, measures need to be included to ensure that fractionation is robust (e.g. inclusion of assays for cytoplasmic markers), so that secretome data can be interpreted with confidence (Saunders et al., 2006; Williams et al., 2010a). Secretome fractions can also contain surface proteins that are released by endogenous peptidases during growth or through cell death (Ellen et al., 2009). This issue may be minimized by harvesting cultures at an early stage of growth, although harvesting early decreases protein yield.

The soluble and insoluble fractions can be separated effectively using a carbonate-extraction and ultracentrifugation method (Molloy et al., 2000; Blonder et al., 2002; Blonder et al., 2004a; Burg et al., 2010). In contrast to analysing the proteome of the soluble fraction, the insoluble fraction requires specific proteomic methodology to enable integral membrane proteins, membrane associated proteins and proteins that form the hydrophobic core of protein complexes to be resolved effectively. The proteins need to be solubilized while remaining relatively chemically inert so they do not interfere with downstream quantitative techniques (e.g. labelling) or with the MSr. Several methods have been applied successfully to extremophiles (Goo et al., 2003; Zhu et al., 2004; Barry et al., 2006; Bisle et al., 2006; Graham et al., 2006a; Burg et al., 2010; Williams et al., 2010b; Zaparty et al., 2010). Methods include the use of denaturing chaotropic reagents (Graham et al., 2007), phase partitioning with detergents (Everberg et al., 2008) and other detergent-based methods (Zheng et al., 2007), all of which tend to be coupled with 2D electrophoresis (2DE). The 2DE methods have limitations with regards to removing MS interfering agents (e.g. detergents), and problems with gel electrophoresis of hydrophobic proteins including post-solubilization precipitation during IEF, and running as streaks on gels (Washburn and Yates, 2000; Klein et al., 2005) (also see 2DE-based quantitative proteomics). Other methods include affinity labelling with lipid soluble probes (Tang et al., 2007), and a tube digestion method that allows for removal of interfering detergents (Lu and Zhu, 2005). A method utilizing high pH and cyanogen bromide has been developed to counteract these problems (Washburn et al., 2001), and modifications of the procedure have also been published (Wu et al., 2003; Blackler et al., 2008) and applied to extremophiles (Chong et al., 2005).

For insoluble proteins, a method using methanol as a solvent has proven particularly useful (Blonder et al., 2002), with improvements developed that include thermal denaturation and in-solvent digestion (Blonder et al., 2004b,c). This method of solubilization, which has been shown to be applicable to a number of extremophiles (Blonder et al., 2004a; Burg et al., 2010; Williams et al., 2010a,b), is superior to detergent-based solubilization (Mitra et al., 2007; Zhang et al., 2007), and is compatible with liquid chromatography (LC) and MS (e.g. LC/LC–MS/MS) (Zhang et al., 2007) and downstream label-based quantification (Williams et al., 2010a,b).

Studying the insoluble proteome of extremophiles will help to shed light on membrane and surface proteins that are directly involved in interaction with the environment. For example, the halophilic alkalithermophile Natranaerobius thermophilus is thought to adapt to multiple environmental extremes through the use of a large repertoire of Na+(K+)/H+ antiporters (Mesbah et al., 2009). Proteomic studies of insoluble fractions have revealed proteins involved in signal transduction, transport of various cellular substances and secreted proteins, and cell surface proteins in the psychrophilic methanogen, Methanococcoides burtonii (Burg et al., 2010; Williams et al., 2010a,b).

Protein separation

To purify, reduce the complexity or visualize individual proteins, cellular extracts or sub-cellular fractions can be separated using gel-based or gel-free methods before MS. Gel-based protein separation, when combined with reverse phase (RP) LC separation of digested peptides (see below) and MS, is often referred to as GeLC-MS or GeLC-MS/MS. Separation by molecular weight only (1 dimensional gel electrophoresis; 1DE) can help to remove compounds that inhibit peptide digestion or MS. Two dimensional gel electrophoresis (2DE) separates proteins based on their isoelectric point (pI) through IEF, and molecular weight through polyacrylamide gel electrophoresis (PAGE). The technique can further help to reduce sample complexity, and allows snapshot (one sample) and differential (comparison of more than one sample) protein profiles to be generated (Gorg et al., 2000; Gygi et al., 2000).
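Because IEF separates proteins by pI, it can be useful to estimate where the proteins of an extremophile are likely to focus before choosing a pH gradient. The sketch below estimates a theoretical pI by finding the pH at which the net charge computed from the Henderson-Hasselbalch relationship is zero. The pKa values are approximate, illustrative assumptions (published pKa sets differ), and the function names are hypothetical; the estimate ignores folding and post-translational modifications.

```python
# Approximate side-chain and terminal pKa values (assumed for illustration).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of an unmodified, unfolded polypeptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))   # protonated N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))  # deprotonated C-terminus
    for residue in sequence.upper():
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Bisection search for the pH where the net charge crosses zero."""
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid   # still positively charged: pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0


if __name__ == "__main__":
    for seq in ("DDDEEE", "KKKRRR"):
        print(seq, round(isoelectric_point(seq), 2))
```

Applied to a predicted proteome, a histogram of such estimates could indicate whether a standard IPG strip range is likely to capture most proteins, for example for the acidic proteins typical of haloarchaea.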

Following electrophoresis, individual proteins or groups of proteins are visualized by staining techniques, of which several are available for use. The staining techniques have different ranges and limits of detection and the choice of specific techniques should be considered based on requirements. Coomassie staining is inexpensive, easy to use, reliable and compatible with MS. However, it has low sensitivity (detection range ∼100 ng to > 1 µg). Silver staining can detect proteins at lower abundance within a narrow range of concentrations (∼5 to ∼80 ng), but uses complicated methodology and can produce problems with MS. More recently several fluorescent stains have been developed, which are relatively simple to implement and have ranges of detection limits from ∼1 ng or less to > 1 µg. However, these stains use relatively expensive reagents and require specialized equipment for visualization and spot excision (Lopez et al., 2000; Patton, 2000; Patton et al., 2002; Wang et al., 2007).

Following staining and visualization, bands containing multiple (1DE) or individual (2DE) proteins are excised and digested in-gel, as described by Shevchenko and colleagues (2007), and peptides are subjected to RP-LC separation before MS analysis (Wilm et al., 1996; Lasonder et al., 2002; Everley et al., 2004; Ting et al., 2010). When developing proteomic methods it is wise to test whether gel-based separation helps (e.g. removes contaminants thereby improving the quality of mass spectra) or hinders (e.g. reduces protein yield without improving the quality of mass spectra) protein identification and coverage.

Complex mixtures such as whole cell lysates can be challenging to analyse using 2DE, because of: (i) the large dynamic range in protein abundance, (ii) the presence of proteins that fall beyond the pI and mass thresholds of the technique (see Application of proteomic techniques to extremophiles), (iii) the presence of isoforms or co-migrating proteins that can complicate the interpretation of gel spots, (iv) the presence of membrane proteins or hydrophobic proteins that do not resolve effectively (see below) (Peng and Gygi, 2001; Wittke et al., 2004; Yan and Chen, 2005) or (v) difficulties in replicating protein profiles leading to a reduction in the number of spots that can be scored and used to gain accurate measures of protein quantity (Gorg et al., 2000; Wittke et al., 2004).

The separation of proteins using a gel-free approach is most commonly achieved by LC (Patterson, 1994; Aebersold and Goodlett, 2001; Griffin et al., 2001; Goodlett and Yi, 2002; Rabilloud, 2002; Yates and Snyder, 2004). Soluble proteins can be effectively separated by off-line methods before protein digestion using ion exchange columns (e.g. strong cation exchange, SCX) as have been used for the psychrophile M. burtonii (Goodchild et al., 2004a), or size exclusion columns, which have been used for the haloalkaliphile Natronomonas pharaonis (Konstantinidis et al., 2007). While chromatographic separation techniques are now largely automated and can achieve relatively rapid separations of complex mixtures, increasing the number of samples to process increases the time and cost of MS analysis. Analogous to gel-based separation, the processing of samples from extremophiles with proteins with skewed charge or pI (e.g. intracellular proteins from haloarchaea) will require careful consideration of the separation matrix and eluent gradients for each protein mixture of interest.

The application of off-line pre-digestion methods to insoluble proteins is more challenging than for soluble proteins. However, it is important to develop appropriate protocols to separate out high abundance soluble proteins that can contaminate insoluble fractions (Klein et al., 2005). Useful differential solubility fractionation procedures have been developed for the extremophile M. burtonii (Burg et al., 2010), and for Escherichia coli (Ramos et al., 2008).

Other approaches for reducing overall sample complexity at the protein level for extremophiles include subjecting cell extracts from the hyperthermophilic archaeon Sulfurisphaera sp. to extended high temperature incubations plus incubation with denaturing agents to enrich for proteins with high conformational stability (Prosinecki et al., 2006), and using filtration and reduced duration of protein digestion to enhance coverage of low molecular weight proteins for Halobacterium salinarum (Klein et al., 2007). The best coverage will inevitably be obtained by using multiple types of fractionation and separation procedures, as have been used for Haloferax volcanii (Kirkland et al., 2008) and M. burtonii (Burg et al., 2010; Williams et al., 2010a).

Peptide separation

A fast, sensitive, reproducible and automated means of obtaining protein identities from peptide mixtures obtained following protein digestion is to perform LC-MS/MS (Link et al., 1999). Peptides are typically separated according to hydrophobicity on a RP-C18 column and eluted online into a MSr. Complex samples often require a second dimension of LC (e.g. SCX) before RP-LC (LC/LC-MS/MS) (Ye et al., 2000; Shen et al., 2004). SCX has been used in this manner to separate peptides before MS analysis for the thermoacidophile Sulfolobus solfataricus (Zaparty et al., 2010). Another secondary means of peptide separation is in-gel IEF, as has been used for S. solfataricus to separate peptides before LC-MS/MS analysis. The methodology is identical to protein separation by IEF in the first dimension of 2DE, although for peptide separation the IPG strip is cut into a number of fractions and the peptides are eluted using procedures similar to in-gel digestion (Chong et al., 2007). Whether this second dimension of separation is required is largely dependent on the characteristics of the sample and the MSr used.

Tandem mass spectrometry (MS/MS) and protein identification

After LC, separated peptides can be introduced into a MSr by electrospray ionization (ESI). Matrix-assisted laser desorption ionization (MALDI) can also be used but is not directly coupled with LC separation (Hillenkamp and Peter-Katalinic, 2007). For ESI, peptides are eluted online (e.g. from RP-LC separation) into the MSr where the mass-to-charge ratios (m/z) of the peptide ions are first measured by MS to determine the molecular mass of each peptide. The most intense peptide ions are automatically selected by instrument software for impact with a neutral gas (e.g. argon) in a collision cell, producing collision induced dissociation spectra. The m/z values of the peptide fragments are then measured, and in combination with the intact peptide mass, this tandem mass spectrum (Aebersold and Mann, 2003; Yates, 2004; Keller and Hettich, 2009) is used for determining peptide identity (Biemann, 1986; Hunt and Yates, 1986). Many MSrs are suitable for MS and MS/MS proteomics, each with their own particular strengths. The most commonly used mass analysers include quadrupoles, ion traps (IT), time of flight (TOF), TOF-TOF, quadrupole-TOF (QTOF), quadrupole-IT hybrids, IT-orbitrap hybrids and IT-Fourier transform ion-cyclotron resonance MSr (FTMS) hybrids (Aebersold and Mann, 2003; Yates, 2004).
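The relationship between a measured m/z and peptide mass, and the intensity-based selection of precursor ions, can be sketched as follows. This is a simplified illustration assuming positive-mode ions that carry only protons; the function names and the `top_n` parameter are hypothetical, and real instrument software applies additional criteria (e.g. dynamic exclusion) not modelled here.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_from_mass(neutral_mass, charge):
    """m/z of an ion formed by adding `charge` protons to a neutral peptide."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


def mass_from_mz(mz, charge):
    """Neutral peptide mass recovered from an observed m/z and charge state."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return charge * (mz - PROTON_MASS)


def select_precursors(survey_scan, top_n=3):
    """Pick the `top_n` most intense (m/z, intensity) peaks for fragmentation."""
    ranked = sorted(survey_scan, key=lambda peak: peak[1], reverse=True)
    return ranked[:top_n]


if __name__ == "__main__":
    print(mz_from_mass(1000.0, 2))
    scan = [(450.2, 1.0e4), (501.0, 8.0e6), (622.3, 3.0e5), (710.4, 2.0e6)]
    print(select_precursors(scan, top_n=2))
```

In the toy survey scan, the weakest peak is never selected, which illustrates why peptides from low abundance proteins can go undetected in unfractionated samples.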

Protein identification through database matching and de novo sequencing

Peptide and protein identification is dependent on the ability to match experimentally acquired intact mass and fragmentation patterns of peptides to theoretical mass and fragmentation patterns generated from protein sequences inferred from DNA sequence data. MS protein identification software, such as Mascot (Perkins et al., 1999), Sequest (Eng et al., 1994) and X! Tandem (Craig and Beavis, 2004), uses algorithms to interpret MS/MS spectral data and produces matches to databases with a score that reflects the statistical significance of the match (Aebersold and Mann, 2003).
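The idea of comparing theoretical and observed fragmentation patterns can be illustrated with a minimal sketch that generates singly charged b- and y-ion m/z values for a candidate peptide and counts how many fall within a tolerance of observed peaks. This shared-peak count is a deliberately simple stand-in and is not the scoring scheme of Mascot, Sequest or X! Tandem; the function names and default tolerance are assumptions, and only unmodified residues are handled.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.010565
PROTON_MASS = 1.007276


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[residue] for residue in peptide.upper()]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON_MASS)                # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER_MASS + PROTON_MASS)  # C-terminal fragment
    return b_ions, y_ions


def shared_peak_count(theoretical, observed, tolerance=0.5):
    """Count theoretical m/z values with an observed peak within `tolerance` Da."""
    return sum(
        1 for t in theoretical
        if any(abs(t - o) <= tolerance for o in observed)
    )


if __name__ == "__main__":
    b, y = fragment_ions("GASK")
    print([round(x, 3) for x in b], [round(x, 3) for x in y])
```

Ranking all candidate peptides from a sequence database by such a count gives a rough sense of how database matching narrows down peptide identity; production search engines add probabilistic scoring and false discovery estimation.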

The spectral matching software cannot be used for extremophiles that do not have available genome sequence data. In contrast, de novo sequencing software, such as DeNoS (Savitski et al., 2005) and PEAKS (Ma et al., 2003), extracts amino acid sequence information from mass spectra that can be matched to any database using MS-BLAST or an integrated protein database searching algorithm (Ma et al., 2003; VerBerkmoes et al., 2009). A method using 2DE in-gel guanidination followed by sulfonation of peptide N-termini has been developed for improving de novo identifications and has been applied to the heavy metal tolerant bacterium Shewanella oneidensis (Sergeant et al., 2009).

Importance of quantitative proteomics

The term ‘differential expression’ does not fully describe the events that take place in a cell that lead to observed cellular levels of gene products (protein and mRNA). A more accurate terminology to describe both proteomic and transcriptomic (global mRNA levels) data is to refer to protein or mRNA abundance respectively. Protein abundance is the product of regulatory events taking place at the level of gene expression, transcript stability (Atwater et al., 1990), translation (Lange and Hengge-Aronis, 1994; Hengst and Reed, 1996), protein stability and turnover (Bachmair et al., 1986; Belle et al., 2006), and post-translational modification (Mann and Jensen, 2003). Furthermore, mRNA levels may not positively correlate with protein abundance (Gygi et al., 1999a; Chen et al., 2002), and differences between transcriptomic and proteomic levels for individual genes or clusters of genes (e.g. operons) can provide insight into transcriptional vs. translational regulation (Baliga et al., 2002; Trauger et al., 2008; Zivanovic et al., 2009; Campanaro et al., 2010; Sun et al., 2010).

Measuring relative protein abundance

Proteomic methods using absolute or relative measures of changes in protein abundance are very useful for inferring physiological responses to environmental stimuli (Krijgsveld and Heck, 2004; MacCoss and Matthews, 2005). Thus quantitative proteomics represents an approach enabling an important question in extremophile research, ‘how does the organism survive under such extreme conditions’, to be explored.

Absolute quantification involves introducing a known quantity of a reference protein or peptide standard into each sample. By comparing the height of the MS peptide peaks against the standard, peptide concentrations can be determined. A spiked standard can also be used to normalize peak intensities across experiments (Gerber et al., 2003; Kirkpatrick et al., 2005). Absolute quantification is most commonly used in clinical research (examples include Brun et al., 2009; Williamson et al., 2011).
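The arithmetic behind quantification against a spiked standard can be sketched as below. The sketch assumes that the analyte peptide and the standard ionize with equal efficiency (as would be expected for an isotope-labelled version of the same peptide) and that the detector response is linear; the function names and the femtomole unit are illustrative assumptions.

```python
def absolute_amount(analyte_intensity, standard_intensity, standard_amount_fmol):
    """Estimate analyte amount from its peak intensity relative to a spiked standard.

    Assumes equal response factors for analyte and standard.
    """
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    return analyte_intensity / standard_intensity * standard_amount_fmol


def normalize_to_standard(peak_intensities, standard_intensity):
    """Express each peak in a run relative to the spiked standard in that run."""
    if standard_intensity <= 0:
        raise ValueError("standard intensity must be positive")
    return {name: value / standard_intensity
            for name, value in peak_intensities.items()}


if __name__ == "__main__":
    # A peak twice as intense as a 50 fmol standard.
    print(absolute_amount(2.0e6, 1.0e6, 50.0))
    print(normalize_to_standard({"pepA": 4.0e5, "pepB": 1.0e6}, 2.0e6))
```

Normalizing every run to the same spiked standard places peak intensities from different experiments on a common scale before they are compared.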

Relative quantification involves the comparison of two or more conditions (e.g. high vs. low temperature, treatment vs. control) and relative abundance can be determined from visual protein intensity, MS peak intensity or spectral counting (Fig. 2). Quantitative 2DE proteomics is assessed visually by densitometry of corresponding protein spot intensities. The relative peak intensity (peak area or peak height) of peptides from a survey scan [i.e. the first dimension of MS (MS1)], or of tag fragments in the second dimension of MS, can be correlated with the abundance of a peptide in a sample. Peak intensity quantification is used in label-free and stable isotope labelling quantification (see Stable isotope labelling quantitative proteomics below). Quantification by spectral counting uses the number of MS/MS spectra confidently assigned to the peptides of a protein as a proxy for protein abundance and is only used in label-free quantification.

Figure 2.

Quantitative proteomics approaches. XIC, extracted ion chromatogram. 2D gel from Ostrowski et al. (2004).

2DE-based quantitative proteomics

2DE coupled to protein identification using MS (2DE-MS and 2DE-MS/MS) involves scanning protein spots to obtain intensity profiles, excising spots, enzymatically digesting proteins and applying MS methods such as MALDI-TOF or ESI-LC-MS/MS for protein identification. 2DE approaches have yielded quantitative proteomic data for diverse bacteria (Budde et al., 2006) including thermophiles (Graham et al., 2006b) and psychrophiles (Kawamoto et al., 2007), and archaea including thermophiles and hyperthermophiles (Lim et al., 2003; Barry et al., 2006; Kwon et al., 2009) and a psychrophile (Goodchild et al., 2004b; Cavicchioli et al., 2006).

The main advantage of the 2DE approach is the ability to separate and visualize up to thousands of proteins on a single polyacrylamide gel (Fey et al., 2000; Gorg et al., 2000). Labelling proteins with fluorescent dyes before electrophoresis (difference in-gel electrophoresis) can also enhance the use of 2DE for quantitative proteomics (Unlu et al., 1997; Marouga et al., 2005). However, complex mixtures of whole cell lysates from extremophiles that include large numbers of very acidic or basic proteins, and proteins with very high or low molecular weight, can be difficult to analyse by 2DE (Peng and Gygi, 2001; Wittke et al., 2004; Yan and Chen, 2005). Other potential complications include a large non-linear dynamic range of spot intensities, isoforms and co-migrating proteins that confound interpretations of protein profiles, hydrophobic proteins that precipitate, and difficulties in obtaining reproducibility between biological and technical replicates (Gorg et al., 2000; Wittke et al., 2004). Moreover, as a result of advances in LC-utilizing methods (see below), the popularity of 2DE as an approach for quantitative proteomics has declined.

Label-free quantitative proteomics

In contrast to labelling approaches (see Stable isotope labelling quantitative proteomics below), effective label-free approaches are a recent development in quantitative proteomics. Label-free methods are implicitly gel-free and use either spectral features or spectral counting. Similar levels of biomass are required for label-free and stable isotope labelling (Hendrickson et al., 2006). Label-free approaches tend to be less expensive than labelling methods (as labelling reagents are not required), provide for assessment of a greater dynamic range of peptide abundance, and are not limited by the total number of test conditions being compared (Zybailov et al., 2005; Bantscheff et al., 2007; Nesvizhskii et al., 2007).

The spectral features approach quantifies proteins between two or more independent experiments by aligning peptide peaks from MS1 scans and comparing relative peak intensities of the same peptides across experiments (Bondarenko et al., 2002; Chelius and Bondarenko, 2002; Wang et al., 2003; Nesvizhskii et al., 2007). In order to ensure detailed, accurate and reproducible data, a high resolution MSr is required (e.g. a modern QTOF), and the same data acquisition protocol should be used for each sample (i.e. same columns, gradient and preferably temperature controlled) (America et al., 2006).

A challenge in spectral feature quantification lies in matching each detected peptide peak from one dataset to the same peptide peak in another dataset. The exact m/z and retention time of the peak may differ, usually because of technical drift in LC or MS instrumentation; these factors complicate the comparison of datasets, particularly if retention time drift is non-linear (America et al., 2006). Spiked chromatographic standards facilitate more accurate alignments and comparisons, and a relatively large number of replicates with minimized sample manipulation are required to provide an accurate assessment of abundance changes (Old et al., 2005; Bantscheff et al., 2007).
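The matching problem described above can be sketched in a few lines. The example below is a minimal illustration only, not the method of any cited package: it assumes that the same spiked standards have been located in both runs, maps retention times onto a reference run by piecewise-linear interpolation between neighbouring standards (which tolerates non-linear drift), and then matches features within tolerances. The function names, the 10 ppm m/z tolerance and the 0.5 min retention time tolerance are hypothetical choices made for illustration.

```python
from bisect import bisect_right


def align_retention_time(rt, standards_run, standards_ref):
    """Map a retention time from one run onto the reference run's time axis.

    standards_run / standards_ref hold the retention times (minutes) of the
    same spiked standards in the run being aligned and in the reference run,
    both in strictly ascending order.
    """
    if len(standards_run) != len(standards_ref) or len(standards_run) < 2:
        raise ValueError("need the same standards (at least two) in both runs")
    # Find the segment containing rt; clamp to the end segments so that
    # times outside the standards are linearly extrapolated.
    i = bisect_right(standards_run, rt) - 1
    i = max(0, min(i, len(standards_run) - 2))
    x0, x1 = standards_run[i], standards_run[i + 1]
    y0, y1 = standards_ref[i], standards_ref[i + 1]
    return y0 + (rt - x0) * (y1 - y0) / (x1 - x0)


def same_peptide(peak_a, peak_b, ppm_tol=10.0, rt_tol=0.5):
    """Decide whether two (m/z, aligned RT) features are the same peptide.

    The tolerances are illustrative assumptions, not recommended values.
    """
    mz_a, rt_a = peak_a
    mz_b, rt_b = peak_b
    ppm = abs(mz_a - mz_b) / mz_a * 1e6
    return ppm <= ppm_tol and abs(rt_a - rt_b) <= rt_tol
```

In practice the tolerances would be set from the mass accuracy of the instrument and the residual retention time scatter of the standards after alignment.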

Spectral counting is a more sensitive method for quantification than the spectral features approach (Old et al., 2005). Quantification by spectral counting assumes that the number of spectra confidently matched to peptides correlates with protein abundance. Distinct lists of proteins obtained from independent MSr runs of biological replicates are compared to identify differentially abundant proteins (Blondeau et al., 2004; Old et al., 2005; Zybailov et al., 2005; Hendrickson et al., 2006; Paoletti et al., 2006; Xia et al., 2007). This method has been used to analyse the thermal response and effects of co-cultures on the thermophiles Thermotoga maritima and Caldicellulosiruptor saccharolyticus (Andrews et al., 2010).

Spectral counting requires large numbers of replicates to be effective (Old et al., 2005), and is affected by systematic variation (e.g. inconsistent loading and variations in peptide behaviour in chromatography). As peptide characteristics such as size, charge and hydrophobicity influence peptide ionization efficiency, and affect the success of downstream protein identification and quantification, spectral counting should be considered a semi-quantitative method (Bantscheff et al., 2007). Advances being made in spectral counting approaches (Rappsilber et al., 2002; Craig et al., 2005; Ishihama et al., 2005; Tang et al., 2006) are likely to benefit quantitative proteomic studies of extremophiles in the future. Moreover, in comparison with isotope labelling strategies, which require peptides to be present in sufficient abundance in all test conditions (see Stable isotope labelling quantitative proteomics below), spectral counting does not have this requirement and can therefore be used to evaluate proteins that are in low abundance in, or unique to, one test condition, as was shown for M. burtonii (Burg et al., 2010).
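One common way to make spectral counts comparable between runs is to normalize them by protein length and by the total signal in the run. The sketch below is a hedged illustration of that idea, in the spirit of a normalized spectral abundance factor; it is not described in the text above and is not the procedure of any specific study cited here. The optional pseudocount is an assumption that keeps ratios finite for proteins detected in only one condition.

```python
import math


def nsaf(spectral_counts, lengths, pseudocount=0.0):
    """Length-normalized spectral counts that sum to 1 within one run.

    spectral_counts: protein -> number of confidently matched spectra
    lengths:         protein -> protein length (amino acids)
    pseudocount:     added to every count (hypothetical choice) so that
                     proteins absent from a run still get a finite value
    """
    saf = {p: (spectral_counts.get(p, 0) + pseudocount) / lengths[p]
           for p in lengths}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}


def log2_fold_change(nsaf_a, nsaf_b, protein):
    """log2 ratio of normalized abundance between two conditions."""
    return math.log2(nsaf_a[protein] / nsaf_b[protein])
```

Because the values are ratios of counts, they remain semi-quantitative, and fold changes for proteins with few spectra should be interpreted cautiously.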

Stable isotope labelling quantitative proteomics

The fundamental concept of stable isotope labelling is the creation of heavy and light isotopic protein or chemical label derivatives, resulting in detectable ion shifts in the mass spectra. Substitution of 13C for 12C, 15N for 14N, or 2H for 1H in proteins or labels can generate characteristic mass shifts without affecting the chemical or structural properties of proteins or peptides (Zhong et al., 2004; Yan and Chen, 2005). Corresponding heavy and light peptides or label fragments from the same MS analysis are quantified, with their ratio representing the relative abundance of the corresponding peptide (Nesvizhskii et al., 2007). Stable isotope labelling for quantitative proteomics was introduced in 1999 by three separate groups (Gygi et al., 1999b; Oda et al., 1999; Pasa-Tolic et al., 1999). The isotopes can be introduced into proteins or peptides either in vitro or in vivo (Fig. 3). Labelling in vitro requires the label to be covalently attached to proteins after biomass has been harvested, whereas labelling in vivo involves the incorporation of the label into proteins during cell growth. The common in vitro approaches include isotope-coded affinity tags (ICAT), isotope-coded protein labelling (ICPL), isobaric labelling [iTRAQ (isobaric tag for relative and absolute quantification) and TMT (tandem mass tags)] and 16O/18O digestion labelling.

Figure 3.

In vivo and in vitro stable isotope labelling approaches. SILAC, stable isotope labelling with amino acids in cell culture.

In vitro labelling: ICAT and ICPL

In ICAT labelling, proteins are labelled at cysteine residues with a thiol-reactive group before trypsin digestion (Gygi et al., 1999b). After digestion, the peptide mixture is separated by SCX chromatography, followed by biotin-affinity purification in an avidin column to selectively isolate ICAT-labelled peptides. Cleavage of the linker releases the peptide containing the thiol-reactive group and isotopic tag from the biotin-affinity tag to reduce the overall size of the label. The peptides are further separated by RP chromatography before analysis by MS/MS (Gygi et al., 2002). Incorporation of the heavy (13C) vs. light (12C) ICAT tag confers a consistent 8 Da mass shift between the light vs. heavy peptide derivatives. Correlation of the intensity of heavy vs. light isotopic peptide peaks from the MS1 scan enables relative quantification (Gygi et al., 2002).
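The pairing of light and heavy peaks in an MS1 scan can be illustrated with a short sketch. It assumes a peak list in which charge states have already been assigned, uses the nominal 8 Da shift quoted above as a parameter (the exact shift depends on the reagent), and reports the heavy/light intensity ratio for each pair. The function name, the m/z tolerance and the cap on the number of labelled cysteines per peptide are hypothetical.

```python
def pair_icat_peaks(peaks, mass_shift=8.0, tol=0.02, max_labels=1):
    """Pair light and heavy ICAT peptide peaks from one MS1 scan.

    peaks:      list of (m/z, charge, intensity), intensities assumed > 0
    mass_shift: mass difference per label in Da (nominal value from the text)
    tol:        allowed deviation in m/z (illustrative assumption)
    max_labels: maximum number of labelled cysteines considered per peptide

    Returns a list of (light m/z, heavy m/z, charge, heavy/light ratio).
    """
    pairs = []
    ordered = sorted(peaks)  # ascending m/z, so heavy partners come later
    for i, (mz_light, z_light, int_light) in enumerate(ordered):
        for mz_heavy, z_heavy, int_heavy in ordered[i + 1:]:
            if z_heavy != z_light:
                continue
            for n in range(1, max_labels + 1):
                # A mass shift of n * mass_shift appears as a spacing of
                # n * mass_shift / z on the m/z axis.
                expected = n * mass_shift / z_light
                if abs((mz_heavy - mz_light) - expected) <= tol:
                    pairs.append((mz_light, mz_heavy, z_light,
                                  int_heavy / int_light))
    return pairs
```

A real pipeline would integrate the extracted ion chromatogram of each partner rather than use a single scan, but the pairing logic is the same.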

Advantages of ICAT labelling include tolerance to salts, detergents and agents of alkylation, and the ability to reduce sample complexity by isolating peptides that contain cysteine residues on the avidin column (Eng et al., 1994; Gygi et al., 1999b; Link et al., 1999). The primary disadvantage of ICAT is that only proteins that contain a cysteine will be detected. ICAT was used in some of the first quantitative proteomics studies performed on archaea (Cavicchioli et al., 2006), including Halobacterium sp. (Baliga et al., 2002) and M. burtonii (Goodchild et al., 2005). Studies of extremophiles may particularly benefit from the OxICAT system, a modification of the ICAT protocol (Leichert et al., 2008; Chiappetta et al., 2010; Kumsta et al., 2010), which enables quantification of oxidative thiol modifications arising from oxidative stress or regulatory thiol modification of proteins (Kiley and Storz, 2004; Kumsta et al., 2010).

In the ICPL labelling system, tags differing in mass through the inclusion of 13C or deuterium atoms are attached to amino groups of peptides (Schmidt et al., 2005; Leroy et al., 2010). Advantages of ICPL over ICAT include the ability to label up to four samples, labelling all proteins (not just those containing cysteine), and labelling post-digestion, which increases the number of labelled peptides per protein (Leroy et al., 2010; Paradela et al., 2010). ICPL labelling can also be used with MSrs of lower sensitivity than is required for iTRAQ labelling (see In vitro labelling: isobaric tags below) and can be used with MALDI MSrs. Studies using ICPL include the effects of simulated microgravity on the heavy metal resistant bacterium Cupriavidus metallidurans (Leroy et al., 2010), and the effects of environmental perturbations and nutrients on the haloarchaeon H. salinarum (Tebbe et al., 2009).

In vitro labelling: isobaric tags

Isobaric labelling involves attaching a module consisting of a reporter tag, balance group and peptide reactive group to N-termini of peptides and lysine amine side-chains after enzymatic digestion (Ross et al., 2004). The tags are isotopically balanced, rendering differentially labelled peptides indistinguishable during MS1. During collision induced dissociation the tag is fragmented, enabling the relative intensity of the reporter ions to be assessed in a single spectrum and used for quantification (Ross et al., 2004; Zieske, 2006; Chen et al., 2007; Choe et al., 2007; Fenselau, 2007; Dayon et al., 2008). iTRAQ systems are available for application to four or eight samples, and the TMT system can label up to six different samples (Dayon et al., 2008). However, the total number of proteins identified and quantified tends to decrease with the number of samples compared (Pichler et al., 2010). Improved validation and coverage can be obtained by utilizing biological replicates across multiple injections (technical replicates) and normalizing across samples with a pooled reference (Gan et al., 2007; Song et al., 2008; Ow et al., 2009). Accuracy can also be improved by incorporating known ratios of spiked proteins with analyses and using strong statistical error models for evaluating and minimizing variation and compression in peak intensities (Karp et al., 2010). Isobaric labelling appears better than ICAT for analysing membrane proteins, and has been applied successfully to several extremophiles (Bisle et al., 2006; Williams et al., 2010a,b).
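Reporter ion quantification against a pooled reference can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any package listed in Table 1: each MS/MS spectrum assigned to a protein is represented as a mapping from reporter channel to intensity, the channel labels follow the 114 to 117 m/z reporters of 4-plex iTRAQ, the pooled reference is assumed to be in channel 114, and the protein-level ratio is taken as the median of the per-spectrum ratios.

```python
from statistics import median


def protein_ratios(peptide_spectra, reference="114"):
    """Summarize reporter ion ratios for one protein.

    peptide_spectra: list of {channel label: reporter ion intensity}, one
                     dict per MS/MS spectrum assigned to the protein
    reference:       channel holding the pooled reference (assumed here)

    Returns {channel: median ratio to the reference channel}. Spectra with
    no reference signal are skipped; median() raises StatisticsError if no
    usable spectra remain.
    """
    channels = [c for c in peptide_spectra[0] if c != reference]
    ratios = {}
    for channel in channels:
        per_spectrum = [s[channel] / s[reference]
                        for s in peptide_spectra if s[reference] > 0]
        ratios[channel] = median(per_spectrum)
    return ratios
```

The median is used because it is less sensitive than the mean to the occasional spectrum with distorted reporter intensities; an error model of the kind cited above would go further and weight spectra by intensity.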

In vitro labelling: 16O/18O digestion labelling

Labelling of peptides with 16O (H₂¹⁶O) or 18O (H₂¹⁸O) is achieved by enzymatic incorporation of 18O into the C-terminus of peptides either during or after protein digestion (Yao et al., 2003; Miyagi and Rao, 2007). The enzymatic linkage of oxygen isotopes can reduce artefacts that occur during chemical labelling procedures, but the rate of incorporation can vary between peptides thereby complicating calculations of differential abundance (Johnson and Muddiman, 2004; Julka and Regnier, 2004; Miyagi and Rao, 2007; Ramos-Fernandez et al., 2007). This approach has been reported for investigating the surface proteome of the metal reducing bacterium Shewanella oneidensis MR-1 (Zhang et al., 2010).

In vivo labelling

The in vivo approach (also termed metabolic labelling) involves the incorporation of heavy (13C, 15N or 2H) or light (12C, 14N or 1H) isotopes from nutrients (e.g. 14N/15N derivatives of ammonia) into proteins during cell growth; the use of labelled amino acids is termed SILAC (Ong et al., 2002). Metabolic labelling enables complete incorporation of the isotope into proteins, does not require additional enzymatic or chemical labelling, and enables labelled biomass to be combined during the early stages of the experiment (just before or just after protein extraction) thereby maximizing consistency of treatments and minimizing the chance of introducing inter-sample variation caused by experimental error (Washburn et al., 2002a; Krijgsveld et al., 2003). A disadvantage is the need to grow cultures in the presence of labelled nutrients for at least seven rounds of cell division in order to achieve a theoretical incorporation of 99% (Ting et al., 2009), which can be assessed by measurement of atom percent excess (MacCoss et al., 2005). In addition, only pairwise comparisons can be performed, compared with up to eight simultaneous comparisons with iTRAQ. A disadvantage specific to SILAC is that some isotopically labelled amino acids can be interconverted (e.g. arginine to proline) in vivo, causing difficulties for interpreting abundance data (Engen et al., 2002; Ong et al., 2003a,b). This can be minimized by using amino acids that are at the end of a metabolic pathway (e.g. lysine, leucine, methionine, tyrosine and valine) (Engen et al., 2002). The use of a suitable labelling compound is crucial for the efficacy of metabolic labelling. For example, labelled ammonia cannot be used if the microorganism fixes nitrogen (thereby fixing 14N from the atmosphere), and amino acids cannot be used if the microorganism is prototrophic and does not use exogenous amino acids for growth.
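The figure of seven cell divisions for about 99% incorporation follows from simple dilution arithmetic. The sketch below is a worked illustration under simplifying assumptions that are not stated in the text: all newly synthesized protein is fully labelled, pre-existing unlabelled protein is halved at each division, and protein turnover is ignored. The function names are hypothetical.

```python
import math


def labelled_fraction(divisions):
    """Theoretical labelled fraction after a number of cell divisions.

    Assumes the unlabelled pool is diluted two-fold per division and that
    all new protein is fully labelled (no turnover, no label recycling).
    """
    return 1.0 - 0.5 ** divisions


def divisions_needed(target):
    """Smallest whole number of divisions reaching the target fraction."""
    return math.ceil(math.log2(1.0 / (1.0 - target)))
```

Under these assumptions, six divisions give about 98.4% incorporation and seven give about 99.2%, which is why seven is the first whole number of divisions to exceed 99%. Measured atom percent excess remains the appropriate check, because real cultures do not follow the idealized model exactly.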

Metabolic labelling has been successfully used for quantitative proteomic studies of bacteria (Conrads et al., 2001; Wang et al., 2002; Zhong et al., 2004) including a marine oligotroph (Ting et al., 2009; Ting et al., 2010), archaea (Andreev et al., 2006; Xia et al., 2006) and yeast (Washburn et al., 2002b; MacCoss et al., 2003; Zybailov et al., 2005; Zybailov et al., 2006; Usaite et al., 2008). Once metabolic labelling procedures are established for individual extremophiles (see for example, Ting et al., 2009), the benefits can flow on to studies examining physiological responses (see for example, Ting et al., 2010).

Post-experimental processing of data

In addition to processing samples competently using appropriate MSrs, the MS output needs to be computationally processed properly to obtain confident identification, quantification and statistical validation of proteins and their abundance levels (Venable et al., 2004). Concomitant with the technical advances occurring with MSrs, more requirements are being placed on the quality of experimental design, data acquisition, assignation of protein identities, database deposition and data processing (particularly normalization and statistical testing). Useful guides describing current acceptable practices can be found in the publishing requirements for Molecular and Cellular Proteomics (Carr et al., 2004; Celis, 2004) and Proteomics (Wilkins et al., 2006), and in publications by the Human Proteome Organisation (HUPO) (Kaiser, 2002; Taylor et al., 2006; 2007; 2008; Gibson et al., 2008).

A growing number of open source software packages for quantifying protein abundance are available, in addition to commercial software (examples are included in Table 1). Many software packages are specific to instrument types, file types or labelling strategy. This is due, in part, to the different combinations of labelling and separation techniques used, MSr types and manufacturers, and MSr configurations used, resulting in a large number of spectra types being generated (Cannataro, 2008). This is further complicated by the range of raw data file formats produced by MSrs, the diversity of approaches used by analytical programmes to perform protein identification and quantification and the variety of output formats generated during data processing and analysis (Keller et al., 2005).

Table 1.  Examples of software packages available for quantitative proteomics.
| Software | MS instrumentation | Input data format | Labelling strategy | Quantification strategy | Reference |
| --- | --- | --- | --- | --- | --- |
| MSQuant | QSTAR, QTOF, LTQ-FT | Mascot html result file | SILAC (14N/15N possible with extra software) | Area under centroided MS1 XIC of isotopologues | Schulze and Mann (2004) |
| Trans-Proteomic Pipeline (TPP) | Most | Most files can be converted to mzXML using utilities | Stable isotope labelling | See below for examples | Keller and Shteynberg (2011) |
| ASAPRatio | QTOF, LTQ, LTQ-FT, LTQ-Orbitrap | mzXML, part of TPP | Stable isotopic labelling | Area under MS1 XIC of isotopologues | Li et al. (2003) |
| MFPaQ | QSTAR | Web-based application, Mascot result file and Analyst .wiff required | ICAT, SILAC | Peak intensity of MS1 isotopologues | Bouyssie et al. (2007) |
| MaxQuant | High resolution mass spectrometers (LTQ-Orbitrap) | Xcalibur .raw file | Stable isotopic labelling especially SILAC, label free | Intensity of MS1 3D-peaks (intensity vs. m/z vs. time) scans of isotopologues | Cox and Mann (2008); Graumann et al. (2008) |
| RelEx | Thermo Fisher (.raw) | DTASelect output files (.out) | 14N/15N metabolic labelling | Background-subtracted intensity ratio of MS1 XIC of isotopologues | MacCoss et al. (2003) |
| Census | Low and high-resolution mass spectrometers | DTASelect output, mzXML, pepXML | Label free, stable isotopic labelling, iTRAQ | MS1 based on RelEx, or MS2 from single-reaction monitoring (SRM) scans or iTRAQ | Park et al. (2008) |
| QN | High resolution hybrid ion trap mass spectrometer; LTQ-FT | SEQUEST data output | 14N/15N metabolic labelling | Area under MS1 XIC of isotopologues | Andreev et al. (2006) |
| MSight | Applied Biosystems, Bruker, Waters, ABI-SCIEX, ThermoFinnigan mass spectrometers | mzXML | Label free | 2D representation of LC-MS data, similar to 2DE | Palagi et al. (2005) |
| APEX | Most | protXML, part of TPP | Label free | Absolute quantification by MS2 spectral counting, corrected by machine learning-based prior expectation of observing each peptide based on physicochemical properties | Lu et al. (2006) |
| MapQuant | LTQ-FT | .raw file | Label free | MS1 scans are converted into 2D maps (RT vs. m/z). Quantification of 2D map features. | Leptos et al. (2006) |
| PEPPeR | LTQ-Orbitrap, LTQ-FT | High resolution MS scans | Label free | Spectral feature quantification by landmark and peak matching of MS1, uses MapQuant as part of the PEPPeR pipeline | Jaffe et al. (2006) |
| Masic | Most | .raw, mzXML, mzData, CDF and MGF file pairs | Label free | Accurate mass and time tag quantification from MS1 | Monroe et al. (2008) |
| msInspect/AMT | Most | mzXML and pepXML, part of TPP | Label free | Accurate mass and time tag quantification from MS1 | May et al. (2007) |
| LTQ-iQuant | LTQ, LTQ Orbitrap, using pulsed Q dissociation | mzXML | Isobaric tagging | Reporter ion intensity-dependent peptide weighting | Onsongo et al. (2010) |
| ZoomQuant | Ion trap mass spectrometers | DTASelect .out files | 18O labelling | Isotopomer peak areas from MS1 | Halligan et al. (2005) |
| ProQuant | QSTAR, QTrap; Applied Biosystems mass spectrometers | Raw data files | iTRAQ (4-plex) | Isotopomer peak area from MS2 of reporter ions (114-117 m/z for 4-plex) | http://www.absciex.com/ |
| ProteinPilot | QSTAR, QTrap; Applied Biosystems mass spectrometers | Raw data files | SILAC, isobaric tagging | Isotopomer peak areas from MS1 for SILAC. Isotopomer peak area from MS2 of reporter ions | http://www.absciex.com/ |
| Multi-Q | Applied Biosystems, Bruker, Waters and ThermoFinnigan mass spectrometers | mzXML | iTRAQ | Peak intensities of reporter ions in MS2 | Lin et al. (2006) |
| Libra | Most, QTOF, QSTAR, TOF-TOF | mzXML, part of TPP | iTRAQ | Peak intensities of reporter ions in MS2 | Keller et al. (2005) |
| i-Tracker | Most | .dta or .mgf files | iTRAQ | Area under reporter ion peaks in MS2 | Shadforth et al. (2005) |
| Mascot | Most | .dta or .mgf files | Most | Area under MS1 XIC of isotopologues, area under reporter ion peaks in MS2 for isobaric tags, spectral counting. | http://www.matrixscience.com/ |

Many factors in the experimental design can affect the types of analyses that can be performed and the extent to which biological interpretations can be made (Boguski and McIntosh, 2003; Rocke, 2004; Hu et al., 2005a; Hunt et al., 2005; Karp et al., 2005; 2007; Biron et al., 2006; Chich et al., 2007; Stead et al., 2008). To account for experimental variation it is important to replicate experiments. Biological and technical replicates need to be processed appropriately so that the inherent low variability between technical replicates (compared with biological replicates) does not compromise the assessment of the statistical significance of the data (Molloy et al., 2003; Karp et al., 2005; Chich et al., 2007). Furthermore, as proteomics experiments inherently include variations in sample preparation, LC separation, ionization efficiency and MSr performance, approaches have been developed for dealing with random and systematic variations by normalizing data and using appropriate significance tests (Pavelka et al., 2008; Ting et al., 2009).
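The point about treating technical and biological replicates differently can be made concrete with a short sketch. It is one reasonable way to proceed rather than the procedure of any cited study: technical replicates are averaged within each biological replicate, so that only the biological replicates count as independent observations, and a Welch t statistic is then computed on the collapsed values. The function names are hypothetical, and the inputs are assumed to be log2-transformed, normalized abundances for a single protein.

```python
import math
from statistics import mean, variance


def collapse_technical(replicates):
    """Average technical replicates within each biological replicate.

    replicates: one inner list of technical measurements (log2 abundance)
                per biological replicate
    """
    return [mean(technical) for technical in replicates]


def welch_t(group_a, group_b):
    """Welch t statistic and degrees of freedom for two groups.

    Each group needs at least two biological replicates with non-zero
    variance. The statistic is compared against a t distribution with the
    returned (Welch-Satterthwaite) degrees of freedom to obtain a p value.
    """
    n_a, n_b = len(group_a), len(group_b)
    var_a, var_b = variance(group_a), variance(group_b)
    se2 = var_a / n_a + var_b / n_b
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((var_a / n_a) ** 2 / (n_a - 1)
                     + (var_b / n_b) ** 2 / (n_b - 1))
    return t, df
```

Pooling all injections as if they were independent would inflate the apparent sample size and understate the variance, which is the problem the paragraph above warns against. When many proteins are tested, a multiple testing correction is also needed.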

Metaproteomics

In the environmental microbiology community it is well recognized that because a large proportion of environmental microorganisms have not been cultivated, their functional roles and interactions within microbial communities remain largely unknown (VerBerkmoes et al., 2009). As inferences about microbial physiology and ecology are biased by studies from a very limited number of cultivated species (Wilmes and Bond, 2006), approaches such as metaproteomics have been developed to enable studies to be performed on microbial communities from the environment (Schloss and Handelsman, 2003). Sampling microbial populations from the environment allows for a larger range of individual microorganisms and their interactions to be studied.

To maximize the quality, quantity and representation of microorganisms in environmental samples (Maron et al., 2007), methods for protein extraction using direct lysis (Maron et al., 2006; Lefevre et al., 2007; Schmeisser et al., 2007) and indirect lysis (Ehler and Cloete, 1999; Maron et al., 2003; 2004; Mary et al., 2010) have been developed. Direct lysis methods have been applied to planktonic (Kan et al., 2005; Ng et al., 2010; Sowell et al., 2009; Lauro et al., 2010), biofilm (Ram et al., 2005), sediment (Ogunseitan, 1993), soil (Singleton et al., 2003; Benndorf et al., 2007; Bastida et al., 2009) and activated sludge (Wilmes and Bond, 2004) communities. The approach enhances protein recovery (compared with indirect methods) but its success is limited by the effects of interfering compounds (e.g. humic acids) in the samples (Bastida et al., 2009). Indirect methods involve isolating or enriching microorganisms from samples before protein extraction, and are particularly useful for samples with high levels of interfering compounds, such as soil and sediment samples (Ehler and Cloete, 1999; Maron et al., 2003; 2004). An approach using microwave cell fixation and high-speed flow cytometric sorting of cells has been developed (Mary et al., 2010) and may prove broadly useful for metaproteomic studies of planktonic communities.

An in vivo labelling strategy that should find good application to studies of extremophiles is protein-based stable isotope probing (Protein-SIP) (Jehmlich et al., 2008), where labelled substrates (e.g. 13C benzene) are introduced to microbial communities with incorporation in proteins measured using MS. The approach enables the identification of active microorganisms and metabolic pathways, such as those involved in remediation of environmental contaminants (Jehmlich et al., 2009). For an individual microorganism, Protein-SIP may also prove useful for measuring nutrient and pathway utilization under comparative growth conditions (Jehmlich et al., 2010).

Metaproteomic coverage of individual microorganisms within a sample will be limited by the complexity of the community and relative abundance of individual proteins (VerBerkmoes et al., 2009). Coverage can be improved using high performance hybrid MS instruments such as FT-ICR-MS (Syka et al., 2004) and LTQ-FT-Orbitrap (Hu et al., 2005b), which enable high-mass accuracies to be achieved over large dynamic ranges of protein abundance down to the fmol level (Banfield et al., 2005).

Quantitative methods for evaluating metaproteomic data are in their infancy (Keller and Hettich, 2009), and semi-quantitative approaches (e.g. spectral counting) may prove useful (Morris et al., 2010). A capacity to perform quantification is likely to require MSr developments leading to increased dynamic range and speed of analysis (Keller and Hettich, 2009). Microchip techniques, which use polymer microfluidic devices coupled to LC-MS/MS to simplify sample preparation, improve protein identification and reduce the requirement for sample quantity, analysis time and cost (Ethier et al., 2006; Horvatovich et al., 2007; Srbek et al., 2007), may prove useful for metaproteomic studies in the future.

In metaproteomics, the ability to identify proteins is dependent upon the quality and extent of metagenome coverage (Keller and Hettich, 2009). The quality of genome annotation and protein predictions is important, with failure to predict open reading frames and identification of incorrect gene start sites precluding peptide identification (VerBerkmoes et al., 2009). When adequate metagenomic data are not available, de novo sequencing can be used. This has been successfully used for the temporal evaluation of proteins in a bacterial community following exposure to cadmium (Lacerda et al., 2007), and metaproteomic studies of microbiota in the human gastrointestinal tract (Klaassens et al., 2007; VerBerkmoes et al., 2009).

Similar to the way that genome sequence data of an individual microorganism enhance the capacity to identify proteins using MS, having matched metagenome and metaproteome data is not essential (Lacerda and Reardon, 2009), but greatly increases the chance to make confident protein identifications (VerBerkmoes et al., 2009). This is particularly the case if sample complexity is low and metagenome coverage is high, as was demonstrated for an acid mine drainage site (Ram et al., 2005) and the oxycline zone of an Antarctic meromictic lake, Ace Lake (Ng et al., 2010). In the acid mine drainage study, ∼48% of proteins from an individual member of the biofilm were identified (Ram et al., 2005). In the Antarctic study, ∼31% metaproteomic coverage was obtained, and in combination with nearly complete reconstruction of the genome for a psychrophilic green sulfur bacterium, insight was gained into the physiological traits that enable the bacterium to gain dominance under cold, nutrient- and oxygen-limited conditions with extremely varied annual light cycles (Ng et al., 2010).

With greatly improved (and cost-effective) capacity to perform DNA sequencing, and improvements in MS, increasingly larger scale metaproteogenomic studies are being performed. These include studies of membrane proteins from South Atlantic marine communities where comparative metaproteomics of different sampling sites, along a natural gradient in nutrient concentrations, revealed shifts in nutrient utilization and energy transduction capacity of the microbial populations (Morris et al., 2010). In a separate study of Ace Lake, an integrative analysis of metaproteogenomic data from samples taken at six depths of the lake enabled the identity and functional capacity of microorganisms to be determined (Lauro et al., 2010). From these analyses, the interactions between populations that fulfil nutrient cycling and that shape the evolution of the microbial communities were able to be determined (Lauro et al., 2010). Based on the metaproteogenomic analyses, this study also developed a mathematical model to describe the effects of environmental perturbations on the stability of the ecosystem. As extreme environments often have low microbial community complexity (Ram et al., 2005; Ng et al., 2010), which greatly enhances the ability to obtain useful protein coverage for representative species, there is excellent scope for applying metaproteomics to a broad range of extreme environments.

Perspective

An enormous scope exists for learning about environmental microorganisms through the application of proteomics. Microbial proteomic studies range from single time and condition proteome snapshots of the protein complement of an individual microorganism, to multiplex quantitative analyses of cellular responses of a single microorganism, through to metaproteogenomic assessments of microbial communities representing diverse ecosystems, and the probing of active remediatory community members using Protein-SIP. Extremophiles colonize a large range of natural and artificially created habitats, having gained the specific mechanisms of adaptation required for colonization through billions of years of evolution. While modern microbiologists have learned that performing biological research means coming to terms with massive streams of DNA sequence data, fewer have developed an ability to generate and comprehend proteomic data. Proteomics is a multidisciplinary science requiring effective understanding and technical expertise in the chosen biology, and in MS and bioinformatics. Proteomic efforts for diverse extremophiles will become successful when biologists interface with mass spectrometrists who have a genuine interest in seeing MS data translated into real biological meaning. This path will likely necessitate a broad range of troubleshooting (e.g. sample preparation), personnel training (e.g. students trained in biology learning how to ‘fly’ a MSr), and fund raising (e.g. collaborative grants). Once a critical mass of expertise and a target-specific proteomic platform are established, a broad range of ecophysiological questions can be fruitfully addressed for essentially any extremophile of interest.

Acknowledgements

We acknowledge the reviewers of this manuscript for their useful critical appraisal of the original submission. Research performed in RC's laboratory is supported by the Australian Research Council.
