Proteomics in evolutionary ecology: linking the genotype with the phenotype

Authors

  • ANGEL P. DIZ,

    1. Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidade de Vigo, 36310 Vigo, Spain
  • MÓNICA MARTÍNEZ-FERNÁNDEZ,

    1. Unidad de Oncología Molecular, División Biomedicina Epitelial, Dpto. Investigación Básica, Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas (CIEMAT), 28040 Madrid, Spain
  • EMILIO ROLÁN-ALVAREZ

    1. Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidade de Vigo, 36310 Vigo, Spain

Emilio Rolán-Álvarez, Fax: 34 986812556; E-mail: rolan@uvigo.es

Abstract

The study of the proteome (proteomics), which includes the dynamics of protein expression, regulation, interactions and its function, has played a less prominent role in evolutionary and ecological investigations in comparison with the study of the genome and transcriptome. There are, however, a number of arguments suggesting that this situation should change. First, the proteome is closer to the phenotype than the genome or the transcriptome, and as such may be more directly responsive to natural selection, and thus closely linked to adaptation. Second, there is evidence of a low correlation between protein and transcript expression levels across genes in many different organisms. Finally, there have been some recent important technological improvements in proteomics methods that make them feasible, practical and useful to address a wide range of evolutionary questions even in nonmodel organisms. The different proteomic methods, their limitations and problems when interpreting empirical data are described and discussed. In addition, the proteomic literature pertaining to evolutionary ecology is reviewed with examples, and potential applications of proteomics in a variety of evolutionary contexts are outlined. New proteomic research trends such as the study of posttranslational modifications and protein–protein interactions, as well as the combined use of the different -omics approaches, are discussed in relation to the development of a more functional and integrated perspective, needed for achieving a more comprehensive knowledge of evolutionary change.

The publication of the human and other genomes during the last decade has demonstrated some key methodological advances that have contributed to the development of many disciplines such as taxonomy, phylogenetics, ecology and evolution (Vasemägi & Primmer 2005) and has accelerated biomedical research (Lander 2011). However, despite enormous investment in human genomics, a large gap still exists in predicting the phenotype from the genotype (Makowsky et al. 2011) and this has stalled the application of genomics to medical diagnosis and therapy (Chakravarti 2011; Marshall 2011). The need for understanding all steps in the decoding of the DNA message explains the interest in gene expression and its regulation, and therefore parallel progress to the genomic revolution has included transcriptome analysis (Ranz & Machado 2006; Stapley et al. 2010; Ozsolak & Milos 2011), probably because both genomics and transcriptomics use similar experimental approaches and methods (Ekblom & Galindo 2011). However, gene expression does not end at the transcriptomic level, and a detailed understanding of the formation of the phenotype would require the study of all the steps during gene regulation and their final products at the proteome level (Feder & Walser 2005; Karr 2008; see Fig. 1). Advances in the quantification and identification of proteins have allowed a technical revolution that expands the research possibilities and designs in biology (Thiellement et al. 1999; Anderson et al. 2000; Gouw et al. 2010). Such advances, however, have rarely been exploited in the field of evolutionary ecology, although they have been used in other areas of biology for more than a decade (reviewed in Thiellement et al. 1999; Anderson et al. 2000; Madi et al. 2003; Patterson 2003). A recent study has emphasized the need for incorporating proteomic methods in ecology or population studies (Biron et al. 2006), but reviews on proteomics (or proteomic applications) have focused on more specific fields, such as those pertaining to species-specific barriers to fertilization (Karr 2008; Findlay & Swanson 2010), phylogeny (Navas & Albar 2004; Karr 2008), phenotypic plasticity (Aubin-Horth & Renn 2009), aquatic toxicology (Sanchez et al. 2011), marine organisms (López 2007; Forné et al. 2010; Tomanek 2011), snake venom (Calvete et al. 2007b; Fox & Serrano 2008; Georgieva et al. 2008), plants (Thiellement et al. 1999; Zivy & de Vienne 2000; Thiellement et al. 2002; Edwards & Batley 2004), microorganisms (Texier et al. 2005; Thomas et al. 2007; Jerez 2008) and human populations (Nedelkov 2008). In this review, we will focus specifically on the proteomic applications related to evolutionary and molecular ecology. The first section will emphasize why it is necessary to incorporate studies at the proteome level. The second section will summarize different techniques and methods available to undertake proteomic studies and will point out some practical considerations when designing and interpreting proteomic experiments and data. The third section will examine the existing proteomic literature on evolutionary ecology and potential applications of proteomics in this field.

Figure 1.

 Possible regulation steps causing discrepancies between the expression of an mRNA and its corresponding protein. First, the genome has to be ready to be transcribed. After transcription, several processes can change the levels of such an mRNA, or even its translation. Once the protein is synthesized, its expression levels can vary by interacting with other proteins or undergoing different posttranslational modifications (PTM), and its average half-life can also be modified according to the specific needs of the moment (see further details in Feder & Walser 2005). In addition to these mechanisms, developmental processes and environment may cause proteomic differentiation across tissues and organs, which must be considered to understand the full phenotype complexity. Therefore, depending on the level at which we are working, different relationships with the phenotype will be found. For simplicity, the potential influence of nongenetic (although inherited) mechanisms on the phenotype (epigenetic, paternal effects, ecological inheritance, cultural inheritance, etc.) has not been fully considered in this figure (reviewed in Danchin et al. 2011). (*) Notice that posttranscriptional modifications could be produced at both nuclear and cytoplasmic levels.

Why is a proteomic perspective needed?

The term ‘PROTEOME’ was launched and defined in the 1990s to establish an equivalent concept to ‘genome’ and is defined as the full complement of PROTEins expressed by the genOME of one organism, tissue or cell at a specific time (Wilkins et al. 1995). The proteome is much more dynamic than the genome, i.e. changing between different somatic cells or tissues of an organism, sometimes in response to different stimuli, stress or diseases. Likewise, the term ‘proteome’ led to a new research field called proteomics, defined as the high-throughput study of the proteome, including protein quantification, protein–protein interactions, posttranslational modifications (PTMs) and protein function. Proteomic techniques and methods allow researchers to visualize, compare and identify thousands of proteins in a single experiment, either following an exploratory or a hypothesis-driven strategy. The proteomic field is complementary to genomics in so much as it provides additional information and direct evidence about gene expression and its regulation in different tissues and/or cell types at different times (Fig. 1). For example, proteomics yields quantitative information about expressed proteins and PTMs that cannot be deduced from genomics (Jensen 2006; Schrimpf et al. 2009).

There are some strong arguments to include the proteome level when studying gene expression and regulation in an evolutionary context. One of these arguments is based on growing evidence that studies on the transcriptome or the genome may be insufficient for understanding the phenotype because there is a lack of convergence between the proteome and the transcriptome (Gygi et al. 1999; Mack et al. 2006; Waters et al. 2006; de Godoy et al. 2008; Irmler et al. 2008; Maier et al. 2009; Rossignol et al. 2009; Schrimpf et al. 2009; Zhong et al. 2009; but see Schwanhäusser et al. 2011). On average, the Pearson correlation coefficient between the levels of some proteins and those of their respective mRNAs was reported to be around 0.5 in different studies (reviewed in Feder & Walser 2005; Maier et al. 2009). Moreover, a number of studies comparing proteomic and transcriptomic data observed that a considerable percentage of the proteins detected had no corresponding transcripts (Maziarz et al. 2005; Mack et al. 2006). Similarly, the analysis of the Caenorhabditis elegans proteome by mass spectrometry identified several new proteins predicted from the genome that were not observed in the transcriptome (Schrimpf et al. 2009; but see Irmler et al. 2008; Napolitano et al. 2008). Transcription/translation discrepancies were initially attributed to variability associated with technical problems in the measurement of transcripts and proteins (see Maier et al. 2009), although recent studies show that this source of variation is not enough to explain the large discrepancies reported, which are now usually interpreted as biologically relevant (Waters et al. 2006; Wang 2008; Maier et al. 2009). These observed discrepancies could be explained if the regulation of gene expression and protein abundance in cells occurs separately (Waters et al. 2006; Irmler et al. 2008; Service 2008) or if nongenetic inheritance processes influence the proteome and transcriptome differentially (Danchin et al. 2011). In addition, they might serve as a potential source of information for understanding regulatory steps within biochemical networks (Wang 2008 and references therein), thus contributing to a more integrated view of biological processes. For example, Li et al. (2011) demonstrated that some stretches of mRNA sequences obtained from human B cells did not match their corresponding exonic sequences at the DNA level. Far from being any kind of artefact (but see comment from Hayden 2011), the results were validated by detecting the expression of the discordant RNA sequences through analysis of their corresponding peptides and analysis of other tissues. These findings demonstrate that we are a long way from understanding highly complex biological processes, but by incorporating different -omic methods in future work, we will gain a better understanding of regulatory steps within biochemical pathways and many biological processes in general. Another interesting reason to include a proteomic analysis in some experiments is the possibility of finding new genes in sequenced genomes that were not recognized by automatic annotation methods. For instance, Findlay et al. (2009) used a shotgun proteomic approach and discovered 19 new genes of evolutionary interest in the genome of a model organism (Drosophila) that had not yet been annotated.
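As an illustration of the kind of comparison behind these reported coefficients, the following minimal Python sketch computes a Pearson correlation between mRNA and protein abundances across genes. The function name `pearson_r` and the abundance values are hypothetical and chosen only for illustration; they are not taken from any of the cited studies.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations (the 1/(n-1) factors cancel).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical log2 abundances for six genes (illustrative values only).
mrna_log2 = [5.1, 7.3, 6.0, 8.2, 4.4, 6.9]
protein_log2 = [10.2, 11.0, 12.5, 12.1, 9.8, 10.4]

r = pearson_r(mrna_log2, protein_log2)
print(f"r = {r:.2f}, r^2 = {r * r:.2f}")
```

Note that a coefficient of 0.5 corresponds to r² = 0.25, i.e. in a simple linear model only about a quarter of the variance in protein levels would be accounted for by mRNA levels.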

A second argument for including the proteome in gene expression studies is that evolutionary factors, for example natural selection, are expected to act differentially at the distinct -omic levels. In recent years, many authors have claimed that changes at the level of the phenotype, irrespective of the genotype, may play an underappreciated role in driving ecological diversification and speciation (i.e. owing to phenotypic plasticity; reviewed in Pfennig et al. 2010). Therefore, even though from an evolutionary viewpoint we expect that the ultimate changes will occur at the DNA level (Futuyma 2009; but see Danchin et al. 2011), it has been emphasized that natural selection acts directly at the phenotypic level during organism adaptation (Doolittle & Sapienza 1980; Khaitovich et al. 2004). This gives an obvious reason to focus on the proteome, because the phenotype of an organism is more directly related to protein quantity and function than to variability at the DNA sequence level (Feder & Walser 2005; Karr 2008; Futuyma 2009). This argument can be better explained with an example: a single base-pair substitution in a Pseudomonas fluorescens strain has enabled it to use a new ecological niche, unlike the ancestral strain (Knight et al. 2006). However, when the authors compared levels of protein expression between both strains, they observed differences in expression patterns affecting 52 proteins. It is obvious that the ultimate cause was the specific mutational change of the DNA, but such information is of little evolutionary value if we cannot understand the biochemical and physiological mechanisms that produce the adaptive change in the organism. Thus, quantitative studies at the proteome level together with the study of protein PTMs are required (in addition to other regulatory levels) to capture the full biological meaning of any biodiversity pattern (see Thiellement et al. 1999; Feder & Walser 2005; Jensen 2006). 
In addition, it has been also experimentally demonstrated that several biological processes impact differently at distinct omic levels, e.g. a higher correlation of protein expression levels than mRNA abundances was observed between genes across species (Schrimpf et al. 2009; Laurent et al. 2010). In fact, protein correlation between taxa was higher than correlation between protein and mRNA expression within species (Schrimpf et al. 2009), whilst in other studies, proteomic differences were detected between subpopulations or groups without corresponding transcriptomic effects (Maziarz et al. 2005; Mack et al. 2006; Zhong et al. 2009; Rees et al. 2011). All these patterns suggest, in our opinion, that the proteome level is closer to the phenotypic level, where natural selection is acting, affecting more directly protein variation through balancing, divergent or directional selection.

Finally, as we briefly introduce in the next section, proteomic technology has evolved considerably during the last decade, and a battery of methods suited to different objectives is now available. These range from inexpensive 2-DE methods, which separate and quantitatively analyse global protein expression patterns to address simple questions, to the most sophisticated and expensive gel-free mass spectrometry (MS) methods, which separate, quantify and identify the proteome of different samples in a highly automated and high-throughput fashion.

How to undertake a proteomic study

Technological and methodological advances in proteomics

Several advances have been key to the development of proteomics in the last two decades, such as the initiation of large-scale genome sequencing projects, the rapid growth of nucleotide and protein public databases, the advances in the separation of proteins by two-dimensional gel electrophoresis in polyacrylamide gels (2-DE) or separation of peptides by liquid chromatography (LC) and, finally, the advances in the field of MS that have allowed a general improvement of protein identifications (reviewed in Westermeier et al. 2008). The proteomic analysis workflow has three main steps (Fig. 2). It starts with the extraction of the protein content from the tissue/cells of the samples under study in an appropriate lysis buffer (reviewed in Görg et al. 2004). This is followed by a protein/peptide separation method (2-DE or LC) and usually by an MS analysis for protein identification (see Box 1 and Fig. 1 for further technical details). It is important to note that, before the MS analysis, proteins can be either digested to produce peptides, usually with trypsin (bottom-up analysis), or analysed without any modification (top-down analysis). The bottom-up approach is the most frequent method of choice because of some technical advantages (detailed in Steen & Mann 2004 and Yates et al. 2009) (Fig. 2).

Figure 2.

 Basic workflows in Proteomics. Proteomic experiments start with a protein extraction step from the organism/tissue under study (A). Then, two different workflows can be followed. The two-dimensional electrophoresis (2-DE) coupled to mass spectrometry approach (connected by red arrows from B to G) or the gel-free LC coupled to mass spectrometry (LC-MS; connected by green arrows from D2 to G). This latter workflow provides the keystone of current shotgun proteomic approaches (Motoyama & Yates 2008). Proteins can be then separated by two-dimensional electrophoresis (2-DE) and protein spots visualized through protein staining/labelling methods (B). 2-DE gel images from different samples are quantitatively analysed. A list of protein spots that are up/down regulated in control vs. treatment groups is finally provided (C). Individual protein spots are digested (D1), and resulting peptides are analysed by mass spectrometry (MS) for protein identification (see F-G). Alternatively, proteins are trypsin digested (D2) and separated by an n-dimensional LC step (E) that helps to reduce peptide complexity for further MS analysis. Peptides are ionized and their mass-to-charge (m/z) recorded after MS analysis, to finally provide MS spectra. Ionized peptides can be also fragmented, then producing MS/MS spectra (F and Box 1). Proteins are identified by comparing empirical vs. theoretical spectra obtained from protein databases by using bioinformatics tools (G), a common step for both workflows. Data analysis provides scores and p-values to assess the confidence of the different protein identifications (see main text and Box 1). (*) Please note that this figure is incomplete as the different gel-free quantitative proteomics methods (i.e., workflow with the green arrows) are not depicted, although interested readers can see very informative schemes/figures in Bantscheff et al.’s 2007 and Nesvizhskii et al.’s 2007 papers. 
It is noteworthy to mention that shotgun proteomic approaches have successfully used hybrid approaches to separate proteins/peptides, i.e. combining both gel and gel-free (LC) methods, before high-throughput MS analyses (see Motoyama & Yates 2008).

Box 1. Proteomics methodologies
In proteomics, several methods are used to separate proteins (and/or peptides) before their identification by mass spectrometry (MS; see also Fig. 2). These methods are mainly based on in-gel or gel-free approaches, each method having its own weaknesses and strengths (see details in the different reviews cited below). Two-dimensional electrophoresis (2-DE) and LC have been the two main methods commonly used. Protein identification is achieved by the analysis of protein/peptide samples in MS devices. Some key proteomic approaches and terms are defined briefly below.
  Two-dimensional electrophoresis (2-DE): This technique allows separation of complex mixtures of proteins according to their isoelectric point and molecular weight (reviewed in Görg et al. 2004; Rabilloud et al. 2010), followed by quantification through the use of different staining methods (Miller et al. 2006) and computer analysis of the protein spot patterns obtained. It is usually coupled to MS analysis to identify the protein spots of interest. An improvement of this technique is DIGE (difference gel electrophoresis; see Timms & Cramer 2008).
  Liquid chromatography: This separation method is coupled to MS analysis (LC-MS) and represents the base technology of different gel-free proteomics approaches [see multidimensional protein identification technology (MudPIT) below]. This methodology has been traditionally associated with high-pressure liquid chromatography (HPLC), usually coupled to MS instruments with an electrospray ionization (ESI) source (Steen & Mann 2004; Yates et al. 2009).
  Multidimensional protein identification technology: This is based on the orthogonal combination of several separation techniques, as an alternative to a previous protein fractionation step (Steen & Mann 2004; Motoyama & Yates 2008). Eluted peptides from LC are then introduced and analysed in a mass spectrometer, usually connected online, which allows automation of the whole procedure (i.e. protein separation and identification in the same run).
  Shotgun proteomics: This represents a step forward in the LC-MS approach (see Motoyama & Yates 2008), implying a high-throughput analysis to achieve protein identification and quantification from complex protein samples using uni- or multi-dimensional LC to separate peptides previously digested by trypsin, and their analysis by tandem mass spectrometry (MS/MS) (reviewed in Nesvizhskii et al. 2007).
  Mass spectrometry: The analytical technique that separates and quantifies ions on the basis of their mass-to-charge (m/z) ratios by means of a mass spectrometer, providing MS spectra as a result (reviewed in Domon & Aebersold 2006; Yates et al. 2009). A mass spectrometer consists of three parts used sequentially: an ion source, a mass analyser and a detector. It can give both qualitative and quantitative information about its analytes. The MS spectra obtained are then analysed by specific software based on different algorithms for protein identification, whilst statistical analysis provides scores and p-values to assess the confidence of protein identification. Protein identifications can be based on MS spectra (precursor ions) and/or MS/MS spectra (product ions).
  MS spectrum: Usually represented by a vertical bar graph. Each ion is represented as a bar with a specific mass-to-charge ratio (m/z) and its length indicates its relative abundance. It reflects the analysis of precursor ions (i.e. previously ionized peptides).
  MS/MS spectrum: Also known as tandem mass spectrum and represented in a similar way to the bar graph of MS spectra. The difference is that the information represented in this spectrum is the result of the fragmentation of one peptide (i.e. precursor ion) along its backbone. The precursor ion is physically isolated and collided with an inert gas in a collision cell situated between both mass analysers, generating product ions and their corresponding MS/MS spectra. These MS/MS spectra allow for the deduction of peptide sequences and facilitate protein identification (reviewed in Steen & Mann 2004; Domon & Aebersold 2006).
  Peptide mass fingerprinting: Single proteins are converted to peptides, usually by trypsin digestion and, after MS analysis, an MS spectrum provides an extremely specific fingerprint that often allows protein identification (Pappin et al. 1993). This is achieved by matching the experimental peptide masses obtained in MS analysis against the theoretical ones derived from protein databases. Because peptide mass fingerprinting (PMF) requires purified proteins, this MS analysis is usually combined with a previous protein separation by 2-DE. Recently, use of this method alone for protein identification has been questioned (see Mann & Kelleher 2008).
  De novo sequencing: This is an alternative approach to infer the polypeptide sequence directly from the MS/MS spectra without the help of sequence database searches by following some basic rules on the chemistry of proteins (Steen & Mann 2004). This approach can be useful to identify peptides from new protein variants. There is also a hybrid approach based on the extraction of short sequence tags followed by error-tolerant database searching (see Nesvizhskii et al. 2007; Junqueira et al. 2008).
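The matching logic behind peptide mass fingerprinting can be sketched in a few lines of Python. This is a simplified illustration under several assumptions, not the algorithm of any particular search engine: peptides are treated as neutral monoisotopic masses (real PMF data are m/z values of ions), missed cleavages and modifications are ignored, trypsin is assumed to cut after K or R except before P, and the scoring is a plain count of matched masses. The two protein sequences and all function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per free peptide

def tryptic_digest(protein):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides

def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER

def count_matches(observed_masses, protein, tol_ppm=50.0):
    """Number of observed masses matching a theoretical tryptic peptide."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(protein)]
    return sum(
        any(abs(obs - theo) / theo * 1e6 <= tol_ppm for theo in theoretical)
        for obs in observed_masses
    )

# Hypothetical two-entry database and a simulated fingerprint.
database = {
    "protein_A": "MAGKTLLERWPSKDNAFR",
    "protein_B": "MSSKVVELRGHPQK",
}
observed = [peptide_mass(p) + 0.001 for p in ("TLLER", "WPSK", "DNAFR")]

ranking = sorted(database, key=lambda name: count_matches(observed, database[name]),
                 reverse=True)
print(ranking[0])
```

Real search tools additionally convert masses to m/z values, allow missed cleavages and modifications, and attach probability-based scores to each candidate, which is what the scores and p-values mentioned above refer to.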

Proteomics also provides a means to study the quantitative changes at the protein level. For example, there are research studies where the main interest is to compare a number (n) of samples from control and treatment groups (i.e. samples from ecotype A vs. ecotype B, species A vs. species B, clean vs. polluted environment, etc.) to measure the protein concentrations for a range of proteins under study across samples. The subsequent identification of candidate proteins can be carried out by MS analysis. In circumstances where protein identifications might be a difficult task (e.g. in nonmodel organisms), it is still possible to obtain valuable quantitative information to test evolutionary hypotheses (see e.g. Aquadro & Avise 1981; Blank et al. 2005; Diz & Skibinski 2007; Martínez-Fernández et al. 2010b). There are two main quantitative proteomic approaches, i.e. relative (differing technically between 2-DE and LC-MS methods) and absolute, mainly used in LC-MS methods. Relative quantification is used in 2-DE analysis, where protein spots are visualized through different stains allowing their matching and quantification by specific software (see Görg et al. 2004; Miller et al. 2006; Westermeier et al. 2008). Statistical analyses are carried out over normalized protein spot intensities across different 2-DE gels, and candidate protein spots up/down regulated are determined. On the other hand, either relative or absolute quantitative approaches are used in gel-free MS-based (shotgun) proteomics. In the relative quantification, samples are either labelled with stable isotopes or analysed following some other label-free quantification methods (see Yates et al. 2009). The relative ratio based on peptide abundances is empirically determined, i.e. comparing peptide pairs labelled with heavy (control) vs. light (treatment samples) isotopes to allow final relative protein quantifications through statistical analysis, whilst label-free quantification methods are based on spectral counts obtained from MS analysis (Bantscheff et al. 2007; Yates et al. 2009). Absolute measurements of proteins are feasible when synthetic peptides of known concentration are added to the samples under study. It should be mentioned at this point that quantitative proteomics approaches based on 2-DE coupled to MS and gel-free MS (shotgun) approaches provide complementary information (e.g., see Monteoliva & Albar 2004; Schmidt et al. 2004; Jungblut et al. 2010; van Cutsem et al. 2011).
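The arithmetic behind these two quantification modes can be sketched as follows. This is a minimal Python illustration under stated assumptions: the labelling orientation follows the text (heavy for control, light for treatment), the summary of peptide pairs by the median of their log2 ratios is our own choice rather than a prescribed method, and all intensities, amounts and function names are hypothetical.

```python
import math
from statistics import median

def protein_log2_ratio(peptide_pairs):
    """Median log2(light / heavy) over the peptide pairs of one protein.

    Each pair is (light_intensity, heavy_intensity); with heavy = control
    and light = treatment, a positive value means higher abundance in the
    treatment sample.
    """
    ratios = [math.log2(light / heavy)
              for light, heavy in peptide_pairs
              if light > 0 and heavy > 0]
    if not ratios:
        raise ValueError("no usable peptide pairs")
    return median(ratios)

def absolute_amount(endogenous_intensity, standard_intensity, standard_amount):
    """Estimate a peptide amount from a spiked synthetic standard.

    Assumes the endogenous and standard peptides respond identically in the
    mass spectrometer, so the intensity ratio equals the amount ratio.
    """
    return endogenous_intensity / standard_intensity * standard_amount

# Hypothetical intensities for three peptide pairs of one protein.
pairs = [(2000.0, 1000.0), (4100.0, 2000.0), (1900.0, 1000.0)]
print(f"log2(treatment/control) = {protein_log2_ratio(pairs):.2f}")

# Hypothetical spike-in: 5 fmol of synthetic peptide added to the sample.
print(f"estimated amount = {absolute_amount(300.0, 100.0, 5.0):.1f} fmol")
```

Working on the log scale makes two-fold increases and decreases symmetric around zero, and the median limits the influence of a single poorly measured peptide.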

Proteomics in nonmodel organisms

Proteomic experiments and projects rely heavily on whether the genome of the organism under study has been sequenced and annotated in public databases. Such organisms can be referred to as model organisms (Vasemägi & Primmer 2005). The information from public databases is then used and makes protein identifications by MS analysis feasible. This is especially true in ambitious proteomic projects where the main aim is to describe and characterize the proteome of a whole organism, cell or tissue. There are several successful examples of shotgun proteomic studies in a variety of model organisms where a high number of proteins were identified (e.g. de Godoy et al. 2008; Schrimpf et al. 2009; Zhong et al. 2009). However, we would like to emphasize at this point that proteomic projects should not be restricted to model organisms and that interesting questions and new hypotheses can be addressed from proteomic studies in organisms with unknown genome sequences (nonmodel organisms; examples reviewed in Jerez 2008; Karr 2008; Sanchez et al. 2011; Tomanek 2011). Protein identifications by MS from a nonmodel organism are based on the partial alignment of analysed proteins to database sequences from model organisms, i.e. cross-species protein identification by homology. The success rate of cross-species protein identifications depends on several factors, such as the phylogenetic distance between the sequenced and nonsequenced species, and the MS quality for each analysed protein (Liska & Shevchenko 2003). To achieve success in cross-species protein identifications, it is usually sufficient that only a few peptides are highly homologous (and not the whole protein). To increase the success rate in protein identifications of nonmodel organisms, de novo sequencing (or similar) approaches from MS data have been used (see Box 1). 
Although there are different software programs to carry out de novo sequencing, the success of this method depends on getting high quality MS data and the necessary expertise in the very time-consuming interpretation step that precludes its widespread use (reviewed in Steen & Mann 2004; Nesvizhskii et al. 2007). The benefits of using a combined approach have been also reported (see Junqueira et al. 2008 for further details and discussion). A good example of the application of these techniques in protein identification has been provided in a study of hibernation in the ground squirrel (Spermophilus tridecemlineatus; Russeth et al. 2006).

A new approach to aid in protein identification in nonmodel organisms is to obtain a full-length cDNA library from the organism, or even better from the tissue under study, and to translate the set of cDNAs into protein sequences to build a specific database for the protein identification step by MS (Bouck & Vision 2007; Findlay & Swanson 2010). The newest methodological advance in this direction is the deep sequencing of RNA (RNA-Seq), which allows the identification and quantification of the transcriptome, including those of nonmodel organisms (Wang et al. 2009b and references therein). In a recent shotgun proteomic study, this approach was successfully used to identify and quantify 882 proteins in the marine invertebrate Bugula neritina, providing an important starting point for understanding the molecular mechanisms involved in larval biology, development and antifouling (Wang et al. 2010).
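The translation step used to turn a cDNA collection into a searchable protein database can be sketched as follows. This Python example is only an illustration under simplifying assumptions: it uses the standard genetic code, scans the three forward reading frames only, and keeps the longest stretch that starts at a methionine and is free of stop codons. The function names and the short test sequence are hypothetical, and established annotation pipelines are considerably more elaborate.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(cdna, frame=0):
    """Translate one forward reading frame; '*' marks stop codons,
    'X' marks codons containing ambiguous bases."""
    cdna = cdna.upper().replace("U", "T")
    return "".join(
        CODON_TABLE.get(cdna[i:i + 3], "X")
        for i in range(frame, len(cdna) - 2, 3)
    )

def longest_orf(cdna):
    """Longest Met-initiated, stop-free stretch over the three forward frames."""
    best = ""
    for frame in range(3):
        for segment in translate(cdna, frame).split("*"):
            start = segment.find("M")
            if start != -1 and len(segment) - start > len(best):
                best = segment[start:]
    return best

# Hypothetical cDNA fragment; the result would become one database entry.
cdna_library = {"contig_0001": "CCATGAAATTTTAGGG"}
protein_db = {name: longest_orf(seq) for name, seq in cdna_library.items()}
print(protein_db)
```

A database built this way is specific to the organism or tissue under study, which is the property the approach described above relies on.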

Practical considerations in proteomic studies

Proteomic experimental designs have gained some input from the transcriptomics field. Nevertheless, each -omic approach has its own features, so the rationale behind the different experimental designs and the underlying assumptions of statistical tests need to be checked specifically for proteomic experiments. Although some of the points discussed later could perhaps be perceived as quite trivial, situations where they are overlooked or misunderstood are quite common (see Table 1 and corresponding section). First, the importance of including both biological and technical replication in any proteomic experiment should be emphasized (Hunt et al. 2005; Karp et al. 2005). Biological variability, because of genetic or environmental factors, is a natural feature of all organisms that needs to be accounted for when assessing the significance of differences between the experimental groups (e.g. different species or populations) under study. On the other hand, technical replicates enable us to measure technical/experimental noise and to compare it with the biological variation of the samples under study. The preference for biological over technical replicates has been discussed both in proteomic (Hunt et al. 2005; Karp et al. 2005; Horgan 2007; Karp & Lilley 2007) and microarray work (Kendziorski et al. 2005).
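One way to compare technical noise with biological variation is to estimate variance components from a nested design. The following Python sketch assumes a balanced design (the same number of technical replicates for every biological sample) and a one-way random-effects model; it is offered as an illustration, not as the procedure used in the cited studies, and the spot intensities and function name are hypothetical.

```python
from statistics import mean

def variance_components(replicates):
    """Estimate (biological, technical) variance for one protein spot.

    `replicates` is a list with one inner list per biological sample, each
    holding that sample's technical replicate measurements.
    """
    n_bio = len(replicates)
    n_tech = len(replicates[0]) if replicates else 0
    if n_bio < 2 or n_tech < 2 or any(len(r) != n_tech for r in replicates):
        raise ValueError("need a balanced design with >= 2 samples and >= 2 replicates")
    sample_means = [mean(r) for r in replicates]
    grand_mean = mean(sample_means)
    # Mean square between biological samples.
    ms_between = n_tech * sum((m - grand_mean) ** 2 for m in sample_means) / (n_bio - 1)
    # Mean square within samples, i.e. among technical replicates.
    ms_within = sum(
        (x - m) ** 2 for r, m in zip(replicates, sample_means) for x in r
    ) / (n_bio * (n_tech - 1))
    technical_var = ms_within
    # Method-of-moments estimate, truncated at zero.
    biological_var = max(0.0, (ms_between - ms_within) / n_tech)
    return biological_var, technical_var

# Hypothetical normalized intensities: 3 individuals x 2 gels each.
spot = [[10.0, 12.0], [20.0, 22.0], [30.0, 32.0]]
bio_var, tech_var = variance_components(spot)
print(f"biological variance = {bio_var:.1f}, technical variance = {tech_var:.1f}")
```

When the biological component dominates, as in this toy example, adding further individuals is expected to improve the precision of group comparisons more than adding further gels per individual.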

Table 1.   Summary of proteomic studies with an evolutionary ecology perspective from a literature search. Species are identified by their taxonomic name, or exceptionally by a group name when several genera are included. ‘Tissue’ describes the cellular type used in the proteomic study. ‘Separation’ is the method used to separate proteins: one-dimensional electrophoresis (1-DE), two-dimensional electrophoresis (2-DE, including DIGE) or any variant of liquid chromatography (LC). ‘Identification’ describes the protein identification methods: peptide mass fingerprinting (PMF), two steps of mass spectrometry analysis (MS/MS) and de novo peptide sequencing (de novo). ‘Objective’ uses different keywords to describe the main objective (several keywords separated by a comma can be used for the same study). Notice that when different studies used the same species, their specific characteristics listed in the different columns are separated by a semicolon, in the same order as the references given.
SpeciesTissueSeparationIdentificationObjectiveReferences
  1. (#)several rodent species from different genus were studied; (*) Edman degradation method was also used for protein identification.

Bryophits
 Physcomitrella patensGametophores2-DEMS/MSAdaptationWang et al. (2008c)
Plants
 Brassica oleraceaLeaves and stems2-DENonePolyploidizationAlbertin et al. (2005)
 Phragmites communisLeaves2-DEMS/MSAdaptationCui et al. (2009)
 Oryza sativaSeedlings2-DEMS/MSAdaptationPandey et al. (2010)
 Lycopersicon esculentumFruits2-DEPMF + MS/MSAdaptationRocco et al. (2006)
 Lens culinarisSeeds2-DEPMF + MS/MSVariability, PhylogeneticScippa et al. (2010)
 Oryza sativaSeeds2-DEPMFVariability, heterosisWang et al. (2008b)
 Cicer arietinumSeedlings/Cell wall fraction2-DEPMF + MS/MSVariabilityBhushan et al. (2006)
 Imperata cylindricaLeaves1-DE + LCMS/MSAdaptationChang (2008)
 Pachycladon spp.Leaves2-DE + LCMS/MSAdaptationVoelckel et al. (2010)
 Populus tremula spp.Leaves2-DEPMFAdaptationRenault et al. (2004)
 Triticum spp.Endosperm2-DEPMF + MS/MSAdaptationSkylas et al. (2002)
Fungi
 Batrachochytrium dendrobatidisWhole2-DEMS/MSAdaptationFisher et al. (2009)
Bryozoa
 Bugula neritinaLarvae2-DEPMF + MS/MSVariability, ontogenyThiyagarajan et al. (2009)
Worms and parasites
 Schistosoma spp.Sporocysts; Various2-DE; LCMS/MSAdaptation, coevolutionRoger et al. (2008) and Liu et al. (2006)
 Spinochordodes telliniiWhole2-DEPMFCoevolutionBiron et al. (2005)
 Caenorhabditis elegansMitochondria; sperm and oocytesLCMS/MSVariability, speciationLi et al. (2009) and Chu et al. (2006)
 Marenzelleria spp.Metameres2-DENoneVariability, adaptationBlank et al. (2005)
Molluscs
 Haliotis spp.Vitelline envelope1-DE; LCMS/MSSpeciation, phylogeneticAagaard et al. (2006, 2010)
 Littorina saxatilisWhole2-DEMS/MS + de novo; noneAdaptation, speciationMartínez-Fernández et al. (2008, 2010b)
 Mytilus spp.Foot; eggs; foot2-DENone; MS/MS; PMF + MS/MS + de novoVariability, adaptation, speciationDiz & Skibinski (2007), Diz et al. (2009a) and López et al. (2002)
Barnacles
 Balanus amphitriteLarvae2-DEPMF + MS/MSVariability, ontogenyThiyagarajan et al. (2009)
Skorpions
 Heterometrus petersiiVenomLCMS/MSVariabilityMa et al. (2010)
Insects
 Meconema thalassinumBrain2-DEPMFCoevolutionBiron et al. (2005)
 Acyrthosiphon pisumSalivary glands2-DE + LCPMF + MS/MSCoevolution, phylogeneticCarolan et al. (2011)
 Allonemobius spp.Spermatophores and thorax2-DEPMF + MS/MSVariability, speciationMarshall et al. (2011)
 Gryllus spp.Seminal fluidLCMS/MSVariability, speciationAndrés et al. (2008)
 Myzus persicaeWhole2-DEMS/MSCoevolutionFrancis et al. (2006)
 Cimex lectulariusSperm and seminal fluid containers; Salivary glands2-DE; 1-DEMS/MSVariability, speciation, coevolutionReinhardt et al. (2009) and Francischetti et al. (2010)
 Afrocimex constrictusSperm and seminal fluid containers2-DEMS/MSVariability, speciationReinhardt et al. (2009)
 Apis melliferaLarvae; Midguts; Whole; SpermLC; LC; LC; 2-DEMS/MS; MS/MS; MS/MS; MS/MSVariability, ontogeny, adaptation, variability, ontogeny, variability, ontogeny speciationKamakura (2011), Parker et al. (2010), Wolschin & Amdam (2007) and Poland et al. (2011)
 Polistes metricusLarvae1-DE + LCMS/MSVariability, OntogenyHunt et al. (2010)
 Vespa spp.Trophallactic fluid1-DE + LCMS/MS + de novoAdaptationRoskens et al. (2010)
 Anopheles stephensiSalivary gland1-DE(*)CoevolutionValenzuela et al. (2003)
 Drosophila spp.Seminal fluid; Sperm; Seminal fluid; Reproductive tissue1-DE + LC; 2-DE + LC; LC; 2-DE + LCMS/MS; PMF + MS/MS(*); MS/MS; MS/MSVariability, speciationKelleher et al. (2009), Dorus et al. (2006), Findlay et al. (2008) and Mack et al. (2006)
 Stomoxys calcitransSalivary gland1-DE(*)CoevolutionWang et al. (2009a)
 Helicoverpa armigeraLarvae (epidermis)2-DEPMF + MS/MSVariability, ontogenyFu et al. (2009)
 Heliconius spp.SpermatophoreLCMSVariability, speciationWalters & Harrison (2010)
Fishes
 Sparus aurataOocytes2-DE + LCMS/MSVariability, ontogenyZiv et al. (2008)
 Danio rerioOocytes2-DE + LCMS/MSVariability, ontogenyZiv et al. (2008)
 Carassius auratusLiver2-DEPMF + MS/MSAdaptationWang et al. (2008a)
Frogs
 Xenopus laevisOocytes, eggs and embryosLCMS/MSVariability, ontogenyMcGivern et al. (2009)
Reptiles
 Atropoides spp.Venom1-DE + LCPMF + MS/MS (*)Variability, adaptation, speciationAngulo et al. (2008)
 Bitis spp.Venom1-DE + LCPMF + MS/MSVariability, adaptation, speciationCalvete et al. (2007a)
 Crotalus spp.Venom1-DE + LCPMF + MS/MS (*)Variability, adaptation, speciationCalvete et al. (2010)
 Bothriechis nigroviridisVenom1-DE + LCMS/MS (*)Variability, adaptation, speciationFernández et al. (2010)
 Bothrops spp.Venom1-DE + LC; 2-DEPMF + MS/MS (*); MS/MSVariability, adaptation, speciation, variabilityGutiérrez et al. (2008) and Serrano et al. (2005)
 Bothrops atroxVenom1-DE + LCPMF + MS/MS (*); PMF + MS/MSVariability, ontogenyNúñez et al. (2009) and Calvete et al. (2011)
 Bothrops spp.Venom1-DE + LCPMF + MS/MS (*)Variability, adaptation, speciationGutiérrez et al. (2008)
 Bothrops atroxVenom1-DE + LCPMF + MS/MS (*)Variability, ontogenyNúñez et al. (2009)
 Bothrops asperVenom1-DE + LCPMF + MS/MS (*)Variability, ontogenyAlape-Girón et al. (2008)
 Crotalus atroxVenom2-DEMS/MSVariabilitySerrano et al. (2005)
 Micrurus surinamensisVenom2-DE + LCPMF + MS/MS (*)VariabilityOlamendi-Portugal et al. (2008)
 Lachesis spp.Venom1-DE + LCPMF + MS/MS (*)Variability, adaptation, speciationSanz et al. (2008)
 Sistrurus spp.VenomLCNone; PMF + MS/MSVariability, adaptation; variability, adaptation, speciationGibbs et al. (2009) and Sanz et al. (2006)
 Crotalus durissusVenom1-DE + LCPMF + MS/MS (*)VariabilityBoldrini-França et al. (2010)
Mammals
 Mus spp.Reproductive tissues; SpermLC; LCMS/MS; MS/MSVariability, speciationDean et al. (2009) and Dorus et al. (2010)
 Rattus norvegicusBlood platelets1-DE + LCMS/MSVariabilityYu et al. (2010)
 Muroid rodents(#)Seminal vesicle secretion1-DE + LCPMF, MS/MS + de novoVariability, speciationRamm et al. (2009)
 Homo sapiensBlood platelets1-DE + LCMS/MSVariabilityYu et al. (2010)

Another relevant decision deals with the choice of using individual (specimens or tissues) or pooled samples in biological replicates. Pooling is preferable when the quantity of protein available from each individual sample is too limited to make biological replicates, whilst it might not be desirable when individual-level information is biologically relevant in itself (see Karp et al. 2005; Crawford & Oleksiak 2007). Pooling specimens in a sample increases the statistical power of the test because it reduces the measured biological variation within groups (theoretically, the within-group variance becomes the variance divided by the number of specimens pooled; Sokal & Rohlf 1995). A recent empirical study investigated the consequences of sample pooling in proteomics and concluded that pooling can be a valid approach (Diz et al. 2009b). However, when pooling, uncontrolled confounding effects within a group could potentially bias the experiment. For example, sexual differences in expression in a species using pooled replicates could be misleading if the pools within sexes contain different age classes (and if ageing strongly alters the expression of the same proteins that are also affected by sexual differentiation). To minimize this and similar biases, pooling should be carried out on a moderate to high number of specimens whenever randomizing individuals to control for the confounding factor is not feasible. A technical but relevant aspect of pooling is that proteins should be extracted from specimens or individual tissues and then pooled a posteriori, by adding the same protein amount from each individual sample into the pool (see Kendziorski et al. 2005). Such a protocol would ensure that the different pools will be true biological replicates.
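
The variance argument above can be made concrete with a small sketch. It assumes that each pool contributes the mean abundance of its specimens (equal protein amounts per specimen) plus independent technical noise, so that the expected variance of a pooled measurement is the biological variance divided by the pool size, plus the technical variance. The function names, the normal-distribution assumption and all numerical values are illustrative assumptions and are not taken from the cited studies.

```python
import random
import statistics


def expected_pool_variance(bio_var, tech_var, pool_size):
    """Expected variance of one pooled measurement.

    Biological variance is averaged over the pooled specimens;
    technical variance is added once per measurement (assumption).
    """
    return bio_var / pool_size + tech_var


def simulate_pool_variance(bio_sd, tech_sd, pool_size, n_pools=20000, seed=1):
    """Estimate the between-pool variance by simulation (hypothetical values)."""
    rng = random.Random(seed)
    measurements = []
    for _ in range(n_pools):
        # Equal protein amount per specimen: the pool level is the specimen mean.
        specimens = [rng.gauss(100.0, bio_sd) for _ in range(pool_size)]
        pool_level = sum(specimens) / pool_size
        # One technical measurement error per pool.
        measurements.append(pool_level + rng.gauss(0.0, tech_sd))
    return statistics.variance(measurements)


if __name__ == "__main__":
    for k in (1, 5, 10):
        theory = expected_pool_variance(bio_var=100.0, tech_var=4.0, pool_size=k)
        simulated = simulate_pool_variance(bio_sd=10.0, tech_sd=2.0, pool_size=k)
        print(f"pool size {k}: expected {theory:.1f}, simulated {simulated:.1f}")
```

Under these assumptions, pooling shrinks only the biological component of the variance; the technical component sets a floor that larger pools cannot remove, which is one reason technical replication remains informative.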

Concerning data analysis, proteomics shares with transcriptomics and genomics the problem of multiple hypothesis testing: the type I error rate controlled in an individual test no longer holds across a family of tests, which inflates the number of false positives. A wide range of statistical methods has been developed to deal with this problem, although this type of correction has rarely been applied in quantitative proteomics work (reviewed in Diz et al. 2011). Perhaps one main reason why multiple testing corrections were so rarely used in past experimental studies was their traditionally low statistical power, although there are now a number of multiple testing correction alternatives that give considerable statistical power to detect significant effects whilst keeping a reasonably low rate of false positives, e.g. the sequential goodness of fit (SGoF) method (Carvajal-Rodríguez et al. 2009). There is also a need to develop new software tools for dealing with some statistical constraints or specific needs in the analysis of proteomic data (Nesvizhskii et al. 2007). A similar problem arises from the overwhelming number of protein annotations and the information available across different databases (including cross-links between them), which have not always been organized in compatible or standard formats. Fortunately, in recent years, the different software and databases have begun to follow international standards (e.g. see UniProt project: http://www.uniprot.org, and Gene Ontology project: http://www.geneontology.org).
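
To illustrate what a multiple testing correction does to a list of per-spot P-values, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, which controls the false discovery rate. It is a widely used alternative and is not the SGoF method cited above, which is not reproduced here. The P-values are invented for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Step-up rule: find the largest rank k (1-based, P-values sorted ascending)
    with p_(k) <= (k / m) * alpha, then reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= largest_rank:
            rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical P-values, one per protein spot.
    p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
    uncorrected = sum(value <= 0.05 for value in p)
    corrected = sum(benjamini_hochberg(p, alpha=0.05))
    print(f"significant without correction: {uncorrected}")
    print(f"significant after Benjamini-Hochberg: {corrected}")
```

In this invented example, several spots that pass an uncorrected 0.05 threshold no longer do so after correction, which is the kind of difference that matters when candidate proteins are carried forward to confirmatory work.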

Finally, another important concern in proteomics is how to deal with the high complexity of proteomes. Some new strategies have been proposed to minimize this problem. One is to apply a protein fractionation step to facilitate downstream protein identification and quantification, especially facilitating the access to low-expressed proteins. Another possibility is to analyse subproteomes (cellular, organellar or tissue proteomes), instead of full proteomes of one organism. This is providing very promising results in terms of protein identifications, quantification, location within the cell and understanding of basic processes in cell biology and, finally, for the whole organism under study (see Walther & Mann 2010). We believe that in the future, there will be a demand for a higher effort in using proteomics in an evolutionary context for isolated tissues, as the results observed will be easier to interpret in functional terms (see e.g. Rees et al. 2011).

Interpreting proteomic data and complementary approaches

There are some limitations in the interpretation of proteomic data from some experiments that might not be immediately obvious. For example, many quantitative (often exploratory) studies in the literature report up/down regulated proteins (see Table 1), e.g. by comparing control vs. treatment samples, and so provide useful targets that need to be further investigated with other complementary -omic methods. The example illustrated in Box 2 shows why complementary technology and functional analysis are needed to refine and integrate the knowledge gained from exploratory proteomics studies. The example concerns several quantitative proteomic studies in two different ecotypes of a marine snail using 2-DE and MS, aiming to unravel the molecular mechanisms underlying adaptation to different ecological environments.

Box 2. A possible adaptive role of arginine kinase in a marine snail
On exposed rocky shores of NW Spain, there is a striking polymorphism in Littorina saxatilis populations: two ecotypes are found associated with distinct shore levels and microhabitats (exposed/sheltered). The two ecotypes show extensive differentiation in shell morphology, behaviour and some molecular markers (reviewed in Rolán-Alvarez 2007). Moreover, there is strong support for a process of divergent natural selection in situ being responsible for different adaptations of each ecotype to its particular habitat. A recent study showed that these ecotypes also differed in a considerable percentage (16%) of the proteome analysed (Martínez-Fernández et al. 2008). In particular, one protein spot (number 39) clearly differed in expression between ecotypes (expression average ± standard deviation; in sheltered ecotype 25.84 ± 56.11; in exposed ecotype 1830.88 ± 1147.19) and was finally identified by MS as arginine kinase (AK). Given that this enzyme is involved in the energetic metabolism through the maintenance of cellular ATP levels (Uda et al. 2005), it was further concluded that individuals of the exposed ecotype could need a higher level of AK to deal with the higher energy demand owing to the specific ecological conditions (Martínez-Fernández et al. 2008). In a second study, Martínez-Fernández et al. (2010b) observed that proteomic differentiation between ecotypes was largely insensitive to drastic environmental changes experienced during growth, pointing to a mainly genetic determination of the proteomic differentiation in this system.
  To validate previous results, a new study was performed comparing levels of gene expression for the AK gene by a quantitative polymerase chain reaction method (qPCR). The methodology and choice of the best reference genes followed here are described in Martínez-Fernández et al. (2010a). The study used 10 exposed and 10 sheltered pools of L. saxatilis, each containing five males and five females, from NW Spain. Using REST software (http://www.gene-quantification.de/rest-2009.html), significant differences in AK gene expression were found between the two ecotypes (P = 0.007; normalized by two reference genes, Fig. 3). However, this study provides evidence that the sheltered ecotype showed the higher gene expression in AK, the opposite of what was previously found in the proteomic study presented earlier (see text for discussion of these results).
  Recently, a new proteomic study on these ecotypes (A. P. Diz et al., unpublished), using some technical improvements, allowed the detection of a larger proportion of the proteome and found evidence that two new spots (455 and 456), both identified as AK, showed the same trend between ecotypes as was observed with the transcriptomic analysis (see Fig. 3). After a literature survey, it was found that a family of AK genes has been observed across a variety of related organisms including several marine molluscs (reviewed in Uda et al. 2005). Actually, several studies have observed strong clines in allele frequencies in one AK locus in other littorinids, corresponding to certain environmental gradients (Tatarenkov & Johannesson 1999), although such a cline has not been observed in the Spanish ecotypes. Therefore, the proteomic approach in the Spanish ecotypes allowed detection for the first time of a new polymorphism in protein expression in an enzyme that has shown some environmental adaptive role in similar species. The apparently contradictory patterns observed between both proteomic studies require future experimental efforts, because they could be caused by different PTMs of the same enzyme, different isoforms of the same gene or even different genes (see techniques described in the section: Interpreting proteomic data and complementary approaches, which would allow testing of these alternative explanations and functional implications). Irrespective of the future interpretation of the AK polymorphism, the quantitative proteomics approach followed in these first exploratory studies has proven to be useful in providing candidate proteins that might play a role in adaptive mechanisms requiring further confirmatory and functional studies (see main text). 
It is clear that this kind of proteomic approach saves time and resources when compared to more traditional approaches where single genes or proteins are sequentially studied in the hope of finding new knowledge relevant to the biological phenomena under study.
Figure 3. Arginine kinase (AK) expression differences found amongst pools of both ecotypes at mRNA level, expressed in percentage and normalized by histone and tubulin as reference genes.
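
As an illustration of how a qPCR comparison such as the one in Box 2 can be normalized by two reference genes, the sketch below computes an efficiency-corrected relative expression ratio, with the reference genes combined through a geometric mean. This is a generic calculation under stated assumptions; it does not reproduce the REST software or its significance test, and the function name, amplification efficiencies and Ct differences are hypothetical values, not data from Box 2.

```python
import math


def relative_expression(e_target, dct_target, references):
    """Efficiency-corrected expression ratio of group B relative to group A.

    e_target:   amplification efficiency of the target gene (2.0 = perfect doubling)
    dct_target: mean Ct(group A) - mean Ct(group B) for the target gene
    references: list of (efficiency, delta_ct) pairs, one per reference gene,
                with delta_ct defined in the same way as dct_target
    """
    numerator = e_target ** dct_target
    # Geometric mean of the reference-gene factors, computed on the log scale.
    log_sum = sum(dct * math.log(eff) for eff, dct in references)
    denominator = math.exp(log_sum / len(references))
    return numerator / denominator


if __name__ == "__main__":
    # Hypothetical inputs: one target gene and two reference genes.
    ratio = relative_expression(
        e_target=2.0,
        dct_target=1.5,
        references=[(2.0, 0.2), (2.0, -0.1)],
    )
    print(f"relative expression (group B vs group A): {ratio:.2f}")
```

A ratio above 1 indicates higher normalized transcript abundance in group B; whether such a difference is significant requires a separate test on the replicate pools.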

There are a number of cellular mechanisms that may alter the function of a particular expressed protein, PTMs being the most important. Therefore, proteomic data (e.g. see Box 2) may be further analysed to detect polymorphism in PTMs such as phosphorylation, glycosylation and acetylation, although a few hundred different PTMs have been reported (reviewed in Jensen 2006). PTMs are site specific, being associated with specific amino acids, and the sequence motifs can be identified both by computational sequence analysis and experimental methods (Blom et al. 2004; Görg et al. 2004; Danielsen et al. 2011). Moreover, the same PTM can cause different functional changes depending on the context, thereby complicating the interpretation of proteomic data in functional terms (Jensen 2006). These PTMs usually show different predicted changes in protein mass, and so they could be experimentally analysed from 2-DE proteomic maps by using geometric-morphometric tools (Rodríguez-Piñeiro et al. 2005), identified by specific staining modifications (reviewed in Görg et al. 2004; Paradela & Albar 2008), or analysed and identified by gel-free MS methods (Jensen 2006).
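
The idea that PTMs produce characteristic mass changes can be sketched as a simple lookup: given the mass of an unmodified peptide and an observed mass, list the modifications whose expected shift matches the difference within a tolerance. The monoisotopic shifts below are standard values for three common modifications; the function name, the tolerance and the example masses are illustrative assumptions, and real MS workflows consider many more modifications and combinations.

```python
# Monoisotopic mass shifts in daltons for three common modifications.
PTM_MASS_SHIFTS_DA = {
    "phosphorylation": 79.96633,       # addition of HPO3
    "acetylation": 42.01057,           # addition of C2H2O
    "HexNAc glycosylation": 203.07937,  # addition of one N-acetylhexosamine
}


def candidate_ptms(unmodified_mass, observed_mass, tolerance_da=0.01):
    """Return the names of single PTMs consistent with the observed mass shift."""
    delta = observed_mass - unmodified_mass
    return [
        name
        for name, shift in PTM_MASS_SHIFTS_DA.items()
        if abs(delta - shift) <= tolerance_da
    ]


if __name__ == "__main__":
    # Hypothetical peptide masses for illustration.
    print(candidate_ptms(1000.0000, 1079.9663))
    print(candidate_ptms(1000.0000, 1042.0106))
    print(candidate_ptms(1000.0000, 1010.0000))
```

A match on mass alone only nominates a candidate modification; locating the modified residue and confirming it requires fragmentation data or the experimental methods cited above.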

There are other new technical innovations that could provide important functional information, and some of them are worth considering when planning future studies aimed at refining our knowledge of biological processes (e.g. in the evolutionary and ecological adaptation study of the marine snail of Box 2). For example, a combination of techniques, including proteomic methods, was successfully used to visualize cellular protein complexes (Tinnefield 2011), which can provide a better understanding of the structural and functional aspects of particular proteins, protein turnover in cell cultures and ultimately how cells function (Eden et al. 2011; Plotkin 2011). Cell-free protein synthesis, with obvious applications in biotechnology and functional proteomics, is another recent technological innovation (He 2008). Last, but not least, future efforts will undoubtedly focus on the study of dynamic, rather than static, transcriptome and proteome changes during the physiological or biological phenomena under study, possibly by combining different disciplines to give a more realistic molecular picture of biological function (Klose 2009).

Present and future applications to evolutionary ecology

General overview: a literature survey

We carried out a bibliographic search in the Web of Knowledge database (http://www.isiwebofknowledge.com/) to estimate the impact of proteomics and transcriptomics on distinct fields and to compare the number of evolutionary studies using proteomics and/or transcriptomics approaches across years (raw data not shown). We found 1600 records for ‘proteomics AND evolution’, but 15 307 for ‘transcriptomics AND evolution’. A more refined search, restricted to evolutionary biology journals, showed the same trend: 138 records for ‘proteomics’ against 1460 records for ‘transcriptomics’. These results suggest that proteomics has only recently been included in evolutionary biology and that evolutionary studies in proteomics are still in their infancy. To get a more detailed picture of the types of proteomic studies in the field of evolutionary ecology, we have carried out a literature search focussing specifically on proteomics studies concerning evolutionary ecology of eukaryotes (see Table 1), because there are already a few detailed reviews on prokaryotes (see Thomas et al. 2007; Jerez 2008).

Considering the published proteomic studies summarized in Table 1, several conclusions can be reached: for example, nearly half of these studies have been published during the last 3 years, roughly 30% either did not use any or did not state clearly the use of biological replicates, 60% did not clearly state the use of technical replicates and around 70% were carried out using pools of samples rather than individual samples (although in several cases the information was not clearly provided). The importance of biological and technical replicates and the considerations and assumptions behind the sample pooling strategy have been discussed in a previous section. Most of these studies used a gel electrophoresis approach (either 1-DE or 2-DE) to separate proteins before protein identification by mass spectrometry (MS) analysis. However, there has been an increasing trend to use gel-free LC-MS (shotgun proteomics) approaches during the last 3 years, with improvements in the reported number of protein identifications (see Table 1). Most of the summarized studies used a tissue-specific approach rather than whole individual samples to lower the protein sample complexity and hence facilitate the interpretation of the results. Protein identification by mass spectrometry was mainly carried out by MS/MS rather than the PMF approach (see Box 1), with limited use of de novo analysis (see Box 1). Additionally, around 65% are relative quantitative proteomics studies and none reports any absolute quantitative results, whilst the remaining 35% provide descriptive proteome maps of different tissues/organisms. Remarkably, during the last 2 years, a growing number of studies have been reported in which proteomic, transcriptomic and genomic approaches are used complementarily (e.g., see Schmidt et al. 2011), a strategy that is highly desirable as it allows specific aims at different -omic levels to be addressed with a more integrated approach.

From a systematic perspective, <20% of the studies were carried out in plants and just one in fungi; all others were on animals. We have grouped the studies from Table 1 into four main thematic blocks, some of which are highlighted below. The first group of studies focussed on the evolutionary patterns of reproductive proteins and their implications for fertilization and/or speciation processes (described by ‘speciation’ in Table 1). For example, Findlay et al. (2008) used a proteomic approach combined with detailed genomic analyses and found that some novel proteins from seminal fluid of Drosophila, transferred to females during mating, have evolved by positive selection. Likewise, Aagaard et al. (2006, 2010) found that some proteins from the egg coat of abalone evolved under positive selection, in parallel with their counterpart proteins in sperm. Both types of protein are involved in egg–sperm interactions and hence in fertilization, and presumably in the speciation process in this marine organism. Most of these proteomics studies were carried out in nonmodel organisms.

A second set of studies mainly focussed on the intra- and inter-specific protein variability of different tissues/organisms, and the implications for the adaptive response, speciation, developmental and behavioural change and hybridization at the proteome level (described by ‘variability’ in Table 1). In some studies, it was shown that some of the variation detected at the proteome level has a genetic basis, suggesting possible linkage to adaptive changes (Diz & Skibinski 2007; Martínez-Fernández et al. 2010b). There are also a relatively high number of proteomic studies on the venom of different snake species, providing evidence of large variation at an intra-specific, inter-specific, intra-genus level and/or across ontogeny (e.g., see Calvete et al. 2007a; Gibbs et al. 2009). Some other studies in plants assessed the protein expression patterns in hybrids and the possible role of heterosis (Wang et al. 2008b), whilst others addressed the protein variability of different species of marine organisms at different larval stages (Thiyagarajan et al. 2009).

A third set of studies focussed on the proteome of the host or pathogen/parasitic association to better understand the interaction, co-evolution and the underlying adaptive mechanism at the molecular level (described as ‘coevolution’ in Table 1). For example, a combined study at the proteome and transcriptome level in the salivary gland of the pea aphid points towards an adaptive co-evolutionary trend at the molecular level in host–pathogen/parasitic interactions (Carolan et al. 2011), whilst in another study, the protein composition of saliva in the common bed bug Cimex lectularius was identified, which provided important insights into how haematophagous insects can evade their host’s immune response (Francischetti et al. 2010).

Finally, a fourth group of proteomic studies focussed on the molecular mechanisms underlying the adaptive response to different ecological conditions (described by ‘adaptation’ in Table 1). Remarkably, a high number of these studies were carried out in plants. For example, the comparison of the proteome of rice, maize and chickpea plants living under water-deficit conditions showed important differences in protein expression pointing to evolutionary divergence (Pandey et al. 2010). The molecular mechanisms underlying the ecological adaptations of different honey bee populations, and the relevance of these results to bee breeding and management, were investigated in another study (Parker et al. 2010). A few other studies in Table 1 address other relevant evolutionary issues outside the four main thematic blocks described earlier.

Potential new applications in certain evolutionary contexts

In previous sections, we have described a few interesting applications of proteomics to the field of evolutionary ecology. In addition, we have given good reasons for applying proteomic methods in evolutionary ecology, essentially that the observed biological patterns usually coincide better with proteomic than with transcriptomic patterns, the former representing a biological level closer to the phenotype. We would like to further emphasize, however, some new possibilities that have rarely been attempted, and in which proteomics may represent a promising alternative and/or a complementary strategy to genomic and transcriptomic approaches. One is the identification of genes (candidate genes) responsible for different biological patterns (Vasemägi & Primmer 2005). The use of proteomics increases the chance of finding new candidate proteins that could not have been easily detected using genomic or transcriptomic methods (see Findlay et al. 2009; Li et al. 2011). Another advantage of using proteomics to complement transcriptomic studies is that the two approaches can be expected to give, to some extent, different results (see previous sections).

There is a recent interest in understanding the patterns of gene expression in ecological diversification and ecological speciation (Vasemägi & Primmer 2005; Whitehead & Crawford 2006; Hodges & Derieg 2009; Rice et al. 2011), even though there are rather few studies from a proteomic perspective (see Table 1). A number of studies have described the changes in gene expression associated with hybridization between different taxa, providing knowledge on the genetic mechanisms of reproductive isolation (reviewed in Landry et al. 2007). Therefore, it may be possible to study a number of hybrids captured in the wild and analyse them separately for phenotypic and proteomic data, using correlation and regression tools to identify potential candidate proteins underlying the phenotypic change. This situation superficially resembles the quantitative trait loci (QTL) analysis provided in laboratory F2 or back-crossed families in which it is known that some genes and corresponding phenotypes are segregating (i.e., loci contributing to the observed familial variation in the quantitative trait; Besnier et al. 2010) but applied to nonmodel organisms where controlled laboratory crosses may not be possible or practical (see Vasemägi & Primmer 2005).
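
The correlation step suggested above for wild-caught hybrids could be sketched as follows: for each protein spot, correlate its abundance across individuals with a phenotypic score and rank the spots by the absolute correlation. The spot names, abundances and phenotype values are invented for illustration, and the function names are hypothetical. A real analysis would also need to address multiple testing and population structure.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_candidate_proteins(phenotype, expression):
    """Rank protein spots by |r| between abundance and phenotype.

    expression maps a spot name to its abundances, one value per individual,
    in the same individual order as the phenotype list.
    """
    scores = {name: pearson_r(values, phenotype) for name, values in expression.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


if __name__ == "__main__":
    # Hypothetical data: five hybrids, one phenotypic score each.
    phenotype = [1.0, 2.0, 3.0, 4.0, 5.0]
    expression = {
        "spot_A": [2.0, 4.0, 6.0, 8.0, 10.0],
        "spot_B": [5.0, 3.0, 4.0, 5.0, 3.0],
        "spot_C": [10.0, 8.0, 6.0, 4.0, 2.0],
    }
    for name, r in rank_candidate_proteins(phenotype, expression):
        print(f"{name}: r = {r:+.2f}")
```

Ranking by the absolute value keeps both positively and negatively associated spots as candidates, since either direction may reflect a protein involved in the phenotypic change.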

Several works have found that gene expression itself may be a valuable trait for detecting evolutionary events or patterns (Hegarty et al. 2009; Lockwood & Somero 2010; Wolf et al. 2010) or even for testing evolutionary hypotheses. For example, the faster evolution of male-expressed gene hypothesis (Wu & Davis 1993) predicts a higher evolutionary rate of loci involved in male reproductive organs than in genes associated with other traits (reviewed in Wong 2011). Such evolutionary predictions could be explored by comparing levels of variation in protein expression. For example, Marshall et al. (2011) investigated the possible relationship between male-biased expressed genes and reproductive isolation by comparing the proteome of ejaculates and thorax tissues in two closely related cricket species. They observed that the proteome from ejaculates showed smaller variation within species and larger differentiation between species than the thorax proteome. Furthermore, they confirmed at the gene level that some of these reproductive proteins have evolved under positive natural selection contributing to the divergence between species. This study points to the utility of a preliminary proteomic survey for detecting candidate targets that may be involved in evolutionary and/or adaptive processes, which can then be confirmed a posteriori by complementary techniques. Moreover, a proteomic perspective may add some relevant information per se, even when no particular protein identification can be achieved (Aubin-Horth & Renn 2009). For example, the comparison of protein expression across different age groups and amongst populations of distinct ecotypes/species suggested a pattern of adaptive evolution by paedomorphosis, because protein expression in one ecotype/species resembled the pattern of protein expression in younger stages of a related one (Núñez et al. 2009; Calvete et al. 2010).

Proteomic research strategies have commonly overlooked the genetic basis of the proteome change. Few studies have estimated heritabilities (Diz et al. 2009a) or genetic components underlying protein expression patterns (e.g. Foss et al. 2007; Melzer et al. 2008; Diz et al. 2009a,b; Garge et al. 2010; Martínez-Fernández et al. 2010b). Moreover, a few other studies have investigated the genetic architecture of protein expression from 2-DE maps by using protein expression quantitative trait loci (PQLs or peQTLs; Klose et al. 2002; Foss et al. 2007; Garge et al. 2010; reviewed in Thiellement et al. 1999, 2002; Consoli et al. 2002). However, we still do not know the detailed genetic architecture of quantitative gene expression, except in a few exceptional cases (reviewed in Gibson & Weir 2005), for example the comparison between two yeast strains for protein and corresponding transcript abundances, in which the genetic architecture of both traits was identified (Foss et al. 2007). Interestingly, the QTLs detected for the proteome were usually different from those for the corresponding transcripts. These results suggest that many loci are involved in both proteomic and transcriptomic expression levels (with pleiotropic effects on different transcripts/proteins) but also suggest that some loci may be specifically involved in the observed protein abundance but not necessarily in the corresponding transcript levels. Whether these views turn out to be correct depends on future experimental efforts, but they will surely be crucial for understanding evolutionary change by integrating the different molecular levels with the phenotype.

Additionally, changes in the regulation of genes may be important for phenotypic evolution (Futuyma 2009). A related and also difficult task is to disentangle the particular molecular mechanisms responsible for gene regulation, i.e. whether the determination of the protein expression patterns is obtained preferentially by changes in cis-regulatory elements (regulatory sequence acting as enhancers or inhibitors of transcription factors) or trans-regulatory elements (reviewed in Rodríguez-Trelles et al. 2003; Wray 2007; Hoekstra & Coyne 2007). In this context, it is reasonable to assume that proteomic techniques may be of some help in finding candidate regulatory genes, as their products are expected to be proteins. Another interesting question is the relative contribution of inherited changes in gene expression that arise without any alteration at the DNA sequence level, through mechanisms such as epigenetic inheritance, parental effects or ecological inheritance (see Danchin et al. 2011). Perhaps a new contribution of proteomic techniques to evolutionary studies may be the detection of more cases of regulatory candidate genes, which itself may help to assess the relative importance of different regulatory mechanisms during evolution.

Finally, another interesting but scarcely addressed issue at the proteome level is to obtain a complete understanding of the huge number of interactions between proteins that may be involved in differential expression responses to distinct cellular and environmental stimuli (Kiemer & Cesareni 2007; Kelly & Stumpf 2008). From the analysis of the full set of interactions in the proteome, a new discipline within proteomics has emerged: the study of the interactome, which is expected to be a promising field in future years (Cho et al. 2004; Kelly & Stumpf 2008; Bonetta 2010). Many proteins involved in the same cellular functions interact with each other, and those interactions may present diverse patterns under different situations. Hence, protein interaction maps would be of great value for understanding cellular functions as well as diseases (Cho et al. 2004). Indeed, the strength of protein interactions may be a trait in itself, capable of evolving in our ancestors to produce modern metabolism (Fernández & Lynch 2011). A related issue of recent interest is the combination of proteomic techniques with the study of the whole pattern of metabolites, the metabolome, which aims to explain the complex distribution of metabolites across distinct environmental stressors or cellular conditions (Fernie 2003; Shulaev et al. 2008; Wilmes et al. 2010).

Concluding remarks

Proteomics is now becoming a mature field, thanks to the reported advances in instrumentation and software in recent years (Gstaiger & Aebersold 2009; Gouw et al. 2010). For example, researchers have succeeded in identifying the complete yeast proteome in one experiment of only a few days' duration, including proteins expressed at the level of just a few copies per cell (de Godoy et al. 2008), and have even been able to study the proteome in archaeological tissues, as in Neolithic human bones (Schmidt-Schultz & Schultz 2004). Moreover, proteomics is now moving from more descriptive or exploratory studies to targeted studies (see Kuster et al. 2005), in which specific hypotheses are tested with the aim of answering more specific biological questions. We wish to point out to the evolutionary ecology community that proteomic methods are available, feasible and practical, even for nonmodel organisms, and may provide the possibility of addressing old problems from a new perspective. As in many other fields, proteomics has its own limitations and problems, and future applications should incorporate better experimental designs and methods: following proteomic guidelines, using biological and technical replicates, applying multitest corrections and being careful with the functional interpretation of the observed patterns. Functional proteomics will require detailed and dynamic studies, which are tissue specific, including changes at the ontogenetic level and the possible impact of nongenetic mechanisms. Moreover, technical advances in protein separation and identification will continue to increase the sophistication of the technology, making it more powerful and inexpensive. In fact, it is already possible to apply some proteomic technologies to study the proteomic content of cytological preparations and individual cells (Gry et al. 2010; Eden et al. 2011).
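
The multitest corrections recommended above can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure, one widely used way of controlling the false discovery rate when many protein spots are tested at once. The text does not prescribe a particular correction, so the choice of this procedure, the function name and the example P-values are assumptions made purely for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per P-value: True if the corresponding test
    is declared significant at false discovery rate `alpha` under the
    Benjamini-Hochberg step-up procedure.
    """
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_rank = rank

    # Every test ranked at or below max_rank is declared significant,
    # even if its own P-value exceeded its own threshold.
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            significant[idx] = True
    return significant


# Hypothetical P-values, one per protein spot compared between groups.
spot_p_values = [0.04, 0.001, 0.03, 0.5]
print(benjamini_hochberg(spot_p_values))
```

With these hypothetical values only the second spot remains significant, although three of the four unadjusted P-values fall below 0.05, which shows how easily uncorrected comparisons across many spots can overstate the number of differentially expressed proteins.
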
Despite the general improvement of protein identification technology, a huge scientific effort will be required in the next few years to complete the (dynamic) proteomic map of model organisms (Service 2008). It is too early to ascertain the full possibilities of this and similar approaches, but if they could be used routinely for nonmodel organisms, they may cause another revolution, bringing new detailed data for incorporation into ecological models. Nevertheless, science is not exclusively based on technological improvement. Of course, the development of ideas and hypotheses to test is the main fuel for advances in research, but technical limits have also historically influenced these ideas and hypotheses (Mayr 1982). We think that previous research in molecular ecology has primarily emphasized the genetics of evolutionary change at the DNA level, but perhaps the greatest revolution in store is the ability to integrate the molecular and the phenotypic levels in functional terms (Vasemägi & Primmer 2005; Aubin-Horth & Renn 2009). For this integration, proteomics will be only another tool of the experimental machinery but hopefully not a neglected one.

Acknowledgements

We would like to thank A. Caballero, M. Páez de la Cadena, M. Saura, D.O.F. Skibinski and three anonymous referees for critical reading and helpful comments on various versions of this manuscript, N. Santamaría for administrative and technical help, and the following institutions for general funding: Ministerio de Ciencia e Innovación (MICINN) (CGL2008-00135/BOS and BFU2011-22599), Fondos Feder and Xunta de Galicia (INCITE09 310 006 PR and ‘Grupos de Referencia Competitiva’ 2010/80). M. Martínez-Fernández is currently funded by a ‘Juan de la Cierva’ research fellowship (JCI-2010-06167) from MICINN, and A.P. Diz by an ‘Isidro Parga Pondal’ research fellowship from Xunta de Galicia (Spain).

A.P.D. is a postdoctoral researcher working in evolutionary biology, specifically interested in understanding the functional consequences of genetic changes and the molecular mechanisms underlying the processes of adaptation and ultimately speciation. M.M.-F. is a postdoctoral researcher interested in gaining insights into the processes and mechanisms implicated in the differential gene and protein expression involved in different biological processes. E.R.-A. is a professor of genetics at the University of Vigo, and his main interest is the microevolution of marine organisms, with special emphasis on functional adaptation and speciation processes.
