Proteomics methods for subcellular proteome analysis

Abstract

The elucidation of the subcellular distribution of proteins under different conditions is a major challenge in cell biology. This challenge is further complicated by the multicompartmental and dynamic nature of protein localization. To address this issue, quantitative proteomics workflows have been developed to reliably identify the protein complement of whole organelles, as well as for protein assignment to subcellular location and relative protein quantification based on different cell culture conditions. Here, we review quantitative MS-based approaches that combine cellular fractionation with proteomic analysis. The application of these methods to the characterization of organellar composition and to the determination of the dynamic nature of protein complexes is improving our understanding of protein functions and dynamics.

Abbreviations

ER, endoplasmic reticulum
LOPIT, localization of organelle proteins by isotope tagging
PCP, protein correlation profiling
SILAC, stable isotope labeling by amino acids in cell culture

Introduction

Many pioneering organellar proteomics studies have focused on providing a detailed list of the protein contents of organelles, substructures, or compartments isolated from cells, following specific biochemical enrichment strategies [1-5]. Quantitative proteomics workflows are at the center of subcellular resolution proteomics, and this sets them apart from the high-throughput assignment of proteins to subcellular compartments with methods such as protein correlation profiling [5, 6], recording the number of ions detected per protein [3, 4], or localization of organelle proteins by isotope tagging (LOPIT) [7, 8]. Although these methods have allowed the characterization of protein localization within these organelles, the interpretation of these protein inventories is complicated by the fact that many proteins are not exclusive to one compartment, but are instead present in separate subcellular locations [9, 10], highlighting the importance of measuring the relative abundance of proteins in different locations. Moreover, the dynamic nature of cellular processes underlines the need to assess how subcellular localization can change between different compartments under different cell growth and physiological conditions.

The challenge in organellar proteomics has now moved from the identification of proteins to measuring changes in protein abundance, modification state, and localization within the cell. The emergence of this next generation of proteomics methods is characterized by the development of quantitative MS-based methods, which have moved from primarily identifying proteins, to also providing measurements of relative changes in protein levels or localization under different experimental conditions.

In this review, we describe recent advances in quantitative MS-based proteomics methods for studying the subcellular proteome arrangement and dynamic changes upon perturbations.

Single-organelle proteomics

The combination of MS-based proteomics methods and traditional biochemical fractionation protocols has been a logical step in the characterization of subcellular organization. The protein contents of specific subcellular compartments can thus be identified following specific enrichment strategies that concentrate and enrich organelles and subcompartments of the cell. The importance of identifying a protein within a specific cellular compartment is that it allows a better understanding of the function of both the protein and the compartment. Indeed, the identification of a protein of known function within a specific compartment is a strong indication that the function is either present within that compartment, or that regulation of the protein can occur within it. Similarly, identification of an uncharacterized protein within a cellular compartment can provide clues to its function. High-throughput protein identification with proteomics methods is proving to be a key approach in elucidating the function of proteins by association, and is an efficient way to study the complexity of mechanisms involved in the regulation of cellular functions [11].

The typical workflow starts with a biochemical isolation method suitable for a specific organelle (Fig. 1). Following the enrichment procedure, verification of the purity of the isolated organelles is necessary. For example, the isolation of the organelle can be ensured by the presence of a specific enzymatic activity, such as galactosyl transferase activity assay for the Golgi [12], with morphometric analysis at the electron microscopy level to assess homogeneity during the isolation [3], or with immunofluorescence microscopy [12]. However, none of these methods is likely to be as sensitive as MS/MS in detecting low levels of contamination, and they will only allow rough characterization of the enrichment method with known protein markers. The complex mixture of proteins generated will often be fractionated to reduce the complexity of the sample. Commonly used methods consist of either one-dimensional SDS/PAGE or two-dimensional SDS/PAGE followed by in-gel digestion to generate peptides, or direct solubilization of proteins for in-solution digestion. Alternatively, proteins can be prefractionated with size exclusion chromatography or strong cation exchange prior to enzymatic digestion. The resulting mixture of peptides is then separated by HPLC combined with MS. Proteins will then be identified by tandem MS with collision-induced dissociation to generate a set of fragments from a specific peptide ion. The sequencing of genomes of several species has resulted in the development of proteome databases allowing peptide matching of MS data [13]. This allows accurate identification of proteins in the original sample, providing an inventory of the proteins that are present in the isolated organelle.

Figure 1.

A typical organellar proteomics workflow. The typical workflow starts with a biochemical enrichment method that is suitable for a specific organelle (1). The complex mixture of proteins generated is generally fractionated to reduce the complexity of the sample, most often by one-dimensional SDS/PAGE (2), and this is followed by in-gel digestion or direct solubilization and in-solution digestion to generate peptides (3). This mixture of peptides is then separated by HPLC combined with MS (4), and this is followed by data analysis for protein identification (5).

For several compartments, such as the nucleus and the mitochondrion, biochemical enrichment is generally easier to achieve by disrupting cells under mild conditions to maintain organelle integrity, and then using differential centrifugation to allow enrichment of seemingly intact organelles from the whole cell extract [1, 14]. Such methods for isolation of organelles are based on differences in sedimentation coefficients and densities of organelles. Organelles can thus be separated by density gradient centrifugation in buffers that roughly preserve the integrity of the organelle. In a more chemically oriented approach, detergents with increasing solubilization efficiency can be used to enrich specific cellular fractions [15, 16]. Several commercially available isolation kits have been developed to separate different subcellular compartments, such as the QProteome Cell Compartment Kit (Qiagen, Hilden, Germany), the Subcellular Protein Fractionation Kit (Pierce, Rockford, IL, USA), and several others. Most of these kits can be easily used in benchtop centrifuges and are suitable for subsequent MS analysis, thereby increasing the reproducibility while being cost-effective and time-effective. However, such kits are often limited with regard to what organelles can be enriched.

Free flow electrophoresis is a highly versatile technology that was introduced nearly 50 years ago for the separation of a wide variety of charged analytes, including low molecular mass organic compounds, peptides, proteins, protein complexes, membranes, organelles, and whole cells in aqueous media under native and denaturing conditions [17]. The analyte of interest is injected into a thin, laminar separation buffer film, and deflected perpendicularly by an electric field. The absence of any kind of matrix results in high sample recovery rates and fast fractionation. Furthermore, the enrichment allows the visualization of less abundant proteins for subsequent separation by electrophoresis or LC-MS analysis. Experience with free flow electrophoresis for the enrichment of organelles has been very limited up to now, but includes, for example, the isolation of peroxisomes [18], as well as endosomes, lysosomes, and phagosomes [19]. Another method for enrichment of cell organelles is immunoprecipitation with specific antibodies directed against epitopes presented on the surface of the organelle [20]. Heterologous proteins or affinity tags such as tandem affinity purification tags [21] and FLAG [22] presented on an organelle surface can also be used for the isolation of an organelle of interest by expressing a resident protein of that organelle. However, the introduction of a tag and overexpression of the tagged proteins can lead to mislocalization of the proteins.

This general approach of cellular fractionation to enrich fractions in specific organelles has been an area of intense activity in recent years, and has yielded extensive lists of proteins associated with different subcellular compartments [23]. Several groups have been able to isolate diverse physical compartments to perform proteomics studies, such as the Golgi complex [24], clathrin-coated vesicles [25], mitochondria [14, 26], phagosomes [27], the nuclear envelope [28], and large functional complexes such as the nuclear pore [29] and the spliceosome [30]. Further development of these methods allowed researchers to increase the resolution of the enriched organelle. For example, an isolation of the Golgi described by Bell et al. [24] led to the identification of > 1400 proteins in different components of the secretory pathway, which consists of rough and smooth ER, Golgi, and Golgi-derived coat protein I vesicles [3]. In another example, using a method combining high pH and proteinase K digestion for the global analysis of both membrane and soluble proteins from membrane-containing samples, Wu et al. provided one of the first analyses of the Golgi membrane proteome where it was possible to determine the orientation of the membrane proteins [31]. Whereas most methods will solubilize membrane proteins and use specific proteases, the advantage of the method developed by Wu et al. is that it provides information on membrane protein topology.

Another approach to organellar proteomics is to quantitatively follow similarities in protein behavior during the enrichment procedure. The idea is that proteins within an organelle should be present with patterns similar to those of known proteins from the organelle of interest. This principle is used in protein correlation profiling (PCP) [32], which is based on the consensus profile obtained by measuring peptides generated by proteolysis of proteins from different fractions of a density centrifugation gradient. Andersen et al. identified 41 likely candidates of the centrosome and validated 23 novel components by using PCP [32]. All of these proteins showed a correlation with already known centrosomal proteins. Proteins present in lipid droplets have also been identified with PCP by correlating the enrichment profile of proteins identified by MS with that of known lipid droplet proteins [33]. This method was particularly suitable because of the difficulty of isolating lipid droplets from other membrane-bound cellular organelles such as the ER, peroxisomes, and endosomes. The authors were able to distinguish 111 lipid droplet proteins from the 1481 proteins identified in the different fractions [33].
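The profile-matching idea behind PCP can be illustrated with a minimal sketch. It assumes that each protein is summarized by one intensity value per gradient fraction, that profiles are scaled to their maximum, and that similarity to the marker consensus is scored with a Pearson correlation. The scoring function, the cutoff, the protein names, and all intensities are hypothetical choices made for illustration; they are not taken from the cited studies.

```python
from math import sqrt

def normalize(profile):
    """Scale a profile so that its highest fraction equals 1 (assumed normalization)."""
    peak = max(profile)
    return [x / peak for x in profile]

def consensus(marker_profiles):
    """Average the normalized marker profiles, fraction by fraction."""
    normed = [normalize(p) for p in marker_profiles]
    return [sum(col) / len(col) for col in zip(*normed)]

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length profiles."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical intensities across five gradient fractions.
markers = [[1, 4, 10, 5, 1], [2, 7, 20, 9, 2]]
candidates = {
    "candidate_A": [3, 11, 30, 14, 3],  # peaks in the same fraction as the markers
    "candidate_B": [30, 12, 5, 2, 1],   # peaks in the first fraction instead
}

reference = consensus(markers)
scores = {name: pearson(normalize(p), reference) for name, p in candidates.items()}

cutoff = 0.9  # arbitrary illustrative threshold, not a published value
assigned = [name for name, r in scores.items() if r >= cutoff]
print(assigned)
```

In this toy data set only the candidate that co-fractionates with the markers passes the cutoff. A real analysis would work from peptide-level measurements and would need a cutoff justified against known non-members of the organelle.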

The technological advances in MS in recent years have moved the major source of error in organellar proteomics from the erroneous identification of proteins to the identification of contaminants (Fig. 2). Indeed, many early experiments had a significant number of proteins identified as false positives resulting from misidentifications [34], often because of the inability of the instrument to reliably identify low-abundance proteins in the sample. This led to a choice between being overly inclusive at the risk of increasing the error in the identification, or setting a very stringent threshold of confidence, resulting in missing important protein components. It is noteworthy that many interesting proteins (i.e. poorly characterized) are of low abundance with tightly regulated expression, whereas highly abundant and thus easily identifiable proteins are often more ubiquitous or involved in structural cellular functions [35]. Currently, proteomics techniques can identify thousands of proteins in enriched organelle samples where a substantially lower number of identifications would be expected. The additional identifications from seemingly highly pure fractions thus most likely result from minute amounts of other proteins carried over during the purification procedure. This issue is exemplified by the isolation of nucleoli and the subsequent identification of nucleolar proteins. An isolation protocol for the nucleolus with a combination of sonication and sucrose density centrifugation was originally described [1], and identified 271 proteins in nucleoli, including proteins with no known nucleolar functions [36]. This number increased to 489 proteins when the dynamics of protein influx within the nucleolus following inhibition of transcription were studied [37], and later to > 4500 proteins within isolated nucleoli [38]. Clearly, this number is approaching the number of proteins expressed in the whole cell, and probably reflects the ability to identify contaminating proteins present in the enriched fraction.

Figure 2.

Identification of an organellar protein. A protein identified in a fraction enriched in a specific organelle or in an enriched subcellular compartment (the nucleolus is used here as an example) can be a genuine component of the organelle (1), a protein found to be partially localized in that compartment (2), or a protein predominantly found elsewhere in the cell, but also in small amounts in the organelle of interest (3). It is also possible to identify contaminants, such as proteins within the cells that are normally never localized within the isolated organelle (4) or proteins external to the experiment, such as keratins (5).

All of the approaches described above allow the identification of proteins within the organelle following its isolation. However, the analysis of organellar content is complicated, because many proteins are present in multiple compartments [9, 10]. As a result, the identification of a protein in a specific cell organelle is not sufficient, and it is necessary to study the quantitative distribution of proteins across organelles in order to obtain a comprehensive picture of the biological system and to grasp the role of localization change in the regulation of biological processes. Measurement of the relative abundance of proteins in different locations, and assessment of location changes between different compartments, under different cell growth and physiological conditions, are thus necessary.

Subcellular-resolution proteomics

Although organelle-based approaches can provide valuable information about specific subcellular compartments in isolation, it is also important to study protein localization in the context of the whole cell to obtain a system-wide view of proteome organization. When a protein is identified in an enriched organelle or a subcellular compartment, it is often relevant to measure whether the protein identified is a main component of the organelle, is partially localized in that compartment, or is merely an experimental contaminant (Fig. 2). Intuitively, the identification of an abundant protein in a subcellular compartment suggests that the protein is mainly localized within that compartment, which would be reflected by its predominant localization within that compartment if tested by methods such as immunofluorescence microscopy (Fig. 2). However, as the limit of detection of the mass spectrometers is improving, it is possible to also identify proteins that are mostly localized elsewhere in the cell, but are present in low amounts or under specific conditions within the organelle. Whereas the identification of such proteins can still be relevant to the functions of the organelle, it deviates from the localization annotation of such proteins obtained with more traditional methods (immunofluorescence microscopy, green fluorescent protein tagging, biochemical fractionation, etc.). Finally, it is also possible to identify contaminants, both external to the experiment, such as keratins, and internal to the experiment, such as proteins from elsewhere in the cells that never localize in the isolated organelle. In high-throughput assignment of proteins to subcellular fractions, quantitative proteomics methods have been developed to address the localization of proteins within the cells.

An extension to the PCP method can be applied to the study of several subcellular compartments in a single experiment. Foster et al. mapped > 1400 proteins to as many as 10 subcellular compartments, including nuclear and cytosolic fractions, by using a combination of several well-characterized marker proteins [6]. With this approach, the localization of proteins is assigned simultaneously to 10 different subcellular locations [6]. Although these methods do not use isotope labels, another method that has been developed is based on stable isotopic labels: LOPIT [7]. LOPIT can also assign proteins to subcellular compartments according to the principle that proteins residing in the same organelle cofractionate and show similar distribution profiles after density gradient centrifugation [39]. The protein distribution in the gradient fractions is correlated by western blotting with antibodies for a specific marker of each organelle. The measurement of the relative abundance of proteins between different fractions along the length of the gradients is possible with labeling of the protein of each fraction by the use of, for example, isobaric tag for relative and absolute quantitation [39] or isotope-coded affinity tag [7]. With LOPIT based on isobaric tag for relative and absolute quantitation labeling, > 500 proteins were simultaneously localized in different organelles, with quantitation of each protein relative to the studied organelles [40].
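The cofractionation principle used to assign proteins to several compartments at once can be sketched as a nearest-profile lookup. This is a deliberately simplified stand-in, not the statistical procedure used in the cited PCP or LOPIT studies: it assumes that one representative profile per organelle has already been derived from marker proteins, and it assigns each unknown protein to the organelle with the smallest squared Euclidean distance. All profiles and protein names are hypothetical.

```python
def to_fractions(profile):
    """Express a profile as the share of total signal found in each gradient fraction."""
    total = sum(profile)
    return [x / total for x in profile]

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(profile, organelle_profiles):
    """Return the organelle whose marker profile lies closest to the given profile."""
    scaled = to_fractions(profile)
    return min(
        organelle_profiles,
        key=lambda name: squared_distance(scaled, to_fractions(organelle_profiles[name])),
    )

# Hypothetical marker-derived profiles over four gradient fractions.
organelle_profiles = {
    "ER": [8, 4, 1, 1],
    "Golgi": [1, 7, 5, 1],
    "mitochondria": [1, 1, 4, 9],
}

unknowns = {"protein_X": [2, 13, 10, 3], "protein_Y": [1, 2, 7, 20]}
assignments = {name: assign(p, organelle_profiles) for name, p in unknowns.items()}
print(assignments)
```

A nearest-profile rule always returns an answer, so it cannot flag proteins that reside in several compartments or in none of the modeled ones; a practical workflow would add a confidence measure before accepting an assignment.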

Stable isotope labeling by amino acids in cell culture (SILAC)-based approaches can also be used to determine the subcellular localization of proteins in the context of the whole cell. SILAC involves the use of stable isotopic atoms along with MS for quantitative analysis [41]. This method allows quantitative analyses of proteins by comparison of the masses of light and heavier forms of the same peptide from a given protein, arising from the presence of heavier, stable isotopes such as 13C, 2H, and 15N. A technique called spatial proteomics that measures the cellular distribution of thousands of proteins by using a combination of cellular fractionation and MS was recently developed [42]. This method involves first culturing cells with SILAC medium to ensure that proteins are fully labeled [41]. The labeled cells are separated into fractions, which are then recombined such that each fraction has a distinct isotope signature. The labeling thus allows quantification of the relative abundance of peptides originating from different subcellular fractions following the identification of proteins by MS, and was used to study the relative distribution of the proteome between the cytoplasm, nucleus, and nucleolus [42].
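As a rough illustration of how a distinct isotope signature per fraction translates into a subcellular distribution, the sketch below converts three channel intensities for one protein into fractional abundances. It assumes that the light, medium, and heavy channels correspond to the cytoplasmic, nuclear, and nucleolar fractions, respectively, and that the fractions were recombined in equal cell equivalents. The channel assignment, the protein names, and the intensities are hypothetical.

```python
# Assumed mapping of isotope channels to fractions: light, medium, heavy.
CHANNELS = ("cytoplasm", "nucleus", "nucleolus")

def distribution(intensities):
    """Convert per-channel intensities into the share of protein in each compartment."""
    total = sum(intensities)
    return {compartment: value / total for compartment, value in zip(CHANNELS, intensities)}

def main_compartment(intensities):
    """Return the compartment holding the largest share of the protein."""
    shares = distribution(intensities)
    return max(shares, key=shares.get)

# Hypothetical summed peptide intensities (light, medium, heavy) per protein.
protein_intensities = {
    "protein_A": (100.0, 300.0, 1600.0),
    "protein_B": (900.0, 80.0, 20.0),
}

for name, values in protein_intensities.items():
    shares = distribution(values)
    summary = ", ".join(f"{c}: {s:.2f}" for c, s in shares.items())
    print(f"{name} -> {main_compartment(values)} ({summary})")
```

Expressing the result as shares rather than a single label keeps proteins that are split between compartments visible, which is the main advantage of this quantitative readout over a simple inventory.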

These types of quantitative proteomics method are particularly useful when sufficient enrichment of a specific organelle is not possible. Indeed, most methods providing an inventory of proteins in an organelle are not able to provide information on the multicompartmentalization of proteins, and cannot differentiate potential contaminants. By the use of quantitative strategies that can provide an enrichment coefficient of proteins between different compartments, it is possible to measure the abundance of proteins in different compartments.

Dynamics of proteome localization

Although static snapshots of protein localization provide valuable information, the proteome dynamically responds in space and time [43]. One of the major challenges in cell biology is to not only assess the localization of proteins, but also study how the localization of proteins changes in response to perturbations or stimulations. Although imaging techniques allow high-resolution analyses, they are often limited to a small number of proteins at a time. MS-based methods are now emerging as a methodology that allows both quantification of the localization of proteins within the cells, and measurement of dynamic changes in protein localization under different cellular conditions.

Some of the approaches that have combined organellar isolation and MS analysis of the protein content have highlighted changes in the proteome in specific organelles in response to treatment. Several quantitative proteomics methods can be used to measure changes in the protein content of organelles. For example, an organellar proteomics approach was utilized to quantify alterations in protein abundance in mitochondria enriched from cisplatin-sensitive and cisplatin-resistant human epithelial ovarian cancer cell lines by spectral counting [44]. It is also possible to measure the flux of proteins entering and leaving an organelle following a specific treatment by quantifying the changes at several time points, providing the dynamics of changes over time. Using SILAC and isolation of nucleoli, Andersen et al. were able to characterize changes in the protein content of the nucleoli in response to three different metabolic inhibitors [37]. These types of experiment provide an approach for the temporal characterization of the protein content of cellular organelles, and demonstrate the dynamics of protein localization in specific subcellular compartments under different cellular growth conditions. However, the use of isolated organelles makes it difficult to differentiate between changes in protein expression and changes in protein localization.
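A spectral counting comparison of the kind mentioned above can be sketched with the normalized spectral abundance factor (NSAF), one commonly used normalization in which each protein's spectral count is divided by its length and then by the sum of these values over all proteins in the sample. Whether the cited study used this exact normalization is not stated here, so it should be read as an assumption; the protein names, lengths, and counts are hypothetical.

```python
from math import log2

def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor: (count / length) scaled to sum to 1."""
    saf = {p: spectral_counts[p] / lengths[p] for p in spectral_counts}
    total = sum(saf.values())
    return {p: value / total for p, value in saf.items()}

# Hypothetical protein lengths (residues) and spectral counts from two samples.
lengths = {"protein_A": 200, "protein_B": 400, "protein_C": 100}
sensitive = {"protein_A": 20, "protein_B": 40, "protein_C": 10}
resistant = {"protein_A": 40, "protein_B": 40, "protein_C": 10}

nsaf_sensitive = nsaf(sensitive, lengths)
nsaf_resistant = nsaf(resistant, lengths)

# log2 ratio of normalized abundance, resistant over sensitive.
# Assumes every protein has a nonzero count in both samples.
log2_change = {p: log2(nsaf_resistant[p] / nsaf_sensitive[p]) for p in lengths}
print(log2_change)
```

Because NSAF values sum to 1 within a sample, an increase in one protein lowers the normalized values of all the others, so small negative changes for unchanged proteins are an artifact of the normalization.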

An evolution of the SILAC-based spatial proteomics method can be used to quantify changes in protein localization. The comparison of ratios between two experiments allows identification of the subcellular location of uncharacterized proteins and determination of changes in the subcellular location of proteins in response to a stress. By comparing the ratios of one protein that is present in different compartments in response to a stimulus, a change in the location of the protein can be quantified. This method has been used to identify changes in the subcellular localization of proteins in response to the topoisomerase inhibitor etoposide, which leads to DNA double-stranded breaks [42], and the role of p53 in the regulation of the cellular response to DNA damage [45]. This method can thus provide accurate information at a proteome-wide level on the changes in protein localization. The spatial proteomics approach provides a discovery method to identify new protein components of organelles, as well as to identify proteins affected by different cellular treatments. The results obtained with this method can be used to further analyze and independently verify data from microscopy and molecular studies that are not readily applicable in a high-throughput manner, and can be used both to characterize a wide range of different cell types and, combined with alternative fractionation techniques, to analyze multiple subcellular compartments and structures. In this particular case, cellular proteins were separated into cytoplasmic, nuclear and nucleolar fractions. Because of the limitation in the number of isotopes that can be used in a single experiment, the resolution thus far has been limited to only three [42] or four [46] subcellular compartments and a single time point following a cellular perturbation. However, it should be possible to increase the spatial and time resolution by using a reference protein set to compare each subcellular compartment and each time point in order to map protein flux in response to cellular stimuli.
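The comparison of compartment ratios between two conditions can be made concrete with a small sketch. It assumes that the relative amount of one protein in each compartment has been measured in a control and in a treated experiment, converts both to compartment shares, and reports the log2 change of each share. Working with shares rather than raw amounts is a choice made here so that a uniform change in abundance is not mistaken for relocalization. All numbers are hypothetical.

```python
from math import log2

def shares(amounts):
    """Share of the protein found in each compartment."""
    total = sum(amounts.values())
    return {compartment: value / total for compartment, value in amounts.items()}

def share_shift(control, treated):
    """log2 change in each compartment's share after treatment.

    Assumes the protein is detected (nonzero) in every compartment in both conditions.
    """
    before, after = shares(control), shares(treated)
    return {compartment: log2(after[compartment] / before[compartment]) for compartment in control}

# Hypothetical relative amounts of one protein per compartment.
control = {"cytoplasm": 10.0, "nucleus": 30.0, "nucleolus": 60.0}
treated = {"cytoplasm": 10.0, "nucleus": 60.0, "nucleolus": 30.0}

shift = share_shift(control, treated)
print(shift)

# Doubling the protein everywhere changes abundance but not localization.
doubled = {compartment: 2.0 * value for compartment, value in control.items()}
print(share_shift(control, doubled))
```

In the first comparison the nuclear share doubles while the nucleolar share halves, which is the signature of relocalization; in the second, all shifts are zero even though total abundance has doubled.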

Protein turnover and protein degradation are highly regulated biological processes that can occur in different cellular compartments. Therefore, characterizing the mechanisms regulating protein degradation is relevant for understanding organelle dynamics and regulation. In general, the approaches described above have been used to identify the protein inventory over time or under different conditions. However, those approaches do not provide information regarding the mechanisms underlying the changes in protein dynamics. Stable isotope incorporation during protein translation can also be used transiently for pulse labeling of proteins to measure protein synthesis and degradation rates, a method called pulsed SILAC [47]. This method has been combined with subcellular fractionation protocols to analyze protein degradation in multiple subcellular fractions of different human cell lines: HeLa [48] and U2OS [46] cells. Similarly, the turnover of nuclear proteins from rat livers and brains following metabolic labeling allowed the identification of extremely long-lived proteins [49]. These studies analyzed both global and compartment-specific protein degradation, highlighting biological processes in which the control and compartmentalization of protein degradation is likely to play an important role.
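To show how a pulsed SILAC measurement relates to a degradation rate, the sketch below uses a simple first-order model. It assumes that cells are switched to heavy medium at time zero, that the protein is at steady state, that newly made protein is fully heavy, and that dilution by cell division can be ignored. Under these assumptions the light pool decays as exp(-k t) while the heavy pool makes up the difference, so the heavy-to-light ratio equals exp(k t) - 1. This model and the example ratios are illustrative assumptions, not the analysis used in the cited studies.

```python
from math import log

def degradation_rate(heavy_to_light, hours):
    """First-order rate constant k (per hour) from H/L = exp(k * t) - 1."""
    return log(1.0 + heavy_to_light) / hours

def half_life(heavy_to_light, hours):
    """Protein half-life in hours, ln(2) / k, under the same model."""
    return log(2.0) / degradation_rate(heavy_to_light, hours)

# Hypothetical H/L ratios for one protein measured 10 h after the medium switch
# in two subcellular fractions.
ratios = {"cytoplasm": 1.0, "nucleus": 3.0}
pulse_hours = 10.0

for fraction, ratio in ratios.items():
    print(f"{fraction}: half-life {half_life(ratio, pulse_hours):.1f} h")
```

An H/L ratio of 1 at the sampling time means that half of the pre-existing protein has been replaced, so the half-life equals the pulse duration; a higher ratio in another fraction indicates faster turnover there. In dividing cells the growth rate would have to be subtracted from k before interpreting it as degradation.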

Bioinformatics tools for the analysis of organelle proteomics data

Classification, interpretation and analysis of the data are of great importance in order to understand and share the results obtained in subcellular proteomics experiments. Therefore, bioinformatics tools have been developed to create databases and data repositories providing accessible inventories by merging the data of several organellar proteomics experiments [50]. A major initiative with the aim of standardizing protein attribution is the Gene Ontology project, which classifies proteins by localization as well as by biological process and molecular function [51]. Interpretation of the data is also important in order to understand the different functions of organelles, and to infer protein function. Thus, it is possible to use this information to predict protein–protein interactions and protein localization, as well as to identify features such as protein localization signals. Improvement in organelle resolution when assigning protein–organelle association can also be achieved by using an algorithm that analyzes several experiments that, individually, do not allow full differentiation between different organelles [52], highlighting the need for robust analytical methodology allowing better identification of protein localization.

Conclusion and outlook

Organellar proteomics has come a long way since the pioneering studies providing early inventories of different subcellular compartments, to now being able to measure complex changes in proteomes of multiple compartments in single experiments. With the development of MS-based methods for the characterization of protein localization and the accumulation of proteomics data, a more accurate organellar location of the proteome is emerging. The main challenge is now moving towards integrating the large amount of data generated in those studies. These types of experiment have been proposed as ‘third-generation proteomics’ [43], which has evolved from the development of methods for simple protein identification to being able to measure the dynamic responses of entire proteomes in both space and time. The challenge ahead will now consist of increasing the spatial and time resolution of these experiments to provide cell-wide mapping of the dynamic changes in protein subcellular localization. These next-generation methods for measuring the localization of proteins on a large scale will have a major impact on cell biology, improving our understanding of protein localization and how cells respond to different conditions.

Acknowledgements

F.-M. Boisvert is funded by the Canadian Institutes of Health Research (MOP-123469) and the Natural Sciences and Engineering Research Council of Canada (418404-2012).
