Concise Review: Trends in Stem Cell Proteomics


  • Hossein Baharvand Ph.D.,

    Corresponding author
    1. Department of Stem Cells, Royan Institute, Tehran, Iran
    • Department of Stem Cells, Royan Institute, P.O. Box 19395-4644, Tehran, Iran. Telephone: 98-21-22172330; Fax: 98-21-22414532
  • Ali Fathi,

    1. Department of Stem Cells, Royan Institute, Tehran, Iran
  • Dennis van Hoof,

    1. Hubrecht Laboratory, Netherlands Institute of Developmental Biology, Utrecht, The Netherlands
    2. Department of Biomolecular Mass Spectrometry, Utrecht University, Utrecht, The Netherlands
  • Ghasem Hosseini Salekdeh Ph.D.

    Corresponding author
    1. Department of Physiology and Proteomics, Agricultural Biotechnology Research Institute of Iran, Karaj, Iran
    • Physiology and Proteomics Department, Agricultural Biotechnology Research Institute of Iran (ABRII), P.O. Box 31535-1897, Karaj, Iran. Telephone: 0098-261-2702893; Fax: 98-261-2704539


Gene expression analyses of stem cells (SCs) will help to uncover or further define signaling pathways and molecular mechanisms involved in the maintenance of self-renewal, pluripotency, and/or multipotency. In recent years, proteomic approaches have produced a wealth of data identifying proteins and mechanisms involved in SC proliferation and differentiation. Although many proteomics techniques have been developed and improved in peptide and protein separation, as well as mass spectrometry, several important issues, including sample heterogeneity, post-translational modifications, protein-protein interaction, and high-throughput quantification of hydrophobic and low-abundance proteins, still remain to be addressed and require further technical optimization. This review summarizes the methodologies used and the information gathered with proteome analyses of SCs, and it discusses biological and technical challenges for proteomic study of SCs.

Disclosure of potential conflicts of interest is found at the end of this article.


Stem cells (SCs) are undifferentiated cells generally characterized by their functional capacity both to self-renew and to generate a large number of differentiated progeny cells [1]. Conventionally, SCs are classified as derived from either embryonic or adult tissues. Embryonic SCs (ESCs), embryonal carcinoma cells (ECCs), and embryonic germ cells are derived from the preimplantation embryo (e.g., inner cell mass of blastocyst, morula [reviewed in [2]], and single blastomeres [3]), teratocarcinomas, and primordial germ cells, respectively. These cells are pluripotent; that is, they have the ability to form all embryonic germ layer derivatives but not extraembryonic tissues (e.g., placenta). SCs found in adult organisms are present in most tissues and are referred to as adult SCs, such as mesenchymal SCs (MSCs), hematopoietic SCs (HSCs), and neural SCs (NSCs) [4]. They are considered multipotent, since they can produce mature cell types of one or more lineages but cannot reconstitute the organism as a whole. SC potency depends largely on intrinsic properties of SCs, as well as on extrinsic cues provided by the niche (the microenvironment in which SCs reside). Because of their exceptional properties, SCs have the potential to be used in developmental biology, drug screening, functional genomics applications, and regenerative medicine.

Gene expression analyses of SCs will help uncover and further define signaling pathways and molecular mechanisms involved in the maintenance of the undifferentiated state and the initial loss of pluripotency and/or multipotency. A detailed understanding of these molecular mechanisms will thus be essential for the aforementioned SC applications. In contrast to transcriptome studies, which rely on microarrays [5–24], proteome-level analyses can elucidate important properties of proteins, such as their amount, stability, subcellular localization, post-translational modifications (PTMs), and interactions. Wilkins et al. [25] coined the term “proteome” (PROTEins expressed by a genOME) to refer to the total set of proteins expressed in a cell, tissue, or organism.

Currently, two-dimensional gel electrophoresis (2-DE) and non-2-DE-based approaches are broadly applied to proteomic analyses. Applying proteomics to investigate the programs that control self-renewal, differentiation, and plasticity will provide valuable insight into how the factors involved induce differentiation of SCs to specific lineages.

Recent reviews have comprehensively addressed various aspects that are relevant in the context of SC proteomics [26–29]. Here, we review various proteomics methodologies used to study SCs, review proteome analyses of SCs, discuss biological and technical challenges encountered with proteomic studies of SCs, and provide insight into how proteomics-based research is likely to develop.

Proteomics: An Overview of Technology

Sample Preparation and Protein Extraction

Although proteomic analysis can be used for qualitative comparisons, it is much more informative when used quantitatively. The isolation of proteins from SCs and their derivatives for proteome analyses is complicated by the size and diversity of the proteome. The human genome harbors 26,000–31,000 protein-encoding genes [30], whereas the total number of human protein products, including splice variants and essential PTMs, has been estimated to be close to 1 million [31, 32]. Another important factor for proteomic analysis is the dynamic range of protein concentrations; one cell can contain between 1 and more than 100,000 copies of a single protein [33]. This wide dynamic range can be partially covered by fractionating the proteome into subproteomes, for example, by applying affinity purification [34]. Sample complexity can also be reduced by specifically isolating individual proteins or protein complexes. In general, hydrophobic membrane proteins are much more difficult to handle than hydrophilic proteins; hence, hydrophobic proteins require specific extraction procedures [35, 36].

Two-Dimensional Electrophoresis

High-resolution 2-DE of proteins is the fundamental tool of proteomics and allows thousands of proteins to be analyzed simultaneously (Fig. 1). 2-DE has been available since 1975 [37]. In 1988, a basic protocol of electrophoresis with immobilized pH gradients (IPGs) was described [38]. The advent of IPGs for the first dimension has produced significant improvements in 2-DE separation, with higher resolution, improved reproducibility, and higher loading capacity for preparative gels [39]. Other important technological advances in 2-DE include the development of sensitive protein stains and the use of in-gel sample application in contrast to loading at either anodic or cathodic ends of the gel.

Figure 1.

A schematic representation of differential protein display using two-dimensional gel electrophoresis (2-DE) and difference in-gel electrophoresis (DIGE). The number in the center of panel represents the “stage” in the legend. (A): Cell samples are grown under different conditions/treatments (stage I) (1), and total proteins are extracted (2) and subjected to isoelectric focusing (IEF) (first-dimension electrophoresis) (3). The IEF gels are reduced with dithiothreitol and alkylated with iodoacetamide prior to SDS-polyacrylamide gel electrophoresis (SDS-PAGE) (second-dimension electrophoresis) (4). The first dimension separates proteins according to isoelectric point (pI), whereas the second dimension separates them approximately according to molecular weight (Mr). Proteins are then visualized using silver, Coomassie Brilliant Blue (CBB), or SYPRO Ruby staining methods (5), and the protein pattern is captured by a high-resolution camera or densitometer and analyzed by software (6). The Mr and pI of each protein are estimated by comparison with the mobility of standard proteins, and changes in staining intensity between replicate gels or between treatments are measured. For protein identification, gels are stained with MS-compatible stains such as SYPRO Ruby or CBB. Protein spots are excised from the gel and analyzed by MS (7). (B): DIGE system includes steps common to 2-DE. However, in DIGE, two protein samples are differentially labeled by Cy3 and Cy5, respectively, and then mixed (8) and run on the same IEF and SDS-PAGE gel. This allows coseparation of different labeled samples in the same gel and ensures that all samples will be subject to exactly the same first- and second-dimension electrophoresis running conditions, limiting experimental variation and resulting in accurate within-gel matching [113]. 
Following 2D separation, the gel is imaged at the excitation wavelength of each fluorescent dye using a scanner, after which an overlay image can be generated (9). Differences in protein abundance are then accurately quantified using software. Coomassie stain is the most economical and the easiest to use, but with a detection limit of 50–100 ng per spot, it is also the least sensitive. Apart from radioactive gel visualization, silver staining is the most sensitive method, with a detection limit of 1–2 ng. SYPRO Ruby is very easy to use and almost as sensitive as silver stain, but it is more expensive than CBB and silver staining [114]. Cy3/Cy5 labeling is the most expensive method and requires costly imaging systems, but it shows very little variability between gels and operators. Unlike silver staining, CBB, SYPRO Ruby, and Cy3/Cy5 are compatible with MS. When the aim of the study is relative quantification between samples, a wide linear dynamic range is required. CBB and silver stain have a narrow dynamic range (approximately 10-fold), whereas SYPRO Ruby and Cy3/Cy5 offer a much wider dynamic range (1,000-fold) and show better correlation between spot density and protein content than silver staining. Abbreviations: 2D, two-dimensional; MS, mass spectrometry.

Mass Spectrometry

The most significant breakthrough in proteomics has been the application of mass spectrometry (MS). It allows the identification of proteins in the femtomole to picomole range and has superseded classic Edman N-terminal sequencing, which is less automated and less sensitive and requires an unblocked N terminus [40, 41]. The main components of a mass spectrometer are an ion source, one or several mass analyzers that measure the mass-to-charge ratio (m/z) of the ionized analytes, and a detector that registers the number of ions at each m/z value (Fig. 2).

Figure 2.

Schematic overview of peptide sequencing by tandem MS. The main components of a mass spectrometer are an ion source, one or more mass analyzers that measure the mass-to-charge ratio (m/z) of the ionized analytes, and a detector that registers the number of ions at each m/z value. The gas-phase ions are produced in the ion source, after which they enter the mass analyzer and are separated according to their m/z ratios. MALDI (A, B) and ESI (C–E) are the two techniques most commonly used to volatilize and ionize a variety of biomolecules, including peptides, proteins, metabolites, and oligonucleotides, to enable their measurement by MS. The mass analyzer is the central component behind the technology. Examples of mass analyzers currently used in proteomics research are TOF, Q, triple Q or LIT, ion trap, Fourier transform ion cyclotron resonance (FTICR), and Orbitrap. The TOF analyzers separate ions based on the differences in transit time from the ion source to the detector in flight tubes under vacuum (A–C). The Q analyzer transmits only ions within a narrow m/z range and uses the stability of the trajectory to separate these ions according to their m/z ratio on four parallel cylindrical metal rods (C). In triple Q or LIT, ions of a particular m/z are selected in a first section (Q1) and fragmented in a collision cell (q2), after which the fragments are separated in Q3. In the LIT, ions are captured in Q3. They are then excited via a resonant electric field, and the fragments are scanned out, creating the tandem mass spectrum (D). The ion trap analyzer captures or traps the ions, which are then subjected to MS or MS/MS analysis (E). In ion traps, selection, fragmentation, and analysis of ions take place in the same space. The FTICR mass spectrometer is also a trapping device that operates under high vacuum in a high magnetic field. 
Recently, the Orbitrap has been introduced as a powerful mass analyzer that provides high resolution, high mass accuracy, and good dynamic range [115]. Its principles are based on the injection of ions for storage into an electrostatic field. The ions are detected using Fourier transform, similar to the detection method used in FTICR. Because of the high resolving power of the Orbitrap and the sensitivity that can be gained using Fourier transform for ion detection, the Orbitrap is expected to become a very significant MS technology. Several configurations of mass spectrometers that combine ESI and MALDI with a variety of mass analyzers are routinely used. MALDI is usually coupled to TOF analyzers that measure the mass of intact peptides. MALDI has also been implemented in TOF-TOF mass spectrometers to provide true MS/MS capabilities (B). The two TOF sections are separated by a collision cell. An ion of a particular m/z is selected in the first mass analyzer and fragmented in the collision cell, and the fragment ion masses are analyzed by the second TOF analyzer. Fragmentation occurs mainly at the amide bonds of the peptide, resulting in a nested set of peptides that differ in mass by one amino acid. The measured fragment masses can be compared with theoretical mass spectra calculated from the protein sequences in the database. ESI has mostly been coupled to ion traps and Q instruments. Q-TOF instruments exhibit high resolution and mass accuracy in MS and MS/MS mode (C). The precursor ions are selected in the Q (Q1) and undergo fragmentation through collision-induced dissociation (q2). The product ions are analyzed in the TOF device. Abbreviations: ESI, electrospray ionization; LIT, linear ion trap; MALDI, matrix-assisted laser desorption/ionization; MS, mass spectrometry; Q, quadrupole; TOF, time-of-flight.
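
The nested fragment series described in the legend of Figure 2 can be illustrated with a short calculation. The following Python sketch is illustrative only and is not taken from any of the cited studies. It assumes singly charged b-type (N-terminal) and y-type (C-terminal) fragment ions produced by amide-bond cleavage and uses standard monoisotopic residue masses; the function name `fragment_ions` is hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added to C-terminal (y-type) fragments
PROTON = 1.00728   # charge carrier for singly protonated ions


def fragment_ions(peptide):
    """Return (b_ions, y_ions): m/z values of singly charged fragments.

    Assumption: one cleavage per amide bond, no neutral losses or
    modifications. A peptide of n residues yields n - 1 ions per series.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # first i residues
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # last i residues
    return b_ions, y_ions


b, y = fragment_ions("GAK")
# Neighbouring ions within one series differ by exactly one residue mass,
# which is what allows a partial sequence to be read from the spectrum.
step = b[1] - b[0]   # equals the residue mass of alanine
```

Reading the mass differences between consecutive peaks of one series against the residue mass table is the basis of the partial sequence determination mentioned in the legend.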

Identification of Separated Proteins from 2-DE Gels Using MS

MS is commonly applied to identify proteins isolated by 2-DE. Identification usually begins with peptide mapping, initially suggested by Henzel et al. [42]. The separated proteins are digested with an enzyme (for instance, trypsin), and the masses of the proteolytic peptides are measured by MS. The mass spectra are obtained with a relatively simple MS instrument, such as a matrix-assisted laser desorption/ionization-time of flight (MALDI-TOF) instrument. The measured masses of the proteolytic peptides are compared with the masses of peptides predicted from protein sequence databases. This step can be fully automated, but it requires the complete sequence of the protein or of the coding region of its gene to be present in the database. As more gene sequence data become available, the success rate of this method will increase. The method is now popularly called peptide mass fingerprinting. In cases where peptide mapping does not provide sufficient information for confident identification, more sophisticated instrumentation is needed, such as a MALDI-TOF/TOF or quadrupole TOF instrument, which provides higher mass accuracy and sensitivity and allows peptide fragmentation and partial peptide sequence determination for several tryptic fragments (Fig. 2).
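
The matching step of peptide mass fingerprinting can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of any particular search engine: trypsin is assumed to cleave after lysine (K) or arginine (R) except before proline, no missed cleavages or modifications are considered, peptides are singly protonated, and the score is simply the number of observed masses that match. The sequences, the tolerance, and the function names are hypothetical.

```python
# Monoisotopic residue masses in Da (standard values, rounded to 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def tryptic_digest(sequence):
    """Split after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(sequence, observed_mh, tol_da=0.2):
    """Count observed [M+H]+ masses that match a predicted tryptic peptide."""
    predicted = [peptide_mass(p) + PROTON for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - pred) <= tol_da for pred in predicted)
        for obs in observed_mh
    )


# Hypothetical two-entry "database" and a simulated spectrum of protA.
database = {"protA": "AKRPGKC", "protB": "GGGGK"}
observed = [peptide_mass("AK") + PROTON, peptide_mass("RPGK") + PROTON]
best_hit = max(database, key=lambda name: count_matches(database[name], observed))
```

Real search tools weigh matches statistically and allow for missed cleavages and modifications; the sketch only shows why the complete protein sequence must be present in the database for the comparison to succeed.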

MS-Based Protein Profiling

Although 2-DE has been widely used for proteome analysis, this methodological approach has several limitations. For example, it is inadequate for the analysis of more complex mixtures, and detection as well as identification is strongly biased toward the more abundant proteins [43–45]. Moreover, hydrophobic proteins, such as membrane proteins, which are not readily soluble in aqueous media, are rarely detected with 2-DE [46].

In MS-based protein profiling, the proteins are enzymatically digested and subjected directly to MS. This system, also referred to as “shotgun proteomics,” features a protein separation step coupled to a mass spectrometer with superior resolving power and dynamic mass range. Most popular at present are two-dimensional (2D) (strong cation exchange/reversed phase) [47, 48] or three-dimensional (strong cation exchange/avidin/reversed phase) [49] chromatographic separation methods of peptide mixtures. The protein samples can also be prefractionated using SDS-polyacrylamide gel electrophoresis (SDS-PAGE) or isoelectric focusing (IEF) prior to analysis.

MS-Based Quantitative Analysis

Several MS-based strategies have been developed that allow different samples to be compared quantitatively. In extracted ion current (XIC)-based quantification, the signal intensity of peptides that elute from the chromatographic column is plotted over time, and the area under this curve is the XIC [35, 50]. In this approach, the intensity of the peptide signals between two states can be compared. Two major advantages of XIC-based quantification are that no labeling is required and that it can be used with any type of sample. A more versatile approach for precise relative quantification involves the differential labeling of two or more sets of proteins or peptides derived from different cell states with light and heavy isotopes of the same chemical reagent followed by MS analysis (Fig. 3). These techniques also allow relative quantification of basic, hydrophobic, or large proteins excluded from analysis using 2-DE or difference in-gel electrophoresis (DIGE).
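
As an illustration of XIC-based quantification, the area under a peptide's elution profile can be approximated by trapezoidal integration, and the areas from two cell states can then be compared. The Python sketch below is a minimal example under simplifying assumptions (a single, baseline-resolved chromatographic peak and identical retention-time sampling in both runs); the data points and function names are hypothetical.

```python
def xic_area(times, intensities):
    """Trapezoidal area under an extracted ion chromatogram.

    times: retention times (e.g., in seconds), strictly increasing
    intensities: signal intensity of the peptide ion at each time point
    """
    if len(times) != len(intensities):
        raise ValueError("times and intensities must have the same length")
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def abundance_ratio(times, state_a, state_b):
    """Ratio of XIC areas (state B over state A) for the same peptide."""
    return xic_area(times, state_b) / xic_area(times, state_a)


# Hypothetical elution profiles of one peptide in two cell states.
rt = [0.0, 1.0, 2.0, 3.0, 4.0]
undifferentiated = [0.0, 50.0, 100.0, 50.0, 0.0]
differentiated = [0.0, 100.0, 200.0, 100.0, 0.0]
ratio = abundance_ratio(rt, undifferentiated, differentiated)
```

In practice the runs must first be aligned in retention time and normalized, because the two states are measured in separate LC-MS analyses.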

Figure 3.

Schematic representation of methods for isotopic labeling for relative quantification using MS. This strategy makes use of the facts that chemically identical analytes of different stable-isotope composition can be distinguished in a mass spectrometer because of their mass difference and that the ratio of signal intensities for such analytes indicates their abundance ratio (reviewed in [110]). (A): Proteins or peptides derived from two different cell states are derivatized with light and heavy isotopes of the same chemical reagent. The samples are then combined and analyzed by MS. The relative abundance levels of the proteins are calculated by comparing the peak heights of the light- and heavy-labeled peptides. In the chemical labeling approach (A), stable isotope-bearing chemical reagents are targeted toward reactive sites on a protein. A method that has gained popularity is the ICAT approach [116]. The label contains a thiol-reactive group (which reacts with cysteine residues), eight hydrogens (light) or deuteriums (heavy), used for relative quantification, and a biotin group that is selectively recognized during the affinity extraction step by an avidin moiety attached to the chromatographic column. Another method is the isobaric tag for relative and absolute quantitation (iTRAQ), which uses the same approach but adds an innovative concept, namely a tag that generates a specific reporter ion in fragmentation spectra [117]. There are four tags that allow the analysis of four separately labeled protein pools in a single experiment, increasing analytical throughput. In metabolic labeling (B), the incorporation is achieved by using media that contain isotopic labels, such as 15N (heavy) or 14N (light). SILAC has proven to be a simple and powerful approach for quantitative proteomics [110]. In this method, one cell state is metabolically labeled by, for example, 13C-containing arginine, which is incorporated into newly synthesized polypeptides in a sequence-specific fashion. 
Thus, all arginine-containing peptides can be labeled. Both the high labeling efficiency and the absence of additional chemical labeling steps make the method easy to apply. In labeling via enzyme reaction (C), an 18O isotope is coupled to the peptides by the protease that splits the proteins into peptides. During this proteolytic step, which takes place in 18O-labeled water, 18O from the water molecules is incorporated into the resulting peptides [118, 119]. Abbreviations: ICAT, isotope-coded affinity tag; MS, mass spectrometry; MS/MS, tandem mass spectrometry; SILAC, stable isotope labeling by amino acids in cell culture.
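
The principle summarized in the legend of Figure 3 (a known mass difference locates the labeled partner, and the intensity ratio gives the abundance ratio) can be sketched as follows. The example assumes SILAC with arginine carrying six 13C atoms, which is one common choice and is an assumption here, since the legend specifies only "13C-containing arginine". The peak list, the m/z tolerance, and the function names are hypothetical.

```python
C13_MINUS_C12 = 1.0033548  # mass difference between 13C and 12C in Da


def heavy_mz(light_mz, charge, n_arginines, labeled_carbons=6):
    """Expected m/z of the heavy partner of a light peptide ion.

    Assumption: every arginine in the peptide carries `labeled_carbons`
    13C atoms (6 for fully 13C-labeled arginine).
    """
    shift = n_arginines * labeled_carbons * C13_MINUS_C12
    return light_mz + shift / charge


def peak_intensity(peaks, target_mz, tol=0.01):
    """Highest intensity within +/- tol of target_mz, or None if absent."""
    hits = [inten for mz, inten in peaks if abs(mz - target_mz) <= tol]
    return max(hits) if hits else None


def heavy_to_light_ratio(peaks, light_mz, charge, n_arginines):
    """Relative abundance (heavy/light) of one peptide, or None if unpaired."""
    light = peak_intensity(peaks, light_mz)
    heavy = peak_intensity(peaks, heavy_mz(light_mz, charge, n_arginines))
    if not light or heavy is None:
        return None
    return heavy / light


# Hypothetical MS spectrum as (m/z, intensity) pairs: a doubly charged
# peptide with one arginine, mixed from light and heavy cultures.
spectrum = [(500.000, 1000.0), (503.010, 2500.0), (620.300, 400.0)]
ratio = heavy_to_light_ratio(spectrum, light_mz=500.000, charge=2, n_arginines=1)
```

The same pairing logic applies to the chemical and enzymatic labeling schemes in panels A and C; only the expected mass shift changes.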

Protein Chips

Protein chips will likely be the next major manifestation of the revolution in proteomics; they offer another way to analyze low-abundance proteins and have the potential for high-throughput applications to identify biomarkers. Protein chips differ from the previously described methods: whereas screening by 2-DE or liquid chromatography (LC)-MS/MS can potentially detect any protein, protein chips provide data only on the set of proteins selected by the investigator (Fig. 4).

Figure 4.

Schematic representation of a protein microarray. Protein chips are made in much the same way as DNA microarrays. A glass or plastic surface is spotted with an array of molecules designed to capture specific proteins; these capturing molecules can be other proteins, such as antibodies, antigens, enzymes, or peptides, or they can be snippets of DNA or small organic ligands. Fluorescent markers or other detection schemes reveal which spots have formed a complex with their targets. Because the identity of each protein-binding molecule on the chip is known, a particular spot on the chip that emits light upon excitation indicates that a target protein has been captured or that an enzyme has been activated. However, this technique is not as straightforward as that for conventional DNA chips. (A): Analytical protein microarrays contain antibodies with high affinity and specificity that are spotted on an appropriate surface. These chips are used for proteome profiling, monitoring of proteome expression patterns, and clinical diagnostics. In the example shown, all of the proteins in two biological samples that one would like to compare are labeled with distinguishable markers (e.g., labeled with red or green fluorescent dyes). Then, the protein samples are mixed, and this mixture is used to incubate the array. Spots that show up as green or red have an excess of protein from one sample over the other. Spots that appear yellow contain approximately equal amounts of protein from each sample. This experimental procedure is analogous to gene expression profiling with DNA microarrays, and it allows the effect of various physiological stimuli or genetic alterations on protein levels to be examined simultaneously. In analytical protein arrays, different types of ligands, including antibodies, antigens, DNA or RNA, aptamers, carbohydrates, or small molecules, with high affinity and specificity, are spotted onto a derivatized surface. 
(B): Functional protein microarrays require native proteins or peptides that are individually purified or synthesized from cDNAs by in vitro transcription/translation or other high-throughput techniques, after which they are spotted onto a suitable surface to form the functional protein microarrays. These chips are used to probe protein activities, binding properties, and post-translational modifications. In the example shown, proteins are labeled with distinguishable markers (e.g., labeled with green fluorescent dye). The spots that “light up” are candidate binding partners. With the proper detection method, functional protein microarrays can be used to identify the substrates of enzymes of interest. This class of chips is particularly useful in drug and drug-target identification and in unraveling biological networks.
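
The two-color readout of the analytical array in panel A can be expressed as a simple rule on the ratio of the two fluorescence channels. The Python sketch below is only an illustration: the two-fold threshold (a log2 ratio of 1), the intensity values, and the function name `classify_spot` are assumptions and are not specified in the source.

```python
import math


def classify_spot(red, green, min_log2_fold=1.0):
    """Classify one array spot from its background-corrected intensities.

    Returns "red" or "green" when one sample is in excess by at least
    2**min_log2_fold, "yellow" when the two are roughly equal, and
    "no signal" when either channel is not positive.
    """
    if red <= 0 or green <= 0:
        return "no signal"
    log_ratio = math.log2(red / green)
    if log_ratio >= min_log2_fold:
        return "red"
    if log_ratio <= -min_log2_fold:
        return "green"
    return "yellow"


# Hypothetical spots: (capture antibody, red intensity, green intensity).
spots = [("anti-A", 4000.0, 1000.0), ("anti-B", 900.0, 1000.0), ("anti-C", 500.0, 2500.0)]
calls = {name: classify_spot(r, g) for name, r, g in spots}
```

Working on the log ratio treats an excess in either sample symmetrically, which is the usual reason for preferring it over the raw ratio.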

Modern surface-enhanced laser desorption and ionization (SELDI) technology uses MS as a read-out system to analyze differential protein expression on spot arrays. Comparison of two mass spectrometry data sets generated from two different samples immediately reveals the differentially expressed proteins. Thus, high-throughput analysis of crude samples readily and rapidly generates data that can be used for diagnosis or prognosis. The key disadvantage is that the mass spectrum obtained usually does not enable identification of the proteins analyzed, necessitating additional experimental procedures (e.g., enrichment by affinity chromatography and identification by methods such as tandem MS) [51].

Stem Cell Proteomics

Profiling and Differential Expression Analysis

Proteome mapping serves as a starting point for building up a comprehensive database of the SC proteome. Several groups have used proteomics to identify SC-specific proteins in mouse ESCs (mESCs) [52–54], human ESCs (hESCs) [54, 55], human umbilical cord blood (UCB) MSCs [56], human bone marrow (BM) MSCs [57], rat NSCs [58], and human NSCs [27]. A comprehensive list of proteomic investigations including different SC types, practical approaches, and major achievements is given in Table 1. Although proteins involved in energy metabolism comprise the largest group of identified proteins in adult SCs [56, 58, 59], a significant proportion of identified proteins in ESCs are involved in protein synthesis, processing, and transport [53, 55], reflecting the potential of hESCs to either maintain an undifferentiated state or quickly change phenotype, as observed in rapid differentiation processes. One of the characteristics of the protein subset identified in ESC lines is that they contain relatively abundant nuclear proteins in terms of both variety and protein content. This might be related to the high nucleus-to-cytoplasm ratio of ESCs.

Table 1. A summary of published papers about stem cell proteomics

In all, 92 proteins were identified in both mouse and human ESCs (Fig. 5). The comparison of human UCB-MSCs [56] with human adipose-derived (AD) MSCs [60] and the comparison of rat NSCs [58] with human NSCs [27] revealed 52 MSC-specific and 65 NSC-specific proteins, respectively. NSCs have been shown to be more similar to ESCs than to MSCs (Fig. 5). The global overlap between genes expressed in ESCs and NSCs supports previous results at the transcriptome level [5] and corroborates the observed default differentiation pathway of ESCs to neural lineages. This is in line with observations in which embryonic cells of both frogs [61] and mice [62] become neural cells in the absence of cell-to-cell signaling. Possible reasons for discrepancies in proteome analyses of SCs are discussed below.

Figure 5.

Venn diagram of shared and unique protein expression in ESCs, MSCs, and NSCs. Regions of overlap between circles indicate common gene expression. Regions that do not overlap between circles indicate unique gene expression signatures of particular cell types. (A): Venn diagram showing unique and common proteins in hESCs (left) and mESCs (right). The overlapping region of hESCs and mESCs contains 92 proteins identified in both ESCs. (B): Venn diagram showing unique and common proteins in human bone marrow MSCs (left) and adipose-derived MSCs (right). The overlapping circles indicate 52 proteins identified in both MSCs. (C): Venn diagram showing unique and common proteins in rat NSCs (left) and human NSCs (right). The overlapping circles indicate 65 proteins identified in both NSCs. (D): Venn diagram showing unique and common proteins in ESCs, MSCs, and NSCs. The overlapping circles indicate nine proteins expressed in all three stem cells (SCs). The gene symbols of SC-specific and common proteins are indicated in boxes. The three SC types (ESCs, MSCs, and NSCs) shared nine proteins identified in proteomics screens, including proteins involved in energy production and metabolism (Atp5b, ATP synthase β chain; Eno1, Enolase 1; and Tpi1, Triosephosphate isomerase), disease and stress (Stip1, stress-induced-phosphoprotein 1; and Prdx1, Peroxiredoxin 1), protein folding (Cct2, chaperonin-containing TCP1 subunit 2, β subunit; Hspa5, 78-kDa glucose-regulated protein precursor; and Hspa8, Heat shock cognate 71-kDa protein), and an unclassified protein (Tpt1, Translationally controlled tumor protein, or TCTP). Abbreviations: hESC, human ESC; mESC, mouse ESC.
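
The overlaps drawn in Figure 5 amount to set intersections over the protein identifiers reported for each SC type. The Python sketch below illustrates the idea. Only the nine shared gene symbols are taken from the legend above; every other identifier is a hypothetical placeholder standing in for the published lists, and the Jaccard index is used merely as one convenient similarity measure, not as the method of the original analysis.

```python
# The nine proteins shared by all three SC types (from the Figure 5 legend).
SHARED = {"Atp5b", "Eno1", "Tpi1", "Stip1", "Prdx1", "Cct2", "Hspa5", "Hspa8", "Tpt1"}

# Placeholder identifiers (hypothetical) for proteins that are not shared
# by all three cell types in the published data sets.
esc = SHARED | {"EscOnly1", "EscOnly2", "EscNsc1", "EscNsc2"}
msc = SHARED | {"MscOnly1", "MscOnly2"}
nsc = SHARED | {"NscOnly1", "EscNsc1", "EscNsc2"}


def jaccard(a, b):
    """Similarity of two protein sets: shared proteins over all proteins."""
    return len(a & b) / len(a | b)


common_to_all = esc & msc & nsc   # centre of the three-way Venn diagram
esc_nsc_only = (esc & nsc) - msc  # shared by ESCs and NSCs but not MSCs
similarity = {
    "ESC-NSC": jaccard(esc, nsc),
    "ESC-MSC": jaccard(esc, msc),
    "MSC-NSC": jaccard(msc, nsc),
}
```

Because the input lists come from studies that used different separation and identification methods, such overlaps are lower bounds rather than exact measures of biological similarity.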

Over the past few years, there has been growing interest in applying proteomics to study differential expression of SC genes at different developmental stages, specifically aiming to unravel the regulatory networks active during differentiation of mESCs [63–68], hESCs [54], MSCs [57, 60, 69, 70], NSCs [27, 59], and HSCs [71]. Interestingly, seven of the nine SC-specific proteins distinguished in this review were among those differentially expressed: Peroxiredoxin 1 [27, 59, 60, 63], Heat shock cognate 71-kDa protein [27, 59], Enolase 1 [59, 66], 78-kDa glucose-regulated protein precursor [27, 66], T-complex protein 1 [66], Translationally controlled tumor protein [64], and ATP synthase β chain [66]. A small number of other differentiation-associated proteins have also been reported, including proteins involved in stress response and oxidative defense (HSP27 [60, 64, 65, 72], 60-kDa heat shock protein [59, 64, 66], and Peroxiredoxin 4 [59, 64]), the cell cytoskeleton (Tubulin-α [27, 54, 59, 63, 64, 72] and vimentin [63, 65, 66]), a receptor for intracellular transport (Syntaxin 7 [27, 60]), and a multifunctional protein (calreticulin [64, 66, 72]).

Although these studies have generated a wealth of data, it is rather difficult to derive a definitive proteome profile of undifferentiated and differentiated SCs from the different published studies. One of the major hurdles that has to be overcome in large-scale proteomic studies in general is the reduction of sample complexity. As yet, there is no single preparation method that allows identification of all proteins present in a sample. Applying different separation methods will produce various sample compositions, thus resulting in different data sets after proteome analysis. Also, different quantitative analysis methods may vary in accuracy and sensitivity for samples of different complexity. Wu et al. [73] compared three quantitative methods frequently used in proteomics: 2-DE-DIGE, isotope-coded affinity tag (ICAT), and isobaric tag for relative and absolute quantitation. They reported only a limited overlap among the differentially expressed proteins identified by the three methods in two closely related HCT-116 cell lines, suggesting that these approaches are complementary [73]. Nevertheless, the complementary information obtained through different methods should provide a better portrait of the biological system under investigation.

On the other hand, differences in culture methodologies applied in different laboratories are likely to induce variations in protein expression. SCs are notoriously difficult to culture compared with more conventional cell lines; this difficulty is mainly due to our lack of knowledge about SCs. Some culture methodologies that work in one laboratory may not work as well in another laboratory because of unknown or less well-defined factors (e.g., serum batches) that affect cell growth and behavior. In addition, the different methods used to derive SC lines will also contribute to differences in cell line characteristics (discussed further in Heterogeneity of Proteome). These factors force individual laboratories to empirically adjust and optimize the culture conditions required to grow SCs. Thus, the different protein separation techniques, proteome analysis methods, and culture conditions, all of which depend on the interest of individual research groups in a specific topic, result in the generation of various proteome data sets that are almost impossible to compare directly. However, the different proteome profiles that have become and will become available are usually supplementary and thereby complement our overall knowledge of SCs and the proteins expressed in different environments as well as under different culture conditions.

Membrane Proteomics

Approximately 20%–30% of all genes in an organism encode integral membrane proteins, which are involved in numerous cellular processes. The target residues for tryptic cleavage (i.e., lysine and arginine) are mainly absent in transmembrane helices and preferentially found in the hydrophilic part of these lipid bilayer-incorporated proteins. Because hydrophobic proteins aggregate during the IEF step, 2-DE is unsuitable for the separation of integral membrane proteins and is limited to detection of membrane-associated proteins and membrane proteins with a low hydrophobicity (e.g., those having only one or two transmembrane helices). In contrast, the combination of one-dimensional (1D) gel separation and LC-MS/MS has been applied with success [74]. Another, more successful approach to isolating membrane proteins relies on cell surface labeling in combination with high-resolution 2D LC-MS/MS. First, cell surface proteins of intact cells are selectively labeled with the membrane-impermeable reagent biotin, and biotinylated plasma membrane proteins are then enriched via affinity capture using immobilized avidin. The biotinylated proteins can be separated by gel electrophoresis and identified with MS [75]. The only reported cell surface proteome characterization of ESCs is that of Nunomura et al. [36], who studied cell surface proteins using cell surface labeling of undifferentiated mESCs (line D3) coupled to high-resolution 2D LC-MS/MS. They identified 324 proteins, 235 of which had a putative signal sequence and/or transmembrane segments [36]. Using 1D gel followed by LC-MS/MS [36], Foster et al. [35] applied an XIC-based quantification method and identified 104 membrane proteins from human MSCs; they found that expression levels changed during differentiation toward osteoblast cells.

In both studies, many of the identified proteins were abundant housekeeping proteins, such as ribosomal constituents, structural molecules, histones, and chaperones. Although some of these proteins might be associated with the membrane, it was difficult to distinguish them from the intracellular components released by dead cells. Combining this method with quantitative proteomic approaches, such as stable isotope labeling, will provide valuable information about stage- and lineage-specific expression of SCs.
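The point made at the start of this section, that lysine and arginine are scarce in transmembrane helices, can be illustrated with a minimal in silico digestion sketch. The sequence below is invented for illustration and is not taken from any cited study; the rule that trypsin does not cleave before proline is a common simplifying convention assumed here, not something stated in the text.

```python
import re

def tryptic_digest(sequence: str) -> list[str]:
    """Split a protein sequence C-terminal to K or R, except before P.

    This is a simplified model of trypsin specificity (no missed cleavages).
    """
    # Zero-width split point: preceded by K/R and not followed by P.
    return [pep for pep in re.split(r"(?<=[KR])(?!P)", sequence) if pep]

# Hypothetical protein: hydrophilic head, one K/R-free helix, hydrophilic tail.
HELIX = "LLVIALGAVFLAIVGLLLAFW"          # invented transmembrane stretch
protein = "MKTAYRAGEK" + HELIX + "RKDSPEK"

peptides = tryptic_digest(protein)
longest = max(peptides, key=len)

for pep in peptides:
    print(len(pep), pep)
```

In this toy case the hydrophilic regions yield short peptides, whereas the whole helix ends up in a single long, hydrophobic peptide, which is the kind of peptide that tends to be lost during separation and MS analysis.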

Post-Translational Modification

Many regulatory steps, especially those involved in cell proliferation, migration, and differentiation, depend on protein PTMs rather than protein abundance [76]. Several 2-DE reports have identified large numbers of isoforms or PTMs in SCs (Table 1). By comparing proteomic data with transcriptome analyses, Unwin et al. [77] showed that the shift in proteome from long-term reconstituting HSCs (Lin−Sca+Kit+; LSK+) to non-long-term reconstituting progenitor cells (Lin−Sca+Kit−; LSK−) was associated with post-transcriptional control of protein levels. Another study, performed by Schrattenholz et al. [78], enabled the enrichment of phosphoproteins of neuronal derivatives of mESCs that were exposed to chemical ischemia in a differential and quantitative proteome analysis. Moreover, in a study that was restricted to a defined set of proteins, Prudhomme et al. [79] used a computational systems biology approach to study phosphorylation states of 31 intracellular signaling network components across 16 different stimuli at three time points. They applied quantitative Western blotting and partial-least-squares modeling to determine which components showed the strongest correlation with cell proliferation and differentiation rates [79].

Kratchmarova et al. [80] applied a quantitative phosphoproteomics approach to study the effects of growth factors (epidermal growth factor [EGF] and platelet-derived growth factor [PDGF]) on human MSCs. Carrette et al. [81] metabolically labeled the proteins in cell culture using stable isotope labeling by amino acids in cell culture (SILAC) (Fig. 3), combined the cell lysates of the three states, and incubated this mixture with antibodies against phosphotyrosine. The precipitated complexes were resolved with 1D SDS-PAGE and proteolytically digested, after which the resulting peptide mixture was analyzed with LC-MS/MS. These results showed that EGF and PDGF modulate the osteogenic capability of MSCs through mitogen-activated protein kinase (MAPK)/ERK, p38 kinase, and phosphatidylinositol 3-kinase signaling.
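Because the lysates of the three labeled states are mixed before enrichment, each peptide appears in the mass spectrum as a set of isotope-shifted peaks whose intensity ratios report relative abundance. The following is a minimal sketch of that ratio calculation. The intensities, the channel names, and the assignment of channels to treatments are all assumptions made for illustration; they are not data from the cited studies.

```python
from math import log2
from statistics import median

# Hypothetical peptide intensities for a single protein in three SILAC channels.
# Assumed assignment: light = untreated, medium = EGF, heavy = PDGF.
peptide_intensities = [
    {"light": 1000.0, "medium": 4000.0, "heavy": 2000.0},
    {"light": 500.0, "medium": 2000.0, "heavy": 1000.0},
    {"light": 2000.0, "medium": 8000.0, "heavy": 2000.0},
]

def silac_log2_ratios(peptides, reference="light"):
    """Return the median log2(channel / reference) ratio per labeled channel."""
    channels = [name for name in peptides[0] if name != reference]
    return {
        name: median(log2(p[name] / p[reference]) for p in peptides)
        for name in channels
    }

ratios = silac_log2_ratios(peptide_intensities)
print(ratios)
```

Taking the median over peptides is one common way to make the protein-level ratio robust to a single outlying peptide; other summaries are equally possible.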

Puente et al. [72] sought to characterize the SC state by identifying the phosphoproteome of mESCs and their derivatives formed in embryoid bodies (EBs). Samples were loaded onto phosphoprotein-affinity columns, and eluted proteins were separated by 2-DE followed by silver staining. Proteins visualized with silver stain were identified by MALDI MS/MS or LC-MS/MS. The set of proteins that exhibited altered PTM during differentiation included several proteins previously displayed in gene expression arrays as conserved features of the SC phenotype. Proteins related to protein catabolism, protein folding, chromatin remodeling, and other functions were found to exhibit altered phosphorylation between the ESC and EB states. As such, these data suggest that kinase activity and the phosphorylation state of target substrates act as critical regulators of SC function.

Heterogeneity of Proteome

The reproducibility of proteome profiles of individual SC samples or their derivatives generated under similar conditions is a major criterion for large-scale proteomics-based studies. The proteome of a cell is highly dynamic and depends on several parameters, including genetic background, the method of derivation, growth condition, and the stage of the cell cycle during sample collection. Therefore, individual samples of cells in the same physiological state should be prepared for accurate and reliable quantitative proteome comparisons with respect to protein up- or downregulation.

Zenzmaier et al. [82] compared CD34+ preparations from five different umbilical cord samples. Out of hundreds of spots detected on 2D gels, they found only 52 common proteins, 22 of which were identified using nano-high-performance LC-MS/MS [82]. Since the purity of the cell samples was >88%, the observed heterogeneity could not be attributed to contaminating cells. Instead, the difference in the protein patterns was interpreted as SC-intrinsic heterogeneity.

We analyzed the proteome of three hESC lines in triplicate and identified 54 and 14 proteins showing quantitative (p ≤ .01) and qualitative changes, respectively [55]. Moreover, van Hoof et al. [54] reported that the expression levels of proteins such as β-actin and Oct4 were similar between the hESC lines NL-HESC-01 [83] and HES-2 [84], whereas the expression ratios of several of these proteins were different in another hESC line, HUES-1 [85]. HES-2 and NL-HESC-01 cells were both passaged mechanically by a cut-and-paste method in serum-containing medium [84, 86], whereas HUES-1 cells were passaged enzymatically by trypsinization and cultured in serum replacement with basic fibroblast growth factor [85]. Potential sources of variation among hESC lines include the following (reviewed in [87, 88]): (a) differences due to origin of cell lines [(i) genomic diversity; (ii) stage of preimplantation embryo at derivation; (iii) conditions of early culture (feeder layer, culture conditions); (iv) differences in culture [89] and derivation procedures applied in laboratories, such as different feeder cell types and densities, culture substrates, culture media, growth factors/other additives, and freezing method; (v) the passage number and method of passaging [2, 90, 91]; and (vi) imprinting and X-inactivation]; (b) differences arising over time in culture [(i) genetic changes (loss or gain of specific sequences); and (ii) general and specific epigenetic changes (DNA methylation, histone acetylation, and microRNAs; reviewed in [92])]; and (c) differences due to mosaicism in cultures [(i) partial or terminal differentiation of subpopulations within cultures; and (ii) variation among epigenetic and genetic changes].

In adult SCs, it was shown that the proliferation and osteogenic capacity of human MSCs decrease during serial subculturing [90]. Moreover, passage-specific proteins were found, which were suggested to be differentially regulated and to play a role in the decrease of osteogenic differentiation potential under serial subculturing.

The purification and extraction of specific SC-derived cell types and the consistency and reproducibility of sample generation are thus considered important issues. SC differentiation usually yields mixed and heterogeneous cell populations. Therefore, optimization of protocols for enhancement of differentiation toward a specific cell lineage and for subsequent purification should be taken into account (reviewed in [93]). Feasible methods that may help to achieve this include the following: (a) addition of specific combinations of growth factors or chemical morphogens, (b) changing the physical and geometrical microenvironment, (c) coculture or transplantation of SCs with inducer tissues or cells, (d) implantation of SCs into specific organs or tissues, and (e) overexpression of transcription factors associated with development of specific cell lineages. However, to date, these strategies have not yielded pure populations of mature progeny and apparently require efficient protocols to purify specific cell populations. Methods such as fluorescence-activated cell sorting and magnetic-activated cell sorting allow such purification, but they depend on the cell expressing a surface marker that can be recognized by a fluorescent or magnetic microbead-tagged antibody; to be effective, the marker needs to be cell-type-specific. In most cases, these cell markers are not commercially available; thus, sorting methods rely on, for example, genetic modification of SCs by tagging a lineage-specific promoter to a fluorescent marker. Alternatively, cells could be transduced with a drug-resistance gene instead of a marker, to allow preferential selection of subpopulations.

Application of Protein Array to Stem Cell Proteomics

Protein arrays offer a different solution and have the potential for high-throughput applications to identify novel protein markers and molecular pathways. Hayman and Przyborski [94] applied SELDI-TOF to rapidly generate protein peakmap bioprofiles. They demonstrated that this approach can be used with up to 100% accuracy to distinguish human ECCs from differentiated derivatives [94]. It should be noted that this approach does not identify the individual molecules expressed in the cell sample. Yet if the identification of a particular protein is required, the current approach can be combined with other technology, such as SELDI-tandem MS. Using cytokine protein arrays, it has been shown that cytokine induction and signal transduction are important for the differentiation of human UCB-MSCs [95]. Sakaguchi et al. [96] used a ProteinChip system to identify the molecule in the conditioned medium of OP9, a BM cell line, that is responsible for neurosphere formation from NSCs.

The application of reverse-phase protein arrays for the analysis of primary acute myelogenous leukemia samples, as well as leukemic and normal SCs, has been demonstrated [97]. Using this strategy, the differences in protein expression in as few as three cell protein equivalents could be detected. Therefore, it was suggested that this approach can be applied as a highly reliable and reproducible high-throughput system for rapid, large-scale proteomic analyses of protein expression and phosphorylation state in primary acute myelogenous leukemia cells, as well as in human SCs.


Secretome

The secretome is defined as the subset of the proteome that contains all proteins actively exported out of a cell, whatever the cell's origin. The types of proteins secreted strictly depend on the type of cell and the cellular state; therefore, the secretome reveals much about what is going on inside the cell.

The proteomics approach was used to characterize an environment that supports the growth of undifferentiated hESCs and to identify factors critical for their independent growth. Proteome analysis of conditioned medium (CM) from mouse embryonic fibroblast feeder layers (STO cell line) [98] and human neonatal foreskin cell line (HNF02) [99] resulted in the identification of several proteins involved in cell growth, differentiation, and extracellular matrix formation and remodeling; many intracellular proteins were also identified.

Zvonic et al. [100] compared the secretomes of CMs obtained from four individual primary AD-SC cultures in uninduced or adipogenic-induced conditions and identified several proteins, such as adiponectin and plasminogen activator inhibitor 1, and multiple serine protease inhibitor proteins (serpins).

These studies indicate the complexity of the environment formed by the feeder cells and provide a useful starting point for future studies. Secretome studies show a high potential for identification of biomarkers involved in many cellular processes, including growth, division, differentiation, development, and death.

Transplantation Proteomics

Although considerable progress in human transplantation medicine has been made, several major obstacles still restrict more widespread application of cell transplantation and in particular that of SCs. The major clinical obstacle that has to be overcome is demonstrating the safety and feasibility of cell therapy. Proteomic analyses of tissues and body fluids after cell therapy could address these concerns. For example, Kaiser et al. [101] analyzed urine after HSC transplantation (HSCT) and could clearly distinguish between patients with graft-versus-host disease (GVHD) and those with no problems after HSCT with a high specificity (82%) and a sensitivity of 100%.

Wang et al. [102] quantitatively analyzed the human plasma proteome before and after the onset of GVHD, leading to the identification of a large number of proteins that are affected by GVHD after HSCT. They identified 75 proteins that exhibited quantitative changes between the pre- and post-GVHD samples [102]. Some of these proteins were well-known acute-phase reactants, including serum amyloid A, apolipoproteins A-I/A-IV, and complement C3.

To study salivary protein changes that occur after HSCT, Imanguli et al. [103] analyzed serially collected saliva samples from 41 patients undergoing allo-HSCT using SELDI-MS in conjunction with 2-DE. Significant changes in multiple salivary proteins that lasted at least 2 months post-transplant were detected, including upregulation of lactoferrin and secretory leukocyte protease inhibitor and downregulation of secretory IgA. Weissinger et al. [104] could correlate proteomic data with the clinical diagnosis of acute GVHD. From their proteome analysis, a tentatively acute GVHD-specific model consisting of 31 polypeptides was chosen that allowed them to distinguish between patients with GVHD and those with no problems after HSCT with high specificity (98%) and a sensitivity of 100%. The subsequent blinded evaluation of 599 samples enabled diagnosis of acute GVHD, even prior to clinical diagnosis, with a sensitivity of 83.1% and a specificity of 75.6%.

These results showed the power of proteomics as an unbiased laboratory-based screening method, enabling diagnosis and pre-emptive therapy.
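The sensitivity and specificity figures quoted in this section follow from the standard confusion-matrix definitions, which the short sketch below makes explicit. The patient counts are invented so that they reproduce a 100% sensitivity and an 82% specificity; they are not the cohort sizes of any cited study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of patients with the condition that the test flags."""
    return true_pos / (true_pos + false_neg)

def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of patients without the condition that the test clears."""
    return true_neg / (true_neg + false_pos)

# Hypothetical cohort: 10 patients with GVHD, 50 without.
gvhd_detected, gvhd_missed = 10, 0        # assumed counts
no_gvhd_cleared, no_gvhd_flagged = 41, 9  # assumed counts

sens = sensitivity(gvhd_detected, gvhd_missed)
spec = specificity(no_gvhd_cleared, no_gvhd_flagged)
print(f"sensitivity = {sens:.1%}, specificity = {spec:.1%}")
```

A test tuned for 100% sensitivity misses no affected patient at the cost of some false alarms, which is a reasonable trade-off for a screening method intended to trigger pre-emptive therapy.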

Insight into Stem Cell Protein Networks and Signaling Pathways for Pluripotency

Understanding the molecular mechanisms underlying SC pluripotency should illuminate fundamental properties of SCs and the process of cellular reprogramming. Proteomics has proved to be a powerful approach to gain insight into key intracellular signals governing SC self-renewal and differentiation.

In an attempt to analyze the cue-signal-response relationship underlying SC self-renewal versus differentiation, the phosphorylation states of 31 intracellular signaling network components were quantitatively studied under fibronectin, laminin, leukemia inhibitory factor, and fibroblast growth factor-4 treatments [79]. Using a multivariate proteomic approach, Prudhomme et al. [79] identified a set of signaling network components most critically associated with differentiation (Stat3, Raf1, MEK, and ERK), proliferation of undifferentiated mESCs (MEK and ERK), and proliferation of differentiated cells (PKBα, Stat3, Src, and PKCε).

A quantitative MS-based phosphoproteomics approach has been applied to elucidate critical differences in the signaling mechanisms of EGF and PDGF that led to the differential effects on osteoblast differentiation of human MSCs (as described in Post-Translational Modification) [80]. By studying tyrosine-phosphorylated proteins in response to EGF and PDGF, Kratchmarova et al. [80] found that less than 10% of all phosphotyrosine proteins are specific to either the EGF or PDGF activation program in human MSCs, revealing a range of widely shared signaling pathways. Examples include the MAPK cascades and signal attenuation through receptor ubiquitination followed by endocytic removal from the cell surface. However, based on the observation that EGF-treated human MSCs but not PDGF-treated cells undergo osteogenic differentiation, the variation contained in the 10% of differentially activated proteins was clearly of crucial significance. The most striking difference was the preferential activation of phosphatidylinositol 3-kinase exclusively by PDGF, signifying a possible control point in the osteogenic differentiation process. These results demonstrated that, at least in some cases, decisions can be made by preferentially activating a small subset of the signaling network.

Using a chip-based proteomics approach, factors affecting the proliferation of NSCs have been screened. Sakaguchi et al. [96] used a ProteinChip system to identify molecules present in conditioned medium of OP9, a BM cell line that induces neurosphere formation from NSCs. In this screen, they identified a soluble carbohydrate-binding protein, Galectin-1, as a candidate. Galectins make up a family of carbohydrate-binding lectin proteins that are implicated in cell adhesion, growth, differentiation, neoplastic transformation, and metastasis [105]. Galectin-1 has also been identified as one of the relatively abundant proteins in mouse embryonic fibroblast-conditioned medium [98] and human foreskin fibroblast-conditioned medium [99]. Based on results from intraventricular infusion experiments and phenotypic analyses of knockout mice, Sakaguchi et al. [96] suggested that the carbohydrate-binding activity of Galectin-1 is required for its promotion of adult neural progenitor cell proliferation.

In a recent investigation, proteomics was applied to gain insight into the regulatory protein networks in which Nanog operates in mESCs [106]. A construct bearing the pluripotency factor Nanog with a Flag tag as well as a peptide tag that serves as a substrate for in vivo biotinylation was expressed in ESCs. The tagged protein was recovered from cellular lysates with streptavidin beads and further purified using anti-Flag antibodies. MS was then applied to identify its interacting partners. Not surprisingly, many of the candidates were other transcription factors, some of which had already been associated with ESCs. The resulting data set was used to generate a complex network of interacting proteins that were depicted in a concise scheme [106]. Most of the proteins in this network were shown to be essential for early development and/or ESC properties. The knockout of several network proteins, including Prmt1, YY1, Rnf2, BAF155, Rybp, Oct4, Cdk1, NF45, Sall4, Elys, Tif1β, Pelo, Dax1, and REST, resulted in defects in proliferation and/or survival of the inner cell mass or other aspects of early development. The knockout of Err2, Rif1, Nac1, and Zfp281 resulted in defects in self-renewal and/or differentiation of ESCs. The coexpression of most of the network genes and their roles as both targets and effectors indicate that this interactome may serve as a functional module committed to maintaining ESC pluripotency. This network provided a solid base for further exploration of the signaling pathways involved in ESC maintenance [106]. Sall4 had been found to be involved in these signaling pathways by three other groups independently [107–109]. This protein was also identified in the large-scale proteome study by van Hoof et al. [54]; however, the association with Oct4 and Nanog had not been made. This illustrates the likelihood that numerous proteins specifically identified in SCs play a significant role in SC-sustaining processes.
Venn diagrams such as Figure 5 will narrow down the search for novel ESC-associated proteins; however, the involvement and role of such candidates in SC maintenance needs to be confirmed by additional experiments.
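The Venn-diagram comparison described above amounts to set operations on the protein lists of individual studies. The following is a minimal sketch of that idea. The three lists are invented for illustration and merely reuse protein names mentioned in this review; they do not reproduce the contents of Figure 5 or of any cited data set.

```python
from collections import Counter

# Hypothetical protein identifications from three independent ESC studies.
study_lists = {
    "study_a": {"Oct4", "Nanog", "Sall4", "Dax1", "Actb"},
    "study_b": {"Oct4", "Sall4", "Rif1", "Actb"},
    "study_c": {"Oct4", "Sall4", "Nanog", "Gapdh"},
}

# Proteins found in every study: the central region of the Venn diagram.
shared_by_all = set.intersection(*study_lists.values())

# Number of studies supporting each protein, for ranking candidates.
support = Counter(p for proteins in study_lists.values() for p in proteins)

print(sorted(shared_by_all))
for protein, n_studies in sorted(support.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{protein}: {n_studies}")
```

Proteins that recur across studies are stronger candidates for follow-up, but, as noted above, recurrence alone does not establish a role in SC maintenance.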

Future Challenges and Outlook

Proteomic methods have produced large data sets of proteins involved in mechanisms and pathways that regulate SC proliferation and differentiation. The insights thus obtained in SC biology have also created many opportunities to improve public health. In recent years, numerous proteomics techniques have been developed and are continuously being improved, in both peptide and protein separation (e.g., LC and 2-DE), as well as MS methods and accuracy. However, several important issues that remain to be addressed rely on further technical advances in proteomics analysis. When large proteomes consisting of thousands of proteins are analyzed, the dynamic resolution is restricted and only the most abundant proteins can be detected [110].

Despite advances in non-2-DE-based proteomics technologies, 2-DE remains the pivotal and most widespread method in current proteomics [73, 80]. However, we believe that large-scale MS-based quantification approaches will significantly contribute to our understanding of SCs in the future and will soon become the standard for analyzing the SC proteome. To enable proteome-wide quantification, further optimization of chemical-labeling reagents (including chemicals targeting specific protein classes) and of MS instrument performance is necessary [110].

Proteome-wide quantification of membrane proteins requires methods that solve problems such as contamination by intracellular components, protein insolubility, and loss of hydrophobic peptides, which prevent protein identification. Although protein chips are still under development, they have already proven their value for studying protein functions and expression patterns. Because they require only small amounts of material, they are exceptionally well suited to the study of SC populations. However, further optimization of these techniques is needed before they can be widely used in proteome analyses. The application of several array-based approaches that are missing from the current SC literature, such as phosphorylation or G protein-coupled receptor arrays, will provide highly valuable contributions.

The advantage of MS-based proteomics is its ability to indiscriminately study PTMs that affect activity and binding properties of proteins, thereby altering their roles within the cell. It is likely that the phosphoproteome, protein interactions (interactome), and glycomics will soon become major areas of SC proteomics research.

One of the major problems in SC studies is obtaining consistent results from studies of the same type. Proteomics may very well contribute to gaining insight into SC functioning and behavior and thus provide clues for how to tackle these problems. Obviously, gaining more insight into how SCs respond to their environment will improve our ability to control their behavior by applying better-defined culture methods.

Despite increasing conformity in proteomics applications and data storage, it remains difficult to draw consistent conclusions from individual studies because of differences in the cell types used, their establishment and maintenance, the number of passages, and the passaging methods applied. Standardization of proteomic methodologies and strategies between different groups of investigators and introduction of standard operating procedures would facilitate the comparability of proteomics results. In addition, the establishment of unique databases for the ever-increasing wealth of information generated by proteome-wide and in-depth proteomic studies of SCs will be indispensable (Fig. 6). To this end, several initiatives were set up to characterize numerous existing hESC lines using standardized assay conditions to allow unrestrained comparison of the data sets generated. Such initiatives have been instigated by the International Stem Cell Initiative [111], the International Stem Cell Forum, the NIH Stem Cell Unit, and the American Type Culture Collection [112]. Combined, the various proteomic approaches will continue to revolutionize insights into SC biology.

Figure 6.

Morphology and corresponding two-dimensional gel electrophoresis (2-DE) gel of ESC colonies of different species cultured on mouse embryonic fibroblast feeder cells. Phase contrast microscopy of mESC colonies (Royan B1, [120]) derived from C57BL/6 strain (A), monkey ESC colonies (Macaca fascicularis) (a gift from Prof. Taru Kita, Kyoto University, Kyoto, Japan) (B), and human ESC colonies (Royan H5, [121]) (C). (D): We developed a publicly available 2-DE database of hESCs that contains hyperlinked 2-DE gel images and descriptive textual information, such as protein name, molecular weight/isoelectric point values, mass spectrometry (MS) score, sequence coverage, and other information. It also provides facilities to search protein spots on two-dimensional gels and retrieve information related to experimental design, MS analysis, and Mascot search results. The information available in this database will be linked to the results of other SC proteome studies, aiming to provide a comprehensive, comparable, and expandable resource for SC research. 2-DE gels from mouse and monkey ESCs have recently been incorporated into this database.

Disclosure of Potential Conflicts of Interest

The authors indicate no potential conflicts of interest.


Acknowledgments

We gratefully thank Dr. Peter Hains (Australia) for critical reading and helpful comments on the manuscript. This work was supported by grants from the Royan Institute.