Profiling and Differential Expression Analysis
Proteome mapping serves as a starting point for building a comprehensive database of the SC proteome. Several groups have used proteomics to identify SC-specific proteins in mouse ESCs (mESCs) [52–54], human ESCs (hESCs) [54, 55], human umbilical cord blood (UCB) MSCs, human bone marrow (BM) MSCs, rat NSCs, and human NSCs. A comprehensive list of proteomic investigations, including the SC types studied, the approaches used, and the major findings, is given in Table 1. Although proteins involved in energy metabolism make up the largest group of identified proteins in adult SCs [56, 58, 59], a significant proportion of the proteins identified in ESCs are involved in protein synthesis, processing, and transport [53, 55]. This may reflect the capacity of hESCs either to maintain an undifferentiated state or to change phenotype quickly, as observed during rapid differentiation. The protein subset identified in ESC lines is also characterized by relatively abundant nuclear proteins, in terms of both variety and protein content, which might be related to the high nucleus-to-cytoplasm ratio of ESCs.
Table 1. A summary of published papers about stem cell proteomics
In all, 92 proteins were identified in both mouse and human ESCs (Fig. 5). Comparing human UCB-MSCs with human adipose-derived (AD) MSCs, and rat NSCs with human NSCs, revealed 52 proteins common to both MSC types and 65 proteins common to both NSC types. NSCs have been shown to be more similar to ESCs than to MSCs (Fig. 5). The global overlap between the gene products expressed in ESCs and NSCs supports previous results at the transcriptome level and corroborates the observed default differentiation pathway of ESCs toward neural lineages. This is in line with observations that embryonic cells of both frogs and mice become neural cells in the absence of cell-to-cell signaling. Possible reasons for discrepancies among proteome analyses of SCs are discussed below.
Figure 5. Venn diagram of shared and unique protein expression in ESCs, MSCs, and NSCs. Regions of overlap between circles indicate common gene expression. Regions that do not overlap between circles indicate unique gene expression signatures of particular cell types. (A): Venn diagram showing unique and common proteins in hESCs (left) and mESCs (right). The overlapping region of hESCs and mESCs contains 92 proteins identified in both ESC types. (B): Venn diagram showing unique and common proteins in human bone marrow MSCs (left) and adipose-derived MSCs (right). The overlapping circles indicate 52 proteins identified in both MSC types. (C): Venn diagram showing unique and common proteins in rat NSCs (left) and human NSCs (right). The overlapping circles indicate 65 proteins identified in both NSC types. (D): Venn diagram showing unique and common proteins in ESCs, MSCs, and NSCs. The overlapping circles indicate nine proteins expressed in all three stem cell (SC) types. The gene symbols of SC-specific and common proteins are indicated in boxes. The three SC types (ESCs, MSCs, and NSCs) shared nine proteins identified in proteomics screens, namely proteins involved in energy production and metabolism (Atp5b, ATP synthase β chain; Eno1, Enolase 1; and Tpi1, Triosephosphate isomerase), disease and stress (Stip1, stress-induced phosphoprotein 1; and Prdx1, Peroxiredoxin 1), protein folding (Cct2, chaperonin-containing TCP1, subunit 2 (β); Hspa5, 78-kDa glucose-regulated protein precursor; and Hspa8, Heat shock cognate 71-kDa protein), and an unclassified protein (Tpt1, Translationally controlled tumor protein, or TCTP). Abbreviations: hESC, human ESC; mESC, mouse ESC.
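The regions of a three-set Venn diagram such as panel D can be computed with plain set operations. The sketch below is illustrative only: apart from the nine shared gene symbols listed in the caption, the identifiers (such as `esc_only_1`) are hypothetical placeholders, and it assumes that mouse, rat, and human identifiers have already been mapped to a common gene-symbol namespace, which in practice requires a separate orthology-mapping step.

```python
def venn3_regions(a, b, c):
    """Split three sets into the seven disjoint regions of a three-circle Venn diagram."""
    return {
        "A only": a - b - c,
        "B only": b - a - c,
        "C only": c - a - b,
        "A&B only": (a & b) - c,
        "A&C only": (a & c) - b,
        "B&C only": (b & c) - a,
        "A&B&C": a & b & c,
    }


# The nine proteins reported as common to all three SC types (Fig. 5D).
shared = {"Atp5b", "Eno1", "Tpi1", "Stip1", "Prdx1", "Cct2", "Hspa5", "Hspa8", "Tpt1"}

# Hypothetical placeholder identifiers for type-specific proteins.
esc = shared | {"esc_only_1", "esc_nsc_1"}
msc = shared | {"msc_only_1"}
nsc = shared | {"nsc_only_1", "esc_nsc_1"}

regions = venn3_regions(esc, msc, nsc)
for name, members in regions.items():
    print(f"{name}: {len(members)}")
```

Because the seven regions are disjoint, their sizes always add up to the size of the union of the three input sets, which is a quick sanity check on any hand-drawn diagram.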
Over the past few years, there has been growing interest in applying proteomics to study the differential expression of SC genes at different developmental stages, specifically aiming to unravel the regulatory networks active during differentiation of mESCs [63–68], hESCs, MSCs [57, 60, 69, 70], NSCs [27, 59], and HSCs. Interestingly, seven of the nine proteins shared by all three SC types in this review were among those reported as differentially expressed, namely Peroxiredoxin 1 [27, 59, 60, 63], Heat shock cognate 71-kDa protein [27, 59], Enolase 1 [59, 66], 78-kDa glucose-regulated protein precursor [27, 66], T-complex protein 1, Translationally controlled tumor protein, and ATP synthase β chain. A small number of other differentiation-associated proteins have also been reported, including proteins involved in stress response and oxidative defense (HSP27 [60, 64, 65, 72], 60-kDa heat shock protein [59, 64, 66], and Peroxiredoxin 4 [59, 64]), the cell cytoskeleton (Tubulin-α [27, 54, 59, 63, 64, 72] and vimentin [63, 65, 66]), receptors for intracellular transport (Syntaxin 7 [27, 60]), and a multifunctional protein (calreticulin [64, 66, 72]).
Although these studies have generated a wealth of data, it is difficult to derive a definitive proteome profile of undifferentiated and differentiated SCs from the different published studies. One of the major hurdles in large-scale proteomic studies in general is reducing sample complexity. As yet, no single preparation method allows identification of all proteins present in a sample. Different separation methods produce different sample compositions and therefore different data sets after proteome analysis. In addition, quantitative analysis methods may differ in accuracy and sensitivity for samples of varying complexity. Wu et al. compared three quantitative methods frequently used in proteomics: 2D-DIGE, isotope-coded affinity tags (ICAT), and isobaric tags for relative and absolute quantitation (iTRAQ). They reported only limited overlap among the differentially expressed proteins identified by the three methods in two closely related HCT-116 cell lines, suggesting that these approaches are complementary. Consequently, combining the information obtained through different methods should provide a better portrait of the biological system under investigation.
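One simple way to quantify how far hit lists from different quantitative methods agree is to count the shared proteins and compute the Jaccard index (shared divided by union) for each pair of methods. The sketch below uses hypothetical protein identifiers (`P1` to `P8`); it does not reproduce data from Wu et al.

```python
from itertools import combinations


def pairwise_overlap(hits):
    """Return {(method_1, method_2): (shared_count, jaccard_index)} for every pair of methods."""
    result = {}
    for (m1, s1), (m2, s2) in combinations(sorted(hits.items()), 2):
        union = s1 | s2
        shared = s1 & s2
        result[(m1, m2)] = (len(shared), len(shared) / len(union) if union else 0.0)
    return result


# Hypothetical differentially expressed proteins reported by each method.
hits = {
    "DIGE": {"P1", "P2", "P3", "P4"},
    "ICAT": {"P3", "P5", "P6"},
    "iTRAQ": {"P2", "P3", "P7", "P8"},
}

overlaps = pairwise_overlap(hits)
found_by_all = set.intersection(*hits.values())
found_by_any = set.union(*hits.values())
print(overlaps)
print(f"{len(found_by_all)} of {len(found_by_any)} proteins found by all three methods")
```

Low pairwise Jaccard values combined with a larger union of hits are what "limited overlap but complementary coverage" looks like numerically.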
Differences in the culture methods applied in different laboratories are also likely to induce variations in protein expression. SCs are notoriously difficult to culture compared with more conventional cell lines, largely because of our incomplete knowledge of SC biology. Culture methods that work in one laboratory may not work as well in another because of unknown or poorly defined factors (e.g., serum batches) that affect cell growth and behavior. In addition, the different methods used to derive SC lines contribute to differences in cell line characteristics (discussed further in Heterogeneity of Proteome). These factors force individual laboratories to empirically adjust and optimize the culture conditions required to grow SCs. Thus, the different protein separation techniques, proteome analysis methods, and culture conditions, each chosen according to the interests of individual research groups, generate proteome data sets that are almost impossible to compare directly. Nevertheless, the proteome profiles that are already available, and those still to come, usually complement one another and thereby add to our overall knowledge of SCs and of the proteins they express in different environments and under different culture conditions.
Approximately 20%–30% of all genes in an organism encode integral membrane proteins, which are involved in numerous cellular processes. The target residues for tryptic cleavage (lysine and arginine) are largely absent from transmembrane helices and are found preferentially in the hydrophilic parts of these lipid bilayer-embedded proteins. Because integral membrane proteins tend to aggregate during IEF, 2-DE is unsuitable for their separation and is limited to detecting membrane-associated proteins and membrane proteins of low hydrophobicity (e.g., those with only one or two transmembrane helices). In contrast, the combination of one-dimensional (1D) gel separation and LC-MS/MS has been applied successfully. Another, more successful approach to isolating membrane proteins relies on cell surface labeling combined with high-resolution 2D LC-MS/MS. First, the surface proteins of intact cells are selectively labeled with a membrane-impermeable biotinylation reagent, and the biotinylated plasma membrane proteins are then enriched by affinity capture on immobilized avidin. The biotinylated proteins can be separated by gel electrophoresis and identified by MS. The only reported characterization of the ESC cell surface proteome is that of Nunomura et al., who labeled the surface proteins of undifferentiated mESCs (line D3) and analyzed them by high-resolution 2D LC-MS/MS. They identified 324 proteins, 235 of which had a putative signal sequence and/or transmembrane segments. Using 1D gel electrophoresis followed by LC-MS/MS, Foster et al. applied a quantification method based on extracted ion chromatograms (XICs), identified 104 membrane proteins in human MSCs, and found that expression levels changed during differentiation toward osteoblasts.
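The point about tryptic cleavage sites can be illustrated with a simplified in silico digest. The sketch applies the common textbook rule (cleave after K or R unless the next residue is P) and ignores missed cleavages; the protein sequence is an invented toy in which a hydrophobic, K/R-free stretch stands in for a transmembrane helix.

```python
def tryptic_peptides(sequence):
    """Cleave C-terminal to K or R, except before P (simplified trypsin rule)."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):  # C-terminal peptide without a trailing K/R
        peptides.append(sequence[start:])
    return peptides


# Invented toy sequence: hydrophilic loops flanking a 19-residue hydrophobic segment.
toy_protein = "MKTRDSK" + "LLVAIFGLLLAVIAGLVLW" + "RKPEGSR"
peptides = tryptic_peptides(toy_protein)
print(peptides)
print(max(peptides, key=len))
```

The entire hydrophobic segment ends up in one long, hydrophobic peptide, which is typically harder to recover and detect by LC-MS/MS than the short hydrophilic fragments around it.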
In both studies, many of the identified proteins were abundant housekeeping proteins, such as ribosomal constituents, structural molecules, histones, and chaperones. Although some of these proteins might be associated with the membrane, it was difficult to distinguish them from intracellular components released by dead cells. Combining surface labeling with quantitative proteomic approaches, such as stable isotope labeling, should provide valuable information about the stage- and lineage-specific expression of SC surface proteins.
Many regulatory steps, especially those involved in cell proliferation, migration, and differentiation, depend on protein PTMs rather than on protein abundance. Several 2-DE reports have identified large numbers of isoforms or PTMs in SCs (Table 1). By comparing proteomic data with transcriptome analyses, Unwin et al. showed that the shift in the proteome from long-term reconstituting HSCs (Lin−Sca+Kit+; LSK+) to non-long-term reconstituting progenitor cells (Lin−Sca+Kit−; LSK−) was associated with post-transcriptional control of protein levels. Schrattenholz et al. enriched phosphoproteins from neuronal derivatives of mESCs exposed to chemical ischemia for a differential, quantitative proteome analysis. Moreover, in a study restricted to a defined set of proteins, Prudhomme et al. used a computational systems biology approach to study the phosphorylation states of 31 intracellular signaling network components across 16 different stimuli at three time points. They applied quantitative Western blotting and partial least-squares (PLS) modeling to determine which components correlated most strongly with cell proliferation and differentiation rates.
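Partial least-squares regression relates a block of predictors (here, phosphorylation signals) to a response (here, a proliferation or differentiation rate) through a few latent components. The following is a minimal single-response (PLS1) sketch using the NIPALS algorithm in pure Python; the toy data are hypothetical, and the original study's exact model (for example, multiple responses or variable-importance measures) is not reproduced.

```python
from math import sqrt
from statistics import fmean


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS.

    X: list of samples, each a list of predictors (e.g. phospho-signal levels).
    y: one response per sample (e.g. a proliferation rate).
    """
    n, p = len(X), len(X[0])
    x_means = [fmean([row[j] for row in X]) for j in range(p)]
    y_mean = fmean(y)
    E = [[row[j] - x_means[j] for j in range(p)] for row in X]  # centered X residuals
    f = [value - y_mean for value in y]  # centered y residuals
    components = []
    for _ in range(n_components):
        # Weights follow the covariance between the X residuals and the y residuals.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # no remaining covariance between X and y
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loadings = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y by the part explained by this component.
        E = [[E[i][j] - t[i] * loadings[j] for j in range(p)] for i in range(n)]
        f = [f[i] - t[i] * q for i in range(n)]
        components.append((w, loadings, q))
    return x_means, y_mean, components


def pls1_predict(model, x):
    """Predict the response for one new sample by projecting it onto each component."""
    x_means, y_mean, components = model
    e = [xj - mj for xj, mj in zip(x, x_means)]
    y_hat = y_mean
    for w, loadings, q in components:
        t = sum(ej * wj for ej, wj in zip(e, w))
        e = [ej - t * lj for ej, lj in zip(e, loadings)]
        y_hat += t * q
    return y_hat


# Hypothetical toy data: 4 conditions x 2 signaling components.
X = [[1.0, 1.0], [2.0, 0.0], [3.0, 0.0], [4.0, 1.0]]
y = [2.0, 4.0, 6.0, 8.0]  # response driven only by the first component
model = pls1_fit(X, y, n_components=2)
first_weights = model[2][0][0]
ranking = sorted(range(len(first_weights)), key=lambda j: abs(first_weights[j]), reverse=True)
print(first_weights, ranking)
```

In this toy data the first weight vector is [1.0, 0.0], so the first signaling component ranks as the most informative and a single component already explains the response. With real data, weights from several components and cross-validation would be examined before interpreting such a ranking.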
Kratchmarova et al. applied a quantitative phosphoproteomics approach to study the effects of growth factors (epidermal growth factor [EGF] and platelet-derived growth factor [PDGF]) on human MSCs. They metabolically labeled cellular proteins by stable isotope labeling by amino acids in cell culture (SILAC) (Fig. 3), combined the lysates of the three differentially labeled cell states, and incubated this mixture with antibodies against phosphotyrosine. The precipitated complexes were resolved by 1D SDS-PAGE and proteolytically digested, and the resulting peptide mixture was analyzed by LC-MS/MS. The results showed that EGF and PDGF modulate the osteogenic capability of MSCs through mitogen-activated protein kinase (MAPK)/ERK, p38 kinase, and phosphatidylinositol 3-kinase signaling.
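With triple SILAC labeling, each protein is quantified in three channels within one MS run, so responses can be read as intensity ratios relative to the unstimulated channel. The sketch below is hypothetical throughout: the channel assignment (light = unstimulated, medium = EGF, heavy = PDGF), the intensities, the protein names, and the two-fold cutoff are illustrative choices, and normalization and statistics are omitted.

```python
from math import log2


def silac_log2_ratios(intensities):
    """Map {protein: (light, medium, heavy)} to {protein: (log2 medium/light, log2 heavy/light)}."""
    return {
        protein: (log2(medium / light), log2(heavy / light))
        for protein, (light, medium, heavy) in intensities.items()
    }


def responders(ratios, channel, min_abs_log2=1.0):
    """Proteins whose |log2 ratio| in the given channel (0 = EGF, 1 = PDGF) meets the cutoff."""
    return {protein for protein, r in ratios.items() if abs(r[channel]) >= min_abs_log2}


# Hypothetical anti-phosphotyrosine pull-down intensities (light, medium, heavy).
intensities = {
    "ProtA": (100.0, 400.0, 110.0),
    "ProtB": (100.0, 95.0, 350.0),
    "ProtC": (100.0, 300.0, 250.0),
    "ProtD": (100.0, 105.0, 90.0),
}
ratios = silac_log2_ratios(intensities)
egf = responders(ratios, channel=0)
pdgf = responders(ratios, channel=1)
print("EGF only:", egf - pdgf, "PDGF only:", pdgf - egf, "shared:", egf & pdgf)
```

Set differences such as `egf - pdgf` isolate growth-factor-specific responders, while the intersection collects the shared signaling response.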
Puente et al. sought to characterize the SC state by identifying the phosphoproteome of mESCs and of their derivatives formed in embryoid bodies (EBs). Samples were loaded onto phosphoprotein-affinity columns, and the eluted proteins were separated by 2-DE followed by silver staining. Proteins visualized by silver staining were identified by MALDI MS/MS or LC-MS/MS. The proteins that exhibited altered PTMs during differentiation included several that gene expression arrays had previously revealed as conserved features of the SC phenotype. Proteins related to protein catabolism, protein folding, chromatin remodeling, and other functions showed altered phosphorylation between the ESC and EB states. These data suggest that kinase activity and the phosphorylation state of target substrates act as critical regulators of SC function.
Heterogeneity of Proteome
The reproducibility of proteome profiles of individual SC samples, or of their derivatives, generated under similar conditions is a major criterion for large-scale proteomics-based studies. The proteome of a cell is highly dynamic and depends on several parameters, including genetic background, method of derivation, growth conditions, and the stage of the cell cycle at sample collection. Therefore, accurate and reliable quantitative comparisons of protein up- or downregulation require replicate samples of cells in the same physiological state.
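A common, simple check of replicate reproducibility is the coefficient of variation (CV, standard deviation divided by mean) of each protein's or spot's abundance across replicates. In the sketch below, the spot volumes and the 30% CV cutoff are hypothetical; suitable thresholds depend on the platform.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (a fraction, not a percentage)."""
    return stdev(values) / mean(values)


def flag_irreproducible(abundances, max_cv=0.30):
    """Return the identifiers whose replicate CV exceeds max_cv, sorted."""
    return sorted(
        name for name, values in abundances.items()
        if coefficient_of_variation(values) > max_cv
    )


# Hypothetical spot volumes from three replicate gels.
spot_volumes = {
    "spot_1": [100.0, 105.0, 95.0],
    "spot_2": [100.0, 200.0, 60.0],
    "spot_3": [50.0, 55.0, 45.0],
}
print(flag_irreproducible(spot_volumes))
```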
Zenzmaier et al. compared CD34+ preparations from five different umbilical cord samples. Out of hundreds of spots detected on 2D gels, they found only 52 common proteins, 22 of which were identified by nano-HPLC-MS/MS. Because the purity of the cell samples was >88%, the observed heterogeneity could not be attributed to contaminating cells; instead, the differences in protein patterns were interpreted as SC-intrinsic heterogeneity.
We analyzed the proteome of three hESC lines in triplicate and identified 54 and 14 proteins showing quantitative (p ≤ .01) and qualitative changes, respectively. Moreover, van Hoof et al. reported that the expression levels of proteins such as β-actin and Oct4 were similar between the hESC lines NL-HESC-01 and HES-2, whereas the expression ratios of several of these proteins differed in another hESC line, HUES-1. HES-2 and NL-HESC-01 cells were both passaged mechanically by a cut-and-paste method in serum-containing medium [84, 86], whereas HUES-1 cells were passaged enzymatically by trypsinization and cultured in serum replacement with basic fibroblast growth factor. Potential sources of variation among hESC lines include the following (reviewed in [87, 88]): (a) differences due to the origin of the cell lines: (i) genomic diversity; (ii) the stage of the preimplantation embryo at derivation; (iii) the conditions of early culture (feeder layer, culture conditions); (iv) differences in the culture and derivation procedures applied in different laboratories, such as feeder cell types and densities, culture substrates, culture media, growth factors and other additives, and freezing methods; (v) the passage number and method of passaging [2, 90, 91]; and (vi) imprinting and X-inactivation; (b) differences arising over time in culture: (i) genetic changes (loss or gain of specific sequences) and (ii) general and specific epigenetic changes, such as DNA methylation, histone acetylation, and microRNAs (reviewed in ); and (c) differences due to mosaicism in cultures: (i) partial or terminal differentiation of subpopulations within cultures and (ii) variable epigenetic and genetic changes among cells.
In adult SCs, the proliferation and osteogenic capacity of human MSCs have been shown to decrease during serial subculturing. Moreover, passage-specific proteins were found that were suggested to be differentially regulated and to contribute to the decline in osteogenic differentiation potential during serial subculturing.
The purification and extraction of specific SC-derived cell types, and the consistency and reproducibility of sample generation, are thus important issues. SC differentiation usually yields mixed, heterogeneous cell populations. Therefore, protocols for enhancing differentiation toward a specific cell lineage, and for subsequent purification, should be optimized (reviewed in ). Feasible methods that may help to achieve this include (a) adding specific combinations of growth factors or chemical morphogens, (b) changing the physical and geometrical microenvironment, (c) coculturing or transplanting SCs with inducer tissues or cells, (d) implanting SCs into specific organs or tissues, and (e) overexpressing transcription factors associated with the development of specific cell lineages. However, to date these strategies have not yielded pure populations of mature progeny, so efficient protocols to purify specific cell populations are still required. Methods such as fluorescence-activated cell sorting and magnetic-activated cell sorting allow such purification, but they depend on the cell expressing a surface marker that can be recognized by an antibody tagged with a fluorophore or a magnetic microbead; to be effective, the marker needs to be cell-type-specific. In most cases, such markers are not commercially available; sorting methods therefore rely on, for example, genetic modification of SCs in which a fluorescent reporter is placed under the control of a lineage-specific promoter. Alternatively, cells can be transduced with a drug-resistance gene instead of a fluorescent marker, allowing preferential selection of subpopulations.
Application of Protein Array to Stem Cell Proteomics
Protein arrays offer a different solution and have the potential for high-throughput identification of novel protein markers and molecular pathways. Hayman and Przyborski applied SELDI-TOF MS to rapidly generate protein peak-map bioprofiles and demonstrated that this approach can distinguish human ECCs from their differentiated derivatives with up to 100% accuracy. It should be noted that this approach does not identify the individual molecules expressed in the cell sample; if identification of a particular protein is required, it can be combined with other technologies, such as SELDI tandem MS. Using cytokine protein arrays, it has been shown that cytokine induction and signal transduction are important for the differentiation of human UCB-MSCs. Sakaguchi et al. used a ProteinChip system to identify a molecule in medium conditioned by OP9, a BM cell line, that is responsible for neurosphere formation from NSCs.
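Classifying samples from peak-map profiles amounts to supervised classification on vectors of peak intensities. The text does not specify the classifier Hayman and Przyborski used, so the sketch below substitutes a simple nearest-centroid rule scored by leave-one-out accuracy; the three peak intensities per sample and the class labels are hypothetical.

```python
from math import dist
from statistics import fmean


def centroid(profiles):
    """Mean intensity of each peak across a list of equal-length profiles."""
    return [fmean(column) for column in zip(*profiles)]


def nearest_centroid_predict(training, profile):
    """training: {label: [profile, ...]}. Returns the label with the closest mean profile."""
    centroids = {label: centroid(profiles) for label, profiles in training.items()}
    return min(centroids, key=lambda label: dist(centroids[label], profile))


def leave_one_out_accuracy(data):
    """Hold out each profile in turn, train on the rest, and score the prediction.

    Assumes every class has at least two profiles, so no class is left empty.
    """
    correct = total = 0
    for label, profiles in data.items():
        for i, profile in enumerate(profiles):
            training = {
                other: [p for k, p in enumerate(ps) if not (other == label and k == i)]
                for other, ps in data.items()
            }
            correct += nearest_centroid_predict(training, profile) == label
            total += 1
    return correct / total


# Hypothetical normalized intensities of three peaks per sample.
data = {
    "undifferentiated": [[10.0, 2.0, 1.0], [11.0, 2.5, 0.5], [9.5, 1.5, 1.2]],
    "differentiated": [[2.0, 8.0, 6.0], [2.5, 9.0, 5.5], [1.5, 7.5, 6.5]],
}
print(leave_one_out_accuracy(data))
```

Leave-one-out scoring keeps each test profile out of the centroid it is compared against, which avoids the optimistic accuracy obtained by testing on the training data.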
Reverse-phase protein arrays have been applied to the analysis of primary acute myelogenous leukemia samples, as well as of leukemic and normal SCs. With this strategy, differences in protein expression could be detected in as few as three cell protein equivalents. It was therefore suggested that this approach could serve as a highly reliable and reproducible high-throughput system for rapid, large-scale analysis of protein expression and phosphorylation state in primary acute myelogenous leukemia cells, as well as in human SCs.
The secretome is defined as the subset of the proteome comprising all proteins actively exported from a cell. The set of proteins a cell secretes depends strongly on the cell type and the cellular state; the secretome therefore reveals much about what is going on inside the cell.
Proteomics has been used to characterize an environment that supports the growth of undifferentiated hESCs and to identify factors critical for their feeder-independent growth. Proteome analysis of conditioned medium (CM) from mouse embryonic fibroblast feeder layers (STO cell line) and a human neonatal foreskin cell line (HNF02) identified several proteins involved in cell growth, differentiation, and extracellular matrix formation and remodeling; many intracellular proteins were also identified.
Zvonic et al. compared the secretomes (CMs) of four individual primary AD-SC cultures under uninduced or adipogenic-induced conditions and identified several proteins, including adiponectin, plasminogen activator inhibitor 1, and multiple serine protease inhibitors (serpins).
These studies indicate the complexity of the environment formed by the feeder cells and provide a useful starting point for future studies. Secretome studies show a high potential for identification of biomarkers involved in many cellular processes, including growth, division, differentiation, development, and death.
Although considerable progress has been made in human transplantation medicine, several major obstacles still restrict the wider application of cell transplantation, in particular that of SCs. The major clinical obstacle is demonstrating the safety and feasibility of cell therapy. Proteomic analyses of tissues and body fluids after cell therapy could address these concerns. For example, Kaiser et al. analyzed urine after HSC transplantation (HSCT) and could distinguish patients with graft-versus-host disease (GVHD) from those without complications after HSCT with a specificity of 82% and a sensitivity of 100%.
Wang et al. quantitatively analyzed the human plasma proteome before and after the onset of GVHD and identified 75 proteins that exhibited quantitative changes between pre- and post-GVHD samples. Some of these were well-known acute-phase reactants, including serum amyloid A, apolipoproteins A-I and A-IV, and complement C3.
To study salivary protein changes after HSCT, Imanguli et al. analyzed serially collected saliva samples from 41 patients undergoing allogeneic HSCT (allo-HSCT) using SELDI-MS in conjunction with 2-DE. They detected significant changes in multiple salivary proteins that lasted at least 2 months post-transplant, including upregulation of lactoferrin and secretory leukocyte protease inhibitor and downregulation of secretory IgA. Weissinger et al. correlated proteomic data with the clinical diagnosis of acute GVHD. From their proteome analysis, they selected a tentative acute GVHD-specific model of 31 polypeptides that distinguished patients with GVHD from those without complications after HSCT with high specificity (98%) and a sensitivity of 100%. Subsequent blinded evaluation of 599 samples enabled diagnosis of acute GVHD, even before clinical diagnosis, with a sensitivity of 83.1% and a specificity of 75.6%.
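Sensitivity and specificity, the figures quoted for these GVHD classifiers, are computed from the counts of true and false positives and negatives. A minimal sketch with hypothetical labels (not data from the cited studies):

```python
def sensitivity_specificity(predicted, actual):
    """predicted/actual: equal-length sequences of booleans (True = GVHD).

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort: 10 patients with GVHD, 10 without.
actual = [True] * 10 + [False] * 10
predicted = [True] * 9 + [False] + [True] * 2 + [False] * 8
sens, spec = sensitivity_specificity(predicted, actual)
print(f"sensitivity {sens:.0%}, specificity {spec:.0%}")
```

A classifier that calls every patient positive reaches 100% sensitivity at 0% specificity, which is why the two numbers are reported together.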
These results demonstrate the potential of proteomics as an unbiased, laboratory-based screening method that could enable early diagnosis and pre-emptive therapy.
Insight into Stem Cell Protein Networks and Signaling Pathways for Pluripotency
Understanding the molecular mechanisms underlying SC pluripotency should illuminate fundamental properties of SCs and the process of cellular reprogramming. Proteomics has proved to be a powerful approach for gaining insight into the key intracellular signals governing SC self-renewal and differentiation.
In an attempt to analyze the cue-signal-response relationship underlying SC self-renewal versus differentiation, the phosphorylation states of 31 intracellular signaling network components were quantitatively studied under fibronectin, laminin, leukemia inhibitory factor, and fibroblast growth factor-4 treatments . Using a multivariate proteomic approach, Prudhomme et al.  identified a set of signaling network components most critically associated with differentiation (Stat3, Raf1, MEK, and ERK), proliferation of undifferentiated mESCs (MEK and ERK), and proliferation of differentiated cells (PKBα, Stat3, Src, and PKCε).
A quantitative MS-based phosphoproteomics approach has been applied to elucidate critical differences between the signaling mechanisms of EGF and PDGF that lead to their differential effects on osteoblast differentiation of human MSCs (as described in Post-Translational Modification). By studying tyrosine-phosphorylated proteins in response to EGF and PDGF, Kratchmarova et al. found that less than 10% of all phosphotyrosine proteins were specific to either the EGF or the PDGF activation program in human MSCs, revealing a range of widely shared signaling pathways. Examples include the MAPK cascades and signal attenuation through receptor ubiquitination followed by endocytic removal from the cell surface. However, because EGF-treated but not PDGF-treated human MSCs undergo osteogenic differentiation, the differences contained in this small fraction of differentially activated proteins are likely to be of crucial significance. The most striking difference was the activation of phosphatidylinositol 3-kinase exclusively by PDGF, signifying a possible control point in the osteogenic differentiation process. These results demonstrated that, at least in some cases, cell fate decisions can be made by preferentially activating a small subset of the signaling network.
Using a chip-based proteomics approach, factors affecting the proliferation of NSCs have been screened. Sakaguchi et al.  used a ProteinChip system to identify molecules present in conditioned medium of OP9, a BM cell line that induces neurosphere formation from NSCs. In this screen, they identified a soluble carbohydrate-binding protein, Galectin-1, as a candidate. Galectins make up a family of carbohydrate-binding lectin proteins that are implicated in cell adhesion, growth, differentiation, neoplastic transformation, and metastasis . Galectin-1 has also been identified as one of the relatively abundant proteins in mouse embryonic fibroblast-conditioned medium  and human foreskin fibroblast-conditioned medium . Based on results from intraventricular infusion experiments and phenotypic analyses of knockout mice, Sakaguchi et al.  suggested that the carbohydrate-binding activity of Galectin-1 is required for its promotion of adult neural progenitor cell proliferation.
In a recent investigation, proteomics was applied to gain insight into the regulatory protein networks in which Nanog operates in mESCs. A construct encoding the pluripotency factor Nanog, tagged with both a Flag epitope and a peptide that serves as a substrate for in vivo biotinylation, was expressed in ESCs. The tagged protein was recovered from cell lysates with streptavidin beads and further purified with anti-Flag antibodies, and MS was then applied to identify its interacting partners. Not surprisingly, many of the candidates were other transcription factors, some of which had already been associated with ESCs. The resulting data set was used to generate a complex network of interacting proteins, depicted in a concise scheme. Most of the proteins in this network were shown to be essential for early development and/or ESC properties. Knockout of several network proteins, including Prmt1, YY1, Rnf2, BAF155, Rybp, Oct4, Cdk1, NF45, Sall4, Elys, Tif1β, Pelo, Dax1, and REST, resulted in defects in proliferation and/or survival of the inner cell mass or in other aspects of early development. Knockout of Err2, Rif1, Nac1, and Zfp281 resulted in defects in self-renewal and/or differentiation of ESCs. The coexpression of most network genes, and their roles as both targets and effectors, indicate that this interactome may serve as a functional module dedicated to maintaining ESC pluripotency. This network provided a solid basis for further exploration of the signaling pathways involved in ESC maintenance. Sall4 had been found to be involved in these pathways independently by three other groups [107–109]. This protein was also identified in the large-scale proteome study by van Hoof et al.; however, its association with Oct4 and Nanog had not been made. This illustrates the likelihood that numerous proteins specifically identified in SCs play a significant role in processes that sustain SCs.
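Affinity-purification results can be turned into an interaction network by linking each bait to the proteins that co-purify with it. The sketch below assumes that background binders have already been removed using control purifications, and all bait and prey names are hypothetical placeholders rather than the published Nanog interactome.

```python
from collections import defaultdict


def build_network(pulldowns):
    """pulldowns: {bait: set of co-purified proteins}. Returns an undirected adjacency map."""
    graph = defaultdict(set)
    for bait, preys in pulldowns.items():
        for prey in preys:
            if prey != bait:
                graph[bait].add(prey)
                graph[prey].add(bait)
    return graph


def shared_partners(pulldowns, min_baits=2):
    """Proteins co-purified with at least `min_baits` different baits."""
    counts = defaultdict(int)
    for bait, preys in pulldowns.items():
        for prey in preys - {bait}:
            counts[prey] += 1
    return sorted(prey for prey, n in counts.items() if n >= min_baits)


# Hypothetical results of tagged-bait purifications followed by MS.
pulldowns = {
    "BaitA": {"P1", "P2", "BaitB"},
    "BaitB": {"P2", "P3", "BaitA"},
    "BaitC": {"P2", "P4"},
}
network = build_network(pulldowns)
print(shared_partners(pulldowns))
print(sorted(network["P2"]))
```

If co-purified partners are themselves tagged and used as new baits, the graph grows beyond a single star around the first bait, and proteins recovered by several baits become candidate hubs of the network.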
Venn diagrams such as Figure 5 can help narrow down the search for novel ESC-associated proteins; however, the involvement and role of such candidates in SC maintenance need to be confirmed by additional experiments.