Mesenchymal stem cells (MSCs) retain both self-renewal and multilineage differentiation capabilities. Despite wide therapeutic potential, many aspects of human MSCs, particularly the molecular parameters that define stemness, remain largely unknown. Using high-density oligonucleotide microarrays, we obtained the differential gene expression profile between a fraction of mononuclear cells of human umbilical cord blood (UCB) and its MSC subpopulation. Of particular interest was a subset of 47 genes preferentially expressed at 50-fold or higher in MSCs, which could be regarded as a molecular foundation of human MSCs. This subset contains numerous genes encoding collagens, other extracellular matrix or related proteins, cytokines or growth factors, and cytoskeleton-associated proteins but very few genes for membrane and nuclear proteins. In addition, a direct comparison of this microarray-generated transcriptome with published serial analysis of gene expression data suggests that the molecular context of UCB-derived MSCs is broadly similar to that of bone marrow–derived cells. Altogether, our results provide a basis for studies on the molecular mechanisms controlling core properties of human MSCs.
Mesenchymal stem cells (MSCs), also dubbed marrow stromal stem cells, stromal precursor cells, mesenchymal progenitor cells, or colony-forming unit-fibroblastic (CFU-F) cells, are highly proliferating and adherent fibroblastic cells that express a panel of characteristic cell surface markers. MSCs retain not only the capacity to self-renew but also the potential to differentiate into a variety of connective tissues such as muscle, bone, cartilage, tendon, and fat, as well as nerve and liver tissues. These properties make them a possible alternative to embryonic stem cells (ESCs) in cell-based therapeutic applications, but little is known about their nature, in vivo function, and developmental origin. In particular, the molecular parameters that define core stem cell properties have remained largely unexplored.
Self-renewal is one of the most fundamental properties of stem cells, and the cellular and molecular mechanisms underlying it have been the subject of extensive study. Extensive searches for key transcriptional players led to the discovery of Oct3/4, leukemia inhibitory factor–activated Stat3 [4–6], and Nanog [7, 8] in ESCs, and to the recent finding of Bmi-1 in both hematopoietic stem cells (HSCs) and neural stem cells (NSCs). In efforts to define the molecular identity of stem cells on the genomic scale, DNA microarray-based analyses were also recently employed in human or mouse ESCs, HSCs, and NSCs, leading to the identification of commonly expressed genes, called stemness genes or a stem cell molecular signature [11–14].
As in any other stem cells, the self-renewal of MSCs is likely to be governed by a defined set of molecular factors, but no molecular factor relevant to this function has been identified to date. The gene expression profile of MSCs has been previously investigated through serial analysis of gene expression (SAGE) [15–17] and restriction fragment differential display. Although these studies provided a plausible framework for defining MSCs at the genetic level, the presence of abundant housekeeping genes prevented the correct assessment of MSC-specific genetic messages.
In the study reported here, with the specific aim to generate an MSC-specific transcriptome, we performed a DNA microarray-based differential gene expression analysis between a fraction of human umbilical cord blood (UCB)–derived mononuclear cells (MNCs) and its MSC subpopulation. UCB-derived cells were proven to be more advantageous in cell procurement, storage, and transplantation than their bone marrow (BM) counterpart and therefore better suited in tissue engineering and development of cell-based therapeutics. A number of reports from different laboratories [19–23], including ours [24–26], indicate that UCB-derived MSCs are highly similar to the cells of BM origin with respect to cell characteristics and multilineage differentiation potential. Therefore, this study may lead us to reveal the molecular signature that is specific to human MSCs but independent of their origins, and it will assist further studies on molecular mechanisms controlling various core stem cell properties.
Materials and Methods
Cell Growth and Total RNA Isolation
Six full-term UCB units, each containing about 80 ml of blood, were processed as previously described [24–26]. For the isolation of MSCs, MNCs were plated at a density of 1 × 10⁶ cells per cm² and allowed to adhere to culture flasks for 5 days. Nonadherent cells were removed with medium changes, while adherent cells were further cultured. Once they reached approximately 50%–60% confluence, the cells were detached and subjected to the next round of serial passaging. The cell samples used in our microarray experiment consisted of two fresh MNC populations from donors 1 and 2, and two MSC populations from donors 3 and 4, which were culture expanded for 3 and 5 passages, respectively. For reverse transcription polymerase chain reaction (RT-PCR) analysis, another MNC population from donor 5 and a fresh MSC sample at the second passage from donor 6 were additionally used. Total RNA was separately prepared from each sample using the RNeasy Mini isolation kit (Qiagen, Valencia, CA, http://www1.qiagen.com), according to the protocol provided.
Microarrays and Target Preparation
The Amersham CodeLink system (Amersham Biosciences, Chandler, AZ, http://www.amersham.com) using the UniSet Human 20K Bioarray, containing approximately 20,289 probes on a single glass slide, was used to generate the gene expression profile of the cells. Microarrays were processed in parallel using the CodeLink Shaker Kit and CodeLink Parallel Processing Kit. For each microarray, double-stranded cDNA and subsequent cRNA were synthesized from 1 μg of total RNA using the CodeLink Expression Assay Kit, according to the manufacturer's instructions. Briefly, first-strand cDNA was generated using SuperScript II reverse transcriptase and a T7-oligo(dT) primer. Subsequently, second-strand cDNA was produced using Escherichia coli DNA polymerase I and RNase H. The resultant double-stranded cDNA was purified on a QIAquick column (Qiagen), and cRNA was generated via an in vitro transcription reaction using T7 RNA polymerase and biotin-11-uridine-5′-triphosphate, tetralithium salt (Perkin-Elmer, Boston, http://www.perkinelmer.com), then purified on an RNeasy column and quantified by UV spectrophotometry. A total of 10 μg of cRNA was fragmented by heating at 94°C for 20 minutes in the presence of Mg ions.
Hybridization, Processing, and Scanning
The fragmented cRNA in 260 μl of hybridization solution was added to each slide and incubated overnight at 37°C in a shaking incubator (Vision Scientific Co., Kyunggi-do, Korea, http://www.visionsci.co.kr) at 300 rpm. After hybridization, the slides were washed in 0.75× TNT buffer (1× TNT: 0.1 mol/L Tris-HCl, pH 7.6, 0.15 mol/L NaCl, and 0.05% Tween 20) at 46°C for 1 hour, followed by incubation with Cy5-streptavidin at room temperature for 30 minutes in the dark. Slides were then washed in 1× TNT twice for 5 minutes each, followed by a rinse in 0.05% Tween 20 in water. The slides were then dried by centrifugation and kept in the dark until scanning. Images were captured on an Axon GenePix Scanner (Molecular Devices Co., Union City, CA, http://www.axon.com).
Microarray Data Processing and Hierarchical Clustering
Scanned data images were processed using CodeLink Expression Analysis Software. The mean intensity was taken for each spot and background corrected by subtracting the median intensity of the surrounding local background. The intensities were then globally linearly normalized according to the standard normalization procedure of the software. The normalized intensity of each gene probe was separately averaged over the MNC and MSC populations, and the probes were ranked by the MSC-to-MNC ratio of the average intensity. The full list of the gene probes whose average intensity ratios are greater than 1.5 is available, along with their respective normalized intensity in each sample, at http://callisto.snu.ac.kr/hoeonkim/microarray. For hierarchical clustering, the raw intensity data were exported to GeneSpring software version 6.0 (Silicongenetics, Redwood, CA, http://www.silicongenetics.com). Gene expression data were normalized to the 50th percentile expression level. Rigorous filtration of flagged probes resulted in a total of 11,662 gene probes, which were then subjected to hierarchical clustering using the standard correlation as a similarity measure.
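The ranking step described above can be illustrated with a minimal Python sketch. It is not the CodeLink or GeneSpring implementation; the function name `rank_by_ratio`, the sample labels, and the intensity values are hypothetical and chosen only for illustration. The sketch assumes that all normalized intensities are positive, so that the ratio is defined.

```python
from statistics import mean


def rank_by_ratio(intensities, mnc_samples, msc_samples, min_ratio=1.5):
    """Rank probes by the MSC-to-MNC ratio of their average normalized intensity.

    intensities: dict mapping probe name -> dict of sample label -> intensity.
    Only probes whose ratio exceeds min_ratio are returned, highest ratio first.
    """
    ranked = []
    for probe, values in intensities.items():
        mnc_avg = mean(values[s] for s in mnc_samples)  # average over MNC samples
        msc_avg = mean(values[s] for s in msc_samples)  # average over MSC samples
        ratio = msc_avg / mnc_avg  # assumes mnc_avg > 0
        if ratio > min_ratio:
            ranked.append((probe, ratio))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


# Hypothetical normalized intensities for three probes in four samples.
example = {
    "probe_A": {"MNC1": 1.0, "MNC2": 1.0, "MSC3": 60.0, "MSC4": 80.0},
    "probe_B": {"MNC1": 4.0, "MNC2": 6.0, "MSC3": 5.0, "MSC4": 5.0},
    "probe_C": {"MNC1": 2.0, "MNC2": 2.0, "MSC3": 7.0, "MSC4": 9.0},
}
ranked = rank_by_ratio(example, ["MNC1", "MNC2"], ["MSC3", "MSC4"])
for probe, ratio in ranked:
    print(probe, round(ratio, 2))
```

With these made-up values, probe_B is dropped because its ratio does not exceed 1.5, and the remaining probes are listed in descending order of ratio.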
RT-PCR Confirmation of Microarray Data
To confirm the gene expression profile determined by microarrays, a number of select genes were subjected to RT-PCR analysis, using total RNAs derived from the four cell samples used for a microarray experiment, as well as an additional pair of fresh MNC and MSC samples. Glyceraldehyde 3-phosphate dehydrogenase (GAPDH) mRNA amplified from the same samples served as an internal control. The sense and antisense primers used for each gene were as follows:
The thermocycler conditions used for amplification were initial denaturation at 95°C for 5 minutes, followed by 35 cycles at 95°C for 30 seconds, 55°C for 30 seconds, 72°C for 1 minute, and finally 72°C for 7 minutes. The amplified PCR products were resolved in 1% agarose gel, stained with ethidium bromide, visualized and photographed with Chemi Doc XRS (Bio-Rad Laboratories, Hercules, CA, http://www.bio-rad.com).
Integration of Bioarray Data with SAGE Data
The SAGE tag frequency table was downloaded from the website (http://bit.fmrp.usp.br/msc_tags) provided by M.A. Zago's laboratory, inspected, and integrated with our microarray data. To facilitate integration of the two datasets, microarray gene probes and SAGE tags were mapped to the NCBI Human Unigene clusters. If more than one tag or probe corresponded to the same Unigene cluster, then the higher ranked one was selected. This integration process led to the selection of a total of 7,898 uniquely represented Unigene clusters. The clusters were then sorted by the average normalized intensity in MSCs. A full list of the integrated data can be found at http://callisto.snu.ac.kr/hoeonkim/microarray.
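The integration logic can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the procedure actually used: it takes "higher ranked" to mean the record with the larger value (intensity for probes, tag count for SAGE), and it keeps only clusters present in both datasets. The function names and cluster identifiers are hypothetical.

```python
def best_per_cluster(records):
    """For each cluster ID, keep the largest value among its records.

    records: iterable of (cluster_id, value) pairs; a cluster may appear
    more than once when several probes or tags map to it.
    """
    best = {}
    for cluster_id, value in records:
        if cluster_id not in best or value > best[cluster_id]:
            best[cluster_id] = value
    return best


def integrate(probes, tags):
    """Merge microarray probes and SAGE tags on shared cluster IDs.

    Returns (cluster_id, array_intensity, sage_count) rows sorted by
    microarray intensity, highest first.
    """
    array_best = best_per_cluster(probes)
    sage_best = best_per_cluster(tags)
    shared = array_best.keys() & sage_best.keys()  # clusters seen in both datasets
    merged = [(c, array_best[c], sage_best[c]) for c in shared]
    merged.sort(key=lambda row: row[1], reverse=True)
    return merged


# Hypothetical cluster IDs and values, for illustration only.
probes = [("cluster_1", 12.0), ("cluster_1", 30.0), ("cluster_2", 5.0), ("cluster_3", 8.0)]
tags = [("cluster_1", 40), ("cluster_2", 3), ("cluster_2", 9), ("cluster_4", 2)]
merged = integrate(probes, tags)
for row in merged:
    print(row)
```

Clusters that appear in only one dataset (cluster_3 and cluster_4 here) are left out of the merged table, which is one plausible reading of "uniquely represented Unigene clusters".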
Reproducibility and Sensitivity of Microarray Experiments
Reproducibility of the microarray experiment from target preparation to data analysis was assessed by repeated experiments using separately prepared target RNAs from an MSC sample. The correlation coefficient between two microarray datasets obtained from repeated experiments was greater than .98 for all gene probes above noise, indicating that not only the microarray system itself but also the whole experimental procedure was highly reproducible. Figure 1 shows a scatter plot between the two datasets when all 20,289 gene probes were included. Almost all of the probes with a normalized intensity above 5.0 were located within a 1.5-fold limit, showing an excellent correlation in this portion of the data. However, such covariance quickly disappeared when the probe intensity was near 1.0. Therefore, any probe whose normalized intensity was below 1.0 was considered to be inaccurate. To eliminate this noise effect from low-level expression, spots quantified at <1.0 were replaced by the value 1.0 before further analyses.
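The noise floor and the two reproducibility measures mentioned above (correlation coefficient and the share of probes within a 1.5-fold limit) can be sketched in Python. This is a minimal illustration with made-up replicate values; it assumes a Pearson coefficient computed on linear-scale intensities, which the text does not specify, and the function names are hypothetical.

```python
from math import sqrt


def floor_noise(values, floor=1.0):
    """Replace every intensity below the floor with the floor value."""
    return [max(v, floor) for v in values]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fraction_within_fold(x, y, fold=1.5):
    """Fraction of probe pairs whose intensity ratio lies within the fold limit."""
    inside = sum(1 for a, b in zip(x, y) if 1 / fold <= a / b <= fold)
    return inside / len(x)


# Hypothetical normalized intensities from two replicate hybridizations.
rep1 = floor_noise([0.2, 5.0, 10.0, 40.0])
rep2 = floor_noise([0.9, 6.0, 9.0, 44.0])
r = pearson(rep1, rep2)
within = fraction_within_fold(rep1, rep2)
print(round(r, 3), within)
```

Flooring first means that two sub-noise readings such as 0.2 and 0.9 are both treated as 1.0, so they no longer register as a spurious 4.5-fold difference.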
Interdonor and Intercell Comparison of Gene Expression Patterns
The correlation coefficient between two MNC populations from donors 1 and 2 was about .95 for all probes above noise (Fig. 2), indicating that the gene expression pattern of cells in the neonatal blood system was almost invariable. This consistent gene expression pattern might be an outcome of the constant operation of genetic programs that are critical for cells and tissues, particularly for the cells in the blood stream, in all healthy individuals. The correlation coefficient between the MSC populations was estimated to be around .92, a little lower than the value of MNCs. However, when it is considered that those MSCs were sampled after extensive cell expansion, this difference is likely to be contributed by variations in culture or passage conditions, rather than intrinsic interindividual variation in the gene expression profile.
In contrast, when the gene expression profile was compared between the two different cell types (MNCs and MSCs), a significant difference was observed, as demonstrated by widely dispersed patterns in the plots and correlation coefficients between .54 and .59 (Fig. 2). When all gene probes above noise were taken into consideration, only 20%–25% were found to be located within a 1.5-fold limit, and the rest of the genes (75%–80%) could be considered to be differentially expressed between the two cell populations. Among the differentially expressed genes, we were particularly interested in a subset of genes that was highly expressed in MSCs but rarely detectable in MNCs because it, as a whole or in part, could be regarded as the molecular signature of MSCs. The normalized intensity of each gene was averaged separately over the MNC and MSC populations. When the genes were ranked by the MSC-to-MNC ratios of average intensity and selected with a ratio greater than 50, a subset containing 47 different genes was generated (Table 1). Comparison of the intensity scores of these genes with published SAGE data indicated that most (but not all) genes in this subset were also abundant in BM-derived MSCs. These genes were found to cluster together in the hierarchical clustering analysis, confirming again their coherent expression patterns (Fig. 3).
Table 1. Differentially expressed genes in UCB-derived MSCs
Abbreviations: MNCs, mononuclear cells; MSCs, mesenchymal stem cells; SAGE, serial analysis of gene expression; UCB, umbilical cord blood.
Insulin-like growth factor–binding protein 7 (IGFBP7)
Hypothetical protein MGC3047
Bone morphogenic protein, placental (PLAB)
Fibroblast activation protein, alpha (FAP)
Lysyl oxidase (LOX)
BCL2/adenovirus E1B interacting protein 3 (BNIP3)
Hypothetical protein MGC3278
Growth arrest-specific 6 (GAS6)
Four-and-a-half LIM domains 2 (FHL2)
Hypothetical protein FLJ12442
Insulin-like growth factor binding protein 6 (IGFBP6)
Fibulin 4 (FBLN4)
LIM domain protein (RIL)
Pyrroline-5-carboxylate reductase 1 (PYCR1)
Differentially Expressed Genes in UCB-Derived MSCs
The subset contains 42 known and 5 novel genes. Of the known genes, 28 genes (>65%) encode extracellular molecules that either participate in biogenesis of the extracellular matrix (ECM) or are cytokines or related products. The former group consists of structural proteins of the ECM, including seven different types of collagens (Iα1, Iα2, IIIα1, IVα1, IVα2, VIα2, and VIα3), CTHRC1, CRTL1, lumican, and fibulin 4, as well as ECM biogenesis factors, including two types of serpins (SERPINE1 and SERPINH2) and three lysyl oxidases (LOXL1, LOX and its variant), whereas the latter group consists of three different types of IGFBPs (IGFBP-6, IGFBP-7, and CTGF), PRSS11, OSF-2, CYR61, Wnt5B, PLAB, FAP, GAS6, and follistatin. The subset also contains four genes encoding membrane proteins: Thy-1, KDELR3, NDUFA4, and BNIP3. The first two are targeted to the plasma membrane and the endoplasmic reticulum, respectively, whereas the last two are mitochondrial membrane proteins. There are also five genes for cytoskeleton-associated proteins: transgelin, α-B-crystallin, tropomyosin I, EPLIN, and RIL; four genes for soluble proteins: NNMT, PKCBP, P311, and PYCR1; and finally two genes for nuclear proteins: necdin and the LIM domain protein FHL2. It is noteworthy that among those genes, collagens types I, III, IV, and VI, transgelin, Thy-1, and FAP were known as characteristics of MSCs and previously identified in human MSCs.
Of the five novel genes, MGC3047 and MGC17528 were recently predicted to encode an immunoglobulin superfamily protein, limitrin, and an S100 calcium-binding protein, A16, respectively. Our BLAST analysis of MGC3278 and FLJ12442, encoding hypothetical proteins of 563 and 520 amino acids, respectively, showed that the former contains a DUF719 domain, which has no known function but is conserved in several eukaryotic proteins, while the latter belongs to a 5′ nucleotidase protein family. However, the identity of AGENCOURT_6683145 could not be resolved.
Confirmation of Gene Expression by RT-PCR
To verify the gene expression profile determined by our microarray analysis, the expression levels of the top 10 genes in the subset were analyzed by RT-PCR, using total RNAs obtained from the four cell samples. The result showed that all tested genes were expressed highly in MSCs, but either weakly or negligibly in MNCs (Fig. 4A). This differential expression pattern was in good agreement with that from the microarray analysis, confirming the high fidelity of the microarray data and analytical methods. Moreover, when an additional pair of fresh MNC and MSC samples was analyzed by RT-PCR, they exhibited a differential expression pattern (Fig. 4B) that was consistent with that of the former samples (Fig. 4A). This finding suggests that the gene expression profile determined in this study can be extrapolated to most, if not all, UCB from healthy individuals.
Comparison of Microarray Data with Published SAGE Data
A total of 7,898 unique genes resulted from integration of the microarray and SAGE datasets. The correlation coefficient between the two datasets was calculated to be around .46, but this figure still indicated a meaningful covariance, considering that the two analytic methods have different strengths and pitfalls in transcriptional profiling analysis. A scatter plot also showed that the two datasets were weakly but not randomly correlated (Fig. 5). After the genes were ranked and sorted by the microarray intensity score, the top 50 genes were pooled to constitute a subset of the most enriched genes in UCB-derived MSCs (Table 2). All of these genes were also found at high frequencies in the SAGE dataset, indicating that the correlation was generally higher for genes with higher expression levels. More than one half of these abundant genes are involved in protein synthesis, including 24 different ribosomal proteins and three regulatory factors. The remaining genes encode known products, including four cytokines: CTGF, TIMP1, IGFBP7, and TGFBI; four glycolytic enzymes: GAPD, ENO1, LDHA, and TPI1; six cytoskeletal proteins: vimentin, MYL6, transgelin, destrin, thymosin, and γ-actin; two heat shock proteins: HSPA8 and HSPB1; and other cytosolic or extracellular proteins. Among known products, CTGF, TAGLN, COL1A1, and IGFBP7 belong to MSC-specific molecules, as mentioned earlier. Most of the rest are housekeeping genes whose expression patterns are more or less constant in all proliferating cells. The most abundant gene is EEF1A1, which is responsible for the enzymatic delivery of aminoacyl tRNAs to the ribosome.
Table 2. The 50 most enriched genes in UCB-derived MSCs
Abbreviations: MSCs, mesenchymal stem cells; SAGE, serial analysis of gene expression; UCB, umbilical cord blood.
Although much attention has been paid to human MSCs as promising for cell-based therapeutics, understanding of MSC biology remains very rudimentary. To introduce them into clinical application, we need to understand the mechanisms that control key properties such as mobilization versus tissue homing and self-renewal versus differentiation. These studies should be based on a strict definition of human MSCs at the molecular level. A number of previous approaches using the SAGE technique [15–17] revealed the genes highly expressed in human MSCs. In those studies, however, the identification of MSC-specific genetic messages was hampered by the presence of abundant housekeeping genes. This problem led us to examine the DNA microarray-based differential expression profile of an MSC population, using its parental MNC population as a baseline.
For this study, we used UCB-derived cells because UCB, when compared with other sources of the stem cells, has profound benefits in cell procurement and storage. Furthermore, cells in the neonatal blood are less mature than adult cells so that they do not trigger an immense immune reaction in unrelated donor transplantation. Accordingly, they are more readily applicable to stem cell–based therapy and transplantation than are cells of other origins.
Before undertaking an analysis of the differential expression profile between the cells, we carefully examined the reproducibility in microarray experiments, as well as interdonor variation in the transcriptional profile. As it turned out, both microarray systems and all experimental steps from target preparation to data analysis were highly reproducible. Moreover, no significant donor-to-donor variation was found in either MNC or MSC transcriptome. Slightly greater variability in the MSC transcriptome could reflect divergent culture conditions that the cells might experience during cell isolation and expansion. Taken together, these preparatory results provided a basis for recognition and interpretation of differentially expressed genes between UCB-derived MSCs and MNCs, which was the specific goal of this study.
To identify genes preferentially expressed by MSCs versus MNCs, we ranked genes by a ratio of the intensity score of the two cell populations. The RT-PCR analysis confirmed the fidelity of these data by showing that the top 10 genes were all expressed in accordance with their differential expression patterns in microarray data. When the genes were chosen with a ratio above 50, a total of 47 different genes were pooled out. A majority of the genes were found to belong to ECM components or cytokines, indicating that both unique ECM environment and specific cytokine signaling are important determinants of the functional states of the cells. Apparently, seven different collagens mainly constitute the ECM of MSCs, while a number of the serpin and lysyl oxidase family proteins are actively involved in its biogenesis. Connective tissue growth factor, also called IGFBP8, is a most preferentially expressed gene by MSCs. It belongs to an IGFBP family and is known to bind insulin-like growth factors with relatively low affinity. Its high expression in MSCs was previously shown in SAGE experiments [15–17].
Relatively few in number are genes encoding membrane or nuclear proteins. Among membrane proteins, Thy-1, previously known as a T cell and an MSC-related cell surface marker, is differentially expressed by MSCs, suggesting that this antigen can be used for efficient immunoselection of the cells from neonatal blood, using magnetic bead technology or fluorescence-activated cell sorting. As for nuclear genes, necdin and the LIM protein FHL2 are uniquely identified. The former was previously known as a neuronal growth suppressor, and its disruption is lethal in mice, whereas the latter was recently related to epithelial ovarian cancer. Neither of them has been previously described in MSCs, and therefore more studies are needed to determine their functional significance in stem cell properties such as self-renewal and differentiation.
Finally, we found that DNA microarray- and SAGE-generated transcriptomes were weakly correlated. When we examined the 50 most highly expressed genes in terms of microarray intensity, we found that all of them were high in frequency in the SAGE experiment. Since the two datasets were obtained not only by different analytical methods but also from cells of different sources (BM versus UCB), it might indicate that transcriptomes of BM- and UCB-derived MSCs are somewhat similar to each other, which is in good agreement with growing evidence for their equivalent cell characteristics and multilineage potential. This finding also indicates that the two high-throughput techniques are both relevant in absolute quantification of gene expression, especially for highly expressed genes, and they can be used in parallel in accurate assessment of a given transcriptome.
In conclusion, a genome-wide differential expression analysis of human MSCs was performed using UCB-derived cells as a model, and their specific gene expression profile was elucidated, for the first time to our knowledge, by this study. The data will provide the basis for further studies on the molecular mechanisms controlling various core stem cell properties of human MSCs.
This research was supported by a grant (SC13032) from the Stem Cell Research Center of the 21st Century Frontier Research Program funded by the Ministry of Science and Technology, Republic of Korea. J.A. Jeong, S.H. Hong, and E.J. Gang contributed equally to this article.