Genome-Wide Differential Gene Expression Profiling of Human Bone Marrow Stromal Cells



Bone marrow stromal cells (BMSCs) reside in bone marrow and provide a lifelong source of new cells for various connective tissues. Although human BMSCs are regarded as highly suitable for the development of cell therapeutics and regenerative medicine, the molecular factors and the networks of signaling pathways responsible for their biological properties are as yet unclear. To gain a comprehensive understanding of human BMSCs at the transcriptional level, we have performed DNA microarray-based, genome-wide differential gene expression analysis with the use of peripheral blood-derived mononuclear cells (MNCs) as a baseline. The resulting molecular profile of BMSCs was revealed to share no meaningful overlap with those of other human stem cell types, suggesting that the cells might express a unique set of genes for their stemness. By contrast, the distinct molecular signature, consisting of 92 different genes whose expression strengths are at least 50-fold higher in BMSCs compared with MNCs, was shown to encompass largely a gene subset of umbilical cord blood-derived adherent cells, suggesting that adherent cells derived from bone marrow and umbilical cord blood may be defined by a common set of genes, regardless of their origin. Intriguingly, a large number of these genes, particularly ones for extracellular matrix products, coincide with normal or tumor endothelium-specific markers. Taken together, our results here provide a BMSC-specific genetic catalog that may facilitate future studies on molecular mechanisms governing core properties of these cells.

Disclosure of potential conflicts of interest is found at the end of this article.


Bone marrow (BM) is a storehouse of various adult stem cells, including a subset of bone marrow stromal cells (BMSCs) that are multipotent, hematopoietic stem cells (HSCs), and multipotent adult progenitor cells [1]. Particular interest with respect to cell therapeutics and regenerative medicine has been directed toward BMSCs since they were discovered to be responsible for lifelong homeostasis of various connective tissues, such as bone, cartilage, muscle, tendon, and fat [2, 3]. These tissue cells are generated from a BMSC population through a stepwise maturation process, termed mesengenesis by analogy to hematopoiesis [4], each step of which is regarded to be tightly modulated by the local microenvironment and soluble molecular signals [5].

Stem cells in human bone marrow stroma, although rare, can be obtained not only from bone marrow aspirates but perhaps also from the circulatory system, and they may be related to stem cells isolated from other tissues [6]. The cells from all these sources share a high degree of similarity in cell surface phenotype, in vitro differentiation potential, and other biological properties [7–12], suggesting that they represent the same cellular entity. More compellingly, a number of recent comparative gene expression analyses using either high-throughput or focused profiling techniques indicate that the genetic makeup of the cells is also highly shared among those sources [13–19]. BMSCs are procurable by simple means of plastic adherence and/or negative depletion [20], amenable to long-term storage, and stable upon genetic manipulation [21, 22]. More importantly, when properly manipulated ex vivo, they are able to expand to a great extent or to differentiate into cells of diverse lineages [23, 24]. Because of these desirable properties, human BMSCs are now placed at the forefront of the field of stem cell-based therapy and regenerative medicine [2].

By and large, all stem cell types share common core functionalities, such as self-renewal and multipotency. It had once been thought that these are the outcomes of selective expression of a common set of genes, which is often referred to as stemness genes or stem cell molecular signature [25, 26]. To identify such genetic elements, a series of DNA microarray experiments have been performed on human or mouse ESCs, HSCs, neural stem cells, and skin epithelial stem cells [27–34]. These attempts led to an identification of the molecular entity specific for each stem cell type, but they failed to delineate the genetic message common to all stem cells, which consequently raised substantial doubts either on the existence of universal stemness genes or on the strategic validity of the differential transcriptome-based approach [35].

As far as genome-wide differential gene expression profiling is concerned, BMSCs have remained a poorly explored cell type. In this study, with the specific goal of pulling out BMSC-related genetic messages, we have performed DNA microarray-based genome-wide differential gene expression analysis of human BMSCs, with the use of peripheral blood (PB)-derived mononuclear cells (MNCs) as a baseline. This resulting molecular profile was then compared with other published human stem cell molecular signatures [31, 32], as well as with a subgenome profile of umbilical cord blood (UCB)-derived adherent cells [15].

Materials and Methods

Cell and RNA Preparation

Five human BMSC lines (passage 2) from donors of mixed age, sex, and racial background were purchased from Cambrex (Baltimore, MD). These cells were proven to be positive for CD105, CD166, CD29, and CD44 and negative for CD14, CD34, and CD45, and also to retain the differentiation potential into osteogenic, chondrogenic, and adipogenic lineages. The cells were cultured as monolayers in low-glucose Dulbecco's modified Eagle's medium (Invitrogen, Grand Island, NY), 20% fetal bovine serum (JRH Biosciences, Lenexa, KS), 2 mM L-glutamine, 1 mM sodium pyruvate, and 1% antibiotics/antimycotics (Life Technologies, Gaithersburg, MD) consisting of 100 U/ml penicillin, 100 μg/ml streptomycin, and 25 μg/ml amphotericin B. For a serial passage, cells at approximately 50%–60% confluence were detached with 0.1% trypsin-EDTA and replated at a density of 2 × 10³ cells per cm². Cells at the fifth passage were collected for the experiment. Human PB was collected from two adult individuals (one male, one female) with informed written consent, and the protocol for this study was approved by internal review boards of our institutions. MNCs were isolated from the blood by the Ficoll/Hypaque density gradient method, and total RNAs were prepared using RNeasy Mini isolation kit (Qiagen, Valencia, CA) according to the protocol provided.

Microarray Experiments

Microarray analysis was performed with the CodeLink Human Whole Genome Bioarray (Amersham Biosciences, Chandler, AZ), targeting approximately 57,000 transcripts and expressed sequence tags on a single glass slide. The RNA target sample preparation, hybridization, and processing were performed as previously described [15]. Briefly, total RNA was reverse-transcribed with Superscript II reverse transcriptase and a T7-oligo(dT) primer, cleaned by using the QIAquick purification kit (Qiagen), and then used as a template for in vitro transcription using T7 RNA polymerase and biotin-11-UTP (PerkinElmer Life and Analytical Sciences, Boston). Biotin-labeled cRNA, after being purified with the RNeasy kit (Qiagen), was hybridized to the bioarray and processed using the CodeLink Shaker and Parallel Processing kits. Images were captured on a GenePix scanner (Axon Instruments/Molecular Devices Corp., Union City, CA). Scanned data images were processed using CodeLink Expression Analysis software. For each spot, the mean intensity was taken and background-corrected by subtracting the median intensity of the surrounding local background. The intensities were linearly normalized according to the standard normalization procedure of the software. The normalized intensity was used as a measure of gene expression in each sample.
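The background correction and linear normalization steps can be illustrated with a short sketch. This is not the CodeLink software's actual procedure: the function names, the example values, and the choice of scaling each array by its median corrected intensity are assumptions made here for illustration only.

```python
from statistics import median

def background_correct(spot_means, local_bg_medians):
    """Subtract each spot's surrounding median local background from its mean intensity."""
    return [s - b for s, b in zip(spot_means, local_bg_medians)]

def normalize_linear(intensities):
    """Scale one array linearly so that its median becomes 1.0 (assumed scaling rule)."""
    m = median(intensities)
    if m <= 0:
        raise ValueError("median intensity must be positive")
    return [x / m for x in intensities]

# Hypothetical spot data for a single array
spot_means = [120.0, 340.0, 95.0, 800.0, 210.0]
local_bg = [20.0, 40.0, 15.0, 50.0, 10.0]

corrected = background_correct(spot_means, local_bg)
normalized = normalize_linear(corrected)
```

Under this assumed scaling, a normalized intensity of 1.0 corresponds to the array median, which is what makes a fixed floor at unity a natural noise cutoff downstream.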

Data Manipulation and Integration

Gene probes whose intensity values were missing in any one of seven data sets were excluded from further analyses. When multiple probes were provided for a single gene, the probe with the highest average intensity in BMSCs was chosen. We eliminated the noise effect from low-level expression by assigning 1.0 to any probe whose normalized intensity was less than unity. The intensity for each gene probe was then individually averaged over five BMSC and two MNC populations and sorted by the BMSC-to-MNC ratio of the average intensity. The full list of the gene probes whose average intensity ratios were greater than 3 is provided, together with their respective normalized intensities, as supplemental online data. This data set was merged, with reference to National Center for Biotechnology Information accession numbers, with the subgenome microarray data of UCB-derived adherent cells [15]. Finally, the microarray data of human HSCs and ESCs were downloaded from their respective websites, visually inspected, and integrated by manual registration. To keep the analysis simple and robust, only genes whose expression levels in the given stem cell population were at least threefold higher than in the respective baseline cell population were taken into account.
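The filtering and ranking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' actual scripts: the function name, the data layout (BMSC samples listed first, `None` for a missing value), and the gene labels are hypothetical.

```python
from statistics import mean

FLOOR = 1.0      # normalized intensities below unity are set to 1.0
MIN_RATIO = 3.0  # BMSC-to-MNC cutoff for the reported list

def build_signature(probes, n_bmsc=5):
    """Return (gene, ratio) pairs sorted by descending BMSC-to-MNC ratio.

    probes: list of (gene, intensities), BMSC samples first, then MNC samples;
    None marks a missing value.
    """
    best = {}  # gene -> (raw BMSC average, floored intensities)
    for gene, values in probes:
        if any(v is None for v in values):
            continue  # exclude probes missing in any data set
        raw_bmsc_avg = mean(values[:n_bmsc])
        # keep only the probe with the highest average BMSC intensity per gene
        if gene not in best or raw_bmsc_avg > best[gene][0]:
            best[gene] = (raw_bmsc_avg, [max(v, FLOOR) for v in values])
    ratios = []
    for gene, (_, floored) in best.items():
        ratio = mean(floored[:n_bmsc]) / mean(floored[n_bmsc:])
        if ratio > MIN_RATIO:
            ratios.append((gene, ratio))
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)

# Hypothetical probes: five BMSC values followed by two MNC values
probes = [
    ("GENE_A", [60.0, 50.0, 70.0, 40.0, 80.0, 0.5, 0.2]),
    ("GENE_A", [6.0, 5.0, 7.0, 4.0, 8.0, 1.0, 1.0]),      # weaker probe, not chosen
    ("GENE_B", [4.0, 4.0, 4.0, 4.0, 4.0, 2.0, 2.0]),      # ratio 2, below cutoff
    ("GENE_C", [10.0, None, 10.0, 10.0, 10.0, 1.0, 1.0]),  # missing value, excluded
    ("GENE_D", [8.0, 8.0, 8.0, 8.0, 8.0, 2.0, 2.0]),
]
signature = build_signature(probes)
```

Flooring the MNC values at 1.0 keeps near-zero denominators from inflating the ratio, which is the stated purpose of the noise-elimination step.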

Reverse Transcription-Polymerase Chain Reaction Confirmation of Microarray Data

To confirm the gene expression data of DNA microarrays, the top 20 genes with the highest BMSC-to-MNC intensity ratios were subjected to reverse transcription-polymerase chain reaction (RT-PCR) analysis, using total RNAs derived from all cell samples. Actin mRNA amplified from the same samples served as an internal control. The sense and antisense primers used for those genes are listed in Table 1. The thermocycler conditions used for amplification were as follows: an initial denaturation at 95°C for 5 minutes; 35 cycles of 30 seconds at 95°C, 30 seconds at the indicated annealing temperature (range, 52°C–57°C), and 1 minute at 72°C; and a final extension at 72°C for 7 minutes. The amplified PCR products were resolved in 1% agarose gel, stained with ethidium bromide, and visualized and photographed with Chemi Doc XRS (Bio-Rad, Hercules, CA).

Table 1. Primers and annealing temperatures used for RT-PCR analysis


Comparison of Gene Expression Profile Between BMSCs and PB-Derived MNCs

All BMSC samples, despite considerable diversity in genetic backgrounds, were found to share a high degree of similarity in gene expression, as indicated by their interdonor correlation coefficients, ranging from 0.928 to 0.972. No particular cell pair with respect to sex, race, or age of donors appeared to be predominant over other pairs (Fig. 1). Similarly, two PB-derived MNCs from donors of opposite sexes and different ages showed a correlation coefficient of approximately 0.94. These observations suggest that such individuality factors may exert no considerable influence on the gene expression of these cells.

Figure 1.

Tabulation of pairwise CCs and scatter graphs of the log intensity values. The CCs and scatter plots of log-transformed normalized intensity data between any two of the BMSCs and PB-MNCs were generated using CodeLink Expression Analysis software. N indicates the number of gene probes above noise that are present in both data sets and used for calculation of a CC. In each scatter plot, a thick diagonal line indicates an equality of the normalized intensity, whereas thin off-diagonal lines represent twofold differences. Abbreviations: A, Asian; B, black; BMSC, bone marrow stromal cell; C, Caucasian; CC, correlation coefficient; F, female; M, male; N, number; PB-MNC, peripheral blood-derived mononuclear cell; Y, years.
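A pairwise correlation coefficient of this kind can be computed as in the sketch below. It assumes a Pearson coefficient on log10-transformed intensities and a noise threshold of 1.0; the text does not state which coefficient the CodeLink software reports, so both choices, along with the function names and sample values, are assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def log_correlation(sample_a, sample_b, noise=1.0):
    """Return (CC, N) over probes above the noise level in both samples."""
    pairs = [
        (math.log10(a), math.log10(b))
        for a, b in zip(sample_a, sample_b)
        if a > noise and b > noise
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys), len(pairs)

# Hypothetical normalized intensities for two samples, probe by probe
sample_a = [10.0, 100.0, 1000.0, 0.5]
sample_b = [20.0, 200.0, 2000.0, 50.0]
cc, n = log_correlation(sample_a, sample_b)
```

In this toy case the second sample is a constant twofold multiple of the first, so the retained points lie on a line parallel to the diagonal in log space and the coefficient is essentially 1; the fourth probe is dropped because it falls below the noise level in one sample.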

Conversely, the pairwise comparison of different cell populations yielded correlation coefficients ranging between 0.704 and 0.756 (Fig. 1), which were far lower than interdonor coefficients within either cell type. This indicated that the gene expression profile of BMSCs was quite divergent from that of PB-derived MNCs. To pull out the BMSC-related genetic message, we averaged the normalized intensity of each gene probe separately over the two cell populations and ranked the probes by BMSC-to-MNC ratio of the average intensity. To facilitate further analyses at a high confidence level, only gene probes with ratios of 3 or higher were taken into consideration. The resulting subset, consisting of 2,273 unique genes, was designated a BMSC-related molecular signature (supplemental online Table 1).

Confirmation of the BMSC-Related Molecular Signature by RT-PCR

To verify the BMSC-related molecular signature, expression of its top 20 genes was analyzed by RT-PCR, using total RNAs obtained from the seven cell samples. All tested genes were highly expressed in the five BMSC populations but were mostly negligible in the two MNC populations (Fig. 2A). This gene expression pattern exhibited a reasonable agreement with the microarray-generated gene expression data, even though variations in the expression levels of these genes among the five BMSC donors were observed (Fig. 2B). Nonetheless, this result demonstrates that our experimental and analytical procedures have been appropriately applied to the measurement of global gene expression profiles of the cells.

Figure 2.

Reverse transcription-polymerase chain reaction (RT-PCR) analysis of the differentially expressed genes in BMSCs. (A): The top 20 genes in BMSC-related molecular signature were chosen, and their expression levels were analyzed by RT-PCR, using total RNAs obtained from the seven cell samples and β-actin as an internal control. (B): Averages (solid horizontal bars) and standard deviations (error bars) of gene expression intensity values among five BMSC samples. Abbreviations: BMSC, bone marrow stromal cell; PB-MNC, peripheral blood-derived mononuclear cell.

Comparison with Other Human Stem Cell Molecular Signatures

To identify genes that might occur commonly in all human stem cells, the current molecular signature was collated with publicly available data for human ESCs [30] and HSCs [31]. Of 2,273 genes constituting a BMSC-related molecular signature, 16 and 20 genes were found to be overlapping with the molecular signatures for human ESCs (SERPINH1, ACTC, CCNB1, CALU, TK1, PITX2, CDC20, LAPTM4B, CRABP2, SFRP2, MTHFD2, FABP5, GJA1, IMPDH2, MTHFD1, and NME2) and human HSCs (NDN, ARMCX2, ANGPT1, CSRP1, SCD, ENG, LGALS3BP, PDGFC, MGC20781, AGXT, PDIR, NR2F2, MEIS1, ITGB1, APP, PROCR, FLJ32068, HOXA5, PDGFRA, and SLC25A15), respectively (supplemental online Table 1, last two columns). Moreover, none of these overlapping genes were found to be either common or correlated to all three stem cell types. This limited overlap among stem cell molecular signatures indicates that each stem cell type may express a unique set of genes to govern its core stem cell functionalities.
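The overlap check described above amounts to a set intersection. The sketch below uses the two gene lists exactly as given in the text; the variable names are illustrative.

```python
# Genes shared between the BMSC signature and the human ESC signature (from the text)
esc_overlap = {
    "SERPINH1", "ACTC", "CCNB1", "CALU", "TK1", "PITX2", "CDC20", "LAPTM4B",
    "CRABP2", "SFRP2", "MTHFD2", "FABP5", "GJA1", "IMPDH2", "MTHFD1", "NME2",
}

# Genes shared between the BMSC signature and the human HSC signature (from the text)
hsc_overlap = {
    "NDN", "ARMCX2", "ANGPT1", "CSRP1", "SCD", "ENG", "LGALS3BP", "PDGFC",
    "MGC20781", "AGXT", "PDIR", "NR2F2", "MEIS1", "ITGB1", "APP", "PROCR",
    "FLJ32068", "HOXA5", "PDGFRA", "SLC25A15",
}

# Genes present in all three signatures would have to appear in both lists
shared_by_all = esc_overlap & hsc_overlap

print(len(esc_overlap), len(hsc_overlap), len(shared_by_all))
```

The empty intersection matches the statement that no overlapping gene is common to all three stem cell types.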

Comparison with the Gene Expression Profile of UCB-Derived Adherent Cells

Next, to address whether the current gene expression profile could be generalized over adherent cell populations of diverse origins, it was merged and compared with the subgenomic profile of UCB-derived adherent cells [15]. The data merge yielded a total of 13,084 common genes, with an overall correlation coefficient of around 0.87. This value was significantly higher than a correlation coefficient for an MNC pair (Fig. 3), indicating that the genetic message of these adherent cells might be less dependent on the cell source than those of other blood cell types. Furthermore, given that the two array layouts often used probes with different sequences, the actual similarity in gene expression between BMSCs and UCB-derived adherent cells is likely to be even higher than estimated here. Despite such high similarity, however, they were not without differences. Of 13,084 shared genes between the two cell populations, 1,289 and 1,567 genes were found to have an intensity ratio of 3 or higher in BMSC and UCB-derived adherent cell populations, respectively, but with an overlap of only 811 genes. Table 2 lists the first 20 genes that are preferentially expressed in BMSCs and UCB-derived adherent cells. Notably, the former include genes related to neurogenesis (SERPINEF1, CLDN11, RGS6, and EMX2), as well as genes for immune response (NFAT5, HAS1, FKBP1A, and DEF6), whereas the latter contain many genes of unknown function and several genes associated with osteogenesis (ACP5, SPP1, and OGN).

Figure 3.

Tabulation of log-log scatter plots and CCs between microarray data sets. Data sets from the previous subgenome [15] and current genome-wide microarray experiments were merged with references to National Center for Biotechnology Information accession number, resulting in 13,084 overlapped gene probes. n indicates the number of cell samples used for calculation. Abbreviations: BMSC, bone marrow stromal cell; CC, correlation coefficient; PB-MNC, peripheral blood-derived mononuclear cell; UCB, umbilical cord blood.
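The merge by accession number and the subsequent overlap count can be sketched as follows. The accession labels, ratio values, and function names are hypothetical; only the reported counts (1,289, 1,567, and 811) come from the text.

```python
def merge_by_accession(bmsc_ratios, ucb_ratios):
    """Keep only accessions present in both data sets, pairing their ratios."""
    common = bmsc_ratios.keys() & ucb_ratios.keys()
    return {acc: (bmsc_ratios[acc], ucb_ratios[acc]) for acc in sorted(common)}

def preferential_sets(merged, min_ratio=3.0):
    """Split merged accessions by which population exceeds the ratio cutoff."""
    bmsc_high = {acc for acc, (b, _) in merged.items() if b >= min_ratio}
    ucb_high = {acc for acc, (_, u) in merged.items() if u >= min_ratio}
    return bmsc_high, ucb_high, bmsc_high & ucb_high

# Hypothetical cell-to-MNC intensity ratios keyed by accession number
bmsc = {"ACC001": 55.0, "ACC002": 4.0, "ACC003": 1.2, "ACC004": 9.0}
ucb = {"ACC001": 40.0, "ACC002": 1.5, "ACC003": 6.0, "ACC005": 12.0}

merged = merge_by_accession(bmsc, ucb)
bmsc_high, ucb_high, both = preferential_sets(merged)

# Counts reported in the text imply the following population-specific subsets
bmsc_only_reported = 1289 - 811
ucb_only_reported = 1567 - 811
```

From the reported figures, 478 genes pass the cutoff only in BMSCs and 756 only in UCB-derived adherent cells.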

Table 2. Most differentially expressed genes between BMSCs and UCB-derived adherent cells

Distinct Molecular Signature of BMSCs

Of particular interest within the BMSC-related molecular signature was a subset of genes that were highly expressed in BMSCs but rarely detectable in PB-derived MNCs. Hence, by applying a cutoff ratio of 50 in BMSC-to-MNC intensity, we generated a subset of 92 different genes, which could be regarded as a distinct molecular signature of BMSCs (Table 3). In size, the molecular signature of BMSCs was found to be approximately twofold that of UCB-derived adherent cells [15], apparently because the former covered twice as much genome space as the latter. As expected, this molecular signature was found to largely encompass the subgenome signature of UCB-derived adherent cells, emphasizing again a high similarity in gene expression between the two adherent cell populations.

Table 3. List of genes preferentially expressed in BMSCs compared to PB-derived MNCs

This molecular signature was made up of 21 unknown and 71 well-characterized genes. The known genes could be grouped into nine different categories, based on subcellular locations and functionalities of their protein products: (a) collagens (COL1A1, COL1A2, COL3A1, COL4A1, COL4A2, COL5A1, COL6A2, COL6A3, and COL12A1); (b) other extracellular matrix (ECM) proteins (DCN, URB, CTHRC1, LTBP2, SPARC, LUM, MG50, COMP, MMP2, CHI3L1, SULF1, PTX3, and DLC1); (c) ECM biogenesis factors (SERPINE1, SERPINE2, LOX, SERPINH1, LOXL1, and PCOLCE); (d) cytokines or growth factors (CTGF, POSTN, CYR61, PRSS11, FST, IGFBP6, FAP, INHBA, GDF15, PLAU, GAS6, WNT5B, STC2, DKK1, and IGFBP4); (e) integral membrane proteins (THY1, KDELR3, CD164L1, ANTXR1, COX7A1, and ITGBL1); (f) transcription factors (FHL2, IRX3, TWIST1, and SNAI2); (g) miscellaneous nuclear proteins (NDN, P8, ID4, ID3, and LARP6); (h) cytoskeleton-related proteins (CRYAB, TPM1, PDLIM4, and CNN3); and (i) cytoplasmic proteins with diverse functions (NNMT, DPYSL3, S100A16, RAB23, VCHL1, PPP1R3C, HAK, PTPLA, and AKR1C1).

Of the 21 unknown genes, our BLAST analysis could identify 9: two lysyl oxidases (IMAGE:262060 and IMAGE:310474), type III fibronectin (KIAA1866), IGFBP4 (pp9974), IGF2-associated protein (IMAGE:2224022), nuclear factor I/X (IMAGE:1912835), COL4A1 (IMAGE:263716), MMP2 (DKFZp434B147), and PLAC9 (UI-1-BC1p-asi-a-02-0-UI). The molecular identity of the other 12 genes, however, has yet to be resolved. Notably, a majority of the identified gene products fall into the categories of ECM biogenesis factors and cytokines/growth factors. This observation indicates that the transcriptional capacity of BMSCs, at least during their ex vivo expansion culture, is largely devoted to formation of the ECM microenvironment and networking of certain cytokine signaling pathways, some of which may be crucial for BMSC-specific functionalities.


Human BMSCs, a subset of which are multipotential stem cells, possess versatile differentiation potential offering great clinical potential in the treatment of tissue degenerative disorders. Their therapeutic efficacy, however, is largely constrained by poor yield in proliferation and differentiation, especially when extensively cultured cells are transplanted. Therefore, potential therapeutic interventions include genetic or biochemical manipulation of the cells to increase in vivo proliferation and target-specific differentiation capabilities, which can be greatly facilitated if the knowledge of key molecular factors governing these properties is available in advance.

In theory, such factors, if any, stand out as common genetic elements from comparative analysis of genome-wide differential transcriptional profiles among different stem cell types. A number of transcriptional profiling techniques, such as DNA microarray, serial analysis of gene expression, expressed-sequence-tag scan, and massively parallel signature sequencing, have been used for a rapid identification of important factors behind various biological processes. Of greatest value among them is a DNA microarray-based differential gene expression profiling approach, which has been popularly used in searches of genes that are either specific or common to human or mouse stem cell lines [27–34].

This strategy, however, has never been properly applied to human BMSCs, so their specific molecular basis remains incompletely defined. The previous transcriptomic data from our laboratory [15], based on a half-genome scale analysis of the UCB-derived adherent cells, was considered to be neither comprehensive nor representative, and two recent gene expression profiling studies of human BMSCs by Song et al. [36] and Shahdadfar et al. [37] were not aimed at revealing the bona fide genetic messages specific to the cells but rather at identifying the differentially expressed transcripts during lineage-specific differentiation/dedifferentiation and under different sera/passage conditions, respectively. Therefore, it is important to explore the differential gene expression profile of these cells on a genome scale, with the use of an appropriate baseline cell population, such as PB-derived MNCs, as used in this study.

The lack of overlap in terms of the molecular signature between BMSCs and other human stem cells raises two opposing possibilities. One is that, as claimed by some investigators, the different type of stem cells may express a different set of genes that confer stemness. The other possibility is that universal stemness genes may be present but expressed in too low a quantity to be detected by current profiling techniques. To distinguish between the two possibilities, further efforts should be made with an extended and careful experimental design, highly homogeneous cell samples, and, if possible, a more advanced microarray methodology. Because only a subset of BMSCs is a true stem cell population, it is also possible that a different profile would emerge if molecular markers are determined that could separate them from other, more committed stromal cells.

A high similarity in gene expression profile between BMSCs and UCB-derived adherent cells suggests that cells derived similarly from various sources may be defined by a very similar, but not identical, molecular signature. Obviously, a majority of the genes in this molecular signature are found to encode extracellular proteins, such as ECM components or growth factors/cytokines, indicating that the formation of proper extracellular environment and certain cytokine signaling networks may be crucial for the functional states of the cells.

More intriguingly, many of the genes in this signature, including CTGF, COL1A1, COL1A2, COL3A1, COL4A2, COL6A1, COL6A2, COL6A3, IGFBP4, SPARC, CD164L1 (TEM1, CD248), MMP2, THY1, and ANTXR1 (TEM8), have previously been known as either normal or tumor endothelial cell (EC)-associated markers [38]. Coincidentally, all of these genes were found to encode extracellular and ectodomain-containing cell surface proteins, indicating that BMSCs might be more intimately related to ECs, with respect to developmental lineage and biological properties, than previously conceived.

Of special note among those genes is ANTXR1/TEM8, which has been previously identified as a gene encoding an authentic cellular receptor for anthrax toxin [39]. This integral membrane protein has been found to be highly and selectively expressed in the plasma membrane of the epithelial cells of lung, skin, and intestine [40], but its preferential expression in BMSCs over other blood cells has not been reported elsewhere. The strong co-expression of this receptor protein and COL6A3, its natural ligand [41], suggests that the ANTXR1-mediated signaling pathway may play a significant role in BMSC's intrinsic function and also, possibly, anthrax pathogenesis. From a practical point of view, a von Willebrand factor-like ectodomain of this protein may serve as a molecular target for developing monoclonal antibody-based cell isolation and identification methodology.

In conclusion, the comprehensive molecular foundation of human BMSCs has now been established by DNA microarray-based genome-wide differential gene expression analysis. Our results here may facilitate future studies on molecular mechanisms underlying core biological properties of human BMSCs.

Disclosure of Potential Conflicts of Interest

The authors indicate no potential conflicts of interest.


This research was supported by Grant SC3200 from the Stem Cell Research Center of the 21st Century Frontier Research Program, funded by the Ministry of Science and Technology, Republic of Korea.