Transcriptome Profiling of Human and Murine ESCs Identifies Divergent Paths Required to Maintain the Stem Cell State



Human embryonic stem cells (hESCs) are an important source of stem cells in regenerative medicine, and much remains unknown about their molecular characteristics. To develop a detailed genomic profile of ESC lines in two different species, we compared transcriptomes of one murine and two different hESC lines by massively parallel signature sequencing (MPSS). Over 2 million signature tags from each line and their differentiating embryoid bodies were sequenced. Major differences and conserved similarities between species identified by MPSS were validated by reverse transcription polymerase chain reaction (RT-PCR) and microarray. The two hESC lines were similar overall, with differences that are attributable to alleles and propagation. Human–mouse comparisons, however, identified only a small (core) set of conserved genes that included genes known to be important in ESC biology, as well as additional novel genes. Identified were major differences in leukemia inhibitory factor, transforming growth factor-beta, and Wnt and fibroblast growth factor signaling pathways, as well as the expression of genes encoding metabolic, cytoskeletal, and matrix proteins, many of which were verified by RT-PCR or by comparing them with published databases. The study reported here underscores the importance of cross-species comparisons and the versatility and sensitivity of MPSS as a powerful complement to current array technology.


Human embryonic stem cells (hESCs) are a versatile and valuable source of tissue-specific stem cells in regenerative medicine, and the ability to manipulate their growth and differentiation is a major challenge. Many of their molecular characteristics are still poorly understood. All stem cells share properties of pluripotency and self-renewal capacity that are progressively restricted when stem cells undergo differentiation. Progress toward the molecular understanding of these properties has been made, with detailed work on candidate genes for these developmental decisions. An important complementary study is the comprehensive elucidation of the genetic components and programs regulating stem cell fate decisions. To this end, several groups have begun the analysis of the transcriptome of hESCs using the generation of expressed sequence tags (ESTs), serial analysis of gene expression (SAGE), microarray, and massively parallel signature sequencing (MPSS) [18].

ESCs can be propagated as undifferentiated cells in large numbers more easily than can adult stem cells. ESCs are excellent tools for studying early events in development as the generation of ESC-derived embryoid bodies (EBs) recapitulates early embryo development. ESCs have been isolated from multiple species, including murine, swine, simian, and human blastocysts. Mouse and human ESCs are similar in that they grow as colonies of tightly packed cells on inactivated murine embryonic fibroblast (MEF) feeders or in conditioned medium (CM) derived from such MEFs [9]. Both stem cell populations have the potential to form teratomas and to differentiate in vitro into all three germ layers—namely, ectoderm, endoderm, and mesoderm. Many markers characteristic of undifferentiated cells, including oct-4, nanog, sox-2, and utf-1, are expressed by both populations of cells. The expression of these markers, together with the absence of differentiation markers, constitutes a signature profile of undifferentiated ESC cultures irrespective of their species origin [2,8].

Nevertheless, important differences exist in the growth rates, culture requirements, and marker expression of human and murine ESCs. This divergence has generally been ascribed to fundamental differences in the pathways that regulate self-renewal, apoptosis, and proliferation [4,7]. Some examples include SSEA1, SSEA3, and SSEA4 expression [10]; the ability to differentiate into trophoblasts [11]; and the dependency on leukemia inhibitory factor (LIF) [12,13]. These multiple reported differences raise the possibility that additional differences exist and provide a compelling rationale to comprehensively map the transcriptome of ESCs.

Human and mouse ESC transcriptomes have been individually mapped to varying depths and breadths with regard to their respective genomes [2, 4, 5, 8, 14–16], though no detailed pairwise comparisons have been performed. Sato et al. [2] compared human and murine ESCs using microarray. More recently, Ginis and colleagues [7] compared the expression of about 400 genes in human and murine ESCs and showed that at least a quarter of the genes tested have significant differences in their expression. These results suggested that the differences represent, at best, a small fraction of the variations that exist and that a large-scale analysis would identify more important differences.

Comparisons reported so far have been limited by the availability of cross-species and homologous arrays. Other large-scale techniques, such as SAGE and the generation of ESTs, have not been used, perhaps because of the cost and the limitations in gene annotation. More recently, better annotation of genomic data and improvements in technology, together with development of alternative techniques such as MPSS [17], have permitted a deeper and more complete mapping of transcriptomes at a significantly cheaper price than with conventional SAGE and EST generation. The underlying principle of MPSS, like SAGE, is that a signature sequence of 20 bases starting from the 3′-most DpnII (GATC) site is generated for each transcript. In MPSS, at least 1 million 20-base tags are identified. Considering that an estimated 200,000–300,000 transcripts exist in a single cell, this method theoretically allows for all transcripts in a cell to be measured without the constraints of probe availability. It provides an unprecedented coverage in depth and breadth of any transcriptome at a greatly enhanced sensitivity compared with the average SAGE library tag generation, which is typically tens of thousands of tags. MPSS analysis measures transcript levels using a standard unit of measurement, transcripts per million (tpm), rather than a relative unit in reference to a biological RNA standard, thus allowing one to estimate the copy numbers of different transcripts per cell and compare expression patterns across homologous cells of different species. Furthermore, MPSS has the ability to detect novel transcripts.

We have chosen MPSS to examine gene expression in ESCs from the murine E14 line and in independently derived hESC lines: the HES-2 line is from ES Cell International (Singapore), and the pooled cell lines H1, H7, and H9 are from WiCell Research Institute, Inc. (Madison, WI). MPSS has allowed us to assess the complexity of ESCs and EBs and to generate a far more exhaustive list of differences and similarities between mouse and human ESCs than has previously been reported. By comparing gene expression between hESC lines, we also identified culture, developmental, and allelic differences. The usefulness and power of developing such a comprehensive expression database is highlighted by our ability not only to identify putative signaling or biochemical pathways active in ESCs but also to assess the integrity of these pathways from receptors to signaling intermediates and then target substrates at the transcript level.

Materials and Methods

Cell Culture

For all mouse cell work, the ESC line E14 was used. Undifferentiated mouse ESCs (mESCs) were maintained on inactivated MEFs in ESC medium consisting of Dulbecco's modified Eagle's medium (DMEM; Gibco Life Technology, Gaithersburg, MD) supplemented with 20% defined fetal bovine serum (FBS; HyClone, Logan, UT), 1% nonessential amino acids (Gibco), 0.1 mM β-mercaptoethanol (Gibco), penicillin/streptomycin (Gibco), and CHO-LIF-conditioned medium (glycosylated LIF produced in Chinese hamster ovary cells, generated in the laboratory of author B.L.; equivalent to at least 1,000 U/ml of LIF) to maintain pluripotency. Before collecting cells for RNA, fibroblasts were separated from mESCs by trypsinization and transient adherence for 15 minutes; this separation selectively removes the fibroblasts. Undifferentiated mESCs were then grown feeder-free on gelatin-coated plates for two more passages in the medium described above, prior to the harvesting of RNA or differentiation into mouse EBs (mEBs). The mEBs were generated by trypsinization and dissociation of undifferentiated mESCs into a single-cell suspension at a density of 4 × 10^5 cells/ml in the above ESC medium but without the LIF supplement. These cells were cultured in nonadherent bacterial Petri dishes, and the medium was initially replaced after 48 hours and subsequently every 24 hours. RNA was isolated on day 4 of differentiation.

The human HES-2 line [18], provided by and cultured at ES Cell International, was used by the Genome Institute of Singapore (GIS) for extracting RNA and subsequently generating the GIS hESC MPSS dataset. These cells (referred to as hESCsESI) were grown on inactivated MEFs in DMEM (Gibco) containing 20% defined FBS (HyClone), 8 ng/ml of basic fibroblast growth factor (bFGF; Gibco), 0.1 mM β-mercaptoethanol, 2 mM L-glutamine, 1% nonessential amino acids, 1% insulin-transferrin-selenium supplement (Gibco), and 0.5% penicillin/streptomycin (Gibco). The hESCsESI were passaged every 7 days by mechanical splitting under a dissecting microscope, dividing each colony into approximately eight pieces. Cells were collected over a period of time, spanning passage numbers 105–115. For RNA isolation, colonies were cut just outside of the inner button and just inside the outer edge of each colony, and the intervening cells were harvested.

The MPSS datasets from NIA (National Institute on Aging) hESCs and EBs were generated from RNAs of the hESC lines H1, H7, and H9 [10] and are referred to as hESCsWi and hEBsWi. These cells were maintained and passaged under feeder-free conditions as described [9]. Briefly, CM was generated from primary MEFs cultured in hESCWi media comprised of 80% knockout DMEM (KO-DMEM; Gibco), 20% knockout serum replacement (Gibco), 0.1 mM β-mercaptoethanol, 1 mM L-glutamine, and 1% nonessential amino acids, supplemented with 4 ng/ml human bFGF (Gibco). This medium was collected daily and used immediately for feeding hESCWi cultures. MEFs for generating CM were re-fed daily and used for a maximum of 7 days. Before it was added to the hESCWi cultures, the MEF CM was supplemented with an additional 4 ng/ml of human bFGF (Gibco). The hESCWi cultures were maintained on Matrigel in this CM and were passaged by incubation in 200 U/ml collagenase IV (Gibco) for 5–10 minutes at 37°C and then gently dissociated into small clusters in CM. Cells were passaged once every week. The hEBWi cultures were formed as described previously [19]. Briefly, undifferentiated hESCWi cultures were harvested by incubation with 200 U/ml collagenase at 37°C for 5–10 minutes. The cells were gently scraped from the dish and resuspended in ultra low attachment polystyrene plates (Corning, Acton, MA) in medium comprised of KO-DMEM, 20% FBS, 1% nonessential amino acids, 1 mM glutamate, and 0.1 mM β-mercaptoethanol. RNA was isolated on day 12 of differentiation.

RNA Isolation

All RNA purifications were done using Trizol, following the manufacturer's protocol. For the pooled samples of hESCsWi and hEBsWi, equal quantities of RNA from the three cell lines were combined. All RNA samples were initially evaluated for the presence or absence of ESC differentiation markers by reverse transcription polymerase chain reaction (RT-PCR). The quality of all RNA preparations was confirmed prior to MPSS analysis by Agilent Bioanalyzer (Palo Alto, CA).

MPSS Analysis

Poly(A)+ RNA was isolated from total RNA samples and used to generate cDNA, which was subsequently digested with the restriction enzyme DpnII. Poly(A)-containing DpnII restriction fragments were purified by hybridization to oligo(dT). An adapter was added to the 5′ end of the fragment that directed a type IIS restriction enzyme, MmeI, to digest within the cDNA fragment 20 base pairs (bp) from the DpnII site (starting at G of GATC). At this point, all cDNA species (signatures) were a uniform length of 20 bp. A second adapter was added to the 3′ end of each signature. These uniform-length cDNA signatures were subsequently placed into the Megaclone vector, and their sequence was determined following the previously published protocol [17,20] of Lynx Therapeutics (Hayward, CA).
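The way a signature is defined can be illustrated in silico. The following is a minimal sketch, not the Lynx pipeline: it assumes a transcript given as a plain DNA string in the sense orientation, and the function name `mpss_signature` is hypothetical. It takes the 20 bases beginning at the G of the 3′-most GATC site and returns nothing when no DpnII site exists or when fewer than 20 bases follow the site.

```python
from typing import Optional

DPNII_SITE = "GATC"
SIGNATURE_LENGTH = 20  # bases, counted from the G of GATC


def mpss_signature(transcript: str) -> Optional[str]:
    """Return the 20-base signature at the 3'-most DpnII site, or None.

    Illustrative assumption: the transcript is a sense-strand DNA string
    and the poly(A) tail is not modelled.
    """
    seq = transcript.upper()
    start = seq.rfind(DPNII_SITE)  # rfind gives the 3'-most occurrence
    if start == -1:
        return None  # no DpnII site: the transcript yields no signature
    tag = seq[start:start + SIGNATURE_LENGTH]
    if len(tag) < SIGNATURE_LENGTH:
        return None  # site too close to the 3' end for a full-length tag
    return tag


if __name__ == "__main__":
    # Hypothetical sequence with two GATC sites; only the 3'-most one is used.
    example = "GATC" + "T" * 20 + "GATC" + "ACGTACGTACGTACGT" + "AA"
    print(mpss_signature(example))
    print(mpss_signature("ACGTACGT"))
```

A transcript without a GATC site returns `None` in this sketch, which mirrors the reason some expressed genes cannot be detected by this method.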

The abundance for each signature was converted to tpm for the purpose of comparison between samples [21]. Only reliable and significant signatures were considered, defined as signatures present in at least two of the MPSS runs and detected at 4 tpm or more in at least one sample. The tpm shown in all datasets is the average of four runs.
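The normalization and the reliability filter can be written down compactly. The sketch below is an illustration under stated assumptions rather than the procedure of reference [21]: the function names and the input layout (one raw count per run, one tpm value per sample) are hypothetical.

```python
from typing import Sequence


def to_tpm(count: int, total_tags: int) -> float:
    """Scale a raw tag count to transcripts per million (tpm)."""
    return count * 1_000_000 / total_tags


def is_reliable(run_counts: Sequence[int],
                sample_tpms: Sequence[float],
                min_runs: int = 2,
                min_tpm: float = 4.0) -> bool:
    """Apply the two criteria stated in the text.

    run_counts:  raw counts of one signature in each MPSS run (assumed layout)
    sample_tpms: tpm of the same signature in each sample (assumed layout)
    """
    runs_present = sum(1 for c in run_counts if c > 0)
    return runs_present >= min_runs and max(sample_tpms) >= min_tpm


if __name__ == "__main__":
    # 10 tags out of the 2,660,962 tags sequenced for mESCs
    print(round(to_tpm(10, 2_660_962), 2))
    print(is_reliable([3, 0, 2, 0], [1.2, 6.0]))
```

Because tpm is a fixed unit rather than a ratio to a reference RNA, values computed this way can be compared directly across libraries of different sizes.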

MPSS classifications are as follows:

Class 1—forward strand, polyA signal, polyA tail, 3′-most

Class 2—forward strand, polyA signal, 3′-most

Class 3—forward strand, polyA tail, 3′-most

Class 4—forward strand, no polyA info, 3′-most

Class 5—forward strand, no polyA info, not 3′-most

Class 22—unknown orientation, polyA signal, last before signal

Class 23—unknown orientation, polyA tail, last before tail

To simplify the MPSS data analysis, multiple signatures mapping to the same Unigene ID (Mm build 130 and Hs build 163) [22] for each dataset were combined into one tpm count as follows: the sum of tpm for signatures of class 1, 2, 3, 22, and 23, if any are found; if those are not found, the sum of class 4; and if still nothing is found, the sum of class 5 signatures. The HomoloGene database was used to map human and murine orthologues.
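The tiered combination rule above can be sketched as follows. This is an illustrative reading of the rule, not the authors' code; the function name, the tuple layout of the input, and the Unigene IDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Tiers in order of preference, as described in the text.
PRIORITY_TIERS = [(1, 2, 3, 22, 23), (4,), (5,)]


def combine_by_unigene(
    signatures: Iterable[Tuple[str, int, float]]
) -> Dict[str, float]:
    """Collapse signatures into one tpm value per Unigene ID.

    signatures: (unigene_id, mpss_class, tpm) tuples (assumed layout).
    """
    by_gene = defaultdict(lambda: defaultdict(float))
    for unigene_id, mpss_class, tpm in signatures:
        by_gene[unigene_id][mpss_class] += tpm

    combined = {}
    for unigene_id, class_tpm in by_gene.items():
        for tier in PRIORITY_TIERS:
            # Use the first tier that has at least one signature.
            if any(c in class_tpm for c in tier):
                combined[unigene_id] = sum(class_tpm.get(c, 0.0) for c in tier)
                break
    return combined


if __name__ == "__main__":
    demo = [
        ("Hs.1", 1, 10.0), ("Hs.1", 22, 5.0), ("Hs.1", 4, 100.0),
        ("Hs.2", 4, 7.0), ("Hs.2", 5, 3.0),
        ("Hs.3", 5, 2.0),
    ]
    print(combine_by_unigene(demo))
```

In the example, the class 4 signature of the first gene is ignored because class 1 and class 22 signatures exist for it, so lower-confidence tags contribute only when nothing better is available.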

While preparing this manuscript, we established a stem cell database to host all the MPSS results we discuss here. This site is hosted on the GIS Website. To organize the data, each of the unique tags was assigned a unique ID and then was mapped to the current genome assembly (mm.5 for mouse; hg.17 for human). Those tags that have genomic coordinates were annotated based on the UCSC (University of California, Santa Cruz) genome browser annotation database. Readers are able to navigate and search for genes of interest to obtain information about expression levels, to compare cell types, and to link to other databases. For detailed information, please refer to the GIS Website.

Focused-Chip Assay and RT-PCR

Focused microarray chips (SuperArray, Frederick, MD) were prepared and hybridized with labeled total RNA following the manufacturer's protocol. RT-PCR was performed following standard established protocols. The RT-PCR primers used are shown in supplementary online data 1.


Evaluation of Transcriptome Complexity in ESCs by MPSS Analysis

Two MPSS datasets were generated from RNAs of undifferentiated feeder-free mESCs and day-4 mEBs. Three MPSS datasets were generated from RNA of hESCs, two representing undifferentiated hESCsESI and hESCsWi and one from 12-day hEBsWi. The hESCWi RNA was from a pool of three National Institutes of Health (NIH)–approved lines (H1, H7, and H9) from WiCell, grown feeder-free [23], and hEBWi RNA was derived from EBs from these three cell lines. The hESCESI line was grown in the presence of MEF feeders.

RNA samples that passed quality-control checks were subjected to MPSS analysis (see Materials and Methods). Signature sequence tags of 20 bp in length were generated to a depth of greater than 2.2 million tags for each sample. Tag counts of each unique signature were expressed as tpm. From the MPSS libraries, the total number of tags successfully sequenced from four different runs was 2,660,962; 2,367,247; 2,295,140; 2,403,315; and 2,591,008 for mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, respectively. Distinct signatures present in at least two MPSS runs and presented as at least 4 tpm per run totaled 13,824; 9,845; 20,027; 23,500; and 17,278 for mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, respectively.

We first evaluated the total complexity of the signatures generated for each sample, expressed as total significant signatures with the cumulative tpm (with cutoff at >10 tpm) of signature distributions for mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, as shown in Table 1 (for complete data, see supplementary online data 2 for hESCs and hEBs and supplementary online data 3 for murine equivalents). The distributions of the number of unique signature tags and their percentages are compiled cumulatively from the most abundant signatures (0.04%–0.20%) to the least abundant signatures (total 55%–70%).

Table 1. Distribution of genes with expression levels from >10,000 tpm to >10 tpm in murine E14 embryonic stem cells (mESCs); day-4 murine embryoid bodies (mEBs); human HES-2 (hESCsESI); pooled human ESC lines H1, H7, and H9 (hESCsWi); and human day-12 EBs derived from hESCsWi (hEBsWi)

a. Only about 20% of expressed genes in ESCs or EBs are expressed at a frequency of >50 tpm.

Abundance (tpm) | mESCs | mEBs | hESCsESI | hESCsWi | hEBsWi
(values are No. of unique signatures, with % in parentheses)
Total signature count | 13,824 (100.00) | 9,845 (100.00) | 20,027 (100.00) | 23,500 (100.00) | 17,278 (100.00)
No. of distinct Unigene clusters | 6,712 | 5,779 | 9,093 | 9,953 | 8,950

Despite the difference in transcriptome complexity, the distribution of signature sequences based on abundance was strikingly similar in human and mouse ESCs. Fewer than 2% of the signature sequences were expressed at the high level of greater than 1,000 tpm; more than 70% of signatures were expressed at less than 50 tpm; and more than 30% of all signatures were present at a level of 10 tpm or lower (Table 1 and supplementary online Table 1). The typical detection limit by microarray analysis and SAGE is estimated at around 55 tpm [21,24].

The unique signature sequences were then mapped to Unigene clusters (Mm.130 and Hs.163), resulting in 6,712; 5,779; 9,093; 9,953; and 8,950 unique Unigene IDs identified in mESCs, mEBs, hESCsESI, hESCsWi, and hEBsWi, respectively (Table 1; see supplementary online Table 1 and accompanying text for explanation). The reduced number of Unigene signatures resulted from multiple signature sequences mapping to the same Unigene cluster. Thus, the complexity of ESCs is comparable to that seen in somatic cell populations examined by MPSS [25]. Interestingly, for both the mouse and human samples, the undifferentiated cells had a slightly higher level of complexity than the corresponding EBs had. Examination of the most abundant genes showed that, to a large extent, the top 200 or so genes were comprised of ribosomal, mitochondrial, and housekeeping genes, while growth factors, transcription factors, and regulators of gene expression were expressed in the low tpm range. The low abundance of regulatory and biologically relevant genes highlighted the importance of analyzing expression at high resolution by methods such as MPSS.

MPSS Provides a Robust Assessment of the ESC State

Theoretically, MPSS at greater than 2.2 million signatures generated per sample should provide relatively comprehensive coverage of gene expression in a given cell type. To directly ascertain the quality of data, we first examined a list of ESC-specific genes that are known to be expressed at moderate abundance in murine and human ESCs. As shown in Table 2, genes such as oct-4/pou5f1, sox-2, utf-1, and tdgf-1 were well represented in MPSS-derived transcriptome maps of both human and murine ES lines. Markers of differentiation known to be upregulated in differentiating EBs (e.g., COL4A2 [collagen type IV] and AFP [α-fetoprotein]) showed the expected increase in transcript frequency as ESCs differentiated.

Table 2. Expression frequency (tpm) of selected ESC-specific genes and differentiation markers (COL4A2 [collagen IV] and AFP [α-fetoprotein]), along with housekeeping markers (GAPD and ACTB [actin])

a. Abbreviations: hEB, human embryoid body; hESC, human embryonic stem cell; mEB, mouse embryoid body; mESC, mouse embryonic stem cell.

Gene | Mouse ID | Human ID | mESCs | mEBs | hESCsESI | hESCsWi | hEBsWi

As a further test of general robustness, the MPSS data were compared against a list of 283 “ES-specific” genes derived from the intersection of three independent studies comparing murine ESCs with various differentiated cells by microarray analysis aimed at identifying genes preferentially expressed in mESCs [16]. Table 3 shows a list of selected examples of these genes and the corresponding MPSS tpm values from mouse and human ESC and EB datasets and the average rank of each gene in the three microarray datasets (complete dataset in supplementary online data 4).

Table 3. Comparison of ESC-enriched gene expression in human and murine ESCs

a. Selected list of genes common to three microarray studies [16] to identify murine genes specifically or preferentially expressed in mESCs (by comparing ESCs with non-ESCs). The MPSS tpm readings of the corresponding murine genes are shown. The average rank column refers to the average ranking of the murine genes from the three microarray profiling studies: Rank 1 is the gene with the highest difference between ESCs and non-ESCs in all three microarray studies. The table shows the MPSS readings for the corresponding homologous genes in hESCs and hEBs. It illustrates examples of the failure to obtain MPSS readings due to (a) the signature having repeat sequence (human nanog and zfp42/rex 1); (b) the absence of a DpnII site (murine trap1a); and (c) the absence of a homologue (trap1a).

b. Abbreviations: hEB, human embryoid body; hESC, human embryonic stem cell; mEB, murine embryoid body; mESC, murine embryonic stem cell; MPSS, massively parallel signature sequencing.

Gene | Mouse ID | Human ID | mESCs | mEBs | hESCsESI | hESCsWi | hEBsWi | Avg. rank of three microarray studies
NANOGMm.6047Hs.329296112456Falls in repeat sequence4
TRAP1AMm.1297NONENo DpnII Homologue unknown7
ZFP42Mm.285848Hs.335787245110Falls in repeat sequence2

This initial assessment of MPSS analysis allowed us to make several pertinent observations about the interpretation and usefulness of MPSS-generated datasets. The first observation was that almost all the genes detected by microarray can be detected in mESCs by MPSS. While there was good correlation between transcript presence or absence as measured by MPSS and by microarray chip analysis, there was not a highly predictable correlation between transcript levels estimated by these two methods. These differences may be due to the compression of signal intensities that is often observed in microarrays.

The second observation was that some genes detected by microarrays were not detected by MPSS. This arose from technical limitations of the MPSS technology, which include failure to identify cDNAs lacking a DpnII site (e.g., murine trap1a, Table 3), cDNAs containing a double palindrome within the tag (preventing sequencing by MPSS), and cDNAs with the respective tag falling in a repeat region (e.g., human nanog and rex-1). For these reasons, it is important to know the tag status of a gene of interest before concluding from the MPSS data alone that the gene is not expressed. From EST data, trap1a is known to be expressed in mESCs, as are nanog and rex-1. For all subsequent analysis of genes presented in the remaining figures, we took into consideration the technical limitations of MPSS data before calling a tag count as zero.

A third observation was that mouse and human ESCs appeared to differ in fundamental ways based on differing expression levels between homologous genes identified from our analysis. Many of the differences could be verified by RT-PCR (see next section), suggesting that MPSS can be used for cross-species comparisons. Thus, while MPSS was unable to detect a small fraction of genes, this methodology appeared sensitive and reliable. Additional expression data, not described here, further confirmed that MPSS analysis accurately and robustly described the transcriptome of human and murine ESCs and revealed true differences between the species.

Global Comparison of Mouse and Human ESC Transcriptome

To capture an overall impression of the similarities and differences between the ESCs, we compared the transcriptomes of human and murine ESCs on a global scale in which homologues that could be reliably identified were compared in a pairwise manner and displayed in dot plots (Fig. 1). Based on 5,921 identified homologous genes, transcriptomes of mouse cells (mESCs) and human cells (hESCsESI) were significantly different, with a poor correlation coefficient of .41 (Fig. 1B). This degree of correlation is lower than the correlation typically observed between different lineages of the same species compared using microarray profiling. The coefficient was also lower than that between ESCs and their differentiated derivative EBs: .82 for mouse (Fig. 1C) and .49 for human (Fig. 1D). The discrepancy between the ESC/EB correlation coefficients was most likely a result of the difference in the length of time allowed for EB formation: 4 days for murine and 12 days for human cells. Hence, murine day-4 EBs were less differentiated than human day-12 EBs.
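A pairwise comparison of this kind can be sketched in a few lines. The example below is a minimal illustration, not the analysis used for Figure 1: it assumes Pearson correlation on log10(tpm + 1) values (the text does not state whether raw or log-transformed tpm were used), keeps genes with at least 1 tpm in either sample as in the figure legend, and uses hypothetical function names and gene labels.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def tpm_correlation(sample_a: Dict[str, float],
                    sample_b: Dict[str, float],
                    min_tpm: float = 1.0) -> float:
    """Correlate two samples keyed by a shared gene (or homologue) ID."""
    genes = [
        g for g in sorted(set(sample_a) | set(sample_b))
        if max(sample_a.get(g, 0.0), sample_b.get(g, 0.0)) >= min_tpm
    ]
    # Log transform is an assumption made for this sketch.
    x = [math.log10(sample_a.get(g, 0.0) + 1.0) for g in genes]
    y = [math.log10(sample_b.get(g, 0.0) + 1.0) for g in genes]
    return pearson(x, y)


if __name__ == "__main__":
    human = {"geneA": 9.0, "geneB": 99.0, "geneC": 999.0}
    mouse = {"geneA": 99.0, "geneB": 9.0, "geneC": 999.0}
    print(round(tpm_correlation(human, mouse), 2))
```

For a cross-species comparison, the dictionary keys would have to be a shared homologue identifier rather than species-specific Unigene IDs.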

Figure 1.

Global comparison of hESC and mESC transcripts. Scatter plots for murine and human homologous genes comparing (A) hESCsWi with hESCsESI (10,084 data points; at least one tpm per Unigene Hs.163 cluster found in either sample); (B) hESCsESI with mESCs (5,921 data points; one tpm per mouse Mm.130 or human Hs.163 for which homology is known, found in either mouse or human); (C) mESCs with mEBs (6,889 data points; one tpm per Unigene Mm.130 cluster found in at least either ESCs or EBs); (D) hESCsWi with hEBsWi (10,182 data points; one tpm per Hs.163 found in either ESCs or EBs). All scatter plots were drawn after the removal of ribosomal proteins (with mitochondrial genes filtered out in the original lists). The corresponding correlation coefficients are shown in the panels. Abbreviations: EB, embryoid body; hESC, human embryonic stem cell; mESC, murine embryonic stem cell.

Thus, despite the overall similarity in their self-renewal capacity, the expression of some ESC markers and their pluripotential capabilities, human and murine ESCs differ significantly from each other on a global scale. The low correlation is unlikely to be attributable to major technical issues, as the two independently derived and maintained hESC populations (hESCsWi and hESCsESI) showed a very high degree of correlation: .90 (Fig. 1A). Overall, these results confirmed that there are fundamental differences in the transcriptomes of human and mouse ESCs that cannot be attributed to differences in annotation and species-specific differences in MPSS analysis.

The differences between murine and human transcript levels ranged from one- or twofold to over 50-fold, and even genes known to be important for ESC self-renewal varied by as much as six- to sevenfold. For example, the oct-4 level was 2,173 tpm (hESCsESI) or 658 tpm (hESCsWi) in hESCs and 388 tpm in mESCs. Therefore, by a global pairwise comparison, we developed sublists of genes that varied between human and murine ESCs by 5-fold, 10-fold, or 50-fold and were expressed at 50 tpm or higher (Table 4 and supplementary online data 5, 6, and 7). At the least stringency (>fivefold difference, tpm >50 if the other species' corresponding tpm is zero), we found 1,153 genes higher in human than murine ESCs (supplementary online data 5A) and 427 genes higher in murine than human ESCs (supplementary online data 5B). At the highest stringency (>50-fold differences, tpm >250 if the other species' tpm is zero), 101 genes were found to be higher in human ESCs (supplementary online data 7A) and 64 in murine ESCs (supplementary online data 7B).
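The stringency tiers can be expressed as a simple filter. The sketch below is one plausible reading of the criteria, not the authors' implementation: it pairs the 5-, 10-, and 50-fold cutoffs with minimum abundances of 50, 100, and 250 tpm, treats the minimum as inclusive (an assumption), and uses hypothetical names throughout.

```python
from typing import Dict, Tuple

# (fold cutoff, minimum tpm in the higher-expressing species)
STRINGENCY: Dict[str, Tuple[float, float]] = {
    "least": (5.0, 50.0),
    "moderate": (10.0, 100.0),
    "most": (50.0, 250.0),
}


def is_higher(tpm_high: float, tpm_low: float,
              fold: float, min_tpm: float) -> bool:
    """True if tpm_high exceeds tpm_low by more than `fold`.

    When tpm_low is zero the fold test is always met, so only the
    minimum-abundance requirement decides the outcome.
    """
    return tpm_high >= min_tpm and tpm_high > fold * tpm_low


def classify(human_tpm: float, mouse_tpm: float, level: str) -> str:
    """Label a homologue pair at the chosen stringency level."""
    fold, min_tpm = STRINGENCY[level]
    if is_higher(human_tpm, mouse_tpm, fold, min_tpm):
        return "human > mouse"
    if is_higher(mouse_tpm, human_tpm, fold, min_tpm):
        return "mouse > human"
    return "no call"


if __name__ == "__main__":
    # oct-4 values quoted in the text: hESCsESI 2,173 tpm, mESCs 388 tpm
    print(classify(2173, 388, "least"))
    print(classify(2173, 388, "moderate"))
```

With the oct-4 values quoted in the text, the hESCsESI reading is about 5.6 times the mESC reading, so the pair passes the fivefold cutoff but not the tenfold one.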

Table 4. Summary of global comparison of human and mouse embryonic stem cell (ESC) gene expression profiles

a. Three sets of criteria for murine and human differences were used: a fivefold and above difference with a tpm of more than 50 as the least stringent criterion (supplementary online data 5), >10-fold difference with a tpm of more than 100 for mid-stringency (supplementary online data 6), and >50-fold differences with a tpm of more than 250 (supplementary online data 7) as the most stringent criterion for comparison.

 | Least stringent (>5-fold) | Moderately stringent (>10-fold) | Most stringent (>50-fold)

The differences noted between human and murine ESCs cannot be discussed in complete detail, and readers can examine and use the supplementary online data (5, 6, and 7) for detailed information.

Genes differentially expressed were functionally categorized (by the gene ontology classification) to determine if differences were restricted to particular classes.

Data depicting the global comparison of the genes showing greater than fivefold difference between species were plotted as a pie chart (Fig. 2). As can be seen, human and murine ESCs differ from each other in a wide spectrum of genes, with the largest difference being due to “unknown” genes (22%).

Figure 2.

Global comparison between the species based on the least stringent criterion of >5-fold differences. The categories of genes as derived by the gene ontology classification are shown with the respective percentage distribution. The hES and mES cells differ in a wide spectrum of genes, with the largest share of the difference being due to “unknown” genes (22%). Abbreviations: hES, human embryonic stem; mES, murine embryonic stem.

A subset of genes that differ between species, as defined by MPSS readings, is shown in Table 5, with further verification by RT-PCR of selected genes shown in Figure 3B.

Table 5. Embryonic stem cell (ESC) genes differentially expressed between species

a. Genes that show >10-fold difference in tpm between human and murine ESCs were computed from the massively parallel signature sequencing data. This table shows a randomly selected list of genes, including SLC2A8, SLC2A1, and ATP5G1, to illustrate the major differences between murine and human ESCs in glucose metabolism (see text). Boldface indicates genes that were analyzed by reverse transcription polymerase chain reaction, shown in Figure 3B.

Gene name | Mouse ID | Human ID | mESCs | hESCsESI | hESCsWi
hESCs > mESCs
mESCs > hESCs
Figure 3.

Difference between hESCs and mESCs. Analysis by RT-PCR was done to validate the differences between species indicated by massively parallel signature sequencing. Total RNA was isolated from hESCs and mESCs and reverse transcribed in the presence of oligo-dT. PCR was then performed by using gene-specific primers. The PCR products were electrophoresed in 2.0% agarose ethidium bromide gels. To confirm the quantity of reverse-transcribed cDNA in hESCs and mESCs, semiquantitative RT-PCR was performed using 2- to 16-fold dilutions of each first-strand cDNA reaction mix with primers for G3PDH at 28 cycles. (A): The quantity of RT products in hESCs and mESCs was equivalent. G3PDH was used as an internal control. (B): Examples of genes differentially expressed between human and murine ESCs. Abbreviations: hESC, human embryonic stem cell; mESC, mouse embryonic stem cell; RT-PCR, reverse transcription polymerase chain reaction.

Using MPSS to Identify Distinctive Molecular or Biochemical Pathways

Although murine and human ESCs share many characteristics, the large number of differences observed between them suggests that species-specific transcripts are likely to define biological pathways that distinguish the two. These pathways would include genes for cytokines and signal transduction, membrane proteins, and structural and matrix proteins. A full list of the differences is provided in the supplementary online information, but we have highlighted some pathways here. To assess the molecular basis of growth differences between human and murine cells, we queried the MPSS databases for >50-fold differences between them. Of the 63 genes in the murine > human category, 5 (highlighted in supplementary online data 7B) were directly involved in generating ATP by oxidative phosphorylation. This suggests that mESCs have a greater capacity to generate ATP and have a higher metabolic activity powered by mitochondrial oxidation. Consistent with their higher metabolic activity, mESCs have more GLUT1/SLC2A1 transcripts than hESCs have, while hESCs have more GLUT8/SLC2A8 transcripts (Table 5). GLUT1 maintains basal glucose uptake for metabolism in many cell types, including oocytes, and at many stages of embryonic development through the blastocyst stage [26,27]; haploinsufficiency of GLUT1 results in deficient glucose transport [28]. GLUT8, by comparison, is an insulin-regulated glucose transporter that translocates from an intracellular pool to the plasma membrane upon insulin stimulation [29]. Therefore, the differential transcript levels of GLUT1 and GLUT8 in human and mouse ESCs suggest that glucose uptake is more efficient and less insulin-dependent in mESCs, which provides a biochemical basis for their higher level of oxidative phosphorylation. In contrast, glucose uptake in hESCs is likely to be insulin-dependent, and this may be the underlying biochemical basis for the need to supplement hESC culture media, but not mESC media, with insulin.

To examine further the ability of MPSS to derive biologically relevant insights from transcriptomes, we selected four signaling pathways that have prominent roles in the growth and development of ESCs: the LIF/gp130, FGF, Wnt, and transforming growth factor–beta (TGF-β) pathways (Tables 6 and 7, Fig. 4).

Table 6. Comparison of leukemia inhibitory factor (LIF), fibroblast growth factor (FGF), and Wnt pathway genes in human and murine embryonic stem cells (hESCs and mESCs, respectively) (see Fig. 4)
Gene | Mouse ID | Human ID | mESCs | hESCsESI | hESCsWi
LIF signaling pathway
FGF signaling pathway
Wnt signaling pathway
Low-density lipoprotein receptor-related protein
Table 7. Transforming growth factor–beta (TGF-β) signaling pathway in murine and human embryonic stem cells (mESCs and hESCs, respectively)

a. Massively parallel signature sequencing (MPSS) readings (tpm) for genes known to be in the TGF-β pathway were compared with the results from probing a focused array for the TGF-β pathway using RNA from murine E14 ESCs and hESCsWi. Detection or nondetection of transcripts by the array is shown in the table as presence or absence (+ or –), respectively. The results showed a good concordance between MPSS and array analysis. MPSS was also the more sensitive assay, since many genes that were not detectable by array were picked up by MPSS. The MPSS results for two genes, smad7 and smad4, not included (NI) in the microarray chip, are included in the table.

Gene | Mouse ID | Human ID | mESCs | hESCsESI | hESCsWi | Focus array mESCs | Focus array hESCsWi
Figure 4.

Cross-species comparison of the expression levels (tpm) of genes from key signaling pathways and the reverse transcription polymerase chain reaction validation of massively parallel signature sequencing readings. (A): LIF and LIF transducers. (B): FGF and FGF receptors. (C): Wnt/β-catenin pathway. Abbreviations: hES, human embryonic stem; mES, mouse embryonic stem.

LIF and LIF Transducers

The propagation of mESCs depends on the presence of LIF to engage a heterodimeric cytokine receptor complex consisting of gp190 LIF-specific receptor chain (LIFR) and the gp130 chain, a common component of various cytokine receptors. The LIFR complex activates Janus-associated tyrosine kinases (JAK), which then phosphorylate the signal transducer and activator of transcription (STAT) [30]. Unlike mESCs, hESCs are strikingly unresponsive to LIF-mediated proliferation and maintenance of the undifferentiated state. An explanation for this difference is provided by the MPSS data, which showed that murine, but not human, ESCs express LIFR transcripts (Table 6) together with significant levels of JAK and STAT3. Transcripts for gp130 were absent in hESCs, although upon differentiation, human EBs expressed LIFR (32 tpm, supplementary online data). The absence of LIFR and JAK in hESCs, along with higher levels of SOCS genes (which inhibit LIF-mediated signaling), is consistent with the failure of LIF to support hESC self-renewal and suggests that other members of the LIF/interleukin-6 signaling family cannot substitute for LIF. However, the presence of STAT3 in hESCs, though significantly lower than in mESCs by MPSS, raises the possibility of recruitment and activation of STAT3 by an alternate LIF-independent pathway. Intriguingly, MPSS indicated that mESCs had no or low gp130 transcripts. This low level was supported by the lack of mESC-derived gp130 ESTs in the public databases (see Mm.250251). However, RT-PCR analysis revealed that besides LIFR, transcripts for gp130 and Stat3 were easily detected in mESCs (Fig. 4A).

FGF and FGF Receptors

Basic FGF (FGF2) is currently used for the propagation of hESCs [12], suggesting a requirement for FGF signaling in the maintenance of pluripotency in these cells. Culture medium for mESCs is not supplemented with any of the 22 known FGFs. Furthermore, mESCs (and the inner cell mass) are known to synthesize FGF4, which is required for paracrine signaling to the trophectoderm and the primitive endoderm for normal development to continue beyond the peri-implantation stage [31,32]. For these reasons, we compared the expression of molecules involved in the FGF signaling pathway. Clearly, hESCs are poised to respond to FGF signals, with three of the four FGF receptors (FGFR-1, -3, and -4) having substantial levels of expression (Table 6). In addition, frs2, one of the major downstream effectors of FGF receptor signaling, was detected by MPSS in hESCs. In contrast, mESCs contain a minimal level of fgfr1 (14 tpm) and zero tag counts for the other three FGF receptors and frs2. Curiously, hESCs express significant levels of FGF2 transcripts, whereas FGF2 was undetectable in mESCs.

As expected, FGF4 was found at significant levels in mESCs but was apparently absent in hESCs. Both the FGF2 and FGF4 MPSS data were confirmed by RT-PCR (Fig. 4B).

Wnt/β-Catenin Network

The Wnt-signaling pathways mediate important decisions between proliferative self-renewal and differentiation [33–36]. Recently, Sato et al. [37] suggested that the canonical GSK3/β-catenin pathway may be active in undifferentiated cells and that inhibition of glycogen synthase kinase-3 (GSK-3) was sufficient to maintain the undifferentiated phenotype in both murine and human ESCs. Comparison of pathway gene expression confirmed that most of the components in the canonical Wnt/β-catenin signaling pathway were present in both cell types (Table 6). In hESCs, RT-PCR (Fig. 4C) generally confirmed the presence of the key components, as predicted by MPSS (Table 6). However, in mESCs, the low level of transcripts for most of the components of the canonical Wnt/β-catenin signaling pathway, including the absence of some key molecules, suggested that this pathway may not be active. MPSS readings, confirmed by RT-PCR, showed that APC [38,39], while low in hESCs, was not detected in mESCs. EST data also supported this finding. The low tpm readings for lrp5 and lrp6 [36] in mESCs (Table 6) were confirmed by our negative RT-PCR result (Fig. 4C). In contrast, these components were present in human cells by RT-PCR, MPSS, and EST data (data not shown). The presence of an intact or complete Wnt-signaling pathway with all the attendant positive and negative regulators was further evidenced by the high expression of frizzled-related proteins (FRPs) in hESCs but not in mESCs (Table 6). FRPs are known to antagonize Wnt signaling [40–43]. Therefore, hESCs appeared better poised than mESCs to engage the Wnt-signaling pathway.

TGF-β Superfamily

The TGF-β/bone morphogenetic protein (BMP) family has been shown to play important and pleiotropic roles in early development and in regulating self-renewal of somatic stem cells [44,45]. It has also been shown that a combination of BMP4 and LIF can support propagation of mESCs in a serum-free condition [46]. We examined the expression of both the TGF-β/activin/nodal subfamily and the BMPs, along with their receptors and modulators, and the downstream Smads that they activate [47]. As shown in Table 7, the absence of all the receptor-associated Smads (Smad-1, -3, -5, and -8) in mESCs suggests that any TGF-β/BMP signaling in mESCs would likely be through a Smad-independent route. The presence of these receptor Smads in hESCs, as detected by MPSS, suggests that the Smad-mediated TGF-β pathway is functionally important to hESCs. Furthermore, there are distinctive differences in the ID, BMP, and activin receptor genes between the species. A TGF-β–focused chip containing probes for a spectrum of the TGF-β/BMP superfamily was compared with the MPSS data. The presence (+) or absence (–) of hybridization signals in the chip, as shown in Table 7, indicated a good concordance between the MPSS and the array results; however, the MPSS approach was more quantitative and sensitive. Overall, the results showed that significant differences exist between the species and that Smad-dependent TGF-β/BMP signaling appeared to be much more actively recruited in hESCs than in mESCs.

Table 8. Genes with expression pattern conserved between murine and human embryonic stem cells (mESCs and hESCs, respectively)

a. Genes that showed downregulation during ES to EB transition were identified for both species and separated into three categories:
b. I: ES/EB ratio >2; if EB = 0, ES in both species <50 tpm (supplementary online data S8).
c. II: ES/EB ratio >5; if EB = 0, ES in both species <50 tpm (supplementary online data S9).
d. III: ES/EB ratio >10; if EB = 0, ES in both species >50 tpm (supplementary online data S10).
e. For each category, genes that are expressed in both species (in at least one of the hES lines) were annotated. The table shows the total of 16 genes from category III. Boldface indicates genes known to be ES specific and to have a role in ES pluripotency, as well as additional genes not previously associated with ESCs (discussed in text). For some genes, a differential expression pattern between ESCs and EBs was seen in murine and both human ES lines, suggesting that these genes may be prime candidates to examine further for a role in maintenance of the undifferentiated ESC state.


Similarities between Murine and Human ESCs

While we have highlighted differences between murine and human ESCs, we noted that similarities exist as well. In particular, some well-known genes thought to be involved in or related to ESC self-renewal pathways were conserved (Table 2 and supplementary online data). These included oct-4, sox-2, bmpr, nodal, lefty, tert, and cripto. We reasoned that if genes were coexpressed in both species, despite the overall low concordance, then this subset would likely be enriched for genes important in the self-renewal process of both species. If this subset was further limited to genes that were downregulated as EB differentiation occurred, then the specificity would be higher. Therefore, we identified genes that were expressed in both mouse and human undifferentiated ESCs and were low or downregulated upon differentiating into EBs. Three separate lists were generated based on levels of expression. List 1 (607 genes, supplementary online data 8) shows all the genes with an ESC/EB ratio of two-fold or higher and an ESC <50 tpm in both species if EB is zero. List 2 (119 genes, supplementary online data 9) shows all genes with an ESC/EB ratio of five-fold or higher and an ESC <50 tpm in both species if EB is zero. List 3 (16 genes, supplementary online data 10) consists of genes with an ESC/EB ratio of 10-fold or higher and an ESC >50 tpm in both species if EB is zero. The 16 genes in List 3 are shown in Table 8. As expected, known genes that are ESC-specific were identified (oct-4, leftB). Other known ESC-specific genes (see supplementary online data), such as dnmt3l, utf-1, sox-2, tdgf, and dppa2, fell within the 2- to 10-fold range. Several additional genes not previously known to be conserved and elevated in ESCs were identified as well. Of particular interest was lin-28, a heterochronic gene known to be important in regulating the appropriate timing of differentiation [48]. Another gene, mortality factor 4 (morf4l2), is a member of a novel family of genes with transcription-like motifs that induces a senescent-like phenotype in immortal cell lines [49]. SUMO-specific protease 3 (senp3) is a member of a novel class of regulators of Sentrin/SUMO (small ubiquitin-like modifiers) [50]. CCCTC-binding factor (ctcf) is a ubiquitous zinc finger (ZF) protein that is not only involved in transcriptional silencing or activation in a context-dependent fashion but also organizes epigenetically controlled chromatin insulators that regulate imprinted genes in soma [51,52].
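The three list criteria can be written out as a small filter. The sketch below is one literal reading of the thresholds stated in the text, under several assumptions: each species is represented by a single ES reading and a single EB reading, the EB = 0 rule is applied per species, and all names and tpm values are invented for illustration. It is not the authors' actual selection code.

```python
# Thresholds as read from the text:
# (minimum ES/EB ratio, rule applied to the ES tpm when the EB reading is zero).
LIST_RULES = {
    1: (2.0, lambda es: es < 50),
    2: (5.0, lambda es: es < 50),
    3: (10.0, lambda es: es > 50),
}


def passes(es, eb, min_ratio, zero_eb_rule):
    """Check one species: ES/EB ratio test, or the zero-EB rule when EB is 0."""
    if eb == 0:
        return es > 0 and zero_eb_rule(es)
    return es / eb >= min_ratio


def list_membership(readings):
    """readings: dict species -> (es_tpm, eb_tpm).

    Returns the list numbers whose criteria are met in every species.
    """
    return [
        number
        for number, (min_ratio, rule) in LIST_RULES.items()
        if all(passes(es, eb, min_ratio, rule) for es, eb in readings.values())
    ]


# Invented readings, for illustration only (not data from the study).
gene_x = {"mouse": (300.0, 20.0), "human": (150.0, 0.0)}
gene_y = {"mouse": (40.0, 10.0), "human": (30.0, 0.0)}

print(list_membership(gene_x))
print(list_membership(gene_y))
```

Under this reading the lists are not strictly nested, because the rule for EB = 0 changes from ES <50 tpm (Lists 1 and 2) to ES >50 tpm (List 3); a different reading of that rule would change the membership of genes with no EB tags.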

Comparison of hESC Lines

The high similarity between the hESCsWi line grown feeder-free and the hESCsESI line grown on MEFs suggested that, overall, different hESC lines are similar and that the differences observed between murine and human cells must represent fundamental species-specific differences. However, differences between human lines likely exist. We and others have noted some differences between ESC lines [4, 7, 23, 53], although no comprehensive comparison has been performed. We therefore examined the MPSS dataset to identify genes that showed a 10-fold or higher difference between human samples. A complete list is provided in supplementary online data 11A for hESCsWi > hESCsESI and 11B for hESCsESI > hESCsWi. Overall, even at a stringent criterion of 10-fold or higher, over 1,000 genes were highly expressed in hESCsESI and absent in hESCsWi. Several of these genes were shared by murine and human cells but not by the two ESC populations tested (supplementary online data 11). Figure 5 and Table 9 show a selected list of genes and a confirmation of the differential expression of a subset of these genes by semiquantitative RT-PCR. For example, differences in expression of collagen and BMP-related genes were seen. These were likely due to the difference in culture conditions, while other differences, such as those in FoxD3, may represent allelic differences. Other differences noted include matrix proteins, junction proteins such as claudin, insulin-like growth factor binding proteins, and several novel genes of unknown function.

Table 9. Differences between hESCsESI and hESCsWi lines
Wi > ESI
Collagen, type I, alpha 2Hs.232115062511,601
Collagen, type VI, alpha 3Hs.23324001991,444
SH3-domain binding protein 4Hs.17667013336
IGF-II mRNA-binding protein 3Hs.7944001090
Hypothetical protein FLJ20403Hs.30622101040
GLI-Kruppel family member GLI2Hs.1118670911
Homo sapiens cDNA FLJ14332 fis, clone PLACE4000344Hs.10005711233
Strawberry notch homologue 1 (Drosophila)Hs.3066651986
ESI > Wi
Hypothetical protein MGC20262Hs.35187118800
Forkhead box D3Hs.42421218800
CD99 antigenHs.2834771034381,258
Tuberous sclerosis 2Hs.90303136014
Hypothetical protein FLJ10374Hs.2181111200
Claudin 3Hs.2564010200
Insulin-like growth factor binding protein 2, 36kDaHs.4333269400
Figure 5.

Differences between hESC lines. MPSS of hESCsESI and hESCsWi was examined for differential expression of genes at >10-, >50-, and >100-fold differences between the two ESC lines. Shown here are examples of genes expressed at markedly different levels (tpm) between the two cell lines. Ethidium bromide gel analysis of selected examples of genes from Table 9 (marked in bold) showed the concordance of the RT-PCR results with MPSS. Additional genes that are discussed in the text but are not in the table (rex-1, lif-R, fgf4) were included in the RT-PCR analysis. Abbreviations: hESC, human embryonic stem cell; MPSS, massively parallel signature sequencing; RT-PCR, reverse transcription polymerase chain reaction.

Some of these differences may be attributed to differentiated cell types known to exist, though at a relatively low level, in hESCs grown under feeder-free conditions [9]. Another source of sequence tag difference between the two hESC lines is from mouse cells contaminating the hESCESI line that was grown on mouse feeders, MEFs. We estimated that this contamination occurs at a frequency of approximately 0.3% (see supplementary online Table 2 for data and explanation of computation).

Overall, our results showed that MPSS is sensitive and versatile in identifying multiple differences and similarities between and within species; it can be used to obtain a unique profile of each individual cell line. The results also indicate that murine and human ESCs differ fundamentally in the network of genes that are conscripted to confer their apparently similar cellular properties of pluripotency and high self-renewal capacity.


Our study essentially generated directories of expressed transcripts in mouse and human ESCs before and after loss of pluripotency, as well as transcripts that are differentially expressed in human versus murine ESCs. To be rigorous in our analysis, we restricted assessment to class 1, 2, and 3 signatures that map uniquely to the genome, but we have presented the entire dataset in supplementary online information. The robustness and comprehensiveness of the databases were illustrated by the analysis of selected pathways, and we reported differences that were confirmed by at least one independent method of analysis. No gene presented in the text was reported as absent until the technical limitations of the MPSS methodology were considered or detection was confirmed in other cell types by MPSS. Users of the datasets posted in the supplementary online data should be aware of the technical limitations of MPSS when mining the database and interpreting the results. In addition, we also set different cutoff tpm values for criteria of various comparisons of gene expression. While this level of rigor prevented us from reporting all possible differences observed, it provided reliability and a lower limit to the number of results reported. Our results provide compelling evidence for both conserved and divergent pathways in the regulation of ESC pluripotent state and self-renewal in mouse and human.

MPSS data detected expression on the order of 6,000 to 10,000 mapped genes (or at least unique Unigene IDs) in each of the samples assessed, and this is consistent with previous reports of MPSS analysis. The distribution of gene frequencies was similar to most other cell types, with the most abundant genes being housekeeping genes that were common to most cell types. Only a few ESC-specific genes (e.g., Esg-1 and Utf-1 in murine cells) were expressed in the top 200 transcripts. Most cell-specific genes, including genes coding for transcriptional factors, cytokine receptors, and growth regulators, are actually present at low to very low levels that are likely to be missed by less in-depth analysis. For instance, a SAGE study [4] of two other hESC lines, derived by the same group that derived the hESCsESI line used here, indicated Stat3 levels of 0 tpm in the HES3 line and 13 tpm in the HES4 line. The actual tag counts were 0 of 67,807 total tags and 1 of 77,208 total tags, respectively. Compare this with our MPSS data, in which Stat3 levels were determined to be 4 tpm in the hESCsESI line and 22 tpm in the hESCsWi line, calculated from actual tag counts of 9 of 2,295,140 total tags and 53 of 2,403,315 total tags, respectively.
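The tpm values quoted above follow directly from the tag counts. The short sketch below reproduces the arithmetic; the function name and the sample labels are our own, and the counts are the ones given in the text.

```python
def tags_per_million(tag_count, total_tags):
    """Convert a raw tag count into transcripts per million (tpm)."""
    return tag_count * 1_000_000 / total_tags


# (Stat3 tag count, total tags sequenced), as quoted in the text.
samples = {
    "HES3 (SAGE)": (0, 67_807),
    "HES4 (SAGE)": (1, 77_208),
    "hESCsESI (MPSS)": (9, 2_295_140),
    "hESCsWi (MPSS)": (53, 2_403_315),
}

for name, (count, total) in samples.items():
    print(f"{name}: {round(tags_per_million(count, total))} tpm")

# tpm represented by a single tag in the HES4 SAGE library and in the hESCsESI MPSS run.
single_tag_sage = tags_per_million(1, 77_208)
single_tag_mpss = tags_per_million(1, 2_295_140)
```

A single tag corresponds to roughly 13 tpm in the HES4 SAGE library but to less than 0.5 tpm in the hESCsESI MPSS run, which is why transcripts at the level of Stat3 sit at or below the detection limit of the smaller library.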

Perhaps the most important general observation was the remarkably low correlation coefficient between human and murine ESCs. One reason for the apparent difference between mouse and human is the incomplete annotation of the human and mouse genomes, in particular the incomplete annotation of full-length 3′UTRs, in which the furthest 3′ DpnII sites frequently reside. Examples of this are the mouse LIFR and human Fgf4, each having more than one Unigene cluster mapping to the true full-length mRNA. The class 1 tag for each is found in the furthest 3′ Unigene cluster (Mm.24003 and Hs.362432, respectively), neither of which is named appropriately, as sequences within each do not overlap those from the clusters (Mm.149720 and Hs.1755, respectively) spanning the coding sequence. The low correlation coefficient of 0.42 between murine and human ESCs could not be attributed to differences in sensitivity, labeling efficiency, or other technical limitations, as the overall complexity as assessed by MPSS was similar and we restricted our analysis to genes for which homologues were reliably identified. Additionally, variation could not be attributed to major differences in culture conditions, as both the mESCs and hESCsESI were grown on feeders. Furthermore, the correlation coefficient of the two human populations was remarkably high (0.90) despite the fact that they were grown in different laboratories under different culture conditions. The differences may represent species-specific gene expression. Another possibility is that some of the genes identified are dispensable for the stem cell state. The significant variation observed between murine and human ESCs also raises the possibility that human and murine ESCs may represent slightly different stages of early development or that they may use independent pathways to maintain self-renewal.
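A correlation coefficient of this kind is computed over paired expression values for matched genes. The text does not state which coefficient or data transformation was used, so the sketch below assumes a plain Pearson coefficient on tpm values of matched homologues; the function name and the tpm vectors are invented for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented tpm readings for five matched genes (not data from the study).
line_1 = [10.0, 200.0, 35.0, 0.0, 80.0]
line_2 = [12.0, 180.0, 40.0, 3.0, 95.0]

r = pearson(line_1, line_2)
print(f"r = {r:.2f}")
```

Because a few highly abundant transcripts can dominate a coefficient computed on raw tpm, the value obtained depends on whether the data are log-transformed and on which genes are included; those choices need to be fixed before coefficients from different comparisons are set side by side.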

While a detailed discussion of all the observed differences is impossible, we have highlighted a few differences that were independently verified. Our comparative analysis provided a basis for the differing growth requirements of hESCs and mESCs. The much higher proliferation rate of mESCs, compared with hESCs, presumably has to be sustained by a higher metabolic rate. Consistent with a higher metabolic rate, mESCs expressed higher levels of transcript for GLUT1, a major glucose transporter, while insulin-dependent GLUT8 was higher in hESCs.

Our results confirmed the lack of LIF and gp130 signaling in hESCs and the activity of this pathway in mESCs. Interestingly, levels of gp130 were low in mESCs, and examination of the corresponding Unigene cluster revealed no ESTs from ESCs. The low level of the gp130 transcript raises the possibility that this is a critical, tightly regulated step in LIF-mediated signaling.

The MPSS profile of FGFRs clearly indicates that hESCs are molecularly positioned to respond to extracellular FGF signals, whereas these same molecules are virtually absent in mESCs. As MPSS was not able to identify the known spliced isoforms of the FGF receptors and multiple FGFs work through the same receptors, further study is required to identify the FGF molecules interacting with the FGF receptors (FGFR-1, -3, and -4) expressed in hESCs. Considering that FGF2 (bFGF) is a common supplement to hESC media, it is surprising that both hESC lines synthesize their own FGF2. As FGFs work predominantly through a paracrine action, this could suggest that the apparent benefit of FGF2 supplementation works indirectly through the MEF feeder layer. Also of note is the differing expression of FGF4, a known transcriptional target of the synergistic action between oct-4 and sox-2 in mESCs [54] and an essential molecule during peri-implantation embryo development. Unlike in the mouse, the MPSS and RT-PCR data indicate that hESCs do not synthesize FGF4. This absence or very minimal expression of FGF4 in the human may be indicative of a developmental difference between mouse and human ESCs.

It was clear that almost all the genes in the Wnt/β-catenin pathway were expressed at significantly higher levels in human cells than in murine cells, as shown in Table 6. This suggests that the canonical Wnt/β-catenin signaling pathway was very likely not active in mESCs but was functional in hESCs [37]. Given the reported effects of a GSK-3β inhibitor on mESCs and the evidence for the activation of the PI3K/Akt pathway, we would suggest that the reported nuclear accumulation of β-catenin in undifferentiated ESCs may be due to endogenously active PI3K/Akt signaling rather than active Wnt signaling.

Likewise, as shown in Table 7, the absence of the receptor-associated Smad-1, -3, -5, and -8 in mESCs but their presence in hESCs indicated that the canonical TGF-β pathway may be operatively important to hESCs. Indeed, recent reports have suggested that TGF-β in combination with FGF may be sufficient to maintain hESCs (but not mESCs) in an undifferentiated state [55]. Furthermore, the MPSS data suggest that the known effects of BMP4 on mESCs are mediated through a Smad-independent pathway.

Multiple differences were also identified in every category of genes examined. These included structural genes, metabolic pathways, and housekeeping genes as well. Nevertheless, similarities also existed. We reasoned that if such similarities exist despite widespread differences, then these may represent critical core pathways that are important for the undifferentiated state. Indeed, examining the lists generated (see Results), we identified multiple ESC-specific genes, including oct-4, tdgf1, sox2, utf-1, dnmtl, and leftB. The relatively large list of genes shown in supplementary online data 5, 6, and 7 suggests that additional common pathways required for stem cell self-renewal remain to be identified; the possible candidates include heterochronic genes, methylation agents, ubiquitin/SUMO genes, and components of the DNA repair machinery.

Our comparison of hESC lines maintained in separate laboratories revealed a high degree of similarity. We reasoned that most human lines were isolated from the same stage of development. Differential expression of some of the genes between the human lines may reflect allelic expression that is unique to each cell line. Some of the results may be a reflection of different methods used to culture and propagate the cells in the individual laboratories. The high overall similarity between hESC lines suggests that a core set of stem cell markers for all hESC lines can be generated and that cell populations can be identified by allelic differences as well. The variable expression of genes such as Rex, FoxD3, and LIFR seen in this comparison and in other experiments suggests that these molecules and pathways are not critical for maintaining hESC lines in culture. The expression level of certain ESC-specific genes may provide a prediction on the growth rate and stability of different human lines, but this will require additional detailed comparisons. Overall, it is clear that such a detailed analysis provides important insights into the biology of ESCs.

While we have highlighted the power of a large-scale analysis such as MPSS, it is important to remember that, as with any other methodology, there are potential problems of which investigators need to be cognizant. For example, in the 20-nucleotide sequence run, which was ultimately used in this study, the β-catenin signature was excluded because of a sequencing error in one of the four routine sequence runs. In reviewing the MPSS raw data, we found that in the 17-nucleotide sequence run, a unique signature for β-catenin was indeed present at 92 tpm. While an extremely rare occurrence, this discrepancy highlights the importance of verification. Furthermore, since the signature depends on DpnII sites and not all cDNAs contain a DpnII site, some genes will not be detected. A palindromic sequence within the signature after the DpnII site will result in a hairpin loop that prevents its sequencing, giving a false negative reading. Repetitive sequences in the signature make the signature nonspecific, so that it cannot be annotated. However, these errors and limitations occur very infrequently. In some instances in which the MPSS result showed no tags (zero tpm), expression of the gene could be detected by other methods such as SAGE, an EST search of the database, or RT-PCR. For most genes, however, the absence of expression by MPSS could be demonstrated by RT-PCR as well, confirming the validity of the MPSS assay. While it is important to be alert to possible sources of error in MPSS readings, the use of other methods does not undermine the overall reliability of MPSS in transcriptome analysis. It is important to emphasize, as well, that such errors are common to most large-scale analytical processes and suggest that comparisons across techniques and across species may be useful.

In summary, our analysis provides, for the first time, a direct in-depth comparison between murine and human ESCs. Our data provide unambiguous evidence for the presence of both convergent and divergent pathways critical for self-renewal. Our results highlight the similarities and differences between murine and human ESCs. Although there appears to be a core set of ESC-specific pathways that are conserved across species, other divergent pathways are equally critical and extreme care must be used in extrapolating mESC work to hESC analysis. Our results suggest that human cells isolated by different groups and maintained under different culture conditions are overall highly similar and that the few differences observed likely represent allelic differences or variation in the propagation of cells. The comprehensive database we have developed and deposited for public use may be explored for more detailed information and identification of other novel genes. This database will provide a unique resource for additional comparative genomic analysis and for identifying novel candidates critical for regulating ESC growth and survival, self-renewal, and differentiation.


This study was supported by A-Star (Singapore) and grants from NIH DK47636 (B.L.) and NIH (M.R.). We thank M. Bakre, Nicolas O. Fortunel, Huck Hui Ng, Leonard Lipovich, and Janet Buhlman for reading and helpful discussions of the manuscript.