• SAGE;
  • Human embryonic stem cells;
  • Transcriptome;
  • POU5F1;
  • REX1;
  • SOX2;


Abstract

Human embryonic stem (ES) cell lines that have the ability to self-renew and differentiate into specific cell types have been established. The molecular mechanisms for self-renewal and differentiation, however, are poorly understood. We determined the transcriptome profiles for two proprietary human ES cell lines (HES3 and HES4, ES Cell International) and compared them with murine ES cells and other human tissues. Human and mouse ES cells appear to share a number of expressed gene products, although there are notable differences, including an inactive leukemia inhibitory factor pathway and the high abundance of several important genes like POU5F1 and SOX2 in human ES cells. We have established a list of genes, comprising known ES-specific genes and new candidates, that can serve as markers for human ES cells and may also contribute to the “stemness” phenotype. Of particular interest was the downregulation of DNMT3B and LIN28 mRNAs during ES cell differentiation. The similarities and differences in gene expression profiles of human and mouse ES cells provide a foundation for a detailed and concerted dissection of the molecular and cellular mechanisms governing their pluripotency, directed differentiation into specific cell types, and extended ability for self-renewal.


Introduction

Immortal human embryonic stem (ES) cells and their derivatives promise to revolutionize the future of reparative medicine through the development of stem cell-based therapies [1–5]. ES cells form teratomas when injected into severe combined immunodeficient (SCID) mice [3, 5] and can differentiate into a variety of cell types from all three primitive germ layers in vitro and in vivo [5, 6–9]; this distinguishes ES cells from other stem cells. Several lines of evidence suggest that human and mouse ES cells do not represent equivalent embryonic cell types [10]. In vitro differentiation of human ES cells leads to the expression of AFP and HCG, which are typically produced by trophoblast cells in the developing human embryo, while mouse ES cells are generally believed not to differentiate along this lineage. In addition, human ES cells express stage-specific embryonic antigen (SSEA)-3, SSEA-4, tumor rejection antigen (TRA)-1-60, and TRA-1-81 surface antigens prior to differentiation but only SSEA-1 upon differentiation, while mouse ES cells only express SSEA-1 prior to differentiation [3, 5, 11]. Human ES cell lines have heterogeneous genetic backgrounds and appear to behave differently in culture. For example, not all human ES cell lines are amenable to bulk and feeder-free culture protocols, doubling times differ considerably between lines, and the degree of spontaneous differentiation in vitro also varies widely [12, 13].

Several groups have used comparative data from microarray studies to propose a blueprint for the molecular basis of “stemness” in human and mouse stem cells [14–16]. They have also demonstrated that a large proportion of the transcripts expressed in stem cells are expressed sequence tags (ESTs) with indeterminate functions. Recent evidence has suggested that a small, unique network of transcription factors, including Nanog, Oct-4, and Sox-2, may be sufficient to establish self-renewal and/or suppress lineage differentiation in mouse ES cells [17–21]. Nevertheless, despite the proposed stemness molecular blueprint, many of the genes and molecular mechanisms involved in self-renewal, pluripotency, and differentiation in human ES cells are poorly understood. Moreover, considering the uniqueness of the human ES cell phenotype and the difficulty in obtaining embryonic tissues and preimplantation embryos for research due to ethical reasons, it is probable that many novel genes important for the stemness phenotype in human ES cells remain to be discovered.

We have shown previously that undifferentiated, pluripotent human ES cell lines can be derived from the inner cell masses (ICMs) of 5-day-old human embryos [5, 13]. Since human ES cell lines are capable of differentiating into all three germ layers despite the reported differences in their behavior in vitro, we hypothesized that a quantitative comparison of the transcriptome profiles of selected human ES cell lines might allow the determination of key regulators involved in the maintenance of the stemness property, as previously defined [15, 16], as well as help identify a basis for line-specific cellular and behavioral differences.

Serial analysis of gene expression (SAGE) allows quantitative characterization of gene expression and has the added advantage over microarray expression profiling of being able to identify novel splice variants, exons, and genes [22–24]. Since SAGE libraries comprise discrete data, they can be subjected to pairwise comparison to statistically analyze the differential expression of genes [25] and to generate a comparative digital gene expression profile [24].

We have used SAGE to obtain the transcriptome profiles of two human ES cell lines, HES3 and HES4, which have different gender and ethnic backgrounds. SAGE should identify genes that comprise a distinct molecular signature of human ES cells. To delineate genes that were differentially regulated in human ES cells, the human ES SAGE libraries were subjected to pairwise comparisons with 21 normal and cancer SAGE libraries. Finally, a comparison with the mouse ES SAGE library [26] was conducted to determine differences in the SAGE molecular signatures of ES cells between these two mammalian species.

Materials and Methods


Colony Selection

HES3 (46XX, Chinese; passage 40) and HES4 (46XY, Caucasian; passage 40) cell lines (proprietary cell lines of ES Cell International; Singapore) were cultured on murine embryonic fibroblast (MEF) feeders. Human ES cell colonies were serially cultured according to protocols established previously [5]. Six-day-old human ES cell colonies that appeared morphologically undifferentiated (>90%) were microdissected under a microscope. They routinely tested negative for two early differentiation markers, NEUROD1 and AFP, by quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR). Also, >95% of human ES cells stained positive for TRA-1-60, indicating minimal contamination with MEF or differentiated human ES cells [27]. Using the TRA-1-85 antibody, which detects the pan-human antigen Ok(a) that is present on all human cells but absent on mouse cells, fluorescence-activated cell sorting analysis indicated that >98.9% of the cells harvested were human ES cells [28]. Microdissection was performed with a sterile 30-G needle inserted into the perimeter of the colony to avoid selecting MEFs adjacent to the colony edge, and care was taken to avoid harvesting differentiated regions of the colony. Serial passaging of human ES cell colonies by microdissection resulted in the growth of larger human ES cell colonies that were, on average, 200–300 μm in diameter. This made the selection of morphologically undifferentiated human ES cells easier.

SAGE Library Construction, Clone Preparation, and Sequencing

Total RNA was extracted from ∼800,000 undifferentiated human ES cells using TRIzol (Invitrogen; Carlsbad, CA), and poly(A)+ RNA was subsequently prepared with oligo(dT)-conjugated magnetic beads (Dynal Biotech; Oslo, Norway). SAGE library construction was performed with the I-SAGE kit (Invitrogen) according to the manufacturer's protocol. The anchoring enzyme was NlaIII and the tagging enzyme was BsmFI. Concatemerized ditags were cloned into pZERO-1 (Ecogen; Barcelona, Spain). The ligated products were transformed into One Shot® Top 10 electrocompetent Escherichia coli (Invitrogen), and transformants were selected on low-salt LB/zeocin agar plates. Blue-white selection was used to enhance the efficiency of selecting clones with longer concatemerized inserts [29]. Plasmids were prepared with the Wizard SV 96 plasmid purification kit (Promega; Madison, WI). DNA sequencing was performed using the ABI Big Dye V3.0 or V3.1 sequencing kit (Applied BioSystems; Foster City, CA) and analyzed on the ABI 3100 capillary DNA sequencer (Applied BioSystems). The Gene Expression Omnibus (GEO) accession numbers for the human HES3 and HES4 SAGE libraries are GSM9220 and GSM9221, respectively.


Predesigned Assays on Demand and Assay by Design TaqMan probes and primer pairs were obtained from Applied BioSystems. Total RNA was extracted from undifferentiated and differentiated human ES cells and reverse transcribed using SuperScript II (Invitrogen). Differentiated human ES cells were obtained by subjecting them to high-density culture conditions for an extended period of 20 days. qRT-PCR analysis was conducted using the ABI PRISM 7000 Sequence Detection System (Applied BioSystems). After an initial denaturation for 10 minutes at 95°C, qRT-PCR was carried out using 40 cycles of PCR (95°C for 15 seconds, 60°C for 1 minute). Changes in gene expression levels were calculated using the ΔΔCt method after the data (in triplicate) were normalized to the 18S rRNA levels. qRT-PCR experiments were repeated at least once with reproducible results.
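The ΔΔCt calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes 100% amplification efficiency (a factor of 2 per cycle), the function names are hypothetical, and the Ct values in the example are invented for demonstration. Reporting downregulation as a negative fold value is an assumed convention.

```python
def fold_change_ddct(ct_target_ref, ct_norm_ref, ct_target_test, ct_norm_test):
    """Relative expression of a target gene (test vs. reference sample), 2^-ddCt.

    ct_norm_* are the Ct values of the normalizer (18S rRNA in the text).
    Assumes perfect doubling per cycle, which is an idealization.
    """
    d_ct_ref = ct_target_ref - ct_norm_ref      # normalized Ct, reference sample
    d_ct_test = ct_target_test - ct_norm_test   # normalized Ct, test sample
    dd_ct = d_ct_test - d_ct_ref
    return 2.0 ** (-dd_ct)


def signed_fold(ratio):
    """Express a ratio below 1 as a negative fold value (0.25 becomes -4.0)."""
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical Ct values: the target crosses threshold 2 cycles later after
# differentiation, while the 18S normalizer is unchanged.
ratio = fold_change_ddct(ct_target_ref=25.0, ct_norm_ref=10.0,
                         ct_target_test=27.0, ct_norm_test=10.0)
print(ratio)               # 0.25
print(signed_fold(ratio))  # -4.0
```

A later Ct in the differentiated sample therefore corresponds to lower expression, and each additional cycle halves the estimated transcript level.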


Gene expression was also determined by semiquantitative RT-PCR. Initial denaturation was carried out at 94°C for 2 minutes, followed by 35 cycles of PCR (94°C for 30 seconds, 55°C for 30 seconds, 72°C for 1 minute). Primers used were: ACTB: product 400 bp, 5′-TGGCACCACACCTTTCTACAATGAGC-3′, 5′-GCACAGCTTCTCCTTAATGTCACGC-3′; BTF3: product 281 bp, 5′-GAACTGCTCGCAGAAAGAAG-3′, 5′-ACTAGTCAGACTATCCGCAC-3′; CKS1B: product 409 bp, 5′-ACATGTCATGCTGCCCAAGG-3′, 5′-ACACTCAGCTTAGGCTGTGG-3′; CLDN6: product 373 bp, 5′-AGATGCAGTGCAAGGTGTAC-3′, 5′-CAAGTGCAGCACAGCAACC-3′; DNMT3B: product 433 bp, 5′-CTCTTACCTTACCATCGACC-3′, 5′-CTCCAGAGCATGGTACATGG-3′; ERH: product 495 bp, 5′-GAATGAATCCCAACAGTCCC-3′, 5′-TGGAACCAACATTAAGTGACG-3′; FLJ10713: product 285 bp, 5′-CAGAGAAGTCGAGGGAAGAG-3′, 5′-GCTCAGCTTCAATTGTTGGC-3′; FLJ21837: product 449 bp, 5′-GCAGCTTCTGAACATTTGGAC-3′, 5′-GCAGTAGTCTAGAACACACC-3′; GJA1: product 492 bp, 5′-GGAGTTCAATCACTTGGCGTG-3′, 5′-CTTACCATGCTCTTCAATACCG-3′; HESX1: product 309 bp, 5′-GGATTTCATTCCCTAGCGTGG-3′, 5′-GTGATTCTCTATGGGACCTTTTC-3′; HMGA1: product 469 bp, 5′-GAAGTGCCAACACCTAAGAG-3′, 5′-AGTGGGATGTTAGCCTTGTC-3′; LIN-28: product 420 bp, 5′-AGTAAGCTGCACATGGAAGG-3′, 5′-ATTGTGGCTCAATTCTGTGC-3′; NANOG: product 493 bp, 5′-GGCAAACAACCCACTTCTGC-3′, 5′-TGTTCCAGGCCTGATTGTTC-3′; NPM1: product 343 bp, 5′-TGGTGCAAAGGATGAGTTGC-3′, 5′-GTCATCATCTTCATCAGCAGC-3′; POU5F1: product 247 bp, 5′-CGRGAAGCTGGAGAAGGAGAAGCTG-3′, 5′-CAAGGGCCGCAGCTTACACATGTTC-3′; REX1: product 418 bp, 5′-TCTAGTAGTGCTCACAGTCC-3′, 5′-TCTTTAGGTATTCCAAGGACT-3′; SOX2: product 370 bp, 5′-CCGCATGTACAACATGATGG-3′, 5′-CTTCTTCATGAGCGTCTTGG-3′; and TNFRSF6: product 396 bp, 5′-AGAGTGACACACAGGTGTTC-3′, 5′-TGGCAGAATTGGCCATCATG-3′.

SAGE Data Analysis

Tag extraction and pairwise comparison were performed with the SAGE2000 software v.B (Invitrogen), and database construction and management with Microsoft Access and Excel. Tags with ambiguous bases, duplicate ditags, and ditags with abnormal length (<22 or >24 bp) were removed by SAGE2000. The SAGE tag-to-gene database based on UniGene Build #157 was used. Approximately 60% of all SAGE tags match more than one clustered UniGene entry [22, 30]. To partially overcome the problem of multiple ambiguous tag-to-gene assignments associated with the SAGE technique, we used two publicly available SAGE resources, the CGAP SAGEgenie [24] and the NCBI SAGEmap [31, 32], to assist in identifying the best SAGE tag for a particular gene. The assignment of molecular function of the proteins was based on the LocusLink database.
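The filtering rules above (ambiguous bases, duplicate ditags, ditag length outside 22–24 bp) can be illustrated with a short sketch. It is not the SAGE2000 implementation, whose internals are not described here. It assumes that ditag length is measured between flanking NlaIII sites (CATG), that tags are 10 bp long, and that a ditag seen in either orientation counts as a duplicate. All function names are hypothetical.

```python
import re
from collections import Counter

ANCHOR = "CATG"   # NlaIII recognition site (the anchoring enzyme in the text)
TAG_LEN = 10      # assumed tag length
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_ditags(concatemer):
    """Split a concatemer read at anchoring-enzyme sites into candidate ditags."""
    return [piece for piece in concatemer.upper().split(ANCHOR) if piece]


def count_tags(concatemers, min_len=22, max_len=24):
    """Count tags after dropping ambiguous, abnormal-length and duplicate ditags."""
    seen = set()
    counts = Counter()
    for read in concatemers:
        for ditag in split_ditags(read):
            if not re.fullmatch("[ACGT]+", ditag):
                continue                      # ambiguous base (e.g. N)
            if not (min_len <= len(ditag) <= max_len):
                continue                      # abnormal ditag length
            if ditag in seen or revcomp(ditag) in seen:
                continue                      # duplicate ditag
            seen.add(ditag)
            counts[ditag[:TAG_LEN]] += 1              # tag on the forward strand
            counts[revcomp(ditag[-TAG_LEN:])] += 1    # tag on the reverse strand
    return counts


# Toy read: the same 22-bp ditag twice, plus one fragment that is too short.
left, right = "TACCATCAAT", "TGTGTTGAGA"
ditag = left + "AA" + revcomp(right)
read = ANCHOR + ditag + ANCHOR + ditag + ANCHOR + "ACGTACGT" + ANCHOR
print(count_tags([read]))   # each of the two tags is counted once
```

Dropping duplicate ditags guards against amplification bias: two independent transcripts are unlikely to be ligated into exactly the same tag pair by chance.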

Statistical Treatment

The Z-test [33], based on the normal approximation of the binomial distribution, was used to determine p values for all pairwise library comparisons:

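  $$Z = \frac{p_1 - p_2}{\sqrt{p_0 (1 - p_0) \left( \frac{1}{N_1} + \frac{1}{N_2} \right)}}$$

where $p_1$ and $p_2$ are the proportions of a given tag in the two libraries being compared, $N_1$ and $N_2$ are the total numbers of tags in those libraries, and $p_0$ is the pooled proportion of the tag across both libraries.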

Since no a priori knowledge about the direction of the effects is available in SAGE experiments, all decision rules were formulated for a two-sided test of the null hypothesis [25]. The GEO accession numbers for the human SAGE libraries used were: GSM1498, GSM693, GSM765, GSM670, GSM671, GSM755, GSM731, GSM678, GSM686, GSM757, GSM761, GSM676, GSM728, GSM708, GSM668, GSM709, GSM785, GSM762, GSM719, GSM716, and GSM784. Excel analysis was used to determine the union/intersection of the 21 pairwise statistical tests. Monte Carlo simulation was also carried out to compare the HES3 and HES4 SAGE libraries using the SAGE2000 software.
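As a worked illustration of the pairwise test, the sketch below computes a pooled two-proportion Z statistic, a two-sided p value, and a fold difference for one tag. It assumes the standard pooled form of the normal approximation; it is not taken from SAGE2000, and the function names are hypothetical. The library sizes (67,807 and 77,208 tags) and the tag counts are those reported in the Results, and the Table 2 footnote states that a count of 0 is replaced by 1 when computing fold differences. Under these assumptions the sketch reproduces the Z values and fold differences listed in Table 2 to the reported precision.

```python
import math

HES3_TOTAL = 67_807   # total tags sequenced, HES3 library
HES4_TOTAL = 77_208   # total tags sequenced, HES4 library


def z_statistic(count1, total1, count2, total2):
    """Pooled two-proportion Z statistic (normal approximation to the binomial)."""
    p1 = count1 / total1
    p2 = count2 / total2
    p0 = (count1 + count2) / (total1 + total2)          # pooled proportion
    se = math.sqrt(p0 * (1 - p0) * (1 / total1 + 1 / total2))
    return (p1 - p2) / se


def two_sided_p(z):
    """Two-sided p value for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def fold_difference(count1, total1, count2, total2):
    """Ratio of normalized tag frequencies, larger over smaller.

    A count of 0 is replaced by 1, as described in the Table 2 footnote.
    """
    f1 = max(count1, 1) / total1
    f2 = max(count2, 1) / total2
    return max(f1, f2) / min(f1, f2)


# BTF3 tag CTGAGACGAA: 73 tags in HES3, 1 tag in HES4.
z = z_statistic(73, HES3_TOTAL, 1, HES4_TOTAL)
print(round(z, 2))                                               # 8.95
print(round(fold_difference(73, HES3_TOTAL, 1, HES4_TOTAL), 1))  # 83.1
print(two_sided_p(3.30) < 0.001)                                 # True
```

Normalizing by library size matters here because the two libraries differ in depth by about 14%; a raw count ratio of 73:1 corresponds to a normalized fold difference of about 83.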


Results

HES3 and HES4 SAGE Libraries

Transcriptional profiling of mRNA isolated from undifferentiated human ES cells was performed using SAGE. Undifferentiated human ES colonies were carefully selected and individual SAGE libraries were constructed. A combined total of 145,015 SAGE tags were sequenced from the HES3 (67,807) and HES4 (77,208) SAGE libraries. This translated into 31,852 distinct transcripts. Approximately 64.2% (20,447) of these distinct transcripts were found only once in the combined human ES SAGE library (HES3: 9,977; HES4: 10,470). This is probably indicative of the abundance of rare transcripts in human ES cells, although it is possible that some singletons might have resulted from sequencing errors or leaky transcription as a result of epigenetic dysregulation [34]. Of the singletons that could be reliably assigned to UniGene clusters, 46.2% matched ESTs or hypothetical genes, although we also noted distinct transcripts that matched genes like FOXD3 and GBX2, which have previously been identified as important to mouse ES cells or are expressed in the ICM of mouse blastocysts. We omitted these singletons from our analysis to provide a more accurate estimation of distinct transcripts.
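The library statistics quoted above can be checked with a few lines of arithmetic. The sketch below also shows the usual conversion of a raw tag count to tags per million (tpm), the unit defined in the Table 4 footnote. The helper name is hypothetical, and the conversion shown is the standard definition rather than something stated explicitly in the text.

```python
HES3_TOTAL = 67_807      # tags sequenced from HES3
HES4_TOTAL = 77_208      # tags sequenced from HES4
DISTINCT = 31_852        # distinct transcripts in the combined library
SINGLETONS = 9_977 + 10_470   # transcripts seen once (HES3 + HES4)


def tags_per_million(count, library_total):
    """Normalize a raw tag count to tags per million sequenced tags."""
    return count * 1_000_000 / library_total


print(HES3_TOTAL + HES4_TOTAL)                    # 145015 tags in total
print(SINGLETONS)                                 # 20447 singletons
print(round(100 * SINGLETONS / DISTINCT, 1))      # 64.2 (percent)
print(round(tags_per_million(1, HES3_TOTAL)))     # 15 tpm for one HES3 tag
print(round(tags_per_million(1, HES4_TOTAL)))     # 13 tpm for one HES4 tag
```

At this sequencing depth a single observed tag corresponds to roughly 13 to 15 tpm, which is why singletons are hard to distinguish from sequencing errors.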

Few early markers of differentiation were detected in our human ES SAGE libraries. Tags for early ectodermal markers of differentiation like SOX1, NESTIN, and βIII-TUBULIN; early endodermal markers like PDX1, MIXER, and SOX17b; and mesodermal markers like cardiac ACTIN and β-GLOBIN were not detected in either SAGE library, indicating that contamination of our starting material with differentiated cells was very low.

The Overall Transcriptome Profiles of HES3 and HES4 Are Similar

The exclusion of singletons from the combined human ES SAGE dataset left us with 11,404 distinct transcripts. Among these transcripts, 1.0% had more than 135 copies, 11.2% had between 14–135 copies, 14.1% had between 7–13 copies, and 73.7% had fewer than 6 copies. Altogether, 1,511 distinct transcripts (13.2%) could not be reliably assigned (orphan transcripts) to UniGene clusters. The remaining 9,893 distinct transcripts were matched to 12,721 UniGene clusters. Of these, 4,475 (37.6%) matched only ESTs or hypothetical genes, while 313 (2.5%) have unknown functions (Fig. 1A). A putative functional breakdown of the genes expressed in HES3 and HES4 revealed that a preponderance of the genes are involved in DNA repair, stress responses, apoptosis, cell cycle regulation, and development (Fig. 1A). Based on the presence of numerous distinct transcripts that could not be reliably assigned to UniGene clusters and the prevalence of hypothetical proteins and ESTs, we conclude that a large proportion of the mRNA species in human ES cells is likely to be novel and expressed only in ES cells or cells derived from the ICM.

Figure 1. SAGE analysis of undifferentiated human ES cells. A) A pie chart depicting the percentage of distinct genes encoding proteins in various functional categories in the combined human ES SAGE libraries. B) The distribution of distinct transcripts (SAGE tags) and genes expressed in human ES cells. Numbers in parentheses represent the number of genes that can be assigned to the distinct transcripts. Figures above the values in parentheses represent the total number of distinct transcripts while those below represent transcripts with no reliable UniGene assignment. C) A scatter plot showing the comparative distribution of distinct transcripts of the two human ES SAGE libraries. Pairwise comparison was performed using the compare function of the SAGE2000 software. Tag frequencies were plotted on a logarithmic scale and p values calculated using the Z-test.

Of the 9,917 and 9,828 distinct transcripts that were identified in the HES3 and HES4 SAGE libraries, respectively, 8,341 were common to both (Fig. 1B). Moreover, most of the 3,063 transcripts that were detected in only one of the two human ES cell lines are rare transcripts (<3 copies). With more detailed SAGE profiling, it is likely that the majority of these transcripts would be detected in both human ES cell lines. More importantly, the vast majority of the highly expressed transcripts were expressed in both HES3 and HES4. A pairwise comparison of all the distinct transcripts found in HES3 and HES4 was also performed and the results are presented as a scatterplot (Fig. 1C). Statistical analysis using the Z-test [33] indicated that the expression levels of the majority of the distinct transcripts in HES3 and HES4 were highly correlated.
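The overlap figures follow from simple set arithmetic, sketched here with the counts quoted above. The variable and function names are illustrative only; the same logic applies directly to Python sets of tag sequences.

```python
hes3_distinct = 9_917   # distinct transcripts in HES3 (singletons excluded)
hes4_distinct = 9_828   # distinct transcripts in HES4 (singletons excluded)
shared = 8_341          # transcripts found in both libraries

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
union = hes3_distinct + hes4_distinct - shared
only_one_line = union - shared

print(union)           # 11404 distinct transcripts in the combined dataset
print(only_one_line)   # 3063 transcripts detected in only one cell line


def overlap_summary(tags_a, tags_b):
    """Return (shared, only in a, only in b) sizes for two sets of tag sequences."""
    a, b = set(tags_a), set(tags_b)
    return len(a & b), len(a - b), len(b - a)


print(overlap_summary({"TACCATCAAT", "TGTGTTGAGA"}, {"TGTGTTGAGA"}))  # (1, 1, 0)
```

The union of 11,404 matches the number of distinct transcripts remaining after singletons are excluded, so the per-line and combined counts are mutually consistent.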

Highly Expressed Genes in Human ES Cells

The most abundant transcripts in the combined human ES SAGE libraries include many housekeeping genes important for key metabolic processes such as glycolysis, the ETS pathway, and protein synthesis, as well as genes that encode cytoskeleton-related proteins, transporters, and RNA-processing proteins. Furthermore, with few exceptions, human ES cells expressed these genes to a much higher extent than any other cell type. Notably, POU5F1, a POU transcription factor, and SOX2, which is important for the pluripotency of ES cells [35, 36], were the two most highly expressed transcription factors. Additional transcription factors that were highly expressed include HMGA1, ERH, and BTF3. As a result of a single nucleotide polymorphism (SNP) within the SAGE tag sequence, BTF3 has two matched tags, CTGAGACGAA and CTGAGACAAA. Table 1 lists the 30 most abundantly expressed genes in human ES cells. Underscoring the view that the transcriptome of human ES cells is not well characterized, 6 of the top 200 most abundant transcripts had no UniGene match. Another unusual aspect of human ES cells is the high abundance of two cell junction proteins, CLDN6 (a tight junction protein) and GJA1 (a gap junction protein). Several cytoskeletal and actin-binding proteins like profilin, cofilin, and thymosin, as well as a vasa-type RNA helicase, DDX5, were also very highly expressed.

Table 1. The 30 most abundant transcripts expressed in human ES cells

Orphan tags that cannot be assigned to any UniGene clusters are listed as “No reliable UniGene match.” Ribosomal genes, mitochondrial genes, and tags that match to more than three different UniGene entries have been omitted from this list.

Tag sequence | HES3 | HES4 | Total | UniGene | Description | Function
TACCATCAAT | 439 | 515 | 954 | 169476 | Glyceraldehyde-3-phosphate dehydrogenase | Metabolism
 |  |  |  | 79877 | Myotubularin-related protein 6 | Signal transduction
TGTGTTGAGA | 501 | 403 | 904 | 181165 | Eukaryotic translation elongation factor 1 α1 | Protein synthesis
TGTACCTGTA | 343 | 388 | 731 | 334842 | Tubulin, alpha, ubiquitous | Cytoskeletal
TAGGTTGTCT | 243 | 423 | 666 | 401448 | Tumor protein, translationally controlled 1 | Apoptosis
CCTAGCTGGA | 222 | 383 | 605 | 401787 | Peptidylprolyl isomerase A (cyclophilin A) | Protein modification
GAAGCAGGAC | 217 | 339 | 556 | 180370 | Cofilin 1 (non-muscle) | Cytoskeletal
 |  |  |  | 156814 | KIAA0377 gene product | EST/Hypothetical
TGAAATAAAA | 204 | 180 | 384 | 355719 | Nucleophosmin (numatrin) | Nucleolus
TGTTCTGGAG | 197 | 173 | 370 | 74471 | Gap junction protein, α1 (connexin 43) | Gap junction
GGTCCAGTGT | 133 | 193 | 326 | 181013 | Phosphoglycerate mutase 1 (brain) | Metabolism
TCCCTATTAA | 147 | 179 | 326 |  | No reliable UniGene match | No UniGene match
GCATTTAAAT | 127 | 186 | 313 | 421608 | Eukaryotic translation elongation factor 1 β2 | Protein synthesis
GGCTGGGGGC | 95 | 215 | 310 | 408943 | Profilin 1 | Cytoskeletal
 |  |  |  | 352407 | Chromosome 1-amplified sequence 3 | EST/Hypothetical
TGGGCAAAGC | 126 | 178 | 304 | 256184 | Eukaryotic translation elongation factor 1γ | Protein synthesis
TTGGAGATCT | 131 | 171 | 302 | 50098 | NADH dehydrogenase (ubiquinone) 1α subcomplex, 4, 9kDa | Metabolism
TTGGTGAAGG | 191 | 109 | 300 | 75968 | Thymosin, β4, X chromosome | Cytoskeletal
 |  |  |  | 426138 | Human promyelocytic leukemia cell mRNA, clones pHH58 and pHH81 | EST/Hypothetical
TCCCCGTACA | 200 | 93 | 293 |  | No reliable UniGene match | No UniGene match
AGCACCTCCA | 139 | 141 | 280 | 75309 | Eukaryotic translation elongation factor 2 | Protein synthesis
TGAGGGAATA | 76 | 184 | 260 | 83848 | Triosephosphate isomerase 1 | Metabolism
TTGGGGTTTC | 137 | 119 | 256 | 418650 | Ferritin, heavy polypeptide 1 | Transport; iron
CTAGCCTCAC | 110 | 140 | 250 | 14376 | Actin, γ1 | Cytoskeletal
TGTAATCAAT | 127 | 119 | 246 | 376844 | Heterogeneous nuclear ribonucleoprotein A1 | RNA processing
GCACAAGAAG | 98 | 136 | 234 | 289721 | Homo sapiens mRNA; cDNA DKFZp564D0164 | EST/Hypothetical
CTCCTCACCT | 81 | 132 | 213 | 93213 | BCL2-antagonist/killer 1 | Apoptosis
 |  |  |  | 389335 | Ribosomal protein L13a | Protein synthesis
ATTGTTTATG | 107 | 101 | 208 | 181163 | High-mobility group nucleosomal binding domain 2 | Chromatin regulation
 |  |  |  | 380159 | KIAA1393 protein | EST/Hypothetical
GCCTTCCAAT | 78 | 129 | 207 | 76053 | DEAD/H box polypeptide 5 (RNA helicase) | RNA processing
GGAATGTACG | 83 | 124 | 207 | 429 | ATP synthase, H+ transporting, mitochondrial F0 complex, subunit c (subunit 9) isoform 3 | Transport; ion
ACTCCAAAAA | 75 | 131 | 206 | 75914 | Coated vesicle membrane protein | Transport; protein
CTGTTGATTG | 116 | 87 | 203 | 376844 | Heterogeneous nuclear ribonucleoprotein A1 | RNA processing
GAAACAAGAT | 118 | 76 | 194 | 78771 | Phosphoglycerate kinase 1 | Metabolism

Overall, one of the most striking observations is the high expression level of genes that are involved in protein synthesis and mRNA processing. In particular, genes that encode the 73 ribosomal proteins were, on average, about 3.70–8.41 times more abundant than in normal tissues like brain cortex, cerebellum, colon, kidney, stomach, and liver. Only the pancreas has a higher proportion of SAGE tags that were derived from ribosomal genes. This indicates that human ES cells devote a large proportion of their cellular resources to the synthesis of proteins, which is not unexpected given their rapid proliferation rate.

Genes Differentially Expressed in HES3 and HES4

Although the general transcriptome profiles of the two human ES cell lines we profiled were similar, a number of genes were found to be differentially represented. A pairwise comparison of the HES3 and HES4 SAGE libraries (Fig. 1C) using the Z-test statistical analysis (p ≤ 0.01) and fold differences revealed 175 differentially expressed transcripts. Monte Carlo simulation gave identical results (data not shown). A list of 25 differentially expressed HES3 and HES4 genes with the greatest fold difference is presented in Table 2. Most conspicuously, the transcript for REX1 was absent in the HES4 line. SNPs and splice variants/isoforms account for some of the differences in the HES3 and HES4 SAGE transcriptomes. For example, six differentially expressed genes were found to have two assigned SAGE tags. RPS27A, NDUFB1, and BTF3 were represented by two different SAGE tags containing an SNP within each tag sequence, while the second alternative tag for TPI1, FSCN1, and SLC2A3 resulted from the expression of a second isoform in HES3. Several transcription factors, REX1, BTF3, ZFX and XBP1, were upregulated in HES3, but only CTBP1 was upregulated in HES4.

Table 2. The top 25 differentially expressed transcripts in HES3 or HES4 cells showing the greatest fold difference

Orphan tags that cannot be assigned to any UniGene clusters are listed as “No reliable UniGene match.” A Z value of >3.30 corresponds to a p value of <0.001. A tag count of 0 for a gene entry was substituted with 1 to calculate fold differences.

Tag sequence | HES3 | HES4 | Total | UniGene | Description | Z value | Fold diff
CTGAGACGAA | 73 | 1 | 74 | 101025 | Basic transcription factor 3 | 8.95 | 83.1
CTGAGACAAA | 1 | 52 | 53 | 101025 | Basic transcription factor 3 | 6.55 | 45.7
TGATTTCACT | 120 | 5 | 125 |  | Mitochondrial protein | 11.04 | 27.3
CTCTGTTGAT | 20 | 1 | 21 | 83383 | Peroxiredoxin 4 | 4.45 | 22.8
AAGAATTTGA | 16 | 1 | 17 | 183435 | NADH dehydrogenase (ubiquinone) 1β subcomplex, 1, 7kDa | 3.91 | 18.2
CACGCGCTCA | 15 | 1 | 16 | 24301 | Polymerase (RNA) II polypeptide E, 25kDa | 3.77 | 17.1
 |  |  |  | 101299 | Cullin 5 |  | 
GAATGAGGAC | 13 | 1 | 14 | 167791 | Reticulocalbin 1 | 3.46 | 14.8
GAATCCAACT | 11 | 1 | 12 | 433328 | Neuronal protein 17.3 | 3.12 | 12.5
 |  |  |  | 44143 | Polybromo 1 |  | 
CATTGAAGGG | 9 | 1 | 10 | 79026 | Myeloid leukemia factor 2 | 2.74 | 10.3
TTTGTGACTG | 2 | 17 | 19 | 343926 | C-terminal binding protein 1 | 3.17 | 7.5
GTCACTCATA | 13 | 2 | 15 | 285317 | Hypothetical protein FLJ12891 | 3.10 | 7.4
 |  |  |  | 376146 | Homo sapiens cDNA FLJ39106 |  | 
GTGCCCGTGC | 91 | 0 | 91 | 83848 | Triosephosphate isomerase 1 | 10.18 | 103.6
TACCAATGAT | 0 | 105 | 105 |  | No reliable UniGene match | 9.61 | 92.2
AAAATTTACA | 29 | 0 | 29 | 97932 | Leukocyte cell-derived chemotaxin 1 | 5.75 | 33.0
AAGAATCTGA | 0 | 36 | 36 | 183435 | NADH dehydrogenase (ubiquinone) 1β subcomplex, 1, 7kDa | 5.62 | 31.6
TGCTCCGGGT | 26 | 0 | 26 |  | No reliable UniGene match | 5.44 | 29.6
GCTGCTATTT | 0 | 20 | 20 | 23395 | Myosin IXA | 4.19 | 17.6
AAGAGGAGAC | 13 | 0 | 13 | 284216 | Hypothetical protein FLJ10283 | 3.85 | 14.8
TGAAGGATGC | 0 | 16 | 16 | 180911 | Ribosomal protein S4, Y-linked | 3.75 | 14.1
ATGTGACTGT | 12 | 0 | 12 |  | No reliable UniGene match | 3.70 | 13.7
CATCTCACTC | 12 | 0 | 12 | 118400 | Fascin homolog 1, actin-bundling protein | 3.70 | 13.7
TTTCTTAACA | 11 | 0 | 11 |  | No reliable UniGene match | 3.54 | 12.5

In contrast to HES3, the upregulated genes in HES4 included mainly ribosomal proteins, cytoskeletal proteins, and enzymes involved in metabolic pathways, which probably reflect the higher metabolic and proliferation rates of HES4. Three genes, LECT1, TGFα, and IFRD1, which are associated with differentiation, were upregulated in HES3, perhaps indicative of a small subpopulation of differentiating cells. Some of the cell line-specific differential gene expression could be attributed to different gender backgrounds. For instance, the Y-linked RPS4 was found only in HES4, while all five X-linked genes were more highly expressed in HES3. About 8.7% of the differentially expressed transcripts were ESTs or hypothetical proteins and 9.1% were orphan SAGE tags.

Genes Differentially Upregulated in Human ES Cells

To determine genes that were upregulated in ES cells, we compared the combined human ES SAGE dataset with 21 publicly available SAGE libraries from normal adult and fetal peripheral tissues and cancer tissues. Upregulated transcripts were identified based on p values (p < 0.01) and fold differences (fold difference > 4) in 21 pairwise comparisons. The 192 upregulated transcripts included known ES-specific transcription factors like POU5F1, SOX2, REX1, and NANOG as well as other less well-characterized transcription factors, hypothetical proteins, and several DNA/RNA-modifying proteins like LIN28 and DNMT3B, an embryonic DNA methyltransferase [37]. A large number of orphan SAGE tags, hypothetical genes, and ESTs were found to be abundantly expressed and highly restricted in their expression to human ES cells. A selected list of differentially upregulated transcripts is presented in Table 3.
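The selection of upregulated transcripts can be sketched as a filter applied across all pairwise comparisons. The version below assumes that a tag must pass both thresholds (p < 0.01 and fold difference > 4) in every comparison, that is, an intersection; the text mentions both union and intersection analyses, so this is only one possible reading. It also assumes a pooled two-proportion Z-test and the substitution of 1 for zero counts when computing fold differences. The function names and the toy libraries are hypothetical.

```python
import math


def z_statistic(c1, n1, c2, n2):
    """Pooled two-proportion Z statistic; returns 0.0 if the tag is absent in both."""
    p0 = (c1 + c2) / (n1 + n2)
    if p0 == 0:
        return 0.0
    se = math.sqrt(p0 * (1 - p0) * (1 / n1 + 1 / n2))
    return (c1 / n1 - c2 / n2) / se


def two_sided_p(z):
    """Two-sided p value for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


def fold_up(c1, n1, c2, n2):
    """Normalized frequency ratio (library 1 over library 2), with 0 counts set to 1."""
    return (max(c1, 1) / n1) / (max(c2, 1) / n2)


def upregulated_in_all(es_counts, es_total, other_libraries, p_max=0.01, min_fold=4.0):
    """Tags that are significantly higher in the ES library than in every other library.

    other_libraries is a list of (tag -> count dict, total tag count) pairs.
    """
    hits = []
    for tag, c_es in es_counts.items():
        passes = True
        for counts, total in other_libraries:
            c_other = counts.get(tag, 0)
            z = z_statistic(c_es, es_total, c_other, total)
            if not (z > 0
                    and two_sided_p(z) < p_max
                    and fold_up(c_es, es_total, c_other, total) > min_fold):
                passes = False
                break
        if passes:
            hits.append(tag)
    return sorted(hits)


# Toy data: two comparison libraries instead of 21.
es = {"TAGA": 50, "TAGB": 50, "TAGC": 5}
others = [({"TAGA": 2, "TAGB": 40}, 100_000), ({"TAGA": 0, "TAGB": 1}, 100_000)]
print(upregulated_in_all(es, 100_000, others))   # ['TAGA']
```

In the toy data, TAGB fails the fold threshold against the first library and TAGC is too rare to reach significance, so only TAGA is retained. Requiring a pass in every comparison is strict by design: it favors tags whose expression is restricted to ES cells over tags that are merely abundant.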

Table 3. Differentially upregulated genes in human ES cells

Tag sequence | UniGene | Description | Function
GGGCTGTGAA | 146329 | CHK2 checkpoint homolog (S. pombe) | Cell cycle regulation
TTAAAAGCCT | 348669 | CDC28 protein kinase regulatory subunit 1B | Cell cycle regulation
TGCCATCTGT | 23960 | Cyclin B1 | Cell cycle regulation
ACAGTGGGGA | 16426 | Podocalyxin-like | Cell surface protein
TAATTCTACC | 75561 | TDGF-1 (cripto) | Cell surface protein
CTTTTGCAGC | 56145 | Thymosin, beta, identified in neuroblastoma cells | Cytoskeletal
GGCTGGGGGC | 408943 | Profilin 1 | Cytoskeletal
GAAGCAGGAC | 180370 | Cofilin 1 (non-muscle) | Cytoskeletal
TAGCTACAGG | 251673 | DNA (cytosine-5-)-methyltransferase 3β | DNA methylation
GGCGTGAACC | 78996 | Proliferating cell nuclear antigen | DNA replication
GATGAGTACC | 194562 | Telomeric repeat binding factor (TERF-1) | DNA replication
CAAATTTTAT | 9536 | Hypothetical protein FLJ10713 | EST/hypothetical
TGGCAGCTTT | 6153 | CGI-48 protein | EST/hypothetical
GTAGTCGATG | 86232 | GDF3 (growth and differentiation factor 3) | Extracellular secreted
CACTTTGTAT | 180780 | TERA protein | Function unknown
TGTTCTGGAG | 74471 | Gap junction protein, α1 (connexin 43) | Gap junction
TTTTGTTAGT | 247902 | Claudin 6 | Gap junction
TGAAATAAAA | 355719 | Nucleophosmin (numatrin) | Nucleolus
TACAAAACCA | 79110 | Nucleolin | RNA binding
GAAAGAAAGA | 82359 | TNF receptor superfamily, member 6 | Signal transduction
TATCAATATT | 7306 | Secreted frizzled-related protein 1 | Signal transduction
TACAGATCAC | 173859 | Frizzled homolog 7 (Drosophila) | Signal transduction
TATCACTTTT | 2860 | POU domain, class 5, transcription factor 1, Pou5f1 | Transcription
TCCTCAAGAT | 433413 | Enhancer of rudimentary homolog (Drosophila) | Transcription
GAGAAAACCC | 816 | SRY (sex determining region Y)-box 2, Sox2 | Transcription
TTTACTGCTA | 86154 | RNA-binding protein LIN-28 | Transcription
ATTTGTCCCA | 57301 | High-mobility group AT-hook 1 | Transcription
TCATAGCCCT | 335787 | Zinc finger protein 42 (Rex1) | Transcription

Pairwise statistical comparisons also revealed that the medulloblastoma (886), embryonic kidney (941), and ovarian carcinoma (1,720) libraries have the fewest differentially expressed transcripts and thus most closely resemble human ES cells. The scatterplots depicting the distribution of distinct transcripts in these three tissues and adult kidney with respect to human ES cells are shown in Figure 2. Many of the upregulated genes in the combined human ES dataset were also highly represented in cancer SAGE libraries. Therefore, although human ES cells do not closely resemble cancer cells in their generalized transcriptome profiles, they do appear to share certain characteristics.

Figure 2. Scatter plots showing the comparative distribution of distinct transcripts in four selected tissues. The combined human ES SAGE library was compared with (A) embryonic kidney, (B) adult kidney, (C) medulloblastoma, and (D) ovarian carcinoma. Tag frequencies were plotted on a logarithmic scale and p values calculated using the Z-test.

Independent Confirmation of SAGE Expression Data by qRT-PCR

To confirm the SAGE tag frequency results, we performed qRT-PCR on total RNA derived from undifferentiated (7D) and high-density (20D) differentiated human ES cells. Genes studied were POU5F1, SOX2, REX1, HESX1, DNMT3B, ERH, STAT3, LIF, LIFR, IL6ST, AFP, BMP4, NEUROD1, and FGF4 (Table 4). While ES cell markers like HESX1, POU5F1, REX1, SOX2, and STAT3 showed a decline, there was a strong increase in the expression of AFP and NEUROD1, but not BMP4, in the differentiated human ES cells. HESX1 expression showed the greatest decline during ES cell differentiation. Interestingly, there was also a significant decrease in DNMT3B expression during human ES cell differentiation. FGF4 could not be detected in undifferentiated or differentiated human ES cells with qRT-PCR or SAGE. For LIF and LIFR, although SAGE tags were not detected, qRT-PCR indicated that both were expressed at low levels, with LIFR expression showing an increase during HES3 and HES4 differentiation. Expression data for HES3 and HES4 matched very well; the overall correlation between qRT-PCR and SAGE analyses was 0.67, which is similar to that reported for mouse ES cells [26].

Table Table 4.. Real-time RT-PCR gene expression between undifferentiated and differentiated human ES cells
  1. a

    *UnDiff = undifferentiated human ES cells.Diff = differentiated human ES cells from high-density cultures undergoing spontaneous differentiation in vitro.FD = fold difference in relative mRNA levels of the target gene in undifferentiated and differentiated human ES samples calculated by the ΔΔCT method using 18S rRNA as the normalized internal standard. For genes that were not detectable in undifferentiated cells, a CT of 40 was substituted to calculate fold differences in gene expression. nd = not detected after 40 PCR cycles; tpm = tags per million.

 (tpm)(tpm)Diff FDFD
REX1162025.3(±0.05)27.7(±0.10)−4.1nd nd 
AFP1513nd 25.1(±0.04)+30,15337.7(±0.40)25.0(±0.20)+7000
NEUROD100nd 38.29(±0.3)+4.4nd 33.4(±0.10)+96.7
FGF400nd nd nd nd 
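
The fold differences (FD) in Table 4 follow from the ΔΔCT method described in the table footnote. The sketch below is illustrative only: the 18S rRNA CT values are not listed in the table, so the inputs in the usage note are hypothetical, and the signed convention (positive for upregulation on differentiation, negative for downregulation) is an assumption inferred from the way FD values are reported. The function name `fold_difference` is also hypothetical, and 100% amplification efficiency (base 2) is assumed.

```python
ND_CT = 40.0  # CT substituted when the target was not detected in undifferentiated cells


def fold_difference(ct_target_undiff, ct_ref_undiff, ct_target_diff, ct_ref_diff):
    """Signed fold difference between differentiated and undifferentiated cells.

    ct_ref_* are the CT values of the internal standard (18S rRNA in Table 4).
    Pass None for ct_target_undiff when the target was not detected (nd).
    """
    if ct_target_undiff is None:
        ct_target_undiff = ND_CT
    delta_undiff = ct_target_undiff - ct_ref_undiff  # delta-CT, undifferentiated
    delta_diff = ct_target_diff - ct_ref_diff        # delta-CT, differentiated
    ddct = delta_diff - delta_undiff                 # delta-delta-CT
    ratio = 2.0 ** (-ddct)                           # assumes 100% PCR efficiency
    # Report downregulation as a negative fold value (assumed convention).
    return ratio if ratio >= 1.0 else -1.0 / ratio
```

With hypothetical inputs, `fold_difference(25.0, 10.0, 27.0, 10.0)` gives a fourfold decrease, and `fold_difference(None, 10.0, 30.0, 10.0)` substitutes a CT of 40 for the undetected undifferentiated sample and gives a 1,024-fold increase.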

Expression of Candidate Human ES Cell-Specific Genes

We examined the expression profiles of 18 known and candidate ES-specific genes identified by our SAGE analysis by semiquantitative RT-PCR. The expression of these genes was determined in undifferentiated and differentiated human ES cells, six adult peripheral tissues, and two fetal tissues (Fig. 3). Of the known ES transcription factors, POU5F1, SOX2, and REX1 were expressed only in human ES cells, while low levels of NANOG expression were detected in fetal brain and adult testis. Several new candidate human ES-specific genes such as DNMT3B, an embryonic DNA methyltransferase; LIN28, an RNA-binding protein; NPM1, a nucleolar protein; OC90, a PLA2-like protein; and FLJ14549, a germ cell Zn-finger transcription factor, were expressed only in human ES cells and showed decreased expression during ES cell differentiation.


Figure 3. Gene expression of candidate human ES-specific marker genes. Transcriptional analysis of the 19 genes and ACTB, which is included as a loading control, was carried out by RT-PCR with total RNA prepared from fetal brain, fetal liver, adult brain, placenta, adult testis, adult kidney, adult lung, adult heart, undifferentiated (7D) HES3 and HES4 cells, and differentiated (20D) HES3 and HES4 cells. Input RNA amounts were controlled for all first-strand RT reactions. Ten percent of the PCR product was loaded into each lane and analyzed on a 1.5% agarose gel.


The expression of DNMT3B was further evaluated with qRT-PCR to confirm a decline during ES cell differentiation (Table 4). De novo methylation of genomic DNA is a developmentally regulated process that is believed to play a pivotal role in development, genome imprinting, and gene silencing in mammals [38, 39]. LIN28, an RNA-binding and heterochronic gene, was downregulated during ES differentiation. LIN28 is a negative regulator controlling the embryonic development of a variety of somatic cell types in many organisms [40]. Downregulation of LIN28 expression has also been associated with progression toward differentiation in embryonal carcinoma cells. Other genes, such as CLDN6, GJA1, CKS1B, ERH, and HMGA1, were expressed in some peripheral tissues, but the expression levels appeared to be much higher in human ES cells. However, no marked decline in the expression of these genes was detected during the onset of ES differentiation. Of the five transcription factors assayed by qRT-PCR (Table 4), HESX1 gene expression showed the most dramatic decline during ES differentiation. However, HESX1 was also expressed in several peripheral adult and fetal tissues.



We used SAGE to obtain the transcriptome profiles of the human ES cell lines HES3 and HES4. The profiles of these two human ES cell lines were largely similar. The most conspicuous difference between the two lines was the absence of REX1 expression in HES4. Taken together, we conclude that a generalized gene expression profile of the human ES cell lines can be reliably derived based on the combined HES3 and HES4 SAGE libraries. Additionally, we hypothesized that genes involved in the maintenance of pluripotency are restricted in their expression to ES cells, with low or nondetectable expression in somatic tissues. Pairwise comparisons between our human ES SAGE data and publicly available SAGE data from peripheral adult tissues enabled us to identify a group of genes that were both restricted and upregulated in human ES cells. Subsequently, we used RT-PCR to confirm whether the expression of these genes declined in differentiated human ES cells and evaluated the expression of these genes in eight peripheral tissues. This allowed us to detect known ES-specific genes like POU5F1, REX1, and SOX2, as well as to identify several new human ES cell marker genes.
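
The selection of tags that are both upregulated in and restricted to human ES cells can be sketched as a simple filter over tpm values. This is a minimal illustration, not the procedure actually applied: the function name `es_restricted_tags`, the data layout, and both thresholds are hypothetical, and the statistical comparison between libraries is omitted.

```python
def es_restricted_tags(es_tpm, tissue_tpm, min_es_tpm=100.0, max_other_tpm=10.0):
    """Return tags that are abundant in ES cells and low or absent elsewhere.

    es_tpm:     dict mapping tag -> tpm in the combined ES library
    tissue_tpm: dict mapping tissue name -> dict of tag -> tpm
    Both thresholds are hypothetical values chosen for illustration.
    """
    selected = []
    for tag, es_level in es_tpm.items():
        if es_level < min_es_tpm:
            continue  # not upregulated enough in ES cells
        # Highest level of this tag across all comparison tissues;
        # a tag missing from a library counts as 0 tpm.
        highest_elsewhere = max(
            (levels.get(tag, 0.0) for levels in tissue_tpm.values()),
            default=0.0,
        )
        if highest_elsewhere <= max_other_tpm:
            selected.append(tag)
    return sorted(selected)
```

For example, with `es_tpm = {"TAG_A": 250.0, "TAG_B": 300.0, "TAG_C": 20.0}` and `tissue_tpm = {"adult_kidney": {"TAG_B": 150.0}, "adult_brain": {"TAG_A": 5.0}}`, only `TAG_A` passes both criteria.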

REX1 Is not Expressed in HES4

No SAGE tag for REX1 was detected in HES4, and this was confirmed by quantitative and semiquantitative RT-PCR. It is tempting to speculate that this could account for some of the differential gene expression between HES3 and HES4. The lack of REX1 expression in HES4 is surprising because it has been serially propagated for over 100 passages and is capable of forming teratomas in SCID mice. In the mouse, Rex1 is a direct downstream target of Pou5f1 [41] and its promoter is functional in human ES cells [42]. The exact involvement of Rex1 in the self-renewal of mouse ES cells is still unclear. However, F9 cells induced to differentiate along the visceral endoderm pathway showed increased Rex1 mRNA levels, and F9 Rex1−/− cells do not form primitive and visceral endoderm upon retinoic acid-induced differentiation [43]. Taken together, these findings suggest that REX1 expression may not be essential for self-renewal in human ES cells. However, we cannot rule out a role for REX1 in the establishment of the ICM or in specific differentiation pathways. Confirmation that HES4 carries a null allele of REX1 might have practical implications for its directed differentiation into specific cell types. It would also be prudent to determine REX1 expression in the other human ES cell lines, several of which share a similar ethnic background and source with HES4.

Comparison of the Human and Mouse ES Transcriptomes by SAGE

Some basic similarities in the SAGE profiles of human and mouse ES cells exist. Highly expressed genes in both of these mammalian ES cell types include metabolic enzymes, ribosomal proteins, and cytoskeletal proteins (TUBB2, TMSB10, PFN1). However, there are significant differences between the mouse and human ES cell transcriptomes. Transcription factors with a defined role in the maintenance of pluripotency and whose expression is downregulated upon differentiation, including SOX2, HESX1, UTF1, and REX1 [18, 41, 44, 45], are consistently expressed at higher levels in human ES cells, with POU5F1 expression reaching ∼10-fold higher. In contrast, members of the leukemia inhibitory factor (LIF) signaling pathway (STAT3, LIF, LIFR, and IL6ST), FGF4, and TDGF1 are very highly expressed in mouse ES cells only. Galanin and sialoadhesin, which are highly expressed in mouse ES cells [26], are expressed at lower levels in human ES cells (Table 5). Conversely, genes that are differentially expressed in human ES cells are expressed at much lower levels in mouse ES cells. The absolute difference in the expression levels of these ES-restricted transcription factors, coupled with an inactive LIF signaling pathway, indicates that there are fundamental differences in the regulatory networks that control pluripotency and self-renewal in human and mouse ES cells.

Table 5. Comparison of human ES and mouse ES SAGE libraries

    The number of SAGE tags that were reliably assigned to each UniGene entry within the respective SAGE libraries. If a gene had more than one tag, the sum of all corresponding tag frequencies is listed.

Gene    UniGene    SAGE Tag    HES (tpm)    UniGene    SAGE Tag    MES (tpm)

Tight quantitative regulation of Pou5f1 gene expression is essential for the maintenance of mouse ES cell pluripotency [17]. While the high expression of POU5F1 is atypical of transcription factors, its expression does not decline precipitously in differentiated human ES cells, implying that it might regulate human ES cell pluripotency through a similar mechanism. An additional implication is that downstream targets of POU5F1 should also be upregulated in human ES cells. Indeed, this is the case for H2AFZ, SOX2, REX1, RPS7, and KPNB1 [46].

Stemness Phenotype of Human ES Cells

A list of candidate human ES cell marker genes responsible for stemness in human ES cells is presented in Table 6. All of these genes were present in our list of 192 upregulated transcripts. Five of them, POU5F1, SOX2, REX1, NANOG, and FLJ10713, have been previously identified in mouse ES cells [10, 15, 20, 21, 26], and eight of them, including TGIF, TDGF1, CHEK2, GDF3, GJA1, and FLJ21837, have been identified as upregulated in a recent microarray study of the human ES cell transcriptome [47]. None of the remaining genes has previously been implicated as important for human ES cells. These candidate human ES marker genes are either very highly expressed in human ES cells (GJA1, CLDN6, CKS1B, ERH, HMGA1) or show highly restricted expression patterns (LIN28, DNMT3B, FLJ14549, FLJ21837, TNFRSF6, NPM1, OC90). In addition, some of these new marker genes (LIN28, DNMT3B, FLJ14549, OC90, HESX1) were strongly downregulated during ES cell differentiation. Besides these known genes, we have also identified eight orphan SAGE tags that are both highly expressed and restricted to human ES cells. These genes should also prove to be extremely useful as markers for undifferentiated human ES cells.

Table 6. Candidate human ES marker genes

    a Indicates genes that were detected as upregulated in our study and in Sato et al. [47].

POU5F1      POU domain class 5, transcription factor 1
SOX2        Sox 2
HESX1       Homeobox expressed transcription factor in ES cells
REX1        Zinc finger protein 42
FLJ14549    Hypothetical protein FLJ 14549
TGIF        TGF-β-induced factor (TALE family homeobox)
DNMT3A/B    DNA (cytosine-5) methyltransferase 3α/β
LIN-28      RNA-binding protein LIN-28
TNFRSF6     TNF superfamily member 6
TDGF1       Teratocarcinoma-derived growth factor 1
GDF3        Growth differentiation factor 3
FLJ21837    Hypothetical protein FLJ 21837
FLJ10713    Hypothetical protein FLJ 10713
HMGA1       High mobility group AT-hook 1
ERH         Enhancer of rudimentary homolog
CKS1B       CDC28 protein kinase regulatory subunit 1B
CHEK2       CHK2 Checkpoint homolog
CLDN6       Claudin 6
GJA1        Connexin 43

We have also identified components of the fibroblast growth factor (FGF), transforming growth factor (TGF)-β/bone morphogenetic protein-4, and WNT signaling pathways that are believed to be important in human ES cells. In particular, the downstream transcription factor of the WNT pathway, TCF3, the TGF-β-induced factor (TALE homeobox transcription factor), and LEFTB were highly expressed in human ES cells. Other genes believed to be important for the ES cell phenotype, such as CHEK2 and GDF3, were also detected at high levels in our SAGE data.

Besides the identification of putative transcription factors and signaling pathways that are important for the maintenance of pluripotency and self-renewal in human ES cells, a large number of potentially important hypothetical genes, ESTs, and novel transcripts have been uncovered. The presence of many potentially novel transcripts has partially validated our decision to rely on SAGE for the profiling of the human ES cell transcriptome. Despite past failure to identify transcripts that are exclusively restricted to ES cells, some of these orphan SAGE tags are detected for the first time in human ES cells, indicating that ES-specific genes might exist. The next phase would be to convert these short 10-bp tags to longer cDNA sequences for gene identification purposes and the subsequent evaluation of these genes as key regulators of the stemness phenotype. The identification and cloning of the large number of rare human ES cell transcripts will remain a formidable challenge. The enrichment of human ES cells, by cell lineage marking or the erasure of differentiating human ES cells, in combination with single-cell transcript analysis or a micro-cDNA libraries-based approach, may help to quickly refine and identify important human ES-specific genes [48–50]. Single-cell gene expression profiling might be able to confirm whether there are functional subsets of human ES cells [51].

While our results have helped to confirm many of the essential attributes of stemness proposed previously [15], we have been unable to demonstrate the involvement of certain key signaling molecules such as FGF-4 and LIF, which are central to the concept of stemness in mouse ES cells. Since these studies [14, 15, 26] have employed LIF to suppress mouse ES cell differentiation, we are inclined to believe that some of these differences might be attributed to an active LIF pathway in mouse ES cells. Nevertheless, these human ES genes that we have identified, in combination with what has been reported earlier for mouse ES cells and other adult stem cells, will remain extremely useful for the dissection of the key molecular pathways involved in the maintenance of pluripotency, self-renewal, and perhaps, even the mechanism used by human ES cells to suppress differentiation.



This study was supported by a grant from Embryonic Stem Cell International (ESI) Pte. Ltd.

