These authors contributed equally to this study.
Gene Expression Signatures of Seven Individual Human Embryonic Stem Cell Lines
Article first published online: 1 OCT 2005
Copyright © 2005 AlphaMed Press
Volume 23, Issue 9, pages 1343–1356, October 2005
How to Cite
Skottman, H., Mikkola, M., Lundin, K., Olsson, C., Strömberg, A.-M., Tuuri, T., Otonkoski, T., Hovatta, O. and Lahesmaa, R. (2005), Gene Expression Signatures of Seven Individual Human Embryonic Stem Cell Lines. STEM CELLS, 23: 1343–1356. doi: 10.1634/stemcells.2004-0341
- Issue published online: 2 JAN 2009
- Manuscript Accepted: 6 MAY 2005
- Manuscript Received: 2 DEC 2004
- Embryonic stem cells;
- Genetic variation;
Identification of molecular components that define a pluripotent human embryonic stem cell (hESC) provides the basis for understanding the molecular mechanisms regulating the maintenance of pluripotency and induction of differentiation. We compared the gene expression profiles of seven genetically independent hESC lines with those of nonlineage-differentiated cells derived from each line. A total of 8,464 transcripts were expressed in all hESC lines. More than 45% of them have no yet-known biological function, which indicates that a high number of unknown factors contribute to hESC pluripotency. Among these 8,464 transcripts, 280 genes were specific for hESCs and 219 genes were more than twofold differentially expressed in all hESC lines compared with nonlineage-differentiated cells. They represent genes implicated in the maintenance of pluripotency and those involved in early differentiation. The chromosomal distribution of these hESC-enriched genes showed over-representation in chromosome 19 and under-representation in chromosome 18. Although the overall gene expression profiles of the seven hESC lines were markedly similar, each line also had a subset of differentially expressed genes reflecting their genetic variation and possibly preferential differentiation potential. Limited overlap between gene expression profiles illustrates the importance of cross-validation of results between different ESC lines.
Human embryonic stem cells (hESCs) are pluripotent cells that maintain their ability to self-renew and give rise to differentiated progeny representing all three embryonic germ layers [1, 2]. Differentiation of hESCs in vitro into several cell types, such as cardiac, neural, hematopoietic, pancreatic, and hepatic lineages, has been described [3–7]. Derivatives of hESCs could thus potentially be used for cell transplantation therapies in various severe degenerative diseases [8, 9].
Several genes involved in maintaining the mouse ESC characteristics have been functionally characterized, but there is still little known about the molecular control of hESC pluripotency, self-renewal, and differentiation. To understand these mechanisms, it is essential to first identify genes and gene products important for pluripotency. Comparison of the transcriptional profiles of different hESC lines may advance the identification of a core set of molecular components that define pluripotent hESCs and could provide understanding of molecular mechanisms underlying these properties.
DNA microarray analysis allows large-scale expression profiling of several thousand genes, even from very limited amounts of starting material. This is especially important with hESCs because large-scale culturing of hESCs is still difficult. High-density arrays, containing most of the known human genes as well as thousands of unknown expressed sequence tags (ESTs), are especially useful tools for studying unknown phenomena in hESCs. During recent years, the first large-scale expression profiles of hESCs have been reported using DNA microarrays [10–13], EST sequencing, serial analysis of gene expression, and massively parallel signature sequencing [16–18]. In this paper, we describe for the first time comparative large-scale transcriptional analyses of seven new hESC lines using DNA microarrays. Comparison of gene expression profiles between different hESC lines has been difficult because of dissimilar culture conditions. To overcome this problem, we have analyzed differences and similarities in gene expression profiles of seven individual hESC lines cultured in identical conditions. It is known that gene expression profiles differ between individuals of different genetic backgrounds. Each hESC line carries a unique human genome, and fundamental characteristics of each line may be determined by the unique genome. For this reason, it is unlikely that all hESCs are identical in their gene expression profiles. In this work, a high degree of correlation of gene expression in all seven hESC lines was found, but differences in the expression patterns between individual lines were also observed. In addition, all seven hESC lines expressed a panel of specific transcripts not expressed in nonlineage-differentiated cells or in fibroblasts, potentially representing genes responsible for pluripotency.
Materials and Methods
Seven hESC lines, four (FES lines) from the University of Helsinki, Finland, and three (HS lines) from Karolinska University Hospital, Huddinge, Stockholm, Sweden, were used for analyses. The HS hESC lines (HS181, HS235, HS237) were derived and cultured on human foreskin fibroblast feeder cells (CRL-2429; American Type Culture Collection, Manassas, VA, http://www.atcc.org). Characterization of the line HS181 has been described in detail, and the lines HS235 and HS237 have been characterized in a similar way. All three lines express markers typical of hESCs (alkaline phosphatase, SSEA-4, TRA-1-60, TRA-1-81, and Oct-4) but are SSEA-1–negative. The cells from the line HS181 have been shown to form teratomas when injected into severe combined immunodeficiency mice, as described by Hovatta et al. The same procedure has been applied to cell lines HS235 and HS237, and teratomas were formed. The pluripotency has also been demonstrated by in vitro differentiation of embryoid bodies (EBs) expressing markers from three embryonic layers. The karyotype of all HS cell lines is 46, XX.
Two of the four FES hESC lines (FES21 and FES22) were derived on mouse feeders but transferred to human foreskin fibroblast feeders after passage 25. The two other FES cell lines (FES29 and FES30) were derived and cultured on human foreskin fibroblast feeder cells similarly to the HS lines. All FES lines have been characterized in the same way as the HS lines. All of them form teratomas in mice. The karyotype of three of the FES cell lines (FES21, FES22, and FES29) is 46, XY, and one (FES30) has a karyotype of 46, XX.
All seven hESC lines were cultured in identical conditions. The human foreskin fibroblasts used as feeder cells were cultured in Iscove's medium supplemented with 10% fetal bovine serum (StemCell Technologies, Vancouver, British Columbia, Canada, http://www.stemcell.com) to form a confluent monolayer and mitotically inactivated using irradiation (35 Gy) or Mitomycin C (Sigma-Aldrich, St. Louis, http://www.sigmaaldrich.com). The hESC lines were cultured on feeder cells in serum replacement (SR) medium containing knockout Dulbecco's modified Eagle's medium (DMEM) supplemented with 20% knockout SR, 2 mM L-glutamine, ×1 penicillin streptomycin, ×1 nonessential amino acids, 0.5 mM 2-mercaptoethanol, ×1 ITS (insulin, transferrin, selenite) Liquid Media Supplement (Sigma-Aldrich), and 4–8 ng/ml basic fibroblast growth factor (bFGF) (R&D Systems, Minneapolis, http://www.rndsystems.com).
EBs were prepared from all seven hESC lines. To induce formation of EBs, differentiated hESC colonies grown for 10–14 days after passaging were transferred by mechanical disaggregation. Cells were cultured in suspension in SR medium without bFGF for 10 days to form EBs. The EBs were then replated on gelatin-coated dishes to form a pool of spontaneously differentiated cells. We use the term nonlineage-differentiated cells to highlight the fact that these spontaneously differentiated cells represent a mixture of various cell types. These cells were cultured for 3 weeks before harvesting in media containing DMEM F12 supplemented with ×1 ITS (Sigma-Aldrich), 2% HEPES (Sigma-Aldrich), 2 mM L-glutamine, ×1 penicillin streptomycin, and 200 μg/ml fibronectin (Sigma-Aldrich). All of the chemicals were from Gibco-Invitrogen Corporation (Grand Island, NY, http://www.invitrogen.com) unless stated otherwise.
RNA Isolation and Microarray Studies
At the onset of RNA isolation, the hESC line HS181 was at passage levels 32 and 34, HS235 at passage levels 50 and 56, HS237 at passage levels 35 and 37, FES21 at passage levels 20 and 30, FES22 at passage levels 39 and 48, FES29 at passage levels 23 and 33, and FES30 at passage levels 25 and 35. The hESC colonies that appeared morphologically undifferentiated were microdissected for RNA isolation under a microscope to avoid harvesting any differentiated regions of the colony. The total RNA of each sample was isolated using RNeasy mini kit (Qiagen, Hilden, Germany, http://www1.qiagen.com) and SV Total RNA isolation kit (Promega, Madison, WI, http://www.promega.com). Because RNA contamination from the fibroblasts could not be entirely avoided in the isolated RNA, total RNA from human foreskin fibroblasts was included in the study as a control sample. From all RNA samples, 100 ng of total RNA was used as a starting material for the microarray sample preparation. The sample preparation was performed according to the Affymetrix two-cycle GeneChip Eukaryotic small sample target labeling assay version II (Affymetrix, Santa Clara, CA, http://www.affymetrix.com). Biotin-labeled cRNA (15 μg) was fragmented and hybridized to HG-U133A and HG-U133B arrays containing probes for approximately 39,000 transcripts and variants, including more than 33,000 well-substantiated human genes. Arrays were stained and scanned according to Affymetrix protocols.
The gene transcript levels were determined from data images with algorithms in the GeneChip Microarray Suite software (Affymetrix MAS version 5.0), and further analysis of data was performed with Kensington software (InforSense, London, http://www.inforsense.com). Biological function classification of the genes was performed according to the NetAffx database (Affymetrix). At the detection level, each probe set was assigned a call of present (P), absent (A), or marginal (M). Only a gene with detection call P was considered to be expressed. In the comparison-level analysis of each hESC line against the nonlineage-differentiated reference sample, a gene was defined as significantly upregulated if the average signal fold-change (AverageFC, calculated using two biological replicates) between the hESC and the reference sample was larger than 2 and the gene was present in the hESC sample. A gene was defined as significantly downregulated if the AverageFC was less than −2 and the gene was present in the reference sample. As recommended by Affymetrix, the probe sets were excluded if the detection call for both target and reference was absent (A) or if the change call gave no change (NC, change p > .05) in comparison analysis. Only those genes that fulfilled all filtering criteria in two biological replicates were considered significant.
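The filtering rules above can be sketched in a few lines of Python. This is a minimal illustration, not the MAS 5.0 or Kensington implementation: the `Comparison` record, its field names, and the `classify` function are hypothetical, and the sketch assumes each probe set carries one signed fold change, two detection calls, and one change p-value per biological replicate.

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    """One hESC-vs-reference comparison for one probe set (hypothetical layout)."""
    fold_change: float  # signed fold change, hESC relative to reference
    hesc_call: str      # detection call in the hESC sample: "P", "A", or "M"
    ref_call: str       # detection call in the reference sample
    change_p: float     # change p-value from the comparison analysis

def classify(replicates):
    """Return "up", "down", or None for one probe set across its replicates."""
    for r in replicates:
        # Exclude probe sets absent in both target and reference.
        if r.hesc_call == "A" and r.ref_call == "A":
            return None
        # Exclude probe sets whose change call is "no change" (p > .05).
        if r.change_p > 0.05:
            return None
    avg_fc = sum(r.fold_change for r in replicates) / len(replicates)
    # Upregulated: AverageFC > 2 and present in the hESC sample in every replicate.
    if avg_fc > 2 and all(r.hesc_call == "P" for r in replicates):
        return "up"
    # Downregulated: AverageFC < -2 and present in the reference in every replicate.
    if avg_fc < -2 and all(r.ref_call == "P" for r in replicates):
        return "down"
    return None
```

Requiring the detection call in every replicate is one reading of "fulfilled all filtering criteria in two biological replicates"; a stricter or looser reading would change only the two `all(...)` conditions.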
Validation of the Microarray Results
For validation of the microarray results with TaqMan real-time reverse transcription–polymerase chain reaction (RT-PCR), a set of 12 genes (Oct-4, NANOG, FOXD3, CDKN1C, SLC16A1, SMAD2, SMAD4, WNT5A, gp130, LIFR, GDF3, and DNMT3B) was selected. Primers and probes (Table 1) were ordered from CyberGene AB (Huddinge, Sweden, http://www.cybergene.se) and Applied Biosystems (Foster City, CA, http://www.appliedbiosystems.com). Total RNA samples (100 ng) from cells were prepared as described above, but cRNA from one-cycle in vitro transcription was used for cDNA synthesis with Sensiscript reverse transcriptase (Qiagen). The TaqMan experiments were performed using the ABI PRISM 7700 Sequence Detection System (Applied Biosystems) as described previously. The relative levels of target mRNA expression were normalized against GAPDH (glyceraldehyde 3-phosphate dehydrogenase) expression. All measurements were performed in duplicate in two separate runs. The standard deviation of individual TaqMan measurements had to be < 0.5.
|Gene||1) 5′-Forward Primer-3′ 2) 5′-Reverse Primer-3′ 3) 5′-Probe-3′ for TaqMan|
|Oct-4||1) TCTGCAGAAAGAACTCGAGCAA 2) AGATGGTCGTTTGGCTGAACAC 3) CCTCTTCTGCTTCAGGAGCTTGGCAA|
|NANOG||1) TGCAGTTCCAGCCAAATTCTC 2) CCTAGTGGTCTGCTGTATTACATTAAGG 3) TCCAAAGCAGCCTCCAAGTCACTGG|
|WNT5A||1) TGCCAGTATCAATTCCGACATC 2) TGCATCACCCTGCCAAAA 3) ATCCACAGTGCTGCAGTTCCACCG|
|SMAD2||1) GTCGTCCATCTTGCCATTCA 2) CCAGACCCACCAGCTGACTT 3) CTTCCATCCCAGCAGTCTCTTCACAAC|
|SMAD4||1) ACAGCCATCGTTGTCCACTG 2) CTGTCGATGCACGATTACTTGG 3) AGGACATTCAATTCAAACCAT|
|FOXD3||Applied Biosystems ID: Hs00255287_s1|
|CDKN1C||Applied Biosystems ID: Hs00175938_m1|
|SLC16A1||Applied Biosystems ID: Hs00161826_m1|
|gp130||1) GCCTCAACTTGGAGCCAGATT 2) GTTTAAGGTCTTGGACAGTGAATGAAG 3) CTCCTGAAGACACAGCATCCACCCGA|
|LIFR||1) ACTGTGGAAGATATAGCTGCAGAAGA 2) CACTGTTGCTGTCTATGGATCTAGGA 3) ATAAAACTGCGGGTTACAGACCTCAGGCC|
|GDF3||1) TGCCGTTGACCCAGAGATC 2) AGAGCATGGAAATGGGAGACA 3) CCAGGCTGTGTGTATCCCCACCAAG|
|DNMT3B||1) CGAAAGGATGTTTGGCTTTCC 2) GACCTTCCCAGCAGCTTCTG 3) ACAGACGTGTCCAACATGGGCCGT|
|GAPDH||1) GTTCGACAGTCAGCCGCATC 2) GGAATTTGCCATGGGTGGA 3) ACCAGGCGCCCAATACGACCAA|
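The GAPDH normalization described for the TaqMan validation can be illustrated with a short sketch. The text does not give the formula that was used, so the sketch assumes the common 2^(−ΔCt) calculation with 100% amplification efficiency, and it assumes the standard-deviation limit of 0.5 applies to replicate Ct values; the function names are hypothetical.

```python
import statistics

SD_LIMIT = 0.5  # replicate measurements must have SD < 0.5 (assumed to be in Ct units)

def mean_ct(measurements):
    """Average replicate Ct values, rejecting replicates that disagree too much."""
    if statistics.stdev(measurements) >= SD_LIMIT:
        raise ValueError("replicate Ct values differ too much (SD >= 0.5)")
    return statistics.mean(measurements)

def relative_expression(target_cts, gapdh_cts):
    """Target expression relative to GAPDH, assuming 2^(-delta Ct)."""
    delta_ct = mean_ct(target_cts) - mean_ct(gapdh_cts)
    return 2.0 ** (-delta_ct)
```

With made-up Ct values, `relative_expression([22.0, 22.0], [20.0, 20.0])` gives 0.25, that is, the target is a quarter as abundant as GAPDH under these assumptions.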
Results and Discussion
In this study, the gene expression profiles of seven independent hESC lines were assessed by comparing their expression levels to nonlineage-differentiated cells. The signal values and detection calls of this analysis are available as Supplementary Table S1 on our Web site (http://stemcells.btk.fi). Repeated biological replicate experiments for all samples showed a correlation coefficient ≥ 0.966, indicating high reproducibility of the data. We found that more than 45% and 34% of the total probe sets on the HG-U133A (22,283 probe sets) and HG-U133B (22,645 probe sets) arrays, respectively, gave a detection call “present” for all samples. To exclude redundant genes included in HG-U133A and HG-U133B probe sets, Unigene IDs were used in analyses. Using this approach, we identified 8,464 nonredundant transcripts (13,763 probe sets) to be expressed in all seven hESC lines (overlap between hESC lines ∼80%), 10,085 nonredundant transcripts (16,749 probe sets) in nonlineage-differentiated cells, and 9,970 nonredundant transcripts (17,406 probe sets) in fibroblasts (Fig. 1). All 8,464 nonredundant transcripts expressed in all seven hESC lines are presented in Supplementary Table S2 (available at http://stemcells.btk.fi). Interestingly, 3,792 (45%) of these genes have no yet-known biological function, and further analysis of these unknown genes may reveal new mechanisms involved in the regulation of the growth of hESCs.
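Collapsing probe sets to nonredundant transcripts can be sketched as follows. The probe set and Unigene IDs are taken from Table 3, where two probe sets map to Hs.24743; the detection calls are invented for illustration, and the rule that a transcript counts as expressed when any of its probe sets is called present is an assumption of this sketch, not a statement of the procedure actually used.

```python
def expressed_transcripts(calls, probe_to_unigene):
    """Unigene IDs with at least one probe set called present ("P").

    calls: probe set ID -> detection call for one sample.
    probe_to_unigene: probe set ID -> Unigene ID (unannotated probe sets are skipped).
    """
    return {
        probe_to_unigene[probe]
        for probe, call in calls.items()
        if call == "P" and probe in probe_to_unigene
    }

def common_transcripts(calls_by_line, probe_to_unigene):
    """Unigene IDs expressed in every cell line."""
    per_line = [expressed_transcripts(c, probe_to_unigene) for c in calls_by_line.values()]
    return set.intersection(*per_line)

# Two probe sets share one Unigene ID, so they count as a single transcript.
probe_to_unigene = {
    "219121_s_at": "Hs.24743",
    "225846_at": "Hs.24743",
    "210463_x_at": "Hs.411456",
}
# Detection calls below are toy values, not data from the study.
calls_by_line = {
    "line_a": {"219121_s_at": "P", "225846_at": "A", "210463_x_at": "P"},
    "line_b": {"219121_s_at": "A", "225846_at": "P", "210463_x_at": "A"},
}
shared = common_transcripts(calls_by_line, probe_to_unigene)
```

Working at the Unigene level is what makes transcript counts smaller than probe set counts, as in the 8,464 transcripts versus 13,763 probe sets reported above.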
Gene Expression Profiles Specific for Pluripotent Stem Cells
One of our goals was to identify genes expressed in all seven hESC lines but not in nonlineage-differentiated cells, as we anticipated that these genes could be potential markers of pluripotency. Among the 8,464 genes expressed in all seven hESC lines, 970 were not expressed in nonlineage-differentiated cells. Next, we further filtered this list of genes by extracting the genes that were not expressed in fibroblasts either. As a result of this, we identified 280 genes that were not expressed either in nonlineage-differentiated cells or in fibroblasts (Supplementary Table S3, available at http://stemcells.btk.fi).
All of the hESC-specific genes were classified by biological function, and more than 40% of these had no known biological function. Functional annotation of the known genes revealed mainly genes involved in cell communication, regulation of transcription, development, and cell proliferation (Fig. 2). The list of transcriptional regulators contained many forkhead box family members such as FOXH1, FOXO1A, and FOXA3. Zinc finger proteins were also highly expressed, including ZFP42, ZNF26, ZNF165, ZNF198, ZNF493, ZNF511, ZNF577, ZNF586, ZNF589, and ZNF339. ESCs are known to proliferate continuously through a characteristic cell-cycle structure. Several cell cycle–associated genes were included in our list of hESC-specific genes, such as MYBL2, GTSE1, CDC25A, MPHOSPH9, MKI67, and CCNF, suggesting their important role in hESC proliferation.
We also studied the chromosomal distribution of the 280 genes specific for hESCs. The analysis was based on the normalization against the known number of genes in a particular chromosome (Ensembl23.34.e.1). Our results show that the distribution of genes with a known chromosomal location (262 genes) was relatively even, but, intriguingly, the highest proportion of genes was located in chromosome 19, including genes such as FOXA3, whereas only one gene (MBD2) was located in chromosome 18 (Fig. 3).
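The normalization against the number of genes per chromosome amounts to dividing each chromosome's count of hESC-specific genes by the total number of known genes on that chromosome. A minimal sketch, with a hypothetical function name and made-up gene totals (not Ensembl figures):

```python
from collections import Counter

def normalized_distribution(gene_chromosomes, genes_per_chromosome):
    """Fraction of each chromosome's known genes that appear in the gene list.

    gene_chromosomes: one chromosome label per gene with a known location.
    genes_per_chromosome: chromosome label -> total known genes on it.
    """
    counts = Counter(gene_chromosomes)
    return {
        chrom: counts.get(chrom, 0) / total
        for chrom, total in genes_per_chromosome.items()
    }

# Toy input: three located genes and invented chromosome totals.
fractions = normalized_distribution(["19", "19", "18"], {"18": 100, "19": 400})
```

Without this division, gene-dense chromosomes would appear enriched simply because they carry more genes.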
Expression of Previously Described ESC Genes
Several genes expressed in all seven lines have previously been associated with hESCs (Table 2). For example, NANOG, LEFTB, TDGF1, Oct-4, FOXH1, REX-1, and GDF3 are believed to be important for hESCs and were expressed in all of our seven lines but not in nonlineage-differentiated cells or in fibroblasts. Although many of the known ESC markers were expressed in all lines, differences in their expression levels were observed. By using real-time RT-PCR, the expression levels of NANOG, Oct-4, and GDF3 in different hESC lines were compared (Fig. 4). The results show a greater than twofold difference in expression levels of these genes between individual hESC lines, with the highest expression in HS235 and HS237. Interestingly, we found FOXD3, a gene critical for endodermal differentiation, to be expressed in all seven hESC lines, although Ginis et al. reported that it was expressed only in mouse and not in human ESCs. The expression of FOXD3 in all seven hESC lines was confirmed by real-time RT-PCR (Fig. 4). Some of the putative hESC marker genes, including DNMT3B, SOX2, Lin28, and CD24, have been previously shown to be decreased during hESC differentiation [15, 16]. All of these genes were expressed in all seven lines and also in nonlineage-differentiated cells, suggesting that these genes are not downregulated immediately after induction of differentiation (Table 2). The expression of DNMT3B in all hESC lines was also confirmed using real-time RT-PCR (Fig. 4).
|Gene||Expressed in all seven hESC lines||Expressed in nonlineage-differentiated cells||Expressed in fibroblasts||Reference|
|POU5F1 (OCT-4)||Yes||No||No||[40, 41]|
|Nanos homolog 1||Yes||Yes||No|||
|connexin 43(GJA1)||Yes||Yes||Yes||[10, 55]|
Some hESC-associated genes, including STAT3, gp130, and LIFR, were also expressed in fibroblasts (Table 2). To validate the expression of LIFR and gp130 in all hESC lines, real-time RT-PCR was used (Fig. 4). The expression of these genes in fibroblasts complicates their use as markers in expression studies because some of the signal may come from contaminating feeder cells. To estimate the amount of possible fibroblast contamination in hESC samples, the expression of genes highly expressed in fibroblasts (such as fibulin 2 and 5, fibrillin 1, cartilage oligomeric matrix protein, keratin 4, matrix metalloproteinase 19 and 24, and growth differentiation factor 5) was analyzed. None of these genes was detected in hESCs, suggesting that fibroblast contamination in the hESC samples was minimal. According to microarray results, markers for cells with differentiated phenotypes, such as HAND1, GATA6, SOX1, and AFP, were not expressed in any of the seven hESC lines but were expressed in nonlineage-differentiated cells. The absence of these differentiation markers in hESC lines indicates that contamination from differentiated cells was also very low among the hESC colonies that were collected for microarray analyses.
Differentially Expressed Genes in hESC Lines Compared with Nonlineage-Differentiated Cells
We assumed that genes implicated in the maintenance of the pluripotent state of hESCs could be significantly upregulated in all hESC lines compared with nonlineage-differentiated cells, whereas genes significantly upregulated in nonlineage-differentiated cells or genes downregulated in hESCs could be involved in early differentiation. According to the Affymetrix algorithm, among 8,464 nonredundant transcripts expressed in all seven hESC lines, 1,527 were differentially expressed (change call p < .05) and 219 (Supplementary Table S4, available at http://stemcells.btk.fi) were greater than twofold differentially expressed in all lines compared with nonlineage-differentiated cells (Fig. 1).
Among the 1,527 differentially expressed genes, 932 were upregulated and 595 were downregulated in all seven hESC lines compared with nonlineage-differentiated cells. Forty-five percent of these differentially expressed genes had no yet-known biological function, which further supports the notion that several unknown factors are responsible for the hESC pluripotent stage. Among the known genes upregulated in all hESC lines, there were mainly genes involved in cell communication, regulation of transcription, development, cell proliferation, and cell cycle (Fig. 5A). The list of downregulated genes included mainly genes involved in cell growth and/or maintenance, development, and regulation of transcription (Fig. 5B).
Among the 219 genes that were more than twofold differentially expressed in all hESCs, 183 were upregulated and 36 were downregulated in all seven hESC lines compared with nonlineage-differentiated cells. Among the group of upregulated genes, 37 were novel (Table 3). Two of these ESTs (Hs.197683 and Hs.67624) were also identified as hESC-enriched genes by Miura et al. As expected, the known genes included many established ESC markers such as NANOG, TDGF1, Oct-4, FOXD3, CD24, DNMT3B, and TERF1. This group of genes also included nuclear autoantigenic sperm protein (NASP) and cytochrome c (CYCS). NASP, previously known as a testis- and sperm-specific cell cycle–regulated histone H1-binding protein, is known to be expressed in mouse two-cell embryos, but its function in the early embryo is unclear. The specific expression pattern of NASP is suggestive of an important function also in hESCs. Li et al. reported that embryonic cell lines established from early CYCS-null mouse embryos had increased sensitivity to cell death signals triggered by tumor necrosis factor. High expression in all seven hESC lines suggests that CYCS may have an important function also in hESC apoptosis-signaling pathways.
|ProbeSetID||Unigene ID||Gene title||Chromosomal location|
|210463_x_at||Hs.411456||Hypothetical protein FLJ20244||Chr:19p13.13|
|218564_at||Hs.77510||Hypothetical protein FLJ10520||Chr:16q22.3|
|219121_s_at||Hs.24743||Hypothetical protein FLJ20171||Chr:8q22.1|
|222728_s_at||Hs.301732||Hypothetical protein MGC5306||Chr:11q14.3|
|223178_s_at||Hs.348720||Hypothetical protein HT023||Chr:6q22.2|
|223258_s_at||Hs.79828||Hypothetical protein FLJ20333||Chr:14q12|
|223560_s_at||Hs.433466||Hypothetical protein PRO1853||Chr:2p22.2|
|224048_at||Hs.154848||Hypothetical protein DKFZp434D0127||Chr:12q23.1|
|225846_at||Hs.24743||Hypothetical protein FLJ20171||Chr:8q22.1|
|226663_at||Hs.183475||Homo sapiens clone 25061 mRNA sequence||—|
|226750_at||Hs.151973||Hypothetical protein FLJ10378||Chr:4q28.1|
|226926_at||Hs.32343||Hypothetical gene ZD52F10||Chr:19q13.11|
|227133_at||Hs.24968||Hypothetical protein BC016683||Chr:Xq22.1|
|227152_at||Hs.323822||Hypothetical protein FLJ20696||—|
|228956_at||Hs.274293||H. sapiens mRNA; cDNA DKFZp761G1111||—|
|229033_s_at||Hs.267263||Hypothetical protein FLJ22283||Chr:19p13.3|
|229518_at||Hs.59771||Hypothetical protein MGC16491||Chr:1p35.3|
|232180_at||Hs.133065||Human clone CE29 7.2 (CAC)n/(GTG)n repeat-containing||—|
|232489_at||Hs.40337||Hypothetical protein FLJ10287||Chr:1pter-q31.3|
|232774_x_at||Hs.127988||Hypothetical protein LOC284307||Chr:19q13.43|
|232985_s_at||Hs.9536||Hypothetical protein FLJ10713||Chr:3q13.13|
|235103_at||Hs.63368||H. sapiens mRNA; cDNA DKFZp686H1529||—|
|235955_at||Hs.124740||Hypothetical protein FLJ30532||Chr:5q13.1|
As expected, the genes that were downregulated or not expressed in hESCs but were expressed in differentiated cells contained many markers for differentiation, such as CRABP1, HBB, ACTA2, EDG3, and EBF, further supporting previous findings from other hESC lines. Interestingly, we found p57 (CDKN1C) to be upregulated over twofold in nonlineage-differentiated cells compared with all hESC lines. p57 is known as a negative regulator of the cell cycle [27, 28]. During mouse retinal development, p57 regulates cell-cycle exit coincident with induction of differentiation. Low expression of p57 could thus be one of the mechanisms ensuring cell-cycle progression in hESCs. We also noticed that retinoic acid–binding protein (CRABP1) was upregulated in nonlineage-differentiated cells compared with all hESC lines. CRABP1 is assumed to play an important role in retinoic acid–mediated differentiation and proliferation processes, and it is possible that it has a similar role in hESC differentiation.
To confirm the microarray results, the expression of SLC16A1 (upregulated in each cell line 2.5- to 3-fold) and CDKN1C (p57) (downregulated in each cell line 2.5- to 3-fold) was studied using real-time RT-PCR. These results showed that both of these genes were expressed in all seven hESC lines and the changes in expression levels between single cell lines were less than twofold (Fig. 4).
We also studied the chromosomal distribution of the 183 genes upregulated in all seven hESC lines to determine whether particular chromosomes have an increased representation of these genes. Our results show that the distribution of the hESC-enriched genes with known chromosome location (131 genes) is relatively even, in concordance with results published by Brandenberger et al. The highest proportion of hESC-enriched genes in our data is located in chromosomes 14 and 19. Among the 36 downregulated genes (33 with known chromosome location) in all hESC lines, the highest number of genes were located in chromosomes 8, 10, and 13 (Fig. 3). Recently, Draper et al. reported amplification of chromosome 17q and 12p in karyotypic changes of hESCs, suggesting that genes located in these areas may be important for the regulation of self-renewal. Also, Miura et al. reported some bias to chromosomes 12 and 17 in their gene expression data. Our results do not reveal over-representation of genes located in chromosome 12 or 17 (Fig. 3). Among the 8,464 genes expressed in all seven hESC lines, 328 were located in the long arm of chromosome 17 and 102 in the short arm of chromosome 12, including many hypothetical proteins and ESTs, possibly important for hESC self-renewal.
Comparison with Previously Published hESC Microarray Data
We performed a systematic comparison of our data with the available microarray data from different hESC lines [10–13]. Overall, we found a 30%–93% overlap of genes expressed in hESC lines between our study and data by others. The overlap of genes expressed in all seven hESC lines was highest (93%) with the expression data of three hESC lines by Abeyta et al. and lowest (30%) with the analysis of Sperger et al., who compared hESC-enriched genes with somatic cells. Approximately 200 common genes were differentially expressed in hESCs compared with differentiated cells both in our study and in the study of Sato et al. (Table 4). Seventy-five percent (147) of these genes were also expressed in the three hESC lines reported by Abeyta et al. However, when the 92 hESC-enriched genes reported by Bhattacharya et al. are added to the comparison, only 11 common genes are left. The concordance of data between experiments is thus highly variable. The limited overlap of gene expression results from comparison studies may be partly explained by the variety of culture conditions, microarray platforms, and control cells used in comparison, but true genetic variation of hESC lines is also apparent. Without doubt, gene expression profiling offers an important tool for revealing the molecular basis of pluripotency, but the limited overlap between studies emphasizes the importance of methodological standardization and cross-validation of results between different hESC lines.
|Genes shared with data of Sato et al.||TRA1, HMGB1, HSPA9B, SCD, NOL5A, CKB, FKBP4, PPT1, AARS, PAICS, EIF1A, SSB, OAZ2, PHGDH, SOX4, TALDO1, DKC1, CTSC, SNRPN, PODXL, FXR1, GJA1, VDP, MYO10, SIAT1, ALDH3A2, ADSL, SLC16A1, MTHFD1, UNG, KEAP1, CRMP1, SNRPD1, CPSF5, CKMT1, HPRT1, RARG-1, MAP7, NDUFV2, IMPA2, TRIM14, APRT, JMJ, TGIF, CBF2, CCNA2, TERF1, COX11, KIAA0020, CGI-48, KOC1, SLC31A1, SYT1, TPST2, ZNF195, ASK, PIM2, PMAIP1, FGF2, GAP43, AND-1, KIAA0237, NUDT1, BMPR1A, GLDC, GMDS, GPC4, PRIM1, ADD2, TMSNB, CRABP1, DBT, CHEK1, MRE11A, MJD, MPP6, UGP2, DIAPH2, ZFD25, POLE2, GTF2E1, SNRPA1, KIAA0186, GRB14, TDGF1, LECT1, OGT, RBPMS, POU5F1, GCDH, LRP8, RRAS2, ATP6V1C2, ADPRT, TKT, ATIC, CLU, HSPA4, THY1, DD5, C1QBP, ILF3, SPS, DNAJB6, PRKCBP1, SART3, KIF1B, PPM1B, PWP2H, C20orf104, PPAT, RRS1, HRASLS3, PNMA2, ABCB7, PAI-RBP1, M96, FRAT, AASS, DLAT, CDK7, FGFR1, EIF4B, NOLC1, KPNB3, DKFZP564M182, MDN1, PCCB, M9, PAPOLA, CBS, KIAA0406, FLJ10719, PFAS, PRMT3, IPW, PDCD2, SFRS7, KIAA0471, MBD2, TRN-SR, MCM5, NS, FLJ12666, FLJ20758, MRPL16, RPC5, MRPS34, NIF3L1, NUP54, SAV1, MRPL48, PHC1, FLJ10036, Jade-1, MRS2L, FLJ10652, PUS1, VAV3, FLJ10781, C20orf6, SIRT1, FLJ23468, FLJ20485, MGC5528, FLJ21901, C21orf45, DXS9879E, OSBPL10, FLJ20171, RDGBB, FLJ22555, BRIX, HSP70-4, ZNF463, FLJ12439, FLJ20105, FLJ10713, LIN-28, DDX25, FLJ10884, SZF1, ANKT, FLJ22408, HELLS, EPB41L4B, FLJ12581, SBBI26, FLJ23392, C14orf115, FLJ10462, TEAD4, E2IG2, DNMT3B, DKFZP434E2135, DC50, GTF2H2, MGC2477, ASC, FLJ10439, NEFL, NLGN, SNX5 + 2 ESTs|
|Genes shared with data of Sato et al. and Abeyta et al.||TRA1, SCD, NOL5A, FKBP4, PPT1, OAZ2, TALDO1, DKC1, CTSC, PODXL, GJA1, MYO10, ADSL, SLC16A1, UNG, CRMP1, SNRPD1, RARG-1, APRT, JMJ, TGIF, CBF2, TERF1, COX11, CGI-48, ZNF195, ASK, GAP43, KIAA0237, BMPR1A, GLDC, GPC4, PRIM1, TMSNB, CRABP1, CHEK1, UGP2, DIAPH2, GTF2E1, SNRPA1, KIAA0186, GRB14, TDGF1, LECT1, ATP6V1C2, ADPRT, TKT, CLU, HSPA4, THY1, DD5, C1QBP, SPS, KIF1B, PPM1B, RBPMS, HRASLS3, PNMA2, ABCB7, M96, FRAT2, POU5F1, PAI-RBP1, FGFR1, RRAS2, PCCB, M9, CBS, KIAA0406, FLJ10719, PFAS, PDCD2, MBD2, MCM5, HMGB1, RPC5, NUP54, SAV1, FLJ10652, FLJ10781, SIRT1, OSBPL10, RDGBB, FLJ22555, BRIX, HSP70-4, FLJ10884, SZF1, SBBI26, E2IG2, DKFZP434E2135, MGC2477, FLJ10439, PAICS, PHGDH, ALDH3A2, MTHFD1, CPSF5, CKMT1, NDUFV2, TPST2, POLE2, GCDH, C20orf104, RRS1, EIF4B, NOLC1, KPNB3, DKFZP564M182, HSPA9B, NS, MRPL16, MRPS34, MRPL48, Jade-1, MRS2L, VAV3, FLJ20485, C21orf45, FLJ10713, LIN-28, DDX25, EPB41L4B, DNMT3B, DC50, GTF2H2, CKB, AARS, EIF1A, SOX4, SNRPN, VDP, HPRT1, IMPA2, KIAA0020, NUDT1, ATIC, DNAJB6, CDK7, IPW, FLJ20758, NIF3L1, FLJ23468, DXS9879E, TEAD4 + 2 ESTs|
|Genes shared with data of Sato et al., Abeyta et al., and Bhattacharya et al.||PODXL, GJA1, MTHFD1, CRABP1, TDGF1, POU5F1, HSPA4, SPS, NS, BRIX, LIN-28|
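The cross-study comparison reduces to set operations on gene lists. The sketch below is illustrative only: the function names are hypothetical, the short gene lists are toy subsets built from symbols in Table 4, and the choice of denominator (the size of the first list) is an assumption, since the text does not state how the overlap percentages were computed.

```python
def overlap_percent(reference, other):
    """Percentage of genes in `reference` that also occur in `other`."""
    if not reference:
        raise ValueError("reference gene set is empty")
    return 100.0 * len(reference & other) / len(reference)

def shared_across(*gene_sets):
    """Genes present in every one of the supplied sets."""
    return set.intersection(*gene_sets)

# Toy gene lists for illustration; not the full lists of any study.
study_a = {"POU5F1", "TDGF1", "GJA1", "PODXL"}
study_b = {"POU5F1", "TDGF1", "SOX2"}
study_c = {"POU5F1", "LIN-28"}

pct = overlap_percent(study_a, study_b)
core = shared_across(study_a, study_b, study_c)
```

Each additional study can only shrink the intersection, which is the pattern seen when the comparison goes from about 200 shared genes to 147 and then to 11.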
The Expression of Genes Related to Cell Signaling
Leukemia inhibitor factor (LIF) is required to maintain pluripotency of mouse ESCs . However, hESCs seem to lack this response to LIF [1, 2]. Many of the components of LIF signaling, such as STAT3 and LIR receptors LIFR and gp130, were expressed in all of our hESC lines, although opposite results have been reported on other lines [24, 33]. In our study, the expression of LIFR and gp130 was lower (average, FC-1.6 and -3.2, respectively) in all hESC lines compared with nonlineage-differentiated cells, suggesting that the expression of LIF receptors is upregulated during differentiation. The expression of LIFR and gp130 in all hESC lines was confirmed by real-time RT-PCR (Fig. 4). Instead of LIF, the serum-free medium requires supplementation with bFGF to prevent differentiation of hESCs . Among the FGF signaling-related genes, all FGF receptors (FGFR1–4) were expressed, and FGF2 as well as FGFR1 was expressed at a higher level in all seven lines compared with the differentiated cells.
Wnt signaling has also been implicated in the self-renewal of hESCs. More than 30 genes related to Wnt signaling were expressed in all seven of our hESC lines. These included WNT5A and WNT6, which have previously been found only in differentiated cells and not in undifferentiated hESCs. We confirmed our results by studying the expression of WNT5A by real-time RT-PCR, which showed WNT5A to be expressed in all seven hESC lines, with the highest level in the FES21 line (Fig. 4). It has been suggested that activation of the canonical Wnt pathway by inactivation of GSK3 is sufficient to maintain self-renewal of hESCs. In accordance with this, we found that GSK3 was upregulated (≥ 1.5-fold) in nonlineage-differentiated cells compared with all seven undifferentiated hESC lines.
Transforming growth factor (TGF)-β signaling pathways are likely to be critical for the maintenance of undifferentiated hESCs [12, 14]. According to our results, more than 20 TGF-β signaling–related genes were expressed in all seven hESC lines. The expression of SMAD1, SMAD2, SMAD4, SMAD5, SMAD7, DRAP1, LEFTB (antagonist for Nodal signaling), ACVR1B, ACVR2B, and NODAL was detected in all hESC lines, and the expression of FOXH1 and TDGF1 (regulators of Nodal signaling) was increased in all hESC lines compared with nonlineage-differentiated cells. Because SMAD2 and SMAD4 were previously reported to be expressed only in differentiated cells and not in hESCs, we confirmed their expression by real-time RT-PCR. The results showed that both of these genes were expressed in all hESC lines, although there were quantitative differences between the lines (Fig. 4). In summary, our results show that hESCs express both agonists and antagonists of the Nodal pathway, supporting the idea that Nodal signaling pathways are tightly controlled to allow the growth of hESCs in the undifferentiated state.
Influence of Genetic Background on Gene Expression Profiles of hESCs
A high degree of correlation in gene expression of all seven hESC lines was confirmed by hierarchical clustering of the 1,527 genes differentially expressed in all hESCs compared with nonlineage-differentiated cells (data not shown). Interestingly, this analysis showed that the four Finnish FES cell lines (FES21, FES22, FES29, and FES30) clustered more closely together than the three Swedish HS cell lines (HS181, HS235, and HS237). A sex effect is unlikely because only 12 of the 9,229 genes expressed in all XY karyotype lines have a known location on chromosome Y, suggesting that genes located on the Y chromosome do not have a major influence on this analysis. We also used hierarchical clustering to analyze whether the hESC lines originally derived on mouse feeder cells (FES21 and FES22) clustered more closely together than the other lines, but the results (Fig. 6 and data not shown) clearly demonstrated that this was not the case.
It has been shown that the genetic background of unrelated individuals causes variance in tissue gene expression levels. It is possible that the FES lines are genetically more closely related to each other because of the relative genetic homogeneity of the Finnish population. To investigate this further, we used Student's t-test to determine whether the 8,464 nonredundant transcripts expressed in all hESC lines were expressed at different levels between HS and FES lines. Indeed, we found a significantly (p < .05) different overall expression level between the groups. This result suggests that although a high degree of overall correlation was observed in the qualitative gene expression analysis between hESC lines and nonlineage-differentiated cells, there are clear quantitative differences in expression levels when two genetically independent groups of hESC lines are compared.
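As an illustration of the group comparison described above, the following minimal sketch computes a pooled-variance Student's t statistic for one transcript measured in the three HS lines and the four FES lines. The signal values are hypothetical, and the text does not specify whether the test was applied per transcript or to overall expression levels, so this shows the statistic only, not the authors' exact procedure.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test.

    Uses the pooled (equal-variance) estimate; each group needs at
    least two values so that a sample variance can be computed.
    """
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / dof
    std_err = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err, dof

# Hypothetical log2 signals for a single transcript.
hs_values = [8.0, 9.0, 10.0]        # HS181, HS235, HS237
fes_values = [5.0, 6.0, 7.0, 6.0]   # FES21, FES22, FES29, FES30
t_stat, dof = pooled_t_statistic(hs_values, fes_values)
```

Converting the statistic to a p value requires the t distribution with `dof` degrees of freedom, which the standard library does not provide; a statistics package or a printed t table would be needed for that step.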
Unique Expression Signatures of Single hESC Lines
We further investigated differences in gene expression patterns among individual hESC lines. Among the genes represented on the microarrays used, the number of genes expressed by the line HS181 was the highest and that of HS237 was the lowest (Fig. 6A). The gene expression profiles of the individual cell lines were compared using Venn diagrams (Fig. 6B). These illustrations demonstrate that the lines HS181 and FES21 express more unique genes than the other lines. A list of the 10 most highly expressed genes in each line that were not expressed in the other lines is presented in Table 5. Interestingly, none of the major histocompatibility complex (MHC) class I or II genes was found among these, indicating that hESC lines cannot be distinguished based on MHC gene expression. Next, we compared the gene expression levels of the individual lines against the nonlineage-differentiated cells. The results show that HS235 cells have a higher number of greater than twofold upregulated and downregulated genes than the other lines (Fig. 6C).
|hESC line||Unigene ID||Gene title|
|HS181||Hs.83623||MRNA; cDNA DKFZp686K10163 (from clone DKFZp686K10163)|
| ||Hs.195161||Nuclear receptor subfamily 6, group A, member 1|
| ||Hs.165263||Polyhomeotic-like 2 (Drosophila)|
| ||Hs.288742||Hypothetical protein FLJ32001|
| ||Hs.230501||Carboxypeptidase X (M14 family)|
| ||Hs.7967||G protein–coupled receptor 153|
| ||Hs.20930||Poly(rC) binding protein 4|
| ||Hs.81994||Glycophorin C (Gerbich blood group)|
| ||Hs.283132||Hypothetical protein LOC221143|
|HS235||Hs.386685||Chromosome 21 open reading frame 105|
| ||Hs.68879||Bone morphogenetic protein 4|
| ||Hs.75678||FBJ murine osteosarcoma viral oncogene homologue B|
| ||Hs.16611||Tumor protein D52-like 1|
| ||Hs.432360||Sodium channel modifier 1|
| ||Hs.343586||Zinc finger protein 36, C3H type, homologue (mouse)|
| ||Hs.94672||Biogenesis of lysosome-related organelles complex-1, subunit 1|
| ||Hs.285313||Core promoter element binding protein|
|HS237||Hs.445818||Transcribed sequence with moderate similarity to protein sp: P39194 (Homo sapiens)|
| ||Hs.94392||Zinc finger protein 580|
| ||Hs.107911||ATP-binding cassette, subfamily B (MDR/TAP), member 6|
| ||Hs.520970||Transcribed sequence with weak similarity to protein ref: NP_071431.1 (H. sapiens)|
| ||Hs.166244||Hypothetical protein MGC2494|
| ||Hs.128738||Absent in melanoma 1-like|
| ||Hs.408730||Contactin-associated protein 1|
| ||Hs.507978||Hypothetical protein LOC375035|
| ||Hs.278441||Protein phosphatase 1F (PP2C domain containing)|
| ||Hs.233955||Ras-interacting protein 1|
|FES21||Hs.439141||Gamma-aminobutyric acid (GABA) A receptor, pi|
| ||Hs.36563||Immune costimulatory protein B7-H4|
| ||Hs.75431||Fibrinogen, gamma polypeptide|
| ||Hs.512708||Transglutaminase 2 (C polypeptide, protein-glutamine-gamma-glutamyltransferase)|
| ||Hs.902||Neurofibromin 2 (bilateral acoustic neuroma)|
| ||Hs.432613||Leishmanolysin-like (metallopeptidase M8 family)|
| ||Hs.91546||Cytochrome P450, family 26, subfamily B, polypeptide 1|
| ||Hs.169274||Interferon-induced protein with tetratricopeptide repeats 2|
| ||Hs.352240||Hypothetical protein MGC15523|
| ||Hs.250894||T-cell leukemia translocation-altered gene|
| ||Hs.153610||Regulating synaptic membrane exocytosis 2|
| ||Hs.325309||Chromosome 9 open reading frame 54|
| ||Hs.201372||Transcribed sequence with moderate similarity to protein sp: P39188 (H. sapiens)|
| ||Hs.233634||Chromosome 20 open reading frame 39|
| ||Hs.79769||Protocadherin 1 (cadherin-like 1)|
| ||Hs.21639||Aortic preferentially expressed protein 1|
|FES29||Hs.75139||ADP-ribosylation factor interacting protein 2 (arfaptin 2)|
| ||Hs.458657||Neurofilament 3 (150-kDa medium)|
| ||Hs.408702||Hypothetical protein FLJ13154|
| ||Hs.6111||Aryl-hydrocarbon receptor nuclear translocator 2|
| ||Hs.373498||Solute carrier family 22 (organic cation transporter), member 17|
| ||Hs.444172||Tumor necrosis factor receptor–associated factor 6|
| ||Hs.65734||Aryl hydrocarbon receptor nuclear translocator-like|
| ||Hs.187802||Transcribed sequence with weak similarity to protein ref: NP_060312.1 (H. sapiens)|
|FES30||Hs.150540||Hypothetical protein BC002942|
| ||Hs.23019||Zinc finger protein 16 (KOX 9)|
| ||Hs.105606||Hypothetical protein FLJ20512|
| ||Hs.14070||Hypothetical protein FLJ14166|
| ||Hs.155048||Lutheran blood group (Auberger b antigen included)|
| ||Hs.348012||LYST-interacting protein LIP8|
| ||Hs.318529||Hypothetical protein FLJ37478|
| ||Hs.442530||Thromboxane A2 receptor|
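The line-specific gene lists in the table above correspond to a set comparison of the kind a Venn diagram summarizes. The sketch below shows one way to derive such lists, assuming each line's detected genes are available as a set; the function name and the gene identifiers are hypothetical placeholders, not data from this study.

```python
def unique_per_line(expressed):
    """Map each cell line to the genes detected in that line only."""
    unique = {}
    for line, genes in expressed.items():
        # Union of the genes detected in every other line.
        others = set().union(
            *(g for other, g in expressed.items() if other != line)
        )
        unique[line] = genes - others
    return unique

# Hypothetical detection calls for three lines.
expressed = {
    "HS181": {"geneA", "geneB", "geneC"},
    "HS235": {"geneA", "geneD"},
    "FES21": {"geneA", "geneB", "geneE"},
}
unique = unique_per_line(expressed)
# Genes detected in every line (the central region of a Venn diagram).
shared_by_all = set.intersection(*expressed.values())
```

Ranking each line's unique genes by signal intensity and keeping the top entries would then give a per-line list comparable in form to the table.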
Finally, we made comparative analyses of gene expression levels between individual cell lines derived in the same laboratory. Among the genes expressed in all four FES lines, FES21 expressed 440, 396, and 470 genes with a greater than twofold difference compared with the FES22, FES29, and FES30 cell lines, respectively. Some of these genes showed higher than sixfold changes in expression levels, indicating substantial variation between the analyzed cell lines. For example, interleukin 8 and CXCL1, both members of the CXC chemokine family, were clearly upregulated in the FES21 cell line compared with the other FES lines. Similar analyses showed that among the genes expressed in all three HS lines, the HS235 line expressed 647 and 469 genes with a greater than twofold difference compared with the HS181 and HS237 lines, respectively. The list of greater than twofold upregulated genes in the HS235 line contains genes involved in cell differentiation, such as EGR3 and EGR4, which could be associated with the fact that this cell line has a tendency to differentiate more easily than the HS181 and HS237 lines. These results clearly show that in addition to qualitative differences in genes expressed among the seven hESC lines (Figs. 6A, 6B), there is significant quantitative variability in the gene expression levels.
In summary, a high correlation between the gene expression profiles of seven hESC lines was found, although the expression level of genes varied between lines. The systematic differences between HS and FES lines might be due to local methodological differences, but the fact that all hESC lines were cultured in similar conditions with the same type of feeder cells decreases the likelihood of this possibility. The individual differences between hESC lines may reflect preferential spontaneous early differentiation, because hESC colonies usually contain some differentiating cells. On the other hand, these genes can also be regarded as the fingerprint of a certain cell line. It is obvious that these genes do not represent important regulators of pluripotency because all of our lines are capable of differentiation into multiple cell lineages. Although data on genes expressed by hESCs have accumulated, it is clear that not all important genes involved in hESC characteristics have been identified and that unknown pathways regulating hESC pluripotency are likely to exist. The database of functionally poorly characterized hESC genes identified in this study provides an important resource for future studies in this field.
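The pairwise counts reported above can be illustrated with a small sketch that tallies genes differing more than twofold between two lines, restricted to genes measured in both. It assumes positive, linear-scale signal values; the function name, gene identifiers, and signals are hypothetical.

```python
def count_twofold(signal_a, signal_b, threshold=2.0):
    """Count genes more than `threshold`-fold higher or lower in A than in B.

    Only genes present in both dictionaries are compared, and all
    signals are assumed to be positive (no division by zero).
    """
    higher = lower = 0
    for gene in signal_a.keys() & signal_b.keys():
        ratio = signal_a[gene] / signal_b[gene]
        if ratio > threshold:
            higher += 1
        elif ratio < 1 / threshold:
            lower += 1
    return higher, lower

# Hypothetical signals for two lines; g4 and g5 are not shared and are ignored.
line_a = {"g1": 400.0, "g2": 100.0, "g3": 50.0, "g4": 90.0}
line_b = {"g1": 100.0, "g2": 110.0, "g3": 200.0, "g5": 10.0}
higher, lower = count_twofold(line_a, line_b)
```

Here g1 is fourfold higher and g3 is fourfold lower in the first line, while g2 falls within the twofold band, so the sum `higher + lower` is the kind of total quoted per pairwise comparison.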
We thank Miina Miller for technical advice on Affymetrix technology, Tuomas Nikula for advice and expertise on Kensington software, and the Finnish DNA Microarray Center at Turku Centre for Biotechnology for providing facilities for the microarray analysis. We thank the personnel of the IVF Unit of Karolinska University Hospital, Huddinge, for their support in stem cell research. These studies were supported by grants from the Academy of Finland, Finnish Cultural Foundation, NorFA, Swedish Research Council, Sigrid Jusélius Foundation, and the Juvenile Diabetes Research Foundation International (JDRFI).
The authors indicate no potential conflicts of interest.