Discovery of Consensus Gene Signature and Intermodular Connectivity Defining Self-Renewal of Human Embryonic Stem Cells


  • Jeffrey J. Kim,

    1. Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, Los Angeles, California, USA
  • Omar Khalid,

    1. Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, Los Angeles, California, USA
  • AmirHosien Namazi,

    1. Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, Los Angeles, California, USA
  • Thanh G. Tu,

    1. Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, Los Angeles, California, USA
  • Omid Elie,

    1. Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, Los Angeles, California, USA
  • Connie Lee,

    1. Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, Los Angeles, California, USA
  • Yong Kim

    Corresponding author
    1. Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, Los Angeles, California, USA
    2. UCLA's Jonsson Comprehensive Cancer Center, Los Angeles, California, USA
    • Correspondence: Yong Kim, Ph.D., Laboratory of Stem Cell and Cancer Epigenetic Research and Dental Research Institute, UCLA, CHS 73-041, Los Angeles, California 90095, USA. Email:



Molecular markers defining self-renewing pluripotent embryonic stem cells (ESCs) have been identified by relative comparisons between undifferentiated and differentiated cells. Most of these analyses have been performed under a single differentiation condition, which may produce molecular changes that differ substantially from those under other conditions. Therefore, it is currently unclear whether there are true consensus markers defining undifferentiated human ESCs (hESCs). To identify a set of key genes consistently altered during hESC differentiation regardless of differentiation conditions, we performed microarray analysis on undifferentiated hESCs (H1 and H9) and differentiated embryoid bodies (EBs) and validated our results using publicly available expression array datasets. We constructed consensus modules by weighted gene coexpression network analysis and discovered novel markers that are consistently present in undifferentiated hESCs under various differentiation conditions. We validated the top markers (downregulated: LCK, KLKB1, and SLC7A3; upregulated: RHOJ, ZEB2, and ADAM12) upon differentiation. Functional validation of LCK in hESC self-renewal, using an LCK inhibitor or gene silencing with siLCK, resulted in a loss of undifferentiated characteristics (morphological change, reduced alkaline phosphatase activity, and reduced pluripotency gene expression), demonstrating a potential functional role of LCK in self-renewal of hESCs. We mapped the hESC markers onto interactive networks in the genome, identifying possible interacting partners and showing how the new markers relate to each other. Furthermore, comparison of these datasets with available datasets from induced pluripotent stem cells (iPSCs) revealed that the levels of these newly identified markers correlated with the establishment of iPSCs, which may imply a potential role for these markers in the acquisition of cellular potency. Stem Cells 2014;32:1468–1479


Embryonic stem cells (ESCs) possess unlimited self-renewal capacity and the pluripotent capability to differentiate into any cell type in the human body. ESCs hold great promise for regenerative therapies for a wide range of human diseases [1-3]. However, one of the major obstacles in stem cell-based therapies is inducing stem cells to adopt a specific fate or to differentiate into a particular cell type [4].

Currently, ESCs cultured in defined medium are not a homogeneous population of self-renewing stem cells [5-8]; rather, they are heterogeneous, with gradients in their differentiation status. Moreover, researchers and clinicians are concerned that some transplanted stem cells may remain in their pluripotent state and could potentially form teratomas in patients. Thus, for ESCs to be useful in a clinical setting, it is essential to identify the signals that guide self-renewal and differentiation and to understand how they regulate these processes.

In 2006, Shinya Yamanaka made an important advance in stem cell research by identifying four key stemness factors, Oct-3/4, Sox2, c-Myc, and Klf4, and generating induced pluripotent stem cells (iPSCs) by transducing mouse fibroblasts with them [9]. Patient-specific iPSCs have great therapeutic potential without the ethical controversy surrounding ESCs. Over the years, extensive efforts have refined iPSC reprogramming, benefiting largely from advances in our knowledge of the molecular mechanisms underlying the reprogramming process. However, we still lack detailed knowledge of this process, and many reports have demonstrated potential discrepancies between ESCs and iPSCs at the molecular level [10-12].

Certain key stemness genes are known, yet we still do not know all the genes and pathways involved. Several groups have examined stem cell differentiation with whole-genome microarray analysis [13-18]. We examined these datasets to assess whether certain genes are consistently altered upon human ESC (hESC) differentiation. For this, we used weighted gene coexpression network analysis (WGCNA), an R package that groups genes with similar expression patterns into clusters, known as modules, and can identify modules that behave the same way across different datasets [19].

Here, we present a list of conserved genes, derived from 19 different microarray datasets [13-18], that may play a crucial role in defining the self-renewing identity of hESCs. The top candidates were functionally annotated by DAVID (Database for Annotation, Visualization and Integrated Discovery) analysis [20] and validated by qRT-PCR. These genes can help researchers assess how similar their systems are to hESCs. For example, we constructed a consensus list from five iPSC microarray datasets and compared those genes to the conserved hESC gene list. We found that 46% of genes were shared between the conserved hESC and iPSC lists, including the genes we filtered and designated as the “top” genes. Other groups can similarly use WGCNA to derive signatures for their own systems and compare them with the gene lists presented here to estimate the level of stemness in their system.

Materials and Methods

Cell Culture

Early-passage hESC lines H1, H9, and UCLA6 (typically passages 34–36) were obtained from the UCLA Stem Cell Core Facility and maintained on mouse embryonic fibroblast feeders in hESC growth medium (hESGM: Dulbecco's modified Eagle's medium [DMEM]/F-12 supplemented with 20% knockout serum replacer, 1% nonessential amino acids, 1 mM l-glutamine, 0.1 mM β-mercaptoethanol, and 4 ng/ml basic fibroblast growth factor [bFGF]). Cells were transferred to Matrigel-coated dishes (BD Biosciences, San Jose, CA) and grown in mTeSR1 (Stem Cell Technologies, Vancouver, Canada) under standard conditions [21-23]. For embryoid body (EB) formation, cells were dissociated with Accutase (Stem Cell Technologies, Vancouver, Canada) and plated on ultra-low attachment plates (Corning Life Science, Tewksbury, MA) in hESGM (without bFGF) with 10 µM Y-27632 (Chemdea, Ridgewood, NJ) for the designated period. For adherent differentiation, cells were treated with human differentiation medium (DMEM/F-12 supplemented with 20% fetal bovine serum [FBS], 1% glutamate, 1% pyruvate, and 1% nonessential amino acids) for 24 hours.

Immunofluorescence Analysis

Cells were fixed in 100% methanol for 15 minutes at room temperature. For staining, samples were permeabilized for 15 minutes in freshly prepared immunofluorescence (IF) staining solution [phosphate-buffered saline (PBS) containing 0.02% saponin, 1% bovine serum albumin, 0.05% sodium azide, and 0.2% Triton X-100]. Samples were incubated in a 37°C water bath for 1 hour with 1 µg/ml primary antibody diluted in IF solution. Samples were then transferred to a 1:500 dilution of goat anti-mouse IgG rhodamine (Thermo Scientific, Waltham, MA) in IF solution and incubated in a 37°C water bath for 1 hour. Processed samples were mounted on glass slides with DAPI-containing mounting medium (Vectashield, Burlingame, CA) and visualized with an inverted microscope (Olympus IX81 and CellSens Dimension software, Center Valley, PA).

Silencing of LCK in hESCs

Exponentially growing hESCs were collected using Accutase (Stem Cell Technologies, Vancouver, Canada) and seeded on either coverslips or six-well plates coated with Matrigel (BD Biosciences, San Jose, CA). Cells were cultured in mTeSR1 medium (Stem Cell Technologies, Vancouver, Canada) supplemented with 10 µM Y-27632 ROCK inhibitor (Chemdea, Ridgewood, NJ) for 24 hours. On the day of transfection, cells were fed with fresh medium without ROCK inhibitor and transfected with 25 nM ON-TARGETplus human LCK siRNA SMARTpool or nontargeting scrambled control siRNA (Thermo Fisher Scientific, Pittsburgh, PA) using DharmaFECT1 (Thermo Fisher Scientific, Pittsburgh, PA). Forty-eight hours after transfection, cells were collected for RNA isolation or fixed with 4% paraformaldehyde/PBS for IF analysis.

Bioinformatics Analysis

WGCNA was performed following the tutorial by Langfelder and Horvath [19, 24]. CEL files (raw data from the UCLA microarray core) were normalized to Affymetrix internal controls by generating MAS5 files with publicly available R code from MIT. We preprocessed the MAS5 files to check for excessive missing values and to identify outlier samples according to the WGCNA manual [19, 24]. All data files have been deposited in the Gene Expression Omnibus (GEO) repository (GSE54186). Unsigned coexpression networks were built using the WGCNA package in R. Genes with similar expression patterns were clustered into modules, each labeled with a color, and the modules were related to sample traits (undifferentiated or differentiated). In heatmaps, red represents genes upregulated within a dataset and green represents genes downregulated within that dataset. The top 1,000 connections within a gene network were determined by WGCNA. For the multiple-array consensus analysis, we first performed WGCNA on each individual dataset, as suggested in the Langfelder and Horvath tutorial. Using the “1 step function for network construction and detection of consensus modules,” we used the default WGCNA soft-thresholding power β, to which the coexpression similarity in each dataset was raised to calculate that dataset's adjacency. Raising similarities to the power β brings each dataset's network to an approximately scale-free topology, which allowed us to compare datasets while compensating for scale differences between them (consensus normalization) [19, 24].
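The adjacency and consensus steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the WGCNA implementation: it assumes an unsigned network with adjacency a_ij = |cor(x_i, x_j)|^β, the standard unsigned topological overlap matrix (TOM), and a consensus TOM taken as the element-wise minimum across datasets (our understanding of WGCNA's default consensus). β = 6 is an assumed illustrative value, WGCNA's between-dataset TOM scaling is omitted, and the two toy datasets are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def unsigned_adjacency(expr, beta=6):
    """Soft-thresholded unsigned adjacency: a_ij = |cor(x_i, x_j)| ** beta.

    expr holds one row per gene and one column per sample.
    The diagonal stays 0 so connectivity sums exclude self-links.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            a = abs(pearson(expr[i], expr[j])) ** beta
            adj[i][j] = adj[j][i] = a
    return adj


def topological_overlap(adj):
    """Unsigned TOM: (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), diagonal = 1."""
    g = len(adj)
    k = [sum(row) for row in adj]  # whole-network connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(g):
            if i != j:
                shared = sum(adj[i][u] * adj[u][j] for u in range(g))
                tom[i][j] = (shared + adj[i][j]) / (min(k[i], k[j]) + 1 - adj[i][j])
    return tom


def consensus_tom(toms):
    """Consensus network: element-wise minimum of the per-dataset TOMs."""
    g = len(toms[0])
    return [[min(t[i][j] for t in toms) for j in range(g)] for i in range(g)]


# Two hypothetical datasets measuring the same three genes across five samples.
d1 = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.1, 6.0, 8.2, 9.9], [5.0, 3.0, 4.0, 1.0, 2.0]]
d2 = [[1.0, 3.0, 2.0, 5.0, 4.0], [1.5, 2.9, 2.2, 4.8, 4.1], [4.0, 4.0, 1.0, 2.0, 3.0]]
cons = consensus_tom([topological_overlap(unsigned_adjacency(d)) for d in (d1, d2)])
```

Taking the minimum means a pair of genes is strongly linked in the consensus network only if it is strongly linked in every dataset, which is what makes the resulting modules condition-independent.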

Cytoscape 2.8.3 was used to construct the network and to determine the top hub genes within it [25]. For the interconnectivity graphs, we used spring-embedded and circular layouts. We installed the CytoHubba plugin for more advanced network analysis features. The significant modules were annotated with the DAVID Functional Annotation Bioinformatics Microarray Tools [26]. The differentially expressed gene list was uploaded in the “Enter the gene list” window in DAVID, “Official gene symbol” was selected as the identifier, and “Gene list” was selected as the list type. After the list was submitted and analyzed, “Pathways” was selected from the annotation summary results, and under Pathways, KEGG_Pathway was selected. To generate HIVE plots, HIVE 0.0.11 was downloaded and run on Mac OS X 10.7.4. Edges and nodes were normalized to the uploaded data.
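DAVID reports pathway enrichment with the EASE score, which DAVID's documentation describes as a modified one-sided Fisher exact p-value in which one gene is removed from the list hits, penalizing terms supported by very few genes. The sketch below illustrates that statistic with a hypergeometric upper tail; it is not DAVID's code, and the counts in the usage lines are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, list_total, pop_hits, pop_total):
    """P(X >= k): chance that a random list of list_total genes drawn from
    pop_total background genes contains at least k of the pop_hits annotated genes."""
    upper = min(list_total, pop_hits)
    total = comb(pop_total, list_total)
    hits = sum(
        comb(pop_hits, x) * comb(pop_total - pop_hits, list_total - x)
        for x in range(max(k, 0), upper + 1)
    )
    return hits / total


def fisher_one_sided(list_hits, list_total, pop_hits, pop_total):
    """One-sided Fisher exact (enrichment) p-value."""
    return hypergeom_upper_tail(list_hits, list_total, pop_hits, pop_total)


def ease_score(list_hits, list_total, pop_hits, pop_total):
    """EASE score: the same tail probability after removing one list hit."""
    return hypergeom_upper_tail(list_hits - 1, list_total, pop_hits, pop_total)


# Hypothetical example: 3 of 300 submitted genes fall in a KEGG pathway that
# contains 40 of 30,000 background genes.
p_fisher = fisher_one_sided(3, 300, 40, 30000)
p_ease = ease_score(3, 300, 40, 30000)
```

Because the EASE score always uses one fewer hit, it is more conservative than the plain Fisher p-value for the same counts.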


Construction of hESC Network Undergoing Differentiation

Using the H1 and H9 hESC lines, we performed gene expression microarray analysis (Affymetrix Human Genome U133 Plus 2.0 Array) and analyzed the interaction patterns of 54,614 probe sets using WGCNA in R (Fig. 1A). Microarray data are more completely represented by considering the relationships between measured transcripts, which can be assessed by pairwise correlations between gene expression profiles. WGCNA identifies gene modules and then uses intramodular connectivity, gene significance, and gene ontology (GO) information to identify key genes in the pathways for further validation. Instead of relating thousands of individual genes to the phenotype, it focuses on the relationship between a few modules and the phenotypic outcome. Modules are constructed without regard to biological status. Because the modules may correspond to biological pathways, focusing the analysis on intramodular hub genes (or the module eigengenes) amounts to a biologically motivated data reduction scheme [19, 27].
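In WGCNA, the module eigengene mentioned above is the first principal component of a module's standardized expression matrix, and it can then be correlated with a sample trait. A minimal pure-Python sketch follows; it assumes power iteration in place of WGCNA's singular value decomposition and orients the eigengene to agree with the module's average expression. The toy module and the trait coding (0 = undifferentiated, 1 = differentiated) are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def standardize(row):
    """Z-score one gene's profile across samples."""
    n = len(row)
    m = sum(row) / n
    sd = math.sqrt(sum((v - m) ** 2 for v in row) / (n - 1))
    return [(v - m) / sd for v in row]


def module_eigengene(expr, iters=200):
    """First principal component (over samples) of the standardized module."""
    z = [standardize(r) for r in expr]
    n = len(z[0])
    # Sample-by-sample Gram matrix Z^T Z; its leading eigenvector is the eigengene.
    gram = [[sum(r[a] * r[b] for r in z) for b in range(n)] for a in range(n)]
    v = list(z[0])  # start inside the row space of Z
    for _ in range(iters):
        w = [sum(gram[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Orient the eigengene so it tracks the module's average standardized expression.
    avg = [sum(r[s] for r in z) / len(z) for s in range(n)]
    if sum(a * b for a, b in zip(v, avg)) < 0:
        v = [-x for x in v]
    return v


module = [[9.0, 8.0, 9.0, 2.0, 1.0, 2.0],
          [7.0, 8.0, 7.0, 1.0, 2.0, 1.0],
          [10.0, 9.0, 9.0, 3.0, 2.0, 2.0]]
trait = [0, 0, 0, 1, 1, 1]
eigengene = module_eigengene(module)
r = pearson(eigengene, trait)  # strongly negative: the module is lost upon differentiation
```

Summarizing each module by one eigengene is what reduces thousands of gene-trait tests to a handful of module-trait correlations.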

Figure 1.

Construction of consensus stem cell network in human embryonic stem cells (hESCs). (A): To identify consensus gene signatures that are coherently associated with self-renewing hESCs, we performed gene expression microarray analysis with undifferentiated H1 and H9 hESCs and their differentiated EBs. The results were further analyzed by weighted gene coexpression network analysis (WGCNA) to identify coexpression modules. (B): WGCNA identified differentiation-specific modules and cell-line-specific modules; their corresponding heatmaps are shown (green shows downregulation and red shows upregulation). (C): Dot plots of the seven top modules showed that genes in each module are strongly correlated with each other in their potential biological functions. (D): We then correlated our own dataset with publicly available datasets from Gene Expression Omnibus [9-14], as described in Materials and Methods, to identify consensus modules defining self-renewal under various differentiation settings. (E): The turquoise module includes genes significantly downregulated upon differentiation, and the blue module includes genes upregulated upon differentiation. (F): DAVID analysis was performed to annotate the genes in the turquoise and blue modules from (D). Abbreviation: EB, embryoid body.

We found six modules whose expression changed upon differentiation independently of cell line (H1 vs. H9): turquoise, brown, blue, green-yellow, violet, and magenta (Fig. 1B). We also found two modules that showed cell line-specific expression changes upon differentiation (bottom panel in Fig. 1B). The number of modules and the genes within them suggest that H1 and H9 hESCs have similar gene signature profiles; however, small subsets of genes behave differently upon differentiation, which could reflect sex-specific genetic differences (H1 is male and H9 is female). This needs to be validated further with additional male and female hESC lines.

To functionally annotate the modules, the seven top statistically significant modules were analyzed for GO term enrichment in R. The GO results are represented in a dot plot (Fig. 1C). We found that significant numbers of differentiation-related genes were involved in repeating structure pattern binding, polysaccharide binding, RNA binding, and structural constituent of ribosome. We also found that R-SMAD binding, tyrosine and protein phosphatase activity, and histone acetyl-lysine binding differed between H1 and H9 hESCs upon differentiation (Fig. 1C).

Identifying Consensus Modules in hESCs Undergoing Differentiation

There are certain well-characterized molecular markers that define self-renewing undifferentiated stem cells; however, many of the genes involved in this process are not yet known. We hypothesized that molecular markers defining self-renewing stem cells can be determined by examining hESCs under undifferentiating and differentiating conditions. To determine these markers, we compared our own microarray results with 19 publicly available microarray datasets in which other laboratories differentiated hESCs (including H1, H9, and T3ES) into mesodermal, neuronal, hepatic, and other lineages [13-18]. All of these datasets were generated on the same platform, GPL570 (Affymetrix Human Genome U133 Plus 2.0 Array). We analyzed them with WGCNA and identified consensus modules across the datasets that represent unique gene signatures pertinent to self-renewal and differentiation (Fig. 1D) [13-18]. These consensus modules, together with their correlation to each trait (differentiated or undifferentiated), are shown in Figure 1D. The top consensus modules, ranked by significance and gene number, were the turquoise and blue modules (Fig. 1D, black box). Heatmaps of the turquoise and blue consensus modules across datasets are shown in Figure 1E. The consensus turquoise module comprises genes whose expression decreased in all 19 microarray datasets as stem cells differentiated into various cell types, whereas the consensus blue module contains genes upregulated upon stem cell differentiation (Fig. 1E). Furthermore, we performed DAVID analysis on the genes within the turquoise and blue modules (Fig. 1F). We observed pathways involved in splicing, cell cycle, repair, and metabolism for the turquoise module (Fig. 1F, left) and pathways for various cancers and cell death in the blue module (Fig. 1F, right).
This analysis suggests that some of these pathways are potentially involved in self-renewal, and some have been understood to a certain extent for some time [28-30]. For example, stem cells have a high proliferation rate compared with other cell types and an unusual cell cycle structure, characterized by a short G1 phase and a preference for S phase [22]. Likewise, cancer cells with stem cell-like properties, and the existence of cancer stem cells, first in leukemia and now in solid tumors, are well-known phenomena [30-32].

Identifying the Top Genes Within the Consensus Modules Based on Significance and Fold Change

Next, we filtered and ranked the genes within the turquoise and blue modules to determine the best markers for self-renewal and differentiation, respectively. Using fold change and significance, we made volcano plots of the different datasets within each module; representative datasets are shown in Figure 2A. We used a twofold change and a p-value of less than .01 as the minimum cutoffs. Using multiple datasets from various differentiation conditions (hepatic, hematopoietic, neural, and mesenchymal specification) [13-18], we identified the top genes that are upregulated and downregulated during hESC differentiation (Fig. 2B, Supporting Information Tables S1, S2). Some of the genes downregulated upon differentiation, such as OCT4 and NANOG, are already well known. However, we also found other interesting genes that met our criteria, such as LCK, KLKB1, and SLC7A3 in the downregulated gene set and RHOJ, ZEB2, and ADAM12 in the upregulated gene set. The selection was stringent: genes had to pass these cutoffs in addition to belonging to the consensus modules, all of whose genes were already significant in the initial WGCNA analysis of each individual dataset.
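The volcano-plot filter described above reduces to two cutoffs per gene. The sketch below assumes each gene already has a fold change (differentiated over undifferentiated) and a p-value; the gene names and values are hypothetical, and the default cutoffs follow the text (twofold, p < .01).

```python
import math


def volcano_coordinates(fold_change, p_value):
    """Volcano-plot position: (log2 fold change, -log10 p-value)."""
    return math.log2(fold_change), -math.log10(p_value)


def volcano_filter(genes, min_fold=2.0, max_p=0.01):
    """Split genes passing both cutoffs into up- and downregulated lists.

    genes maps a gene name to (fold_change, p_value), where fold_change is
    differentiated / undifferentiated expression.
    """
    up, down = [], []
    threshold = math.log2(min_fold)
    for name, (fc, p) in genes.items():
        lfc, _ = volcano_coordinates(fc, p)
        if p < max_p and abs(lfc) >= threshold:
            (up if lfc > 0 else down).append(name)
    return sorted(up), sorted(down)


example = {
    "GENE_A": (0.20, 0.001),   # 5-fold lower after differentiation
    "GENE_B": (3.50, 0.004),   # 3.5-fold higher after differentiation
    "GENE_C": (2.50, 0.030),   # large change, but p-value too high
    "GENE_D": (1.40, 0.0001),  # significant, but change below twofold
}
up, down = volcano_filter(example)
```

Working on the log2 scale makes the fold-change cutoff symmetric, so a twofold decrease (log2 = -1) is treated the same as a twofold increase (log2 = +1).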

Figure 2.

Identification of top genes within consensus modules and validation. (A): The turquoise and blue modules were subjected to volcano plot analysis to identify gene signatures with significant fold changes. We used cutoffs of a twofold change and a p value of <.05. (B): The genes with the top fold changes across 19 arrays from various differentiation conditions were compared and are represented by Venn diagrams [9-14]. (C): To validate the identified genes, we used undifferentiated cells or differentiated EBs derived from human embryonic stem cells (hESCs). Immunofluorescence analysis confirmed decreased expression of OCT4 as undifferentiated H9 hESCs underwent differentiation. (D): We validated three of the top genes downregulated (LCK, KLKB1, and SLC7A3) and three upregulated (RHOJ, ZEB2, and ADAM12) upon differentiation in the hESC consensus modules by qRT-PCR in H1 and (E) in H9 hESCs. Abbreviations: EB, embryoid body; MSC, mesenchymal stem cell; NPC, neural progenitor cells.

We then verified some of the top genes from the turquoise and blue consensus modules, using two hESC lines, H1 and H9, for the validation studies [33]. To do this, we differentiated hESCs as represented in Figure 2C and confirmed differentiation by observing decreased OCT4 expression and changes in morphology. After differentiation of H1 and H9 hESCs, we examined the expression levels of the six top genes from the consensus modules by qRT-PCR. All six genes behaved as expected, being downregulated or upregulated upon differentiation (Fig. 2D, 2E).
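This section does not state how qRT-PCR fold changes were quantified. A common approach, assumed here purely for illustration, is the comparative 2^(-ΔΔCt) method: each target is normalized to a reference (housekeeping) gene and then to a calibrator sample such as undifferentiated hESCs. The function name and Ct values below are hypothetical.

```python
from statistics import mean


def delta_delta_ct(target_sample, ref_sample, target_control, ref_control):
    """Relative expression by the 2^(-ΔΔCt) method.

    Each argument is a list of replicate Ct values; replicates are averaged first.
    'control' is the calibrator (e.g., undifferentiated hESCs) and
    'sample' is the condition of interest (e.g., EBs).
    """
    d_sample = mean(target_sample) - mean(ref_sample)
    d_control = mean(target_control) - mean(ref_control)
    return 2.0 ** -(d_sample - d_control)


# Hypothetical Ct values: after reference-gene normalization the target
# amplifies 2 cycles later in EBs, i.e., about 4-fold lower expression.
fold = delta_delta_ct([26.0, 26.1, 25.9], [18.0, 18.0, 18.0],
                      [24.0, 24.0, 24.0], [18.0, 18.0, 18.0])
```

This method assumes roughly 100% amplification efficiency for both target and reference primers; if efficiencies differ, an efficiency-corrected model is more appropriate.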

Discovery of Intramodular Hub Genes in the Differentiation-Specific Module

Successful generation of iPSCs has significantly advanced our understanding of stem cell biology; however, the detailed mechanisms by which these four factors transform terminally differentiated cells into stem cell-like cells remain elusive. Therefore, we analyzed our microarray data to look for coregulators of OCT4, NANOG, c-MYC, and SOX2. First, we calculated interconnectivity values for all genes related to these four factors using WGCNA in R (Fig. 3A–3D). We then ranked those genes by their interconnectivity values; the top-ranked genes for each factor are shown in Figure 3A–3D.
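Ranking genes by their connection strength to a given factor, as described above, can be sketched as follows. This assumes connection strength is the unsigned soft-thresholded adjacency |cor|^β with an illustrative β = 6; the study's exact interconnectivity measure may differ, and the gene names and profiles are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def rank_partners(expr, factor, beta=6, top=4):
    """Return the `top` genes most strongly connected to `factor`.

    expr maps gene name -> expression profile across the same samples.
    """
    target = expr[factor]
    strength = {
        gene: abs(pearson(target, profile)) ** beta
        for gene, profile in expr.items()
        if gene != factor
    }
    return sorted(strength, key=strength.get, reverse=True)[:top]


profiles = {
    "FACTOR": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "GENE_1": [1.1, 2.0, 3.2, 3.9, 5.1, 6.0],  # tracks the factor closely
    "GENE_2": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],  # perfectly anticorrelated
    "GENE_3": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],  # weakly related
}
partners = rank_partners(profiles, "FACTOR", top=2)
```

In an unsigned network, a perfectly anticorrelated gene (GENE_2 here) counts as maximally connected, which is why it ranks first; a signed network would instead push it to the bottom.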

Figure 3.

Interconnectivity analyses of stem cell factors and novel markers. We performed interconnectivity analysis to examine molecular regulators that are potentially correlated with stemness markers and our novel markers. (A): TRMT112, RASGEF1, HRNBP3, and SNHG5 are highly connected to OCT4. (B): BLOC1, PCIF1, and ZBTB44 are connected to NANOG. (C): ETV5, FHDC1, HDAC8, and ASMT are connected to c-MYC. (D): TEX10, CDCA5, C16ORF, and RUNX1 are connected to SOX2. (E): Graphic representation mapping six hub genes (OCT4, NANOG, SOX2, LCK, SLC7A3, and KLKB1) to specific interactive networks in the consensus module. (F): HIVE plot of the six factors shows that they are closely related to each other. (G): HIVE plot showing the relationship between the six factors and genes within the consensus module. This indicates that most genes in the consensus module are coexpressed with, and may be coregulated by, the six factors.

After identifying the top genes in each module, we identified the centrally located genes within the module, that is, the intramodular hub genes. This module-centric analysis allows us to focus on the relationship between a few key genes and the self-renewal or differentiation of stem cells instead of relating thousands of individual variables [27]. Because intramodular hub genes are centrally located in the module, they are often considered the driving forces of that module and have been used to discover therapeutic targets and candidate biomarkers [34-37]. Intramodular hub genes display the highest intramodular connectivity with the rest of the genes in the module; thus, they may function as representative and/or regulatory elements of the module. We measured intramodular connectivity and graphically represented the top genes from the differentiation-specific (turquoise) module.
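Intramodular connectivity, as used in WGCNA, is the sum of a gene's adjacencies to the other genes in its own module; hubs are the genes with the highest value, often after scaling by the module maximum. A minimal sketch, assuming a precomputed adjacency matrix and module color labels; the four genes, their adjacencies, and module assignments are hypothetical.

```python
def intramodular_connectivity(adj, colors):
    """k_within for each gene: sum of adjacencies to other genes in the same module."""
    n = len(adj)
    return [
        sum(adj[i][j] for j in range(n) if j != i and colors[j] == colors[i])
        for i in range(n)
    ]


def module_hubs(adj, genes, colors, module, top=1):
    """Rank a module's genes by k_within scaled to the module maximum (0 to 1)."""
    k = intramodular_connectivity(adj, colors)
    members = [i for i, c in enumerate(colors) if c == module]
    k_max = max(k[i] for i in members) or 1.0  # guard against an unconnected module
    scaled = {genes[i]: k[i] / k_max for i in members}
    ranked = sorted(scaled, key=scaled.get, reverse=True)
    return ranked[:top], scaled


genes = ["A", "B", "C", "D"]
colors = ["turquoise", "turquoise", "turquoise", "blue"]
adj = [
    [0.00, 0.90, 0.80, 0.10],
    [0.90, 0.00, 0.30, 0.20],
    [0.80, 0.30, 0.00, 0.05],
    [0.10, 0.20, 0.05, 0.00],
]
hubs, scaled = module_hubs(adj, genes, colors, "turquoise")
```

Links to genes outside the module (for example, A to D) are ignored, so a hub is defined by its centrality within its own module rather than in the whole network.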

We found six highly connected intramodular hub genes: SLC7A3, NANOG, SOX2, KLKB1, OCT4, and LCK (Fig. 3E). The six hubs formed four main clusters (numbered 1–4 in Fig. 3E), which we analyzed by DAVID. Cluster 1 contained NANOG and OCT4 as the main intramodular hubs and may be involved in the lysine degradation pathway. Cluster 2 contained LCK and SLC7A3 and was possibly associated with purine metabolism, cell cycle, and Huntington's disease. Cluster 3 contained KLKB1 and may be involved in the spliceosome. Cluster 4 included SOX2 as the intramodular hub and may be involved in RNA degradation (Fig. 3E).

Genome-scale networks are often difficult to understand and interpret. A HIVE plot offers a number of simple graphing solutions for complex network analyses [38]. It visualizes a network by assigning nodes to radially distributed linear axes and positioning them along those axes according to network structural properties, which gives a quantitative view of important aspects of network structure. First, we used a scalable linear-layout HIVE plot to show how the top six intramodular hub genes are related to each other (Fig. 3F). The HIVE plot confirmed that the six intramodular hub genes are highly connected to each other. Moreover, there is a gradient in how close or distant each intramodular hub gene is from the others; for example, KLKB1 is more closely related to OCT4 compared with SLC7A3, based on the genes each hub represents. A second HIVE plot was used to relate the six hub genes to the consensus turquoise module (Fig. 3G). We found that some genes from the consensus turquoise module are highly connected to all six hub genes (x-axis, Fig. 3G), whereas others are more hub-specific (y-axis, Fig. 3G).
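The node placement behind a hive plot can be sketched in a few lines. This assumes, for illustration, that each node is assigned to one of a fixed number of axes by a user-supplied rule and is placed along its axis by normalized degree; the graph and axis rule below are hypothetical, and this is not the HIVE 0.0.11 implementation.

```python
import math
from collections import Counter


def degrees(edges):
    """Node degree from an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def hive_layout(axis_of, value_of, n_axes=3, inner=0.1):
    """Map nodes to (x, y): axis k sits at angle 2*pi*k/n_axes, and the radius
    grows linearly with the node's value, from `inner` (minimum) to 1 (maximum)."""
    lo, hi = min(value_of.values()), max(value_of.values())
    span = (hi - lo) or 1
    pos = {}
    for node, axis in axis_of.items():
        angle = 2 * math.pi * axis / n_axes
        r = inner + (1 - inner) * (value_of[node] - lo) / span
        pos[node] = (r * math.cos(angle), r * math.sin(angle))
    return pos


edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
deg = degrees(edges)
# Hypothetical rule: the hub-like node on axis 0, the rest split across axes 1 and 2.
axis_of = {"A": 0, "B": 1, "C": 1, "D": 2}
pos = hive_layout(axis_of, deg)
```

Because positions follow fixed rules instead of a force-directed simulation, the same network always yields the same picture, which is what allows hive plots of different networks to be compared.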

Validating Potential Involvement of LCK in Self-Renewal of hESCs

Among the three validated genes downregulated upon differentiation, we chose LCK for further functional validation in hESC self-renewal because its silencing has been indirectly associated with ESC differentiation. LCK has been implicated in self-renewal and stem cell maintenance in mouse ESCs (mESCs) [39, 40], and differentiation of ESCs into embryoid bodies was associated with rapid transcriptional silencing of LCK and loss of the corresponding kinase protein [39, 41, 42].

To validate the potential function of LCK in maintaining hESC self-renewal, we examined the effect of a selective LCK inhibitor (7-cyclopentyl-5-(4-phenoxyphenyl)-7H-pyrrolo[2,3-d]pyrimidin-4-ylamine) on hESC differentiation [43]. The LCK inhibitor induced spontaneous differentiation of hESCs (Fig. 4). As shown in Figure 4A, treating exponentially growing undifferentiated hESCs with the LCK inhibitor (0.5 and 1 µM) for 72 hours resulted in a morphological change to differentiated cells, and the cells lost the typical characteristics of ESCs. The effect was more pronounced at the edges of colonies, where cells were less densely packed. Differentiation of hESCs after LCK inhibitor treatment was further confirmed by reduced alkaline phosphatase (ALP) staining and ALP activity, which are typically used to mark undifferentiated ESCs (Fig. 4B, 4C). Treatment with the LCK inhibitor also significantly downregulated OCT4 expression (Fig. 4D). These results suggest that LCK activity is necessary to maintain hESCs in an undifferentiated, self-renewing state and that loss of LCK activity induces hESC differentiation.

Figure 4.

Effects of inhibiting LCK on self-renewal and pluripotency of human embryonic stem cells (hESCs). (A): To examine a potential role of LCK in hESC self-renewal, H9 hESCs were treated with 0.1% DMSO (vehicle control) or 0.5 µM or 1.0 µM LCK inhibitor (pyrrolo[2,3-d]pyrimidine; A-419259, Sigma-Aldrich Corp., St. Louis, MO) for 72 hours. Morphologic assessment showed a significant change in the presence of the LCK inhibitor. Images were taken with a Leica DMIL inverted light microscope with AxioVision 4.6 software. (B): Alkaline phosphatase (ALP) activity, indicated by red staining, was downregulated in LCK inhibitor-treated cells compared with control DMSO-treated cells in a dose-dependent manner. ALP staining was performed according to the manufacturer's protocol (Sigma-Aldrich Corp). ALP images were taken using an Olympus IX81 microscope with CellSens 1.8 software. (C): Treatment with LCK inhibitor reduced ALP activity in hESCs. The lysate and ALP reaction mixture were incubated at 37°C, and OD (405 nm) was measured at 20 and 40 minutes. Relative ALP activity was calculated using DMSO (control) as the base value. Bars represent mean ± SD (n = 9; *, p < .05 vs. control) from triplicate samples. (D): Downregulation of OCT4 expression in hESCs treated with LCK inhibitor. Fold change was calculated using DMSO (control) as the base value. Bars represent mean ± SD (n = 9; *, p < .05 vs. control). To examine the specific effect of LCK in hESCs, we performed a gene knockdown experiment as described in Materials and Methods. (E): IF analysis showed that knockdown of LCK using 25 nM siLCK SMARTpool (Thermo Fisher Scientific) downregulated OCT4 compared with nontransfected cells or cells transfected with scrambled control siRNA. (F): Quantitative RT-PCR analysis of several hESC pluripotency genes (OCT4, NANOG, FOXD3, and DNMT3B) showed significant reductions in their levels after LCK silencing. Assays were performed on triplicate samples to evaluate statistical significance.
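The normalization in panels (C) and (D), which expresses each treated replicate relative to the mean of the DMSO controls and summarizes the result as mean ± SD, can be sketched as below. The OD readings are hypothetical, and the sketch handles one read time only (the legend does not specify how the 20- and 40-minute reads were combined).

```python
from statistics import mean, stdev


def relative_to_control(treated, control):
    """Express each treated replicate relative to the mean control (DMSO) value.

    Returns (mean, SD) of the relative values; the control mean maps to 1.0.
    """
    base = mean(control)
    relative = [x / base for x in treated]
    return mean(relative), stdev(relative)


dmso_od = [0.80, 1.00, 1.20]       # hypothetical OD405 readings, vehicle control
inhibitor_od = [0.50, 0.60, 0.70]  # hypothetical OD405 readings, LCK inhibitor
rel_mean, rel_sd = relative_to_control(inhibitor_od, dmso_od)
```

Dividing by the control mean, rather than pairing replicates, keeps the control's own variability out of the treated group's SD.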

To further validate the role of LCK in hESC self-renewal, we specifically knocked down LCK with siRNA (Fig. 4E, 4F). Knockdown was successful at both the protein level (Fig. 4E, left) and the transcript level (Fig. 4F, far left). Immunofluorescence showed that OCT4 levels were significantly decreased by transfection with LCK siRNA but not by control or scrambled siRNA (Fig. 4E, right). Transcriptional analysis of major stem cell markers (OCT4, NANOG, FOXD3, and DNMT3B) showed that LCK knockdown decreased expression of these markers compared with control and scrambled siRNA (Fig. 4F). Taken together with the LCK inhibitor data, the LCK-specific knockdown experiment indicates that LCK is one of the key factors in self-renewal of human ESCs.

Comparing the Gene Signatures Determined in hESCs to iPSCs

Next, we assessed how closely the molecular signatures of iPSCs resemble those of hESCs. This comparison helps answer whether novel self-renewal factors exist and how significant their molecular roles could be during the acquisition of stemness in iPSCs. To address this, we determined consensus modules from five datasets comparing iPSCs with the differentiated cells from which they were derived (Fig. 5A) [44-48]. This analysis allowed us to determine the key molecular signatures that define an iPSC.

Figure 5.

Comparison of consensus modules and molecular hubs correlated with iPSCs and self-renewing hESCs. (A): Consensus modules were formed using WGCNA by examining iPSCs and their cells of origin across five datasets, obtained from Gene Expression Omnibus from various groups [9-14]. (B): Cytoscape interconnectivity analysis, used to visualize molecular interaction networks and pathways, revealed SLC7A3, KLKB1, and LCK as top hub genes in the consensus turquoise module of iPSCs. SLC7A3 and KLKB1 formed closely related molecular hubs, whereas the molecular hub associated with LCK was distal from them. (C): DAVID functional annotation of the best modules from hESCs and iPSCs showed several important biological processes correlated with hESCs, with iPSCs, or with both. (D): We identified and compared interconnectivity genes from hESC hubs and iPSC hubs and annotated the common genes, finding several biological processes that may have critical roles in self-renewal and potency of hESCs and iPSCs. Abbreviations: hESC, human embryonic stem cells; iPSC, induced pluripotent stem cell.

Using WGCNA in R, we found two significant modules: the blue and turquoise modules. The blue module showed downregulated expression in iPSCs, whereas the turquoise module showed upregulated expression in iPSCs. Because we are interested in genes that may be significant in the acquisition of self-renewal and stem cell characteristics by iPSCs, we focused on the upregulated turquoise module. We performed Cytoscape analysis to visualize molecular interaction networks and biological pathways and to integrate these networks with annotations and gene expression profiles. We mapped the interconnectivity of the turquoise module (upregulated in iPSCs vs. their parental somatic cells) to find intramodular hub genes (Fig. 5B). According to our bioinformatics analyses, SLC7A3, KLKB1, and LCK, which we initially identified as novel markers of undifferentiated hESCs, were key genes that might play an important role in the self-renewal ability of iPSCs. In addition, SLC7A3 and KLKB1 formed closely interconnected molecular networks, whereas the molecular hub associated with LCK was somewhat distal from these two factors. This suggests that LCK may be involved in the reprogramming process through a different molecular function.

To compare hESCs and iPSCs further, we functionally annotated the best representative consensus modules from hESCs and iPSCs (Fig. 5C). Purine and pyrimidine metabolism were more strongly correlated with hESCs. In contrast, alanine, aspartate, and glutamate metabolism, the endometrial cancer pathway, glycolysis and gluconeogenesis, and the p53 signaling pathway were more closely associated with iPSCs (Fig. 5C). We then identified and compared the genes associated with hESC hubs and iPSC hubs. The common genes were associated with the spliceosome, cell cycle, RNA degradation, and DNA replication (Fig. 5D). This suggests that genes involved in these molecular functions may have critical roles in maintaining the self-renewal and potency of hESCs and iPSCs.
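Functional annotation tools such as DAVID score term enrichment with an over-representation test. The minimal sketch below is an illustration under stated assumptions, not DAVID's implementation. It computes a one-sided hypergeometric p-value for observing at least x annotated genes in a list of n genes. The list is drawn from a background of N genes, K of which carry the annotation. The optional `conservative` flag subtracts one from the hit count, in the spirit of DAVID's EASE score. DAVID's exact contingency-table setup may differ.

```python
from math import comb

def enrichment_p(x: int, N: int, K: int, n: int, conservative: bool = False) -> float:
    """One-sided P(X >= x) for X ~ Hypergeometric(N background, K annotated, n listed).

    With conservative=True, one gene is removed from the hit count before testing,
    which can only increase the p-value (an EASE-like penalty for small counts).
    """
    if conservative:
        x -= 1
    x = max(x, 0)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one, so
    # impossible configurations contribute nothing to the sum.
    hits = sum(comb(K, k) * comb(N - K, n - k) for k in range(x, min(K, n) + 1))
    return hits / total

# Usage with toy numbers: 3 of 4 listed genes carry a term annotating 3 of 10 genes.
print(enrichment_p(3, N=10, K=3, n=4))                     # 7 / 210, about 0.033
print(enrichment_p(3, N=10, K=3, n=4, conservative=True))  # larger, more cautious
```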

To examine transcriptional hubs that could play critical roles in the self-renewal of hESCs and iPSCs, we analyzed promoter sequence motifs in genes from the best consensus modules correlated with undifferentiated hESCs and iPSCs. The best hESC consensus module contained 671 unique genes (not present in the iPSC module), and the best iPSC module contained 3,178 unique genes. We found that 573 genes, or 46% of the genes in the hESC top consensus module, were also present in the iPSC top consensus module (Fig. 6A; the gene list is shown in the Supporting Information Table). Promoter sequence analysis identified 14 sequence motifs among the promoters of hESC and iPSC module genes. Five were unique to hESCs (red), five were unique to iPSCs (yellow), and four were common to both (orange) (Fig. 6A). Based on these promoter sequences, we identified transcription factors that may bind differentially to the promoters of hESC and iPSC module genes. The OCT family transcription factor was unique to hESCs, whereas AIRE, OTX, and CF2 were unique to iPSCs. XFD, CROC, FOX, MEF, CRP, and SPF transcription factors were shared between hESCs and iPSCs (Fig. 6B). These data show that some common transcription factors may play critical roles in the self-renewal of both hESCs and iPSCs. They also suggest, however, that the molecular regulatory networks of iPSCs may differ somewhat from those of hESCs. Such differences could affect the maintenance or downstream manipulation (e.g., differentiation into a certain lineage) of stem cells from different sources. Knowing these differences will help in designing experimental strategies for the future use of hESCs and iPSCs.
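The overlap figures above follow from simple set arithmetic. The sketch below assumes that the 671 and 3,178 counts are genes unique to each module. This reading is consistent with the 1,244 consistently downregulated hESC genes reported later in this article. Under it, the hESC module has 671 + 573 = 1,244 genes, and 573/1,244 ≈ 46% of them are shared. The motif labels (m1 to m14) are hypothetical. They only show how five unique, five unique and four shared motifs add up to 14.

```python
def venn_partition(a: set[str], b: set[str]) -> dict[str, set[str]]:
    """Split two sets into the three regions of a two-way Venn diagram."""
    return {"a_only": a - b, "b_only": b - a, "shared": a & b}

# Gene counts from the text, read as Venn region sizes (an assumption, see lead-in).
hesc_only, ipsc_only, shared = 671, 3178, 573
hesc_total = hesc_only + shared                  # genes in the hESC top module
union_total = hesc_only + shared + ipsc_only     # genes in either top module
print(f"{shared / hesc_total:.1%} of the hESC module is shared; union = {union_total}")

# Hypothetical motif labels reproducing the 5 / 5 / 4 split of 14 motifs.
hesc_motifs = {f"m{i}" for i in range(1, 10)}    # m1..m9
ipsc_motifs = {f"m{i}" for i in range(6, 15)}    # m6..m14
parts = venn_partition(hesc_motifs, ipsc_motifs)
print({region: len(members) for region, members in parts.items()})
```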

Figure 6.

Comparison of sequence motifs and transcription factors associated with consensus genes from iPSCs and self-renewing hESCs. To examine transcriptional hubs that could play critical roles in the self-renewal of hESCs and iPSCs, we performed promoter sequence analysis (RSAT) and identified sequence motifs of genes from the best consensus modules correlated with undifferentiated hESCs and iPSCs. (A): The best representative modules from hESCs and iPSCs are shown in a Venn diagram. Using these gene lists, we compared the sequence motifs in the promoters of hESC and iPSC module genes. The consensus modules from hESCs and from iPSCs each had five unique sequence motifs closely correlated with their respective modules, and four motifs were shared (red for consensus modules associated with hESCs, yellow for consensus modules associated with iPSCs, and orange for both). (B): Using the sequence motifs identified in (A), we examined the transcription factors related to those motifs. The OCT family transcription factor was strongly correlated with consensus modules from self-renewing hESCs. AIRE, OTX, and CF2 were strongly correlated with consensus modules from iPSCs. XFD, CROC, FOX, MEF, CRP, and SPF transcription factors were shared between self-renewing hESCs and iPSCs. Abbreviations: hESC, human embryonic stem cell; iPSC, induced pluripotent stem cell.


Using gene expression microarrays and WGCNA bioinformatics analysis, we identified new candidate self-renewal factors associated with the differentiation of H1 and H9 human ESCs (hESCs). We constructed consensus modules to compare and validate our results against 19 publicly available microarray datasets of stem cell differentiation. These datasets included hESCs directed to differentiate into various somatic cell types and lineages, such as hemangioblasts, mesoderm, hepatocytes, hematopoietic cells, and neural/mesenchymal cells [13-18]. We selected a variety of datasets partly to increase statistical power for consensus module formation. More importantly, the variety was intended to find the set of genes responsible for stem cell identity.
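WGCNA builds consensus modules from a network that is conserved across datasets. A common formulation first computes the topological overlap matrix (TOM) of each dataset:

TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij), with l_ij = Σ_u a_iu a_uj and k_i = Σ_u a_iu.

It then takes the element-wise minimum across datasets, so two genes are close only if they are close in every dataset. The minimal numpy sketch below assumes adjacency matrices that share one gene order. It omits WGCNA's calibration of TOMs between datasets and the subsequent hierarchical clustering. The matrices are hypothetical.

```python
import numpy as np

def topological_overlap(adj: np.ndarray) -> np.ndarray:
    """Unsigned TOM from a symmetric adjacency matrix with entries in [0, 1]."""
    a = adj.copy()
    np.fill_diagonal(a, 0.0)                 # self-connections are excluded
    shared = a @ a                           # l_ij: weighted count of shared neighbors
    k = a.sum(axis=1)                        # connectivity of each gene
    tom = (shared + a) / (np.minimum.outer(k, k) + 1.0 - a)
    np.fill_diagonal(tom, 1.0)
    return tom

def consensus_tom(adjs: list[np.ndarray]) -> np.ndarray:
    """Element-wise minimum of per-dataset TOMs (genes must be in the same order)."""
    return np.min(np.stack([topological_overlap(a) for a in adjs]), axis=0)

# Two hypothetical 3-gene adjacency matrices from two datasets.
a1 = np.array([[0.0, 0.8, 0.1], [0.8, 0.0, 0.2], [0.1, 0.2, 0.0]])
a2 = np.array([[0.0, 0.6, 0.3], [0.6, 0.0, 0.1], [0.3, 0.1, 0.0]])
cons = consensus_tom([a1, a2])
dissimilarity = 1.0 - cons                   # input to hierarchical clustering
print(np.round(cons, 3))
```

Taking the minimum is the most stringent choice. A gene pair that is tightly linked in 18 datasets but unlinked in one will not cluster together.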

The major consensus module represents genes that are consistently downregulated upon hESC differentiation across datasets. It includes well-known stemness genes such as OCT4 and NANOG. In addition to these known markers, we identified a total of 1,244 genes that were consistently downregulated upon hESC differentiation, both in our own microarray results and across the 19 microarray datasets analyzed. From this list, we selected the top genes based on fold change and significance across datasets, including LCK, KLKB1, and SLC7A3. We further validated these genes in three independent hESC lines (H1, H9, and UCLA6) undergoing differentiation. These new genes could potentially be involved in maintaining self-renewal in the undifferentiated state, but their individual functional roles should be validated experimentally.
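Selecting consistently downregulated genes and then ranking them can be sketched as a filter over per-dataset statistics. This minimal Python sketch rests on several assumptions:

- log2 fold changes are computed as differentiated over undifferentiated, so negative values mean downregulation upon differentiation;
- a gene must reach a hypothetical p < 0.05 and a hypothetical minimum |log2FC| of 1 in every dataset;
- genes are ranked by mean log2 fold change.

The study's actual thresholds and ranking rule are not given here, and the gene names and values are invented for illustration.

```python
from statistics import mean

def consistently_down(log2fc: dict[str, list[float]],
                      pvals: dict[str, list[float]],
                      alpha: float = 0.05,
                      min_abs_lfc: float = 1.0) -> list[str]:
    """Genes downregulated (log2FC <= -min_abs_lfc and p < alpha) in every dataset,
    ordered from the strongest to the weakest mean downregulation."""
    kept = {}
    for gene, lfcs in log2fc.items():
        ps = pvals[gene]
        if all(lfc <= -min_abs_lfc and p < alpha for lfc, p in zip(lfcs, ps)):
            kept[gene] = mean(lfcs)
    return sorted(kept, key=kept.get)        # most negative mean log2FC first

# Hypothetical values for four genes across three datasets.
log2fc = {"geneA": [-3.0, -2.5, -4.0], "geneB": [-1.5, -1.2, -2.0],
          "geneC": [-2.0, 0.5, -3.0], "geneD": [-2.0, -2.0, -2.0]}
pvals = {"geneA": [0.001, 0.01, 0.001], "geneB": [0.01, 0.02, 0.03],
         "geneC": [0.001, 0.4, 0.001], "geneD": [0.2, 0.01, 0.01]}
print(consistently_down(log2fc, pvals))
```

geneC is dropped because it goes up in one dataset. geneD is dropped because it is not significant in one dataset.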

Next, we asked how OCT4, NANOG, c-MYC, and SOX2 are connected to other genes. We did so to gain insight into the mechanisms by which these important genes act in stem cells and differentiation, which are not yet fully understood. Interestingly, the top genes most connected to these four stem cell factors may play roles in epigenetic regulation, transcriptional regulation, and male sex characteristics. This may be explained in part by the origin of ESCs. The human embryo reaches the blastocyst stage 4–5 days after fertilization, and hESCs are derived from the inner cell mass of the blastocyst [49]. During this blastocyst stage, the embryo undergoes major de novo methylation and is primed for sex determination and other dynamic changes in gene expression [50-52].
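One simple way to ask which genes are most connected to a set of seed factors is to sum each gene's edge weights to the seeds. The sketch below assumes an undirected weighted network given as (gene, gene, weight) edges. The seed set mirrors OCT4, NANOG, c-MYC (gene symbol MYC) and SOX2. The other gene names and all weights are hypothetical. This is an illustration and not necessarily the connectivity measure used in this study.

```python
from collections import defaultdict

def seed_connectivity(edges, seeds: set[str]) -> list[tuple[str, float]]:
    """Rank non-seed genes by their summed edge weight to the seed genes."""
    score: dict[str, float] = defaultdict(float)
    for a, b, w in edges:
        if a in seeds and b not in seeds:
            score[b] += w
        elif b in seeds and a not in seeds:
            score[a] += w
        # edges between two seeds, or between two non-seeds, are ignored
    return sorted(score.items(), key=lambda item: item[1], reverse=True)

seeds = {"OCT4", "NANOG", "MYC", "SOX2"}
edges = [("OCT4", "g1", 0.8), ("SOX2", "g1", 0.7), ("NANOG", "g2", 0.9),
         ("MYC", "g3", 0.2), ("g1", "g2", 0.95), ("OCT4", "NANOG", 0.9)]
print(seed_connectivity(edges, seeds))
```

Summing rewards genes linked to several seeds. In this example, g1 outranks g2 even though g2 has the single strongest edge to a seed.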

When we measured interconnectivity within the turquoise consensus module, we found SLC7A3, NANOG, SOX2, KLKB1, OCT4, and LCK to be the major foci of genes downregulated upon differentiation. This suggests that these factors are important for maintaining stem cell self-renewal while preventing differentiation. The roles of OCT4, SOX2, and NANOG in these processes are well documented, but little is known about the self-renewal roles of SLC7A3, KLKB1, and LCK, which we identified in this study. LCK is a lymphocyte-specific protein tyrosine kinase and one of 11 members of the SRC family kinases [53]. Previous work has implicated SRC family kinases in the self-renewal and differentiation of mouse and human ESCs [39, 40, 54, 55]. Meyn and Smithgall showed that LCK is highly expressed in undifferentiated mouse ESCs (mESCs) and decreases rapidly upon differentiation into EBs [40]. We observed comparable changes in LCK expression during hESC differentiation. We treated hESCs with a physiological concentration of the LCK inhibitor A-419259. This treatment decreased their self-renewal capacity, producing differentiated-cell morphology and reducing ALP activity and OCT4 expression. These results suggest that LCK plays an important role in the SRC family kinase signal transduction that governs stem cell self-renewal. Interestingly, the same concentration of A-419259 failed to differentiate mESCs [40]. This result may imply that (a) the effect of LCK inhibition on self-renewal capacity is minimal in mESCs but significant in hESCs, and (b) mESCs may be able to compensate for decreased LCK activity through other, parallel pathways. mESCs and hESCs share many basic factors and signaling mechanisms. Nevertheless, our LCK inhibitor study suggests that LCK plays different roles in the self-renewal of these two stem cell types, as do the opposite differentiation outcomes of mESCs and hESCs in the presence of leukemia inhibitory factor. In addition to using the LCK inhibitor, we specifically knocked down LCK with siRNA, which confirmed its role.

We further analyzed the major consensus module to better understand how the stem cell factors relate to each other and to the module. The hive plot allowed us to detect emerging patterns in our complex hESC network. In it, the normalized connectivity of the stem cell factors showed how close to or far from one another each factor is. The strong overlap between OCT4 (blue) and KLKB1 (orange) suggests that KLKB1 could be a major player in hESC differentiation. It may also imply shared functionality between OCT4 and KLKB1.
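In a hive plot, nodes are placed along axes at positions given by a node property. A natural choice here is connectivity scaled by the module maximum, so that positions fall in [0, 1]. The sketch below shows that normalization. It also shows one hypothetical way to quantify the overlap between two factors: the Jaccard index of their neighbor sets. Both the scaling convention and the overlap measure are assumptions for illustration. They are not necessarily the ones behind the hive plot in this study, and the gene names and values are invented.

```python
def normalized_connectivity(k: dict[str, float]) -> dict[str, float]:
    """Scale connectivities by the maximum so the most connected gene maps to 1.0."""
    k_max = max(k.values())
    return {gene: value / k_max for gene, value in k.items()}

def neighbor_overlap(neighbors_a: set[str], neighbors_b: set[str]) -> float:
    """Jaccard index |A & B| / |A | B| of two genes' neighbor sets."""
    union = neighbors_a | neighbors_b
    return len(neighbors_a & neighbors_b) / len(union) if union else 0.0

# Hypothetical connectivities and neighbor sets.
print(normalized_connectivity({"g1": 40.0, "g2": 30.0, "g3": 10.0}))
print(neighbor_overlap({"x", "y", "z"}, {"y", "z", "w"}))
```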

We then examined whether our findings in the hESC model can be related to iPSCs. Unexpectedly, the four Yamanaka factors were not the major hub genes of iPSCs. OCT4 and SOX2 did appear in the overall hub gene list. However, when we applied the same stringent cutoff used for hESCs, only SLC7A3, KLKB1, and LCK remained. Our functional, interconnectivity, transcription factor, and sequence motif analyses suggest that iPSCs may possess certain characteristics of hESCs but also show distinct molecular differences.
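WGCNA commonly selects hub genes by module membership (kME). kME is the correlation between a gene's expression and the module eigengene, which is the first principal component of the module's standardized expression. The numpy sketch below applies a hypothetical kME cutoff of 0.9 to synthetic data. The cutoff, the sign convention and the data are assumptions, and the stringent criterion used in this study may differ.

```python
import numpy as np

def module_eigengene(expr: np.ndarray) -> np.ndarray:
    """First principal component of standardized module expression (samples x genes)."""
    z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    me = u[:, 0] * s[0]
    # Fix the arbitrary SVD sign so the eigengene tracks the average module expression.
    if np.corrcoef(me, z.mean(axis=1))[0, 1] < 0:
        me = -me
    return me

def kme(expr: np.ndarray, me: np.ndarray) -> np.ndarray:
    """Correlation of each gene (column) with the module eigengene."""
    return np.array([np.corrcoef(expr[:, j], me)[0, 1] for j in range(expr.shape[1])])

def hubs_by_kme(genes: list[str], expr: np.ndarray, cutoff: float = 0.9) -> list[str]:
    """Genes whose module membership meets a (hypothetical) stringent cutoff."""
    membership = kme(expr, module_eigengene(expr))
    return [g for g, v in zip(genes, membership) if v >= cutoff]

# Synthetic module: A-C follow a shared signal, D is unrelated noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=50)
expr = np.column_stack([latent,
                        latent + 0.1 * rng.normal(size=50),
                        latent + 0.1 * rng.normal(size=50),
                        rng.normal(size=50)])
print(hubs_by_kme(["A", "B", "C", "D"], expr))
```

Raising the cutoff shrinks the hub list. A gene that passes a lenient threshold, as OCT4 and SOX2 did for iPSCs, can therefore drop out under a stringent one.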

Our analyses may provide insight into specific pathways and a set of genes critical for hESC self-renewal. The genes we discovered could also serve as a tool for other researchers to delineate stemness in a particular system. We examined how close iPSCs are to hESCs. Within the top hESC consensus module, 46% of the genes (573 genes) behaved the same in both cell types. Conversely, the top iPSC consensus module contained 3,178 genes that were absent from the top hESC consensus module. This suggests that other background pathways may operate in iPSCs, depending on their somatic cell of origin before induction. Detailed knowledge of these novel factors, and of the molecular mechanisms associated with specific cell types or lineage specifications, will help us design experimental strategies that improve the success rate and efficiency of reprogramming and directed differentiation.


WGCNA is a powerful tool for understanding how genes behave across datasets and how they are correlated with each other within specific modules. We used this tool to study stemness across different datasets. From our analysis, we identified some of the top upregulated and downregulated genes in hESC differentiation, which may help researchers gain further insight into this field.


We would like to thank Dr. Steve Horvath for his help and his WGCNA tutorials. We acknowledge the UCLA Clinical Microarray Core for excellent technical service. We also acknowledge the UCLA Broad Stem Cell Institute for providing us with hESC lines. This work was supported by CIRM Basic Biology Award II (RB2–01562), NIH/NIAAA R01 grant award (1R01AA21301), and UCLA SOD SEED Grant to Y.K. and T32 training grant award from NIDCR (T32DE07269 to J.J.K. and O.K.).

Author Contributions

J.J.K., O.K., and Y.K.: participated in designing experiments, executing the described work, statistical analysis, interpreting data, and writing the manuscript; A.H.N., T.G.T., O.E., and C.L.: contributed to the execution of experiments and writing the manuscript. J.J.K. and O.K. contributed equally to this article.

Disclosure of Potential Conflicts of Interest

The authors indicate no potential conflicts of interest.