Two-Phase Analysis of Molecular Pathways Underlying Induced Pluripotent Stem Cell Induction§


  • Zhaoyu Lin,

    1. MOE Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, Nanjing, Jiangsu, People's Republic of China
    2. Fay Simon Center for Hearing Research at the Department of Otolaryngology, Washington University School of Medicine, St. Louis, Missouri, USA
  • Philip Perez,

    1. Fay Simon Center for Hearing Research at the Department of Otolaryngology, Washington University School of Medicine, St. Louis, Missouri, USA
    2. Division of Biology & Biomedical Science and Neuroscience Program, Washington University School of Medicine, St. Louis, Missouri, USA
  • Debin Lei,

    1. Fay Simon Center for Hearing Research at the Department of Otolaryngology, Washington University School of Medicine, St. Louis, Missouri, USA
  • Jingyue Xu,

    1. MOE Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, Nanjing, Jiangsu, People's Republic of China
  • Xiang Gao,

    1. MOE Key Laboratory of Model Animal for Disease Study, Model Animal Research Center of Nanjing University, Nanjing, Jiangsu, People's Republic of China
  • Jianxin Bao

    Corresponding author
    1. Fay Simon Center for Hearing Research at the Department of Otolaryngology, Washington University School of Medicine, St. Louis, Missouri, USA
    2. Division of Biology & Biomedical Science and Neuroscience Program, Washington University School of Medicine, St. Louis, Missouri, USA
    • CID at Washington University School of Medicine, 4560 Clayton Avenue, St. Louis, Missouri 63110, USA
    • Telephone: 314-747-7199; Fax: 314-747-7230

  • Author Contributions: Z.L.: conception and design, collection and assembly of data, data analysis and interpretation, and manuscript writing; P.P.: collection and assembly of data and manuscript writing; D.L. and J.X.: collection and assembly of data; X.G.: conception and design and financial support; J.B.: conception and design, financial support, administrative support, collection and assembly of data, data analysis and interpretation, manuscript writing, and final approval of manuscript.

  • Disclosure of potential conflicts of interest is found at the end of this article.

  • § First published online in STEM CELLS EXPRESS September 28, 2011.


Induced pluripotent stem cells (iPSCs) can be reprogrammed from adult somatic cells by transduction with Oct4, Sox2, Klf4, and c-Myc, but the molecular cascades initiated by these factors remain poorly understood. Their elucidation is impeded by the stochastic nature of the iPSC induction process, which yields heterogeneous cell populations. Here we synchronized the reprogramming process by a two-phase induction: a stable intermediate phase reached after transduction with Oct4, Klf4, and c-Myc, followed by a final iPSC phase reached after overexpression of Sox2. This approach enabled us to examine temporal gene expression profiles and to identify Sox2 downstream genes critical for induction. We further validated the approach by using it to show that downregulation of transforming growth factor β (TGF-β) signaling by Sox2 is essential to the reprogramming process. Thus, we present a new means of dissecting the mechanisms underlying iPSC induction, one with potentially wide-ranging utility for studying other reprogramming mechanisms. STEM CELLS 2011;29:1963–1974.


Briggs and King's landmark 1952 discovery that the transfer of differentiated nuclei to enucleated frog eggs could produce viable tadpoles established an impressive case against the dogma of unidirectional differentiation, and more recent studies that have successfully transferred somatic cell nuclei in mammalian species continue to contradict the notion of one-way movement along the path of differentiation [1–3]. Subsequent studies have demonstrated the presence of dominant factors in eggs or embryonic stem cells (ESCs) that can return a cell from the differentiated state to the pluripotent state [4–7]. Pioneering work by Yamanaka and colleagues (2006) identified the first set of these dominant factors: Oct4, Sox2, Klf4, and c-Myc (OSKM). Induced pluripotent stem cells (iPSCs) obtained by transduction of these Yamanaka factors or other combinations are epigenetically and developmentally very similar, if not identical, to ESCs [8–12]. Although the identification of factors able to induce this reprogramming represents a major advance, much work remains to elucidate the exact molecular mechanism underlying this direct cell reprogramming [13].

Based on genome-wide mapping of transcription factor targets, recent studies have revealed that ESC pluripotency is directed mainly by the interconnected autoregulatory loop of Oct4, Sox2, and Nanog and by downstream transcriptional networks, which enhance transcription of pluripotency genes and repress genes associated with differentiation [14–19]. The networks important for ESC pluripotency and self-renewal may not contribute to iPSC induction, however, because endogenous Nanog is not expressed in the early phase of reprogramming [17] and the promoters of Oct4 and Sox2 are unavailable for self-activation owing to DNA methylation [20]. The same genome-wide mapping method has been applied to study the function of Yamanaka factors in ESCs, iPSCs, and so-called “partially reprogrammed cells,” which acquire ES-like morphology but not ES-like pluripotency after transduction with all four Yamanaka factors [21–23]. Examining these cells can improve our understanding of reprogramming barriers, but molecular mechanisms identified by comparing partially reprogrammed cells with iPSCs may not be critical for the reprogramming process, because by definition these cells fail to become fully reprogrammed iPSCs. The recent identification of small molecules that can enhance or even replace certain reprogramming factors provides an alternative means of analyzing the roles of the four factors in iPSC induction [22, 24–27]. This approach is complicated, however, by the fact that small molecules can bind multiple targets and act on multiple signaling pathways that may not be relevant to reprogramming, and no compound identified so far is known to act through exactly the same molecular pathways as the reprogramming factor it replaces. It would therefore be difficult to rely on this approach alone to reveal the molecular mechanisms initiated by the Yamanaka factors.

A major obstacle to analyzing the causal molecular events that drive reprogramming is that iPSC induction is a gradual process involving stochastic epigenetic events. As a result, at any given time point after the initial induction, the cell population is heterogeneous, composed of individual cells at different phases of reprogramming [28–30]. In this study, we established a stable phase during the reprogramming process through overexpression of only Oct4, Klf4, and c-Myc (OKM). From this stable phase, iPSC induction can be driven forward by subsequent Sox2 overexpression. Based on the clear distinction between the two phases of this induction process, we dissected the Sox2-induced molecular cascades important for iPSC induction with defined temporal resolution.


All procedures followed NIH guidelines and were approved by the Animal Care and Use Committee of Washington University.

Cell Culture

Mouse ESCs, iPSCs, and OKM cells were maintained in Dulbecco's modified Eagle's medium (DMEM; Gibco Invitrogen, Carlsbad, CA) supplemented with 15% fetal bovine serum (FBS; Gibco), 100 μM β-mercaptoethanol, 2 mM nonessential amino acids, 2 mM L-glutamine, 2 mM HEPES, and 10 ng/ml leukemia inhibitory factor (Chemicon, Temecula, CA), on feeder layers of mitomycin C-treated primary mouse embryonic fibroblasts (MEFs). All cultures were maintained at 37°C in a humidified atmosphere of 95% air and 5% CO₂.

Derivation of MEF Cells

MEFs were derived from embryonic day 13.5 (E13.5) mouse embryos. The head and visceral tissues were removed first. The remaining tissue was washed with phosphate-buffered saline (PBS), minced into small pieces with scissors, and digested with 0.25% trypsin/1 mM EDTA solution (3 ml per embryo) in a 37°C water bath for 15 minutes. After trypsinization, an equal volume of MEF medium (DMEM with 10% FBS) was added, and the suspension was pipetted up and down to dissociate the cells. Cells were then collected by centrifugation (1000g for 3 minutes) and resuspended in MEF medium. Cells were plated at 1 × 10⁶ per 100-mm dish and designated “Passage 0.” MEFs were used within three passages.

Plasmid Construction

To generate pBabe-puro-Nr6a1, pBabe-puro-Zfp371, and pBabe-puro-Zfp459, we amplified the coding regions of these genes by reverse-transcription polymerase chain reaction (RT-PCR) and cloned them into pCR2.1-TOPO (Invitrogen). After sequence verification, the cDNAs were inserted into the BamHI/SalI sites (Nr6a1 and Zfp459) or the EcoRI/SalI sites (Zfp371) of the pBabe-puro3 plasmid. The other retroviral vectors were obtained from Addgene. Primers used are listed in Supporting Information.

Retroviral Infection and iPSC Induction

The day before transfection, Phoenix cells were seeded at 2 × 10⁶ cells per 100-mm dish. FuGENE 6 transfection reagent (27 μl; Roche, Madison, WI) was mixed with 273 μl of DMEM and incubated for 5 minutes at room temperature. Plasmid DNA (9 μg) was then added, mixed, and incubated for 30 minutes at room temperature, after which the mixture was added to the Phoenix cells. After 48 hours, the Phoenix cell supernatant was passed through a 0.45-μm filter, and the virus was pelleted by centrifugation (16,000g for 60 minutes) and resuspended in fresh medium (100 μl per 1 ml of supernatant). The virus medium was supplemented with 4 μg/ml polybrene and applied to target cells overnight. MEFs or intermediate-phase cells were trypsinized and plated at 10⁵ cells per milliliter 1 day before infection. Clones were picked 2–3 weeks after infection.
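The reagent quantities in this protocol imply a fixed transfection-reagent-to-DNA ratio and a fold concentration of the viral supernatant. The sketch below makes that arithmetic explicit. The helper names are hypothetical, and reading "100 μl per 1 ml of supernatant" as a 10-fold concentration is an interpretation of the protocol, not a value the authors state.

```python
# Hypothetical helpers (names are illustrative, not from the paper) that make
# the retroviral-preparation arithmetic in the protocol explicit.

def reagent_to_dna_ratio(reagent_ul: float, dna_ug: float) -> float:
    """Microliters of transfection reagent per microgram of plasmid DNA."""
    return reagent_ul / dna_ug


def concentration_factor(supernatant_ml: float, resuspension_ul: float) -> float:
    """Fold concentration when virus pelleted from `supernatant_ml` of filtered
    supernatant is resuspended in `resuspension_ul` of fresh medium (assumed reading)."""
    return supernatant_ml * 1000.0 / resuspension_ul


# Values stated in the protocol above.
fugene_ul, dmem_ul, dna_ug = 27.0, 273.0, 9.0
print("Reagent:DNA ratio (ul:ug) =", reagent_to_dna_ratio(fugene_ul, dna_ug))  # 3.0
print("Complex volume (ul) =", fugene_ul + dmem_ul)                             # 300.0
print("Virus concentration factor =", concentration_factor(1.0, 100.0))         # 10.0
```

Scaling the transfection to more dishes would then amount to keeping the 3:1 ratio and the 300-μl complex volume per 9 μg of DNA.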

Quantitative RT-PCR and RT-PCR

Total RNA was extracted with the SV Total RNA Isolation kit (Promega, Madison, WI), and reverse transcription was performed with the RETROscript kit (Ambion, Austin, TX). Each gene was quantified on a LightCycler (Roche) as described previously [31]. RT-PCR was performed on a thermal cycler (Eppendorf, Germany). PCR primers used are listed in Supporting Information.
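The quantification model is only referenced here ([31]), not spelled out. A widely used option for relative qRT-PCR is the 2^-ΔΔCt method, and GAPDH, which appears in the figure abbreviations, is a plausible normalizer. The sketch below assumes both, along with near-100% amplification efficiency. It illustrates the general technique and is not necessarily the calculation the authors performed.

```python
from statistics import mean


def relative_expression(target_ct, reference_ct, calib_target_ct, calib_reference_ct):
    """Fold change of a target gene by the 2^-ddCt method.

    Each argument is a list of replicate Ct values. `reference_*` is the
    normalizer gene (GAPDH is assumed here) and `calib_*` is the calibrator
    sample (for example, OKM cells). Assumes ~100% PCR efficiency for both genes.
    """
    delta_sample = mean(target_ct) - mean(reference_ct)
    delta_calibrator = mean(calib_target_ct) - mean(calib_reference_ct)
    delta_delta = delta_sample - delta_calibrator
    return 2.0 ** (-delta_delta)


# Hypothetical Ct values, for illustration only (not data from this study).
fold = relative_expression([24.0, 24.0], [18.0, 18.0], [27.0, 27.0], [18.0, 18.0])
print(fold)  # 8.0: target is 8-fold higher in the sample than in the calibrator
```

A lower Ct means more template, which is why the exponent is negated. Efficiency-corrected models such as Pfaffl's replace the base 2 with measured per-primer efficiencies.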

Alkaline Phosphatase Staining and Immunocytochemistry

Alkaline phosphatase (AP) staining was performed with the AP Detection Kit (Chemicon). For immunostaining, cells were fixed with 4% paraformaldehyde in PBS (PFA/PBS) for 30 minutes and washed three times with PBS containing 0.1% Triton X-100 (PBST). After blocking with 10% goat serum for 1 hour at room temperature, cells were washed three times with PBST. Primary antibodies against stage-specific embryonic antigen 1 (SSEA-1; MC-480, 1:100; Developmental Studies Hybridoma Bank, Iowa City, IA), Nanog (ab21603, 1:1,000; Abcam, Cambridge, MA), Oct4 (sc-5279, 1:50; Santa Cruz Biotechnology, Santa Cruz, CA), Sox2 (ab15830, 1:500; Abcam), βIII-tubulin (T8660, 1:200; Sigma-Aldrich, St. Louis, MO), α-smooth muscle actin (ab5694, 1:100; Abcam), Gata4 (ab61170, 1:500; Abcam), and Rex1 (09-0019, 1:100; Stemgent, San Diego, CA) were applied overnight. Secondary antibodies, including Alexa546 goat anti-rabbit IgG (A-22283, 1:500; Invitrogen), Alexa488 goat anti-rabbit IgG (A-11008, 1:1,000; Invitrogen), and Alexa488 goat anti-mouse IgG (A-11001, 1:1,000; Invitrogen), were applied for 30 minutes.

Embryoid Body Assay

ESC and iPSC colonies were dissociated with 0.25% trypsin/1 mM EDTA. The single-cell suspension (1 × 10⁵ cells) was cultured in bacteriological Petri dishes (10 ml per dish), and the medium was changed every other day. On day 4, the number of embryoid bodies (EBs) was counted. EBs were then seeded in 24-well plates (100 EBs per well) and maintained under the same culture conditions with 1 μM retinoic acid. On day 8, the cells were collected for RT-PCR and immunostaining.

Teratoma Formation Assay

ESCs or iPSCs were dissociated with 0.25% trypsin/1 mM EDTA and resuspended in MEF medium at 1 × 10⁷ cells per milliliter. Nude mice were anesthetized, and 100 μl of the suspension (1 × 10⁶ cells) was injected subcutaneously into the dorsal flank. Four weeks after injection, teratomas were surgically excised. Samples were fixed in 4% PFA/PBS for H&E staining and immunostaining.

Chimera Formation Assay

C57BL/6J diploid blastocysts (4.5 days after human chorionic gonadotropin injection) were placed in a drop of M16 medium (Sigma-Aldrich) under mineral oil. iPSCs derived from CD1 mice were injected into the blastocysts near the inner cell mass. Blastocysts were then cultured in M2 medium (Sigma-Aldrich) at 37°C until transfer to recipient females.

Microarray Assay

Total RNA was extracted with the SV Total RNA Isolation kit (Promega). Samples were sent to the core service center at Washington University for reverse transcription to cDNA and hybridization to MouseRef-8 arrays (Illumina, San Diego, CA). Data were analyzed using GeneSpring GX software (Agilent, Santa Clara, CA). The data have been deposited in the Gene Expression Omnibus (GEO) under accession number GSE28197.
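The Pearson correlation comparison of whole-array profiles (Fig. 3B) can be sketched in a few lines. The code assumes each sample is a vector of normalized, log-scale expression values listed in the same gene order. The values are toy numbers, not GSE28197 data, and GeneSpring's own normalization is not reproduced.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined (ZeroDivisionError) for a constant profile


def correlation_matrix(profiles):
    """Pairwise Pearson r for a dict mapping sample name -> expression vector.
    All vectors must list the same genes in the same order."""
    names = list(profiles)
    r = {(name, name): 1.0 for name in names}
    for a, b in combinations(names, 2):
        r[(a, b)] = r[(b, a)] = pearson(profiles[a], profiles[b])
    return r


# Toy log2 expression values for four genes (hypothetical, not GSE28197 data).
toy = {
    "MEF": [8.0, 2.0, 3.0, 9.0],
    "OKM": [6.0, 5.0, 6.0, 6.0],
    "iPSC": [3.0, 9.0, 9.0, 2.0],
}
for (a, b), value in sorted(correlation_matrix(toy).items()):
    if a < b:
        print(f"{a} vs {b}: r = {value:.3f}")
```

On Python 3.10 and later, `statistics.correlation(x, y)` computes the same coefficient.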

Bisulfite Sequencing Assay

Genomic DNA isolated from cells was treated with the CpGenome DNA modification kit (Chemicon) following the manufacturer's recommendations. The promoter regions of Nanog, Oct4, and Trpc1 were amplified by PCR, and the PCR products were inserted into pCR2.1-TOPO (Invitrogen). Single clones from each sample were sequenced with the M13 primer. Primers are listed in Supporting Information.
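Bisulfite treatment converts unmethylated cytosines to uracil, which is read as T after PCR, while methylated cytosines remain C. Each sequenced clone therefore yields one methylated or unmethylated call per CpG site. The sketch below summarizes such calls. It assumes the reads have already been aligned and reduced to one character per CpG, and the clone strings are hypothetical.

```python
def percent_methylated(clones):
    """Overall % methylated CpGs across sequenced clones.

    Each clone is a string with one character per CpG site in the amplicon:
    'M' if the C was retained after bisulfite conversion (methylated),
    'U' if it was read as T (unmethylated).
    """
    if not clones:
        raise ValueError("need at least one clone")
    n_sites = len(clones[0])
    if any(len(c) != n_sites for c in clones):
        raise ValueError("all clones must cover the same CpG sites")
    methylated = sum(c.count("M") for c in clones)
    return 100.0 * methylated / (n_sites * len(clones))


def per_site_methylation(clones):
    """% of clones methylated at each CpG position."""
    n = len(clones)
    return [100.0 * sum(c[i] == "M" for c in clones) / n for i in range(len(clones[0]))]


# Hypothetical clone calls for a four-CpG promoter fragment (illustration only).
clones = ["MMUU", "MUUU"]
print(percent_methylated(clones))    # 37.5
print(per_site_methylation(clones))  # [100.0, 50.0, 0.0, 0.0]
```

In a lollipop plot such as Fig. 2C, each string corresponds to one row of filled (M) and open (U) circles.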

Statistical Analysis

Means were calculated from at least three independent experiments, as indicated in the figure legends. In bar graphs, bars represent means, and error bars represent SEM or SD as specified in each figure legend. Statistical significance was assessed with a two-tailed Student's t test. Pearson's correlation coefficient was used to compare microarray expression profiles, and the remaining microarray analyses were performed with GeneSpring GX software.
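Because the legends report both SD and SEM, it helps to recall that SEM = SD/√n. The sketch below computes both, along with the two-sample Student's t statistic (pooled variance) behind the two-tailed test. It uses only the standard library. The measurements are hypothetical, and the p-value step is left to SciPy.

```python
from math import sqrt
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return stdev(values) / sqrt(len(values))


def student_t(a, b):
    """Two-sample Student's t statistic (pooled variance, equal variances assumed)
    and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical triplicate measurements (illustration only).
control, treated = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print("SD =", stdev(control), "SEM =", round(sem(control), 3))
t, df = student_t(control, treated)
print(f"t = {t:.3f}, df = {df}")
```

The two-tailed p-value for the same comparison is returned by `scipy.stats.ttest_ind(control, treated)`, whose defaults (`equal_var=True`, two-sided alternative) correspond to this classic Student's test.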


Somatic Cells—Intermediate-Phase Cells—Final-Phase iPSCs

To study the molecular mechanisms underlying iPSC induction, we hypothesized that iPSC induction could occur either through one-step induction by the four Yamanaka factors or through a stepwise process with at least two stable phases (Fig. 1A). If true, this hypothesis implies that stable colonies could be obtained by expressing only three reprogramming factors and that subsequent overexpression of the remaining factor in these colonies would convert them to iPSCs. To test this hypothesis, we examined whether iPSCs could be obtained from MEFs by a two-phase induction: first by overexpression of three reprogramming factors and then by overexpression of the remaining factor. Because Klf4 and c-Myc are expressed in fibroblasts and are not essential reprogramming factors [27], we tested only two three-factor combinations for the first phase, each omitting either Oct4 or Sox2: OKM and Sox2, Klf4, and c-Myc (SKM). Three weeks after infection, AP-positive colonies with ESC-like morphology were obtained only from the positive control group (OSKM) and the OKM group (Fig. 1B). A total of 72 single colonies were picked from the OKM group and cultured for more than 10 passages; four colonies that remained stable beyond 10 passages were characterized further. Cells in these colonies expressed SSEA-1 but not other ES markers such as Sox2 (one sample, OKM-SC42, is shown in Fig. 2A and Supporting Information Fig. S1). These cells could also form EB-like structures in the standard EB assay but failed to differentiate. These data confirmed the first part of our hypothesis: a stable intermediate phase can be achieved by expressing only three factors. Furthermore, because no stable colonies were observed in the SKM group, these results suggested that Oct4 overexpression must precede that of Sox2.
To test the second part of our hypothesis, we overexpressed Sox2 in the OKM cells. Two weeks after Sox2 overexpression, iPS-like cells were obtained from these OKM cells. Based on Rex-1 immunostaining, the reprogramming efficiency of this two-phase approach was greater than that of conventional simultaneous transduction with all four factors: 88 ± 3.2 colonies per 10,000 OKM cells versus 24 ± 3.4 colonies per 100,000 MEFs, respectively (approximately 0.88% vs. 0.024% of plated cells). To determine whether the cells reprogrammed by the two-phase method were bona fide iPSCs, we first used RT-PCR to examine the expression of ESC-specific markers in two lines of these cells. These cells expressed endogenous stem cell markers such as Oct4, Sox2, Eras, Esg1, Nanog, and Rex1 (Fig. 2A), and immunocytochemistry confirmed the presence of Nanog and SSEA-1 proteins (Fig. 2B). Bisulfite sequencing was performed to determine the methylation status of the Oct4 and Nanog promoters in OKM cells and in iPS-like cells derived from OKM cells after Sox2 overexpression (OKM-S). Both promoters were extensively methylated in OKM cells, whereas the same regions were widely demethylated in the iPS-like cells (Fig. 2C). Next, teratomas were produced by subcutaneous transplantation of these iPS-like cells (OKM-S) into nude mice, and histological examination showed that all three embryonic germ layers were present (Fig. 2D), a finding confirmed by immunostaining for germ layer-specific protein markers (Fig. 2E). To further assess pluripotency, iPS-like cells (OKM-S) generated by this two-phase procedure from CD1 mice (white coat) were injected into diploid blastocysts from C57BL/6J mice (black coat), resulting in viable, high-contribution chimeras (Fig. 2F). All of these results support classifying these stepwise-derived cells as iPSCs.
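The two efficiencies are reported per different numbers of plated cells (10,000 vs. 100,000), so normalizing both to a percentage makes the comparison easier to read. The sketch below uses only the reported means and ignores the ± values. It also compares yield per plated cell only: OKM cells have already passed the first phase, so the ratio says nothing about end-to-end yield from the original MEFs.

```python
def efficiency_percent(colonies: float, cells_plated: int) -> float:
    """Colonies obtained as a percentage of the cells plated."""
    return 100.0 * colonies / cells_plated


# Mean colony counts reported above (the +/- values are ignored here).
two_phase = efficiency_percent(88, 10_000)   # Rex-1+ colonies from OKM cells + Sox2
one_step = efficiency_percent(24, 100_000)   # Rex-1+ colonies from MEFs + OSKM
print(f"two-phase: {two_phase:.3f}%  one-step: {one_step:.3f}%  "
      f"ratio: {two_phase / one_step:.1f}x")
# two-phase: 0.880%  one-step: 0.024%  ratio: 36.7x
```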

Figure 1.

Identification and formation of stable intermediate-phase cells. (A): Diagram of our two-phase iPSC induction and the common induction process. (B): Comparison of morphology and AP staining of MEFs reprogrammed by OSKM, OKM, and SKM. The cells were fixed after 21 days of culture. Scale bar = 50 μm. Abbreviations: AP, alkaline phosphatase; iPSC, induced pluripotent stem cells; MEF, mouse embryonic fibroblast; OKM, Oct4, Klf4, and c-Myc; OSKM, Oct4, Sox2, Klf4, and c-Myc; SKM, Sox2, Klf4, and c-Myc.

Figure 2.

Pluripotency of induced pluripotent stem cells (iPSCs) generated by two-phase induction. (A): RT-PCR analysis of embryonic stem cell (ESC) molecular markers in OKM cells (OKM-MEF-SC42), two-phase induced iPSCs (OKM-S-MEF-SC3-1, 3-3), ESCs (R26), and MEF cells. (B): Immunostaining of Nanog and SSEA-1 in two-phase induced iPSCs (OKM-S-MEF-SC3-1). Scale bar = 100 μm. (C): Methylation patterns of Oct4 and Nanog promoters in cells at the stable intermediate phase (OKM-MEF-SC42) and the pluripotent phase (OKM-S-MEF-SC3-1). (D): H&E staining of tissues from a teratoma derived from OKM-S-MEF-SC3-1 cells. (E): Immunostaining of tissues from the same teratoma as in (D). Gata4: endoderm; βIII-tubulin: ectoderm; α-smooth muscle actin: mesoderm. Scale bar = 100 μm. (F): Chimeras formed by injection of OKM-S-MEF-SC3-1 iPSCs from the CD1 strain (white coat) into C57BL/6J embryos (black coat). Abbreviations: GAPDH, glyceraldehyde-3-phosphate dehydrogenase; MEF, mouse embryonic fibroblast; OKM, Oct4, Klf4, and c-Myc; RT-PCR, reverse-transcription polymerase chain reaction; SSEA, stage-specific embryonic antigen.

Confirmation of Two-Phase iPSC Induction

Our hypothesis was also supported by another experiment originally designed to address a different question: whether iPSCs are equivalent to ESCs in their capacity for self-renewal and differentiation. We began by comparing the growth and differentiation abilities of ESCs and iPSCs. There was no significant group difference in proliferation rate between ESCs (R26 and B6) and iPSCs (O9 and TT-O25) [10]. However, one cell line, TT-O25, consistently differed from the other cell lines (Supporting Information Fig. 2A-2F). To dissect the causes of this distinct phenotype, we examined the expression of the Yamanaka factors by quantitative RT-PCR (qRT-PCR). Interestingly, the expression level of each factor was dramatically higher in TT-O25 iPSCs (Supporting Information Fig. 2G). RT-PCR with virus-specific primers confirmed that reactivation of the viral transgenes contributed to this increase (Supporting Information Fig. 3A), and viral expression was significantly higher at late passages (beyond passage 30) of TT-O25 cells (Supporting Information Fig. 3B). To determine whether viral reactivation occurred uniformly for all four reprogramming factors, we isolated single-cell-derived subclones from the late passages and quantified the viral reactivation level in each; most subclones showed reactivation of all four viral reprogramming factors. Interestingly, three stable subclones out of 48 (6.25%) lacked viral Sox2 expression despite elevated expression of the other three factors (Supporting Information Fig. 3B). This unexpected finding was consistent with our hypothesis that a stable intermediate phase between fibroblasts and iPSCs can be achieved through the expression of only three reprogramming factors (Oct4, Klf4, and c-Myc).
Similar colonies were in fact present in the original studies [24, 32], although they were subsequently set apart by differences in their potential for reprogramming [21–23].

Next, we examined whether TT-O25 subclones overexpressing only Oct4, Klf4, and c-Myc could be induced to become iPSCs by Sox2 overexpression. Cells from one TT-O25 subclone (SC23) were transfected with either a control plasmid (pMax–green fluorescent protein) or a Sox2-expressing plasmid (pMax-Sox2). Acquisition of pluripotency (endogenous Oct4 activation) was monitored by G418 selection, taking advantage of the neomycin resistance (neo) gene inserted downstream of the Oct4 promoter in the TT-O25 cell line [10]. Numerous AP-positive colonies with endogenous Oct4 activation were obtained 2 weeks after Sox2 transfection, and a few colonies also appeared in the control plates (Supporting Information Fig. 4A, 4B). Colonies were isolated, expanded, and cultured for five passages after Sox2 overexpression. They showed the same methylation pattern on the Nanog promoter as ESCs and differentiated into cells representing all three germ layers in the EB assay (Supporting Information Fig. 4C, 4D).

Analysis of Sox2 Downstream Gene Expression

Because stable OKM cells could be induced to become iPSCs by overexpression of the remaining Yamanaka factor, Sox2, this two-phase process provided an opportunity to dissect the Sox2 molecular pathways underlying iPSC induction. Gene expression microarrays were used to compare global expression patterns (25,696 genes in total) among MEFs, OKM cells, iPSCs (OKM-S cells), and ESCs. These data showed that the OKM cells (OKM-CD1), iPSCs (OKM-S-CD1), and ESCs (R26) all differed from MEFs in their global gene expression patterns (Fig. 3A). Pearson correlation analysis of the transcriptional profiles showed that the expression profile of stable OKM cells lay between those of MEFs and iPSCs (Fig. 3B). By comparing expression patterns between MEFs and OKM cells, we identified genes regulated by the combination of Oct4, Klf4, and c-Myc (Supporting Information Table 1), including upregulated pluripotency genes such as Dppa4 and Dppa5. By further comparing the expression profiles of OKM cells and the iPSCs subsequently derived from them, focusing on transcription factors and master regulators of pluripotency and proliferation, we identified candidate Sox2 downstream genes (Table 1 and Supporting Information Table 2). Some of the identified genes had previously been shown to be unable to replace Sox2 in converting somatic cells into iPSCs when combined with the other three reprogramming factors [11, 32–36]. The remaining five genes were Nr6a1, Tgfbr1, Tgfb2, Zfp371, and Zfp459. Tgfbr1 and Tgfb2 belong to the transforming growth factor β (TGF-β) signaling pathway, which is important for the mesenchymal-to-epithelial transition, a process recently shown to play a key role in the initiation of MEF reprogramming [37, 38]. Nr6a1, Zfp371, and Zfp459 are transcription factors possibly involved in ESC pluripotency and self-renewal [39].
To confirm the microarray findings, we performed qRT-PCR for these five genes and for Nanog and Dnmt3b, both of which are implicated in the maintenance of ESC pluripotency. These assays confirmed the microarray data, showing downregulation of Tgfbr1 and Tgfb2 and upregulation of the other candidate genes during the transition from the stable phase to iPSCs after Sox2 overexpression (Fig. 3C-3I).
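The text does not state the cutoff used to call genes as changed between OKM cells and OKM-S iPSCs. The sketch below shows one common first-pass filter, a fold-change threshold on normalized linear-scale expression. The 2-fold cutoff and the gene names are assumptions for illustration. It does not model the additional annotation-based focus on transcription factors and master regulators, and it applies no statistical test.

```python
from math import log2


def candidate_genes(before, after, min_fold=2.0):
    """Genes whose expression changes at least `min_fold` between two states.

    `before` and `after` map gene -> normalized linear-scale expression
    (for example, OKM cells vs. OKM-S iPSCs). Returns gene -> log2(after/before)
    for genes passing the cutoff in either direction.
    """
    cutoff = log2(min_fold)
    hits = {}
    for gene, b in before.items():
        a = after.get(gene)
        if a is None or a <= 0 or b <= 0:
            continue  # skip genes missing or non-positive in either state
        lfc = log2(a / b)
        if abs(lfc) >= cutoff:
            hits[gene] = lfc
    return hits


# Hypothetical values for illustration (gene names are placeholders).
okm = {"geneA": 100.0, "geneB": 100.0, "geneC": 100.0}
okm_s = {"geneA": 400.0, "geneB": 120.0, "geneC": 25.0}
print(candidate_genes(okm, okm_s))  # {'geneA': 2.0, 'geneC': -2.0}
```

Positive log2 values correspond to Sox2-associated upregulation (as for Nr6a1 or Zfp459), and negative values to downregulation (as for Tgfbr1 and Tgfb2).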

Figure 3.

Gene expression differences among MEF, intermediate-phase cells, induced pluripotent stem cells (iPSCs), and embryonic stem cells (ESCs). (A): Microarray heat map matrix of MEFs, OKM cells (OKM-CD1), two-phase induced iPSCs (OKM-S-CD1), and ESCs (R26). n = 4. (B): Pearson correlation coefficients among MEFs, OKM cells, two-phase induced iPSCs (OKM-S-CD1), and ESCs. (C–I): Quantitative RT-PCR analysis of possible Sox2 downstream genes (C: Dnmt3b; D: Nanog; E: Nr6a1; F: Tgfb2; G: Tgfbr1; H: Zfp371; I: Zfp459). Average relative expression levels and standard deviations from three quantitative RT-PCR reactions are shown. Abbreviations: GAPDH, glyceraldehyde-3-phosphate dehydrogenase; MEF, mouse embryonic fibroblast; OKM, Oct4, Klf4, and c-Myc; RT-PCR, reverse-transcription polymerase chain reaction.

Table 1. Sox2-induced gene regulation in intermediate-phase cells

Temporal Expression Patterns of Sox2-Regulated Genes

Taking advantage of the stability of OKM cells and their capacity for conversion to iPSCs by Sox2, we further examined the temporal relationship between Sox2 overexpression and the expression of the same seven downstream genes. Gene expression levels were quantified at 12, 24, 48, and 96 hours after Sox2 transduction and compared with those of OKM cells and two-phase-derived iPSCs (OKM-Sox2, over 10 passages). Three types of expression patterns were observed: (1) expression of Dnmt3b and Zfp459 increased significantly by 12 hours after Sox2 overexpression, suggesting that these two genes may be immediate downstream targets of Sox2 (Fig. 4A); (2) expression of Nr6a1 and Zfp371 increased significantly within 24 hours, whereas expression of Tgfb2 and Tgfbr1 decreased significantly within 48 hours after Sox2 overexpression (Fig. 4B); and (3) Nanog expression did not increase within 96 hours of Sox2 overexpression, indicating that the Nanog upregulation observed in OKM-Sox2 cells beyond 10 passages occurs later (Fig. 4C).
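The three patterns above amount to asking, for each gene, at which sampled time point its expression first departs from the OKM baseline. The sketch below encodes that idea with a simple fold-change rule. The authors classified changes by statistical significance on triplicates, not by a fixed fold cutoff, so the 2-fold threshold and all values here are assumptions for illustration.

```python
def earliest_response(timecourse, baseline, min_fold=2.0):
    """First time point (hours) at which expression differs from `baseline`
    by at least `min_fold` in either direction, or None if it never does.

    `timecourse` is a list of (hours, expression) pairs sorted by time.
    """
    for hours, value in timecourse:
        ratio = value / baseline
        if ratio >= min_fold or ratio <= 1.0 / min_fold:
            return hours
    return None


# Hypothetical relative-expression values (OKM baseline = 1.0), illustration only.
up_early = [(12, 2.4), (24, 3.1), (48, 3.5), (96, 3.8)]
down_late = [(12, 0.9), (24, 0.8), (48, 0.4), (96, 0.3)]
flat = [(12, 1.0), (24, 1.1), (48, 0.9), (96, 1.2)]
print(earliest_response(up_early, 1.0))   # 12
print(earliest_response(down_late, 1.0))  # 48
print(earliest_response(flat, 1.0))       # None
```

The three toy series loosely mirror the early-up (Dnmt3b, Zfp459), later-down (Tgfb2, Tgfbr1), and no-change-within-96-hours (Nanog) categories.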

Figure 4.

Temporal expression patterns of Sox2-regulated genes. (A): Quantitative RT-PCR (qRT-PCR) analysis of Dnmt3b and Zfp459 12, 24, 48, and 96 hours after the Sox2 transduction with two control groups, OKM cells (OKM) and two-phase derived induced pluripotent stem cells (iPSCs; OKM-SOX2 over 10 passages). Data are means ± SEM (n = 3). (B): qRT-PCR analysis of Nr6a1, Tgfb2, Tgfbr1, and Zfp371 12, 24, 48, and 96 hours after the Sox2 transduction. Data are means ± SEM (n = 3). (C): qRT-PCR analysis of Nanog 12, 24, 48, and 96 hours after the Sox2 transduction. Data are means ± SEM (n = 3). (D): Bisulfite sequencing results of methylation patterns on Trpc1 promoter before and after Sox2 transduction. (E): qRT-PCR analysis of Trpc1 at 12, 24, 48, and 96 hours after the Sox2 transduction. Data are means ± SEM (n = 3). OKM represents the intermediate-phase cells, and OKM-Sox2 represents the iPSCs isolated after overexpression of Sox2. Abbreviations: GAPDH, glyceraldehyde-3-phosphate dehydrogenase; MEF, mouse embryonic fibroblast; OKM, Oct4, Klf4, and c-Myc; RT-PCR, reverse-transcription polymerase chain reaction.

If Dnmt3b, a de novo DNA methyltransferase, is indeed an immediate downstream target of Sox2, genes whose promoters become methylated during the transition from OKM cells to iPSCs might be downregulated soon after Sox2 overexpression. Based on previous studies [33], we selected one such gene, Trpc1, as a test case. As hypothesized, the Trpc1 promoter became methylated during the transition (Fig. 4D), and Trpc1 expression decreased rapidly after Sox2 overexpression (Fig. 4E). Thus, differentiation-associated genes may become methylated during iPSC induction, a process that Sox2 might regulate directly through Dnmt3b or through cooperation between Dnmt3b and other genes.

One Critical Sox2 Signaling Pathway for Late-Phase iPSC Reprogramming

Given our microarray data, we reasoned that if the genes identified above truly belonged to the Sox2 downstream signaling essential for iPSC induction, one or a combination of them should be able to replace Sox2. We excluded Dnmt3b because earlier studies showed that it cannot substitute for Sox2 [32, 33]. Because Tgfbr1 and Tgfb2 were downregulated rather than upregulated after Sox2 overexpression, we replaced them with Smad7, which blocks TGF-β signaling and therefore mimics the effect of downregulating Tgfbr1 and Tgfb2 [37, 40]. Immunostaining for Rex-1, a specific marker of pluripotent stem cells [41], was used to screen for iPS-like cells. We introduced all five genes (Smad7, Nanog, Nr6a1, Zfp371, and Zfp459) into OKM cells by retroviral transduction (OKM-5F), which generated many more Rex-1-positive colonies than either MEFs directly transduced with the four reprogramming factors (OSKM) or OKM cells transduced with Sox2 (OKM-S; Fig. 5A). Thus, one or more of these five factors were sufficient to replace Sox2 in reprogramming OKM cells into iPSCs. To narrow down the key factors, we applied the negative selection (factor withdrawal) method established by Takahashi and Yamanaka [32], examining how withdrawal of each individual factor from the five-gene pool affected the formation of Rex-1-positive colonies. Only withdrawal of Smad7 (−Smad7) abolished colony formation 3 weeks after transduction (Fig. 5A). This finding suggested that Smad7 is the key factor able to replace Sox2 in the reprogramming process, and it also indicated that the intermediate-phase cells were at a stable stage rather than simply reprogramming at a slow pace. To confirm these results, we infected OKM cells with Smad7 alone. As expected, Rex-1-positive colonies were obtained, although fewer than from OKM cells transduced with all five factors.
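The negative selection logic can be summarized in code. Each factor is withdrawn in turn from the pool, and a factor is flagged as required when its withdrawal abolishes colony formation. The function name and all counts below are placeholders, not the study's data. Only the zero for Smad7 mirrors the qualitative result reported above.

```python
def withdrawal_screen(full_pool_colonies, colonies_without):
    """Summarize a factor-withdrawal (negative selection) screen.

    `colonies_without` maps each factor to the colony count obtained when that
    single factor is omitted from the pool. Returns (fraction of full-pool
    colonies retained per factor, factors whose withdrawal abolishes colonies).
    """
    retained = {f: n / full_pool_colonies for f, n in colonies_without.items()}
    required = [f for f, n in colonies_without.items() if n == 0]
    return retained, required


# Placeholder counts, NOT the study's data; only the zero for Smad7 mirrors the
# qualitative result reported above (no colonies after Smad7 withdrawal).
counts = {"Smad7": 0, "Nanog": 40, "Nr6a1": 45, "Zfp371": 38, "Zfp459": 42}
retained, required = withdrawal_screen(50, counts)
print(required)  # ['Smad7']
```

A factor that is necessary within the pool is not automatically sufficient on its own. For that reason, the screen is followed by transducing the candidate (here Smad7) alone.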
The pluripotency of colonies derived from OKM cells transduced with Smad7 alone was further examined in two cell lines, OKM-Smad7-SC16 and OKM-Smad7-SC24. As in ESCs, endogenous pluripotency markers such as Oct4, Sox2, and Nanog were expressed in these cells (Fig. 5B), and in the EB assay, RT-PCR and immunostaining detected markers of all three germ layers (Fig. 5C, 5D). Thus, using a simple microarray-based approach, we identified a signaling pathway downstream of Sox2 that is essential to the second phase of iPSC induction: downregulation of TGF-β signaling, a role notably different from that of TGF-β signaling in the initial phase of iPSC induction [37, 38, 42].

Figure 5.

TGF-β pathway as the critical Sox2 signaling pathway during reprogramming. (A): Numbers of Rex-1-positive colonies obtained from three independent reprogramming screens. Data are means ± SEM (n = 3). (B): RT-PCR analysis of embryonic stem cell molecular markers in OKM-Smad7 induced pluripotent stem cells (iPSCs). (C): RT-PCR analysis of markers for the three germ layers in embryoid bodies formed by OKM-Smad7 iPSCs. AFP: endoderm; FGF5: ectoderm; Brachyury: mesoderm. (D): Immunofluorescence staining for markers of the three germ layers in cells derived from OKM-Smad7 iPSCs. Gata4: endoderm; βIII-tubulin: ectoderm; α-smooth muscle actin: mesoderm. Scale bar = 20 μm. Abbreviations: GAPDH, glyceraldehyde-3-phosphate dehydrogenase; MEF, mouse embryonic fibroblast; OKM, Oct4, Klf4, and c-Myc; OSKM, Oct4, Sox2, Klf4, and c-Myc.


Our studies have defined a two-phase iPSC induction process, demonstrating that transduction of Oct4, Klf4, and c-Myc without Sox2 can lead to a stable intermediate phase. In addition, we found that cells in this stable phase can be induced to form iPSCs through overexpression of Sox2. This clear delineation of stepwise iPSC induction gave us a unique opportunity to identify the genes downstream of Sox2 during iPSC induction (Fig. 6). This identification, in turn, allowed us to discover that Sox2-mediated downregulation of TGF-β signaling is critical for exit from the intermediate phase and the initiation of pluripotency in somatic cells. Although the MEFs we used for this two-phase induction are differentiated cells, they express genes involved in the reprogramming process, such as Klf4 and c-Myc. Previous studies have identified possible intermediate cells derived from other cell types, such as B cells [20, 22], although these intermediate cells were not characterized in detail. Thus, this two-phase induction process may be feasible for many other cell types.

Figure 6.

Molecular network of the Sox2-induced reprogramming process. (A): Schematic representation of the two-phase reprogramming process. (B): Schematic representation of Sox2 downstream genes important for the second reprogramming phase. Green arrows indicate promotion; red arrows indicate inhibition.

Despite significant attention from many researchers, the exact mechanisms by which Yamanaka factors lead to iPSC induction have remained difficult to dissect [20, 43, 44]. Among the four factors, c-Myc is not necessary for reprogramming somatic cells into iPSCs [45, 46]. Furthermore, recent studies have shown that both mouse and human neural stem cells can be reprogrammed to iPSCs with Oct4 and Klf4, or with Oct4 alone [47–49]. Notably, neural stem cells express endogenous Sox2, which suggests that Oct4 and Sox2 are sufficient for iPSC induction, whereas Klf4 and c-Myc may enhance its efficiency. Consistent with this hypothesis, human iPSCs were generated from fibroblasts through overexpression of only Oct4 and Sox2 plus valproic acid [27]. Although the key factors have been identified, the mechanisms by which Oct4 and Sox2 reprogram somatic cells into iPSCs remain unknown.

Using an elegant transgenic system with doxycycline-inducible expression of the four Yamanaka factors in MEFs, several studies have defined the sequential expression of pluripotency markers during iPSC induction [28–30]. These studies have also revealed the stochastic nature of this reprogramming process: cells between the initial MEFs and the final iPSCs form highly heterogeneous reprogramming populations, which complicates epigenetic and gene expression analyses. In this study, we synchronized these heterogeneous reprogramming cells by establishing a stable intermediate phase, providing a means for temporal studies of gene expression. We identified immediate (within 12 hours), intermediate (within 48 hours), and late (after 96 hours) downstream genes regulated by Sox2. Sox2 most likely exerts direct control over expression of the immediate downstream genes (Dnmt3b and Zfp459), whereas the intermediate and late downstream genes are likely regulated indirectly by Sox2 or by Sox2 in cooperation with other transcription factors. In support of this view, chromatin immunoprecipitation-sequencing (ChIP-seq) studies show that Sox2 can bind directly to the Dnmt3b promoter [50, 51], and Zfp459 expression decreases dramatically after Sox2 knockout [52]. Our data also reveal that the methylation process begins immediately after Sox2 overexpression, whereas induction of endogenous Nanog begins much later. Although loss of DNA methylation in the late reprogramming phase, such as that at the Oct4 and Nanog promoters, is well established [22], our study suggests a role for Sox2 in increasing DNA methylation to suppress the expression of differentiation genes during the second phase of reprogramming. This increase in DNA methylation at specific genes could be at least partially due to Sox2-induced upregulation of Dnmt3b.

The important role of Sox2 in the second phase of reprogramming is also consistent with the main finding of a recent study that identified a small molecule (RepSox) able to replace Sox2 in reprogramming [24]. In agreement with our work, that study showed that inhibition of TGF-β signaling is critical for RepSox to replace Sox2 in iPSC induction. One main difference between that study and ours is that we were unable to replace Sox2 with Nanog. Although Nanog was able to replace Sox2 in their work [24], cells in the colonies obtained by Nanog overexpression showed limited differentiation ability. Our results are in line with recent findings showing late induction of Nanog, even after overexpression of all four Yamanaka factors [38]. In addition, previous studies have reported that Nanog cannot replace Sox2 function during reprogramming [11, 32]. Therefore, Nanog plays a minimal role, if any, in the reprogramming initiation phase.

Although our data have clearly shown that downregulation of TGF-β signaling is critical for Sox2-driven iPSC induction in the second phase of reprogramming, three recent studies have indicated an important role for TGF-β or bone morphogenetic protein signaling in the early phases of reprogramming [37, 38, 42]. One explanation for this apparent discrepancy could lie in the different experimental approaches used: in those studies, Sox2 overexpression in combination with the other Yamanaka factors defined the beginning phase, whereas we overexpressed Sox2 in cells that had already been taken to an intermediate phase by initial treatment with the other Yamanaka factors. Our data are consistent with the ability of RepSox or Alk5 inhibitors to replace Sox2 in the late induction phase [24, 42], further supporting the necessity of downregulating TGF-β signaling in this phase. Another possible explanation, therefore, is that TGF-β signaling functions in both the early and late phases of iPSC induction.


Identifying the downstream genes of Yamanaka factors during iPSC induction is a critical first step toward understanding how these transcriptional regulators reverse the unidirectional differentiation pathway that occurs naturally during development. Our results suggest that a two-phase iPSC induction process can provide valuable insight into cellular reprogramming directed by Yamanaka factors. Using this method, we identified downstream genes regulated by Sox2 in the second reprogramming phase and demonstrated the importance of Sox2-regulated TGF-β signaling factors in this phase. Similar approaches could be applied to reveal the role of Oct4 in the first phase of reprogramming and to further elucidate the mechanisms underlying reprogramming by Yamanaka factors.


We thank Marius Wernig and Rudolf Jaenisch for providing O9 and TT-O25 iPSC lines; Seth Crosby at the Genome Technology Access Center of Washington University for the gene expression microarray analysis; Mingjie Li at the P30 Neuroscience Blueprint Viral Vector Core for viral production; June Ho Shin for the initial characterization of iPSCs; Drew Michael for data mining; Jing Zhao for the generation of chimeras; and Gerald Schatten, Peter Andrews, and Matthew Holley for critical review of the manuscript. J.B. was supported by National Institutes of Health grants R01-AG024250 and R21-DC010489 and by the Model Animal Research Center of Nanjing University.


The authors indicate no potential conflicts of interest.