Stem Cell and Developmental Biology, Genome Institute of Singapore, Singapore
School of Biological Sciences, Nanyang Technological University, Singapore
Department of Biological Sciences, National University of Singapore, Singapore
Correspondence: Lawrence W. Stanton, Ph.D., Genome Institute of Singapore, 60 Biopolis Street #02-01, 138672, Singapore. Telephone: +65-68088006; Fax: +65-68088291; e-mail: firstname.lastname@example.org; or Prasanna R Kolatkar, Ph.D., Qatar Biomedical Research Institute, P.O. Box 5825, Doha, Qatar. Telephone: +974 4454 5889; Fax: +65–68088291; e-mail: email@example.com
Author contributions: I.A.: conception and design, collection and/or assembly of data, data analysis and interpretation, manuscript writing, and final approval of manuscript; R.J.: conception and design, manuscript writing, and final approval of manuscript; V.E., A.C.W.B., J.C., U.D., and C.K.L.N.: collection of data; P.K. and L.W.S.: financial support, administrative support, data analysis and interpretation, manuscript writing, final approval of manuscript. I.A. and R.J. contributed equally to this article.
The unique ability of Sox2 to cooperate with Oct4 at selective binding sites in the genome is critical for reprogramming somatic cells into induced pluripotent stem cells (iPSCs). We have recently demonstrated that Sox17 can be converted into a reprogramming factor by alteration of a single amino acid (Sox17EK) within its DNA binding HMG domain. Here we expanded this study by introducing analogous mutations into 10 other Sox proteins and interrogated the role of the N- and C-termini in reprogramming efficiency. We found that point-mutated Sox7 and Sox17 can convert human and mouse fibroblasts into iPSCs, but Sox4, Sox5, Sox6, Sox8, Sox9, Sox11, Sox12, Sox13, and Sox18 cannot. Next we studied regions outside the HMG domain and found that the C-terminal transactivation domain of Sox17 and Sox7 enhances the potency of Sox2 in iPSC assays and confers weak reprogramming potential to the otherwise inactive Sox4EK and Sox18EK proteins. These results suggest that the glutamate (E) to lysine (K) mutation in the HMG domain is necessary but insufficient to swap the function of Sox factors. Moreover, the HMG domain alone fused to the VP16 transactivation domain is able to induce reprogramming, albeit at low efficiency. By molecular dissection of the C-terminus of Sox17, we found that the β-catenin interaction region contributes to the enhanced reprogramming efficiency of Sox17EK. To mechanistically understand the enhanced reprogramming potential of Sox17EK, we analyzed ChIP-sequencing and expression data and identified a subset of candidate genes specifically regulated by Sox17EK and not by Sox2. Stem Cells 2013;31:2632–2646
Induced pluripotent stem cells (iPSCs) can be generated by direct reprogramming of somatic cells via ectopic expression of defined transcription factors (TFs). Oct4, Sox2, c-Myc, and Klf4 were the first combination of factors shown to induce reprogramming. c-Myc and Klf4 can be omitted or replaced without drastic loss of the reprogramming efficiency [2-5]. However, the Sox2/Oct4 pair appears fundamentally important; without these two factors reprogramming can only be achieved with very low efficiency [4-8]. Sox2 and Oct4 belong to the Sox and POU (Pit-Oct-Unc) families of TFs, respectively [9-12]. Members of these two families have been shown to interact with each other and play key regulatory roles in embryonic development and directing cellular fate [13-22].
The SRY-related high-mobility-group box (Sox) TFs were identified and named based on their homology to Sry. They are widely recognized as one of the most important classes of TFs involved in embryonic development and cellular fate determination. In mice and humans, there are 20 Sox genes, classified into 8 different groups (SoxA–SoxH) based on their amino acid sequence homologies [9, 12, 16, 25]. All Sox TFs are composed of an N-terminal region, a 79-amino acid high-mobility group (HMG)-box domain, and a C-terminus containing either a transactivation or transrepression function. Sox members belonging to the same group have a high degree of identity (∼70%–95%) both within and outside the HMG-box domain. In contrast, Sox proteins from different groups share only partial identity (≥46%) in the HMG-box domain. All Sox proteins recognize and bind a related DNA sequence motif similar to CTTTG [26, 27]. Unconventionally, the HMG domain binds the minor groove of the DNA and induces a kink in the DNA helix [28-31]. Besides its role in DNA binding and bending, the HMG domain also mediates protein–protein interaction [17, 30, 32] and nuclear localization. Because Sox TFs have similar DNA binding specificities, their ability to trigger specific biological processes is thought to be mediated by selective interactions with cofactors. Indeed, Sox2 cooperates with Oct4 to induce pluripotency [34-37] but also partners with Brn2 in neural development.
In a previous report, we demonstrated that interchanging a single amino acid within the HMG domain of Sox2 (creating the Sox2KE mutant) and Sox17 (creating the Sox17EK mutant) dramatically swapped their biological functions. Using iPSC induction as an assay for pluripotency-inducing capability, we showed with wild-type and point-mutated versions of Sox2 and Sox17 that the re-engineered Sox17EK can replace Sox2 in reprogramming differentiated cells with high efficiency. Not only did the Sox17EK mutant give rise to iPSCs, it also did so at threefold to fivefold greater efficiency than wild-type Sox2. The results demonstrated that the differential abilities of Sox2 and Sox17 to drive cell fates could be swapped by a single amino acid substitution in the DNA binding/Oct interaction domains. We also demonstrated that the genomic redistribution of Sox/Oct4 dimers by alternative partnering with Sox2 and Sox17 is a fundamental regulatory event of pluripotency and endodermal specification, as these partnerships involve specific DNA binding sequences: the canonical motif for Oct4/Sox2 and the compressed motif for Oct4/Sox17 [13, 40]. To build upon our previous findings, we sought to examine other members of the Sox family to determine whether they too can be converted to reprogramming factors. To better understand how Sox TFs gain specific functions after re-engineering, we further studied the importance of the HMG and C-terminal transactivation domains in the reprogramming context. Moreover, we showed by genome-wide expression analysis in pluripotent embryonic stem cells (ESCs) that the re-engineered Sox17 factor and Sox2 regulate specific subsets of target genes. These results provide additional insights into the reprogramming mechanisms and a molecular dissection of the Sox family.
Materials and Methods
Mutagenesis and Cloning in the pMXs-GW Vector
The Sox library was prepared by introducing full-length coding sequences into the Gateway pDONR221 (Invitrogen, Carlsbad, CA, http://www.invitrogen.com) plasmid by Gateway BP cloning. Amino acid substitutions were introduced using the QuikChange-XL site-directed mutagenesis kit (Stratagene, La Jolla, CA, http://www.stratagene.com) with the DNA oligos listed in Supporting Information Table S1, which also includes the oligos for the chimeric mutants. Sequencing was performed to verify successful mutagenesis. All wild-type and mutated genes were then cloned into the pMXs-Gateway (pMXs-GW) vector by using the Gateway technology (Gateway LR Clonase Enzyme Mix from Invitrogen) according to the manufacturer's instructions. The pMXs-GW vector was generated from the pMXs vector by using the Gateway vector conversion system from Invitrogen.
hAd-MSCs (Invitrogen) were grown in Dulbecco's modified Eagle's medium (DMEM)/F12, 10% fetal bovine serum (FBS), 2 mM L-glutamine, 1 × 10⁻⁴ M nonessential amino acids, and 10 ng/mL of basic fibroblast growth factor (Invitrogen) on 0.1% gelatin-coated plates. Cells were used for reprogramming after two to four passages. Human iPSCs and ESCs (H9) were grown in feeder-free conditions on Matrigel (BD Biosciences, San Diego, CA, http://www.bdbiosciences.com) in mTeSR1 medium (Stem Cell Technologies, Vancouver, BC, Canada, http://www.stemcell.com) according to the manufacturer's instructions. Mouse ESCs (E14) and iPSCs were grown in DMEM supplemented with 15% ES-FBS (Invitrogen), 2 mM L-glutamine, 1 × 10⁻⁴ M nonessential amino acids, 1 × 10⁻⁴ M 2-mercaptoethanol (Invitrogen), 0.5% penicillin and streptomycin, and 1,000 U/mL of LIF.
MEFs were reprogrammed to iPSCs by retroviral transduction of TFs following established protocols. Human Ad-MSCs were reprogrammed to iPSCs by retroviral transduction of TFs following an established protocol. Methods are described in detail in the Supporting Information Methods.
Expression of marker genes in iPSC clones was measured by quantitative RT-PCR (qRT-PCR). Total RNA was extracted in Trizol (Invitrogen) and purified using the RNeasy Mini Kit with DNaseI treatment (Qiagen, Hilden, Germany, http://www1.qiagen.com) to remove contaminating genomic DNA. Synthesis of cDNA was performed using the High Capacity cDNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA, http://www.appliedbiosystems.com). qRT-PCR was performed using specific primers (listed in Supporting Information Table S2) and SYBR Green mix on the ABI 7900HT Real-Time PCR System.
Cells were plated on ethanol-treated coverslips, fixed with 4% paraformaldehyde in phosphate-buffered saline at 4°C, and incubated with Tris-buffered saline containing 5% serum, 1% bovine serum albumin (Sigma-Aldrich, St. Louis, http://www.sigmaaldrich.com), and 0.1% Triton X-100 (Sigma-Aldrich) for 45 minutes at room temperature. Primary antibodies against mouse SSEA-1 (MAB4301, Millipore, Billerica, MA, http://www.millipore.com) and Nanog (SC-30328, Santa Cruz Biotechnology Inc., Santa Cruz, CA, http://www.scbt.com) and human OCT3/4 (SC-5279), NANOG (SC-30331), SSEA4 (MAB4304), and TRA1–60 (MAB4360) were incubated with fixed cells overnight at 4°C, and cells were subsequently stained with secondary antibody conjugated to Alexa Fluor 594 or 488 (Molecular Probes, Eugene, OR, http://probes.invitrogen.com) for 1 hour at room temperature in the dark. Images were captured using a ZEISS Axio Observer D1 inverted fluorescence microscope (Carl Zeiss International, Jena, Germany, http://www.zeiss.com).
Detection of Alkaline Phosphatase Activity
Dishes were fixed in methanol for 15 minutes and then stained for 15 minutes with a solution containing 1 mg/mL Fast Red TR salt (Sigma-Aldrich) dissolved in 0.1 M Tris, pH 9.2, containing 200 mg/mL Naphthol AS-MX phosphate (Sigma-Aldrich).
A total of 1 × 10⁶ cells from each iPSC clone were injected intramuscularly into SCID mice. Three to five weeks after injection, tumors were dissected, weighed, and fixed at room temperature for 48 hours in Bouin's fixing solution, followed by bleaching in ammonium hydroxide in 70% ethanol overnight. Four-micron sections were obtained by using an automatic tissue processor and embedded in paraffin. Dried slides were then stained with Mallory's tetrachrome solutions, washed in ethanol, and mounted with Depex.
Generation of Chimeric Embryos
Chimeric embryos were generated using a protocol by Kraus and colleagues. Oct4-GFP iPSCs were first infected with a lentiviral vector (H2B-mCherry from Addgene, Mark Mercola lab) constitutively expressing the fluorescent protein mCherry. The virus was produced using the ViraPower lentiviral packaging mix from Invitrogen (K4975-00) according to the manufacturer's instructions. For infection, iPSCs were plated at a density of 50,000 cells in six-well plates with 5 mL of culture supernatant from 293FT virus-producer cells. After 72 hours, cells expressing mCherry were sorted by using a BD FACSAria sorter (Becton Dickinson, Franklin Lakes, NJ, http://www.bd.com) and plated on inactivated CF1 MEFs.
Generation of ESCs Stably Expressing Sox Factors
E14 mouse ESCs were cultured under feeder-free conditions in DMEM supplemented with 15% FBS (ES qualified, Invitrogen), 0.055 mM β-mercaptoethanol, 2 mM L-glutamine, 0.1 mM nonessential amino acids, 5,000 units/mL penicillin/streptomycin, and 1,000 U/mL of LIF. The cells were maintained at 37°C with 5% CO2. They were infected with the CSII-EFMCS-IRES-Venus lentiviral vector expressing Sox2, Sox2KE, Sox7, Sox7EK, Sox17, or Sox17EK. Lentiviral vectors were produced using the ViraPower lentiviral packaging mix from Invitrogen (K4975-00) according to the manufacturer's instructions. For infection, E14 cells were plated at a density of 50,000 cells in six-well plates with 5 mL of culture supernatant from 293FT cells. After 72 hours, cells expressing Venus were sorted by using a BD FACSAria sorter (Becton Dickinson), plated on inactivated CF1 MEFs for one passage, and expanded on gelatin-coated dishes for the self-renewal test and gene expression analysis.
Microarray Hybridization and Data Analysis
Mouse Ref-8 v2.0 Expression BeadChip microarrays (Illumina) were used for genome-wide expression analysis of coding genes. For hybridization on the Illumina arrays, cRNAs from duplicates or triplicates of each sample were synthesized and labeled using the TotalPrep RNA Amplification Kit (Ambion, Austin, TX, http://www.ambion.com), following the manufacturer's instructions. Scanned data from the BeadChip raw files for all samples were retrieved and background corrected using BeadStudio (Illumina), and subsequent analyses were completed in GeneSpring GX (Agilent, Palo Alto, CA, http://www.agilent.com). Data were normalized both within and between arrays and corrected for multiple testing according to Benjamini-Hochberg. We defined genes as significantly differentially expressed when the false discovery rate (FDR) was <0.05. Gene expression data from Schulz et al. were extracted from the EMBL-EBI ArrayExpress database using the following link: http://www.ebi.ac.uk/arrayexpress/experiments/E-TABM-672 and plotted using the raw data files.
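The Benjamini-Hochberg correction mentioned above can be illustrated with a minimal Python sketch. This is not the GeneSpring GX implementation used in the study; the function name and the example p-values are hypothetical and serve only to show how adjusted p-values are derived and compared with the 0.05 FDR threshold.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotonic.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-gene p-values (illustration only, not data from the study).
p = [0.001, 0.008, 0.039, 0.041, 0.6]
q = benjamini_hochberg(p)
# Genes are called significant when the adjusted value is below 0.05.
significant = [i for i, value in enumerate(q) if value < 0.05]
```

In this toy example only the first two genes pass the FDR cutoff, although four of the five raw p-values are below 0.05.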
ESCs were plated at a density of 1,000 cells per 10 cm culture dish and grown for 7 days in ES media containing different concentrations of LIF (100%, 60%, 20%, and 0%). Cells were then fixed and stained for alkaline phosphatase. Undifferentiated, mixed, and differentiated colonies were then counted under a bright-field microscope.
ChIP-sequencing (chromatin immunoprecipitation followed by sequencing) data were analyzed using published datasets (GEO [Gene Expression Omnibus] accession GSE43275). β-Catenin sites were defined by intersecting GSM1065517 (FLAG ChIP) and GSM1065518 (Biotin ChIP) using intersectBed. Sox2, Sox17EK, and Oct4 sites were defined as described previously, with some modifications. Peaks were retained when MACS (Model-based Analysis for ChIP-Seq) summits were found within 200 bp in replicate ChIP-seq experiments, as analyzed using bedtools (windowBed -w 200). Sox/Oct4 cobinding was assumed when MACS summits were within 200 bp, and the Venn diagram and gene category definitions were generated using a 100-bp summit distance criterion (windowBed -w 100 with the -u or -v options). Integration of ChIP-seq and microarray data was performed using glbase (https://bitbucket.org/oaxiom/glbase/wiki/Home). Genes were considered bound if a MACS summit was called within 50 kb of a RefSeq transcription start site. Intersection of gene lists was done using the glbase map method and RefSeq gene symbols.
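The two distance criteria described above (replicate summits within 200 bp, and a summit within 50 kb of a transcription start site) can be sketched in plain Python. This sketch only approximates what windowBed and glbase do; it treats summits as single positions, and all function names, coordinates, and gene names are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(summits):
    """Group summit positions by chromosome and sort them for binary search."""
    index = defaultdict(list)
    for chrom, pos in summits:
        index[chrom].append(pos)
    for positions in index.values():
        positions.sort()
    return index


def has_neighbor(index, chrom, pos, window):
    """True if any indexed summit on `chrom` lies within `window` bp of `pos`."""
    positions = index.get(chrom, [])
    i = bisect.bisect_left(positions, pos - window)
    return i < len(positions) and positions[i] <= pos + window


def reproducible_summits(rep1, rep2, window=200):
    """Keep replicate-1 summits that have a replicate-2 summit within `window` bp."""
    index = build_index(rep2)
    return [(c, p) for c, p in rep1 if has_neighbor(index, c, p, window)]


def bound_genes(tss_by_gene, summits, max_dist=50_000):
    """Return genes whose TSS lies within `max_dist` bp of any summit."""
    index = build_index(summits)
    return {g for g, (c, p) in tss_by_gene.items()
            if has_neighbor(index, c, p, max_dist)}


# Hypothetical summit coordinates and TSS positions (illustration only).
rep1 = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
rep2 = [("chr1", 1150), ("chr1", 5400), ("chr2", 280)]
kept = reproducible_summits(rep1, rep2)
tss = {"GeneA": ("chr1", 40000), "GeneB": ("chr1", 200000), "GeneC": ("chr3", 10)}
bound = bound_genes(tss, kept)
```

Here the chr1:5000 summit is dropped because its nearest replicate summit is 400 bp away, and only the hypothetical GeneA has a retained summit within 50 kb of its TSS.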
Point-Mutated Sox7 Is Able to Generate iPSCs with High Efficiency
We have previously shown that Sox17 can be re-engineered into a highly efficient pluripotency factor by replacing glutamate (E) at position 56 by lysine (K), which is found at the equivalent position in Sox2. Given these results, we sought to determine whether other Sox factors could be similarly re-engineered in their HMG domains to impart new biological functions. The Sox family of TFs comprises 20 members divided into 9 groups, which are based on homologies within their HMG domains (Fig. 1A). We selected representative members of all major Sox groups with distinguishing amino acids at the Oct4 interaction interface for this experiment. The SoxF group (Sox7, Sox17, and Sox18), the SoxE group (Sox8, Sox9, and Sox10), as well as the SoxC group (Sox4, Sox11, and Sox12) all harbor an acidic glutamate (E) at position 56; these were mutated into a lysine (K) to mimic the interaction chemistry of the Sox2 HMG domain (Fig. 1A). The SoxD group proteins (Sox5, Sox6, and Sox13) exhibit a dipeptide variation at positions 55 and 56 within the Oct4 recognition helix: glutamine-alanine (QA) instead of the AK found in Sox2. To thoroughly explore these residues, we generated single amino acid substitutions (A to K) as well as double mutants (QA to AK) (Fig. 1A). We did not test the other members of the SoxB1 group, Sox1 and Sox3, as they harbor a K at position 56 like Sox2 (Fig. 1A), and both of these factors are able to replace Sox2 for reprogramming, although at a lower efficiency. Wild-type and the rationally engineered Sox variants were introduced into retroviral vectors and tested for their ability to reprogram mouse embryo fibroblasts (MEFs), which express a green fluorescent protein (GFP) reporter controlled by an Oct4 promoter. In three independent reprogramming experiments, we found that none of the wild-type Sox factors were able to replace Sox2 as a pluripotency inducer (Fig. 1B). The Sox factors containing mutations in the HMG domains were also tested.
It was found that grafting Sox2-like residues into the Oct4 interaction surface did not confer to SoxC, D, and E family members the ability to generate iPSCs (Fig. 1B). However, Sox7, like Sox17 a member of the SoxF group, was successfully re-engineered. Moreover, Sox7EK produced five to seven times more GFP+ colonies than Sox2, and 1.5 times more than Sox17EK (Fig. 1B). We confirmed that the high efficiency of Sox7EK to generate iPSCs was not due to any variability in virus production and verified the silencing of the transgenes (Supporting Information Figs. S1, S2). A number of tests confirmed that the GFP+ colonies from Sox7EK were iPSCs: (a) the Sox7EK colonies were alkaline phosphatase positive (Supporting Information Fig. S3) and (b) real-time polymerase chain reaction (RT-PCR) analysis and immunostainings revealed that Sox7EK-derived colonies robustly express the pluripotency markers Nanog, Eras, Utf1, Zfp42, and SSEA1 (Fig. 1C, 1D). Pluripotency was further established by transplantation of Sox7EK iPSCs that express mCherry reporters into mouse embryos at the 2–4 cell stage. The resulting chimeric embryos established that Sox7EK iPSCs contributed to tissues derived from all three germ layers (Fig. 1E). Collectively, these data indicate that only the F-group proteins Sox7 and Sox17 can be converted into reprogramming factors with strategically placed mutations in the HMG domain that mediates Oct4 interaction.
Reprogramming of Human Cells by Sox7EK and Sox17EK
Sox TFs are highly conserved between species, so we tested whether Sox7EK and Sox17EK could also generate iPSCs from human somatic cells. The mouse Sox variants were used for this experiment, and the reprogramming assay was conducted with human adipose tissue-derived mesenchymal stem cells (hAd-MSCs). In two independent iPSC assays, we found that Sox7EK and Sox17EK accelerated the reprogramming of hAd-MSCs as compared to the original factor, Sox2. When cells were reprogrammed with Sox7EK or Sox17EK, the first ES-like colonies appeared at day 8 after transduction, whereas they did not appear until day 14 for Sox2 (Fig. 2A). Moreover, Sox7EK and Sox17EK produced 10–50 times more pluripotent colonies than wild-type Sox2 (Fig. 2A, 2B), as indicated by the number of alkaline phosphatase positive colonies. Human Sox7EK and Sox17EK iPSC clones expressed pluripotency markers at levels comparable to human ESCs (Fig. 2C, 2D). To determine the differentiation ability of human iPSCs in vitro, the cells were aggregated into embryoid bodies (EBs), and after 8 days in suspension culture the EBs were transferred to gelatin-coated plates for another 8 days. RT-PCR confirmed that these differentiated cells expressed markers of the three germ layers: AFP (α-fetoprotein) and hepatocyte nuclear factor 4α for the endoderm; Sox1 and glial fibrillary acidic protein for the ectoderm; and desmin and vimentin for the mesoderm lineages (Fig. 2E). Immunocytochemistry detected cells that were positive for βIII-tubulin (ectodermal marker), α-smooth muscle actin and vimentin (mesodermal markers), and AFP (endodermal marker) (Fig. 2F). Collectively, these findings establish that Sox7EK and Sox17EK can replace Sox2 to efficiently convert human somatic cells into iPSCs with faster reprogramming kinetics and higher yields.
The C-Terminal Domain of Sox7 and Sox17 Enhances the Efficiency of iPSC Generation
We showed that a single amino acid substitution in the HMG domain of Sox7 and Sox17 converts these endoderm-promoting factors into reprogramming factors. The HMG domains of Sox TFs play a role in protein–protein interactions, nuclear localization, and direct recruitment to cis-regulatory elements in the genome. Given its functional importance, we tested whether the HMG domain was sufficient to induce reprogramming. The wild-type and mutant HMG domains of Sox2, Sox7, and Sox17 were fused to VP16, a potent transactivation domain from the herpes simplex virus, and tested for their effects on reprogramming (Fig. 3A). The HMG-VP16 fusions of Sox2, Sox7EK, and Sox17EK, in cooperation with Oct4, c-Myc, and Klf4 (OCK), induced reprogramming (Fig. 3A), in contrast to the Sox2KE, Sox7, and Sox17 HMG-VP16 constructs. However, the number of GFP positive colonies obtained was much lower than with the respective full-length proteins, and the efficiency was similar in all three conditions: on average, only 16 colonies were obtained for the Sox2, 14 for the Sox7EK, and 17 for the Sox17EK HMG domains. These results indicate that Sox HMG domains are necessary, but that other portions of the Sox TFs likely contribute to their potent reprogramming activity.
While the amino acid sequences of Sox factors are highly conserved within the HMG domain, their sequences diverge considerably within the amino (N)- and carboxyl (C)-terminal regions. To explore how the termini of Sox2 and Sox17 influence the ability of these proteins to induce pluripotency, the N- and/or C-terminal regions of Sox2 and Sox17 were swapped to produce six chimeric proteins (Fig. 3B). These chimeric Sox factors were evaluated for reprogramming activity. Two independent iPSC experiments were done using Oct4-GFP MEFs, and GFP+ colonies were counted on day 21 (Fig. 3B). We found that neither the N- nor the C-terminus of Sox2 could induce pluripotency in the context of the wild-type Sox17-HMG domain (constructs Sox17N2, Sox17C2, and Sox17NC2). By contrast, the chimeras containing the C-terminus or both termini of Sox17 in conjunction with the Sox2 HMG domain produced sevenfold (Sox2C17) and fivefold (Sox2NC17) more colonies than wild-type Sox2 (Fig. 3B). The Sox2N17 chimera generated iPSC colonies at levels similar to wild-type Sox2, further supporting the importance of the Sox17 C-terminal region for the enhanced reprogramming efficiency. Since Sox7EK was also able to generate iPSCs with high efficiency, we tested chimeric constructs containing the Sox2 HMG domain with the Sox7 N-terminus (Sox2N7), C-terminus (Sox2C7), or both N- and C-termini (Sox2NC7) and observed an increase in the reprogramming efficiency compared to Sox2 (Fig. 3C). However, the Sox7 C-terminus was less potent than the Sox17 C-terminus in generating iPSCs, and we therefore decided to focus on Sox17 for further analysis.
Given that exchanging the highly divergent termini between Sox2 and Sox17 did not qualitatively affect their ability to induce pluripotency, these results highlight the importance of subtle variations within the amino acid sequence of the Oct4 recognition helix of the otherwise conserved HMG domain. Nevertheless, the C-terminal regions do affect the number of iPSC colonies produced, and the C-terminus of Sox17 appears to be more efficient. By contrast, the N-terminus of Sox17 is weakly detrimental to iPSC formation. More importantly, the HMG domain alone, when fused to VP16, is sufficient to induce reprogramming, albeit at very low efficiency compared to the full-length and chimeric constructs.
To further investigate the role of the C-terminus of Sox17, additional chimeric molecules were constructed harboring the HMG domain of Sox factors for which the single amino acid mutation was insufficient to convert them into reprogramming factors. Selected for this analysis were Sox18, the only F group Sox that could not be converted into a reprogramming factor with the E to K substitution, and the C group member Sox4. We replaced the C-termini of Sox18 and Sox4 with the corresponding regions of Sox2 or Sox17 (Fig. 3D). None of the constructs containing the wild-type Sox4 or Sox18 HMG domain gave rise to GFP+ colonies in the presence of the C-terminal region of Sox2 or Sox17 (Fig. 3D). However, when the C-terminus of Sox17 was attached to the point-mutated HMG domains (Sox18EKC17 and Sox4EKC17), iPSC colonies were produced, albeit with low efficiency. By contrast, the C-terminus of Sox2 was not able to transform Sox18EK or Sox4EK into pluripotency reprogramming factors (Fig. 3D).
These results indicate that the EK mutation is necessary but insufficient to convert Sox18 and Sox4 into pluripotency reprogramming factors. However, a combination of the E to K mutation and the potently transactivating C-terminal region of Sox17 bestows reprogramming activity on Sox4 and Sox18.
Sox17 Interaction Site with β-Catenin Promotes Reprogramming
Next we sought to determine which portion of the C-terminal region of Sox17 promotes efficient reprogramming. It has been reported that a 9-amino acid sequence [DKTEFEQYL] within the Sox17 C-terminal domain is involved in the transcriptional regulation of its target genes. Specifically, this region mediates the interaction of Sox17 with β-catenin, an effector of the canonical Wnt signaling pathway, to induce transcriptional activation. We tested the function of this region in a reprogramming assay by generating Sox17EK-ΔTAD, a mutant in which the C-terminal portion starting from the 9-amino acid β-catenin interaction site was deleted (Fig. 4A) [41, 43]. We observed a substantial decrease (∼10-fold) in the number of iPSC colonies generated with the Sox17EK-ΔTAD mutant compared to Sox17EK (Fig. 4B). Moreover, two to five times fewer GFP positive colonies were generated, as compared to wild-type Sox2. Another mutant (Sox17ΔN-HMG) harboring an intact C-terminal domain but missing the N-terminus and a portion of the HMG domain did not give rise to any colonies (Fig. 4B).
To further test whether the interaction of Sox17EK with β-catenin enhances reprogramming, we generated Sox17EK-3G and Sox17EK-ΔTA mutants that were previously reported to impair the interaction of Sox17 with β-catenin. In the Sox17EK-3G construct, we mutated the [EQY] sequence into [GGG], and for Sox17EK-ΔTA, we deleted the [EFEQYL] sequence from the β-catenin recognition motif (Fig. 4A). Both mutations markedly decreased the potential of Sox17EK to induce reprogramming compared to Sox2 (approximately 10 times less for Sox17EK-ΔTA and 27 times less for Sox17EK-3G) and Sox17EK (approximately 7 times less for Sox17EK-ΔTA and 20 times less for Sox17EK-3G) (Fig. 4B). As a further test of whether Wnt signaling contributes to the reprogramming potential of Sox17EK, we specifically inhibited β-catenin-mediated transcription by using XAV-939, an inhibitor of Wnt signaling, and found that inhibition of Wnt signaling strongly interferes with Sox17EK-mediated induction of pluripotency (Fig. 4B).
To test whether Sox17EK collaborates with β-catenin to execute a specific gene expression program, we compared our recently published Sox2/Oct4Sox2 and Sox17EK/Oct4Sox17EK ChIP-seq (chromatin immunoprecipitation followed by sequencing) data generated in mouse ESCs with β-catenin binding sites detected by Zhang et al. We found a large number of intersecting sites that account for 38% of all Sox17EK/Oct4Sox17EK binding sites (Fig. 4C). A subset of those sites is exclusively bound by β-catenin and Sox17EK/Oct4Sox17EK, raising the possibility that Sox17EK and β-catenin, but not Sox2, regulate those genes (Fig. 4C). These exclusive binding events could play a role in the more potent induction of pluripotency by Sox17EK as compared to Sox2. It is possible that Sox17EK enhances β-catenin recruitment to those sites. Alternatively, Sox17EK could install β-catenin binding at novel sites. However, the latter would be detectable only if β-catenin ChIP-seq were carried out after Sox17EK overexpression.
It was not surprising to see common binding sites between Sox2 and β-catenin, as it has been shown that a combinatorial binding pattern exists between β-catenin and the core pluripotency factors that include Sox2 and Oct4. We next asked whether Sox17EK binds and regulates target genes of the Wnt/β-catenin pathway. For this, we used gene expression data generated after Tcf3 knockdown in ESCs. Tcf3 is a downstream effector of the Wnt/β-catenin pathway and has been shown to predominantly repress Wnt target genes. Downregulation of Tcf3 in ESCs has been shown to increase the expression of pluripotency genes including Nanog, Oct4, Sox2, Lefty2, Nodal, and Dkk1. Therefore, to assess whether selective genomic binding by Sox17EK leads to the regulation of genes controlled by the Wnt/β-catenin pathway, we used our previously generated Sox17EK ChIP-seq data and assigned genes to binding sites if they mapped within 50 kb of the ChIP-seq summits. We compared the genome-wide occupancy of Sox17EK and also Sox2 to the gene expression differences induced after Tcf3 knockdown. Differentially expressed genes were ranked by log2 transformed fold expression changes and scanned for the occurrence of Sox2 and Sox17EK binding sites (Fig. 4D, 4E). A significant enrichment was observed for both Sox2 and Sox17EK near the TSS (transcription start site) of genes upregulated after Tcf3 knockdown, suggesting that binding correlates with changes in mRNA regulation (Fig. 4D, 4E). Interestingly, the occurrence of Sox17EK binding seems to be higher than that of Sox2 for the most highly upregulated genes. Collectively, these data suggest that β-catenin interaction mediated by the C-terminus contributes to the potent reprogramming potential of Sox17EK.
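The ranking-and-scanning step described above can be illustrated with a short Python sketch. The exact scanning procedure is not specified in the text, so the sliding window used here is an assumption, as are the function name, the window size, and the toy gene identifiers.

```python
def bound_fraction_along_ranking(log2_fc, bound, window):
    """Rank genes by log2 fold change (highest first) and report, for each
    sliding window of `window` consecutive genes, the fraction that is bound."""
    ranked = sorted(log2_fc, key=log2_fc.get, reverse=True)
    flags = [1 if gene in bound else 0 for gene in ranked]
    fractions = []
    for start in range(len(flags) - window + 1):
        fractions.append(sum(flags[start:start + window]) / window)
    return ranked, fractions


# Hypothetical log2 fold changes after a knockdown and a hypothetical set of
# genes with a nearby binding site (illustration only).
log2_fc = {"g1": 2.0, "g2": 1.5, "g3": 0.2, "g4": -0.1, "g5": -1.2, "g6": -2.0}
bound = {"g1", "g2", "g4"}
ranked, fractions = bound_fraction_along_ranking(log2_fc, bound, window=3)
```

A higher bound fraction in the first windows than in the last ones would correspond to the kind of enrichment near upregulated genes reported in Fig. 4D and 4E; assessing significance would require an additional statistical test that is not shown here.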
Sox7EK and Sox17EK Confer Resistance to Leukemia Inhibitory Factor Deprivation to ESCs
Given that a single amino acid substitution in the HMG domain of Sox7 and Sox17 converts these factors into reprogramming factors that are more efficient than the original Sox2 factor, we asked whether overexpression of Sox7EK and Sox17EK in ESCs would allow them to overcome spontaneous differentiation induced by leukemia inhibitory factor (LIF) deprivation. ESCs overexpressing Sox2, Sox7EK, Sox17EK, Sox2KE, Sox7, and Sox17 were generated by infection with CS2-Sox2/Sox7EK/Sox17EK/Sox2KE/Sox7 and Sox17 lentiviral vectors, respectively. In order to avoid differentiation due to elevated Sox2 expression levels, we selected a pure population of ESCs expressing exogenous Sox2 at the same level as the endogenous Sox2 (Supporting Information Fig. S4). Likewise, we verified that the other exogenous Sox factors were expressed at levels comparable to Sox2 (Supporting Information Fig. S5). Engineered ESCs were then plated at clonal density and further cultured for 7 days in the presence of gradually decreasing LIF concentrations: LIF100%, LIF60%, LIF20%, and LIF0%. We classified the resulting colonies as undifferentiated, mixed, and differentiated (Fig. 5A) to quantitatively assess to what degree the various Sox factors confer LIF resistance to the cells. We found that in the LIF100% condition, control, Sox2, Sox7EK, and Sox17EK cells displayed similar proportions of undifferentiated colonies (control, 76.9%; Sox2, 82.3%; Sox7EK, 73.2%; and Sox17EK, 83.5%), mixed colonies (control, 18.8%; Sox2, 16.5%; Sox7EK, 26.5%; and Sox17EK, 13.4%), and differentiated colonies (control, 4.2%; Sox2, 1.1%; Sox7EK, 0.4%; and Sox17EK, 3.1%). A significant decrease in the percentage of undifferentiated colonies was observed for control and Sox2 expressing cells in the LIF60% condition, with 43.3% and 59.1%, respectively. In contrast, the percentage of undifferentiated colonies for Sox7EK and Sox17EK cells remained at 79.3% and 72.9%, respectively.
Moreover, whereas for the LIF20% condition the control and Sox2 plates had very few undifferentiated colonies (6.9% and 6.4%), 58.8% and 69.5% of Sox7EK and Sox17EK colonies remained undifferentiated. However, if LIF is completely removed, then none of the Sox factors rescued pluripotency. The expression of wild-type Sox7 and Sox17 induced almost complete differentiation of ESCs even at the LIF100% condition (Fig. 5A). Likewise, the Sox2KE mutant also induced differentiation at levels comparable to wild-type Sox7 and Sox17 at the LIF100% condition. These results suggest that Sox7EK and Sox17EK confer a stronger tolerance of ESCs to LIF deprivation than Sox2 and provide further support that they are more potent drivers of pluripotency.
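The colony percentages reported above follow from a simple tally of scored colonies per condition. A minimal Python sketch of that tally is given here for illustration only; the function name, the class labels as strings, and the example counts are assumptions and are not taken from the study.

```python
from collections import Counter

# Colony classes used in the text (Fig. 5A).
CLASSES = ("undifferentiated", "mixed", "differentiated")


def colony_percentages(labels):
    """Return the percentage of colonies in each class, rounded to 0.1."""
    counts = Counter(labels)
    unknown = set(counts) - set(CLASSES)
    if unknown:
        raise ValueError(f"unexpected colony class: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no colonies scored")
    return {c: round(100.0 * counts[c] / total, 1) for c in CLASSES}


# Hypothetical scoring of one plate (illustrative counts, not data from the study).
example = ["undifferentiated"] * 30 + ["mixed"] * 8 + ["differentiated"] * 2
print(colony_percentages(example))
```

Running the same function on each factor and LIF condition would give a table directly comparable to the proportions quoted in the text.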
To understand how Sox17EK confers an advantage on ESC self-renewal, we conducted genome-wide expression profiling after Sox17EK and Sox2 overexpression in ESCs (Supporting Information Table S3). We identified 48 genes that were upregulated by both factors and 220 genes that were commonly downregulated (p value <.05, fold-change >1.5) (Fig. 5B, 5C). Next, we focused on genes exclusively regulated by Sox17EK, as they might explain the more potent function of Sox17EK in pluripotency induction and self-renewal. There were 91 genes exclusively upregulated by Sox17EK and 117 genes exclusively downregulated. Sox2 exclusively upregulated 112 genes and exclusively downregulated 85 genes (Fig. 5B, 5C).
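The partition into commonly and exclusively regulated genes can be expressed with plain set operations. The sketch below is illustrative only: it assumes each factor's results are available as a mapping from gene to (fold change, p value), that fold change is a ratio relative to control, and that downregulation means a ratio below 1/1.5. The function names and the toy gene identifiers are hypothetical.

```python
def regulated(results, direction, p_cut=0.05, fc_cut=1.5):
    """Return genes passing the cutoffs in one direction.

    results: dict mapping gene -> (fold_change, p_value), where fold_change
    is an overexpression/control ratio (assumption of this sketch).
    """
    hits = set()
    for gene, (fc, p) in results.items():
        if p >= p_cut:
            continue
        if direction == "up" and fc > fc_cut:
            hits.add(gene)
        elif direction == "down" and fc < 1.0 / fc_cut:
            hits.add(gene)
    return hits


def partition(a, b):
    """Split two gene sets into shared and exclusive members."""
    return {"common": a & b, "a_only": a - b, "b_only": b - a}


# Toy values for illustration; not data from the study.
sox17ek = {"g1": (2.0, 0.01), "g2": (1.8, 0.03), "g3": (0.4, 0.02), "g4": (1.2, 0.01)}
sox2 = {"g1": (1.7, 0.02), "g2": (1.1, 0.50), "g3": (0.5, 0.04), "g4": (3.0, 0.001)}

up = partition(regulated(sox17ek, "up"), regulated(sox2, "up"))
down = partition(regulated(sox17ek, "down"), regulated(sox2, "down"))
print(up)
print(down)
```

Here `a_only` corresponds to genes exclusively regulated by Sox17EK and `b_only` to genes exclusively regulated by Sox2.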
A useful way to identify genes that are important for pluripotency of ESCs is to evaluate their expression profile during differentiation. Therefore, the lists of specifically regulated genes were compared to genes previously identified as differentially regulated upon ESC differentiation induced by EB formation over a period of 10 days. We observed that 37% of the genes exclusively upregulated by Sox17EK were downregulated after ESC differentiation into EBs, whereas only 8% of them were upregulated (Fig. 5D). In contrast, 33% of the genes exclusively upregulated by Sox2 were upregulated during ESC differentiation and fewer than 5% were downregulated (Fig. 5E). Moreover, when we analyzed the genes specifically downregulated by Sox17EK, 38% of them were upregulated upon differentiation (Fig. 5F), whereas 38% of the genes downregulated by Sox2 were also downregulated during differentiation (Fig. 5G).
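The percentages above are the fraction of each exclusively regulated gene list that also appears in the EB differentiation lists. A minimal sketch of that calculation follows; the function name and the toy gene identifiers are hypothetical and serve only to illustrate the arithmetic.

```python
def overlap_percent(gene_set, reference_set):
    """Percentage of gene_set that is also present in reference_set."""
    if not gene_set:
        raise ValueError("gene_set is empty")
    return 100.0 * len(gene_set & reference_set) / len(gene_set)


# Toy sets for illustration; not data from the study.
exclusive_up = {"g1", "g2", "g3", "g4"}   # e.g. upregulated by one factor only
eb_down = {"g1", "g2", "x1"}              # downregulated during EB differentiation
eb_up = {"g4", "x2"}                      # upregulated during EB differentiation

print(overlap_percent(exclusive_up, eb_down))
print(overlap_percent(exclusive_up, eb_up))
```

Genes absent from both reference lists are simply counted as unchanged during differentiation, which is why the two percentages need not sum to 100.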
To assess whether selective genomic binding by Sox2 and Sox17EK leads to the regulation of different sets of genes, we used our Sox2 and Sox17EK ChIP-seq data and assigned genes to binding sites if they mapped within 50 kb of the ChIP-seq summits. We first compared the genome-wide occupancy of Sox2 and Sox17EK to the gene expression differences induced after Sox2 and Sox17EK overexpression in ESCs. Differentially expressed genes were ranked by log2-transformed fold expression changes and scanned for the occurrence of Sox2 (Fig. 5H) or Sox17EK (Fig. 5I) binding sites. A significant enrichment was observed for Sox2 and Sox17EK near the TSS of genes upregulated after their overexpression, suggesting that binding correlates with changes in mRNA synthesis (Fig. 5H–5I). We then intersected the lists of genes bound by Sox2 or Sox17EK to identify genes uniquely bound by Sox2 or Sox17EK as well as jointly bound genes (Fig. 5J). Next, we asked how the resulting three gene categories (common Sox2/Sox17EK genes, unique Sox2 [“Sox2 only”] genes, and unique Sox17EK [“Sox17EK only”] genes) respond when Sox2 or Sox17EK is exogenously overexpressed in mouse ESCs. We found that genes uniquely bound by Sox17EK are more strongly upregulated after Sox17EK overexpression (t test p value = 2.02e-06) (Fig. 5K). By contrast, genes uniquely bound by Sox2 tend to respond more strongly after Sox2 overexpression (t test p value = 1.04e-10) (Fig. 5K). These data indicate that Sox2- and Sox17EK-bound genes are directly regulated by these factors. Furthermore, there is a subset of differentially bound genes that respond differently to the exogenous introduction of Sox2 versus Sox17EK in mouse ESCs. When we directly compared the expression changes of genes uniquely bound by Sox17EK, we identified several genes that are selectively targeted and regulated by Sox17EK compared to Sox2, as shown by the scatter plot (Fig. 5L).
When inspecting the genomic ChIP-seq profiles for some of those differentially bound and regulated genes, we identified some genes bound by Sox17EK/Oct4Sox17EK but not Sox2/Oct4Sox2 dimers (Fig. 5M). The differential regulation of those genes may contribute to the higher potency of Sox17EK to induce pluripotency.
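The assignment of genes to binding sites within 50 kb of a ChIP-seq summit, and the subsequent split into common and factor-specific categories, can be sketched as follows. This is an illustrative Python outline, not the analysis pipeline used in the study: it assumes that each gene is represented by a single TSS coordinate, that the 50 kb window is inclusive and symmetric, and all names and coordinates are hypothetical.

```python
from bisect import bisect_left, bisect_right


def genes_near_summits(tss, summits, window=50_000):
    """Return genes whose TSS lies within `window` bp of any summit.

    tss: dict mapping gene -> (chromosome, position)
    summits: iterable of (chromosome, position)
    """
    by_chrom = {}
    for chrom, pos in summits:
        by_chrom.setdefault(chrom, []).append(pos)
    for positions in by_chrom.values():
        positions.sort()

    bound = set()
    for gene, (chrom, pos) in tss.items():
        positions = by_chrom.get(chrom, [])
        # Count summits in the closed interval [pos - window, pos + window].
        lo = bisect_left(positions, pos - window)
        hi = bisect_right(positions, pos + window)
        if hi > lo:
            bound.add(gene)
    return bound


# Toy coordinates for illustration; not data from the study.
tss = {"g1": ("chr1", 100_000), "g2": ("chr1", 500_000), "g3": ("chr2", 100_000)}
sox2_summits = [("chr1", 120_000), ("chr2", 130_000)]
sox17ek_summits = [("chr1", 140_000), ("chr1", 460_000)]

sox2_bound = genes_near_summits(tss, sox2_summits)
sox17ek_bound = genes_near_summits(tss, sox17ek_summits)
categories = {
    "common": sox2_bound & sox17ek_bound,
    "sox2_only": sox2_bound - sox17ek_bound,
    "sox17ek_only": sox17ek_bound - sox2_bound,
}
print(categories)
```

Sorting the summits per chromosome and using binary search keeps the lookup fast for genome-scale peak lists; the expression changes of each category could then be compared with a t test as described in the text.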
These results show that Sox2 and Sox17EK regulate common genes, which explains why Sox17EK is able to replace Sox2 during the reprogramming process. We also show here that Sox17EK upregulates and downregulates a specific subset of pluripotency and differentiation genes that are unaltered or even inversely regulated by Sox2. These results suggest that these Sox17EK exclusively regulated genes contribute to the more potent induction of pluripotency by Sox17EK.
Sox7EK, Sox17EK, and LIF/STAT3 Signaling in ESCs Self-Renewal and During Reprogramming
We showed that overexpression of either Sox7EK or Sox17EK confers LIF resistance on ESCs. To better understand the mechanisms underlying this property, we analyzed the expression level of LIF/STAT3 targets identified by Bourillot et al. after induction of Sox2, Sox7EK, or Sox17EK (Fig. 6A). Downregulation of these genes by RNA interference has been shown to be detrimental for self-renewal of ESCs. We observed that both Sox7EK and Sox17EK are more efficient than Sox2 at inducing the expression of some direct LIF/STAT3 target genes, including Pim1, Pim3, Gbx2, Klf4, Ocln, Sgk, Icam1, and Ccrn4l. Pim1 and Pim3 have been shown to play an important role in ESC self-renewal, and their overexpression delays the differentiation of ESCs induced by LIF deprivation. Gbx2 has been shown to induce high reprogramming efficiency in collaboration with OCK+Sox2; its downregulation induces differentiation of ESCs, whereas its overexpression allows LIF-independent self-renewal. Similar to Gbx2, Klf4-overexpressing ESCs are able to self-renew in the absence of LIF. These genes might be responsible for the ability of Sox7EK and Sox17EK to overcome LIF deprivation in ESCs.
As both Sox7EK and Sox17EK were more efficient than Sox2 in generating iPSCs, we analyzed the activation kinetics and levels of STAT3 target genes during reprogramming after induction of iPSC formation with Sox2, Sox7EK, or Sox17EK. We generated these data at two time points, d4 and d8 after transduction of MEFs with the different cocktails. We observed that many LIF/STAT3 target genes were induced either earlier or at higher levels in OCK+Sox7EK and OCK+Sox17EK cells compared to OCK+Sox2 cells. These genes are Dact1, Sulf1, Zfp36l1, Spry2, Cnnm1, Pim1, Pim3, Klf4, Klf5, Gbx2, Ocln, Sall4, and also STAT3 itself (Fig. 6B). Other LIF/STAT3 target genes, including Ccrn4l, Cyr61, Rgs16, Sgk, Ier3, Smad7, Vim, and Icam1, did not show any significant increase in their expression, and we did not observe differential regulation when comparing OCK+Sox7EK/OCK+Sox17EK with OCK+Sox2 (Supporting Information Fig. S6).
These data indicate that Sox7EK and Sox17EK activate LIF/STAT3 target genes to higher levels than Sox2 does, both in ESCs and during the reprogramming of MEFs to iPSCs. This differential activation might explain the difference in reprogramming efficiency and resistance to LIF deprivation between the three factors.
The combinatorial action of TFs executes spatially and temporally distinct gene expression programs that are required to trigger cell fate decisions. Yet, how different factor combinations cooperate to selectively target genomic control regions remains only vaguely understood. Oct4, Sox2, and Nanog are at the core of a regulatory network governing ESC pluripotency [15, 21, 54]. Among this trio, Oct4 and Sox2 have been shown to physically interact and regulate the expression of numerous pluripotency genes [37, 55, 56]. In fact, the Sox2/Oct4 pair is one of the few examples of nonparalogous TFs for which biochemical co-operativity could be established; most other TFs likely co-operate indirectly. Analogously, Oct4 co-operates with Sox17 to induce primitive endoderm differentiation [13, 58]. We recently showed that Sox2 and Sox17 compete for Oct4 and redirect its genomic binding landscape, thereby executing contrasting gene expression programs leading to pluripotent versus endodermal phenotypes. We also showed that a specific mutation at the Oct4 recognition helix site of Sox2 and Sox17 gave rise to a fundamental change in the developmental outcomes triggered by these proteins. It was demonstrated that the mutant Sox17EK was able to reprogram MEFs into iPSCs more efficiently than Sox2. These data demonstrate that Sox17EK has gained the ability to specify pluripotency by acquiring the capacity to interact with Oct4 and target selective genes.
In this study, we have extended these investigations to other members of the Sox family and found that, other than Sox17, only Sox7 can be converted into a pluripotency reprogramming factor by generating analogous point mutations. Both Sox7 and Sox17 belong to the SoxF group and have been shown to induce endodermal differentiation when overexpressed in ESCs [59, 60]. This corroborated the idea arising from biochemical assays that the cooperation of Sox2 versus Sox17 with Oct4 underlies their biological uniqueness [13, 39].
It is notable that Sox7EK and Sox17EK accelerate the induction of pluripotency and consistently result in higher colony numbers as compared to wild-type Sox2. This is consistent with our recent observation that Sox17EK co-operates more efficiently than Sox2 with Oct4 on the canonical DNA recognition motif found in many pluripotency enhancers. In addition, attaching the C-terminus of Sox7 or Sox17 to the Sox2 HMG further enhances the reprogramming process. The C-terminus of Sox17 also bestows pluripotentiality upon the otherwise inactive Sox4EK and Sox18EK constructs, whereas the C-terminus of Sox2 does not. Increasing both the rate and the efficiency of the reprogramming process could facilitate the production of patient-specific stem cells for use in regenerative medicine.
In this study, we used the reprogramming technology as an assay to better characterize the functions of specific Sox TFs. This strategy allowed us to highlight the importance of both the HMG domain and the C-terminus of these proteins. We showed that the HMG domain, which is the central component of Sox protein structure as it mediates DNA binding and bending, Oct4 interaction, and nuclear localization, is sufficient to impart reprogramming functions but at very low efficiency. It is apparent that the N- and/or C-termini, absent in the Sox2/Sox7EK/Sox17EK HMG-VP16 fusion constructs, contribute to full functionality. These observations are in line with other studies showing that the C-terminus of Sox2 is involved not only in transactivation but also in the selection of specific target genes. Together, these results suggest that both efficient co-operation with Oct4, mediated by an appropriately shaped HMG domain, and efficient interaction with transactivators, mediated by the C-terminus, contribute to the induction of pluripotency. The HMG domain of Sox2 evolved to co-operate strongly with Oct4 on specific enhancers, leading to the execution of pluripotency gene expression programs, although its C-terminus is only moderately active. Conversely, Sox17, despite having a very potent transactivation domain, cannot activate pluripotency genes due to its inability to cooperate with Oct4. Our results indicate that cooperation with Oct4 is a necessary condition for pluripotency induction, but it is now also clear that the C-terminus plays an important role in cell fate specification. Indeed, through refined dissection of the C-terminal region of Sox17, we showed that the domain interacting with β-catenin is critical for high reprogramming efficiency.
By analyzing Sox2, Sox17EK, and β-catenin ChIP-sequencing data, we showed overlap between the three TFs and also identified a unique set of genes bound only by Sox17EK/Oct4Sox17EK/β-catenin and not by Sox2/Oct4Sox2/β-catenin. These results might explain the higher reprogramming efficiency of Sox17EK.
Besides their ability to generate iPSCs, we showed that both Sox7EK and Sox17EK were able to confer LIF resistance on ESCs more markedly than Sox2. This led us to analyze the regulation of LIF/STAT3 target genes by these factors. We showed that Sox7EK and Sox17EK (a) induced the expression of some LIF/STAT3 genes that Sox2 did not and (b) induced, for another set of genes, higher expression levels compared to Sox2. These genes include Pim1, Pim3, Gbx2, and Klf4, which have all been shown to promote the ability of ESCs to self-renew in LIF-restricted conditions, and Ocln, which is an epithelial marker whose increased expression has been shown to be important for mesenchymal-to-epithelial transition during reprogramming. Moreover, we show that Sox7EK and Sox17EK induce the expression of STAT3 itself very early during reprogramming compared to Sox2, which may explain the higher reprogramming efficiency. Genome-wide expression analysis showed that Sox17EK and Sox2 regulate common genes, but their difference in reprogramming efficiency may rely on a specific subset of exclusively regulated target genes. It is well known that Sox2, besides its fundamental roles in pluripotency and reprogramming, is also involved in the differentiation of ESCs into the neural lineage when expressed at high levels. The efficiency of reprogramming in a Sox2 context might depend on the expression level of this factor, as high expression of this TF induces differentiation of ESCs. Moreover, it is possible that Sox17EK gained the pluripotency function of Sox2 but not its neural induction property. Possibly, Sox17EK is not able to team up with other POU factors such as Brn2, which was reported to cooperate with Sox2 during neural cell fate switches. To look further into this, we compared ChIP-sequencing data of Sox2 and Sox17EK and identified unique regions that were bound (a) by Oct4 and Sox2 but not by Sox17EK or (b) by Oct4 and Sox17EK but not by Sox2.
Functional characterization of these specific target genes will help to better understand and improve the reprogramming process.
It is interesting that Sox17EK (a) induces the expression of LIF/STAT3 signaling pathway target genes including STAT3 itself and (b) interacts with β-catenin and regulates genes involved in the Wnt/β-catenin pathway. These signaling pathways have been shown to act synergistically in order to maintain mouse ESC pluripotency. Moreover, it has been demonstrated that the Wnt/β-catenin pathway upregulates STAT3 expression in order to prevent differentiation of mouse ESCs. These results may help explain our observation that Sox17EK is highly potent for the reprogramming process as it is able to activate target genes for convergent pluripotency signaling pathways.
Finally, when testing all the Sox factors in the reprogramming assay, we made the surprising observation that Sox18EK, another member of the SoxF family, which includes Sox7 and Sox17, was not able to generate iPSCs. This result agrees with previous data showing that Sox18 was unable to replace Sox2 for efficient reprogramming. Sox18 harbors high homology with Sox7 and Sox17 both within the HMG domain and outside it, including the C-terminus. Whereas both Sox7 and Sox17 have been shown to physically interact with β-catenin [44, 65], no such evidence is available for Sox18. Moreover, in a few biological contexts, including cardiovascular development, redundancy between Sox7, Sox17, and Sox18 has been described [66, 67]. However, only Sox7 and Sox17 induce endodermal differentiation when overexpressed in ESCs [66, 68], whereas Sox18 induces their differentiation into endothelial cells. These observations suggest that in a pluripotency context Sox7 and Sox17 are functionally more similar to each other than to Sox18. Establishing which cofactors are differentially recruited by Sox7, Sox17, or Sox18 will help provide a clearer understanding of the molecular details that govern cell fate switches.
In conclusion, our data reveal that the HMG domain of Sox transcription factors is crucial for their conversion into reprogramming factors as it provides an interface for their interaction with Oct4. However, this is not sufficient, and the C-terminus, although not strictly necessary, is important in driving an efficient process for Sox7 and Sox17. Indeed, we showed that it allows both factors to interact with β-catenin and therefore to efficiently activate the Wnt as well as the LIF signaling pathways. Our study helps to understand the reprogramming process and highlights the importance of transcription factor pairings for target gene selection, which is crucial for lineage specification.
We thank Dr. Paul Robson and Dr. Chaoyang Wang for the generous sharing of the Sox2 HMG-VP16 construct and Andrew Hutchins for providing glbase analysis tools. We are grateful to Petra Kraus and V Sivakamasundari from the GIS-GAP for cell injections, teratoma removals, and histology, and to Dr. James M. Wells for the generous sharing of Sox17-ΔTAD and Sox17-ΔN-HMG expressing vectors. This work is supported by the Agency for Science, Technology and Research (A*STAR; www.a-star.edu.sg) Singapore.
Disclosure of Potential Conflicts of Interest
The authors indicate no potential conflict of interest.