Differentiation of embryonic stem cells (ESCs) into neurons requires a high level of transcriptional regulation. To further understand the transcriptional regulation of neural differentiation of ESCs, we used oligonucleotide microarray to examine the gene expressions of the guided differentiation (GD) model for dopaminergic (DA) neurons from mouse ESCs. We also determined the gene expression profiles of the random differentiation (RD) model of mouse ESCs into embryoid bodies. From K-means clustering analysis using the expression patterns of the two models, most of the genes (1,282 of 1,884 genes [68.0%]) overlapped in their expression patterns. Six hundred twenty-two differentially expressed genes (DEGs) from the GD model by random variance F-test were classified by their critical molecular functions in neurogenesis and DNA replication (Gene Ontology analysis). However, 400 genes among GD-DEGs (64.3%) showed a high correlation with RD in Spearman's correlation analysis (Spearman's coefficient ps ≥ .6). The genes showing marginal correlation (−.4 < ps < .6) were present in the early stages of differentiation of both GD and RD, which were non-specific to brain development. Finally, we distinguished 66 GD-specific genes based on ps ≤ −.4, the molecular functions of which were related mainly to vesicle formation, neurogenesis, and transcription factors. From among these GD-specific genes, we confirmed the expression of Serpini1 and Rab33a in P19 differentiation models and adult brains. From these results, we identified the specific genes required for neural differentiation by comparing gene expressions of GD with RD; these would potentially be the highly specific candidate genes necessary for differentiation of DA neurons.
Development of the mammalian central nervous system (CNS) is a complex process involving an orchestrated regulation of structural and regulatory genes through the differentiation stages of multipotent stem cells into neurons. Mouse embryonic stem cells (ESCs) retain the characteristics of multipotent stem cells, exhibiting unlimited self-renewal activity and the potential to differentiate into various lineages. Numerous efforts have been made to induce ESCs into neurons by regulating the transcription of critical genes, in the hope of using these cells for therapy for neurological disorders such as Parkinson's disease.
In an early effort to make neurons from P19 embryonic teratocarcinoma cells, retinoic acid was used to induce neural precursor cells. Another efficient differentiation method is the coculture of ESCs with PA-6 feeder cells, which produce stromal cell-derived inducing activity. Lee et al. established the five-stage differentiation method to induce ESCs into dopaminergic (DA) neurons. More recently, transcription factors such as Nurr1 and Pitx3 were introduced into stem cells to enhance the efficiency of DA neuron production. Moreover, methods to make human DA neurons from human ESCs (hESCs) were established by coculturing with PA-6 stromal cells [8, 9].
During the step-wise differentiation into DA neurons, we can follow the levels of differentiation: pluripotent stem cell (ESC, stage I), committed multipotent precursor cells (embryoid body [EB], stage II), neural precursors (stage III), expanded neural precursors (stage IV), and fully differentiated neurons (stage V) . To further dissect this DA neuron differentiation, genome-wide gene expression profiling can be used, providing a more comprehensive molecular understanding of neural differentiation using mouse ESCs  or hESCs . For example, in-depth examinations of gene expression profiles of ESCs and their differentiated progenies are likely to reveal information about the “stemness” as well as the pathways involved in neural differentiation. These relevant genes, once identified, are good candidates to investigate for their role in neural differentiation. However, the guided differentiation (GD) of ESCs is difficult to control, resulting in a heterogeneous population of fully differentiated neurons. To correctly identify the specific genes related to neural differentiation, we need to minimize the contributions from non-neuronal and undifferentiated cells in the samples.
If the ESCs are induced to enter into the differentiation pathway in vitro, they can form EBs with all types of mesodermal, hematopoietic, endothelial, muscle, and neural lineages . Once we screen the markers for the germ layers or a variety of cell lineages, developmental stages of cells can be estimated in terms of early postimplantation development of mouse embryos . Thus, EB formation is the best in vitro model system for studying early lineage determination and organogenesis in mammals; this system will prove to be a useful tool for identifying developmental genes whose expression is restricted to the particular lineages.
We profiled the transcription of two differentiation models: GD into DA neurons and random differentiation (RD) into EBs. Although the genetic markers of each model were expressed specifically, the overall expression profiles of the two models were markedly correlated. We compared the sequential expression patterns of the GD and RD models by Spearman's correlation analysis to quantify the overlap between them. Finally, we subtracted the overlapping, common profiles from the GD profiles to select GD-specific genes. From these results, we propose that the two differentiation models overlap to a sizable extent, and we select the most likely GD-specific candidate genes for further studies of their roles in neural differentiation.
Materials and Methods
Mouse ESC Culture and Differentiation
We induced differentiation of mouse ESCs (R1) as described previously. Briefly, undifferentiated ESCs (stage I) were grown on gelatin-coated tissue-culture plates in knockout (KO)-Dulbecco's modified Eagle's medium (DMEM). To induce EB formation (stage II), the cells were dissociated into a single-cell suspension and plated onto nonadherent bacterial culture dishes at a density of 2.5 × 10⁴ cells per cm² in the KO medium. After 4 days, the cells were transferred to the original tissue-culture dish in a serum-free ITSF (insulin/transferrin/selenium/fibronectin) medium to select the nestin-positive cells (stage III). After 6 days of selection, the cells were expanded (stage IV) by transferring to the plate coated with polyornithine and laminin in N2 medium supplemented with laminin/basic fibroblast growth factor (bFGF)/sonic hedgehog/fibroblast growth factor 8. After 6 days, bFGF was removed to induce the differentiation (stage V) in N2 medium supplemented with laminin and ascorbic acid for 6 days. For RD, EBs were dissociated and plated onto a tissue-culture dish in DMEM with fetal bovine serum and antibiotics for indicated periods.
Total RNAs from undifferentiated mouse ESCs were used as a reference group in all experiments. Three independent biological replicates were taken at four stages of DA differentiation. For the RD model, three biological replicates were made at days 4, 8, 15, and 21 to extract total RNA. Total RNA was prepared by using TriZol reagent (Invitrogen, Carlsbad, CA, http://www.invitrogen.com). The array used in this experiment was the Macrogen Mouse Oligo 11K Chip (Macrogen Inc., Seoul, Korea, http://www.macrogen.com) as described previously [13, 14]. Cy3 and Cy5 fluorescent intensities were determined using the GenePix scanner (Axon Instruments, Union City, CA, http://www.axon.com), and images were analyzed using the GenePix Pro to calculate relative ratios and to determine confidence intervals.
Fluorescence intensities were processed and measured using GenePix Pro software. Intensity data were imported to an in-house microarray database. The variance stabilizing normalization by Huber et al. was applied with the “vsn” package in Bioconductor using the R statistical package. After performing intensity-dependent global LOWESS (locally weighted scatterplot smoothing) regression, spatial and intensity-dependent effects were managed by pin-group LOWESS normalization following the approach of Yang et al.
The gene expression dataset was registered in the Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo) under the accession numbers GSE3527 (for RD) and GSE3528 (for GD).
Reverse Transcription-Polymerase Chain Reaction (RT-PCR) and Quantitative Real-Time RT-PCR
The first-strand cDNA was synthesized from 1 μg of total RNA using reverse transcriptase and 1 μM oligo-dT primer. Each cDNA sample was amplified by using specific primers (supplemental online data A). Specific bands corresponding to the estimated sizes were analyzed after agarose gel electrophoresis. To quantify the amount of transcripts, real-time RT-PCR based on MyQ system (Bio-Rad Laboratories, Hercules, CA, http://www.bio-rad.com) was performed according to the manufacturer's recommendations. The relative amount of each transcript was normalized to the level of glyceraldehyde-3-phosphate dehydrogenase.
P19 Mouse Teratocarcinoma Cells and Mouse Tissues
Undifferentiated P19 mouse teratocarcinoma cells were maintained in a growth medium like α-minimal essential medium supplemented with 10% fetal bovine serum and antibiotics as previously described . We induced the production of neurospheres by adding 1 μM all-trans-retinoic acid to the media for 4 days. All the suspended EBs were dissociated and plated onto poly-L-lysine-coated dishes with neurobasal media supplemented with B-27 and Ara-C to select neuron-like cells (NLCs). Mouse brain tissues were isolated and pooled from the embryos at days 9, 11, 13, 15, and 17 or from neonates at days 0 and 7 by dissecting out under the stereomicroscope. The whole brain and other tissues from adult mice were also isolated to extract total RNA.
Results and Discussion
DA Neuron and EB Formation
In an effort to find differentially expressed genes (DEGs) during neural differentiation, we established the five-stage neural differentiation model as described previously. We used this model as a GD of mouse ESCs into DA neurons. First, we checked the expression of sets of markers for the differentiation of DA neurons by RT-PCR in the five-stage GD model (Fig. 1A). Neural markers such as Tubb3 (tubulin, beta 3), Gfap (glial fibrillary acidic protein), and Neurod1 (neurogenic differentiation 1) gradually increased from stage III to stage V, whereas Nestin and Pax5 (paired box gene 5) were turned on at stage III and gradually decreased through stage V. In addition, we could detect the expression of DA neuron-specific genes emerging through the GD, as in previous reports. The genes related to midbrain development, such as Nr4a2 (nuclear receptor subfamily 4, group A, member 2, Nurr1) and En1 (engrailed 1), started to be expressed at stage II and sustained their expression until stage V. On the other hand, transcription of the genes for the DA neuron phenotype, such as Ddc (dopa decarboxylase, AADC), Slc6a3 (solute carrier family 6, member 3, DAT), and Th (tyrosine hydroxylase), was turned on at stage V. On the basis of the expression of these genetic markers, we could confirm the establishment of the GD model, as in previous reports.
For RD, mouse ESCs were maintained as EBs for 3 weeks by being attached to culture dishes for the indicated period. ESC markers such as Pou5f1 (POU domain, class 5, transcription factor 1, Oct4) and Dppa5 (developmental pluripotency associated 5, Esg1) gradually decreased in the course of EB formation (Fig. 1B). The differentiation status of EBs could be checked by the expression levels of lineage markers of the three germ layers: T (brachyury) for mesoderm, Nodal for ectoderm, and Gata4 (GATA binding protein 4) for endoderm (Fig. 1B). These results showed that EBs contained all three germ layers and their derivatives in quantities large enough to cover any type of cells or tissues.
Overlapping of the Gene Expression Patterns in GD and RD Models
Using GD and RD models of ESCs, we tried to profile the genome-wide gene expression using long oligonucleotide microarray containing 11,376 genes in 13,680 spots. Total RNAs from each stage of differentiating cells (stage II–V for GD and EB day 4–21 for RD) were hybridized with those of mouse ESCs as a reference in biological triplicates for each dataset. From the 24 microarray experiments in total, we could have the gene expression profiles of four stages in two datasets for each differentiation model. The whole dataset of microarray experiments can be browsed in GEO as accession numbers GSE3527 and GSE3528 (http://www.ncbi.nlm.nih.gov/geo).
To capture expression profiles of GD and RD as a whole, K-means clustering analysis was applied to GD and RD normalized log activation fold ratios using the Euclidean distance metric. Genes showing minimal variation across the set of arrays were excluded from the clustering analysis. We took the mean value from the gene expression ratio of each of the three independent experiments and selected genes whose expression levels differed by at least a 1.5-fold change at one or more stages. From all datasets of GD and RD models, 1,884 genes showed more than a 1.5-fold change in any one of eight stages. In the cluster analysis, global patterns of whole datasets could be visualized and summarized (Fig. 2). We could divide all 1,884 genes into seven clusters based on K-means clustering (supplemental online data B) after trying different numbers of clusters to find minimal numbers of RD- and GD-specific clusters. Four out of seven clusters including 1,282 genes (68.0%) showed exactly the same patterns of upregulation (C3 and C5) and downregulation (C6 and C7). The overall pattern of gene expression of two differentiation models reflected a sizable overlapping gene expression between GD and RD. The other two clusters (308 genes) showed the elevation in their gene expression in RD but not in GD. Only one cluster, C4, containing 294 genes (15.6%), showed the upregulated pattern in a GD-specific way.
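The filtering and clustering steps described above can be illustrated with a minimal sketch in Python. It is not the software used in this study. It assumes base-2 log ratios (the base is not stated here), uses hypothetical gene names and values, and seeds the clusters with the first k profiles for reproducibility, whereas an actual analysis would try several initializations and cluster numbers.

```python
import math

def average_replicates(replicates):
    """Average per-stage log ratios across biological replicates."""
    return [sum(stage) / len(stage) for stage in zip(*replicates)]

def passes_fold_change(profile, fold=1.5):
    """Keep a gene if any stage changes by at least `fold`, up or down.
    Assumption: the profile holds log2 ratios against the ESC reference."""
    threshold = math.log2(fold)
    return any(abs(value) >= threshold for value in profile)

def kmeans(profiles, k, max_iterations=100):
    """Plain Lloyd's algorithm with the Euclidean distance metric.
    Assumption: the first k profiles serve as the initial centroids."""
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, label in zip(profiles, labels) if label == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical mean log2 ratios: four GD stages followed by four RD stages.
genes = {
    "up_1":   [0.2, 0.9, 1.4, 2.0, 0.1, 0.7, 1.2, 1.8],
    "down_1": [-0.3, -1.0, -1.6, -2.1, -0.2, -0.8, -1.5, -2.0],
    "up_2":   [0.1, 0.8, 1.5, 1.9, 0.2, 0.6, 1.1, 1.7],
    "down_2": [-0.2, -0.9, -1.4, -2.2, -0.1, -0.9, -1.3, -1.9],
    "flat":   [0.1, -0.1, 0.2, 0.0, 0.1, 0.0, -0.2, 0.1],
}
kept = {name: p for name, p in genes.items() if passes_fold_change(p)}
labels, centroids = kmeans(list(kept.values()), k=2)
```

In this toy input the "flat" profile never reaches a 1.5-fold change and is excluded before clustering, which mirrors the removal of genes with minimal variation across the arrays.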
For the common features, C6 and C7 clusters contained the downregulated genes in both types of differentiation, which were mostly related to development and stem cell markers like Pou5f1 (Oct4) and Nanog (Nanog homeobox). As we expected, any kind of differentiation might lead to a loss of stem cell characteristics such as proliferation and cell cycle. On the other hand, the genes in the C3 and C5 clusters were upregulated in both differentiation models. More interestingly, the genes in C3, like Fst (follistatin) and Notch3 (Notch gene homolog 3), were related to development, whereas the C5 cluster contained genes, such as Elavl4 (ELAV [embryonic lethal, abnormal vision, Drosophila]-like 4, HuD) and Pfn2 (profilin 2), related to the neuronal functions. The genes in the C3 and C5 clusters reflect that a certain part of the mechanism regulating neural differentiation could be overlapped with any other types of differentiation.
DNA Methyltransferases and Imprinted Genes in the Overlapping Clusters
As an example of common features shared between the two models, we found that Dnmt3-β (DNA (cytosine-5-)-methyltransferase 3-β) and Dnmt3l (DNA (cytosine-5-)-methyltransferase 3-like) were downregulated in both the GD and RD models (C6 and C7 clusters in Fig. 2, respectively). Global regulation of transcription through chromatin regulation, such as DNA methylation, seems to be an essential mechanism in the early stages of mammalian development. The expression levels of Dnmt3-β from microarray data (supplemental online data C) and RT-PCR (Fig. 3) clearly showed downregulation upon the induction of either type of differentiation. Because the downregulation of methyltransferases could lead to the transcriptional activation of genes imprinted in ESCs, we checked the expression of imprinted genes such as Nnat (neuronatin), Igf1 (insulin-like growth factor 1), Mest (mesoderm specific transcript), and Ndn (necdin) by RT-PCR in both GD and RD (Fig. 3B and 3D). In this series of analyses of the two differentiation models, changes in DNA methylation and chromatin regulation appeared in the early stage of differentiation. As shown in supplemental online data C for the genes related to chromatin regulation, global regulation through chromatin seemed to be a general mechanism for the developmental regulation of gene expression, especially in the early stages of differentiation.
Identification of DEGs
For each of the RD and GD models, a multivariate permutation test was applied to evaluate the statistical significance of changes in gene expression. Quantile-quantile plot analysis showed that the residual quantiles deviated from the theoretical quantiles (supplemental online data D). Hence, we regarded the data as not normally distributed. Typically, the F-test assumes that the input data are normally distributed. Therefore, we used the multivariate permutation test to collect genes at 90% confidence with a false discovery rate of less than 10%. The test statistics used were random variance versions of F-tests for each gene. Although F statistics were used, the multivariate permutation test is nonparametric and does not require the assumption of normal distributions. By random variance F-test analysis, we selected 622 DEGs in the GD model (supplemental online data E). Functionally, the GD DEGs were classified as being related to DA neurons, neurogenesis, and transcription factors (supplemental online data F). In contrast, only 188 genes were differentially expressed in the RD model (supplemental online data G). Because the triplicate measurements in RD show a larger variance of expression, the number of DEGs in RD is smaller than in GD. Moreover, 69 of the 188 RD DEGs were also found among the GD DEGs (supplemental online data G).
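To illustrate why a permutation approach does not depend on normality, the following Python sketch computes a permutation p value for a single gene. It is a simplified stand-in, not the procedure used in this study: it uses the ordinary one-way F statistic rather than the random variance version, it tests one gene at a time rather than controlling false discoveries across all genes, and the measurements are hypothetical.

```python
import random

def f_statistic(groups):
    """Ordinary one-way ANOVA F statistic (not the random variance model)."""
    values = [v for g in groups for v in g]
    grand_mean = sum(values) / len(values)
    k, n = len(groups), len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    within = sum((v - sum(g) / len(g)) ** 2 for g in groups for v in g)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))

def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Share of label shufflings whose F is at least the observed F.
    The null distribution is built from the data, so no normality is assumed."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    sizes = [len(g) for g in groups]
    pooled = [v for g in groups for v in g]
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an exact zero.
    return (hits + 1) / (n_permutations + 1)

# Hypothetical log ratios for one gene: three replicates at each of four stages.
changing_gene = [[0.1, 0.2, 0.15], [1.0, 1.1, 0.9], [2.0, 2.1, 1.9], [3.0, 3.2, 2.9]]
p_changing = permutation_p_value(changing_gene)
```

A gene whose replicates vary widely within each stage yields a smaller observed F and therefore a larger p value, which is consistent with the smaller number of DEGs detected in the more variable RD model.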
To characterize the DEGs of each model, we analyzed gene ontology (GO) categories of the DEGs in GD and RD . This procedure finds gene ontology categories that have higher-than-expected numbers of genes differentially expressed among the different classes of the samples based on random variance F-tests . By analyzing the GO groups, rather than the individual genes, we were able to reduce the number of tests conducted and enable findings among biologically related genes to reinforce each other (Table 1). In GD, the genes related to neurogenesis, organ development, and DNA replication show a significant variance. On the other hand, the genes related to cell adhesion, cell communication, and structural molecular activity show a dynamic change in RD. The 69 common genes in GD and RD are related to “development (GO: 0007275)” and “organelle organization and biogenesis (GO: 0006996)” categories in GO analysis.
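One common way to ask whether a GO category contains more DEGs than expected is an upper-tail hypergeometric test, sketched below in Python. This is an assumption made for illustration and is not necessarily the statistic computed by the GO comparison tool used here. The function name and the category sizes in the usage line are hypothetical; only the totals of 11,376 genes and 622 GD DEGs come from this study.

```python
from math import comb

def enrichment_p_value(total_genes, category_genes, deg_count, deg_in_category):
    """P(X >= deg_in_category) when deg_count genes are drawn at random
    from total_genes, of which category_genes belong to the GO category."""
    upper = min(category_genes, deg_count)
    tail = sum(
        comb(category_genes, i) * comb(total_genes - category_genes, deg_count - i)
        for i in range(deg_in_category, upper + 1)
    )
    return tail / comb(total_genes, deg_count)

# Hypothetical category: 200 annotated genes on the array, 25 of them among the DEGs.
p_example = enrichment_p_value(11376, 200, 622, 25)
```

With these hypothetical numbers about 11 DEGs would be expected in the category by chance (200 × 622 / 11,376), so observing 25 gives a small tail probability. Testing categories rather than individual genes reduces the number of tests, as noted above.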
Table 1. Gene ontology analysis of GD-, RD-, and common DEGs
Common Genes for Two Differentiation Models
Spearman's correlation is a nonparametric measure of the strength of the association between two variables. We compared the expression profiles of the 622 DEGs in GD with the corresponding RD profiles using Spearman's correlation coefficient (ps). Of the 622 DEGs, 400 (64.3%) showed similar expression patterns in the two models (ps ≥ .6, supplemental online data E). As we observed in K-means clustering (Fig. 2), more than half of the GD-DEGs were also dysregulated in the RD model. These common genes were categorized mainly into cell cycle, DNA replication, and morphogenesis in GO analysis (Table 2).
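The comparison of a GD profile with an RD profile can be sketched in Python as follows. The thresholds (ps ≥ .6 for common genes, ps ≤ −.4 for GD-specific genes, and values in between for marginal genes) are those used in this study. The function names and the four-stage profiles are hypothetical, and treating a flat profile as uncorrelated is an assumption of the sketch.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's coefficient, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a flat profile is treated as uncorrelated
    return cov / math.sqrt(var_x * var_y)

def classify(gd_profile, rd_profile, high=0.6, low=-0.4):
    """Label a GD-DEG by how closely its GD pattern follows its RD pattern."""
    ps = round(spearman(gd_profile, rd_profile), 6)  # guard against float noise
    if ps >= high:
        return "common"
    if ps <= low:
        return "GD-specific"
    return "marginal"

# Hypothetical log ratios at GD stages II-V and RD days 4, 8, 15, and 21.
gd = [0.1, 0.5, 1.2, 2.0]
label_common = classify(gd, [0.2, 0.4, 0.9, 1.5])
label_marginal = classify(gd, [1.0, 0.1, 0.5, 2.0])
label_specific = classify(gd, [0.5, 2.0, 1.0, 0.1])
```

Because each profile has only four time points and the coefficient depends on ranks alone, ps can take only a small set of discrete values when there are no ties, which is why cutoffs such as .6 and −.4 fall exactly on attainable values.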
Table 2. GO analysis of GD-specific, marginal, and common genes among GD-DEGs
For example, Sox4 is a member of the SOX (SRY-related HMG-box) family of transcription factors involved in the regulation of embryonic development and in the determination of cell fate. Sox4 was suggested to be upregulated at stage V of DA neuron differentiation, in which ascorbic acid might induce the expression of Sox4 and other DA neuron-specific genes. We observed that Sox4 was also present in the 622 GD-DEGs in Table 1. However, the expression pattern of Sox4 in GD was similar to that in RD (ps = .6, supplemental online data E). To validate the upregulation of Sox4, we quantified Sox4 expression in another neural differentiation model using P19 teratocarcinoma cells. We used Nestin and Rest (RE1-silencing transcription factor) as controls to monitor the differentiation status of P19 cells. Nestin is an intermediate filament protein that is expressed predominantly in stem cells of the CNS in the neural tube. Upon terminal neural differentiation, nestin is downregulated and replaced by neurofilaments. Rest binds a DNA sequence called the neuron-restrictive silencer element (NRSE), which is present in the promoter regions of neuron-specific genes. The expression of Rest mRNA was detected in most of the non-neuronal progenitor cells but was absent in differentiated neurons. Whereas Rest and Nestin were downregulated in EB and NLCs, Sox4 could be induced in EB and NLCs (Fig. 4A). We also tried to confirm the expression of genes related to GD-DEGs in vivo, using embryonic mouse brains, by real-time RT-PCR. As in the in vitro differentiation models, Sox4 started to be expressed in the brain of embryonic day 11 (E11) mouse embryos, whereas Rest and Nestin were downregulated in embryonic brain after E15 (Fig. 4B). However, in other mouse tissues such as spleen and kidney, Sox4 expression was higher than in any part of the brain (Fig. 4C).
These results revealed that some genes reported to be GD-related could be non-specific to brain and neural differentiation. Although there were variations within neural tissues, Sox4 seems to have an overlapping function in both the neural and non-neural cell types.
Using Spearman's correlation, we found that a large number of GD-DEGs, such as Sox4, showed an expression pattern overlapping with that of RD. At the other extreme of Spearman's correlation, we found only 66 highly specific genes upregulated in the GD model (ps ≤ −.4). To validate those genes, we analyzed and quantified the expression levels of Rab33a and Serpini1 in the P19 neural differentiation model (Fig. 5A). As detected in the microarray experiment, Rab33a (member of RAS family oncogene, ps = −.88) and Serpini1 (serine [or cysteine] peptidase inhibitor, clade I, member 1, ps = −.4) were upregulated in NLCs, but not in the EB stage of P19 cells. We also tested the expression of these genes in embryonic or neonatal brains as well as in a variety of adult tissues. These two genes, which are highly specific to adult brain, started to be expressed in E15 embryonic brains. Serpini1, a neuroserpin, has been reported to be expressed in the late stages of neurogenesis, during the process of synapse formation corresponding to stage V of the GD model. We also tested four GD-specific genes, Gng3 (ps = −.4), Uchl1 (ps = −1.0), Vamp8 (ps = −.8), and Sms (ps = −.4), by real-time RT-PCR in the P19 differentiation model (supplemental online data H).
Another group of genes (−.4 < ps < .6) was upregulated in the early stages of GD and, marginally, in RD. Like cluster 3 in Figure 2, those genes were also upregulated in RD. However, they were downregulated again at the last stage of GD, whereas their expression levels were maintained in RD until EB day 21. We checked the expression of Cdk4 (cyclin-dependent kinase 4; ps = .2) and P4hα2 (procollagen-proline, 2-oxoglutarate 4-dioxygenase [proline 4-hydroxylase], alpha II polypeptide; ps = −.2), as examples of this group, in the P19 differentiation model, embryonic brain, and adult mouse tissues (Fig. 5). The expression of Cdk4 started in EB or E13 embryos, but it was not specific to brain tissue. P4hα2 was also expressed as early as E9 embryos and was also present in heart and liver. These data suggest that genes expressed in the early differentiation stages, such as stages III and IV of the GD model, were not specific to brain. In other words, genes responsible for early neural differentiation can also be important to other types of cells or tissues, even those from other germ layers.
After subtracting the group of genes showing a high resemblance to RD, we could finally select the 66 GD-specific genes meeting the criterion of ps ≤ −.4. These genes were related to vesicle-associated functions and transcriptional regulation in GO analysis (Table 2; GD-specific DEG). Because the majority of these genes were upregulated at stage V of the GD model, the early differentiation pathways seem to be shared with other cell types. Very few of these genes were downregulated, implying that many of the stem cell-related genes were non-specifically lost during any kind of differentiation. We still need to validate the specificity and functional implications of these genes in neural differentiation. We found that 15 genes might have functions in neurons (Serpini1, Stx4a, Stxbp2, Scg2, Syt1, Vamp8, Ap3b2, Rab33a, Lrrn3, Dab2, Sorcs3, Gdap1, Resp18, Crmp1, and Ckb), and 12 genes were reported in relation to DA neurons (Gng5, Scg2, Apod, Slc2a1, Kcnip1, Gng3, Tbrg1, Gpx3, Ets-1, Uchl1, Aplp1, and Ptgds) [35–44]. We suggest that the genes selected by this direct comparison may include genes with functions in neural differentiation.
In this study, we set out to find GD-specific genes by comparing GD profiles with RD profiles. First, we used statistical analysis to find DEGs from both groups, which were categorized according to their characteristics of GD or RD. Whereas the genetic markers for GD and RD were exclusively expressed in the two models, the overall expression patterns looked similar to each other in K-means clustering. More than half of the GD-DEGs were also found to be similarly changed in RD. We proposed that those overlaps might originate from the heterogeneous cell populations and should be subtracted from the GD-DEGs. The final GD-specific gene list contained only 66 of 11,376 genes in the microarray analysis; these genes were related mainly to transport and transcriptional regulation. Even though we need to further determine the functions of these genes in DA neuron differentiation, this approach has revealed more information about the guided neural differentiation and its role in neural differentiation as a whole.
The authors indicate no potential conflicts of interest.
We thank S.H. Lee and J.Y. Kim for helping us to establish the DA neuron differentiation procedure. This work was supported by grants from the Stem Cell Research Center, KOREA, to W-Y.P. (SC11021) and H-S.P. (SC11022) and also by a grant from Korea Health 21 R&D Project, Ministry of Health & Welfare, Republic of Korea, to J.H.K. (0405-BC02-0604-0004).