Validation of Messenger RNA Isolation and Amplification
Using a modification of previously described protocols, we generated microgram amounts of amplified RNA (aRNA; approximately 10⁶-fold amplification) from mRNA isolated from duplicate sets of ICM, TE, and intact blastocysts from separate individuals. Similar protocols have been used to generate the microgram amounts of aRNA needed for expression profiling [26–28]. To verify the success of immunosurgery, mRNA isolation, and the subsequent in vitro transcription used to generate sufficient aRNA for hybridization, we performed gene-specific RT-polymerase chain reaction (PCR) for previously established markers of ICM and TE (Figs. 1A–1C). The transcription factors OCT4, NANOG, REX1/ZFP42, and SOX2 and the gene encoding the signal transduction adaptor protein DAB2 are overexpressed in the ICM. During mouse preimplantation development, Oct4, Nanog, and Dab2 are required in vivo and in vitro for the establishment and maintenance of pluripotency in the ICM of blastocysts and in cultured ES cells [16, 28–32]. All of the ICM-enriched transcripts shown in Figure 1C have been designated as stemness genes [2–10]. The trophoblast-determining genes HCG/CGB5, KRT18, HAND1, PSG3, CDX2, and TBX1 were enriched in the TE samples. Among these, the transcription factor Cdx2 has been shown to be instructive for TE differentiation during mouse preimplantation development. Having established that the pooled TE and ICM aRNA samples reflect the transcriptomes of these cells, we proceeded with whole-genome expression analysis. Transcription profiles were generated using a cDNA microarray (Ensembl Chip) consisting of 15,529 resequenced and annotated clones.
Figure 1. Isolation of ICM and TE cells, RNA amplification, and expression of known ICM and TE markers. (A): Photograph of ICM cells isolated by immunosurgery. (B): RNA amplification of cells derived from duplicate sets of ICM, TE, and intact blastocysts, with RNA size standard shown to the left. (C): Confirmation of expression of known ICM/ES and TE markers. β-ACTIN and GAPDH are used as endogenous controls. Abbreviations: ICM, inner cell mass; TE, trophectoderm.
The reproducibility of amplification and subsequent hybridization between replicates was assessed by calculating Pearson correlation coefficients (supplemental online Fig. 1). The values indicate a high degree of reproducibility in mRNA isolation, amplification, and array hybridization.
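As a minimal sketch of this reproducibility check, the Python code below computes the Pearson correlation coefficient for every pair of replicate hybridizations. It assumes that each replicate is a list of log2-transformed probe signals in the same probe order; the replicate names, the log transformation, and the example values are illustrative assumptions, not details of the original protocol.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signal vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def replicate_correlations(replicates):
    """Map each pair of replicate names to the Pearson r of their signal vectors."""
    return {
        (a, b): pearson(replicates[a], replicates[b])
        for a, b in combinations(sorted(replicates), 2)
    }


# Hypothetical log2 signals for three probes in two ICM hybridizations.
example = {"ICM-1a": [8.1, 10.4, 12.0], "ICM-1b": [8.3, 10.2, 12.1]}
print(replicate_correlations(example))
```

On real data each vector would hold all probes on the chip, and a dye-swapped pair would be compared after orienting both channels to the same sample.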
Expression Profiling Distinguishes ICM from TE
ICM and TE cells were isolated from two blastocysts from different individuals. Four independent hybridization experiments, including Cy-dye swaps, were performed for each biological replicate. Additionally, RNA from two other blastocysts was pooled to generate a reference sample. An overview of comparative gene expression in ICM and TE cells is shown in Figure 2A; as expected, the overall correlation between ICM and TE signals is high (r = 0.90), which supports the validity of the data. To judge whether a given gene is expressed in these cell types, we compared its signal against a negative control sample present on each array and computed a numerical detectability score (BG-tag; see Materials and Methods). This score reflects the proportion of background noise relative to the actual signal. Typically, a BG-tag above 0.9 indicates a detectable signal for the probe (Fig. 2B). Using this criterion, we found that 7,481 (48%) of the genes represented on the chip (probes) were detected in at least one of the three sample types (ICM, TE, or blastocyst). As shown in Figure 2B, most of these genes are either specific to the intact blastocyst (2,880) or common to all three sample types (2,031). The number of genes expressed exclusively in the immunosurgically isolated ICM or TE is comparatively low (292 and 345, respectively).
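The detection call can be sketched in Python under our reading of the Figure 2B legend: on each array, the BG value of a probe is the fraction of negative-control spot intensities lying below the probe's signal, and the BG-tag is the mean of these values over the replicate arrays. The strict "below" comparison, the function names, and the Venn grouping helper are illustrative assumptions; the exact procedure is the one given in Materials and Methods.

```python
from bisect import bisect_left


def bg_value(probe_signal, negative_signals):
    """Fraction of negative-control spot intensities lying strictly below the probe signal."""
    ordered = sorted(negative_signals)
    return bisect_left(ordered, probe_signal) / len(ordered)


def bg_tag(probe_signals, negatives_per_array):
    """Average BG value of one probe over replicate arrays (one signal per array)."""
    values = [bg_value(s, neg) for s, neg in zip(probe_signals, negatives_per_array)]
    return sum(values) / len(values)


def is_expressed(tag, threshold=0.9):
    """Detection call: the probe counts as expressed if its BG-tag exceeds the threshold."""
    return tag > threshold


def venn_regions(expressed):
    """Group probes by the exact set of samples in which they are called expressed.

    expressed: dict sample name -> set of probe IDs called expressed in that sample.
    """
    regions = {}
    for probe in set().union(*expressed.values()):
        key = frozenset(s for s, probes in expressed.items() if probe in probes)
        regions.setdefault(key, set()).add(probe)
    return regions


# Hypothetical detection calls for three probes in ICM, TE, and pooled blastocyst (BL).
calls = {"ICM": {"p1", "p2"}, "TE": {"p2", "p3"}, "BL": {"p2"}}
for samples, probes in venn_regions(calls).items():
    print(sorted(samples), sorted(probes))
```

Counting the probes in each region returned by venn_regions gives the kind of numbers shown in the Venn diagram of Figure 2B.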
Figure 2. Global statistics. (A): Regression analysis revealed a global correlation (r) of 0.90 between mean TE (x-axis) and ICM (y-axis) signals for 4,601 probes corresponding to expressed genes, i.e., mean BG >0.9 in TE or ICM. There was no general trend toward overexpression or underexpression. The box shows the parameters of the linear regression model y = 1.01x + 0.19. The lines show the twofold (blue) and fourfold (magenta) boundaries; the signal range spans approximately three orders of magnitude. (B): Venn diagram of probes expressed in the different tissues. Expression was judged by signal detectability using a negative control sample present on each array. The average proportion of negative-control signals lying below the probe's signal across the replicate experiments indicates the detectability (BG value). An illustration of spots with different BG values shows that the level correlates with visual detectability. A BG level of 0.9 was used to judge expression of genes. (C): Global cell-type clustering. Principal component analysis (PCA) on approximately 600 preselected genes revealed separation by blastocyst rather than by cell type, i.e., TE-1 and ICM-1 from blastocyst 1, TE-2 and ICM-2 from blastocyst 2, and the pool of the two other blastocysts formed separate groups. The inset shows the same effect with hierarchical clustering of the samples performed with approximately 8,000 genes, using Pearson correlation as the pairwise similarity measure and average linkage as the update rule. (D): Cell-type clustering using ICM- and TE-specific markers. PCA and hierarchical clustering on the set of 78 markers show a clear biological separation of ICM and TE across the individual blastocysts. Cluster analysis and PCA were performed with the J-Express Pro software (http://www.molmine.com). Abbreviations: BL, pool of two entire blastocysts; ICM, inner cell mass; TE, trophectoderm.
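The hierarchical clustering described for Figure 2C (Pearson correlation as similarity measure, average linkage as update rule) can be sketched in a few lines of Python. This is a minimal, unoptimized illustration and not the J-Express Pro implementation; converting similarity to distance as 1 − r and the toy profiles are our assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mean_x) ** 2 for a in x) *
                           sum((b - mean_y) ** 2 for b in y))


def average_linkage(profiles):
    """Agglomerative clustering of samples with distance 1 - r and average linkage.

    profiles: dict sample name -> gene signals (same gene order in every sample).
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    names = list(profiles)
    dist = {frozenset(pair): 1.0 - pearson(profiles[pair[0]], profiles[pair[1]])
            for pair in combinations(names, 2)}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        # Average linkage: mean of all sample-to-sample distances between two clusters.
        candidates = []
        for c1, c2 in combinations(clusters, 2):
            d = sum(dist[frozenset((a, b))] for a in c1 for b in c2) / (len(c1) * len(c2))
            candidates.append((c1, c2, d))
        c1, c2, d = min(candidates, key=lambda item: item[2])
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 + c2)
        merges.append((c1, c2, d))
    return merges


# Toy profiles in which samples group by blastocyst rather than by cell type.
toy = {"ICM-1": [1, 2, 3, 4], "TE-1": [1, 2, 3, 5],
       "ICM-2": [4, 3, 2, 1], "TE-2": [5, 3, 2, 1]}
for merge in average_linkage(toy):
    print(merge)
```

In the toy data, ICM-1 merges first with TE-1 and ICM-2 with TE-2, mimicking the blastocyst-driven grouping seen with the unfiltered gene set.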
Although these global statistics might suggest trends in overexpression and underexpression of genes, the global differences between ICM and TE are smaller than the variance between biological replicates. For example, if we apply standard statistical procedures such as principal component analysis (PCA) and hierarchical clustering to group the samples using a large unfiltered set of genes, the clustering reflects technical reproducibility rather than biological characteristics (Fig. 2C). To address biological significance specifically, the two blastocysts were analyzed separately by comparing the TE and ICM replicates of each blastocyst with statistical tests for differential expression (see Materials and Methods). Three distinct tests (Student's t-test, Welch test, and Wilcoxon rank sum test) were applied in parallel to help overcome the bias of any individual test. This approach identified a subset of 78 candidate marker genes that are differentially expressed between ICM and TE (within at least one of the blastocysts) at the 0.05 level of significance. Of these 78 genes, 23 (29%) were completely novel, and 17 were common to both blastocysts. Repeating the PCA and clustering analysis with these genes revealed a clear biological separation of ICM and TE across the biological replicates (Fig. 2D).
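A minimal Python sketch of the three-test screen for one gene within one blastocyst, assuming four TE and four ICM log2 signals per gene. The source does not state how the three results were combined; requiring p < 0.05 in all three tests (the hypothetical is_candidate_marker) is an illustrative rule only. Student's and Welch's p values are taken from the t distribution via the regularized incomplete beta function (continued-fraction evaluation), and the rank sum p value is computed exactly by enumerating all ways of assigning four of the eight ranks to one group.

```python
import math
from itertools import combinations
from statistics import mean, variance


def _beta_cf(a, b, x, max_iter=300, eps=3e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz method)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        # Even and odd coefficients of the continued fraction.
        even = m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _beta_inc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided p value of a t statistic: P(|T| > |t|) = I_{df/(df+t^2)}(df/2, 1/2)."""
    return _beta_inc(df / 2.0, 0.5, df / (df + t * t))


def student_t(a, b):
    """Two-sample t-test with pooled variance; returns (t, p)."""
    df = len(a) + len(b) - 2
    pooled = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
    return t, t_two_sided_p(t, df)


def welch_t(a, b):
    """Welch's unequal-variance t-test with Welch-Satterthwaite df; returns (t, p)."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, t_two_sided_p(t, df)


def rank_sum_exact(a, b):
    """Exact two-sided Wilcoxon rank sum test by enumerating all rank assignments."""
    pooled = sorted(a + b)

    def mid_rank(v):
        # Tied values share the average of the ranks they occupy.
        return pooled.index(v) + 1 + (pooled.count(v) - 1) / 2

    ranks = [mid_rank(v) for v in a + b]
    observed = sum(ranks[: len(a)])
    null = [sum(c) for c in combinations(ranks, len(a))]
    lower = sum(w <= observed + 1e-9 for w in null) / len(null)
    upper = sum(w >= observed - 1e-9 for w in null) / len(null)
    return observed, min(1.0, 2 * min(lower, upper))


def is_candidate_marker(a, b, alpha=0.05):
    """Hypothetical rule: call the gene differential only if all three tests agree."""
    p_values = [student_t(a, b)[1], welch_t(a, b)[1], rank_sum_exact(a, b)[1]]
    return all(p < alpha for p in p_values), p_values


# Hypothetical log2 signals of one gene in four TE and four ICM hybridizations.
print(is_candidate_marker([1.2, 1.4, 1.1, 1.3], [0.1, -0.2, 0.0, 0.2]))
```

With four replicates per group there are only 70 rank assignments, so the smallest attainable two-sided exact rank sum p value is 2/70 ≈ 0.029; the Wilcoxon test reaches the 0.05 level only when the TE and ICM values do not overlap at all, which is one reason to run it alongside the two t-tests.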
These newly identified genes may complement existing markers for ICM and TE, such as OCT4 and CDX2 (Figs. 1C, 3A, 3B), and represent diverse biological functions. For example, the TE marker SFN (stratifin) is an epithelial cell antigen that is exclusively expressed in keratinocytes. Its role in cell proliferation and apoptosis suggests that the protein could be relevant to the regulation of growth and differentiation of multiple cell types through the protein kinase C signaling pathway. Among the putative ICM markers, HMGB1 and GLTSCR2 have been identified as potential stemness genes based on expression studies in the human ES cell lines HSF-1, HSF-6, and H9. HMGB1 is a member of the high-mobility group family of chromatin proteins, which act primarily as architectural facilitators in the assembly of nucleoprotein complexes, for example in the initiation of transcription of target genes. Murine Hmgb1 has been shown to be a coactivator of Oct4. GLTSCR2 is a gene of unknown function residing in the glioma tumor suppressor region of chromosome 19q. Of the proposed lists of stemness genes [4, 5, 12], only ITGA6 (integrin alpha 6 chain) appeared in all the stem cell lines analyzed; by comparing these lists with our ICM markers, we have also identified ITGA5 as another candidate stemness gene. These findings highlight the potential importance of integrins, both in cell adhesion and in cell surface–mediated signaling, for establishing and maintaining pluripotency.
Figure 3. Validation of selected markers. (A): Hierarchical clustering of the 78 markers shows biological separation of TE and ICM. For each cDNA, the log2 ratio of the signal intensity to the mean intensity across all five samples was calculated. Samples were ICM and TE from two blastocysts (ICM-1 and ICM-2, TE-1 and TE-2) and a pool of two intact blastocysts. Clustering shows a clear separation of the biological material into two large groups: ICM-overexpressed genes (lower part of the dendrogram) and TE-overexpressed genes (upper part). (B): Real-time polymerase chain reaction confirmation of array-derived expression ratios in duplicate TE and ICM samples derived from blastocysts 1 and 2. Array-derived ratios are denoted as BL1 (yellow) and BL2 (light blue) and real-time ratios as BL1 (dark blue) and BL2 (violet). The genes IPL and OCT4 were not on the chip. Ratios are presented as log2 ratios, so values above zero denote TE > ICM expression, whereas values below zero indicate ICM > TE expression. The data represent averages from four independent hybridization experiments and triplicate reverse transcription–polymerase chain reactions. (C): Box plots. Four independent experiments, including a dye swap, were performed for each cell type in both blastocysts (TE-1, TE-2, ICM-1, ICM-2). Boxes show the range of the two inner replicates; the whiskers extend to the minimum and maximum within each sample. The line displays the median value. Abbreviations: BL, pool of two entire blastocysts; ICM, inner cell mass; TE, trophectoderm.
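The two ratio conventions used in Figure 3 can be sketched as follows. We assume that "mean intensity" in panel A is the arithmetic mean of the raw intensities across the five samples; the sample order and the example values are hypothetical.

```python
import math


def log2_ratio_to_mean(intensities):
    """log2(signal / mean signal) of one cDNA across the five samples (panel A)."""
    mean_intensity = sum(intensities) / len(intensities)
    return [math.log2(x / mean_intensity) for x in intensities]


def te_vs_icm_log2(te_signal, icm_signal):
    """Panel B convention: > 0 means TE > ICM, < 0 means ICM > TE."""
    return math.log2(te_signal / icm_signal)


# Hypothetical intensities in the order ICM-1, ICM-2, TE-1, TE-2, BL.
print(log2_ratio_to_mean([100.0, 200.0, 400.0, 200.0, 100.0]))  # → [-1.0, 0.0, 1.0, 0.0, -1.0]
print(te_vs_icm_log2(300.0, 100.0))  # threefold TE enrichment, about 1.58
```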
The expression levels obtained from microarrays were verified independently for selected markers using real-time PCR (Fig. 3B). For example, ATP1B3, a β subunit of the Na+/K+-ATPase, is overexpressed approximately threefold in TE, reflecting the role of these ATPases in driving transepithelial Na+ and fluid transport for blastocoele formation. The ICM marker HMGB1 is overexpressed to a similar degree in the ICM. Unexpectedly, the ribosomal proteins RPL14, RPL7A, RPL19, and RPL32 were identified as ICM-specific markers, corroborating previous findings suggesting that some proteins of the large ribosomal subunit are stem cell markers [2–10] and that these proteins might bind and inhibit the translation of specific mRNAs.
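The text does not specify how real-time PCR ratios were quantified. As a hedged sketch, the comparative Ct (2^−ΔΔCt) method below computes a TE/ICM log2 ratio from replicate Ct values, assuming roughly 100% amplification efficiency and a reference gene such as β-ACTIN or GAPDH (both used as endogenous controls in Figure 1C). The function name and the Ct values are hypothetical; the sign convention matches Figure 3B, where positive values denote TE > ICM.

```python
from statistics import mean


def ddct_log2_ratio(target_te, ref_te, target_icm, ref_icm):
    """log2(TE/ICM) by the comparative Ct (2^-ddCt) method.

    Each argument is a list of replicate Ct values (e.g., triplicate reactions).
    Assumes ~100% amplification efficiency for both target and reference genes.
    """
    delta_te = mean(target_te) - mean(ref_te)
    delta_icm = mean(target_icm) - mean(ref_icm)
    # A lower Ct means more starting template, hence the minus sign.
    return -(delta_te - delta_icm)


# Hypothetical triplicate Ct values for a TE-enriched target and a reference gene.
log2_ratio = ddct_log2_ratio([24.1, 24.0, 24.2], [16.0, 16.1, 15.9],
                             [25.7, 25.6, 25.8], [16.0, 15.9, 16.1])
print(log2_ratio, 2 ** log2_ratio)
```

Under these assumptions, a threefold TE overexpression such as that reported for ATP1B3 corresponds to a log2 ratio of about 1.58.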
The full data sets for global gene expression in the duplicate ICMs, TEs, and pooled blastocysts are presented in supplemental online Table 1. In general, the magnitude of the expression ratios measured by real-time RT-PCR was greater than that obtained from microarrays, an observation consistent with previous findings [4, 26].
Functional Annotation of Expressed Genes
The Gene Ontology (GO) vocabulary provides a unified terminology for describing genes and their products. It is divided into three main categories: Molecular Function, Biological Process, and Cellular Component. A comparative analysis of these three categories at a global level within the ICM, TE, and intact blastocyst did not reveal a bias toward a particular category in these cell types (data not shown). In contrast, repeating this analysis for Molecular Function with the 78 marker genes, 51 of which have GO annotations, revealed a slight bias toward specific molecular functions within ICM and TE cells (Fig. 4). For example, the ribosomal proteins RPL14, RPL7A, RPL19, and RPL32, annotated under structural molecule activity (GO:0005198), are all expressed in the ICM (Fig. 4A). A more detailed description of these markers with respect to their chromosomal localization and ontology is presented in supplemental online Table 2. For a global overview, we combined the expression data from the ICM, TE, and intact blastocysts into a searchable database of expression levels and related GO terms (http://goblet.molgen.mpg.de/blastocyst).
Figure 4. Functional annotation of ICM and TE marker genes. (A): Distribution of the genes with respect to the molecular function (GO:0003674) of the gene product. The terms under "obsolete function" are those that have been removed from the active function ontology by the GO curators. Full details of the obsolete lineages can be viewed in supplemental online Table 2. (B): Distribution of genes within the lineages of molecular function (GO:0005488) defined as having binding activity. A high proportion of genes bind to nucleic acids, i.e., transcription factors, chromatin, and RNA binding proteins. (C): Distribution of genes within the lineages of molecular function (GO:0003824) defined as having catalytic activity. Abbreviations: GO, gene ontology; ICM, inner cell mass; TE, trophectoderm.
Developmentally Conserved Signaling Pathways
During embryogenesis, the specification and proper arrangement of new cell types require the coordinated regulation of gene expression and precise interactions between adjacent cells. These morphogenetic changes depend on the interaction of extracellular ligands with their receptors.
Delineation of signaling pathways will be fundamental for understanding the mechanisms regulating pluripotency and self-renewal in cultured ES cell lines. We searched the ICM and TE data for components of these pathways by assigning p values with the Wilcoxon matched-pairs signed rank test, as described in Materials and Methods. This strategy differs from the commonly used approach of identifying differentially expressed genes from repeated measurements with a two-sample location test, such as Student's t-test or the Wilcoxon rank sum test, because it tests groups of genes associated with particular pathways rather than individual genes. The data indicate the involvement of the WNT, mitogen-activated protein kinase (MAPK), transforming growth factor-β (TGF-β)/bone morphogenetic protein (BMP), NOTCH, integrin-mediated cell adhesion, and apoptosis signaling pathways, as well as metabolic processes such as glycolysis, sterol biosynthesis, and androgen and estrogen metabolism. The full list of signaling and metabolic pathways identified by these analyses is given in Table 1 and supplemental online Table 3. Pathway annotations were adopted from the KEGG (Kyoto Encyclopedia of Genes and Genomes; http://www.genome.jp/kegg) database.
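The pathway-level signed rank test can be sketched in Python under stated assumptions: each gene in a KEGG pathway contributes one matched pair (its mean TE and mean ICM log2 signal), zero differences are dropped, tied absolute differences receive mid-ranks, and the normal approximation is used without a tie correction of the variance. The function name signed_rank_pathway is hypothetical. As a consistency check, the p values in Table 1 appear to match the upper normal tail 1 − Φ(Z) (for example, Z = 2.341169 gives p ≈ 0.0096), and in every row TE-up plus ICM-up equals the number of genes, as expected if these columns count the signs of the paired differences.

```python
import math
from statistics import NormalDist


def signed_rank_pathway(te, icm):
    """Wilcoxon matched-pairs signed rank test over the genes of one pathway.

    te, icm: per-gene mean log2 signals, paired by gene.
    Returns (n_genes, z, one-sided p for TE > ICM, te_up, icm_up).
    """
    diffs = [t - i for t, i in zip(te, icm) if t != i]  # zero differences dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]]):
            end += 1
        for k in range(pos, end + 1):
            ranks[order[k]] = (pos + end) / 2 + 1  # mid-rank for tied |differences|
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p_one_sided = 1.0 - NormalDist().cdf(z)
    te_up = sum(d > 0 for d in diffs)
    return n, z, p_one_sided, te_up, n - te_up


# Hypothetical per-gene mean log2 signals for a five-gene pathway.
print(signed_rank_pathway([9.1, 7.4, 10.2, 6.8, 8.0], [8.2, 7.0, 9.9, 7.1, 7.5]))
```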
Table 1. Analysis of metabolic and signaling pathways operative in the blastocyst
| ID | Pathway description | No. of genes | Z-score | p value | TE-up | ICM-up |
|---|---|---|---|---|---|---|
| hsa04010 | Mitogen-activated protein kinase signaling pathway | 131 | 5.694842 | 6.19205E-09 | 97 | 34 |
| hsa00500 | Starch and sucrose metabolism | 43 | 4.986955 | 3.07171E-07 | 35 | 8 |
| hsa04310 | Wnt signaling pathway | 72 | 4.629647 | 1.83341E-06 | 54 | 18 |
| hsa00632 | Benzoate degradation via coenzyme A ligation | 36 | 3.927644 | 4.29089E-05 | 30 | 6 |
| hsa00562 | Inositol phosphate metabolism | 31 | 3.782133 | 7.77705E-05 | 24 | 7 |
| hsa04620 | Toll-like receptor signaling pathway | 32 | 3.683691 | 0.000114972 | 24 | 8 |
| hsa00280 | Valine, leucine, and isoleucine degradation | 28 | 3.575113 | 0.000175077 | 24 | 4 |
| hsa04510 | Integrin-mediated cell adhesion | 37 | 3.537734 | 0.00020183 | 28 | 9 |
| hsa04610 | Complement and coagulation cascades | 38 | 3.487797 | 0.000243554 | 30 | 8 |
| hsa04070 | Phosphatidylinositol signaling system | 30 | 3.218945 | 0.000643379 | 22 | 8 |
| hsa00760 | Nicotinate and nicotinamide metabolism | 32 | 3.216218 | 0.000649524 | 22 | 10 |
| hsa00903 | Limonene and pinene degradation | 17 | 3.053308 | 0.001131736 | 16 | 1 |
| hsa00071 | Fatty acid metabolism | 31 | 2.939482 | 0.001643875 | 25 | 6 |
| hsa04350 | Transforming growth factor-beta signaling pathway | 39 | 2.832865 | 0.002306706 | 28 | 11 |
| hsa00590 | Prostaglandin and leukotriene metabolism | 13 | 2.690598 | 0.003566252 | 11 | 2 |
| hsa00450 | Selenoamino acid metabolism | 14 | 2.542448 | 0.005503972 | 11 | 3 |
| hsa00220 | Urea cycle and metabolism of amino groups | 15 | 2.499032 | 0.006226668 | 13 | 2 |
| hsa00563 | Glycosylphosphatidylinositol (GPI)-anchor biosynthesis | 15 | 2.38544 | 0.008529342 | 13 | 2 |
| hsa00020 | Citrate cycle (TCA cycle) | 16 | 2.378603 | 0.008689179 | 11 | 5 |
| hsa00860 | Porphyrin and chlorophyll metabolism | 13 | 2.341169 | 0.009611712 | 10 | 3 |
Similar exploratory approaches have also been applied to expressed sequence tag and array data from undifferentiated and differentiated human ES cells [8, 39] and to array data from mouse preimplantation development [27, 28].
Apoptosis in the Mammalian Blastocyst
Regulation of cell population size and lineage determination is mediated by cell cycle control, differentiation, and programmed cell death (apoptosis). Apoptosis is characterized by chromatin condensation, nuclear membrane blebbing, and fragmentation of the cytoplasm and nucleus.
Apoptosis is evident at the blastocyst stage, if not earlier. It is mainly restricted to the ICM, where it may regulate the size of the cell mass and perhaps eliminate cells retaining the potential to form TE ectopically. A list of expressed genes involved in the apoptosis signaling pathway is provided in supplemental online Table 4.
The TGF-β family consists of multifunctional growth and differentiation factors that regulate many cellular processes through complex signal-transduction pathways. Its members include the TGF-β isoforms, activins, and BMPs. Expression of the signaling type I and type II receptors for TGF-β in mouse and human fertilized oocytes and blastocysts suggests a role for TGF-β in early preimplantation development, potentially in the outgrowth of parietal endoderm. Differential expression of TGF-β isoforms, activins, BMPs, and MADHs/SMADs was also evident. In particular, BMP4, previously shown to induce differentiation to trophoblast when overexpressed in human ES cells, is 2.28-fold enriched in the TE. Other components of the TGF-β signaling cascade are listed in supplemental online Table 6.
Integrin and Cadherin-Mediated Cell Adhesion
The ICM and TE originate from divisions of polar blastomeres in which the cleavage furrows are parallel to their apical surfaces. These blastomeres polarize in response to asymmetric cell-cell contact. Pathway analysis identified signaling pathways related to integrin-mediated cell adhesion. In addition, several Na+/K+-ATPase genes (e.g., ATP1B3; Fig. 3B) were overexpressed in TE, reflecting their presumptive roles in driving fluid transport across this epithelium. Other cell adhesion–related genes were also detected, as expected given the importance of intercellular junctions in controlling blastocyst permeability. Notably, a subset of these genes was overexpressed in the TE, including desmocollin 2 (DSC2 ×1.55), protocadherins (PCDH7 ×1.67, PCDH11 ×1.57, PCDHB7 ×1.62), cadherins (CDH19 ×1.9, CDH24 ×1.54, CDH22 ×1.82), tight junction proteins (TJP1 ×1.4, TJP2 ×1.8), claudins (CLDN2 ×1.4, CLDN16 ×1.79, CLDN10 ×2.25), and the seven-pass transmembrane receptor of the cadherin superfamily CELSR2 (×1.46). For the tight junction constituents OCLN (occludin), JAM-2 (junctional adhesion molecule 2), and CGN (cingulin), the lack of overexpression in the TE may reflect translational rather than transcriptional control, operative owing to cell contact symmetry. A comprehensive listing of these genes and their expression ratios is given in supplemental online Tables 1 and 7.
WNT Signaling in the Blastocyst
The WNT gene family encodes numerous conserved glycoproteins that regulate pattern formation during embryogenesis in a wide variety of tissues, including the nervous system. It has recently been shown that activation of the canonical WNT/Wnt pathway is sufficient to maintain self-renewal of both human and mouse ES cells and that this pathway is operative during human and mouse preimplantation development [21, 27]. We detected differential expression of transcripts encoding WNT ligands (WNTs), receptors of the Frizzled gene family (FZD), members of the secreted Frizzled-related protein family (SFRP), and intracellular signal transducers and modifiers (DVL1, AXIN). The genes encoding casein kinase 1 alpha (CSNK1A) and dishevelled associated activator of morphogenesis 1 (DAAM1), both agonists of the WNT pathway, are overexpressed in the TE (supplemental online Table 8). These agonists have been reported to be upregulated in differentiated ES cells. In addition, we found that GSK-3B (glycogen synthase kinase 3 beta) expression is downregulated in the ICM, corroborating the reported inactivation of GSK-3B, which activates the WNT pathway and thereby helps maintain the undifferentiated state of ES cells.
Epigenetic Regulation of Lineage-Specific Gene Expression
De novo methylation of DNA by cytosine-5-methyltransferases is a well-characterized mechanism of epigenetic transcriptional control, and this mode of control may contribute to the differentiation of the ICM and TE at the blastocyst stage. Dnmt3b protein is specifically localized in the ICM of mouse blastocysts. In addition, expression of DNMT3B and DNMT3A is enriched in undifferentiated human embryonic stem cells [8–10, 49] as well as in the ICM cell lineage (Fig. 1C). This expression pattern suggests an important role for these enzymes in ICM-specific methylation in the blastocyst. In contrast, DNMT3L transcripts were expressed in both ICM and TE (Fig. 1C). We also detected differential expression of several other methyltransferases (supplemental online Table 1).
Other epigenetic regulators of X-inactivation, imprinting, maintenance of pluripotency, and the establishment of the TE lineages, including EZH2 (enhancer of zeste homologue 2), EED (embryonic ectoderm development), and CTCF (CCCTC-binding zinc finger protein), are expressed at high levels in the blastocyst and all ICM and TE samples [50–52].
Several imprinted genes (H19, GRB10, SNURF, MEST, NAP1, UBE3A, DLX5, MAGEL2, OSBPL5/OBPH1, and ATP10A) were expressed at medium to high levels (BG-tag 70% to >90%) in the ICM, TE, and blastocyst. Our strict criteria for statistically significant differential expression between ICM and TE based on the microarray data may occasionally obscure subtler differential expression patterns that are revealed by alternative methods. For example, real-time PCR identified clearly TE-biased (30-fold higher) expression of the imprinted gene IPL (imprinted in placenta and liver) (Fig. 3B). Notably, IPL/PHLDA2 is a marker of human cytotrophoblast, and in the mouse Ipl restricts placental growth. Thus, imprinted genes that act as regulators of nutrient supply at the feto-maternal interface may also influence growth and development of the early embryo.