Cells in multicellular organisms possess virtually identical genomic DNA but produce distinct cell types, often in a clonal and stably heritable fashion. The differences between cells in morphology, function and gene expression can be explained by epigenetic traits. These epigenetic traits can be defined as “stably heritable phenotypes resulting from changes in chromosomes without alterations in the DNA sequence” (Berger et al.,2009). These chromosomal changes involve chemical modifications of DNA and the chromosomal proteins, which can be propagated through mitosis, and in some cases through meiosis (Bhaumik et al.,2007; Lange and Schneider,2010).
The Xenopus model system has a rich history of scientific exploration (Harland and Grainger,2011). Sequencing of the Xenopus tropicalis genome (Hellsten et al.,2010) has enabled genome-wide epigenetic analysis of early embryogenesis in this species. In this review, studies on the Xenopus embryonic epigenome will be discussed in the context of what is known about epigenetic regulation in other model systems to highlight issues of conservation and divergence. We will first provide a brief overview of epigenetic modifications and their relevance for developmental biology. Then we will review what is known about epigenetics and chromatin state in early vertebrate embryogenesis, with a particular emphasis on genome-wide studies and comparative aspects of epigenetic regulation of gene expression. In the last section we will discuss epigenomics: how genome-wide maps of epigenetic regulation can help to identify and characterize functional genomic elements.
One type of epigenetic modification involves DNA directly; genomic DNA can be modified by methylation and hydroxymethylation of cytosine, predominantly in a CG dinucleotide context (customarily referred to as CpG) in vertebrates (Hendrich and Tweedie,2003; Kriaucionis and Heintz,2009; Tahiliani et al.,2009). DNA methylation interferes with sequence-specific binding of some transcription factors thereby reducing transcriptional activation. A more dominant effect on transcription is mediated by specific recognition of methylated DNA by methyl-CpG binding domain (MBD) proteins and some zinc finger proteins, causing transcriptional repression (Klose and Bird,2006). DNA methylation patterns vary to some extent between developmental stages and different tissues; the MBD proteins have various functions in transcriptional repression, long-range interactions in chromatin, genomic stability and neural signaling [reviewed in Bogdanovic and Veenstra (2009)].
A second type of epigenetic modification involves chromosomal proteins such as the histones. In the nucleus DNA is wrapped around octamers of histone proteins to form nucleosome particles, which contain two copies each of histones H2A, H2B, H3, and H4 (Fig. 1a). The highly conserved N-terminal tails of these proteins protrude from the nucleosome and are subject to extensive post-translational modifications, such as acetylation, methylation, phosphorylation and ubiquitination. These modifications are denoted by the histone (H2A, H2B, H3, H4), followed by the amino acid modified (for example K4 for lysine 4) and the chemical modification (examples: ac, acetylation; me1, mono-methylation; me3, tri-methylation). They are found at functionally distinct regions of the genome such as coding regions, promoters and enhancers (Fig. 1b). Histone modifications are specifically recognized by regulatory proteins, which mediate downstream effects on chromatin compaction and accessibility in addition to protein–protein interactions that help establish “active” or “repressive” chromatin. In addition to these “readers” of histone modifications, specific enzymes that “write” or “erase” the mark are involved in regulation (Strahl and Allis,2000). These aspects of epigenetic regulation are reviewed more extensively elsewhere (Bhaumik et al.,2007; Chi et al.,2010; Taverna et al.,2007).
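The naming convention described above is regular enough to be parsed mechanically. The following Python sketch splits a label such as H3K4me3 into its components. The function name `parse_mark`, the lookup tables, and the abbreviations `me2`, `ph` and `ub` are illustrative assumptions rather than an established standard; only lysine (K) is named in the text, and the other residue codes are included as an assumption.

```python
import re
from typing import NamedTuple

# Lookup tables are illustrative assumptions, not an exhaustive nomenclature.
AMINO_ACIDS = {"K": "lysine", "R": "arginine", "S": "serine", "T": "threonine"}
MODIFICATIONS = {
    "ac": "acetylation",
    "me1": "mono-methylation",
    "me2": "di-methylation",
    "me3": "tri-methylation",
    "ph": "phosphorylation",   # assumed abbreviation
    "ub": "ubiquitination",    # assumed abbreviation
}


class HistoneMark(NamedTuple):
    histone: str       # H2A, H2B, H3 or H4
    residue: str       # full amino acid name
    position: int      # position of the residue in the histone
    modification: str  # full name of the chemical modification


# histone, one-letter residue code, residue position, modification suffix
_PATTERN = re.compile(r"(H2A|H2B|H3|H4)([KRST])(\d+)(ac|me[123]|ph|ub)")


def parse_mark(label: str) -> HistoneMark:
    """Split a label such as 'H3K27me3' into its four components."""
    match = _PATTERN.fullmatch(label)
    if match is None:
        raise ValueError(f"unrecognized histone mark: {label!r}")
    histone, residue, position, modification = match.groups()
    return HistoneMark(
        histone, AMINO_ACIDS[residue], int(position), MODIFICATIONS[modification]
    )
```

For example, `parse_mark("H3K4me3")` yields histone H3, lysine, position 4, tri-methylation, whereas a malformed label raises a `ValueError`.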
Chromatin is conventionally grouped into two broad categories depending on its activity. The active, transcriptionally permissive state is referred to as euchromatin, whereas the more compact state, usually associated with transcriptional repression, is called heterochromatin. Global profiling of 53 common chromatin proteins and four histone marks associated with distinct chromatin states in Drosophila cells resulted in a more precise subdivision of chromatin based on its activity (Filion et al.,2010). The most abundant of five principal types of chromatin, covering 48% of the Drosophila genome, is mostly gene-poor, has a role in gene silencing and is associated with four proteins, among them linker histone H1. The second and third chromatin types also belong to the heterochromatic state. The hallmark of the second type is the presence of SU(VAR)3-9 (KMT1, a histone H3 K9 methyltransferase) and its binding partner Heterochromatin Protein 1 (HP1), whereas the third type is characterized by the binding of Polycomb group (PcG) proteins and their hallmark histone modification H3K27me3. The fourth and the fifth chromatin types belong to euchromatin and are enriched for permissive marks such as H3K4 methylation, but represent distinct gene ontology (GO) groups and display significant differences in replication timing (Filion et al.,2010).
Epigenetics is Key to Development
Epigenetic modifications affect most, if not all, DNA-dependent processes. By marking genes for transcriptional activity or repression in a cell-specific fashion, epigenetic mechanisms segregate the genome of multicellular organisms into many “epigenomes” to produce distinct cell fates. These differing epigenomes provide explanations for the stable commitment of cells, their potential, and the limitations in their competence to respond to inducing signals. Gene regulatory networks either act within epigenetic constraints or have to overcome these constraints to impose change. One exciting development in recent years was the discovery that it is possible to reprogram somatic cells to a pluripotent state. The transcription regulatory network governing pluripotency and self-renewal has been elucidated (Boyer et al.,2005), and forced expression of several master regulators such as Oct4 and Sox2 along with c-myc and Klf4 can induce differentiated somatic cells to pluripotency (Takahashi and Yamanaka,2006). The ectopic expression of pluripotency regulators itself is breaking an epigenetic barrier; in somatic cells the endogenous Oct4/POU5F1 gene is normally repressed by both DNA methylation and histone H3 K9 methylation (Athanasiadou et al.,2010; Feldman et al.,2006). In addition, small molecules that inhibit chromatin modifying enzymes such as histone deacetylases (HDACs), histone methyltransferases (HMTs) or DNA methyltransferases (DNMTs) can dramatically improve the efficiency with which the somatic epigenome is reprogrammed to pluripotency (Feng et al.,2009; Gonzalez et al.,2011). Valproic acid for example, a histone deacetylase inhibitor, improves reprogramming efficiency more than 100-fold and allows reprogramming with just two pluripotency regulators: Oct4 and Sox2 (Huangfu et al.,2008a,b). 
Oct4 activates the transcription of two JmjC histone demethylases which remove repressive H3K9 methylation and activate a number of pluripotency-associated genes, including Nanog (Loh et al.,2007). These examples illustrate the much broader point that chromatin determines the developmental potential of cells in dynamic interaction with signaling and transcription factor networks as it influences which genomic regions are receptive to regulatory influences. Chromatin acts as the gatekeeper of genetic information.
DEVELOPMENT OF CHROMATIN STATE
In most metazoans the embryonic genome is transcriptionally quiescent immediately after fertilization. The onset of transcription, also referred to as zygotic gene activation (ZGA), is part of the maternal-to-zygotic transition of regulatory control. This transition involves degradation of maternal RNA and synthesis of new RNA by the embryo [reviewed by Tadros and Lipshitz (2009)]. The timing of zygotic gene activation varies between species. For example, in mice and zebrafish transcriptional activation takes place at the two-cell and 512-cell stage, respectively. In Xenopus, a major increase in embryonic transcription occurs after the first 12 cleavages at the mid-blastula transition (MBT, Nieuwkoop-Faber stage 8.5), along with an increase in cell motility and the progressive loss of cell cycle synchrony (Newport and Kirschner,1982a,b). Using sensitive methods, low levels of transcription can be observed before the MBT (Blythe et al.,2010; Kimelman et al.,1987; Nakakura et al.,1987; Skirkanich et al.,2011; Yang et al.,2002), although the overall rate of transcription per cell increases approximately 200-fold at the mid-blastula stage (Kimelman et al.,1987). This new zygotic transcription is essential for gastrulation (Newport and Kirschner,1982a; Sible et al.,1997). Chromatin appears to play an important role in this dynamic regulation of gene expression, in concert with the transcription machinery and transcriptional activators.
Repressive Chromatin Before the Mid-Blastula Transition
Early work by Newport and Kirschner (1982a,b) suggested that a maternally derived repressor of transcription is titrated by the exponentially increasing amount of genomic DNA. For example, metabolic labeling of newly synthesized RNA (mostly RNA polymerase III-dependent tRNA synthesis) starts earlier in polyspermic embryos and embryos injected with exogenous DNA than in control embryos (Newport and Kirschner,1982a,b). Similarly, hyperphosphorylation of the large subunit of RNA polymerase II, indicative of transcription, occurs precociously in polyspermic embryos (Palancade et al.,2001). The molecular nature of the putative repressor is unknown but chromatin is likely to be involved in conjunction with other mechanisms (Almouzni and Wolffe,1995; Prioleau et al.,1994,1995; Veenstra,2002).
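The arithmetic behind the titration model can be made explicit with a toy calculation. The Python sketch below assumes that DNA content doubles at every cleavage and that the repressor is titrated once a fixed amount of DNA is reached. The function name, the threshold of 2^12 genome equivalents (chosen only to match the 12 cleavages mentioned above) and the four-fold starting DNA content of the hypothetical polyspermic embryo are illustrative assumptions, not measured values.

```python
def cleavages_until_titration(threshold_genomes: float,
                              starting_genomes: float = 1.0) -> int:
    """Count cleavages until the DNA content reaches the titration threshold.

    Simplifying assumption: DNA content doubles exactly once per cleavage.
    Amounts are expressed relative to the DNA content of a normal zygote.
    """
    if threshold_genomes <= 0 or starting_genomes <= 0:
        raise ValueError("genome amounts must be positive")
    cleavages = 0
    dna = starting_genomes
    while dna < threshold_genomes:
        dna *= 2
        cleavages += 1
    return cleavages


# Hypothetical threshold: a normal embryo reaches it after 12 cleavages.
THRESHOLD = 2 ** 12

normal = cleavages_until_titration(THRESHOLD, starting_genomes=1)
# Hypothetical polyspermic embryo starting with four times as much DNA.
polyspermic = cleavages_until_titration(THRESHOLD, starting_genomes=4)
```

Under these assumptions the embryo with extra DNA reaches the threshold two cleavages earlier, which is the qualitative behavior reported for polyspermic and DNA-injected embryos.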
Both the repressive potential of chromatin and the capacity of the basal transcription machinery are dynamic between oocyte maturation, fertilization and the MBT. The transcriptionally quiescent state of chromatin is established during oocyte maturation when the nucleosome density increases and a more repressive chromatin state is established (Landsberger and Wolffe,1997). RNA polymerase II and the transcription initiation factor TBP2 (TBPL2, TRF3) dissociate from chromatin by the time of germinal vesicle breakdown, followed by partial proteolytic degradation of TBP2 (Akhtar and Veenstra,2009). The transcription machinery in the unfertilized egg is competent for transcription except for a deficiency in TBP / TBP2 and the repressive effect of chromatin assembly (Akhtar and Veenstra,2011; Prioleau et al.,1994; Veenstra et al.,1999). DNA topology experiments revealed that efficient and tight packing of nucleosomes is maintained in early embryonic chromatin. In early blastula embryos accumulation of TBP protein, translated from maternal RNA, gradually restores the full capacity of the transcription machinery for initiation (Veenstra et al.,1999).
A number of nuclear proteins may contribute to the repressive nature of chromatin before the MBT. Depletion of the maintenance DNA methyltransferase Dnmt1 in Xenopus embryos led to a precocious expression of mesodermal markers such as cer1, t (brachyury) and otx2 (Stancheva and Meehan,2000). This effect, however, is probably independent of DNA methylation as a catalytically inactive form of Dnmt1 was able to rescue the phenotype (Dunican et al.,2008). Likewise, morpholino knockdown of the zinc finger protein Kaiso led to a premature activation of a number of genes including oct25 (Ruzov et al.,2004). The available results point to multiple mechanisms contributing to transcriptional quiescence before the MBT, involving nucleosome-dense chromatin, specific repressors and a temporal deficiency in the transcription machinery.
Hierarchical Acquisition of Histone Modifications
Not much is known about the mechanisms that make chromatin more accessible around the MBT. One striking feature of cleavage-stage chromatin is the general absence of active histone modifications. Whereas maternally stored histones, which are abundant in early embryos, may carry active histone modifications (Dimitrov et al.,1993), key chromatin-associated histone modifications are largely absent before the MBT (Fig. 2) and emerge in blastula or gastrula embryos (Akkers et al.,2009; Vastenhouw et al.,2010). On early expressed genes such as nodal3.1 (xnr3) and sia1 (siamois), enrichment of H3K4me3 is found before the MBT (Blythe et al.,2010), suggesting that also in these cases the appearance of this permissive mark is in step with the onset of transcription. It is quite striking that the embryonic epigenetic landscape appears to develop newly from an unprogrammed state.
The deposition of the H3K4me3 mark in many cases precedes transcriptional activation of developmental regulators in both Xenopus and zebrafish (Akkers et al.,2009; Vastenhouw et al.,2010). This may suggest that this newly emerging histone modification creates a more open, permissive chromatin environment that sets the stage for the onset of transcription. During subsequent development, repression contributes to the specificity of gene regulation and the epigenetic stability of lineage commitment and differentiation. Histone modifications are subject to a hierarchy, with active (or permissive) modifications such as H3K4me3 and H3K9ac appearing in pluripotent blastula embryos, and repressive marks such as H3K27me3 and H3K9me3 showing a major increase in enrichment in subsequent stages (Fig. 2; Akkers et al.,2009; and unpublished data). These observations were made using chromatin immunoprecipitation (ChIP) in combination with quantitative PCR (ChIP-PCR) or massively parallel sequencing (ChIP-seq). An independent assessment of the abundance of histone modifications was performed by quantitative mass spectrometry; the results indeed showed a shift from active to repressive histone modifications during development from blastula to tadpole stages (Schneider et al.,2011), in line with a global hierarchy of active and repressive histone modifications.
The Permissive Nature of Pluripotent Chromatin
Whereas key histone modifications are acquired during early development, levels of DNA methylation remain fairly constant during early Xenopus embryogenesis without major changes in content or distribution (Bogdanovic et al.,2011; Veenstra and Wolffe,2001). Surprisingly, relatively high DNA methylation upstream and downstream of promoter regions is compatible with transcriptional activity, a feature also observed in mammalian ES cells (Fouse et al.,2008). Experiments involving promoter constructs and stable transgenic embryos demonstrated a lack of DNA methylation-dependent transcriptional repression during blastula and gastrula stages and a re-established repression during organogenesis and differentiation (Bogdanovic et al.,2011), (cf. Fig. 2). Together with observations on the acquisition of histone modifications, these results add to the suggestion that blastula stage embryonic chromatin is relatively permissive and receptive to activating signals. This permissive chromatin state may allow temporary or low level transcription of stage- and lineage-specific genes. For example, in gastrula embryos, zygotically expressed MyoD is specifically localized to the marginal zone, in presumptive mesoderm (Hopwood et al.,1989; Harvey,1990). However, MyoD is transcribed ubiquitously at low levels in blastula embryos, independent of mesoderm induction (Rupp and Weintraub,1991). Also, both oocyte and somatic 5S rRNA are expressed in blastula embryos in almost equal proportions (Wormington and Brown,1983). During late gastrula, however, the oocyte-specific 5S RNA genes will be repressed. This increase in regulatory specificity is also observed by in situ hybridization and single cell RT-PCR analysis of ectoderm, mesoderm and endoderm marker genes. In early gastrula embryos, cells with similar positions in the embryo may express markers of different germ layers (Wardle and Smith,2004). 
These so-called “rogue” cells are characteristic of early gastrula stages and are seen less frequently in late gastrula embryos, suggesting that late gastrula embryos are more constrained in their gene expression than early gastrula embryos.
The chromatin environment in mammalian embryos has not been extensively studied. The chromatin state of embryonic stem (ES) cells, however, has been subject to intense investigation and constitutes a well-characterized model system. Pluripotent chromatin in mouse and human ES cells is different from that in Xenopus blastula embryos, most notably in the abundance of repressive H3K27 methylation (discussed in more detail below). Interestingly however, at a global level ES cell chromatin is also relatively open and accessible; it contains fewer heterochromatin foci than differentiated cells and the association of proteins with chromatin is less rigid (Bhattacharya et al.,2009; Meshorer et al.,2006). The genome-wide H3K9me2 profile is largely invariant between ES cells and ES cell-derived neurons (Lienert et al.,2011). However, heterochromatic protein HP1 and H3K9me3, the mark it is associated with, show diffuse and poorly-defined distribution patterns in ES cells when compared to differentiated neural progenitor cells where they both form small, well-defined foci (Meshorer et al.,2006). As assessed by electron microscopy, highly dispersed chromatin fibers are observed in mouse ES cells, 8-cell embryos and pluripotent epiblast cells, in contrast to lineage-committed cells which show a higher degree of chromatin compaction (Ahmed et al.,2010). Chromatin remodeler Chd1 has been proposed to play a role in keeping the ES cell chromatin in a permissive state (Gaspar-Maia et al.,2009). Knockdown of Chd1 results in increased chromatin condensation and compromises the differentiation potential. Altogether, these results suggest that pluripotent chromatin exists in a relatively open, relaxed conformation, followed by chromatin compaction during lineage specification, commitment and subsequent differentiation.
Role of Polycomb in Lineage Specification
Polycomb group (PcG) genes were originally identified genetically in Drosophila, where they are essential for segmentation and the repression of Hox genes outside their normal boundaries of expression. The PcG proteins constitute two functionally related complexes: Polycomb repressor complex 2 (PRC2), which contains a H3K27 methyltransferase subunit (Ezh2 in vertebrates), and PRC1 which can bind to H3K27me3 and direct chromatin compaction [reviewed in (Margueron and Reinberg,2011; Simon and Kingston,2009)].
ChIP enrichment levels of the Polycomb-deposited mark H3K27me3 are generally low in Xenopus blastula embryos (Akkers et al.,2009; Lim et al.,2011; Peng et al.,2009). During gastrulation this modification becomes more abundant at spatially regulated genes, which often show enrichment for both H3K27me3 and H3K4me3 in whole gastrula embryos. Sequential ChIP demonstrated that this coenrichment largely corresponds to different nucleosomal populations, most likely in differently specified cells in the embryo (Akkers et al.,2009). The data suggest a role for H3K27me3 in the spatially regulated response to lineage specification. In support of this model, Geminin has been found to cooperate with PRC2 subunits Suz12 and Ezh2 to restrict multilineage commitment in Xenopus embryos (Lim et al.,2011). Knockdown of Geminin enhanced the cellular response to growth factor signals, resulting in ectopic mesodermal, endodermal and epidermal fate commitment in the embryo.
There are both strong parallels and notable differences between Xenopus blastula embryos and mammalian ES cells. One striking difference is the extent to which H3K27me3 decorates a subset of loci in the genome. Quantitative assessment of the histone modifications by mass spectrometry and western blotting revealed that mouse ES cells contain approximately 100-fold more H3K27me3 in their chromatin than Xenopus blastula embryos (Schneider et al.,2011). Recent studies have shed some light on the high H3K27me3 levels in ES cells by analysis of mouse early embryonic lineages and comparison with ES cells (Dahl et al.,2010; Rugg-Gunn et al.,2010). Cells in the mammalian zygote undergo “restricted differentiation” before pluripotency is established (Guo et al.,2010; Nichols and Smith,2009). Pluripotency is established in the inner cell mass (ICM) of the early blastocyst, but not in the trophectoderm, which will give rise to extra-embryonic tissue. The ICM further segregates into the epiblast, which will form the embryo proper, and primitive endoderm, which will contribute to extra-embryonic tissue. Trophoblast stem cells and extra-embryonic endoderm stem cells almost completely lack the H3K27me3 mark, in contrast to ES cells (Rugg-Gunn et al.,2010). Interestingly, the three derived stem cell lines did not show any differences in the active H3K4me3 mark. Analysis of the embryonic tissues rather than stem cell lines derived from these tissues has shown a similar asymmetric distribution of H3K27me3 between ICM and trophectoderm (Dahl et al.,2010). Notably, ES cells and the ICM from which they are derived display large differences in the distribution and coenrichment of H3K4me3 and H3K27me3, mostly due to an increase in H3K27 methylation upon ES cell derivation (Dahl et al.,2010). 
Similarly, expression profiling of ES cells and ICM outgrowths revealed robust expression changes during ES cell derivation; the expression levels of approximately half of the known epigenetic modifiers change during ICM outgrowth, with an overall increase for repressive regulators (Tang et al.,2010). Together these data indicate that the “restricted differentiation” that produces pluripotent cells in the mammalian blastocyst is accompanied by H3K27 methylation, and moreover that this methylation further increases during derivation and culture of ES cells in vitro.
In ES cells H3K27me3 is found in so-called bivalent domains, regions double-marked for the permissive H3K4me3 and repressive H3K27me3 modifications (Azuara et al.,2006; Bernstein et al.,2006; Boyer et al.,2006; Mikkelsen et al.,2007; Pan et al.,2007). Based on sequential ChIP experiments on a number of loci, these modifications are thought to co-occur on the same nucleosomal DNA. This bivalent chromatin is also thought to repress genes while keeping them poised for activation during differentiation. Indeed, differentiation causes many bivalent regions to be resolved to monovalent H3K4me3 or monovalent H3K27me3 by loss of one of these modifications (Mikkelsen et al.,2007). During differentiation new bivalent domains are acquired as well and they are by no means specific to the mammalian chromatin state of pluripotency (Golebiewska et al.,2009; Mikkelsen et al.,2007; Pan et al.,2007; Roh et al.,2006). Some of the apparent bivalency in ES cells may not reflect co-occurrence of the two histone marks on the same nucleosomal DNA. ES cell populations are heterogeneous and sensitive to destabilization of pluripotency (Wray et al.,2010). A recent study performed on human ES cells revealed that upon fractionation based on neural or mesodermal gene expression, the ES cells with “bivalent” chromatin segregated into subpopulations with clearly defined monovalent signatures corresponding to their developmental potential (Hong et al.,2011). Dissolving bivalency into a monovalent H3K4me3 or H3K27me3 chromatin state is promoted by the molecular interactions of the enzyme complexes involved. PRC2, the histone methyltransferase responsible for H3K27me3 deposition, is inhibited by the presence of active marks such as H3K4me3 and H3K36me2/3 (Schmitges et al.,2011), whereas it is stimulated by the presence of H3K27me3 (Hansen et al.,2008; Margueron et al.,2009). 
These properties facilitate maintenance and spreading of H3K27me3 but also contribute to inhibition of spreading at the boundaries of active chromatin. Adding to the antagonism between H3K4 and K27 methylation is the interaction between histone methyltransferases and demethylases; the H3K4 demethylase Rbp2 (Jarid1a) interacts with PRC2, and UTX, a histone H3 lysine 27 demethylase interacts with the MLL H3K4 methyltransferase complexes (Lee et al.,2007; Pasini et al.,2008). This antagonistic behavior of the two marks in deposition, removal and transcriptional consequences may contribute to the transitory nature of pluripotency in vivo.
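In genome-wide data, bivalent and monovalent regions are typically called by intersecting peak sets for the two marks. The Python sketch below illustrates this for a handful of made-up intervals. The function names, region names and coordinates are hypothetical, and intervals are assumed to be half-open. Because the input is population-level ChIP data, an overlap of both marks is labeled "apparent-bivalent": as discussed above, such a call cannot distinguish true co-occurrence on the same nucleosome from a mixture of differently marked cells.

```python
from typing import Dict, Iterable, Tuple

# (chromosome, start, end); half-open coordinates are an assumption here.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_regions(regions: Dict[str, Interval],
                     h3k4me3_peaks: Iterable[Interval],
                     h3k27me3_peaks: Iterable[Interval]) -> Dict[str, str]:
    """Label each region by the presence of H3K4me3 and/or H3K27me3 peaks."""
    k4 = list(h3k4me3_peaks)
    k27 = list(h3k27me3_peaks)
    labels = {}
    for name, region in regions.items():
        has_k4 = any(overlaps(region, peak) for peak in k4)
        has_k27 = any(overlaps(region, peak) for peak in k27)
        if has_k4 and has_k27:
            labels[name] = "apparent-bivalent"
        elif has_k4:
            labels[name] = "H3K4me3-only"
        elif has_k27:
            labels[name] = "H3K27me3-only"
        else:
            labels[name] = "unmarked"
    return labels


# Made-up example data, for illustration only.
example_regions = {
    "geneA": ("chr1", 100, 200),
    "geneB": ("chr1", 500, 600),
    "geneC": ("chr2", 100, 200),
    "geneD": ("chr1", 900, 1000),
}
example_k4 = [("chr1", 150, 250), ("chr1", 550, 650)]
example_k27 = [("chr1", 180, 300), ("chr2", 50, 150)]
example_labels = classify_regions(example_regions, example_k4, example_k27)
```

Resolving an "apparent-bivalent" call requires additional evidence such as sequential ChIP or fractionation of the cell population, as in the studies cited above.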
The targets of Polycomb are significantly conserved between Xenopus gastrula embryos and mammalian ES cells (Akkers et al.,2009), and are generally linked to lineage commitment and differentiation. The function of Polycomb-mediated repression is to preserve lineage identity and restrict multilineage gene expression. For example, deletion of Ring1B, a PRC1 subunit gene, destabilizes mammalian ES cells due to aberrant expression of trophoblast stem cell, extra-embryonic endoderm and neural marker genes (Leeb and Wutz,2007); it predisposes the cells to differentiation but does not preclude functional pluripotency within the population of cells. Similar observations have been made in ES cells lacking PRC2 components Eed and Suz12 (Chamberlain et al.,2008; Montgomery et al.,2005; Pasini et al.,2007). These data are highly concordant with the cooperative role of Geminin and Ezh2 in reducing multilineage gene expression in Xenopus embryos (Lim et al.,2011). Given these similarities it seems that one of the main differences between Xenopus embryos and mammalian ICM and ES cells is the extent to which lineage-specific gene expression is repressed by Polycomb in the pluripotent stage; the pluripotent chromatin state in Xenopus embryos is rather naive and receptive to activation, whereas the early mammalian segregation of cell lineages (epiblast, primitive endoderm, and trophectoderm) and the process of ES cell derivation favor a more constrained, enforced type of pluripotent chromatin state.
Linker Histones and the Restriction of Cellular Competence
Linker histones dominate the most abundant type of inactive chromatin in Drosophila cells (Filion et al.,2010). Xenopus oocytes and early (pre-MBT) embryos are largely depleted of histone H1. However, they contain a variant linker histone called B4 (H1M), which is replaced by the canonical (somatic) H1 form during early development (Dworkin-Rastl et al.,1994). Experiments performed on Xenopus animal cap explants, derived from embryos injected with H1 or B4 (H1M) mRNA, identified H1 as the rate-limiting factor responsible for the loss of mesodermal competence (Steinbach et al.,1997). These data demonstrated that H1, but not the maternally contributed B4, has the ability to selectively repress genes involved in mesoderm induction. Interestingly, histone B4 has a relatively low affinity for the chromatin template compared to H1 (Ura et al.,1996). This may partly explain the transcriptionally permissive nature of pluripotent chromatin observed around the MBT. Linker histone H1b has been demonstrated to interact with Msx1, a homeobox protein transcription factor, to repress MyoD, a myogenic master regulator that is activated upon mesoderm induction. This provides a potential mechanism for the loss of mesoderm competence (Lee et al.,2004). The role of linker histones has also been studied within the context of nuclear reprogramming (Jullien et al.,2010). Nuclear transplantation of mammalian somatic nuclei to Xenopus oocytes demonstrated that the acquisition of B4 histone variant is crucial for the reactivation of pluripotency genes. Surprisingly, when linker H1 is overexpressed in oocytes, the transplanted nuclei retain their H1 histone in addition to gaining the B4 variant suggesting that H1 and B4 bind to distinct sites in nuclear chromatin (Jullien et al.,2010). These studies highlight the role of chromatin-based transcriptional repression in restricting cellular competence.
EPIGENOMICS: FUNCTIONAL GENOMIC ELEMENTS AND VALIDATION OF GENE MODELS
In the previous section, the role of epigenetic regulation and the exquisite cellular choice of lineage and identity in early development was discussed. Epigenetic studies, however, also have another application. Epigenetic modifications define specific genomic elements, something that can be exploited to explore and annotate the genome. Initial annotation of newly sequenced genomes can be carried out on the basis of the sequence information alone. EST and cDNA sequences in combination with ab initio methods can be used to produce reasonably accurate gene models. Some functional elements, such as promoters, can be partly predicted using the primary sequence as input. However, many other functional elements, especially those in noncoding regions of the genome, cannot be characterized based on the sequence content alone. Yet, these regulatory regions show unique epigenetic characteristics, such as patterns of specific histone modifications. Integration of experimental data from different high-throughput methods enables the identification and characterization of these elements. Two public research consortia, ENCODE (Encyclopedia Of DNA Elements) and modENCODE (Model Organisms ENCODE), aim to identify all functional elements in the human genome and in the genomes of two invertebrate model organisms (D. melanogaster and C. elegans), respectively (ENCODE Project Consortium,2004; Celniker et al.,2009). The key findings of the pilot phase of ENCODE were reported in Nature (ENCODE Project Consortium,2007) and in a special issue of Genome Research (2007, volume 17, issue 7); over 200 experimental and computational data sets revealed the complexity of genome function and highlighted the importance of chromatin and its epigenetic modifications. Histone modifications are predictive of transcription start sites, and gene-distal DNase I hypersensitivity sites (enhancers and insulators) have characteristic histone modifications (ENCODE Project Consortium,2007). 
Extending the pilot phase of ENCODE to the complete genome has yielded a treasure trove of data on gene expression and regulation (ENCODE Project Consortium et al.,2011). Likewise, recent modENCODE reports showed how different types of genomic data could improve the functional genome annotation of important model organisms (Gerstein et al.,2010; Roy et al.,2010).
Epigenomic analyses can be roughly grouped in four categories: 1) Identification and quantification of protein-coding and regulatory RNAs, 2) determining protein-DNA binding profiles of transcription factors, histone modifications, and chromatin-associated proteins using ChIP, 3) characterization of chromatin structure and DNA accessibility, and 4) mapping DNA methylation. All these different types of data assay specific functional properties of the genome. Integration of these different data sets provides additional information and enables comprehensive functional annotation.
RNA sequencing (RNA-seq) can be used to determine the expression level of genes. As this technique does not depend on any previous genomic annotation it can also be used to detect alternative splice variants and small and large noncoding RNAs (Armisen et al.,2009; Guttman et al.,2009; Roy et al.,2010). In another approach the 5' ends of capped transcripts can be sequenced, which allows one to determine the genome-wide dynamics of transcription start site (TSS) usage (deepCAGE or TSS-seq) (FANTOM Consortium,2009; Tsuchihara et al.,2009; van Heeringen et al.,2011).
ChIP-sequencing profiles have proven to be useful in annotating functional regions of the genome that are difficult to identify based on sequence information (Ernst and Kellis,2010; Ernst et al.,2011; Hon et al.,2009; Rada-Iglesias et al.,2011; Roy et al.,2010). Specific combinations of histone modifications are correlated with transcription initiation and elongation, splicing, enhancer activity and transcriptional repression. These patterns can be used to predict the function of specific regions. Promoters, or TSS-proximal regions, are marked by H3K4me3 and H3K9ac, with levels that correlate with expression (Bernstein et al.,2005; Kim et al.,2005; Roy et al.,2010). In addition H3K36me3 and H3K79me1 marks are enriched over the gene body with a bias towards the 3' end (Barski et al.,2007; Roy et al.,2010).
Figure 3 shows how RNA-seq and TSS-seq data together with ChIP-seq data of histone modifications and factors of the general transcription machinery can be used to improve gene annotation in Xenopus. The TSS (right orange arrow) is clearly marked by enrichment of H3K4me3 (green), a peak of the transcription initiation factor TBP (pink) and several TSS-seq reads marking the 5' end of capped transcripts (red). RNAPII (purple) shows enrichment over the transcribed regions, with a peak at the 5' end of the gene. In addition, the RNA-seq reads (blue) spanning splice junctions show that two exons of the annotated gene are skipped (left orange arrow), at least at this developmental stage. These RNA-seq and ChIP-seq profiles can be used as input in a computational pipeline to improve gene annotation. Gene annotation from different sources was compared to experimental data sets including EST and RNA-seq data, H3K4me3 and RNAPII ChIP-seq enrichment to update and improve the existing (Joint Genome Institute 4.1 Filtered Models - JGI FM) Xenopus tropicalis gene models (Akkers et al.,2009). In these Xenopus tropicalis experimentally validated (Xtev) gene models 3,595 genes were annotated with an updated 5' end with H3K4me3 evidence, and a total of 9,350 new exons were annotated. In addition, using EST evidence in combination with H3K4me3 data truncated gene models on different scaffolds can be linked in cases where promoter and coding regions were placed on different sequence contigs during genome assembly (Akkers et al.,2010). The TSS-seq (or deepCAGE) data in combination with TBP ChIP-seq peaks can also accurately identify the start sites of transcription, which can be used for detailed analysis of the core promoters (van Heeringen et al.,2011). Comparison between Xenopus and human promoters revealed both conserved and divergent aspects of vertebrate promoters. The nucleotide composition of Xenopus and human promoters is significantly different. 
Xenopus promoters have a relatively high AT content, while human promoters are more GC-rich. Most promoter motifs are conserved despite this difference; they occur with similar frequency in promoters of both organisms irrespective of the sequence background. However, some motifs such as SP1 and the TATA box vary in step with this background; GC-rich SP1 sites more often occur in human promoters, whereas the TATA box is more prevalent in Xenopus promoters. Both SP1 and TBP, which bind the SP1 and TATA box motifs respectively, can recruit the general transcription factor complex TFIID. Thus, the different sequence composition of Xenopus and human promoters is accompanied by differences in occurrences of important motifs for transcription initiation. This phenomenon likely reflects an adaptive process to recruit TFIID to CpG island and sharply initiating promoters (van Heeringen et al.,2011). The Xenopus tropicalis ChIP-seq, RNA-seq and TSS-seq data, and DNA methylation (MethylCap) and CpG island tracks are available for the community and can be used to explore the genomic context of individual genes and their regulation in the genome browser (see Box 1).
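The promoter comparison described above rests on two simple measurements: the nucleotide composition of a promoter sequence and the occurrence of motifs such as the TATA box and the SP1 site. The Python sketch below shows one way these could be computed. The regular expression and the GC-box string are simplified consensus sequences chosen for illustration, and the example sequence is invented; the cited analysis is not necessarily based on exact-match consensus searches, and motif analyses typically use position weight matrices instead.

```python
import re

# Simplified consensus sequences (assumptions for illustration only).
TATA_BOX = re.compile(r"TATA[AT]A[AT]")
SP1_GC_BOX = "GGGCGG"


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def has_tata_box(seq: str) -> bool:
    """Report whether the simplified TATA box consensus occurs in seq."""
    return TATA_BOX.search(seq.upper()) is not None


def count_motif(seq: str, motif: str) -> int:
    """Count non-overlapping exact occurrences of motif on the given strand."""
    return seq.upper().count(motif.upper())


if __name__ == "__main__":
    # Invented sequence, not taken from any genome.
    promoter = "ccgtTATAAAAggcattGGGCGGtta"
    print("GC fraction:", round(gc_fraction(promoter), 2))
    print("TATA box present:", has_tata_box(promoter))
    print("SP1 sites:", count_motif(promoter, SP1_GC_BOX))
```

Applied to sets of promoters from two species, the same functions would give the per-species GC content and motif frequencies that such a comparison relies on.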
BOX 1. Exploring Xenopus deep sequencing data in a genome browser.
1. Xenopus tropicalis ChIP-seq, RNA-seq and TSS-seq data, DNA methylation (MethylCap) and CpG island visualization tracks are available at www.ncmls.nl/gertjanveenstra (menu link “Resources,” then select the link to “ChIP-seq, RNA-seq and MethylCap-seq data”). From the Genome Data page, the tracks are available in Wiggle (WIG) or BED (Browser Extensible Data) format. Copy the track URL to the clipboard (right click, “Copy link location”).
2. Go to a genome browser, for example the UCSC genome browser (genome.ucsc.edu). Select “Genomes” from the menu, then select clade “Vertebrate,” genome “X. tropicalis” and the right genome assembly from the drop down menus. Click on “Manage custom tracks,” “Add custom tracks,” paste the track URL in the appropriate text box, and click “Submit.” Once the data is uploaded, click on “Go to genome browser.”
3. Browse by genomic location or search for your favorite gene.
4. Consider storing your uploads in a Session (top menu) for future use.
Tracks will also be available at Xenbase (www.xenbase.org), the authoritative Xenopus model system database. Raw sequencing data can be accessed at the NCBI GEO web site (www.ncbi.nlm.nih.gov/geo).
Regulatory regions such as promoters, enhancers and insulators are associated with “open” chromatin. Enhancers are cis-acting regulatory elements that can activate genes by long-range interactions with promoters (Miele and Dekker,2008). They can be found in both intragenic and intergenic locations, sometimes hundreds of kilobase pairs away from the promoter with which they interact. The interactions are formed by looping out the intervening DNA (Fullwood et al.,2009; Kagey et al.,2010). To date, most enhancers have been found by comparative analyses of conserved noncoding sequences (CNSs), which are enriched for enhancers (Zhang and Gerstein,2003). Indeed, sequencing of the X. tropicalis genome has allowed the identification of several highly specific enhancers surrounding the six3 gene (Hellsten et al.,2010) and a number of other loci (de la Calle-Mustienes et al.,2005; Meadows et al.,2009; Ogino et al.,2008; Seo et al.,2007; Tena et al.,2011). Many CNSs, however, have no detectable enhancer activity and may be conserved for other reasons. In addition, deep sequence conservation is not always observed at functional enhancers, most likely because the functional elements of enhancers (transcription factor binding sites) are relatively short and the intervening sequence need not be conserved. Chromatin modifications and binding of specific factors can be used to reliably identify enhancers and predict specificity. Enhancers are characterized by strong enrichment of H3K4me1 and depletion of H3K4me3 (Heintzman et al.,2007). In addition, RNA polymerase II has been found at active enhancers (Kim et al.,2010), and enhancers often show binding of the transcriptional coactivator p300 and enrichment of H3 acetylation in a cell-type specific manner (Heintzman et al.,2009; Visel et al.,2009).
These epigenetic characteristics can be employed to reliably identify and/or predict enhancers in a tissue- or developmental stage-specific manner (Rada-Iglesias et al.,2011; Visel et al.,2009; Zentner and Tesar,2011).
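The chromatin signature described above can be expressed as a simple interval operation on ChIP-seq peak calls. The following Python sketch keeps H3K4me1 peaks that do not overlap any H3K4me3 peak and records whether each remaining peak is also bound by p300. The peak coordinates, scaffold names and function names are hypothetical, intervals are assumed to follow the BED convention (0-based start, end exclusive), and the pairwise comparison is written for clarity rather than efficiency.

```python
from typing import List, Tuple

# (sequence name, start, end), BED-style: 0-based start, end exclusive.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two intervals on the same sequence share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def candidate_enhancers(
    h3k4me1_peaks: List[Interval],
    h3k4me3_peaks: List[Interval],
    p300_peaks: List[Interval],
) -> List[Tuple[Interval, bool]]:
    """Select H3K4me1 peaks lacking H3K4me3 and flag p300 co-occupancy."""
    candidates = []
    for peak in h3k4me1_peaks:
        # Peaks that coincide with H3K4me3 look promoter-like: skip them.
        if any(overlaps(peak, other) for other in h3k4me3_peaks):
            continue
        # p300 binding is treated as supporting evidence, not a requirement.
        has_p300 = any(overlaps(peak, other) for other in p300_peaks)
        candidates.append((peak, has_p300))
    return candidates


if __name__ == "__main__":
    # Toy peak calls, not real data.
    h3k4me1 = [
        ("scaffold_1", 100, 500),
        ("scaffold_1", 1000, 1500),
        ("scaffold_2", 100, 400),
    ]
    h3k4me3 = [("scaffold_1", 400, 800)]
    p300 = [("scaffold_1", 1200, 1300)]
    for peak, has_p300 in candidate_enhancers(h3k4me1, h3k4me3, p300):
        print(peak, "p300" if has_p300 else "no p300")
```

Because these marks are cell-type specific, the same filter applied to peak calls from different tissues or developmental stages would be expected to yield different candidate sets.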
These different classes of epigenomic data, generated using a variety of techniques, all offer a view of different aspects of regulation. Integration of epigenetic data from different sources can provide an unprecedented insight into gene regulation on a genome-wide scale. Examples shown here include improvement of gene annotation, such as TSSs and gene models, and identification of important regulatory elements such as enhancers. This regulatory annotation will be essential to further our understanding of tissue- and stage-specific gene regulation and development.
Epigenetic mechanisms filter genetic information, allowing cells to specialize by expression of only a subset of all genes. These mechanisms are key to development and differentiation. The rapid adoption of Next Generation Sequencing technology is expected to spur the generation of high-quality Xenopus data sets by sequencing of total and polyA+ RNA, and of small and large noncoding RNA, from embryos and tissue explants. Chromatin immunoprecipitation analyses of histone modifications and sequence-specific DNA binding factors will provide additional mechanistic insight and will identify and characterize a wider variety of genomic elements, including heterochromatic elements, enhancers and insulators. This may also shed new light on the important question of the developmental origins of chromatin state and address how regulatory epigenetic patterns develop, what their origins are in terms of developmental process and molecular mechanisms, and how they affect gene regulation and development. At the same time, genome-scale analyses of cis-regulatory elements will allow integrated systems biology approaches to explore gene-regulatory networks. These approaches, when pursued in powerful model systems such as Xenopus, can provide more insight into the gene regulatory mechanisms of development, generate a rich resource for the scientific community, and contribute to a better understanding of the developmental and epigenetic origins of congenital disease.