Institute for Brain Disorders and Neural Regeneration, Departments of Neurology, Neuroscience and Psychiatry and Behavioral Sciences, Einstein Cancer Center and Rose F. Kennedy Center for Research in Mental Retardation and Developmental Disabilities, Albert Einstein College of Medicine, Bronx, New York, NY, USA
There is increasing evidence that dynamic changes to chromatin, chromosomes and nuclear architecture are regulated by RNA signalling. Although the precise molecular mechanisms are not well understood, they appear to involve the differential recruitment of a hierarchy of generic chromatin modifying complexes and DNA methyltransferases to specific loci by RNAs during differentiation and development. A significant fraction of the genome-wide transcription of non-protein coding RNAs may be involved in this process, comprising a previously hidden layer of intermediary genetic information that underpins developmental ontogeny and the differences between species, ecotypes and individuals. It is also evident that RNA editing is a primary means by which hardwired genetic information in animals can be altered by environmental signals, especially in the brain, indicating a dynamic RNA-mediated interplay between the transcriptome, the environment and the epigenome. Moreover, RNA-directed regulatory processes may also transfer epigenetic information not only within cells but also between cells and organ systems, as well as across generations.
Epigenetics has become a subject of intense interest, as it deals with the contextual information that is superimposed on the relatively stable underlying genomic sequence by the modification of DNA (and RNA) and the modulation of chromatin structure. While not usually explicitly acknowledged, there is often an ambiguity about whether the term ‘epigenetic information’ refers mainly to that which is intrinsically acquired and transmitted within the trajectories of differentiation and development (as only a subset of genes are active within any given cell), or to that which is acquired as a consequence of gene–environment interactions, although the two pathways probably intersect, at least in part. It is likely that most epigenetic changes underlying developmental processes are driven by internal feed-forward programs embedded within the genome (as a kind of first derivative of DNA information), and that the role of gene–environment communication (including that conveyed by cell–cell signalling) is to supplement and to fine-tune these endogenously programmed epigenetic cascades,1 as well as to respond to external physiological parameters.
The fine control of chromatin structure is one of the major hallmarks of eukaryotes and of gene regulation during multicellular development. Chromatin architecture is altered by methylation of the DNA and by various types of modifications to histones (the so-called ‘histone code’), including compound patterns of methylation, acetylation, phosphorylation, ubiquitinylation, sumoylation, ADP-ribosylation, carbonylation, deimination and proline isomerization at various residues.2 This epigenetic information can modify gene expression in differentiated cells and is often inherited within cell lineages (‘epigenetic memory’), although the mechanisms involved are not well understood.3 The major focus of the field to date has been on how such chromatin ‘marks’ are distributed in relation to features such as active or inactive promoters and vary in different developmental and disease (cancer) states, studies which are increasingly being extended genome-wide.4–6 However, there has been comparatively little advance in our understanding of how such modifications are differentially regulated and precisely targeted to a myriad of different genomic positions in different cell lineages during normal and abnormal growth and development.
One possibility is that a significant component of this targeting is mediated by regulatory RNAs.7, 8 It is clear that non-protein coding RNAs (ncRNAs) dominate the transcriptional output of mammals and other complex organisms,9, 10 and that ncRNAs regulate many levels of gene expression during development.11 Indeed, rather than being viewed as islands of protein-coding genes in an expanding sea of evolutionary ‘junk’, the eukaryotic genome may be better thought of as an RNA machine12 that expresses large repertoires of developmentally regulated ncRNAs,9, 10 which are central to the genetic and epigenetic processes that orchestrate the exquisitely precise patterns of gene expression during the ontogeny of multicellular organisms.1
Evidence for RNA-directed regulation of chromosome structure and chromatin architecture
The evidence that RNA-directed processes help to orchestrate chromatin architecture and epigenetic memory is growing rapidly and is already compelling.8, 11 There are only a limited number of enzymes (DNA methyltransferases, histone methyltransferases, acetylases, deacetylases etc.) and repressive and permissive (Polycomb-group and Trithorax-group) chromatin-modifying complexes involved, very few of which are known to have affinity for particular DNA sequences. However, these modifications must be purposefully directed to different positions in different loci in different cells, which implies that there must be another layer of information to guide this process. While there is evidence of association of chromatin-modifying proteins with transcription factors at gene promoters,13 an important source of additional information may be RNA, which has the capacity for a high degree of sequence- and locus-specificity.
In support of this proposition, it is known that RNA is an integral component of chromatin14 and that many of the proteins involved in chromatin modifications (as well as transcription factors7, 15) have the capacity to bind RNA or complexes containing RNA. These include DNA methyltransferases and methyl DNA binding domain proteins,16 heterochromatin protein 1 (HP1),17 the multi-KH domain protein DDP1, which suppresses heterochromatin-mediated silencing in Drosophila,18 and domains commonly found in chromatin remodelling enzymes and effector proteins such as SET domains, tudor domains and chromodomains.8, 19–21 Moreover, the vast majority of the genomes of all metazoans, from worms to human, and probably plants, is transcribed in a developmentally regulated manner, mainly into ncRNAs with complex patterns of overlapping and interlacing transcripts from both strands,9, 10 potentially providing a rich source of regulatory molecules to guide the epigenetic trajectories of development.1 For example, there are large numbers of non-polyadenylated transcripts whose functions are elusive,10 but which comprise the prevalent RNA population present in isolated chromatin fractions.14
Many aspects of the regulation of chromatin structure have been shown to be directed by RNA (Fig. 1). RNA plays a central role in DNA methylation and transcriptional silencing, via the RNA interference (RNAi) pathway, both in plants22, 23 and in animals,24, 25 with associated alterations to chromatin structure involving Polycomb complexes in Drosophila26 and in human cells,27 the latter involving exogenous promoter-directed RNAs. There is also rapidly emerging evidence that promoter-associated and nascent RNAs can regulate transcription via epigenetic mechanisms, apparently functioning as sources and/or targets of other RNAs, or by directly recruiting regulatory and effector proteins,25, 28–30 including the recent finding that signal-induced ncRNAs can act as selective ligands to modulate histone acetyltransferase activity at specific genomic positions.30 Interestingly, the co-regulatory RNA binding protein involved, TLS (‘translocated-in-liposarcoma’), is a high-affinity interactor for steroid, thyroid hormone and retinoid receptors, whose activity also involves interactions with histone-modifying proteins.31 Moreover, these observations suggest that other RNA-binding co-regulators may be analogously recruited to transcription units through gene-specific ncRNAs to modulate their local chromatin context.30
Heterochromatin formation appears to be broadly regulated by small RNAs.32 RNAi-related processes, including RNAi-dependent histone methylation and recruitment of RITS (RNA-induced initiation of transcriptional gene silencing) complexes, have been shown to be involved in heterochromatin assembly, centromere formation and chromosome dynamics during the cell cycle in fission yeast32–34 (Fig. 1). Similar processes are involved in heterochromatin formation and programmed DNA elimination in ciliates,35, 36 as well as heterochromatin formation and nucleolar organization in Drosophila.18, 37 The nuclear organization of chromatin insulators and chromatin domains is also affected by the RNAi machinery38 and recent deep sequencing studies have shown that double-stranded RNAs formed by sense-antisense transcript pairs originating from inverted repeats, bidirectional/antisense transcripts from retrotransposons, pseudogenes and mRNAs in mouse oocytes and Drosophila somatic cells are processed into large numbers of small RNAs that may have regulatory functions in epigenetic pathways.39–42
PIWI-interacting RNAs (piRNAs), which control transposon activity in animals, are also involved in heterochromatin formation.43, 44 PIWI interacts with HP1a43 and the epigenetic silencing of retrotransposons in the mouse germline involves sequence-specific targeting of DNA methylation by PIWI–piRNA complexes that are dynamically modulated in their composition and temporal expression patterns during male germline development.45, 46 RNA-directed DNA methylation (RdDM) is well established in plants: RNA Pol IV transcripts are processed by the endonuclease DICER-LIKE3 to generate 21–24 nt small RNAs that are incorporated into AGO447, 48 to guide DRM1/2 methylation activity to the region of genomic DNA homologous to the siRNA sequence.49 Promoter-directed siRNAs can induce promoter methylation and transcriptional gene silencing, allowing plants to regulate transcription of genes during development.50 siRNAs may also be incorporated into the RdDM pathway to silence transposons and repeat portions of the genome,51, 52 analogous to the silencing of retrotransposons by piRNAs in animals.43, 44 Moreover, deep sequencing has revealed that one-third of all methylated DNA sequences correlate with small RNAs in Arabidopsis flowers.53 Indeed, the epigenetic plasticity afforded by RdDM may have contributed to the evolution of flowering plants, with a recent study showing that small RNA-mediated RdDM controls the epigenetic changes underpinning phenotypic differences between two closely related Arabidopsis ecotypes.54 Interestingly, not only is DNA methylation RNA-directed, but it can also be dynamically counteracted by the 5-methylcytosine DNA glycosylase ROS1, which is regulated by an RNA-binding protein, ROS3, and involves association with regulatory RNAs.55
Long ncRNAs are also involved in many epigenetic processes, with increasing reports that RNAs can direct and regulate both chromatin activator complexes (CACs)56–59 and chromatin repressor complexes (CRCs)60–66 (Fig. 1). Good examples are the well-known role of roX ncRNAs in Drosophila dosage compensation, which guide the generic MSL complex to specific sites on the X-chromosome to promote global gene activation in males,56 the involvement of ncRNAs in parental imprinting,61, 67, 68 and Xist/Tsix-mediated X-chromosome inactivation in mammals,61 which has recently been shown to intersect with the RNAi pathway.42 Other epigenetic processes that are also regulated by RNA include the regulation of rDNA copy number,69 T-cell receptor recombination70 and maintenance of telomere integrity.71
Several recent papers are beginning to give insights into the mechanisms involved. At the Kcnq1 and Igf2r imprinted clusters, the long antisense ncRNAs Kcnq1ot1 and Air have both been shown to coat target chromatin regions (similar to Xist) and to interact with histone methyltransferase complexes (G9a and Polycomb) to direct the imprinting of specific genes in placental tissues.63–65 The establishment and maintenance of imprinting mediated by Kcnq1ot1 involves the recruitment of the imprinted domain to perinucleolar compartments rich in heterochromatic machinery72 and the formation of repressive higher-order chromatin structures mediated by Polycomb.65 Similarly, in X-chromosome inactivation a 1.6 kb transcript from an internal repeat region of the Xist locus, RepA, directly interacts with and recruits Polycomb complexes to their target loci on the X chromosome.66
Many regulatory regions affecting chromatin structure and the expression of adjacent protein-coding genes are transcribed in spatially and temporally regulated ways.57, 73 At least some of these transcripts play important roles in regulation of gene expression by targeting global protein regulators such as HP1, Ash1 and the chromatin insulator protein CP190 to the cognate sequences in cis-regulatory response elements, including Polycomb- and Trithorax-response elements (PREs and TREs).26, 38, 57, 74 Proteins of the Polycomb group and Trithorax group are important global regulators of transcriptional silencing/activation and mediators of epigenetic memory in development, best characterized in homeotic loci.75 Many PREs and TREs, such as those at the bxd locus within the Ultrabithorax (Ubx) region, are transcribed into ncRNAs,58 which have been reported to recruit the SET domain-containing epigenetic regulator Ash1 to activate the Ultrabithorax locus.57 Hox gene loci in mammals also exhibit complex patterns of non-coding transcripts on both strands.76 Indeed, it has been shown that over 200 long ncRNAs associated with human HOX gene clusters are co-linearly expressed along developmental axes, and that one of these ncRNAs (termed HOTAIR), originating from the HOXC locus, recruits Polycomb complexes to repress gene expression of the HOXD cluster in trans,60 indicating that non-coding transcription does not simply alter local chromatin structure.
More recently, a large-scale analysis identified 174 ncRNAs that are differentially expressed during the differentiation of mouse embryonic stem (ES) cells, many correlating with pluripotency or specific differentiation events.77 A number of these ncRNAs showed coordinated expression with genomically associated developmental genes such as Dlx1, Dlx4, Gata6 and Ecsit. Two developmentally regulated ncRNAs, Evx1as and Hoxb5/6as, which are derived from homeotic loci and share similar expression patterns and localization in mouse embryos with their associated protein-coding genes, were shown to be associated with trimethylated H3K4 histones and the histone methyltransferase (HMT) Mll1, suggesting a role in epigenetic regulation of homeotic loci during ES cell differentiation.77
The potential involvement of RNA (and inherited variations in loci encoding these RNAs) in epigenetically mediated human disease is also presaged by the observation that a particular type of thalassemia involves silencing of the α-globin gene HBA2 and methylation of its associated CpG island early in development, which is mediated by the transcription of an antisense RNA associated with an abnormally juxtaposed gene.78 In addition, in a study of ncRNAs associated with tumour suppressor genes, a long antisense RNA (p15AS) associated with the p15 locus was found to silence the expression of the gene by altering histone methylation,62 with important implications for tumourigenesis. Taken together, these data indicate that long ncRNAs are likely to be important in both normal and abnormal developmental processes, in some, if not many, cases through engagement of the epigenetic machinery.
Nevertheless, very few of these ncRNAs (of which there are tens of thousands)9, 10 have been studied, and there are almost certainly many more that are involved in such pathways and remain to be identified and functionally characterized. Moreover, a large fraction of the mammalian genome is composed of transposon-derived sequences that are often transcriptionally active. Although often pejoratively referred to as ‘repeats’ and assumed to be non-functional ‘selfish’ DNA, many transposon-derived sequences are expressed in interesting patterns in development and appear to play a significant role in developmental regulation.51, 79 Recent evidence suggests that tissue-specific transcription of at least some of these repeats functions to organize the locus concerned into nuclear compartments, as a developmental strategy to establish functionally distinct domains that control gene activation.80 These and other observations of the functionality of transposon-derived sequences (which were first described by McClintock as ‘controlling elements’) call into question the assumption that ancient repeats may be used as an index of the rate of neutral evolution (unconstrained sequence drift), and therefore also the derived estimate that only 5% of the human genome is under ‘purifying’ selection.81
Finally, in yeast, many unstable ‘cryptic’ ncRNAs are barely detectable by conventional expression analysis but are up-regulated upon depletion of components or co-factors of the RNA-processing exosome complex.82 Although these transcripts were initially assumed to be transcriptional noise, it was recently found that their expression can be controlled by chromatin remodelling,83 and that some are exported to the cytoplasm.84 Moreover, specific cryptic RNAs in yeast, which are regulated during chronological aging, were found to direct the histone deacetylase Hda1 to the PHO84 locus to repress its expression.85 A similar mechanism may be involved in heterochromatic gene silencing86 and gene activation.59 Hundreds of ncRNAs ‘reminiscent of cryptic transcripts in yeast’ have been detected in Arabidopsis, and it seems likely that there are many rare or cell-specific functional ncRNAs with short half-lives operating to regulate gene expression and chromatin architecture in eukaryotes.87
Thus far, we have been concerned only with RNA-directed alterations to chromatin structure during programmed development or developmental abnormalities such as cancer. However, RNA is also involved in the transmission of environmental information into the system via RNA editing, which may in turn influence RNA-based regulatory circuits. RNA editing in vertebrates occurs via two classes of enzymes: the ADARs (one of which, ADAR3, is brain-specific), which catalyse the deamination of adenosine to inosine,88 and the APOBECs (two of which, APOBEC1 and APOBEC3, are specific to mammals, the latter having been greatly expanded and subjected to positive selection in the primate lineage), which catalyse the deamination of cytidine to uracil.89, 90 RNA editing is well recognized throughout metazoan evolution and occurs in most if not all tissues, but is particularly active in the brain, with a dramatic increase in the incidence of RNA editing during vertebrate, mammalian and primate evolution,88 strongly suggesting an association with the development of more advanced cognitive abilities. There are well-characterized iconic examples of RNA editing altering the amino acid sequence and splicing patterns of neurotransmitter receptors, presumably to alter the electrophysiological properties of the synapse. RNA editing has also been shown to alter both miRNAs and their targets,91 indicating that these fundamental circuits can also be dynamically modulated. This is unlikely to be a random process, and indeed inositol hexaphosphate is complexed within the active site of ADAR2,92 implying a link to cell signalling pathways. Moreover, the existence of RNA editing in many tissues suggests that environmental information is being fed into RNA-mediated pathways in many different contexts, with every reason to expect that at least some of this information will result in both immediate and longer-term epigenetic effects.
It has also recently been shown that there is a global reduction of A–I editing and complex gene-specific alterations of editing patterns in tumours versus normal tissues, and that overexpression of ADARs resulted in a decreased proliferation rate of glioblastoma cells.93
Intriguingly, two orders of magnitude more RNA editing is observed in human transcripts than in mouse, the vast majority of which occurs in Alu sequences, which are primate-specific and whose genomic distribution suggests positive selection.94 While it is sometimes thought that such editing is a means of silencing retrotransposons,91 most Alu sequences are not active as such, and indeed the vast majority of the ∼1 million copies in the human genome are in fact unique sequences.95 An alternative interpretation of these observations is that Alu elements provided an important platform for the expansion of RNA editing in primates, driven by and underpinning the development of higher order cognition.94 Since most of these edited elements occur in non-coding sequences, one presumes that they are largely regulatory, affecting brain development and function. Alu RNAs have been shown to act as trans-acting transcriptional repressors by binding RNA polymerase II96 and to be involved in the regulation of alternative splicing, translation and mRNA stability.97 Moreover, non-coding RNA expression appears to be particularly active in the brain,98 and it is known that both RNA transport and epigenetic changes, in which RNA molecules are increasingly implicated, are important in memory formation.99, 100
Intercellular and intergenerational epigenetic signalling by RNA
Most RNA regulatory circuitry is cell autonomous, but recent evidence suggests that RNA may also convey epigenetic information between cells and across generations. It has been known for some time that the epigenetic phenomenon of co-suppression in plants, which is mediated by the RNAi pathway, can be transmitted systemically following grafting of a transgenic scion onto a wild-type plant.101 Plants use similar pathways to coordinate normal developmental processes.49 Transport of these RNA signals occurs locally via plasmodesmata or systemically via the phloem.102
Intercellular RNA signalling may also occur in animals.103 Most animals have orthologs of the Caenorhabditis elegans protein Sid1, a transmembrane protein that is required for the systemic spread of RNA interference and that allows the import of dsRNA into the cell.103, 104 There are two paralogs of Sid1 (SidT1 and SidT2) in mammals. Mammalian SidT1 has been confirmed to similarly import dsRNA across the cell membrane.103 SidT1 and SidT2 are expressed in most tissues and cell types in humans and mice, and exhibit specific expression patterns in the brain, suggesting that they have specialized RNA transport functions, possibly for different types of RNA substrates.103 There is also evidence for miRNA stably circulating in blood105 and for RNA transport between neurons and glial cells.106 Parenthetically, given the impermeability of the blood–brain barrier to RNA, it may be that this barrier (and the similar barrier in the testis) functions in part to privilege and segregate from systemic circulation the intercellular RNA signalling networks in these organs, both of which are known to be rich in RNA expression. There is also increasing evidence that RNA transport may be mediated by circulating microvesicles and that such microvesicles can convey developmentally relevant information.103 In addition, RNAs can be transmitted between cells in close contact, including from nurse cells to oocytes in Drosophila via ‘ring canals’, between germ cells in mouse spermatogenesis via ‘cytoplasmic bridges’, and between human ES cells via gap junctions.103 Finally, genetic analysis suggests that specific genes are required for the transport of dsRNA into the germline and for RNAi-mediated gene silencing of germline-expressed genes in C. elegans, including PPW-1, which encodes a PAZ/PIWI domain protein of the Argonaute family, as well as three genes whose function is unknown, one of which (rsd-3) has both mouse and human orthologs and contains an ENTH domain commonly found in proteins involved in vesicle trafficking.107
There are many studies indicating that epigenetic memory may be heritable in both plants and animals, and suggesting that this process is RNA-directed.108–110 As noted already, parental imprinting is intimately linked to ncRNAs61 and it is known that RNAi-mediated gene silencing can be inherited for several generations in C. elegans.104 It has also been reported that modifiers of epigenetic reprogramming show paternal effects in mice.111 Perhaps the most exciting aspect of this field, and one with the capacity to change our view of inheritance and evolution,109 is the phenomenon of ‘paramutation’, long known in maize and recently described in mice. Paramutation refers to the allele-specific transfer of epigenetic information to cause the heritable silencing of one allele by another and appears to involve RNA signalling in both maize and mice.108–110 Moreover, it has been reported that RNAs directed at the coding region of cdk9, an important regulator of cardiac growth, can epigenetically mediate the heritable induction of cardiac hypertrophy in mouse.112 The phenomenon may also be induced by miRNAs112 and environmental parameters.113–115 Intriguingly, there is increasing evidence that RNA-coupled DNA ‘repair’ can also occur in eukaryotes,94, 116 suggesting that RNA can direct both epigenetic and genetic modifications and that there is a much more dynamic interplay between genomes and the environment than previously envisioned.
For many years the emphasis has been on the role of proteins and protein–protein interactions in the regulation of gene expression in the developmentally complex eukaryotes, especially animals, which have to precisely control their ontogeny by (among other things) modulating chromatin architecture during different stages of differentiation and development.75 Recent data suggest not only that chromatin structure is modified by many different mechanisms, but also that this process is exquisitely controlled at different positions in different loci in different cells, and may be much more complex than imagined.4–6 At the same time, the recent discoveries of large numbers of ncRNAs that are transcribed in a developmentally regulated fashion9, 10 suggest that these RNAs may play a role in developmental processes,1, 11 a possibility supported by the rapidly emerging evidence that RNA signalling regulates chromatin modification and associated epigenetic memory. It will be important to understand the mechanisms by which RNA mediates epigenetic regulation as more examples of individual RNAs implicated in these processes are identified, especially as no consistent theme has yet been established. Many regulatory ncRNAs appear to operate by negative control, but there are also examples of those that have activating functions (see refs. 29, 57, 77), which is consistent with a role in recruitment of different types of chromatin-modifying complexes. In addition, most ncRNAs have yet to be studied. The crucial question now is whether the examples known to date represent the tip of an iceberg of a vast layer of RNA regulatory networks that are interpreted by different types of relatively generic proteins (for example RISC complexes or chromatin-modifying complexes, which may themselves have cell type-specific components), or are simply supplemental to primarily protein-based regulatory mechanisms in development.
The other crucial issue is to determine how extensively RNA editing may modify these networks, and therefore how plastic the epigenome may be, particularly in the brain but also in other tissues, and to what extent this plasticity may be transmitted in the germline.
J. S. M. is supported by an Australian Research Council Federation Fellowship (FF0561986), as well as by the University of Queensland and the Queensland State Government. P. P. A. is supported by a University of Queensland International Research Award, and T. R. M. by an Australian Postgraduate Award. M. E. D. is supported by a New Zealand FoRST International Postdoctoral Fellowship. M. F. M. is supported by grants from the National Institutes of Health (NS38902, MH66290, HD01799), as well as by the F. M. Kirby, the Rosanne H. Silbermann, the Alpern Family, the Lipid and the Roslyn and Leslie Goldstein Foundations. We apologize to colleagues for being unable to cite many primary papers, due to limitations on the length of the reference list.