RNA as the substrate for epigenome-environment interactions

RNA guidance of epigenetic processes and the expansion of RNA editing in animals underpin development, phenotypic plasticity, learning, and cognition

Authors


Abbreviations:

ADAR, adenosine deaminase that acts on RNA; ADAT, adenosine deaminase that acts on transfer RNA; AID, activation-induced cytidine deaminase; ApoB, apolipoprotein B; APOBEC, ApoB editing complex; cDNA, complementary DNA produced by reverse transcriptase; ERV, endogenous retrovirus; miRNA, microRNA; RISC, RNA-induced silencing complex; siRNA, small interfering RNA; snoRNA, small nucleolar RNA.

Introduction

Animal development and neurological function are critically dependent on inbuilt and environmentally influenced epigenetic processes that alter chromatin structure and hence gene expression patterns at many loci around the genome. Here I consider the implications of the increasing evidence that RNA directs chromatin-modifying complexes to their sites of action, and that RNA is widely edited, especially in the brain. Editing capacity and activity have expanded during vertebrate, mammalian and primate evolution, with most editing targeting noncoding sequences, many of which are derived from retrotransposed elements. Heuristically joining these dots leads to the obvious possibility that RNA editing alters regulatory circuitry and can feed back into epigenetic memory, and that the expansion of the enzymatic repertoire for RNA editing along with mobilizable target cassettes was central to the emergence of phenotypic plasticity, learning, and cognition. It also suggests that the widespread colonization of mammalian genomes by transposable elements and the pervasive differential transcription of noncoding sequences are not due to selfish elements and noisy transcription, as often thought, but to an evolved capacity that harnessed RNA and retrotransposons as plastic substrates, underpinning phenotypic adaptability and information storage. Finally, the multiple parallels between the nervous and immune systems suggest that they use similar processes, many of which are RNA-related, to induce somatic plasticity and fine-scale specificity, especially in intercellular and intermolecular recognition.

Gene-environment interactions and epigenetic memory

Gene-environment interactions occur at two levels. Short-term responses to physiological variables are largely transduced by signal transduction cascades that alter gene expression. Environmental signals can also result in stable changes to phenotype by inducing underlying epigenetic changes. This occurs in the brain, where epigenetic processes are involved in learning and memory formation 1, as well as in other tissues, where such processes underpin a range of neurological and physiological abnormalities, such as autism, type 2 diabetes, autoimmune diseases, and cancer 2–6. Importantly, epigenetic processes and epigenetic memory (involving Polycomb and Trithorax group proteins and associated complexes) are also central to normal development, and most observed epigenetic changes are associated with cellular differentiation 7–9. Thus, while developmental programming is very robust and reproducible (as evidenced by the phenotypic congruence of monozygotic twins), it may be influenced by environmental variables and signals 10–12, although the mechanism by which this might occur, and the extent to which it does occur, are unknown. However, one can mount a plausible, indeed attractive, case that the evolution of phenotypic plasticity was based on the modulation of epigenetic processes.

Epigenetic memory itself is embedded in the methylation and hydroxymethylation of cytosines in DNA and a range of modifications of the histones that package DNA into nucleosomes. These are catalyzed by a suite of ∼60 generic enzymes 13 that impose a myriad of different modifications at hundreds of thousands of genomic positions (including promoters and exon-intron boundaries) in different cells at different stages of differentiation 14–18. Interestingly, it has recently been shown that nucleosomes are preferentially positioned at exons 19–23, suggesting that epigenetic regulation operates not only at the level of the gene, but also at the level of individual exons, potentially allowing epigenetically driven control of splicing patterns, a prediction that has recently obtained experimental support 24.

RNA guidance of epigenetic modifications

What regulates the contextual selection and locus-specificity of particular epigenetic marks is unknown. The required information has been thought to be provided by combinations of “transcription factor” expression patterns and binding sites, although chromatin structure is also thought to regulate which binding sites are accessible. However, recent evidence indicates that RNAs may in fact guide the site-specific recruitment of chromatin modifying enzymes 25. Indeed, this may be the major function of the large numbers of intergenic, overlapping, and antisense non-protein-coding RNAs (ncRNAs) that are differentially expressed during development 26, presumably in a largely pre-programmed feed-forward manner 27. This appealing possibility integrates the sequence-specificity of RNA with its well-established capacity to guide generic effector proteins to their sites of action in target RNAs (or DNA), good examples of which are the microRNA (miRNA)/small interfering RNA (siRNA) guidance of the RNA-induced silencing complex (RISC) 28 and small nucleolar RNA (snoRNA)-mediated guidance of RNA-modifying enzymes 29. Indeed, many ncRNAs are associated with chromatin-modifying complexes 26, 30, 31, most if not all of which have components with RNA-binding domains 25. Moreover, at least some environmentally induced epigenetic states can be inherited 32, a process that appears to be mediated by RNA 33–36.

Modulation of RNA structure and information content by editing

RNA sequences can also be altered by RNA editing, which suggests an evolved ability to overwrite hard-wired genetic information, thereby providing the molecular basis for plasticity in the system. There are two types of RNA editing, both involving base deamination: A to I (A > I) editing (which sequences as G) is carried out by enzymes termed “adenosine deaminases that act on RNA” (ADARs) (for reviews see Refs. 37, 38); and C to U (C > U) editing by enzymes termed, for historical reasons, “ApoB editing complex” (APOBECs) (for reviews see Refs. 39, 40). Members of both classes of enzymes appear to shuttle between the nucleus and the cytoplasm. The existence of RNA editing in animals was uncovered by differences between cDNA and genomic sequences, notably in neuronal proteins such as serotonin and glutamate receptors where A > I editing alters the amino acid sequence 37, ostensibly to tune the electrophysiological properties of the synapse. C > U editing was first recognized in ApoB mRNA, where it introduces a stop codon to form a truncated protein that absorbs lipids in the intestine, in contrast to the long version produced in the liver that transports cholesterol in the blood 41. A related cytosine deaminase (activation-induced cytidine deaminase, AID) was subsequently shown to be required for class switch recombination and somatic hypermutation of immunoglobulins 42, although it may exert its action at the DNA rather than RNA level 40, a matter of some intrigue and debate.
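
The logic by which editing was first uncovered, a comparison of cDNA with the corresponding genomic sequence, can be illustrated with a minimal sketch. It assumes two pre-aligned, same-strand sequences of equal length written in the DNA alphabet, and it flags A-to-G mismatches (consistent with A > I editing, since inosine sequences as G) and C-to-T mismatches (consistent with C > U editing). The function names and the toy sequences are hypothetical and chosen only for illustration; in real data such mismatches could equally reflect polymorphisms or sequencing errors, so they are candidates rather than confirmed editing sites.

```python
# Illustrative sketch only: helper names and toy sequences are assumptions,
# not taken from any published pipeline.

STOP_CODONS = {"TAA", "TAG", "TGA"}

# Mismatch (genomic base, cDNA base) -> editing type it is consistent with.
EDIT_LABELS = {("A", "G"): "A>I", ("C", "T"): "C>U"}


def editing_candidates(genomic, cdna):
    """List (position, genomic_base, cdna_base, label) for editing-like mismatches.

    Both sequences must already be aligned, on the same strand, and of equal length.
    """
    if len(genomic) != len(cdna):
        raise ValueError("sequences must be aligned to the same length")
    hits = []
    for pos, (g, c) in enumerate(zip(genomic.upper(), cdna.upper())):
        label = EDIT_LABELS.get((g, c))
        if label is not None:
            hits.append((pos, g, c, label))
    return hits


def creates_stop(genomic, cdna, pos):
    """True if the mismatch at pos turns a sense codon into a stop codon.

    Assumes the reading frame starts at index 0 of the aligned sequences.
    """
    start = pos - pos % 3
    before = genomic[start:start + 3].upper()
    after = cdna[start:start + 3].upper()
    return before not in STOP_CODONS and after in STOP_CODONS


# Toy example: a C>U change converting CAA to a stop codon (UAA, read as TAA
# in cDNA), as in the ApoB case, plus an A>I change read as G.
genomic = "ATGCAAGATTAC"
cdna = "ATGTAAGGTTAC"

hits = editing_candidates(genomic, cdna)
for pos, g, c, label in hits:
    print(pos, f"{g}->{c}", label, "creates stop:", creates_stop(genomic, cdna, pos))
```

The same comparison, applied at scale to cDNA libraries and filtered against known polymorphisms, is the general idea behind transcriptome-wide surveys of editing.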

ADARs

ADARs are double-stranded RNA-binding proteins whose general substrate appears to be “hairpin” regions formed by stem-loops and other intra- and intermolecular interactions 37. They evolved from adenosine deaminases that act on tRNAs (ADATs) and are restricted to animals. ADAR1 and ADAR2 are widely expressed, but most highly in brain. ADAR3 is restricted to vertebrates and is brain-specific, although little is known of its function. Most editing occurs in the brain. However, almost nothing is known about the factors that regulate and modulate the cell-specific expression, alternative splicing, and target specificity of these enzymes. Nonetheless, it is difficult to imagine that editing is not regulated by intrinsic and extrinsic factors, and indeed the crystal structure of ADAR2 reveals that inositol hexakisphosphate is complexed in the active site, implying a direct link to canonical cell signaling pathways 43. This is undoubtedly a fertile area for future research.

More recently, analyses of cDNA libraries have shown that A > I editing occurs in an extraordinary variety of transcripts, mainly in noncoding sequences 44–49, suggesting that it has a much wider influence on the transcriptome (and therefore potentially the epigenome) than previously suspected. This includes the editing of untranslated regions of mRNAs and introns, mainly in transposon-derived “repetitive” sequences 44, and miRNAs 50–52, whose action can also be altered by APOBECs 53, indicating that such editing modulates regulatory networks. Moreover, there is an enormous (∼35-fold) increase in the intensity of editing of transcripts in human compared to mouse, which occurs largely in primate-specific retrotransposed Alu elements 45, 47, 48, of which there are over a million largely unique sequence copies. These elements comprise 10.5% of the human genome 54, 55, suggesting a link between their expansion and cognitive evolution 56. Intriguingly, Alus themselves evolved from a functional RNA ancestor (the 7SL RNA of the signal recognition particle) 57, 58.
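
The two Alu figures quoted above (over a million copies, 10.5% of the genome) can be checked against each other with simple arithmetic. The sketch below assumes a typical full-length Alu of about 300 bp and a haploid human genome of about 3.1 billion bp; neither value is given in the text, and many genomic Alu copies are truncated, so the result is only a rough order-of-magnitude check.

```python
# Back-of-envelope consistency check. ALU_LENGTH_BP and GENOME_SIZE_BP are
# assumed round values, not figures from the text.
ALU_LENGTH_BP = 300          # assumption: typical full-length Alu element
GENOME_SIZE_BP = 3.1e9       # assumption: approximate haploid human genome
ALU_GENOME_FRACTION = 0.105  # from the text: Alus comprise 10.5% of the genome

alu_bases = ALU_GENOME_FRACTION * GENOME_SIZE_BP
implied_copies = alu_bases / ALU_LENGTH_BP

print(f"Bases attributed to Alu elements: {alu_bases:,.0f}")
print(f"Implied number of full-length copies: {implied_copies:,.0f}")
```

Under these assumptions the implied copy number is on the order of a million, in line with the figure cited.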

RNA editing may also have a major role in the epigenetic etiology of cancer development and progression. Alterations in ADAR expression have been correlated with the grade of malignancy of glioblastoma multiforme 59. ADAR1 has recently been found among the most highly expressed proteins in a human breast tumor, concomitant with many alterations in editing, including non-synonymous changes in SRP9, which encodes a subunit of the signal recognition particle that binds to a variety of Alu-like RNAs 60.

APOBECs

APOBECs, like ADARs, also appear to have evolved from ADATs 40 and are, if anything, even more intriguing. APOBEC2, APOBEC4, and AID are specific to vertebrates. APOBEC1 appears in mammals, and APOBEC3 only occurs in placental mammals, with a massive expansion from one ortholog in mouse to eight in human (APOBEC3A–H), which show strong signatures of positive selection 39, 61, 62. Not a great deal is known about the expression, functions, and endogenous substrates of most human APOBEC3s, although APOBEC3G is expressed in post-mitotic neurons 63 and APOBEC3s are overexpressed in various cancers 39. As noted earlier, AID is required for class switch recombination and somatic hypermutation of immunoglobulins 42. AID is also required for DNA demethylation and nuclear reprogramming during reversion to pluripotency in human somatic cells 64 and APOBEC2 is required for normal muscle development 65, suggestive of a much wider and more subtle role for these enzymes in developmental processes. There is evidence that APOBEC3F and 3G may be involved in defense against retroviral infection and LINE-1-mediated retrotransposition 66, 67, but why this might have been of particular selective advantage in primates is hard to understand.

An alternative and exciting possibility is that these enzymes have evolved and expanded not (simply) to defend against the movement and activity of endogenous retroviruses (ERVs) and retrotransposons, but to regulate evolved functions associated with the domestication of such sequences as agents of epigenetic regulation and somatic plasticity, especially in mammals and primates. There is evidence that particular ERVs regulate peri-implantation placental growth and differentiation 68, that retrotransposed sequences are dynamically expressed during development 69, and that LINE-1 retrotransposition may contribute to neuronal diversity 70. Indeed, given that the raw material for evolution is duplication and transposition, and that the latter has the advantage of being able to mobilize functional cassettes, it would be surprising if evolution had not harnessed their considerable power for plastic modulation of the genome, epigenome, and transcriptome, not just in evolutionary time but also in real time during development, to enable dynamic responses to environmental variables and to manage the extraordinarily complex cellular architecture and cell-cell interactions in the brain.

Another plausible, and not mutually exclusive, possibility is that some APOBECs may have evolved, following AID, to participate in somatic mutational events akin to those in the immune system that are associated with plastic recognition of external molecules and selective cell-cell interactions. There are many intriguing similarities between the nervous and the adaptive immune systems 71, 72. These include the fact that many of the key cell-surface receptors in brain are members of the immunoglobulin superfamily, that cytokines play a role in complex cognitive processes such as synaptic plasticity, neurogenesis, and neuromodulation, and that immune cells express neurotransmitters, associated receptors, and proteins classically associated with growth and guidance of neuronal axons 73–76. Moreover, the brain, like the immune system, is hypersensitive to mutations in a range of unusual DNA “repair” enzymes that are linked to RNA (“transcription-coupled repair”), some of which are involved in somatic hypermutation of immunoglobulins 56, suggesting that the brain uses such enzymes for similar processes. Interestingly, APOBEC3G appears to exert its action on the nascent DNA strand produced by reverse transcription in the target cell 77. Indeed, the adaptive immune system may have emerged from the nervous system 71, 72, given that the latter (including many common components) predates the former in metazoan evolution 78. The expansion of cell surface receptor families, editing enzymes, and mechanisms for epigenetic and somatic plasticity may have subsequently emerged hand-in-hand in the vertebrate lineage and been adapted in various ways to empower more flexible immunological, developmental, and physiological responses to environmental variables, and more sophisticated neural capacity for information storage and retrieval.

Other forms of dynamic RNA modification

RNA can also be modified by snoRNA-guided 2′-O-methylation and pseudouridylation 29. Many snoRNAs are differentially expressed in the brain 79, some in response to experience 80. Interestingly, most if not all snoRNAs are further processed into small RNAs 81, some of which have the characteristics of, and can function like, miRNAs 82–84, although the interplay between the snoRNA-mediated RNA modification and siRNA pathway(s) is not understood. In addition, strand-specific 5′-O-methylation controls guide strand selection and targeting specificity of siRNA duplexes 85.

RNA can also be modified by cytosine methylation. Dnmt2, named because of its homology to DNA methyltransferases, is in fact an RNA methyltransferase 86, 87 that plays a role in the development of the brain and other organs 88 and is required for retrotransposon silencing in somatic cells of Drosophila 89. Apart from tRNA its substrates are not known, as bisulfite sequencing has not yet been applied to RNA, although some attempts are underway 90. The range of targets of RNA methylation is therefore unknown, and potentially represents another yet-to-be-explored layer of dynamic modulation of the transcriptome and epigenome.

Conclusions

However these things unfold, two things stand out. First, the ability to edit RNA, much of which occurs in noncoding sequences, suggests that not only proteins but also – and perhaps more importantly – regulatory sequences can be modulated in response to external signals and that this information may feed back via RNA-directed chromatin modifications into epigenetic memory.

Second, little is understood, and to a large extent even countenanced, about the complex transactions that occur at the RNA level in humans and other animals, or of the biological significance of the many different types of retroelements that populate, and are dynamically expressed from, our genome. Indeed, this area has been underexplored because many such elements are apparently transcribed by RNA polymerase III, are not polyadenylated (so fall outside many cDNA protocols), are often excluded from analyses by “Repeat Masking,” and are difficult to map unequivocally to particular genomic locations. I suggest that there is much to be revealed by a more intense examination of the expression of retrotransposed “repeat” sequences, and that this will reveal extensive dynamic modification of these sequences during development and in the brain, both of which will be determined by hard- and soft-wired (i.e., environmentally modulated) regulatory circuitry. I also predict that the exploration of these topics will dominate molecular biology and neurobiology in the coming years and is likely to radically transform our understanding of the dynamic programming of development, physiology, and brain function, and the epigenome-environment interactions that modulate them.

Acknowledgements

This work was supported by the Australian Research Council (Federation Fellowship grant FF0561986).