It is widely accepted that ncRNAs (non-coding RNAs), as opposed to protein-coding RNAs, represent the majority of human transcripts; and the regulatory roles of many of these ncRNAs have been elucidated over the past decade. One important role so far recognized for ncRNAs is their participation in the epigenetic regulation of genes. Indeed, it is becoming increasingly apparent that most epigenetic mechanisms of gene expression are controlled by ncRNAs. In this review, the different types of ncRNA that are strongly linked to epigenetic regulation are characterized and their possible mechanisms discussed.
As in other vertebrates, humans have approx. 20000 protein-coding genes according to the latest data from genomic analyses (Aparicio et al., 2002; Waterston et al., 2002; International Chicken Genome Sequencing Consortium, 2004; Goodstadt and Ponting, 2006; Clamp et al., 2007). The ENCODE (ENCyclopedia of DNA Elements) project has shown that at least 90% of the human genome so far analysed is transcribed in different cells, but a large portion of eukaryotic transcripts do not code for protein. Taft et al. (2007) reported that the percentage of the genome coding for proteins decreases as organisms become evolutionarily more complex, from ∼90% in prokaryotes to ∼68% in yeast, ∼25% in nematodes, ∼17% in insects, ∼9% in pufferfish, ∼2% in chicken and 1% in mammals. These data strongly indicate that ncRNAs (non-coding RNAs) can be regarded as molecular markers in the evolution of complex organisms (Costa, 2008). Therefore non-coding portions of the genome, previously referred to as either ‘junk DNA’ or ‘transcriptional noise’, may play crucial roles in the complexity of higher organisms (Mattick, 2001; Szymanski et al., 2005). As research in this field has progressed, many non-coding portions of the genome have been found to exert regulatory functions and to be evolutionarily conserved (Guttman et al., 2009), so that an updated definition of the term ‘gene’ was required. Gerstein et al. (2007) summarized the ENCODE findings and provided such a new definition: ‘the gene is a union of genomic sequences encoding a coherent set of potentially overlapping functional products’. This definition can be explained in the following three points: (i) A gene is a genomic sequence (DNA or RNA) directly encoding functional product molecules, either RNA or protein. (ii) If several functional products share overlapping DNA regions, a gene is the union of all overlapping coding genomic sequences.
(iii) This union must be coherent, but this coherence does not require that all products necessarily share a common sub-sequence.
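Point (ii) of this definition can be illustrated with a minimal sketch in Python. The coordinates, the product names and the function `union_of_sequences` are hypothetical and are not taken from Gerstein et al. (2007); the sketch covers only the 'union of overlapping sequences' idea and does not attempt to model the coherence requirement in point (iii).

```python
from typing import List, Tuple

# Half-open (start, end) coordinates on one hypothetical chromosome.
Interval = Tuple[int, int]


def union_of_sequences(segments: List[Interval]) -> List[Interval]:
    """Merge overlapping genomic segments into their union."""
    merged: List[Interval] = []
    for start, end in sorted(segments):
        if merged and start < merged[-1][1]:
            # The segment overlaps the previous block, so extend that block.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new block.
            merged.append((start, end))
    return merged


# Two hypothetical functional products that share an overlapping region.
product_a = [(100, 200), (300, 400)]
product_b = [(150, 250), (500, 600)]

gene = union_of_sequences(product_a + product_b)
print(gene)
```

In this toy case the two products overlap between positions 150 and 200, so their first segments are merged into one block, while the non-overlapping segments remain separate parts of the same union.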
Classification of ncRNAs
That ncRNAs could regulate gene expression was proposed during the 1960s (Britten and Davidson, 1969). However, their importance in gene regulation was not appreciated until the discovery of miRNAs (microRNAs) and siRNAs (small interfering RNAs), which indicated that ncRNAs are RNAs that are functional biologically, rather than simply being intermediate messengers between DNA and proteins (Costa, 2007).
Indeed, their significance is demonstrated further by the findings that approx. 98% of all transcriptional output in humans results from ncRNAs (Mattick, 2001). These ncRNAs are from the exons and introns of non-coding genes as well as from the introns of protein-coding genes (Mattick and Makunin, 2005), which are synthesized by RNAP II (RNA polymerase II) and RNAP III (Bartel, 2004; Carninci et al., 2005; Dieci et al., 2007). Besides the well-known intermediate-sized ncRNAs (50–500 nt) and their genetic functions, for example, tRNAs and rRNAs that are involved in mRNA translation, snoRNAs (small nucleolar RNAs) that are involved in the modification of rRNAs and snRNAs (small nuclear RNAs) that are involved in splicing (Mattick and Makunin, 2005), a number of additional diverse genomic and cellular functions of ncRNAs have been revealed as summarized in Table 1.
Table 1 (fragment). Function: chromatin modification and transcriptional and post-transcriptional regulation. References: Shamovsky and Nudler, 2006; Mercer et al., 2009; Ponting et al., 2009; Wilusz et al., 2009.
miRNAs are tiny, endogenous and highly conserved ncRNAs derived by processing of short RNA hairpins that induce the degradation of target mRNAs, the repression of translation or both. miRNAs are able to regulate approx. 30% of all human protein-coding genes by imperfect base pairing with targeting sequences and control almost every cellular process investigated so far (Filipowicz et al., 2008; Costa, 2009). Data from recent reports reveal that miRNAs regulate the expression of other types of ncRNAs (such as long ncRNAs), suggesting that these small RNAs can exert a significant impact upon transcriptomic networks (Calin et al., 2007; Rossi et al., 2008).
Bi-directional expression of the genome is frequently observed in both mice and humans. Data from Xu et al. (2009) showed that bi-directionality is an inherent feature of promoters, a concept that has changed our view of genomic transcription. The sense strand of DNA generally acts as the template for mRNA which encodes proteins. ASRNAs (antisense RNAs) represent a substantial number of RNAs derived from the strand opposite to the sense or protein-coding strand (Katayama et al., 2005). These antisense transcripts may hybridize with DNA or RNA and in this way influence transcription, translation or the stability of the ‘sense’ product (Willingham and Gingeras, 2006). Widespread antisense transcripts in the human and mouse genomes have been identified by computational analyses (Chan et al., 2006). EST (expressed sequence tag) and cDNA databases have been used to identify as many as 5880 human transcript clusters containing overlapping antisense transcripts (Chen et al., 2004). He et al. (2008) found evidence for antisense transcripts in 2900–6400 human genes by examining five different human cell types. Therefore antisense transcripts appear to be a remarkable feature of eukaryotic organisms, and are often thought to have a regulatory role. Such regulation can occur through transcriptional interference or through post-transcriptional mechanisms involving splicing or RNA-induced silencing complexes.
piRNAs (Piwi-associated RNAs) are germline-specific RNAs that bind Piwi proteins and guide them to their targets (O'Donnell and Boeke, 2007), and they have several additional interesting characteristics. First, these small RNAs are longer than miRNAs and siRNAs, with a phosphorylated 5′-end and a 2′-O-me (2′-O-methyl) modification at their 3′ ends, and their biogenesis is not associated with Dicer (Houwing et al., 2007; Stefani and Slack, 2008). Secondly, most piRNAs are found in a few clusters and have a high percentage of uridine residues at the 5′ termini (Brennecke et al., 2007). Thirdly, many of these clusters exhibit remarkable asymmetry, that is, within a given cluster all piRNAs are derived from the same strand (O'Donnell and Boeke, 2007). Finally, ∼17% of mammalian piRNA transcripts result from repeat sequences, such as LINEs (long interspersed elements), SINEs (short interspersed elements) and several classes of DNA transposons (Stefani and Slack, 2008).
Some rasiRNAs (repeat-associated small interfering RNAs) are defined as the subset of piRNAs found in the Drosophila germline (Yin and Lin, 2007; Farazi et al., 2008). The role of rasiRNAs, a family of RNAs derived from repetitive regions (Saito et al., 2006), should not be underestimated: they ensure genomic stability by silencing endogenous selfish genetic elements such as retrotransposons and repetitive sequences. Unlike siRNAs and miRNAs, rasiRNAs are derived mainly from the antisense strand. They are further distinguished in that they appear not to be generated by cleavage by the RNase III-type enzyme Dicer, a key component of the siRNA/miRNA processing machinery, but are associated with Piwi, Aubergine (Aub) and Ago3, all of which are members of the Argonaute protein family and essential components of RNA silencing (Gunawardane et al., 2007; Reamon-Buettner and Borlak, 2007).
At least three classes of endogenous siRNAs in Arabidopsis – tasiRNAs (trans-acting siRNAs), heterochromatic siRNAs (or chromatin-associated siRNAs) and nat-siRNAs [NAT (natural antisense transcript)-associated siRNAs] – are all 20–25 nt long. The generation of tasiRNAs from defined genetic loci (TAS loci) is through an miRNA-dependent biogenesis pathway. The expression of tasiRNAs is initiated by Pol II (polymerase II) transcription to yield TAS transcripts that contain miRNA target sites. Heterochromatic siRNAs are generated by a DCL3/RDR2/RNA Pol IV-dependent pathway and are mainly incorporated into AGO4 (Argonaute 4) to induce transcriptional silencing. In plants, they are associated with repetitive genomic sequences such as transposons, retroelements, rDNAs (ribosomal DNAs) and centromeric repeats (Katiyar-Agarwal et al., 2007; Xie and Qi, 2008). nat-siRNAs are a class of endogenous siRNAs derived from the overlapping region of a pair of NATs. These siRNAs appear to exert an adaptive protective mechanism in plants in response to either abiotic or biotic stress (Borsani et al., 2005; Katiyar-Agarwal et al., 2006).
ncRNAs are important in epigenetic regulation
ncRNAs appear to comprise a hidden layer of internal signals that control various levels of gene expression associated with physiological and developmental processes. ncRNAs, especially small ncRNAs, play a significant role in cellular physiology, specifically in the epigenetic regulation of gene expression. Epigenetic regulation is a heritable change in gene expression that cannot be attributed to genetic variation (Richards, 2006; Costa, 2008). Mechanisms of epigenetic regulation include DNA methylation, chromatin remodelling, RNA-associated gene silencing, chromosome inactivation, genomic imprinting and paramutation. A schematic overview of the roles of ncRNAs in various epigenetic mechanisms is presented in Figure 1.
ncRNA and DNA methylation
Cytosine methylation is the only covalent DNA modification described in mammals. Almost all methylation sites are CpG dinucleotides, and these are not equally distributed throughout the genome: they are concentrated in CpG islands, which span the 5′ end of the regulatory region of many genes. Approx. 60% of gene promoters include CpG islands, most of which are unmethylated in a normal cell at all stages of development and are often associated with active gene transcription (Clark, 2007).
siRNA-mediated suppression of transcription in mammalian cells, in which the siRNAs target the promoter region and silencing is associated with histone and DNA methylation, has been reported by several independent laboratories (Morris et al., 2004; Castanotto et al., 2005; Suzuki et al., 2005). However, conflicting reports have appeared. For example, Li et al. (2006) avoided CpG islands and designed 21-nt dsRNAs (double-stranded RNAs) targeted to selected promoter regions of the human genes E-cadherin, p21WAF1/CIP1 (p21) and VEGF (vascular endothelial growth factor), which resulted in sequence-specific and long-lasting re-expression of the targeted genes. This RNA-mediated process did not change the state of DNA methylation, but was associated with histone demethylation, especially H3m2K4. They termed this novel but still uncharacterized phenomenon RNAa (RNA activation). Chen et al. (2008) confirmed this phenomenon two years later. They reported in vitro antitumour activity elicited by RNAa through triggering the expression of the cell cycle repressor protein p21 (WAF1/CIP1) in human bladder cancer cells.
ASRNAs are also involved in the mechanism of DNA methylation. Khps1a, an endogenous antisense transcript, is derived from the CpG island and matches with a T-DMR (tissue-dependent differentially methylated region) of Sphk1. Demethylation of CG sites in the T-DMR is reduced by overexpression of two fragments of Khps1 (Mattick and Makunin, 2005). DNA methylation as regulated by miRNAs has been described. In 2007, the miRNA-29 family was found to reverse aberrant methylation in lung cancer by targeting DNMT-3A (DNA methyltransferase 3A) and DNMT-3B. miR-29s directly target both DNMT-3A and −3B to induce re-expression of some methylation-silenced tumour genes (Fabbri et al., 2007). Subsequently, Benetti et al. (2008), working on Dicer1-deficient cells, found that over-regulation of the miR-290 cluster increases the mRNA level of Rbl2 (retinoblastoma-like protein 2), which is the target gene of the miR-290 cluster and inhibits DNMT-3A and −3B expression. In addition, Sinkkonen et al. (2008) also showed that miRNAs control de novo DNA methylation in ES (embryonic stem) cells.
Recently, a new controversy has arisen as to whether the piRNAs, the Piwi-interacting RNAs, guide DNA methylation in male mouse germ cells. The Piwi subfamily proteins in mice, MIWI (mouse piwi), MILI (miwi-like) and MIWI2, also known as PIWIL1 (Piwi-like homologue 1), PIWIL2 and PIWIL4 respectively (Aravin et al., 2006; Girard et al., 2006; Grivna et al., 2006a; Watanabe et al., 2006), maintain transposon silencing in the germline genome. It was reported that loss of MILI or MIWI2 in male germ cells results in defective DNA methylation of the regulatory regions of retrotransposons similar to that in DNMT-3L-deficient mice, indicating that Piwi-piRNA complexes play essential roles in the de novo DNA methylation of transposable elements in fetal male germ cells (Aravin et al., 2007; Carmell et al., 2007; Aravin et al., 2008; Kuramochi-Miyagawa et al., 2008). However, there exist alternative interpretations. For example, MIWI, which is associated with piRNAs as well as with the mRNA cap and polysomes, has been reported to correlate with translational activity during spermatogenesis (Grivna et al., 2006b). Unhavaithaya et al. (2009) found that mutation of MILI had no effect on cellular mRNA levels yet significantly reduced the rate of protein synthesis, indicating that MILI appears to positively regulate translation. Xu et al. (2008) demonstrated that inactivation of Nct1/2, two ncRNAs encoding piRNAs, leads to an increase in LINE-1 protein but not LINE-1 RNA, suggesting post-transcriptional regulation. These data led to the hypothesis that Piwi-piRNA complexes control gene expression post-transcriptionally, similar to miRNAs. Taken together, the above findings suggest that signalling by some ncRNAs plays a role in DNA methylation. However, this effect may result from different mechanistic processes.
ncRNA and chromatin modification
Chromatin modification is an epigenetic regulation in which the chromatin structure is altered through covalent modification of DNA and histones. Chromatin structure formation is a complex process requiring many different epigenetic elements (Costa, 2005). Increasing evidence indicates that ncRNAs are important for the control of epigenetic regulation in eukaryotic chromatin. Camblong et al. (2007) found that loss of the nuclear exosome component Rrp6 function in Saccharomyces cerevisiae leads to the stabilization of two PHO84 antisense transcripts. In turn, these antisense transcripts stimulate histone deacetylation, resulting in PHO84 gene repression through a pathway of RNA-induced histone de-acetylation. Houseley et al. (2008) reported that the expression of GAL10-ncRNA, a long ASRNA of the yeast GAL10 gene, leads to stable changes of chromatin structure by recruiting the methyltransferase Set2 and histone de-acetylation activities in cis. Yu et al. (2008) reported that p15AS (p15 antisense), an ASRNA of the p15 gene, induces p15 silencing in both cis and trans sites through heterochromatin formation but not DNA methylation. Such findings suggest that ncRNA transcription contributes to the formation of chromatin structure within euchromatic regions.
Heterochromatin is epigenetically inherited and found at many chromosome regions where gene expression is silenced. RNAi (RNA interference) is essential for silencing at heterochromatic loci, and it has been reported that a Dicer knockout affects centromeric heterochromatin formation in the mouse. Fukagawa et al. (2004) showed that Dicer-defective chicken cells carrying a human chromosome have defects in pericentromeric heterochromatin formation and segregation in that chromosome. In Dicer-null ES cells, the expression of homologous small dsRNAs and epigenetic silencing of centromeric repeat sequences are markedly reduced. Moreover, re-expression of Dicer in the knockout cells rescues these phenotypes, possibly implicating siRNA in the maintenance of centromeric heterochromatin structure and centromeric silencing (Kanellopoulou et al., 2005).
In mammals, females have two X chromosomes, whereas males have one. However, a dosage compensation mechanism, XCI (X-chromosome inactivation), ensures equal expression of X-linked genes in XX females and XY males. In random XCI, the silencing of the X chromosome is controlled by the Xic (X chromosome inactivation centre), an important cis regulatory region on the X chromosome (Heard and Disteche, 2006; Yang and Kuroda, 2007). Xic is composed of several genetic elements that make long ncRNAs, including Xist, Tsix, Xite, DXPas34 and jpx/Enox (Avner and Heard, 2001; Erwin and Lee, 2008). The best understanding of the role of long ncRNAs in silencing mammalian genes comes from studies on the role of Xist and its antagonistic antisense partner, Tsix. From these studies it appears that the initiation of XCI depends on Xist, and Tsix represses Xist RNA accumulation in cis (O'Neill, 2005).
Two questions about the molecular mechanisms remain unanswered: how Xist induces XCI on the inactive X (Xi) and how Tsix can exert a stable silencing of Xist on the active X (Xa). Ogawa et al. (2008) and Zhao et al. (2008) proposed potential answers to these questions. Ogawa et al. (2008) suggested a model of RNAi in the formation of XCI in which Xist and Tsix form duplexes in vivo. These duplexes are then processed to sRNAs (small RNAs), named xiRNAs, on Xa, a process that occurs most likely in a Dicer-dependent manner, because deleting Dicer reduces sRNA production and de-represses Xist. Furthermore, without Dicer, Xist cannot accumulate and histone H3 Lys27 cannot be methylated on the Xi. Zhao et al. (2008) concluded that an ncRNA co-factor recruits polycomb complexes to initiate and spread XCI. They discovered a 1.6-kb ncRNA (RepA) within Xist and found that PRC2 (polycomb repressor complex 2) is its direct target and EZH2 is the RNA-binding subunit. The antisense Tsix RNA inhibits this interaction. Without RepA or PRC2, full-length Xist induction and trimethylation on Lys27 of histone H3 of the X are blocked. Therefore, RepA, together with PRC2, is required for the initiation and spread of XCI.
Diploid organisms carry two alleles of each autosomal gene, one inherited from each parent. In most cases, both parental alleles are expressed equally, but a subset of genes is expressed from either the maternal or the paternal allele, and this ‘genomic imprinting’ is regulated by epigenetic mechanisms. Imprinted expression is restricted to a few hundred genes in the mammalian genome, most of which are found in small clusters. Imprinted clusters have an ICR (imprinting control region) that is typically 1–5 kb in size, is differentially methylated and regulates imprinting across the entire domain. Most of the genes in an imprinted cluster are protein-coding genes. However, at least one gene always encodes an ncRNA that is usually expressed from the maternal allele and displays reciprocal imprinted expression relative to the neighbouring protein-coding genes (Wan and Bartolomei, 2008). It is becoming clear that ncRNAs are frequently observed in imprinted regions of the human genome, suggesting that these ncRNAs play a functional role (Sleutels et al., 2002; Costa, 2008). For example, the Kcnq1ot1 and Air ncRNAs, the antisense transcripts of Kcnq1 and Igf2r, are involved in silencing clusters of imprinted genes in cis on mouse chromosomes 7 and 17 respectively. Deletion of their promoters or truncation of the ncRNA leads to bi-allelic, non-imprinted gene expression in their respective clusters (Wutz et al., 2001; Sleutels et al., 2002; Mancini-Dinardo et al., 2006). Takahashi et al. (2009) reported that deletion of Gtl2, an imprinted ncRNA, could induce lethal parent-origin-dependent defects in mice. Furthermore, some of these ncRNA genes are transcribed in an antisense orientation relative to the protein-coding gene. Six well-characterized clusters named after their founding imprinted genes are known – Igf2/H19, Igf2r/Airn, Kcnq1/Kcnq1ot1, Gnas, DLK/Gtl2 and Pws/As.
Four of these, Igf2r, Kcnq1, Pws and Gnas, contain an ncRNA transcribed antisense to the silenced gene, and at the Igf2r and Kcnq1 loci, the Airn and Kcnq1ot1 genes have been shown to play a direct role in the control of genomic imprinting. Truncation of either full-length transcript results in bi-allelic expression of the genes in its cluster (Mancini-Dinardo et al., 2006; Yang and Kuroda, 2007). There is evidence to suggest that ncRNAs are involved in allele-specific silencing, but the molecular details of the process are not clear. However, some possibilities have emerged recently. In the placenta, both Slc22a3 and Igf2r are silenced in cis by Air, but the silencing mechanisms involved appear to differ. Accumulated full-length Air at the Slc22a3 promoter recruits the H3K9 (histone H3 Lys9) histone methyltransferase G9a and results in targeted H3K9 methylation and allelic silencing. In contrast, Igf2r silencing does not require G9a and does not occur through an ncRNA interaction with its promoter (Nagano et al., 2008).
ncRNA and paramutation
Paramutation is the epigenetic transfer of information from one allele of a gene to another to establish a state of gene expression that is inherited (Chandler and Stam, 2004). Paramutation has been found at several loci in maize, and at fewer loci in other species, including mice and humans. The notable epigenetic feature resulting from paramutation is that the newly established expression state without DNA sequence change is heritable even though the allele or sequences originally issuing the instructions are not transmitted and the altered locus goes on to issue similar instructions to homologous sequences (Chandler, 2007).
In maize, paramutation has been described for four genes (r1, b1, pl1 and p1), with the b1 locus being the classic example of this process. Plants homozygous for the B-I allele are dark purple with a high expression of b1, whereas plants homozygous for the B′ allele are lightly pigmented. In plants heterozygous for the two alleles, the B-I allele is converted (that is, paramutated) into B′. It is worth noting that this new B′ allele (designated B′*) is equally capable of paramutating B-I to B′ in subsequent generations, as summarized schematically in Figure 2 (Coe, 1966; Chandler, 2007; Stam, 2009).
An RNA-dependent mechanism has recently emerged as a prominent mediator of paramutation. An RdRP (RNA-dependent RNA polymerase) called mop1 (mediator of paramutation 1) is an absolute requirement for the silencing of B-I by B′ and for paramutation in several other maize genes (Dorweiler et al., 2000; Alleman et al., 2006; Woodhouse et al., 2006). The tandem repeats located ∼100 kb upstream of b1 may lead to the production of siRNA. These repeats represent the key sequences required for paramutation, and the siRNAs can be detected in any genotype, but not in the mutant lines lacking mop1 (Stam et al., 2002). Paramutation in mice and in plants shares significant common features, the most remarkable being the involvement of RNA molecules as determinants of epigenetic variation. Rassoulzadegan et al. (2006) reported a paramutation of the mouse Kit gene in the progeny of heterozygotes, which was similar in many aspects to that observed in plants. Levels of polyadenylated Kit mRNA are abnormal both in heterozygotes and in their paramutated progeny. Microinjection of RNA from somatic and germ cells of heterozygotes into fertilized eggs produces a remarkable epigenetic change in the offspring, whereas microinjection of RNA from normal mice leads to a low efficiency of transmission. Moreover, miR-221 and miR-222 targeting Kit RNA can induce a paramutated state, suggesting that miRNAs contribute to the establishment and maintenance of paramutation. Subsequently, a variety of other miRNAs were tested and were shown to have no effect on Kit expression (Cuzin et al., 2008).
Recent research into ncRNAs has highlighted their roles in epigenetics. The data generated over the last few years have indicated that in addition to their negative effects, positive functions have also been found, which has extended our knowledge and understanding of ncRNAs. Uhler et al. (2007) showed that an unstable ASRNA across the PHO5 promoter plays a role in activation, but not repression, in S. cerevisiae. Abrogation of this ASRNA clearly leads to a slower rate of chromatin remodelling during the transcriptional activation of PHO5. A new mechanistic role of ncRNAs was demonstrated in the genetic activation of eukaryotes. In the fission yeast Schizosaccharomyces pombe, ncRNAs transcribed by RNAP II are required for chromatin remodelling at the fbp+ locus during transcriptional activation (Hirota et al., 2008). Some ncRNAs, such as miRNA, snoRNA and piRNA, have been linked to specific disease states including cancer, and the de-regulation of several ncRNAs has been associated with cancers of the colon, prostate, liver, ovary, breast, cervix, tongue and oesophagus, as well as lymphomas and leukaemias. Such relationships suggest that ncRNAs can serve as markers that can be used as novel targets for early diagnosis and prognosis, and for drug development (Costa, 2005; Lin et al., 2007; Perez et al., 2008). Although there have been many exciting discoveries, the possible functional mechanisms remain poorly understood. We have just begun to lift the veil on non-coding transcription, and surely this will be a fascinating field that will attract and reward scientists in the coming years.
We thank Dr Brian Eyden (Christie NHS Foundation Trust, U.K.) and Dr Iain Bruce (University of Hong Kong) for their assistance with English language editing of this paper prior to submission.
This work was supported by the National Natural Science Foundation of China [grant no. 30770989, 30971135] and the Natural Science Foundation of Zhejiang Province [D2080011 and 2007C13020].
Gene: a union of genomic sequences encoding a coherent set of potentially overlapping functional products.
Non-coding RNAs (ncRNAs): RNAs that are functional biologically, rather than simply being intermediate messengers between DNA and proteins.
DNA methylation: an epigenetic mechanism that modifies the DNA information without changing the primary nucleotide sequence and occurs on cytosine at CpG dinucleotides across the human genome.
Chromatin modification: an epigenetic mechanism that changes gene expression by regulating chromatin structure.
Paramutation: the epigenetic transfer of information from one allele of a gene to another to establish a state of gene expression that is inherited.
X-chromosome inactivation: a sex-chromosome dosage-compensation mechanism that leads to the transcriptional silencing of a large percentage of genes on one X-chromosome in XX females.
DNA imprinting: an epigenetic mechanism that leads to the mono-allelic expression of some genes.