Long non-coding RNAs in nuclear bodies

Authors

  • Joanna Y. Ip,

    1. RNA Biology Laboratory, RIKEN Advanced Research Institute, Wako, Saitama, Japan
    2. Banting and Best Department of Medical Research, Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
  • Shinichi Nakagawa

    Corresponding author
    1. RNA Biology Laboratory, RIKEN Advanced Research Institute, Wako, Saitama, Japan

Author to whom all correspondence should be addressed.
Email: nakagawas@riken.jp

Abstract

High-throughput analyses of mammalian transcriptomes have revealed that more than half of the transcripts produced by RNA polymerase II are non-protein-coding. One class of these non-coding transcripts is the long non-coding RNAs (lncRNAs), which are more than 200 nucleotides in length and are molecularly indistinguishable from protein-coding mRNAs. Although the molecular functions of these lncRNAs have long remained unknown, emerging evidence implicates lncRNAs in the regulation of gene expression through the modification of chromatin, maintenance of subnuclear structures, transport of specific mRNAs, and control of pre-mRNA splicing. Here, we discuss the functions of a distinct group of vertebrate-specific lncRNAs, NEAT1/MENε/β/VINC, MALAT1/NEAT2, and Gomafu/RNCR2/MIAT, which accumulate abundantly within the nucleus as RNA components of specific nuclear bodies.

Introduction

High-throughput genomic analyses surprisingly revealed that the complexities of organisms are not directly reflected by the number of protein-coding genes in their genomes. Higher vertebrates, such as humans and mice, have approximately 20 000–25 000 genes, but lower eukaryotes, such as Drosophila and Caenorhabditis elegans, have comparable numbers of genes: 13 000 and 19 000, respectively (Consortium 1998; Adams et al. 2000; Lander et al. 2001; Waterston et al. 2002). This apparent discrepancy can be explained in part by the discovery that the majority of transcribed genomic regions are non-protein-coding. In fact, it has been estimated that at least 90% of the mammalian genome is transcribed, but <2% of these transcripts are capable of producing proteins (Carninci et al. 2005; Birney et al. 2007; Kapranov et al. 2007). Moreover, as the complexity of organisms increases, the amount of non-coding sequences within their genomes also increases. Therefore, this suggests that the development of complex organisms may not depend solely on their protein-coding genes, and that non-coding RNA (ncRNA) may provide an additional layer of regulation required for the development of higher eukaryotes (Taft et al. 2007). A class of these non-coding transcripts is the long non-coding RNAs (lncRNAs), which consist of more than 200 nucleotides and have diverse molecular functions (Wilusz et al. 2009). The lncRNAs are provisionally divided into four groups (Table 1). The largest group includes lncRNAs that associate with the chromatin modifying complex and are involved in the epigenetic regulation of gene expression including genomic imprinting (e.g., Xist, Airn, Kcnq1ot1, ANRIL, ncRNA-a1∼7, HOTAIR, and HOTTIP). Recent deep-sequencing analysis has revealed that 20% of lncRNAs are co-immunoprecipitated with the histone methyltransferase complex, PRC2 (Khalil et al. 2009); thus, chromatin modification may be one of the representative functions of lncRNAs. 
The second group contains the precursors of the small RNAs (i.e., miRNAs, esiRNAs, and piRNAs) involved in gene suppression mechanisms that are collectively called RNA silencing (Fejes-Toth et al. 2009). In the case of miRNAs, nearly half of their sequences are found in introns or exons of long non-coding RNAs (Rodriguez et al. 2004). In general, these precursor RNAs are relatively unstable and are rapidly processed into small RNAs of between 20 and 30 nucleotides. The third group contains nuclear lncRNAs (NEAT1, NEAT2, and Gomafu) that stably and abundantly localize to distinct nuclear bodies. Other ncRNAs, such as steroid receptor RNA activator (SRA) and half-STAU1 binding site RNAs (1/2-sbsRNAs), are tentatively categorized into a fourth group; they regulate individual molecular processes but may also belong to larger families. This review focuses on the third group of lncRNAs and discusses their roles within the nucleus and their effects on developmental processes. For reviews of the lncRNAs involved in other aspects of gene expression, particularly the first group that regulates transcription via chromatin modifications, readers are referred to Wilusz et al. (2009), Chen & Carmichael (2010), and Mercer et al. (2010).

Table 1.   Classification of long non-coding RNAs (lncRNAs)
Group      Functions                      Examples (HUGO Gene symbols)
Group I    Chromatin modifiers            XIST, AIRN, KCNQ1OT1, ANRIL, Evf-2, HOTAIR, HOTTIP, lincRNA-p21, ncRNA-a1∼7
Group II   Precursors of small RNAs       see Rodriguez et al. 2004
Group III  Components of nuclear bodies   NEAT1, MALAT1, MIAT
Group IV   Others                         SRA1, HSR1, NRON, 1/2-sbsRNA1∼4

Overview of the lncRNAs that localize to specific nuclear bodies

The nucleus of higher eukaryotes is a highly organized structure and is divided into different nuclear bodies, including the nucleoli, nuclear speckles, paraspeckles, promyelocytic leukemia (PML) bodies, and Cajal bodies. These domains contain a series of protein factors that are involved in particular nuclear processes (Platani & Lamond 2004; Mao et al. 2011). The compartmentalization of the nucleus is thought to provide an essential basis for the complex regulation of nuclear processes and gene expression in higher organisms. In addition to protein factors, some of these subnuclear domains contain lncRNAs, including NEAT1 (MENε/β or VINC) and MALAT1 (NEAT2), which are localized to paraspeckles and nuclear speckles, respectively, and Gomafu (RNCR2 or MIAT), which constitutes novel nuclear bodies (Fig. 1). These lncRNAs are produced by RNA polymerase II (Pol II) and are processed in the same manner as protein-coding mRNA (Hutchinson et al. 2007; Sone et al. 2007). However, unlike Pol II-transcribed mRNAs, which are immediately transported to the cytoplasm after splicing and polyadenylation, the lncRNAs are stably retained in the nucleus. NEAT1 and MALAT1 are single exon genes (Hutchinson et al. 2007) and theoretically do not carry exon junction complexes that strongly promote nuclear export (Dreyfuss et al. 2002), which might partially account for their nuclear localization. On the other hand, spliced Gomafu transcripts also accumulate in the nucleus (Sone et al. 2007), suggesting that certain nuclear retention mechanisms counteract the nuclear export signals associated with pre-mRNA splicing. Specific sequences or secondary structures that prevent the export of these lncRNAs have not been identified; such elements are presumably distributed redundantly along the entire length of the transcripts, at least in the case of Gomafu and MALAT1 (Tripathi et al. 2010; Tsuiji et al. 2011).

Figure 1.

 Long non-coding RNAs (lncRNAs) localized to distinct subnuclear domains, as determined by fluorescent in situ hybridization. (A) Expression of NEAT1/MENε/β/VINC in mouse kidney. (B) Expression of MALAT1/NEAT2 in motor neurons in the spinal cord. (C) Expression of Gomafu/MIAT/RNCR2 in pyramidal neurons in the cerebral cortex. Green: lncRNAs, Magenta: 4′,6-diamidino-2-phenylindole dihydrochloride (DAPI). Scale bar, 10 μm.

The sequences of the three nuclear body-related lncRNAs are highly conserved in multiple species, especially among mammals, suggesting that they are not by-products generated by transcriptional noise (Hutchinson et al. 2007; Sone et al. 2007). Recent studies have revealed that these nuclear lncRNAs are essential for the maintenance of nuclear structures and can affect pre-mRNA processing and export, thereby regulating gene expression (Fig. 2). In the following sections, we review the detailed functions of each lncRNA that localizes to a specific nuclear body.

Figure 2.

 Functions of nuclear-retained long non-coding RNA (refer to the text for details). (A) NEAT1/MENε/β/VINC is localized to paraspeckles and is required for the formation of paraspeckles and the nuclear retention of hyper-edited RNA. (B) MALAT1/NEAT2 is located in nuclear speckles to modulate the phosphorylation and localization of the SR proteins, thereby affecting alternative splicing. It can also be processed to produce a tRNA-like RNA that is transported to the cytoplasm. (C) Gomafu/RNCR2/MIAT is located within novel subnuclear domains. It interacts with the splicing factor, SF1, and can affect the efficiency of splicing in vitro; therefore, it may also have the ability to affect splicing in vivo.

NEAT1 (MENε/β, VINC)

Function as an architectural component of paraspeckles

NEAT1 (nuclear-enriched abundant transcript 1), also referred to as MENε/β (multiple endocrine neoplasia ε/β) or VINC (virus-induced non-coding transcript), is expressed from a single promoter as two transcripts, NEAT1_1 and NEAT1_2, that differ only at their 3′-ends. The polyadenylated short transcript, NEAT1_1 (MENε/VINC), is 3.7 kb in human and 3.2 kb in mice, and the non-polyadenylated long transcript, NEAT1_2 (MENβ), is 23 kb in human and 20 kb in mice (Guru et al. 1997; Sasaki et al. 2009). The 3′-end of NEAT1_2 is generated by RNase P cleavage near a tRNA-like structure located downstream of the cleavage site. The cleaved 3′-fragment is then processed by RNase Z to produce a highly unstable tRNA-like small RNA (Sunwoo et al. 2009). Whereas NEAT1_1 is highly and widely expressed in different mammalian tissues, the expression of NEAT1_2 is restricted to particular populations of cells (Hutchinson et al. 2007; Nakagawa et al. 2011).

NEAT1 is the architectural component of paraspeckles, which are the nuclear bodies adjacent to nuclear speckles and are found in almost all cultured cell lines examined (Fig. 1A) (Chen & Carmichael 2009; Clemson et al. 2009; Sasaki et al. 2009; Sunwoo et al. 2009). Paraspeckles contain core proteins belonging to the Drosophila Behavior Human Splicing (DBHS) family, PSPC1, SFPQ (PSF) and NONO (p54nrb) (Fox et al. 2002, 2005); these nuclear bodies are proposed to regulate the expression of adenosine-to-inosine (A-to-I) edited RNAs through nuclear retention (Figs 2A,3; see the next section). The long isoform of NEAT1, NEAT1_2, is essential for the formation of paraspeckles, whereas the short NEAT1_1 isoform plays a supplementary role, increasing the number of paraspeckles when overexpressed (Clemson et al. 2009). Among the three core DBHS proteins, only SFPQ and NONO are necessary for paraspeckle formation, whereas PSPC1 is dispensable in HeLa cells (Sasaki et al. 2009). Alternatively, the function of PSPC1 may be redundantly supported by other members of the DBHS family. Knocking down NEAT1_2 causes the disintegration of paraspeckles without affecting the levels of these core protein factors (Sasaki et al. 2009). Based on such studies, the formation of paraspeckles is proposed to begin with the direct binding of NEAT1_2 to NONO and possibly SFPQ, followed by the recruitment of NEAT1_1 and PSPC1 (Sasaki et al. 2009; Souquere et al. 2010). These protein-RNA complexes are then organized to form paraspeckles, with NEAT1_1 and the 5′ and 3′ ends of NEAT1_2 arranged at the periphery and the central sequences of NEAT1_2 toward the center of the paraspeckle (Souquere et al. 2010). The interactions of NEAT1 with the DBHS proteins occur near the NEAT1 locus; as a result, newly formed paraspeckles are found near the NEAT1 locus (Clemson et al. 2009). 
Transcriptional inhibition quickly disrupts paraspeckles and causes the protein factors to relocate to the perinucleolar cap and NEAT1 to diffuse in the nucleoplasm (Sasaki et al. 2009). In agreement with these observations, the formation of paraspeckles is coupled with the transcription of NEAT1 (Mao et al. 2010). Live-cell imaging has shown that the protein factors are recruited to the site of NEAT1 transcription, and, as the size of the newly formed paraspeckle increases, new paraspeckles bud from the original, thereby forming clusters of paraspeckles around the site of transcription (Mao et al. 2010).

Figure 3.

 Paraspeckles regulate gene expression through the nuclear retention of hyper-edited RNAs. Two transcripts are transcribed from the two promoters of the mouse CAT2 gene. The short mCAT2 mRNA is exported to the cytoplasm and is translated into mCAT2 protein. The long transcript, CTN, has an extended 3′-UTR that contains hyper-edited inverted repeats and forms hairpins recognized by the DBHS proteins within paraspeckles, leading to the retention of CTN in paraspeckles. When the cell is under stress, the CTN RNA will be cleaved and polyadenylated to produce the mCAT2 RNA, which is then released into the cytoplasm for translation into the mCAT2 protein.

Paraspeckles and nuclear retention of A-to-I edited RNAs

Paraspeckles are proposed to be involved in the nuclear retention of A-to-I hyper-edited RNA to regulate gene expression (Figs 2A,3). The first mRNA that was reported to be regulated by paraspeckles is the mouse CTN RNA (CAT2 transcribed nuclear RNA), which is a long isoform of the mouse cationic amino acid transporter 2 (mCAT2) transcript (Prasanth et al. 2005). The CTN RNA contains an extended 3′-untranslated region (3′-UTR), where two short interspersed nuclear elements (SINEs), retrotransposons frequently found in the mammalian genome, are inserted in an inverted orientation. The inverted repeats form intra-molecular double-stranded RNAs, and this structure is subsequently recognized by adenosine deaminase, which converts adenosine to inosine (A-to-I editing). NONO has inosine-binding activity (Zhang & Carmichael 2001) and is thought to be involved in the retention of the hyper-edited CTN RNA within paraspeckles. Under conditions of cellular stress, such as transcriptional inhibition or combined stimulation with interferon-γ (IFN-γ) and lipopolysaccharides (LPS), the CTN RNA is cleaved at the 3′-UTR and polyadenylated, and the processed transcript is released into the cytoplasm, where it is translated into the mCAT2 protein (Fig. 3). This suggests that paraspeckles act as a reservoir for the CTN transcripts, which can be exported to the cytoplasm immediately upon stress to trigger the prompt upregulation of mCAT2 expression without de novo synthesis of the transcript. Although human CAT2 does not have inverted repeat sequences in its 3′-UTR, many other human mRNAs are reported to contain hyper-edited 3′-UTRs (Chen et al. 2008; Faulkner et al. 2009). Indeed, it has been found that mRNAs with inverted SINE Alu repeats are hyper-edited and are retained in the nuclei of differentiated cells (Chen & Carmichael 2009). 
Interestingly, in human embryonic stem cells, which do not express NEAT1 and, thus, do not contain paraspeckles, these mRNAs are efficiently exported to the cytoplasm (Chen & Carmichael 2009). Upon the knockdown of NEAT1 in differentiated cells, paraspeckles are disrupted, and the retained RNA molecules are exported to the cytoplasm (Chen & Carmichael 2009). Because all the core proteins of paraspeckles are expressed in embryonic stem cells, this study demonstrated that NEAT1 is required for both paraspeckle formation and the nuclear retention of mRNAs containing inverted Alu repeats. However, it remains unclear how these mRNAs with edited Alu repeats are retained in paraspeckles and also how many of these mRNAs are retained. These mRNAs likely bind to NEAT1, either directly or indirectly, to become trapped within paraspeckles, because an interaction between the hyper-edited RNA and the NONO protein complex alone is not sufficient to mediate nuclear retention (Chen & Carmichael 2009). It is also not known whether A-to-I editing or hairpin formation is the more important determinant for the nuclear retention of the mRNAs. Recently, NEAT1 has been shown to interact with the multi-functional RNA-binding protein TDP-43 (Tollervey et al. 2011), but the functional importance of this interaction remains to be determined.
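The two sequence-level ideas in this section, inverted repeats that can pair into an intra-molecular duplex and the conversion of adenosine to inosine, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names are hypothetical, the arms are toy sequences rather than real SINE or Alu elements, perfect Watson-Crick complementarity is required (real inverted repeats tolerate mismatches), and edited positions are supplied by hand rather than predicted.

```python
# Toy illustration only: names and sequences are hypothetical, not from the cited studies.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence (A/U/G/C only)."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def can_form_hairpin(left_arm: str, right_arm: str) -> bool:
    """True when right_arm is the exact reverse complement of left_arm.

    Assumption: perfect pairing is required, which is stricter than biology.
    """
    return right_arm.upper() == reverse_complement(left_arm)


def edit_a_to_i(rna: str, positions: set[int]) -> str:
    """Replace the adenosines at the given 0-based positions with inosine ("I")."""
    edited = []
    for index, base in enumerate(rna.upper()):
        if index in positions:
            if base != "A":
                raise ValueError(f"position {index} is {base}, not A")
            edited.append("I")
        else:
            edited.append(base)
    return "".join(edited)


left_arm = "GGCAUACG"                      # toy repeat
right_arm = reverse_complement(left_arm)   # the same repeat in inverted orientation
print(can_form_hairpin(left_arm, right_arm))
print(edit_a_to_i(left_arm, {3, 5}))
```

The sketch deliberately separates duplex formation from editing, mirroring the open question noted above of whether the hairpin or the editing itself is the more important determinant of nuclear retention.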

Physiological function of NEAT1 and paraspeckles

NEAT1 is differentially expressed during the differentiation of several precursor cells and may control these processes. The expression of both NEAT1_1 and NEAT1_2 is upregulated during the differentiation of murine myoblasts to myotubes, leading to increases in the number and size of paraspeckles in the myotube nuclei (Sunwoo et al. 2009). In addition, the bovine orthologue of NEAT1 shows increased expression during muscle development in cattle (Lehnert et al. 2007). During the differentiation of mouse neural stem cells, NEAT1 is downregulated in the progenitor cells for neurons and oligodendrocytes, and its expression increases again during the specification and maturation of neurons and oligodendrocytes (Mercer et al. 2010). As mentioned above, NEAT1 is not expressed in human embryonic stem cells; it is only expressed after differentiation into trophoblasts. However, the target transcripts regulated by NEAT1 in these model systems remain unknown.

All of the studies described above are supported by evidence from experiments using cell cultures, whereas the physiological roles of NEAT1 and paraspeckles in whole organisms are still unclear. A detailed in situ hybridization analysis of NEAT1 expression in different mouse tissues showed that NEAT1_1 is ubiquitously expressed, but NEAT1_2 is restricted to specific subpopulations of cells within individual tissues (Nakagawa et al. 2011). In agreement with the studies conducted using cell cultures, paraspeckles are only observed in cells that have a high level of NEAT1_2 (Nakagawa et al. 2011). Surprisingly, NEAT1 knock-out mice are viable, fertile, and show no obvious phenotype. Even in tissues containing cells that have a high expression level of NEAT1_2, the tissue integrity is not affected by the absence of NEAT1 (Nakagawa et al. 2011). This suggests that the expression of NEAT1 and the formation of paraspeckles are not essential for mice living under laboratory conditions. The induction of NEAT1_2 transcription and the formation of paraspeckles may only occur under specific conditions, for example, in response to a specific stimulus or stress. This idea is supported by studies demonstrating that the expression of NEAT1 is induced in the central nervous system after viral infection and is also increased in patients with frontotemporal lobar degeneration (FTLD) (Saha et al. 2006; Tollervey et al. 2011). The mechanisms by which the expression of NEAT1 is induced and the formation of paraspeckles is initiated in vivo remain unknown. Interestingly, NEAT1 is presumably transcribed in mouse embryonic stem cells, as its gene locus contains histone H3 that is trimethylated at lysine 36, a histone mark that is associated with active transcription (Mikkelsen et al. 2007); however, the expression of NEAT1 is not detected in these cells (Nakagawa et al. 2011). 
This suggests that NEAT1 expression is regulated post-transcriptionally, at least in mouse embryonic stem cells.

It should be noted that the NEAT1 knockout mice still express low levels of NEAT1_1 and that the transcripts are still retained in the nucleus, albeit without the formation of paraspeckles (Nakagawa et al. 2011). This implies that the retention of NEAT1_1 in the nucleus may be mediated by a mechanism that is distinct from the mechanism leading to the formation of paraspeckles; NEAT1_1 may play additional roles that are independent of its function in paraspeckles.

MALAT1 (NEAT2): an abundant regulatory RNA in nuclear speckles

MALAT1 (metastasis-associated lung adenocarcinoma transcript 1), or NEAT2 (nuclear-enriched abundant transcript 2), is another abundant nuclear lncRNA found ubiquitously in cultured cell lines and is highly conserved in mammalian species (Hutchinson et al. 2007). MALAT1 had originally been identified as an overexpressed transcript in metastatic lung cancer (Ji et al. 2003); in fact, MALAT1 is known to be misregulated in different types of cancers (Luo et al. 2006; Lin et al. 2007; Guffanti et al. 2009). Mammalian MALAT1 is >7 kb in length (8.7 kb in human and 7 kb in mice), and MALAT1 transcripts are located in nuclear speckles, which are well-characterized nuclear bodies where RNA processing and export factors and a discrete set of splicing factors are stored (Hutchinson et al. 2007; Clemson et al. 2009; Tripathi et al. 2010). In the genomes of mammalian species, the MALAT1 locus is located adjacent to the NEAT1 locus (Hutchinson et al. 2007; Stadler 2010). MALAT1 is located 40 kb and 60 kb downstream of NEAT1 on mouse chromosome 19 and human chromosome 11, respectively (Hutchinson et al. 2007). Unlike NEAT1, MALAT1 is not a structural component of nuclear speckles because the knockdown of MALAT1 does not affect the expression and localization of certain nuclear speckle markers (Clemson et al. 2009). In addition, after mitosis, the formation of nuclear speckles occurs before the transcription and recruitment of MALAT1 (Hutchinson et al. 2007). The inhibition of transcription leads to the redistribution of MALAT1 within the nucleoplasm, and, upon resuming transcription, the relocation of MALAT1 to speckles is slower than the localization of splicing factors to speckles (Bernard et al. 2010). This suggests that MALAT1 is recruited to nuclear speckles after they are formed and that the localization of MALAT1 to speckles is dependent on transcription.

MALAT1 has been reported to affect alternative splicing through its interaction with splicing factors that are located within nuclear speckles (Fig. 2B). The 5′-end of MALAT1 is enriched with binding sites for serine/arginine-rich splicing factor-1 (SRSF1) (Tripathi et al. 2010). In addition to SRSF1, MALAT1 can interact with other serine/arginine-rich (SR) proteins and splicing factors, including SRSF2 and SRp20 (Tripathi et al. 2010). MALAT1 is required for the correct localization of these proteins to nuclear speckles and for their recruitment to the site of transcription (Bernard et al. 2010; Tripathi et al. 2010), but the depletion of MALAT1 does not have any effect on general transcription itself (Bernard et al. 2010). The knockdown of MALAT1 leads to an increase in the hypo-phosphorylated form of SRSF1 and affects the splicing patterns of alternative exons that are regulated by SRSF1 (Tripathi et al. 2010). The effects of MALAT1 on splicing are likely due to changes in the ratio of phosphorylated to hypo-phosphorylated SR proteins and subsequent changes in the localization and binding preferences of these splicing factors (Tripathi et al. 2010).

MALAT1 is also the precursor of a conserved tRNA-like small RNA, mascRNA (MALAT1-associated small cytoplasmic RNA), which is 61 nt long and is located exclusively in the cytoplasm (Wilusz et al. 2008) (Fig. 2B). MALAT1 is cleaved near its 3′-end by RNase P to produce the 3′-end of the aforementioned abundant MALAT1 transcript and the precursor of the mascRNA, which is further processed by RNase Z to produce the mature small RNA (Wilusz et al. 2008). However, mascRNA is short-lived compared to MALAT1, and its function is currently unknown. As mentioned above, the 3′-end of NEAT1_2 is processed in a manner similar to that of MALAT1; however, unlike mascRNA, the cleaved small RNA product has not been detected (Sunwoo et al. 2009).

MALAT1 is highly expressed in a wide range of mammalian tissues (Ji et al. 2003; Hutchinson et al. 2007; Chen & Carmichael 2009; Bernard et al. 2010). In adults, the brain expresses a high level of MALAT1, and MALAT1 has also been implicated in the development of the nervous system. The knockdown of MALAT1 in a mouse neuroblastoma cell line affects the expression levels of genes that are involved in synapse formation and dendrite development, in addition to the genes that are related to nuclear function and organization, as shown by microarray analysis (Bernard et al. 2010). Furthermore, the depletion of MALAT1 decreases the number of synapses in cultured hippocampal neurons through the downregulation of genes functioning in synapse formation, including neuroligin-1 and SynCAM1 (Bernard et al. 2010). MALAT1 is also repressed in the progenitor cells of neurons and oligodendrocytes but is expressed during the late stages of neuronal and oligodendrocyte development (Mercer et al. 2010). Finally, MALAT1 affects the invasion of the placenta by trophoblasts (Tseng et al. 2009), which is likely due to the effect of MALAT1 on cellular motility. Recently, MALAT1 was reported to affect the transcript levels of genes involved in the regulation of the extracellular matrix and cytoskeleton components and also to enhance the cellular motility of cancer cells in a wound-healing assay (Tano et al. 2010).

Gomafu (RNCR2, MIAT): a novel splicing regulator?

Gomafu, or RNCR2 (retinal non-coding RNA 2), is a nuclear-retained lncRNA that is located in novel subnuclear domains and is associated with the nuclear matrix (Sone et al. 2007). Gomafu is also called MIAT (myocardial infarction-associated transcript) due to its connection to myocardial infarction, which was revealed by a large-scale SNP analysis (Ishii et al. 2006). Gomafu RNA is 9 kb long and is spliced and polyadenylated (Sone et al. 2007). It contains seven exons and is alternatively spliced to produce more than 10 isoforms (Sone et al. 2007). The Gomafu transcripts are scattered throughout the nucleus and display a punctate pattern that does not overlap with the markers for known nuclear bodies (Sone et al. 2007). Unlike paraspeckles and nuclear speckles, the integrity of Gomafu-containing nuclear bodies and the localization of Gomafu are not affected by the inhibition of transcription (Sone et al. 2007). The function of these Gomafu-containing domains is not clear, and the protein factors that are located in these domains have yet to be identified. Multiple fragments of Gomafu can be retained in the nucleus when exogenously expressed, suggesting that there may be redundant nuclear localization signals widely distributed throughout the Gomafu transcript (Tsuiji et al. 2011). Gomafu is evolutionarily conserved from mammals to birds and amphibians (Rapicavoli et al. 2010; Tsuiji et al. 2011). In contrast to NEAT1 and MALAT1, which are expressed in a wide range of tissues, Gomafu is only expressed in subsets of neurons (Blackshaw et al. 2004; Rapicavoli et al. 2010; Tsuiji et al. 2011). However, the molecular mechanism for the regulation of Gomafu expression is unknown.

Gomafu contains multiple tandem copies of the sequence “UACUAAC”, which is the consensus branch-point sequence within the introns of Saccharomyces cerevisiae (Rapicavoli et al. 2010; Tsuiji et al. 2011). The splicing factor 1 (SF1), which recognizes branch points, also binds to these sequences in Gomafu both in vitro and in vivo (Tsuiji et al. 2011). However, this interaction is not required for the localization of Gomafu, since the deletion of these UACUAAC repeats from Gomafu does not affect its nuclear localization (Tsuiji et al. 2011). Importantly, during in vitro splicing reactions, the addition of the repeat-containing Gomafu fragment affects the kinetics of the formation of the spliceosome and delays the removal of the intron, which can be rescued by adding a nuclear extract prepared from cells over-expressing SF1 (Tsuiji et al. 2011). Notably, this inhibitory effect of the Gomafu repeats on splicing reactions is only observed when reporter pre-mRNAs with suboptimal branch-point sequences are used as the splicing substrates (Tsuiji et al. 2011). These observations suggest that Gomafu may function in a manner that is similar to MALAT1 and may affect splicing by binding to splicing factors and affecting their availability to the pre-mRNAs (Fig. 2C). Because only a small fraction of SF1 is co-localized with Gomafu, it is expected that only a portion of the splicing events regulated by SF1 will be affected by Gomafu (Tsuiji et al. 2011).
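As a rough illustration of how tandem copies of the UACUAAC motif could be located in a transcript, the following Python sketch scans a sequence with regular expressions. The function names and the toy sequence are assumptions made for this example; the toy sequence is not the Gomafu sequence, and the sketch says nothing about whether a given copy is actually bound by SF1.

```python
import re

# Branch-point-like motif quoted in the text.
BRANCH_POINT_MOTIF = "UACUAAC"


def find_motif_positions(rna: str, motif: str = BRANCH_POINT_MOTIF) -> list[int]:
    """Return 0-based start positions of every motif occurrence (overlaps allowed)."""
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    # A zero-width lookahead reports each start position without consuming the match.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [match.start() for match in pattern.finditer(rna)]


def longest_tandem_run(rna: str, motif: str = BRANCH_POINT_MOTIF) -> int:
    """Return the largest number of directly adjacent motif copies."""
    rna = rna.upper().replace("T", "U")
    runs = re.findall(f"(?:{re.escape(motif)})+", rna)
    return max((len(run) // len(motif) for run in runs), default=0)


# Hypothetical toy sequence: three adjacent copies, a spacer, then one isolated copy.
toy_rna = "GGA" + "UACUAAC" * 3 + "CCGU" + "UACUAAC" + "AA"
print(find_motif_positions(toy_rna))
print(longest_tandem_run(toy_rna))
```

Reporting both the individual positions and the longest adjacent run keeps isolated copies distinct from true tandem arrays, which is the feature the text emphasizes.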

Several studies have implied a functional involvement of Gomafu in the development of the nervous system. Gomafu is highly expressed in fetal mouse brains, and its expression persists in the adult brain in a subset of neurons (Ishii et al. 2006; Sone et al. 2007). Gomafu is expressed in differentiating neurons and oligodendrocytes, but it is not expressed in progenitor cells before the specification of lineage (Mercer et al. 2010). In the mouse retina, the expression of Gomafu begins in the first postmitotic retinal ganglion cells; later, it becomes restricted to the retinal ganglion and amacrine cells (Sone et al. 2007; Rapicavoli et al. 2010). These data suggest that Gomafu is initially expressed in differentiating cells and persists only in specific postmitotic cells. In addition, the knockdown of Gomafu or the expression of a dominant-negative form affects the differentiation of retinal cells in developing retinas and causes an increase in the number of amacrine interneurons and Müller glial cells (Rapicavoli et al. 2010). Although the molecular basis for this effect on neuronal differentiation remains unclear, it would be intriguing to examine whether the pattern of alternative splicing of pre-mRNAs changes in the retinal cells whose differentiation is affected by the loss of Gomafu function.

Closing remarks

Despite their inability to encode proteins, lncRNAs are now recognized as important regulators of gene expression. The functions of the lncRNAs are very diverse and can affect different biological processes. In this review, we have summarized the current knowledge of a group of lncRNAs that accumulate abundantly in the nucleus. NEAT1, MALAT1 and Gomafu localize to distinct nuclear bodies and regulate gene expression by maintaining the structure or function of the nuclear bodies to which they localize and by affecting the splicing and transport of specific target mRNAs (Fig. 2). Interestingly, these lncRNAs are highly conserved among mammalian species, suggesting that they may have evolved to regulate the more complex physiological systems that are present in higher vertebrates (Hutchinson et al. 2007; Sone et al. 2007). We are only beginning to understand their functions and mechanisms of action.

One of the major tasks in the field is to determine the physiological roles of the lncRNAs in an organism. Although they are differentially regulated during development and are required for certain developmental processes in particular experimental systems, their precise molecular mechanisms remain unclear. It will be essential to identify their physiological target genes in different tissues and during different stages of development. A paradox surrounding the lncRNAs is their apparent dispensability in vivo. As mentioned before, NEAT1 knockout mice are viable and fertile and do not display any phenotypic defects. Similarly, knockout mice of either MALAT1 or Gomafu do not have any observable defects (S. Nakagawa, unpubl. data, 2011). Thus, the lncRNAs may regulate important biological processes without being absolutely required for the survival of organisms. This may be due to functional redundancy, with the loss of their functions compensated for by uncharacterized lncRNAs that can perform the same functions. Another possibility is that these lncRNAs regulate specific transcripts only under particular stresses, such as heat shock, aging, starvation, or bacterial or viral infections, which are not encountered in a normal laboratory setting (Nakagawa et al. 2011). Understanding the physiological roles of these lncRNAs may shed light on how these transcripts, which were previously thought to be transcriptional noise, may be important for complex organisms such as mammals.
