The emerging roles of long non‐coding RNAs in polyglutamine diseases

Abstract Polyglutamine (polyQ) diseases are neurodegenerative diseases (NDDs) characterized by trinucleotide repeat expansions within genes, which result in the formation of polyQ peptides, selective neuronal degeneration and, possibly, death. Long non‐coding RNAs (lncRNAs), which exceed 200 nucleotides in length, have been shown to play important roles in several pathological processes of NDDs, including polyQ diseases. Some lncRNAs have been consistently identified as specific to polyQ diseases, and circulating lncRNAs are among the most promising novel candidates in the search for non‐invasive biomarkers for the diagnosis and prognosis of polyQ diseases. In this review, we describe the emerging roles of lncRNAs in polyQ diseases and provide an overview of the general biology of lncRNAs, their implications in pathophysiology and their potential as future biomarkers and therapeutic tools.

In this review, we briefly introduce typical lncRNA biogenesis and functions, and we describe the most relevant lncRNAs specifically associated with polyQ disease. The advantages and limitations of potential biomarkers for the diagnosis and prognosis of polyQ disease, as well as the use of lncRNA-based therapeutic strategies, are also highlighted.

| BASICS OF LNCRNAS
LncRNAs are non-coding RNAs that are structurally similar to messenger RNAs but lack an open reading frame and are longer than 200 nucleotides. LncRNAs are transcribed by RNA polymerase II and are distributed in the nucleus and cytoplasm. 8 In 2002, Schrauwen,9 a Japanese researcher, first discovered and identified a long transcription product when sequencing a mouse DNA library and named it lncRNA. Recent studies have shown that although lncRNAs do not encode proteins, they are involved in DNA methylation, nucleolar dominance, X chromosome silencing, genomic imprinting, chromatin modification, transcriptional activation and regulation, RNA interference, intranuclear transport and other important regulatory processes. [10][11][12] Although most lncRNA sequences are poorly conserved across evolution, a small number have been conserved among various species.
LncRNAs are believed to have arisen from the following sources: (1) a lncRNA incorporating part of the former coding sequence can be formed when a protein-coding gene is disrupted; (2) a lncRNA containing multiple exons can be assembled when two unrelated, previously separated sequences are juxtaposed; (3) functional or non-functional ncRNA can be produced by retrotransposition of non-coding genes; (4) lncRNAs can be formed by insertion of transposons; and (5) lncRNAs can be formed by tandem duplication of adjacent repeats. 13 According to the position of the lncRNA sequence relative to a protein-coding gene, lncRNAs can be divided into the following categories: (1) sense lncRNAs, which overlap the sense strand of a protein-coding sequence; (2) antisense lncRNAs, which overlap the antisense strand of a protein-coding sequence; (3) bidirectional lncRNAs, which are located on the antisense strand within 1000 bp of the transcription start site of a protein-coding gene and are transcribed in the opposite direction; (4) intronic lncRNAs, which lie entirely within an intron of another transcript; and (5) intergenic lncRNAs, which do not overlap any protein-coding gene and originate from the intergenic region between two protein-coding genes. 14 According to their molecular mechanisms and roles, lncRNAs can be divided into signal molecules, decoy molecules, guide molecules and scaffold molecules (Figure 1).
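The positional categories listed above can be expressed as a small decision procedure. The following Python sketch is illustrative only and is not taken from the cited work: the `Transcript` type, the `classify_lncrna` function and the example coordinates are hypothetical, and the sketch assumes the common convention that a bidirectional lncRNA starts upstream of the gene, on the opposite strand, within a configurable window (default 1000 bp) of the gene's transcription start site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transcript:
    """Hypothetical minimal transcript record (coordinates are 1-based, inclusive)."""
    name: str
    strand: str          # "+" or "-"
    start: int           # leftmost coordinate, regardless of strand
    end: int             # rightmost coordinate, regardless of strand
    introns: tuple = ()  # tuple of (start, end) intron intervals

    @property
    def tss(self) -> int:
        # Transcription starts at the left end on "+" and at the right end on "-".
        return self.start if self.strand == "+" else self.end


def classify_lncrna(lnc: Transcript, gene: Transcript, window: int = 1000) -> str:
    """Classify a lncRNA relative to one protein-coding gene.

    The window size is an assumption of this sketch, not a fixed standard.
    """
    # Intronic: the lncRNA lies entirely inside one intron of the gene.
    for i_start, i_end in gene.introns:
        if i_start <= lnc.start and lnc.end <= i_end:
            return "intronic"

    # Sense / antisense: the lncRNA overlaps the gene body.
    if lnc.start <= gene.end and gene.start <= lnc.end:
        return "sense" if lnc.strand == gene.strand else "antisense"

    # Bidirectional: opposite strand, upstream of the gene, TSS within the window.
    upstream = lnc.end < gene.start if gene.strand == "+" else lnc.start > gene.end
    if lnc.strand != gene.strand and upstream and abs(lnc.tss - gene.tss) <= window:
        return "bidirectional"

    # Otherwise the lncRNA is treated as intergenic with respect to this gene.
    return "intergenic"


# Made-up coordinates, for illustration only.
gene = Transcript("GENE", "+", 5000, 9000, introns=((6000, 7000),))
examples = [
    Transcript("lnc_a", "-", 6100, 6900),
    Transcript("lnc_b", "+", 8500, 9500),
    Transcript("lnc_c", "-", 8500, 9500),
    Transcript("lnc_d", "-", 4000, 4800),
    Transcript("lnc_e", "+", 20000, 21000),
]
labels = {t.name: classify_lncrna(t, gene) for t in examples}
print(labels)
```

A real annotation pipeline would compare each lncRNA against every neighbouring gene and handle multi-exon structure; the single-gene comparison here is only meant to make the five categories concrete.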
LncRNAs are believed to regulate gene expression at three levels: [...] has high specificity (88%) for Alzheimer's disease (AD); therefore, lncRNA BACE1 may be a potential candidate biomarker for predicting AD. 19 21 These studies have shown the potential of lncRNAs to serve as molecular biomarkers for the diagnosis of central nervous system (CNS) diseases.

| LncRNAs in HD
The prevalence of Huntington's disease (HD) in Europe and North America is 5-10 per 100,000 people. 22 CAG trinucleotide repeat expansion in the huntingtin gene leads to abnormal accumulation of misfolded huntingtin protein (HTT) in nuclear inclusion bodies and progressive loss of striatal neurons, which are the main pathogenic factors of the disease. 23 The clinical features of HD are chorea, dystonia and cognitive or mental disorders. 24 Altered levels of lncRNAs have been found to contribute to the dysregulation of genes observed in HD and to modulate HD pathogenesis. In the following section, we review some of the lncRNAs that have been consistently identified as dysregulated in HD pathology (Figure 2; Table 1).
Human accelerated region 1 (HAR1) is a segment of the human genome found on the long arm of chromosome 20, a highly conserved genomic region consisting of a cis-antisense pair of structured lncRNAs (HAR1F and HAR1R) specifically transcribed in the nervous system. 25 35 Huntingtin antisense (HTT-AS) is a natural antisense transcript at the HD repeat locus, which forms a 5′ head-to-head divergent pair overlapping with the CAG expansion region and the 5′ UTR of HTT mRNA. 38 Zucchelli et al. have confirmed the expression of HTT-AS in the brain and implicated its participation in neuronal differentiation. 39 HTT-AS v1 (exons 1 and 3) is down-regulated in the human HD frontal cortex; however, its function remains unknown. 40 Previous studies have reported that other lncRNAs may be involved in the pathogenesis of HD. DiGeorge syndrome critical region gene 5 (DGCR5) is a neurospecific disease-associated transcript that may play an important role in the human nervous system. 41 It has been reported to be down-regulated in HD; however, no functional studies have been performed on DGCR5. 40 Taurine up-regulated gene 1 (TUG1) is highly expressed in the mammalian brain and was originally found in a genome screen for genes up-regulated after taurine treatment of developing retinal cells. 42 It has been reported to be a target of p53 and to be up-regulated in patients with HD. 43 This up-regulation, possibly induced by p53 activation, may antagonize the cytotoxicity of mutant HTT (mHTT). 21

| LncRNAs in SCAs
SCAs are a complex group of fatal neurodegenerative diseases that primarily affect the brainstem, cerebellum and spinocerebellar tract. 44 Of the more than 40 SCA types, at least six (SCA1, SCA2, SCA3, SCA6, SCA7 and SCA17) are associated with polyQ disease. 45 They are clinically characterized by gait and limb ataxia, dysarthria and abnormal eye movements. SCAs usually develop in adulthood and exhibit significant clinical heterogeneity. Symptoms usually appear between the ages of 30 and 40 and progress slowly. 46 The length of the CAG repeat expansion in the mutant allele is inversely correlated with the age of onset, and this phenomenon is more pronounced in patients with SCA2 and SCA7. 47 However, only a few studies have confirmed the differential expression of some lncRNAs in SCAs. NEAT1L is not only dysregulated in patients with HD but also highly expressed in the brains of SCA1, SCA2 and SCA7 mice. 34 The significance of the elevated expression of NEAT1L in SCA has not been verified experimentally, but given previous conclusions in HD studies, we infer that NEAT1L may play a protective role in the setting of CAG repeat expansion disease.
Table 1 (partial): ... | Up-regulated in the brain | [42]; BDNF-AS | 11p14.1 | Decreasing BDNF expression post-transcriptionally | Up-regulated in the brain | [54]
Another notable study has examined SCA7, a neurodegenerative

| LNCRNAS IN THE DIAGNOSIS AND TREATMENT OF POLYQ DISEASE
The large number and tissue-specific expression of lncRNAs, as compared with coding genes, make them candidate markers for disease diagnosis and targets for treatment. 49 The lncRNA HTT-AS can be detected in the blood of patients with HD and thus may have potential applications in molecular diagnosis. 38 Brain-derived neurotrophic factor (BDNF) belongs to a class of secreted growth factors that are essential for neuronal maturation and survival. 50 BDNF-AS, an overlapping antisense lncRNA, has been reported to inhibit expression of BDNF at the post-transcriptional level. 51,52 The level of BDNF is diminished in the brains of patients with HD, and overexpression of BDNF in the forebrain of a mouse model has been confirmed to rescue the HD phenotype. 53 Given that BDNF plays such a key role in HD, increasing BDNF levels by down-regulating BDNF-AS may be a reasonable approach to HD treatment. 54 HTT-AS may be also a prom-
Although the application of lncRNAs as diagnostic biomarkers and potential treatment strategies for polyQ disease has a bright future, many difficulties remain to be overcome before clinical application. Currently, the detection of circulating lncRNAs faces several challenges. For example, a consensus is lacking regarding the reference genes for circulating lncRNAs; moreover, it remains unclear which genes are stably expressed and can serve as internal reference genes, and how such reference genes should be used to calculate the expression of circulating lncRNAs. Therefore, methods to improve the accuracy of detection must be further studied. Furthermore, differentially expressed circulating lncRNAs lack specificity for individual neurodegenerative diseases. For example, NEAT1 has been found to be differentially expressed in AD, 58 Parkinson's disease 59 and amyotrophic lateral sclerosis. 60 The occurrence and development of polyQ disease is a result of the combined actions of multiple genes.
Therefore, the detection of only one type of circulating lncRNA has limited specificity and sensitivity. Combined detection of multiple lncRNAs, together with traditional serum markers, can improve diagnostic value and will be an important direction for future development.
The therapeutic use of lncRNAs faces further obstacles. First, the actual mechanism of lncRNAs as a therapeutic strategy is not fully understood. The development of Genasense failed because of the lack of in-depth understanding of its mechanisms, thus revealing the importance of understanding mechanisms in drug development. 61 Second, owing to the low conservation of lncRNAs, some lncRNAs are expressed only in primates; therefore, establishing a general experimental model is difficult. 62 For most lncRNAs, appropriate animal models have not yet been constructed, but the availability of such models will be essential to understanding lncRNA function. Third, although some experiments on the application of lncRNAs have been performed, the results are of limited reliability because of the small sample sizes. 57 However, with the gradual advancement of lncRNA research, the prospects of using lncRNAs for the treatment of polyQ disease are promising.
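To make the reference-gene problem raised in this section concrete, the sketch below shows one way relative expression of a circulating lncRNA might be computed. It assumes quantification by qRT-PCR and uses the widely used 2^-ΔΔCt calculation; neither is specified in the text. The function names, gene labels and Ct values are hypothetical, and choosing the reference by the lowest standard deviation of Ct is a deliberately crude stand-in for a proper stability analysis.

```python
from statistics import mean, stdev


def most_stable_reference(ct_by_gene):
    """Return the candidate reference gene whose Ct values vary least.

    Uses the sample standard deviation of Ct as a simple stability proxy
    (an assumption of this sketch); each gene needs at least two Ct values.
    """
    return min(ct_by_gene, key=lambda gene: stdev(ct_by_gene[gene]))


def relative_expression(target_case, ref_case, target_ctrl, ref_ctrl):
    """Fold change of the target in cases versus controls by 2^-ddCt.

    dCt   = mean Ct(target) - mean Ct(reference), computed per group
    ddCt  = dCt(cases) - dCt(controls)
    Assumes roughly 100% amplification efficiency for both assays.
    """
    d_ct_case = mean(target_case) - mean(ref_case)
    d_ct_ctrl = mean(target_ctrl) - mean(ref_ctrl)
    return 2 ** -(d_ct_case - d_ct_ctrl)


# Made-up Ct values, for illustration only.
candidates = {
    "REF_A": [18.0, 18.1, 17.9],
    "REF_B": [20.0, 22.0, 19.0],
}
best_ref = most_stable_reference(candidates)

fold_change = relative_expression(
    target_case=[24.0, 24.0],
    ref_case=[18.0, 18.0],
    target_ctrl=[26.0, 26.0],
    ref_ctrl=[18.0, 18.0],
)
print(best_ref, fold_change)
```

Because the fold change depends directly on the reference Ct values, an unstable reference gene shifts every downstream estimate, which is why the lack of consensus on reference genes limits comparability between studies.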

| CONCLUSIONS
In recent years, researchers have gradually deepened understand-

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article, as no new data were created or analysed in this study.