Homeodomain-associated opposite strand transcripts (HOSTs).
It is common for protein-coding genes to be transcribed in head-to-head orientation, where the transcriptional start sites of the genes in question are separated by less than a few thousand base pairs and transcription occurs in opposite directions. Large-scale microarray analysis in both yeast and humans indicates that such genes are often co-regulated, as might be expected given that they share common 5′ cis-regulatory sequences (Cohen et al.,2000; Trinklein et al.,2004). A surprising finding to emerge from large-scale cDNA sequencing efforts is that noncoding, mRNA-like transcripts are also often transcribed in a head-to-head, or discordant, orientation with protein-coding genes (Carninci et al.,2005; Katayama et al.,2005). Though in some such cases the unspliced and the processed mature mRNAs from these pairs overlap, there is usually no obvious mechanism for homologous base-pairing between the two transcripts. It is thus more appropriate to refer to these mRNA-like ncRNAs more broadly as opposite-strand transcripts (OSTs) rather than antisense transcripts, as they have sometimes been termed.
The first OST identified in association with a protein-coding gene that plays an important role in retinal development was the Six3-associated OST, initially termed RNCR1 (Blackshaw et al.,2004). Analysis of SAGE tags for Six3 and RNCR1 revealed that these two transcripts showed an almost identical temporal expression pattern, with expression of both RNAs high in retinal progenitors and decreasing dramatically with the end of retinal neurogenesis. A systematic investigation of OSTs associated with retinal homeodomain factors confirmed the prominent expression of the Six3OS transcript and identified seven other homeodomain-associated opposite-strand transcripts (HOSTs) (Alfano et al.,2005). These HOSTs are named Pax6OS, Six3OS, Six6OS, Vax2OS, CrxOS, Otx2OS, Pax2OS, and RaxOS for their partnered transcription factors Pax6, Six3, Six6, Vax2, Crx, Otx2, Pax2, and Rax, respectively. Retinal expression of each HOST was confirmed by reverse transcriptase (RT)-PCR analysis. Overlap in exonic or intronic sequence is sometimes seen between the 5′ end of the HOSTs and some isoforms of the associated coding mRNA, although this is clearly not the case for all HOSTs or homeodomain transcripts. For instance, though one alternative 5′ isoform of Six3 is found in an intronic sequence of Six3OS, it is clear from full-length EST and CAGE tag analysis (Geng et al.,2007; Rapicavoli and Blackshaw, unpublished data) that this is a relatively uncommon isoform of Six3 and that the most commonly used transcriptional start site for Six3 lies several kilobases downstream of the common start site for Six3OS. Examination of EST data implied that the HOSTs are usually spliced as well as polyadenylated, and elaborate alternative splicing is sometimes seen; this is particularly pronounced for Six3OS, which shows at least 10 different splice forms (Alfano et al.,2005; Geng et al.,2007).
The majority of these HOSTs lack any clear protein coding potential, and have relatively little primary sequence homology among mammalian species, despite often being found in syntenic positions. Many, such as Six3OS, undergo such extensive alternative splicing that little common sequence is found among the various isoforms. It is thus highly plausible that they represent genuine ncRNAs. The longest transcript of Six6OS, however, does appear to encode an evolutionarily conserved protein of unknown function, although several full-length alternative isoforms of this OST appear to be noncoding (Alfano et al.,2005). One other prominent exception is CrxOS, which encodes a highly divergent homeodomain protein, though one predicted to have a reduced likelihood of protein coding potential relative to experimentally verified ORFs (Alfano et al.,2005). In humans, the atypical homeodomain Tprx1 is found as an OST for Crx (Booth and Holland,2007). These two putative protein coding genes have no primary homology to each other aside from belonging to the homeodomain superfamily. It is thus possible that some HOSTs might also contain ORFs, but that these ORFs are under a high degree of divergent selection and not detectable via the phylogenetic analysis typically used to identify protein coding genes. It is also possible that some OSTs may encode both biologically active ncRNAs and proteins, as is the case for the RNA-based steroid receptor coactivator Sra-1 (Chooniedass-Kothari et al.,2004). Functional studies designed to distinguish between these possibilities will be needed to address this question directly.
Both the cellular expression pattern and the RNA expression level of a given HOST RNA and its partnered homeodomain transcription factor can be either concordant or discordant in the retina. RNA expression levels of both the HOST and its associated coding sequence can be practically identical, as is demonstrated for Six3 and Six3OS (Fig. 1). The HOST transcript can also be far less abundant than the coding transcript, as is seen for Six6 and Six6OS. Cellular expression patterns of HOSTs and coding sequences can likewise converge or diverge. Six3OS is essentially coexpressed with Six3 in retinal and diencephalic progenitor cells, while several other regions of the developing brain that express Six3 do not express Six3OS (Geng et al.,2007). In the mature retina, different Six3OS splice forms show differing expression, with some isoforms coexpressed with Six3 in retinal ganglion cells while others are restricted to Muller glia, which do not express Six3 (Blackshaw et al.,2004; Geng et al.,2007). Other HOSTs are expressed in very different cellular patterns from their partnered homeodomain transcription factors. Both CrxOS and Otx2OS are expressed primarily in amacrine and ganglion cells in the adult retina, while Crx and Otx2 are expressed in photoreceptor and bipolar cells, and thus show essentially complementary patterns of expression (Alfano et al.,2005). The only other HOST whose cellular expression pattern has been examined in retina is Vax2OS, which was independently identified as a transcriptional target of both Crx and Nrl in a recent microarray-based screen (Corbo et al.,2007; Hsiau et al.,2007). Vax2OS is selectively expressed in rod photoreceptors and, uniquely for rod-enriched transcripts, at higher levels in ventral than in dorsal retina. This is reminiscent of the embryonic expression pattern of Vax2, which is confined to ventral retinal progenitors (Barbieri et al.,1999; Ohsaki et al.,1999). Vax2 is weakly expressed in the outer nuclear layer of the adult retina, with RNA accumulating in the outer plexiform layer, and thus is also likely to overlap with Vax2OS in adult retina.
In some cases, there appears to be either a mutually reinforcing or a reciprocal relationship between expression levels of the HOSTs and their partnered homeodomain transcription factor. Mutually reinforcing relationships are seen for Vax2 and Vax2OS, as mice bearing a targeted deletion of Vax2 also showed a decrease in Vax2OS RNA (Alfano et al.,2005). This same study, however, reported that reciprocal relationships are seen for Crx and CrxOS, as overexpression of CrxOS by adenoviral transduction in postnatal retina leads to a decrease in Crx mRNA levels. A caveat applies here though, because the isoform of CrxOS selected contains an open reading frame and may be translated into protein, as discussed above, and these experiments did not directly determine whether these effects are mediated by CrxOS-encoded protein or by the CrxOS RNA itself. In the case of Six3 and Six3OS, neither relationship was observed, as mice mutant for Six3 showed no change in Six3OS expression (Geng et al.,2007).
Novel OSTs associated with retinal transcription factors.
We further investigated the extent to which OSTs are found in association with transcription factors that are prominently expressed in the developing mouse retina. We compiled a list of 100 transcription factors that have been previously shown either to regulate retinal cell fate specification or to be expressed in specific retinal cell types during development (Blackshaw et al.,2004; Gray et al.,2004). We found that 35 of these transcription factors had ESTs in Genbank corresponding to a putative OST, including 18 out of a total of 34 homeodomain-containing transcription factors (see Supp. Table ST1, which is available online). This implies that OSTs are relatively widespread among developmentally important transcription factors and are not restricted to homeodomain factors.
Though some of these novel OSTs have been previously reported in other studies, their retinal expression has not been examined (Engstrom et al.,2006). To identify OSTs expressed at readily detectable levels in the retina, we investigated whether SAGE tags corresponding to any of these OSTs were expressed and found that 10 of the 34 OSTs identified were detected, all but one of which were spliced. Five of these OSTs had been previously reported in the retina (Six3OS, Six6OS, CrxOS, Otx2OS, and Vax2OS) (Alfano et al.,2005), while five others are partnered with mRNAs for Lhx1, Prox1, Hmx1, Zfhx4, and Hes5 (Fig. 2). Several of these, including Lhx1OS, Prox1OS, and Zfhx4OS, are present at substantially higher levels than the previously identified HOSTs, with the exception of the abundant Six3OS transcript. RaxOS and Pax6OS are not detected at all in the SAGE data set, while OSTs such as CrxOS and Otx2OS are present only once each in a pool of over 500,000 total retinal SAGE tags (see Supp. Table ST2 for a list of all SAGE tag counts for the retinally expressed OSTs and associated transcription factors). The cellular expression patterns of these newly identified OSTs remain to be investigated.
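To put the tag counts above on a common scale, raw SAGE counts are often expressed as tags per million. The short Python sketch below illustrates that arithmetic only; the function name is hypothetical, and the pool size of exactly 500,000 is an assumption standing in for the "over 500,000" tags reported in the text.

```python
def tags_per_million(tag_count: int, total_tags: int) -> float:
    """Scale a raw SAGE tag count to tags per million total tags."""
    if total_tags <= 0:
        raise ValueError("total_tags must be positive")
    return tag_count * 1_000_000 / total_tags


# Assumed pool size: the text reports "over 500,000" tags, so this is an upper bound
# on the abundance of a transcript detected by a single tag.
ASSUMED_TOTAL_TAGS = 500_000
single_tag_tpm = tags_per_million(1, ASSUMED_TOTAL_TAGS)
```

Under this assumption, a transcript detected once, as reported for CrxOS and Otx2OS, corresponds to at most about 2 tags per million.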
The extent to which OSTs are found in association with retinal transcription factors in species other than mice is not clear. Previous studies reported that OSTs are associated with retinal-expressed transcription factors in humans as well as mice (Alfano et al.,2005). These human transcripts often share little primary sequence homology with their murine counterparts, with the position of exon-intron boundaries varying substantially. More surprisingly, the genomic sequence transcribed by the human OSTs often only partially overlaps the equivalent mouse OST, or in some cases does not overlap at all (Alfano et al.,2005; Babak et al.,2005). Clearly, the evolutionary constraints on OSTs are far more relaxed than for their associated coding transcripts, a fact that makes identifying true homologues challenging, particularly in nonmammalian vertebrates.
Nonetheless, identifying OSTs associated with retinal transcription factors is quite straightforward if one simply asks whether OSTs of any sort are present within 5 kb of the transcriptional start site of the relevant coding transcript. Using these criteria to search the genomes of chick, frog, and zebrafish, we determined that OSTs are found associated with 22 out of 37 of the transcription factors that had associated OSTs in mouse in at least one of these three species (Supp. Table ST1). Nine out of the 10 retinally expressed mouse OSTs listed in Supp. Table ST2 had counterparts in nonmammalian vertebrates (see Supp. Table ST1). In nearly half of the nonmammalian OSTs, the associated OST was spliced, implying that these transcripts represent bona fide mRNAs. These data imply that OSTs are found throughout the vertebrate lineage, though the question of whether there is a difference in the number or complexity of OSTs among different vertebrates awaits further clarification.
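The proximity criterion described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure actually used for Supp. Table ST1: the `Transcript` class, the `find_osts` function, and all coordinates are hypothetical, and "present within 5 kb" is interpreted here as any part of an opposite-strand transcript lying within 5 kb of the coding transcript's start site.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str  # "+" or "-"
    start: int   # leftmost genomic coordinate
    end: int     # rightmost genomic coordinate

    @property
    def tss(self) -> int:
        """Transcriptional start site: left end on '+', right end on '-'."""
        return self.start if self.strand == "+" else self.end


def distance_to_point(t: Transcript, point: int) -> int:
    """Distance from a genomic position to the nearest base of a transcript."""
    if t.start <= point <= t.end:
        return 0
    return min(abs(t.start - point), abs(t.end - point))


def find_osts(coding: Transcript, candidates: List[Transcript],
              window: int = 5000) -> List[Transcript]:
    """Return candidates on the opposite strand of the same chromosome
    that lie within `window` bp of the coding transcript's start site."""
    return [
        t for t in candidates
        if t.chrom == coding.chrom
        and t.strand != coding.strand
        and distance_to_point(t, coding.tss) <= window
    ]


# Hypothetical coordinates, for illustration only.
coding = Transcript("GeneA", "chr1", "+", 10_000, 15_000)
candidates = [
    Transcript("GeneA_OS", "chr1", "-", 6_000, 9_000),      # opposite strand, 1 kb away
    Transcript("far_ncRNA", "chr1", "-", 1_000, 3_000),     # opposite strand, 7 kb away
    Transcript("same_strand", "chr1", "+", 9_000, 9_500),   # same strand, excluded
    Transcript("other_chrom", "chr2", "-", 9_000, 9_500),   # different chromosome
]
ost_names = [t.name for t in find_osts(coding, candidates)]
```

With these made-up coordinates only `GeneA_OS` passes the filter. Because the test ignores sequence similarity entirely, it can be applied to species in which primary sequence homology to the mouse OST is weak or absent.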
How might OSTs work?
What might the function and mechanism of action of these OSTs be? Given their close proximity to developmentally important protein coding genes, the most obvious hypothesis would be that they act in cis to regulate expression of their associated protein coding gene via transcriptional facilitation or interference (Fig. 3A). Numerous examples of naturally occurring transcriptional interference or transcriptional gene silencing are known, where the act of transcription through an enhancer or promoter region of a nearby gene reduces or eliminates expression of that gene. Some cases, such as the SRG1 transcript in yeast that blocks activation of the SER3 gene, involve transcriptional regulation of a protein coding gene by an ncRNA (Martens et al.,2004,2005). The converse, transcriptional facilitation, where the act of transcription of an ncRNA through the regulatory region of a nearby gene enables access of transcriptional activators to enhancer elements in the transcribed region, can also occur (Ho et al.,2006).
At least one retinal noncoding OST may act in trans to regulate retinal cell fate. We have recently observed that ectopic overexpression and knockdown of Six3OS/RNCR1 in neonatal retina result in changes in cell fate. Interestingly, these changes partially phenocopy the changes seen with Six3 overexpression and dominant-negative expression, indicating that Six3 and Six3OS/RNCR1 may also interact in vivo to regulate cell fate (Rapicavoli and Blackshaw, unpublished data).
Other studies performed outside the retina suggest that OSTs may at least in part function in trans to regulate transcription of nearby genes (Fig. 3B). In this case, the ncRNA itself may have functions that are distinct from the simple act of transcription through the genomic locus covered by the transcript (Fig. 2). This has been perhaps most clearly demonstrated by work on Evf-2, an ncRNA associated with the Dlx5/6 locus (Feng et al.,2006), which is itself strongly expressed in retinal progenitors (Rapicavoli and Blackshaw, unpublished data). Evf-2 can act in trans to activate transcription at the Dlx5/6 locus by interacting with Dlx2, another Dlx family member. Dlx5 and Dlx6 are members of a homeodomain protein family that is related to the Drosophila Distalless genes. The Dlx genes play important roles in neuronal differentiation and in craniofacial and limb patterning during development. The Dlx5/6 genes are transcribed in convergent orientation, and important intergenic enhancer elements have been identified for the Dlx5/6 loci (Zerucha et al.,2000). Evf-2 is transcribed from ei, the ultraconserved region of the Dlx5/6 locus, which is located between Dlx5 and Dlx6. Evf-2 interacts directly with Dlx2 to increase the activity of the Dlx5/6 enhancer by binding to the enhancer element, ei. Evf-2 thus appears to be acting at least partially as an RNA-based transcriptional coactivator.
This potential function is not unprecedented, as the ncRNA SRA has been known for some time to act as a transcriptional coactivator by directly interacting with steroid receptors (Lanz et al.,1999), while the ncRNAs NRSE and HSR have been shown to regulate the activity of REST and HSF-1, respectively (Kuwabara et al.,2004; Shamovsky et al.,2006). However, the story is likely more complex for Evf-2. The region of Evf-2 that is essential for transcriptional activation of ei by Dlx2 corresponds precisely to the ei sequence itself. It thus seems plausible that the Evf-2 RNA may directly base pair with the ei DNA sequence, thereby altering chromatin structure or facilitating recruitment of transcription factors and leading to activation of transcription in conjunction with Dlx2.
Other work from both Drosophila and mammals has linked ncRNAs transcribed from the Hox gene cluster to changes in chromatin structure that in turn lead to altered regulation of Hox gene transcription. One such Drosophila ncRNA is bxd, which contains enhancer elements for the nearby Ubx gene. Though there is agreement that transcription of bxd does indeed regulate chromatin conformation in these enhancer elements, which in turn regulate expression of Ubx, there is disagreement as to the effect of bxd and its mechanism of action. It has been suggested that bxd acts in trans to recruit histone methyltransferases to Ubx enhancer elements and thus promotes stable activation of transcription (Sanchez-Elsner et al.,2006). Other groups, however, have claimed that bxd has no effect in trans, but instead acts in cis via transcriptional interference to prevent recruitment of trithorax proteins to the enhancer elements, and thus represses expression of Ubx (Petruk et al.,2006).
Human HOX complex-associated OSTs have also been shown to regulate chromatin structure. HOTAIR is an OST located at the boundary of two chromatin domains at the HOXC locus (Rinn et al.,2007). Depletion of HOTAIR had no effect on transcription of genes in the HOXC cluster, but did lead to transcriptional activation at the HOXD cluster, which lies on a separate chromosome. HOTAIR was found to be directly associated with the Polycomb Repressive Complex 2 (PRC2), which mediates transcriptional silencing. HOTAIR depletion, moreover, resulted in loss of both Suz12 (a component of PRC2) and H3K27me3 at the HOXD locus, implying that HOTAIR ncRNA acts in trans to recruit PRC2 to the HOXD locus to promote transcriptional silencing. Finally, John Mattick's group has recently demonstrated that two novel OSTs, Hoxb5/6as and Evx1as, can associate with transcriptionally active chromatin in mouse embryonic stem cells. However, the genomic regions with which these OSTs associate remain to be determined (Dinger et al.,2008).
Finally, five of the ncOSTs listed in Supp. Table ST1 overlap the 5′ sequences of their associated protein coding transcript, potentially allowing homologous base pairing and double-stranded RNA formation between the two transcripts. This is shown for Prox1 and Hes5 in Figure 2. While it is unclear whether such duplexes actually form in vivo, naturally occurring sense-antisense pairs of this sort can inhibit translation (Werner and Berdal,2005), and can also serve as substrates for the generation of endogenous siRNAs directed against the coding transcripts (Tam et al.,2008; Watanabe et al.,2008).
ncRNAs expressed from homeodomain transcription factor loci may thus act to regulate transcription of either a partnered opposite strand transcription factor or, alternatively, a family member located at a separate locus. ncRNAs may also act in cis to block transcription by transcriptional interference or perhaps activate transcription through transcriptional facilitation. They may also act in trans to repress transcription by binding to polycomb proteins or activate transcription by binding to another homeodomain transcription factor or trithorax family members. Cis- and trans-acting mechanisms of action for OSTs, moreover, need not be mutually exclusive. Interestingly, a number of chromatin-modifying proteins, including components of both the PRC2 and DNA methyltransferase complexes, directly bind to RNA (Zhang et al.,2004; Bernstein et al.,2006; Jeffery and Nakielny,2004). Perhaps ncRNAs direct histone modifications in chromatin to influence expression of transcription factors. ncRNAs may thus act as critical regulators of transcription factor expression in higher organisms, at least in part by facilitating chromatin remodeling.