Archaeal L7Ae is a multifunctional protein that binds to a distinctive K-turn motif in RNA; it is a component of the large ribosomal subunit and of the ribose methylation and pseudouridylation guide RNP particles. A collection of L7Ae-associated small RNAs was isolated from Sulfolobus solfataricus cell extracts and used to construct a cDNA library; 45 distinct cDNA sequences were characterized and divided into six groups. Group 1 contained six RNAs with the features characteristic of canonical archaeal C/D box sRNAs, two atypical C/D box sRNAs and one representative of the archaeal H/ACA sRNA family. Group 2 contained 13 sense-strand RNA sequences encoded either within, or overlapping, annotated open reading frames (ORFs). Group 3 contained three sequences from intergenic regions. Group 4 contained seven antisense sequences, complementary either to sense-strand ORFs (lying within or overlapping them) or to C/D box sRNAs. More than two-thirds of these sequences possessed K-turn motifs. Group 5 contained two sequences corresponding to internal regions of 7S RNA. Group 6 consisted of 11 sequences that were fragments from the 5′ or 3′ ends of 16S and 23S ribosomal RNA and from seven different tRNAs. Our data suggest that S. solfataricus contains a plethora of small RNAs. Most of these are bound directly by the L7Ae protein; the others may be part of larger, transiently stable RNP complexes that contain L7Ae as a core component.
Emerging evidence suggests that many microorganisms contain a plethora of non-coding RNAs involved in a wide variety of cellular functions. These functions range from well-established roles in translation, ribosome assembly and intron splicing to recently described roles in the regulation of developmental genes, gene silencing, mRNA turnover and chromosomal architecture (Eddy, 2002; Mattick, 2003). Whole-genome sequence analysis and annotation of both prokaryotic and eukaryotic organisms have generally failed to identify the portion of the genome devoted to the production of these non-coding RNAs or to reveal their structural and functional diversity. Recognizing most non-coding RNA genes remains a challenging task that combines computational searches for features of primary and secondary structure (where these are known) with searches for sequence conservation across multiple genomes. In contrast, protein-coding genes can generally be identified with standard gene predictors based on well-established criteria.
As an alternative to these computational methods, a number of innovative biochemical approaches have been developed to identify small non-coding RNAs. Among the most productive has been the systematic cloning of cDNAs from the size-selected RNA fraction below 500 nt in length. This method has identified many small non-coding RNAs (smnRNAs) from both eukaryotic and archaeal species; a small fraction of these smnRNAs are members of the C/D and H/ACA box families, which are involved in the processing and modification of ribosomal RNA (rRNA) and the assembly of ribosomal subunits, whereas the vast majority remain uncharacterized (Tang et al., 2002a).
In eukaryotic organisms, most smnRNAs are localized predominantly in the nucleolus, the site of ribosome biogenesis, and are collectively referred to as snoRNAs. On the basis of distinctive structural and functional features, most snoRNAs can be classified as either C/D box or H/ACA box RNAs, which guide, respectively, 2′-O-ribose methylation or the conversion of uridine to pseudouridine at specific locations within the target rRNA. In both methylation and pseudouridylation, formation of a heteroduplex between the snoRNA guide sequence and the RNA target is a prerequisite for the reaction (Kiss-Laszlo et al., 1996; Balakin et al., 1997). As with many other RNA effectors, snoRNAs function as ribonucleoprotein complexes (snoRNPs), and each of the two types of complex contains a characteristic set of associated proteins (Bachellerie et al., 1995; Tollervey and Kiss, 1997; Kiss, 2001).
Homologues of both the C/D and the H/ACA box families of snoRNAs have been identified in Archaea and extensively characterized using informatic, biochemical and structural techniques (Gaspin et al., 2000; Omer et al., 2000; Klein et al., 2002; Kuhn et al., 2002; Speckmann et al., 2002; Tang et al., 2002a; Aittaleb et al., 2003; Rashid et al., 2003). The archaeal C/D and H/ACA box RNPs have proven to be useful models for the more sophisticated and less stable eukaryotic RNP complexes. The archaeal C/D box methylation guide sRNAs are associated with three conserved proteins: the core L7Ae protein, the aNOP56 protein (sometimes called aNOP56/58) and the aFIB methyltransferase (Omer et al., 2002). The L7Ae protein plays a central role early in the assembly pathway of the box C/D RNP complex. Initially identified and annotated as a ribosomal protein, L7Ae exhibits sequence similarity and functional homology to the multifunctional human 15.5 kDa protein, which has been implicated in both rRNA methylation and spliceosomal intron excision (Ban et al., 2000; Vidovic et al., 2000; Watkins et al., 2000). In vitro studies have demonstrated that both the 15.5 kDa and the L7Ae proteins recognize an RNA structural element known as the K-turn motif, which is found in many different RNAs from all three domains of life (Klein et al., 2001; Kuhn et al., 2002).
In Archaea, K-turn motifs have been shown to occur in rRNA, C/D box sRNAs and H/ACA sno-like RNAs (Rozhdestvensky et al., 2003). Biochemical and biophysical data have demonstrated that L7Ae interacts specifically with the K-turn motif in all three of these RNA classes (Ban et al., 2000; Kuhn et al., 2002; Rozhdestvensky et al., 2003). The requirement for L7Ae in functionally distinct RNPs emphasizes the critical role this protein plays in the structure and function of these RNP machines. To understand more fully the role and importance of this protein, we used immuno-affinity chromatography to isolate RNAs specifically associated with L7Ae and used these RNAs to construct a cDNA library. This approach led to the characterization of 45 novel smnRNAs and revealed that the set of RNAs able to interact directly or indirectly with L7Ae is substantially larger and more diverse than anticipated.
Ribosomal protein L7Ae co-purifies with box C/D methylation guide RNPs
To characterize the protein composition of archaeal C/D box RNPs, extracts of Sulfolobus solfataricus were subjected to anti-L7Ae, anti-aFIB or anti-aNOP56 immuno-affinity chromatography. The isolated complexes were separated by SDS-PAGE and analysed for the presence of L7Ae by Western blotting (Fig. 1). All three immuno-purified complexes contained L7Ae, although the amounts recovered in the anti-aFIB and anti-aNOP56 purified complexes appeared to be smaller than in the anti-L7Ae purified complexes.
The co-precipitation of L7Ae with aFIB and aNOP56 is consistent with in vitro evidence that L7Ae is the core protein required for assembly of the C/D box methylation guide sRNP. This result prompted us to try to expand the spectrum of verified smnRNAs that associate with L7Ae. To do this, S. solfataricus cell-free extracts were fractionated on a 10–30% sucrose gradient, and gradient fractions were subjected to anti-L7Ae immuno-affinity chromatography as described in Experimental procedures. The RNA recovered from the complexes obtained by immuno-affinity chromatography displayed a pattern similar to that observed previously with antibodies against Sac aFIB and aNOP56 (Omer et al., 2000). The immuno-purified complexes were highly enriched for small (s)RNAs, generally 60–100 nucleotides long, that sedimented as a broad peak through the entire gradient (Supplementary material, Fig.S1) and were essentially undetectable by end labelling of unfractionated total RNA (data not shown). In the lower portion of the gradient, a substantial amount of 5S rRNA was observed in the immuno-purified material, suggesting that the anti-L7Ae antibodies interacted to some extent with 5S rRNA or with intact 50S ribosomal subunits.
Construction and analysis of the L7Ae-associated cDNA library
Because L7Ae is a core component of all archaeal C/D box RNPs, we reasoned that the sRNAs co-immunoprecipitated with the anti-L7Ae antibodies would be highly enriched for novel and unusual C/D box-containing sRNA sequences. The co-precipitated RNAs were extracted from pooled gradient fractions, and the corresponding cDNAs were generated by reverse transcription polymerase chain reaction (RT-PCR) using a conventional method (Omer et al., 2000). Sequence analysis of 128 insert-containing clones from the library revealed 45 distinct sequences (Supplementary material, TableS1); to our surprise, the sequence and structural characteristics of the new library entries were much more diverse and heterogeneous than expected. For convenience, the clone sequences were divided into six groups based on sequence, structure, genomic location and possible function (Table 1). Thirteen sequences proved to be fragments of either 7S SRP RNA or of rRNA and tRNA; these were categorized as groups 5 and 6 respectively (see below).
Table 1. Analysis of the L7Ae-associated RNAs isolated from S. solfataricus.
Clones other than the 7S RNA, rRNA and tRNA fragments are assigned sR numbers; the 7S RNA, rRNA and tRNA clones are numbered separately. The column designations are as follows: Nr, the number of identical clones sharing all or part of the core sequence (the longest clone sequence is illustrated in Table 1); Size, length in nucleotides of the longest clone; Northern/RT, estimated size of the expressed RNA as determined by Northern hybridization, or detection of the RNA in vivo by RT-PCR analysis; PE, length in nucleotides of the extension products generated using a primer complementary to the 3′ terminal sequence of the cDNA clone; BS L7Ae, band shift analysis using recombinant L7Ae protein; BS NF, band shift analysis using recombinant L7Ae, aNOP56 and aFIB proteins; Orientation, orientation of the RNA-encoding sequence (underlined) relative to annotated adjacent genes; RNA position relative to predicted ORF, either the distance in nucleotides between either sRNA end and the nearest ORF boundary, or the overlap with a predicted ORF boundary; Genomic locus, annotated coding genes flanking or overlapping the RNA; Predicted target, the predicted RNA target for the novel C/D box RNAs and for the H/ACA RNA; ORF annotation, the predicted or known function of the ORF overlapping the novel RNA; Conserved in other Archaea, presence of orthologous sequences in other archaeal genera as identified computationally.
The species abbreviations are as follows: Sme, S. metallicus; Sto, S. tokodaii; Tvo, Thermoplasma volcanium; Dam, Desulfurolobus ambivalens.
ND, not determined; –, not detected; +, detected; ±, uncertain result; >, 5′−3′ gene orientation; <, 3′−5′ gene orientation; ≥ or ≤, orientation of the sRNA sequence.
General characterization of RNA sequences present in groups 1–4
Based on the presence of common sequence and structural motifs and/or genomic location, the remaining 32 sequences were further divided into four groups. The sequences were characterized (i) for expression and length, using Northern hybridization and primer extension analysis; (ii) for the ability to bind L7Ae in band shift experiments; (iii) for phylogenetic conservation in related archaeal genomes; and (iv) in selected instances, for the ability to function as methylation guides. Five of the sequences (sR116, sR127, sR129, sR130 and sR131) could not be detected in total cellular RNA by either primer extension or Northern hybridization, although they could be detected by RT-PCR, suggesting that they are low-abundance RNAs. Only nine of the cDNA clones correspond closely in length to the full-length RNAs detected in vivo; most of these are members of the canonical C/D box family of sRNAs. The remaining clones appear to be fragments of longer transcripts; on the basis of the Northern hybridization and primer extension results, they are likely to be partial degradation products of the longer (detectable) RNAs, although at this stage we cannot exclude the possibility that they are processing products. A similar pattern of size reduction was observed by Tang et al. (2002b) in the set of cDNA clones recovered from an Archaeoglobus fulgidus cDNA library corresponding to the 50–500 nucleotide RNA fraction. Gel retardation assays were used to analyse the affinity of the RNA derived from each cDNA clone for L7Ae; 22 of the RNAs formed a detectable complex with L7Ae, whereas complex formation could not be detected for the remaining 10. Database searches using blastn revealed that seven of the 32 sRNA sequences had highly conserved homologues in other sequenced archaeal genomes, reinforcing the idea that these are authentic and functional small non-coding RNAs. The four groups (groups 1–4; Table 1) are described below.
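As a simple bookkeeping check, the group sizes reported in this section can be reconciled with the 45 distinct library sequences. The Python sketch below only re-adds counts stated in the text (the subgroup sizes of groups 1–4, the two 7S RNA fragments and the 11 rRNA/tRNA fragments, plus the 22/10 band-shift split); the dictionary layout and labels are illustrative and are not part of the original analysis.

```python
# Bookkeeping sketch: clone counts per group, as stated in the text.
groups = {
    1: {"canonical C/D box": 6, "atypical C/D box": 2, "H/ACA": 1},
    2: {"overlapping ORF 5' end": 4, "overlapping ORF 3' end": 3, "within ORF": 6},
    3: {"intergenic": 3},
    4: {"antisense to mRNA": 5, "antisense to C/D box sRNA": 2},
    5: {"7S RNA fragments": 2},
    6: {"rRNA and tRNA fragments": 11},
}

# Total clones per group, then the two partitions used in the text.
totals = {g: sum(sub.values()) for g, sub in groups.items()}
groups_1_to_4 = sum(totals[g] for g in range(1, 5))
groups_5_and_6 = totals[5] + totals[6]

print(totals)
print(groups_1_to_4, groups_5_and_6, groups_1_to_4 + groups_5_and_6)  # 32 13 45

# Band-shift results for groups 1-4: 22 bound L7Ae, 10 showed no detectable complex.
bound, unbound = 22, 10
assert bound + unbound == groups_1_to_4
```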
Group 1: sRNAs exhibiting known sequence and/or structure motifs
This group contains nine sRNAs, subdivided as follows: six display canonical features of archaeal C/D box ribose methylation guides, two contain degenerate C- and D-like elements separated by segments of unusual length, and one exhibits the hallmark features of an H/ACA pseudouridylation guide RNA. The second subgroup has been generically termed ‘atypical C/D box sRNAs’. The lengths of the canonical C/D box RNAs sR101–sR106 agree well with the sizes of the transcripts expressed in vivo, whereas the atypical C/D box representatives appear to be fragments of longer transcripts. Among the canonical C/D box RNAs, sR101, sR102 and sR104 correspond to sequences that have been identified computationally (T. Lowe and P. Dennis, unpubl. results). The D or D′ box guides of these three sRNAs are predicted to guide methyl transfer to positions U435 (D box, sR101) and A635 (D′ box, sR102) in 16S rRNA and G811 (D′ box, sR104) in 23S rRNA. Using the dNTP concentration-dependent pause primer extension assay described in Omer et al. (2000), we confirmed the presence of 2′-O-methyl ribose at position G811 in 23S rRNA (data not shown); however, using the same procedure, we could not detect methyl ribose at either U435 or A635 in 16S rRNA.
Homologues of C/D box sRNA genes in other species
We next examined other sequenced archaeal genomes for sequences homologous to the box C/D RNAs sR101–sR106. To this end, we used routine blastn searches combined with a genomic tool available at http://genome-tools.sourceforge.net/, which tolerates gaps or nucleotide mismatches in the short input sequence. To test this program, we first searched the related genomes of S. solfataricus and Sulfolobus tokodaii for homologues of the initial set of 29 sRNAs identified biochemically in Sulfolobus acidocaldarius. We defined homologous sRNAs as those predicted to guide modification to the same position in a given RNA target in distinct archaeal species (Dennis et al., 2001). Inspection of all archaeal C/D box RNA sequences identified thus far has shown that, in most cases, the region of complementarity between target and guide extends over 9–10 base pairs. Based on this feature, the query sequence used with the genomic tool consisted of the 9- to 10-nucleotide guide sequence and the adjacent D or D′ element as present in the S. acidocaldarius genome. The program output consisted of the genomic coordinate at which a match was found. Because archaeal C/D box RNAs have a dyad modular organization, we then looked for additional box elements in the region surrounding each output coordinate. The analysis identified 10 homologous groups of C/D box RNAs with members in two, three or four Sulfolobus species (an alignment of these sequences is shown in the Supplementary material, TableS2). Of the 10 groups, seven have representatives in S. solfataricus; two of these were among the 13 sRNA genes identified previously using an archaeal sRNA gene-finding program (Omer et al., 2000), and the remaining five are new genes. The identification of five new sRNA genes in S. solfataricus and nine new sRNA genes in S. tokodaii indicates that this strategy is robust.
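The search strategy described above (a short guide-plus-D-box query matched against a genome with some tolerance for differences, reporting genomic coordinates) can be sketched in Python. This is a minimal illustration under stated assumptions, not the genome-tools program itself: it tolerates substitutions only (no gaps), scans both strands, and the function names `reverse_complement`, `hamming` and `scan_genome`, like the toy sequences, are hypothetical.

```python
from typing import List, Tuple

# Minimal sketch of a mismatch-tolerant search for a "guide + D box" query.
# Substitutions only (no gaps); an illustration, not the genome-tools program.

DNA_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (U is read as T)."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_genome(genome: str, query: str, max_mismatches: int = 1) -> List[Tuple[int, str]]:
    """Return (0-based start, strand) for every window within max_mismatches of query."""
    genome = genome.upper()
    query = query.upper().replace("U", "T")
    hits = []
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for start in range(len(genome) - len(q) + 1):
            if hamming(genome[start:start + len(q)], q) <= max_mismatches:
                hits.append((start, strand))
    return hits


# Toy query: a hypothetical 10-nt guide followed by the D box (CUGA in RNA, CTGA in DNA).
query = "GATTCCGTAC" + "CTGA"
genome = "AAAA" + "GATTCCGTTC" + "CTGA" + "TTTT"  # one substitution in the guide
print(scan_genome(genome, query, max_mismatches=1))  # [(4, '+')]
print(scan_genome(genome, query, max_mismatches=0))  # []
```

Each reported coordinate would then be the starting point for inspecting the surrounding sequence for the remaining C, D′ and C′ box elements, as described above.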
Using the same search algorithm or conventional blastn, we looked for homologues of the S. solfataricus sR101–sR106 RNAs in other archaeal genomes (Supplementary material, TableS2). No homologues of sR101–sR104 were detected in other Sulfolobus species, whereas homologues were detected for sR105 and sR106. Inspection of the genomic environment of the gene encoding sR106 (previously annotated as sR18 in S. acidocaldarius; Omer et al., 2000) showed that in S. solfataricus and S. tokodaii the 3′ end of the small RNA is complementary (antisense), over two and seven nucleotides respectively, to the 3′ end of the mRNA encoding the thiamine biosynthetic enzyme thiI. It is not yet known whether this gene organization has any regulatory implications. The genomic context of this sRNA gene in S. acidocaldarius is unknown. Alignment of sR105 with the homologous sequence from Sulfolobus metallicus indicates that the most conserved region spans the D′ guide, suggesting that this guide may direct ribose methylation to an as-yet-unidentified target RNA.
Putative rRNA and tRNA targets for C/D box sRNA guides
To find putative rRNA or tRNA targets for the C/D box sRNAs sR103, sR105 and sR106, the S. solfataricus genome was searched for antisense sequences able to form at least 9 bp of complementarity with the D or D′ guide regions of the respective sRNAs. After the perfect match to the sRNA gene itself was discarded, the remaining hits were sorted manually and ranked according to the length of the perfect match. The longest match to the Sso sR106 D′ guide element was 11 nucleotides, within the S. solfataricus tRNAMet, tRNATyr and tRNAPhe; the predicted site of 2′-O-ribose methylation was G52. In S. tokodaii, the homologue of sR106/sR18 contains exactly the same D′ guide sequence but is predicted to guide methylation to position G52 within a somewhat different set of tRNAs: two tRNAMet, two tRNAThr, tRNAVal, tRNATrp and tRNAPhe.
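A target search of this kind can be sketched as follows: slide the complement of a guide along a candidate target RNA, keep perfectly paired runs of at least 9 bp and rank them by length. The Python below is a minimal sketch, not the procedure used in this work; it assumes the widely used rule that the methylated target nucleotide is the one paired with the guide nucleotide five positions upstream of the D or D′ box (the −5 position), Watson–Crick pairs only (no G·U), no gaps, and a guide supplied 5′→3′ ending immediately upstream of the box. Function names and sequences are hypothetical.

```python
from typing import List, Optional, Tuple

# Minimal sketch: rank candidate 2'-O-methylation target sites for a C/D box guide.
# Assumptions (hypothetical, for illustration): guide given 5'->3', ending immediately
# upstream of the D or D' box; Watson-Crick pairs only; ungapped duplexes.

RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rna_reverse_complement(seq: str) -> str:
    """Reverse complement of an RNA (T is read as U)."""
    return seq.upper().replace("T", "U").translate(RNA_COMPLEMENT)[::-1]


def paired_runs(rc: str, target: str, offset: int) -> List[Tuple[int, int]]:
    """Half-open runs [start, end) of rc indices that pair perfectly when rc[0] sits at target[offset]."""
    runs, start = [], None
    for i in range(len(rc) + 1):
        j = offset + i
        paired = i < len(rc) and 0 <= j < len(target) and target[j] == rc[i]
        if paired and start is None:
            start = i
        elif not paired and start is not None:
            runs.append((start, i))
            start = None
    return runs


def find_targets(guide: str, target: str, min_len: int = 9) -> List[Tuple[int, int, Optional[int]]]:
    """Return (run length, target start, predicted methylated index or None), longest first."""
    rc = rna_reverse_complement(guide)
    tgt = target.upper().replace("T", "U")
    # rc[4] pairs with the guide nucleotide 5 nt upstream of the box (guide index len - 5).
    site_rc = 4
    hits = []
    for offset in range(-len(rc) + 1, len(tgt)):
        for start, end in paired_runs(rc, tgt, offset):
            if end - start >= min_len:
                site = offset + site_rc if start <= site_rc < end else None
                hits.append((end - start, offset + start, site))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)


# Toy example: the target contains a perfect 10-nt complement of the guide.
guide = "AUGCCAUGGA"
target = "GGGG" + rna_reverse_complement(guide) + "GGGG"
print(find_targets(guide, target))  # [(10, 4, 8)]
```

In a genome-wide version, the self-match to the sRNA gene would be discarded before ranking, as described above.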
The D box guide of sR106 also exhibited a nine-nucleotide complementary match to a sequence in another RNA recovered in our library (sR117; see below), with position C22 of sR117 as the predicted methylation site. Using our in vitro methylation assay, we attempted without success to demonstrate D box guide activity of sR106 against an sR117 transcript (data not shown). We suspect that this failure may be related to the fact that both the guide and the target RNAs can bind the C/D box proteins, which may interfere with proper complex formation and/or activity.
No potential rRNA, tRNA or other non-coding RNA targets were identified for the D or D′ guides of either sR103 or sR105; we speculate that these sRNAs might guide modification of unidentified RNAs, although at this stage an alternative, chaperone function cannot be excluded. In these analyses, many of the guide sequences were complementary to protein-coding sequences; the significance of these matches requires further investigation.
Assembly and methylation activity of RNP complexes
Canonical C/D box sRNAs are known to assemble with the proteins L7Ae, aNOP56 and aFIB into functional ribose-methylation RNP machines (Omer et al., 2002). The group 1 canonical sRNAs were tested for their ability to assemble into complexes using standard band shift assays. As exemplified by sR101 (Fig. 2 and data not shown), all six canonical C/D box RNAs (sR101–sR106) were able to assemble into higher-order structures in the presence of the three proteins (Fig. 2B and C) and to direct methylation of RNA oligonucleotides complementary to one of the two guide regions (Fig. 2D).
sR107 and sR108 RNAs
The sR107 and sR108 RNAs have some of the structural features of canonical C/D box sRNAs, but the spacing of their box sequences is atypical. To obtain information on the possible function of these two atypical C/D box RNAs, we tested their ability to form complexes with the three C/D box binding proteins and to guide methylation of short oligonucleotide targets in vitro (Fig. 3C and D). The RNA targets were designed to form a 10-nucleotide perfect helix with the region upstream of the D or D-like element, with methylation expected to occur at the −5 position. Both RNAs guided a low level of methyl transfer, only about 13% of the activity obtained with the control S. acidocaldarius sR1 RNA (2 pmol of product for the atypical sRNAs compared with 15 pmol for the canonical sR1 sRNA) (Fig. 3E). These values are very close to the detection limit of the in vitro assay. The low activity of sR107 may be due to the fact that only seven nucleotides are available in the loop to base pair with the target oligonucleotide. Alternatively, it may reflect partial or inefficient assembly of this RNA into higher-order complexes, as suggested by the small amounts of complexes II and III formed in the presence of aNOP56 and aFIB. To distinguish between these two possibilities, we inserted the D box guide of sR1 into the corresponding position of sR107 (Fig. 3F); this expanded the loop, and the potential for guide–target interaction, from seven to nine nucleotides and enhanced activity fourfold (Fig. 3E). These studies suggest that the atypical C/D box sRNAs can assemble into higher-order RNP complexes in the presence of L7Ae, aNOP56 and aFIB, and that these complexes may be able to direct methylation to appropriate target oligonucleotides.
Alternatively, these atypical C/D box sRNAs may function in processes unrelated to nucleotide modification. In that case, the suboptimal recruitment of aNOP56 and aFIB may reflect protein–protein interactions mediated by L7Ae alone. Further work will be required to fully understand the structure, activity and function of these non-canonical C/D box RNAs.
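The ~13% figure for the atypical sRNAs is simply the ratio of product formed to that formed by the sR1 control. A trivial Python helper (the name `relative_activity` is hypothetical) makes the calculation explicit; because the test values lie close to the assay's detection limit, the resulting percentage should be read as approximate.

```python
def relative_activity(test_pmol: float, control_pmol: float) -> float:
    """Product formed by a test RNA as a percentage of that formed by the control RNA."""
    if control_pmol <= 0:
        raise ValueError("control activity must be positive")
    return 100.0 * test_pmol / control_pmol


# Reported values: about 2 pmol for the atypical sRNAs, 15 pmol for the sR1 control.
print(f"{relative_activity(2.0, 15.0):.1f}%")  # 13.3%
```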
The sR107 coding region is single-copy, located in an intergenic region, and has a homologue in S. tokodaii. Northern hybridization indicates that the in vivo transcript containing the cDNA sequence is about 190 nucleotides long. Attempts to map the 5′ end of the in vivo transcript by primer extension were unsuccessful. However, the correspondence between the length of the conserved sequence and the length of the in vivo transcript suggests that the highly conserved region corresponds to the coding region of this transcript (Fig. 3A).
The sR108 RNA is derived from a 297-nucleotide sequence that is repeated 14 times in the genome and imperfectly matches a number of other related genomic sequences. Northern hybridization and primer extension analyses indicate that the detectable in vivo transcript is transcribed from the distal 200 nucleotides of the repeat, and that the cDNA sequence recovered in the library is derived from the middle of this transcript. A single, nearly perfect but truncated copy of the same repeat is found in the genome of S. tokodaii (Fig. 3B).
sR109 – an H/ACA RNA
Typical eukaryotes such as yeast and humans contain dozens of pseudouridine modifications in their large and small subunit rRNAs (Maden, 1990; Ofengand and Fournier, 1998). Most or all site-specific pseudouridine modifications are introduced during ribosome assembly and are directed to selected locations within the rRNAs by the guide function of the H/ACA family of snoRNAs (Kiss-Laszlo et al., 1996; Ganot et al., 1997; Ni et al., 1997; Weinstein and Steitz, 1999). Archaea and Bacteria typically contain fewer than a dozen pseudouridine modifications in their rRNA. In Escherichia coli, all of these modifications appear to be enzyme-mediated and none is known to involve or require an RNA guide function (Massenet et al., 1999). In contrast, in at least one archaeal example (A. fulgidus), a cDNA library made from small RNAs captured four H/ACA-like sRNAs, and pseudouridine modifications were detected in the rRNA at three of the four sites predicted from the guide sequences in these RNAs (Tang et al., 2002a).
The sR109 sequence from the S. solfataricus L7Ae library has several hallmark features of an H/ACA RNA: (i) the conserved ACA sequence at the 3′ end; (ii) the sequence and structural features of a pseudouridylation pocket, including antisense guide elements predicted to target modification to position U2598 (U2457 in the E. coli numbering system) in 23S rRNA; and (iii) a K-turn motif predicted to be bound by L7Ae (Fig. 4A). Pseudouridine modification at the predicted position is common and has been observed in organisms ranging from E. coli to humans (Ofengand and Bakin, 1997; Massenet et al., 1999). Binding of sR109 to L7Ae was confirmed by band shift analysis, and the presence of pseudouridine at the predicted position U2598 was confirmed by 1-cyclohexyl-3-(2-morpholinoethyl)carbodiimide metho-p-toluenesulfonate (CMC) treatment and primer extension (Fig. 4B and C). Mutational analysis, in which the two adjacent sheared GA base pairs of the putative K-turn were replaced with four C residues, confirmed the site of L7Ae binding within the sR109 sequence (Fig. 4D and E).
Group 2: sense RNAs
This group includes 13 RNAs encoded on the sense strand of annotated open reading frames (ORFs). Depending on the position of the RNA gene relative to the protein-encoding gene, we subdivided this group into three categories: (i) RNAs overlapping the 5′ end of ORFs; (ii) RNAs overlapping the 3′ end of ORFs; and (iii) RNAs encoded within ORFs. Within the first subdivision, four clones (sR110–sR113) map to the region upstream of the initiation codon and extend 3′ into the protein-coding region (Fig. 5). In the first instance, the overlapping gene encodes subunit four of formate hydrogenlyase (hycD); in the other three instances, the overlapping genes encode transposases. The sR110–112 sequences are characterized by a long hairpin interrupted by bulged nucleotides or internal loops; the initiation codons are located either within the stem or at its 3′ base (Fig. 5A–C). These three sRNAs form a stable complex with L7Ae, and each of the three stems contains an easily recognizable K-turn motif. Somewhat surprisingly, sR110–112 transcripts can form a higher-order complex in the presence of all three C/D box-binding proteins. Although the 5′ ends of the in vivo transcripts from which these cDNA fragments are derived have not been mapped, it is likely that all of these cDNAs represent the immediate 5′ ends of the mRNAs. This is supported by the observation that the 5′ regions flanking the cDNA sequences contain AT-rich promoter-like elements centred about 25–30 nucleotides upstream of the start of the cDNA sequences. At present, there are no direct experimental data to suggest the function of a K-turn motif at the 5′ end of these mRNAs.
The second subgroup, sRNAs that overlap the 3′ end of coding regions, contains three members (sR114–116). The sR114 sRNA contains a region of secondary structure that overlaps the translation termination codon of a hypothetical transposase-related protein and contains a K-turn motif (Fig. 5D). Primer extension analysis showed that the in vivo RNA is about 50 nucleotides longer than the corresponding cDNA clone (Table 1). Informatic analysis identified homologues of the sR114 sequence in the crenarchaeon S. tokodaii and in the moderately thermophilic euryarchaeon Thermoplasma volcanium. Interestingly, in S. solfataricus and S. tokodaii, sR114 overlaps the gene encoding a hypothetical protein with homology to transposase 1974, whereas in T. volcanium the corresponding sRNA sequence lies in an unrelated non-protein-coding region (Supplementary material, Fig.S2). Inspection of the three sequences reveals extensive conservation over the region represented by the cDNA clone and the additional 50 nucleotides of 5′ flanking sequence. Although we noted a promoter-like sequence element (TTTAAGT) centred about 28 nucleotides upstream of the 5′ end of the in vivo transcript, we cannot yet predict whether the sRNA is generated by processing of the transposase mRNA or is independently transcribed from this putative internal promoter.
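The promoter-like elements noted for sR110–112 (AT-rich, centred about 25–30 nt upstream) and for sR114 (TTTAAGT, centred about 28 nt upstream) suggest a simple scan for flagging such candidates. The Python below is a crude heuristic assumed for illustration, not the method used here and not a promoter predictor: it scores 7-nt windows centred 25–30 nt upstream of a given transcript start by A+T content. Function names, defaults and the toy sequence are hypothetical.

```python
from typing import Optional, Tuple


def at_fraction(seq: str) -> float:
    """Fraction of A or T bases in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)


def best_upstream_at_window(genome: str, tss: int, window: int = 7,
                            min_dist: int = 25,
                            max_dist: int = 30) -> Optional[Tuple[float, int, str]]:
    """Most AT-rich window (odd length) centred min_dist..max_dist nt upstream of tss.

    tss is the 0-based index of the first transcribed nucleotide on the + strand.
    Returns (AT fraction, distance upstream, window sequence), or None if no window fits.
    """
    half = window // 2
    best = None
    for dist in range(min_dist, max_dist + 1):
        centre = tss - dist
        start, end = centre - half, centre + half + 1
        if start < 0:
            continue  # window would run off the start of the sequence
        candidate = genome[start:end].upper()
        frac = at_fraction(candidate)
        if best is None or frac > best[0]:  # ties keep the window closest to tss
            best = (frac, dist, candidate)
    return best


# Toy sequence: TTTAAGT centred 28 nt upstream of a transcript starting at index 50.
genome = "G" * 19 + "TTTAAGT" + "G" * 24 + "C" * 20
frac, dist, element = best_upstream_at_window(genome, tss=50)
print(dist, element, round(frac, 2))  # 28 TTTAAGT 0.86
```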
Two additional sequences in the S. solfataricus genome exhibit significant similarity to the sR114 gene (score = 65.9 bits, expect = 9e-09). These correspond to a non-coding region similar to that found in T. volcanium and to a truncated form of transposase 1974. All five of these RNA sequences appear to contain C/D box-like motifs, as illustrated in Supplementary material, Fig.S2. The sR114 RNA bound L7Ae with high affinity and formed, at somewhat reduced efficiency, a higher-order structure containing aNOP56 and aFIB. The location of the L7Ae binding site within sR114 was confirmed by mutational analysis of the K-turn motif (data not shown).
Within group 2, four sRNAs (sR113, sR115, sR116 and sR119) contain no K-turn-like motifs and are unable to bind L7Ae. It is unclear why these sRNAs were recovered in our library. They might be fragments of a larger RNA, or part of a larger complex, that binds the protein. Alternatively, they might be non-specific sequences carried over with the pool of L7Ae-associated RNAs. Interestingly, sR113 and sR116 are derived from the same transposase ISC 1476 mRNA, and neither fragment has a K-turn motif (Table 1).
The third subgroup contains six sRNAs (sR117–122) that lie entirely within coding regions, predominantly of transposase genes (Table 1). Five of these contain recognizable K-turn motifs and form a strong complex with L7Ae and a somewhat weaker complex in the presence of all three C/D box binding proteins (data not shown).
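The three-way subdivision of group 2 (overlapping the ORF 5′ end, overlapping the 3′ end, or internal) amounts to a simple interval test on genomic coordinates. In the Python sketch below, the function name, the 1-based inclusive coordinate convention and the extra "spans entire ORF" and "no overlap" outcomes are assumptions added for completeness; the RNA and ORF are assumed to lie on the same strand, as for these sense RNAs.

```python
def classify_sense_rna(rna_start: int, rna_end: int, orf_start: int, orf_end: int,
                       strand: str = "+") -> str:
    """Position of a sense-strand RNA relative to an ORF (hypothetical helper).

    Coordinates are 1-based and inclusive, with start <= end on the genome;
    strand is the shared strand of the RNA and the ORF ("+" or "-").
    """
    if rna_end < orf_start or rna_start > orf_end:
        return "no overlap"
    if orf_start <= rna_start and rna_end <= orf_end:
        return "within ORF"
    overlaps_left = rna_start < orf_start
    overlaps_right = rna_end > orf_end
    if overlaps_left and overlaps_right:
        return "spans entire ORF"
    # On the + strand the ORF 5' end is its lower coordinate; on the - strand, its higher one.
    five_prime_is_left = strand == "+"
    return "overlaps 5' end" if overlaps_left == five_prime_is_left else "overlaps 3' end"


print(classify_sense_rna(90, 150, 100, 400))       # overlaps 5' end
print(classify_sense_rna(380, 450, 100, 400))      # overlaps 3' end
print(classify_sense_rna(200, 260, 100, 400))      # within ORF
print(classify_sense_rna(90, 150, 100, 400, "-"))  # overlaps 3' end
```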
Our results identify numerous mRNAs (most, but not all, encoding transposable element proteins) that contain a functional K-turn motif and bind L7Ae. The binding motif can apparently be located anywhere along the RNA: at the 5′ end overlapping the initiation codon, in the middle and entirely within the coding region, or at the 3′ end and extending into the 3′ flanking region. Although a mechanism is not obvious, it seems likely that these motifs play a critical role in controlling the expression or stability of these mRNAs.
Group 3: sRNAs in intergenic regions
Three sRNA clones (sR123–125) lie entirely within intergenic regions. sR124 is a fragment of a longer RNA that has been described previously [Tang et al., 2002b; Supplementary material, Fig.S3(A)]. This RNA is generated by processing of the precursor rRNA transcript and ligation of sequences from the 5′ external transcribed spacer (5′ ETS), the internal transcribed spacer (ITS) and the 3′ external transcribed spacer (3′ ETS). The two related clones recovered in our library lack the 5′ ETS sequences and the proximal portion of the ITS but contain the distal portion of the ITS, the ligation junction in the 23S rRNA processing stem and the 3′ ETS. This RNA contains a K-turn structure, located in an asymmetric loop at the base of the processing helix, that binds L7Ae [Tang et al., 2002b; Supplementary material, Fig.S3(A)]. The K-turn motif at this position is conserved in S. acidocaldarius and S. tokodaii (Tang et al., 2002b and data not shown).
The 56-nucleotide sR125 lies within the 87-nucleotide intergenic space between two convergently transcribed ORFs. The sRNA has a helical secondary structure containing a K-turn motif and is able to bind the L7Ae protein [Supplementary material, Fig. S3(B)]. It is unclear whether this sRNA is independently transcribed or is co-transcribed with the upstream ORF and generated by mRNA processing. We have found no indication of a function for this sRNA.
Group 4: antisense sRNAs
This group of seven sRNAs is divided into two subgroups: five that are antisense to protein-encoding mRNAs and two that are antisense to sequences with the features of canonical C/D box sRNAs (Table 1). In the first subgroup, four of the five antisense RNAs overlap transposase-encoding genes and one overlaps a gene encoding a hypothetical protein of unknown function. All of these were confirmed either by Northern hybridization or by RT-PCR. Complementarity between the antisense RNA and the target mRNA can apparently occur at the beginning, middle or end of the mRNA transcript (Fig. 6A). Indeed, two of the clones (sR126 and sR129) are complementary to the beginning and the end, respectively, of the same transposase ISC1439 mRNA. Only the last two (sR129 and sR130) have recognizable K-turn motifs and are able to bind the L7Ae protein (Fig. 6A). The presence in our library of antisense sequences that do not bind L7Ae suggests that in vivo they are part of complexes that contain the L7Ae protein. In E. coli, antisense sRNAs have been implicated in the regulation of translation; several mechanisms involving RNA–RNA or RNA–protein interactions inhibit or promote ribosome binding and ultimately change translational efficiency (Wagner et al., 2002; Wassarman, 2002). The sR126 and sR127 antisense RNAs are complementary to the 5′ ends of transposase mRNAs and might regulate translation by a mechanism similar to those identified in E. coli. Moreover, novel mRNA/antisense RNA complexes may well be important for other types of translational process that have yet to be described, and the L7Ae protein is likely to be a critical component of these processes in S. solfataricus.
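Locating the site on an mRNA that an antisense RNA pairs with amounts to searching the mRNA for the reverse complement of the antisense sequence. The sketch below assumes perfect Watson–Crick pairing over the full length of the antisense RNA (real duplexes may contain mismatches or G·U pairs) and divides the mRNA into thirds to label the site as beginning, middle or end. The function names, the one-third cutoffs and the toy transcript are hypothetical and are not taken from this study.

```python
# Watson-Crick pairing partners for RNA: A-U and G-C.
PAIRING = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, written 5'->3'."""
    return rna.upper().translate(PAIRING)[::-1]


def find_antisense_site(mrna: str, antisense: str) -> int:
    """0-based start of the first mRNA segment that pairs perfectly with the
    whole antisense RNA, or -1 if no such segment exists."""
    return mrna.upper().find(reverse_complement(antisense))


def site_region(mrna: str, antisense: str) -> str:
    """Label the pairing site by which third of the mRNA holds its midpoint
    (an arbitrary, illustrative cutoff)."""
    start = find_antisense_site(mrna, antisense)
    if start < 0:
        return "no perfect complementarity"
    midpoint = start + len(antisense) / 2
    third = len(mrna) / 3
    if midpoint < third:
        return "beginning"
    if midpoint < 2 * third:
        return "middle"
    return "end"


# Hypothetical 52-nt toy transcript with distinct 5', internal and 3' tags.
toy_mrna = "GGGG" + "A" * 20 + "CGCG" + "A" * 20 + "CCCC"
print(site_region(toy_mrna, "CCCC"))  # beginning
print(site_region(toy_mrna, "CGCG"))  # middle
print(site_region(toy_mrna, "GGGG"))  # end
```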
The second subgroup comprises two antisense RNAs that are complementary to canonical C/D box methylation guide sRNAs (Fig. 6B). Neither RNA has a recognizable K-turn motif, and neither is able to bind the L7Ae protein. We suspect that they appear in our library because at some point they are in complexes with the sense C/D box RNAs, which are demonstrated substrates for L7Ae binding. sR131 is the antisense partner of the previously characterized C/D box sR4 RNA of S. solfataricus (Omer et al., 2000). The D′ and D box guides of sR4 are predicted to direct methylation to position G894 in 23S rRNA and to position C277 in 16S rRNA, respectively. Only the ribose methylation of C277 in 16S rRNA has been confirmed, using the concentration-dependent primer extension pause assay (Omer et al., 2000). The new C/D box sRNA that is complementary to sR132 has been designated sR133 for annotation purposes. We consider sR133 an orphan guide, because the D′ and D box guide regions of this RNA are not complementary to rRNA or tRNA sequences.
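Predicted targets such as those of sR4 follow the established rule for C/D box guides: the target nucleotide that pairs with the guide nucleotide five positions upstream of box D (or D′) is 2′-O-methylated. The sketch below applies that rule under simplifying assumptions: perfect complementarity over a fixed-length guide lying immediately 5′ of the box, with hypothetical sequences, function names and parameter defaults. It is not the procedure used to predict the sR4 targets.

```python
from typing import Optional, Tuple

PAIRING = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence, written 5'->3'."""
    return rna.upper().translate(PAIRING)[::-1]


def predict_methylation_site(srna: str, box_d_start: int, target: str,
                             guide_len: int = 12,
                             first_position: int = 1) -> Optional[Tuple[int, str]]:
    """Predict the 2'-O-methylated target nucleotide for one guide region.

    srna: sRNA sequence (5'->3')
    box_d_start: 0-based index of the first nucleotide of box D (or D')
    target: target RNA sequence (5'->3'), e.g. a stretch of rRNA
    guide_len: assumed guide length immediately 5' of the box
    first_position: numbering of target[0] within the full-length target RNA

    Returns (position, nucleotide), or None if the guide has no perfectly
    complementary site in the target.
    """
    if box_d_start < guide_len:
        raise ValueError("guide region would extend past the 5' end of the sRNA")
    guide = srna[box_d_start - guide_len:box_d_start].upper()
    site = target.upper().find(reverse_complement(guide))
    if site < 0:
        return None
    # Guide and target pair antiparallel: target[site] pairs with guide[-1],
    # so target[site + 4] pairs with guide[-5], five nucleotides upstream of box D.
    index = site + 4
    return index + first_position, target[index].upper()


guide = "GCAUCCGUAGCA"                         # hypothetical 12-nt guide
toy_srna = "GAUGAUGA" + guide + "CUGA"         # box C ... guide ... box D
toy_target = "CCCCC" + reverse_complement(guide) + "CCCCC"
print(predict_methylation_site(toy_srna, 20, toy_target))  # (10, 'A')
```

The same function handles a D′ guide by passing the start of box D′ instead of box D.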
Group 5: fragments of 7S RNA
The 7S RNA (4.5S RNA in Bacteria) is a universally conserved component of the signal recognition particle (SRP), which functions in the membrane translocation of proteins (Herskovits and Bibi, 2000; Eichler and Moll, 2001; Keenan et al., 2001). It was surprising that our library contained two fragments of 7S RNA. One fragment (RNA1; nucleotides 220–311) was the most prevalent sequence in our library and was recovered 19 times, yet it was unable to form a stable complex with the L7Ae protein in a band shift assay (data not shown). Because RNA1 was so prevalent, we wondered whether the full-length 7S RNA might contain a K-turn in a region of the molecule not represented in the two cloned sequences. A putative K-turn has been identified in the Alu domain, which forms near the 5′ and 3′ ends of the human 7S RNA (Klein et al., 2001). Visual inspection of the S. solfataricus 7S RNA structure failed to identify a K-turn at this position. However, we identified a putative K-turn motif in the large bulge in hinge 1 of helix 5 (Fig. 7C). To determine whether 7S RNA binds L7Ae, and to locate the binding site, we carried out band shift assays with the full-length RNA and with RNAs carrying 5′ deletions extending to nucleotide positions 88 and 134 (Fig. 7A and B). The full-length RNA and the deletion to nucleotide 88 (which removes half of the Alu domain) form a stable complex with the protein, whereas the longer deletion, which removes half of the predicted K-turn motif, does not interact with the protein. To confirm the precise location of the L7Ae–7S RNA interaction, we used a modified toe-printing assay. With a primer complementary to positions 311–292 of 7S RNA, a strong block to reverse transcription was observed at position A258, within a few nucleotides of the predicted motif (Fig. 7D).
These results suggest that the L7Ae protein may be a functional component of the signal recognition particle. The binding site for the protein is in the large asymmetric loop in the middle of helix 5. In eukaryotes, this loop is the binding site for the heterodimeric protein SRP68/72, which is responsible for bending the RNA so that it can interact simultaneously with both the A site and the exit tunnel on the surface of the large ribosomal subunit (Halic et al., 2004). Genes encoding only two of the six known eukaryotic SRP proteins (SRP19 and the GTPase SRP54) have been identified in archaeal genomes; these two proteins mediate the interaction of the particle with the leader peptide at the exit tunnel (Bhuiyan et al., 2000; Diener and Wilson, 2000; Eichler and Moll, 2001). It is possible that binding of L7Ae to the loop in helix 5 facilitates bending of the RNP complex, allowing it to interact with both the exit tunnel and the A site in order to arrest translation.
In the S. solfataricus genome sequence database (http://www-archbac.u-psud.fr/projects/sulfolobus/), the annotation of the 7S RNA gene is incorrect: it actually represents the antisense complement of the authentic 7S RNA gene (see http://psyche.uthct.edu/dbs/SRPDB/SRPDB.html). Because of this confusion, we used RT-PCR to test whether S. solfataricus cells contain copies of the anti-7S RNA sequence. We detected both sense and antisense 7S RNAs in cell extracts. As with the antisense C/D box RNAs, gel shift assays indicate that antisense 7S RNA does not bind L7Ae; interestingly, however, the anti-7S RNA sequence can displace L7Ae from the authentic 7S RNA sequence (data not shown).
Group 6: fragments of rRNA and tRNA
Eleven of the 45 sequences recovered from the library were placed in group 6. Searches with blastn against the whole S. solfataricus genome sequence revealed that four of the 11 sequences were fragments derived from the 5′ and 3′ ends of the 16S and 23S rRNAs (multiple clones of each of the four sequences were obtained) and that the remaining seven were from different tRNAs (single or multiple clones; Table 1). RNAs from all of the clones were tested for their ability to form a complex with the L7Ae protein in band shift experiments; none formed a stable complex (data not shown), even though one of the rRNA fragments (RNA5) is known to contain a K-turn (Klein et al., 2001). Visual examination of the other 10 sequences failed to reveal a structural motif resembling the K-turn.
Why were these rRNA and tRNA sequences prevalent in the library when they do not appear to contain a functional K-turn and are unable to bind directly to the L7Ae protein? Several explanations are possible. In eukaryotes, C/D box RNP complexes have been implicated in the essential endonucleolytic processing events at the 5′ and 3′ ends of the small and large subunit rRNAs (Fatica and Tollervey, 2002). The rRNA sequences may therefore have been recovered because they enter a transiently stable, L7Ae-containing RNP processing complex at some point along the ribosome assembly pathway. Moreover, at least some of the recovered rRNA and tRNA fragments are targets for C/D box RNP-mediated methylation. The most thoroughly characterized example is RNA3, from the 5′ end of 16S rRNA, which contains the U52 target site methylated by the S. solfataricus sR1-containing RNP methylation guide complex. In at least some instances, then, transient complexes containing both the target rRNA or tRNA and the guide RNA may be co-precipitated with antibodies against the L7Ae protein. Alternatively, these RNAs that lack recognizable K-turns may simply represent non-specific carryover in the immunoprecipitation reactions.
Summary and perspective
In Archaea, L7Ae was first designated a ribosomal protein, but it has recently been shown to be a member of a large, multipurpose protein family that binds a well-defined structural motif (the K-turn) present in a number of different types of RNA. Building on observations made in eukaryotic organisms (Watkins et al., 2000; Kuhn et al., 2002; Nottrott et al., 2002), we demonstrated that L7Ae is a core component of the archaeal C/D box sRNPs that guide methylation to numerous positions in rRNA and tRNAs (Omer et al., 2002). The protein binds to each of the two K-turn motifs generated by the interaction between the C and D and between the C′ and D′ box elements within the sRNA, and it nucleates the addition of the two remaining proteins, aNOP56 and aFIB. These RNP complexes direct methylation to target RNA oligonucleotides that are complementary to the sRNA guide regions when provided with S-adenosyl methionine as the methyl donor. Based on these and other observations, we reasoned that antibodies against L7Ae would be an effective way to enrich for additional C/D box RNAs that might lack one or more of the canonical features required for detection by informatics search programs.
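The canonical features referred to above can be illustrated with a minimal sequence scan for the archaeal box C (consensus RUGAUGA, where R is A or G) near the 5′ end and box D (CUGA) near the 3′ end of a candidate sRNA. This is a deliberately simplified sketch: the 20-nucleotide terminal windows, the exact-match requirement and the function name are assumptions, and practical search programs also score mismatches, the internal C′ and D′ boxes and guide complementarity to target RNAs. RNAs with degenerate boxes, which a scan like this would miss, are the kind that antibody-based enrichment can still recover.

```python
import re
from typing import Optional, Tuple

# Archaeal box C consensus RUGAUGA (R = A or G) and box D consensus CUGA.
BOX_C = re.compile(r"[AG]UGAUGA")
BOX_D = "CUGA"


def find_box_c_and_d(srna: str, window: int = 20) -> Optional[Tuple[int, int]]:
    """Return 0-based starts of box C and box D, or None if either is missing.

    Box C must lie wholly within the first `window` nucleotides and box D
    wholly within the last `window` nucleotides (an assumed, simplified rule).
    """
    seq = srna.upper().replace("T", "U")  # accept DNA or RNA input
    box_c = BOX_C.search(seq[:window])
    box_d = seq.rfind(BOX_D, max(0, len(seq) - window))
    if box_c is None or box_d < 0:
        return None
    return box_c.start(), box_d


candidate = "CC" + "GUGAUGA" + "GCAUCCGUAGCA" + "CUGA" + "GG"  # hypothetical
print(find_box_c_and_d(candidate))  # (2, 21)
```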
The characterization of the L7Ae-derived library described in this study illustrates two important points. First, the efficiency of new sRNA discovery using L7Ae antibodies as a probe has been very high: a high percentage of the sequences obtained in the library appear to be of general interest and seem to be related, directly or indirectly, to the function of L7Ae. Second, the world of small RNAs in the archaeon S. solfataricus is much larger and much more diverse than we could have guessed before undertaking this study. Further characterization of these RNAs, and of the structure and composition of the complexes in which they are found, is certain to reveal a plethora of novel biological phenomena.
L7Ae-containing complexes were immunoprecipitated from fractionated S. solfataricus cell extracts using anti-L7Ae polyclonal antibodies as described in Omer et al. (2000). The RNAs in the immunoprecipitated complexes were recovered by phenol–chloroform extraction, ligated to a linker oligonucleotide, AO30 (sequence in Supplementary material), and used as templates for RT-PCR (Omer et al., 2000). PCR products were cloned into the pCR2.1 vector with the TOPO-TA cloning system (Invitrogen) following the manufacturer's instructions. The resulting cDNA clones were sequenced using the M13 forward primer and the BigDye terminator cycle sequencing reaction kit.
This work has been supported by the Canadian Institutes for Health Research, the National Science Foundation (PPD) and the Natural Sciences and Engineering Research Council of Canada (ADO). Any opinions, findings and conclusions expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the Natural Sciences and Engineering Research Council of Canada.