  • cDNA library;
  • non-messenger RNAs;
  • RNomics;
  • snoRNAs


Abstract

In mouse brain cDNA libraries generated from small RNA molecules we have identified a total of 201 different expressed RNA sequences potentially encoding novel small non-messenger RNA species (snmRNAs). Based on sequence and structural motifs, 113 of these RNAs can be assigned to the C/D box or H/ACA box subclass of small nucleolar RNAs (snoRNAs), known as guide RNAs for rRNA. While 30 RNAs represent mouse homologues of previously identified human C/D or H/ACA snoRNAs, 83 correspond to entirely novel snoRNAs. Among these, for the first time, we identified four C/D box snoRNAs and four H/ACA box snoRNAs predicted to direct modifications within U2, U4 or U6 small nuclear RNAs (snRNAs). Furthermore, 25 snoRNAs from either class lacked antisense elements for rRNAs or snRNAs. Therefore, additional snoRNA targets have to be considered. Surprisingly, six C/D box snoRNAs and one H/ACA box snoRNA were expressed exclusively in brain. Of the 88 RNAs not belonging to either snoRNA subclass, at least 26 are probably derived from truncated heterogeneous nuclear RNAs (hnRNAs) or mRNAs. Short interspersed repetitive elements (SINEs) are located on five RNA sequences and may represent rare examples of transcribed SINEs. The remaining RNA species could not as yet be assigned either to any snmRNA class or to a part of a larger hnRNA/mRNA. It is likely that at least some of the latter will represent novel, unclassified snmRNAs.


Introduction

A major goal of the joint international efforts of the Human Genome Project is to sequence and identify all 30 000–40 000 genes and to determine the structure, regulation and function of these genes and their products. To facilitate functional analysis of the encoded gene products, this endeavour has been extended to model organisms ranging from bacteria to mouse. Furthermore, expressed sequence tags (ESTs) have been employed to catalogue all mRNAs. Recent efforts to generate full-length mRNA sequences provide essential tools for studying post-transcriptional processing of transcripts, including alternative splicing, for identifying protein-coding genes and for functional analysis. In contrast, few experimental efforts have addressed the class of small non-messenger RNAs (snmRNAs; Kiss-Laszlo et al., 1996; Olivas et al., 1997). These molecules do not encode proteins. Instead, they act on their own or in complex with bound proteins, with which they form ribonucleoprotein complexes (RNPs). Such RNPs are found in cellular compartments as diverse as the nucleolus and the dendritic processes of nerve cells (Tiedge et al., 1993; Pederson, 1998), and they exhibit a surprisingly diverse range of functions. However, the biological roles of some of them remain elusive. Moreover, most systematic genomic searches are biased against their detection, and comprehensive identification by computational analysis of the genomic sequence of any organism remains an unsolved problem (Eddy, 1999). Hundreds of genes and their RNA products may thus remain undetected. As long as their functions, interactions in cellular circuits and roles in disease remain unknown, our understanding of how a cell functions will be incomplete. We therefore set out to identify snmRNAs and their genes directly, in the human genome and in the genomes of various model organisms.

Here we describe our experimental approach to the discovery of novel snmRNAs in mouse. This EST-like approach has been tailored to the detection of small RNAs. It starts with material that is usually discarded: the small-RNA fraction of total RNA, in the size range ∼50–500 nucleotides (nt). The resulting sequences have been termed expressed RNA sequences (ERNS). In this study, we present the first unbiased survey of the small RNA population in a mammalian cell. Thus far, we have identified ∼200 candidates for novel snmRNA species via ERNS. More than half of them correspond to new members of the two expanding subclasses of small nucleolar RNAs (snoRNAs) that guide RNA ribose methylation and pseudouridylation. The vast majority of previously known members of these two snoRNA classes direct the modification of rRNA. Interestingly, several of the novel members can instead guide modification of spliceosomal small nuclear RNAs (snRNAs). Moreover, an unexpectedly large number of them remain without identified RNA targets. Finally, some of them are not ubiquitously expressed, unlike what would be expected for rRNA or snRNA modification guides. This raises the possibility of tissue-specific targets, presumably mRNAs.

Results and discussion


Library construction and analysis

We constructed two cDNA libraries from mouse brain (see Materials and methods). One was based on small RNAs sized ∼50 to ∼110 nt (Fraction II) and the other on RNAs sized ∼110 to ∼500 nt (Fraction I). Two separate libraries were generated to avoid potential overrepresentation of the highly abundant tRNA species, which fall within the Fraction II size range. We randomly sequenced 400 clones from each fraction and identified the sequences by BLASTN database searches (Figure 1).


Figure 1. Sequence analysis of 400 randomly chosen cDNA clones from mouse brain library Fraction I (derived from RNAs sized 500–110 nt) or Fraction II (derived from RNAs sized 110–50 nt), respectively. cDNA clones representing different RNA species or categories are shown as a percentage of total clones. The segment denoted snmRNAs identifies candidates for novel snmRNAs.


In Fraction I, many of the cDNA sequences could be assigned to genes encoding known snmRNAs (Figure 1). In addition to rRNAs and snRNAs, we identified other known small RNA species such as 7SL RNA, 7SK RNA, Y1 scRNA, RNase P RNA and a brain-specific snmRNA designated BC1 RNA (DeChiara and Brosius, 1987). About 1% of the sequences were derived from fragments of known mRNAs. Only 3% of cDNA sequences could not be assigned to known RNAs and therefore represented potentially novel snmRNAs. The Fraction II library contained, among others, sequences derived from tRNAs, 4.5S RNA and previously identified snRNAs or snoRNAs. As in the Fraction I cDNA library, degradation fragments of 28S or 18S rRNA were also present. Compared with Fraction I, a larger proportion of unknown cDNA clones (7%) was identified by BLASTN database searches, potentially representing novel snmRNAs (Figure 1).

To enrich for novel RNA species, cDNA clones were spotted on filters in high-density arrays. The arrays were then hybridized to radiolabelled oligonucleotides specific for the most abundant known snmRNAs. This approach increased the proportion of novel RNA species among the selected clones from 3 to 20% in Fraction I and from 7 to 22% in Fraction II. From each library, ∼40 000 clones were screened by filter hybridization, and the resulting signals were ranked by computer-aided analysis. We then sequenced the ∼2500 clones from each fraction that exhibited the lowest hybridization scores.
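The ranking-and-selection step can be illustrated with a short Python sketch. It is a hypothetical illustration, not the computer-aided analysis actually used. The clone IDs and scores are invented, and each clone is assumed to carry a single numeric score summarizing its signal against the probes for abundant known snmRNAs. The two enrichment ratios use the percentages given above.

```python
import heapq
from typing import Dict, List


def select_low_signal_clones(scores: Dict[str, float], n: int) -> List[str]:
    """Return the IDs of the n clones with the weakest hybridization signal.

    `scores` maps a (hypothetical) clone ID to its score against probes for
    abundant, known snmRNAs; a low score suggests the clone carries none of
    those RNAs. Ties are broken by clone ID so the result is deterministic.
    """
    return heapq.nsmallest(n, scores, key=lambda clone: (scores[clone], clone))


def fold_enrichment(before_pct: float, after_pct: float) -> float:
    """Ratio of the share of novel clones after versus before the filter screen."""
    return after_pct / before_pct


# Invented example: four clones, keep the two with the weakest signal.
example_scores = {"clone_A": 0.92, "clone_B": 0.10, "clone_C": 0.55, "clone_D": 0.04}
print(select_low_signal_clones(example_scores, 2))  # ['clone_D', 'clone_B']

# Percentages from the text: 3% -> 20% (Fraction I), 7% -> 22% (Fraction II).
print(round(fold_enrichment(3, 20), 1), round(fold_enrichment(7, 22), 1))  # 6.7 3.1
```

A heap-based selection (`heapq.nsmallest`) avoids fully ordering all ∼40 000 scores when only the weakest ∼2500 are needed. At this scale, a plain sort would work equally well.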

Analysis of candidates for snmRNAs

Sequence analysis identified 201 novel ERNS from mouse. Expression and sizes of these potential snmRNA species were confirmed by northern blot analysis. In general, the RNA sizes matched those of the corresponding cDNAs (Tables I, II, III, IV, V, VI). The cDNAs were at least 5–10 nt shorter, because the cloning strategy employed did not capture the extreme 5′ ends of the RNAs. In several cases, the complete sequence of the mouse snmRNA was found within a mouse EST entry by a BLASTN database search. We also investigated whether the novel RNAs are expressed specifically in any of the following tissues: brain, liver, heart, kidney or testis (data not shown). For a subset of ERNS, expression could not be confirmed by northern blot analysis, possibly because of low expression levels of the respective RNA species.

Table 1. Group I: novel C/D box snoRNAs guiding a rRNA methylation; group II: novel C/D box snoRNAs guiding a snRNA methylation; and group III: novel C/D box snoRNAs with unidentified targets
ERNSCopiescDNARNAHomologyModificationAntisense elementLocation/commentsAccession No.
Group I
  MBI-434221240humanUm3787 in 28S13 nt (5′)very long C/D box snoRNA; intron of sorting nexin 5 geneAF357317
  MBII-5545965humanUm1288 in 18S12 nt (5′)AF357318
  MBII-8226472Gm3913 in 28S13 nt (5′)AF357319
  MBII-9516375human ESTGm509 in 18S13 nt (3′)AF357320
  MBII-9946695/75humanGm3868 in 28S13 nt (5′)functional homologue of yeast snR190AF357321
  MBII-108156humanGm683 in 18S12 nt (3′)AF357322
  MBII-13515772rat, bovineUm627 in 18S14 nt (5′)functional homologue of yeast snR77AF357323
  MBII-14215877humanCm1272 in 18S12 nt (5′)AF357324
  MBII-18028980human ESTCm3670 in 28S10 nt (5′)functional homologue of yeast snR76AF357325
  MBII-21127280human ESTCm3670 in 28S10 nt (5′)homologue of yeast snR76, isoform of MBII-180AF357326
  MBII-20214870human, ratAm2378 and Um428 in 18S14 nt (3′) and 13 nt (5′)intron of rpL13 geneAF357327
  MBII-210559human ESTGm4454 in 28S9 nt (3′)AF357328
  MBII-23415370Am512 in 18S12 nt (3′)AF357329
  MBII-239264humanUm14 in 5.8S13 nt (5′)partial and cytoplasmic 5.8S rRNA methylationAF357330
  MBII-24016060humanUm4580 in 28S12 nt (5′)intron of rpL37 geneAF357331
  MBII-25155965Gm601 in 18S11 nt (5′)AF357332
  MBII-27636472Gm3713 in 28S11 nt (5′)AF357333
  MBII-29616067humanGm4578 in 28S16 nt (5′)AF357334
  MBII-31617175A3836 in 28S?13 nt (3′)A3836 is not a reported methylation siteAF357335
  MBII-32419480G1630 in 28S?10 nt (5′)G1630 is not a reported methylation siteAF357336
  MBII-33318275Um1670 in 18S9 nt (3′)AF357337
  MBII-33614570humanAm576 in 18S13 nt (3′)AF357338
  MBII-42045665human, rat ESTAm2764 in 28S12 nt (3′)AF357339
  MBII-42915660ratGm436 in 18S9 nt (5′)intron of rpS12 geneAF357340
Group II
  MBII-1935875Cm40 in U214 nt (3′)AF357341
  MBII-11916275human ESTCm8 in U411 nt (5′)AF357342
  MBII-1661198105Cm60 in U613 nt (3′)AF357343
  MBII-38216175human, rat ESTCm61 and Gm11 in U28 nt (5′) and 9 nt (5′)AF357344
Group III
  MBI-461249280n. d.n. d.AF357345
  MBI-52173110humann. d.n. d.AF357346
  MBI-106163120n. d.n. d.AF357347
  MBII-4167100n. d.n. d.AF357348
  MBII-1152799105human ESTn. d.n. d.AF357349
  MBII-163184165humann. d.n. d.AF357350
  MBII-170196n. d.n. d.AF357351
  MBII-244168120n. d.n. d.AF357352
  MBII-289193100human ESTn. d.n. d.AF357353
  MBII-29526882humann. d.n. d.AF357354
  MBII-34315475n. d.n. d.AF357355
  MBII-361172n. d.n. d.AF357356
  MBII-36615685n. d.n. d.AF357357
  MBII-41912856n. d.n. d.AF357358
  MBII-42613060mouse ESTn. d.n. d.AF357359
Tables I–VI: compilation of novel ERNS from Fraction I (derived from RNAs sized 500–110 nt) and Fraction II (derived from RNAs sized 110–50 nt) cDNA libraries. ERNS: expressed RNA sequences from Fraction I (MBI-) or Fraction II (MBII-); copies: number of independent cDNA clones identified from each RNA species; cDNA: length of cDNA, as assessed by sequencing; RNA: length of RNA as assessed by northern blot analysis; homology: homology of snmRNAs to genomic or EST sequences within other organisms; modification: predicted modified nucleotides within rRNAs or snRNAs (numbering according to the human RNA sequence); antisense element: for C/D box snoRNAs, the length of the antisense element is indicated in nucleotides, followed by its location in the 5′ domain (5′) or 3′ domain (3′) of the snoRNA; location/comments: genomic locus and specific features of ERNS (when applicable); Accession No.: accession number in DDBJ/EMBL/GenBank.
Table 2. Group V: novel H/ACA box snoRNAs guiding a rRNA pseudouridylation; group VI: novel H/ACA box snoRNAs guiding a snRNA pseudouridylation; and group VII: novel H/ACA box snoRNAs with unidentified targets
ERNSCopiescDNARNAHomologyModificationLocation/commentsAccession No.
Group V
  MBI-33128150Ψ4391/28S; Ψ4470/28Sintron 2 of rpL23 geneAF357384
  MBI-63121140humanΨ4512/28Sfunctional equivalent of yeast snR42AF357385
  MBI-124126140Ψ3813/28S; Ψ681/18SAF357386
  MBI-135122130humanΨ815/18S; Ψ866/18Sextremely strong expressionAF357387
  MBI-201104125ratΨ1723/28Sintron 3 of rat rpP2 gene; functional equivalent of yeast snR5AF357388
  MBI-261107120humanΨ4633/28S; Ψ3731/28SAF357389
  MBI-285120130humanΨ3889/28S; Ψ3928/28Sintron 3 of mouse rpL27 geneAF357390
  MBI-391124150humanΨ1003/18Sintron 4 of Tcp-1 gene; same genetic location as MBI-125AF357392
  MBI-421105130humanΨ4956/28Sintron 4 of mouse rpS12 geneAF357393
  MBI-801101130humanΨ1237/18S; Ψ1625/18SAF357395
  MBI-892118130humanΨ34, Ψ863/18S; Ψ4259/28SAF357396
  MBI-1411129180/140humanΨ1771/28Sintron 1 of rpL32-3A geneAF357398
  MBI-142194120humanΨ4536/28Sintron 2 of rpS16 geneAF357399
  MBI-1611112humanΨ218/18S; Ψ3703/28Sintron 5 of TPT1 gene for translationally controlled tumour protein (TCTP)AF357400
  MBI-164162120U144/5.8S?; U2458/28S?not reported pseudouridylation sitesAF357401
Group VI
  MBI-571118human, XenopusΨ34, Ψ44/U2 snRNAAF357402
  MBI-1002124140Ψ40/U6 snRNAAF357403
  MBI-1142118140Ψ40/U6 snRNAisoform of MBI-100AF357404
  MBI-125161120Ψ91/U2 snRNAintron 9 of Tcp-1 geneAF357405
Group VII
  MBI-11171n. d.intron 6 of Nit1 gene, antisense directionAF357406
  MBI-15132140humann. d.AF357407
  MBI-51156200/120n. d.AF357408
  MBI-619164150humann. copies on chromosome 21, one copy on chromosome 19: retrogenesAF357409
  MBI-791116145ratn. d.intron 10 of Cctz-1 gene for Tcp-1 protein, zeta subunit (chaperonin)AF357410
  MBI-831113300/120humann. d.AF357411
  MBI-871113120human, ratn. d.intron 8 of human dyskerin (DKC1) geneAF357412
  MBI-1371112n.d.humann. d.isoform of MBI-89, does not have the same rRNA targetAF357413
  MBI-147177200n. d.AF357414
  MBI-1521164n. d.AF357415
See footnote to Table I.
Table 3. Group IX: brain-specific H/ACA and C/D box snoRNAs with so far unidentified targets
ERNSCopiescDNARNAClassHomologyModificationLocation/commentsAccession No.
MBI-364119130H/ACA boxhumann. d.intron 2 of serotonin receptor gene 5-HT2CAF357423
MBII-1314660C/D boxhumann. d.chromosome 15, PWS region; PAR-5 RNAAF357424
MBII-4855760C/D boxn. d.AF357425
MBII-4945665C/D boxn. d.AF357426
MBII-52377880C/D boxhumann. d.chromosome 15, PWS region, tandemly repeated genesAF357427
MBII-7815057C/D boxn. d.AF357428
MBII-85569195C/D boxhumann. d.chromosome 15, PWS region, tandemly repeated genesAF357429
See footnote to Table I.
Table 4. Group X: novel RNAs located within coding regions of known mRNAs; and group XI: novel RNAs located within 5′ or 3′ UTRs of mRNAs
ERNSCopiescDNARNAHomologyLocation/commentsAccession No.
Group X
  MBI-45190mitochondrial cyt c mRNAAF357430
  MBI-501201humanRPA16 mRNAAF357431
  MBI-691210240human, ratM-PFK mRNAAF357432
  MBI-85198110human, ratATP1B2 mRNAAF357433
  MBI-93134human, ratGAD 65 mRNAAF357434
  MBI-1121112120human, ratS27 mRNAAF357435
  MBI-1221311humanC1/C2 mRNAAF357436
  MBII-815656GARP45 mRNAAF357437
  MBII-26196humanIlf3 mRNAAF357438
  MBII-51153ratPTP-NP mRNAAF357439
  MBII-1931562000humansnap 25 mRNA but homology only from pos. 4–38 to snapAF357440
  MBII-1981392000SGP-1 mRNAAF357441
  MBII-208117n. d.humanglutamate receptor channel mRNAAF357442
  MBII-228174humanPam mRNA (protein associated with Myc)AF357443
  MBII-26718660humanEph receptor A4 (Epha4) mRNA, snmRNA corresponds to signal peptide sequenceAF357444
  MBII-339190human, ratthrombomodulin mRNAAF357445
Group XI
  MBI-1292156humanmRNA DKFZp566B183 3′ UTRAF357446
  MBI-1451130humanscg mRNA 3′ UTRAF357447
  MBI-1511167humancdk mRNA 3′ UTRAF357448
  MBI-1541109human, ratcd24 mRNA 3′ UTRAF357449
  MBI-156198humanHSPC218 mRNA 5′ UTRAF357450
  MBI-163125human, ratcam III mRNA 5′ UTR (calmodulin)AF357451
  MBII-84165humanUDP-glucuronosyltransferase mRNA 3′ UTRAF357452
  MBII-283188ratPodxl mRNA 3′ UTRAF357453
  MBII-395183135ratRC3 mRNA 3′ UTR (calmodulin binding protein)AF357454
  MBII-396168265/115ratadd2 mRNA 5′ UTRAF357455
See footnote to Table I.
Table 5. Group XII: novel RNAs resembling repetitive elements
ERNSCopiescDNARNAHomologyLocation/commentsAccession No.
MBI-21306humanSINE B4AF357456
MBI-56184300/140SINE B4AF357457
MBI-160112280humanSINE B1; homology to MBI-2AF357458
MBII-133165n. d.humanSINE B2AF357459
MBII-373191n. d.humanSINE B2; homology to MBII-133AF357460
See footnote to Table I.
Table 6. Group XIII: novel snmRNAs without known sequence or structural motifs
ERNSCopiescDNARNAAbundanceHomologyLocation/commentsAccession No.
MBI-44197100/60+++mitochondrial pro/D loopAF357461
MBI-541105detectable by RT–PCRAF357488
MBII-37182detectable by RT–PCRAF357502
MBII-65162RNA exhibits pseudoknot structureAF357504
MBII-109180humandetectable by RT–PCRAF357506
See footnote to Table I.

We determined the total number of independent cDNA clones obtained for each snmRNA. While most were found only once in our screen, some were present in numerous copies, and this correlated well with their abundance as deduced by northern blot analysis. Homologues of more than half of the mouse ERNS could be identified in human genomic or EST sequences (sequence similarity >80%), consistent with a functional role for these RNAs. Based on structural hallmarks, expression and presumed function, the 201 novel snmRNA candidates were assigned to 13 different subgroups (see Tables I, II, III, IV, V, VI).

Novel mouse snoRNAs

Based on sequence and structural features (Maxwell and Fournier, 1995; Balakin et al., 1996; Ganot et al., 1997b), we identified 72 novel snoRNA species of the C/D box type and 41 of the H/ACA box type. The known functions of snoRNAs are the post-transcriptional processing and modification of rRNAs or snRNAs. C/D box antisense snoRNAs guide 2′-O-ribose methylation at specific sites in rRNAs or snRNAs, while H/ACA snoRNAs guide site-specific pseudouridylation within these RNA species (for reviews see Tollervey, 1996; Smith and Steitz, 1997; Ofengand and Fournier, 1998; Weinstein and Steitz, 1999; Bachellerie et al., 2000). Unexpectedly, a substantial number of the novel specimens of both snoRNA classes do not appear to target rRNAs or snRNAs. Moreover, seven of them are not ubiquitously expressed in mouse tissues but are specific to brain.

Novel ubiquitous C/D box snoRNAs

C/D box snoRNAs contain two short sequence motifs, box C and box D, located only a few nucleotides away from the 5′ and 3′ ends, respectively, generally as part of a typical 5′–3′ terminal stem–box structure (for a review see Bachellerie and Cavaillé, 1998). Immediately upstream from box D or from an additional box (D′) in the 5′ half, C/D snoRNAs feature sequence tracts, 10–21 nt in length, that are complementary to rRNA spanning the sites of 2′-O-ribose methylation. In the corresponding RNA duplexes, the ribose-methylated nucleotide is always at the same location, paired to the fifth snoRNA nucleotide upstream from box D or box D′ (Kiss-Laszlo et al., 1996; Nicoloso et al., 1996). In rRNA of the yeast Saccharomyces cerevisiae, cognate box C/D snoRNAs have been identified for 51 of the 55 ribose-methylated sites (Lowe and Eddy, 1999). In mammals, however, a large fraction of the 105–107 expected rRNA 2′-O-ribose methylations (Maden, 1990) remained without a known cognate guide until completion of this study. Moreover, it is now apparent that the complexity of C/D box snoRNAs might be greater than anticipated, since methylation guide snoRNAs targeting substrates other than rRNA have been identified. Thus, three 2′-O-ribose methylations of spliceosomal U6 snRNA in human are also directed by bona fide C/D box antisense snoRNAs (Tycowski et al., 1998; Ganot et al., 1999), while both a 2′-O-ribose methylation and a pseudouridylation in U5 snRNA are guided by a novel C/D–H/ACA ‘hybrid’ snoRNA (Jady and Kiss, 2001).
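The positional rule described above can be made concrete with a small Python sketch. The rule is that the methylated target nucleotide pairs with the fifth snoRNA nucleotide upstream of box D or D′. The sketch rests on simplifying assumptions of our own, not the paper's:

- only exact Watson–Crick pairing (no G·U pairs);
- a guide length supplied by the user;
- only the first perfect match in the target;
- 0-based indices.

The sequences in the example are invented.

```python
# Sketch of the C/D box rule: the 2'-O-methylated target nucleotide is the one
# paired with the fifth snoRNA nucleotide upstream of box D (or D').
# Assumptions (ours): exact Watson-Crick pairing only, known guide length,
# first perfect match in the target only, 0-based indices.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def predict_methylation_site(snorna: str, box_d_start: int, guide_len: int, target: str):
    """Return the 0-based index in `target` predicted to be ribose-methylated, or None."""
    if guide_len < 5 or box_d_start < guide_len:
        raise ValueError("guide must be >= 5 nt and lie entirely upstream of box D")
    guide = snorna[box_d_start - guide_len:box_d_start]  # antisense element, 5'->3'
    site = target.find(reverse_complement(guide))        # antiparallel duplex
    if site == -1:
        return None
    # Guide position i pairs with target position site + guide_len - 1 - i.
    # The fifth nucleotide upstream of box D is guide position guide_len - 5,
    # which therefore pairs with target position site + 4.
    return site + 4


# Hypothetical sequences: a 12-nt antisense element directly upstream of box D (CUGA).
rrna = "AAAGCUUAGCCAUGCAAA"
snorna = "UGAUGAUU" + "GCAUGGCUAAGC" + "CUGA" + "UU"
idx = predict_methylation_site(snorna, snorna.rfind("CUGA"), 12, rrna)
print(idx, rrna[idx])  # 7 A
```

Because the duplex is antiparallel, the rule reduces to a fixed offset. The modified nucleotide is always the fifth nucleotide, counted from the 5′ end, of the target segment matched by the guide.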

Of the 72 novel mouse C/D box snoRNAs identified in this study, 66 are ubiquitously expressed. Of these, 24 correspond to novel C/D box snoRNAs able to guide a 2′-O-methylation within rRNA (Table I, group I). A further 23 are orthologues of previously identified human snoRNAs able to guide methylation in rRNAs or in U6 snRNA (see supplementary data available at The EMBO Journal Online; Table I, group IV). Particularly interesting is MBII-239, which can direct methylation at position U14 within ribosomal 5.8S RNA. Um14 is unique among all vertebrate rRNA ribose methylations in three respects: it is partial, it takes place in the cytoplasm rather than the nucleolus, and it is undermethylated in tumour tissues (Nazar et al., 1980; Munholland and Nazar, 1987). The detection of MBII-239 strongly suggests that the atypical Um14 methylation of 5.8S rRNA is catalysed by the same snoRNA-guided machinery as the remaining rRNA ribose methylations. This raises the questions of where MBII-239 acts within the cell and how it is expressed in tumour tissues.

For 15 of this first subset of 24 novel mouse C/D box snoRNAs, human orthologues can be found as genomic or EST entries in DDBJ/EMBL/GenBank. This further supports the functional relevance of the identified cDNAs, as does their location in introns (in two cases: MBII-202 and MBII-240). Collectively, the novel C/D box snoRNAs in group I can direct a total of 24 rRNA methylations. MBII-211 is an apparent isoform of MBII-180 that directs the same methylation in 28S rRNA, while MBII-202 can direct two methylations, corresponding to Um428 and Am2378 in human 18S rRNA.

Within group I, one particular snoRNA, MBI-43, stands out for its exceptionally large size (240 nt) for a C/D box snoRNA. So far, the only non-canonical specimen in this regard was the recently reported C/D–H/ACA ‘hybrid’ snoRNA, which exhibits a roughly similar size (Jady and Kiss, 2001). Curiously, the uridine targeted by MBI-43 is the only site that is both ribose methylated and pseudouridylated in mammalian rRNA (Maden, 1990; Ofengand and Bakin, 1997). The presumptive H/ACA snoRNA able to guide this particular pseudouridylation remains unknown so far (see below). Careful inspection of the sequence and folding potential of both MBI-43 and its human homologue in DDBJ/EMBL/GenBank did not reveal the presence of H/ACA snoRNA hallmarks in addition to C/D motifs, ruling out the possibility that the atypical snoRNA corresponds to a hybrid C/D–H/ACA snoRNA directing the two types of modification at the same nucleotide position. As a consequence of the work presented here, only 14 rRNA ribose methylations, from a total of 105–107 in mammals (Maden, 1990), remain without identified cognate guide snoRNA.

In our screen we have also discovered four novel C/D box snoRNAs (MBII-19, MBII-119, MBII-166 and MBII-382) able to direct ribose methylation within U2, U4 or U6 snRNAs [group II, Table I; see Massenet et al. (1998) for a review of snRNA nucleotide modifications]. For MBII-119 and MBII-382, this assignment is supported by comparison with the complete homologous human sequences found as ESTs in DDBJ/EMBL/GenBank. Both human sequences exhibit box C and a 4 or 5 bp terminal stem in addition to conserved antisense elements. MBII-382 could guide two distinct ribose methylations in U2 snRNA, but its two cognate antisense elements are unusual. Both are located in the 5′ half of the snoRNA, immediately upstream of a potential D′ box that carries two deviations from the consensus.

We also discovered 15 ubiquitously expressed RNAs with structural hallmarks of C/D box snoRNAs, but devoid of complementarity to rRNAs or snRNAs at the expected position relative to the box motifs (Table I, group III). While clone MBII-426 is severely truncated (30 nt in length), it unambiguously corresponds to a bona fide C/D box snoRNA, since it matches perfectly a mouse EST sequence exhibiting box C, box C′ and a 4 bp terminal stem at the expected locations. For MBII-115, MBII-163, MBII-289 and MBII-295, human homologues are available in DDBJ/EMBL/GenBank, and in each case one of the two presumptive antisense elements is conserved between mouse and human, supporting the notion that these snoRNAs also represent bona fide methylation guides. Two ubiquitous methylation guide snoRNAs devoid of rRNA or snRNA complementarity have been reported recently (Jady and Kiss, 2000). This expanding subset of box C/D snoRNAs lacking complementarity to rRNA might be involved in rRNA processing (Tycowski et al., 1994) or other, still unknown, aspects of ribosome biogenesis or other functions. Alternatively, these snoRNAs might target cellular RNAs other than rRNAs or snRNAs, such as ubiquitous snmRNAs transiting through the nucleolus, like telomerase RNA, RNase P, SRP RNA or pre-tRNAs (for review see Pederson, 1998). However, the presence of 2′-O-ribose-methylated nucleotides has not been reported in these RNAs thus far. Furthermore, systematic searches of these snoRNA sequences did not reveal any potential antisense element of at least 8 bp that could direct 2′-O-ribose methylation within these potential targets.
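The text does not specify how the complementarity searches were implemented. The following hypothetical Python sketch shows one way to screen for an antisense element of at least 8 bp that abuts box D. It makes these assumptions:

- only exact Watson–Crick complementarity is counted;
- the snoRNA region upstream of box D is supplied directly;
- the target names and sequences are invented.

```python
# Hypothetical screen for box D-anchored antisense elements of >= 8 bp.
# Assumptions (ours): exact Watson-Crick complementarity only; the region
# immediately upstream of box D is given; targets are invented examples.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def anchored_complementarity(upstream_of_box_d: str, target: str) -> int:
    """Length of the longest box D-adjacent stretch whose complement occurs in target."""
    best = 0
    for k in range(1, len(upstream_of_box_d) + 1):
        # Lengthening the stretch away from box D only appends one base to the
        # 3' end of its reverse complement, so once a length fails to match,
        # every longer stretch fails as well.
        if reverse_complement(upstream_of_box_d[-k:]) not in target:
            break
        best = k
    return best


def screen_targets(upstream_of_box_d: str, targets: dict, min_bp: int = 8) -> dict:
    """Map each target name to its anchored duplex length, keeping hits >= min_bp."""
    lengths = {name: anchored_complementarity(upstream_of_box_d, seq)
               for name, seq in targets.items()}
    return {name: n for name, n in lengths.items() if n >= min_bp}


candidates = {
    "hypothetical_target_1": "AAAGCUUAGCCAUGCAAA",
    "hypothetical_target_2": "GGGGGGGGGGGGGGGG",
}
print(screen_targets("GCAUGGCUAAGC", candidates))  # {'hypothetical_target_1': 12}
```

A real search would probably also tolerate G·U pairs and isolated mismatches. This sketch keeps only perfect duplexes so that the 8 bp threshold is easy to see.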

Novel ubiquitous mouse H/ACA snoRNAs

The formation of pseudouridines in eukaryotic rRNA is directed by a large family of site-specific H/ACA box snoRNAs. These carry an appropriate bipartite guide sequence in the internal loop of one (or both) of their two major hairpin domains (Ganot et al., 1997a; Ni et al., 1997; Ofengand and Fournier, 1998; Bortolin et al., 1999). In contrast to methylation guide snoRNAs, pseudouridylation guide snoRNAs have thus far been discovered exclusively by experimental approaches. Their identification in genomic sequences is critically hampered by their shorter box motifs and shorter bipartite antisense elements. Of the 91–93 pseudouridines in mammalian rRNAs (Maden, 1990; Ofengand and Bakin, 1997), only 15 can be guided by one of the 13 previously reported human H/ACA snoRNAs (Ganot et al., 1997a). The present study dramatically expands the repertoire of eukaryotic H/ACA snoRNAs. It decreases the number of rRNA pseudouridylation sites without a cognate H/ACA snoRNA by 27, to 49–51.
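The residual count of 49–51 sites follows from subtraction over the range of reported rRNA pseudouridines. The 15 sites covered by previously known guides and the 27 sites covered by the novel snoRNAs are removed from each end of that range:

```latex
\begin{aligned}
91 - 15 - 27 &= 49 \\
93 - 15 - 27 &= 51
\end{aligned}
```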

The majority of novel H/ACA snoRNA species have no counterpart among the mammalian snoRNAs reported so far, except for seven corresponding to homologues (sequence similarity 73–90%) of human pseudouridylation guides, namely U23, E2, U64, U65, U68, U69 and U70 (see supplementary data; Table II, group VIII). All novel mouse RNAs contain the two H and ACA box motifs at the expected locations and fold into the typical two major hairpin domains connected by a single-strand hinge region carrying the box H motif (data not shown). The vast majority also exhibit bipartite antisense elements matching known RNA pseudouridylation sites (Figure 2). This large set of novel data overwhelmingly confirms the validity of the model for the base-paired snoRNA–rRNA interaction guiding site-specific pseudouridylation (Ganot et al., 1997a).
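The bipartite guide model can be illustrated with a small Python sketch that scans a target RNA for uridines fitting a given pseudouridylation pocket. The pairing geometry follows our reading of the model cited in the text, and the assumptions are ours:

- The 3′ part of the internal loop's 5′ strand pairs with target nucleotides 3′ of the uridine.
- The 5′ part of the loop's 3′ strand pairs with target nucleotides immediately 5′ of the uridine.
- The uridine and its 3′ neighbour stay unpaired, so the gap is 2.
- Only exact Watson–Crick pairing is counted.

The sequences are invented.

```python
# Sketch of a bipartite pseudouridylation-pocket check for H/ACA snoRNAs.
# Assumptions (ours, not the paper's): exact Watson-Crick pairing only, the
# target uridine plus its 3' neighbour left unpaired (gap=2), and guide
# elements given 5'->3' as they appear in the snoRNA internal loop.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def find_pocket_targets(target: str, loop_5p: str, loop_3p: str, gap: int = 2) -> list:
    """Return 0-based indices of uridines in `target` that fit the bipartite guide.

    loop_5p: 3'-terminal part of the loop's 5' strand (abuts the upper stem);
             assumed to pair with target nucleotides 3' of the unpaired gap.
    loop_3p: 5'-terminal part of the loop's 3' strand (abuts the upper stem);
             assumed to pair with target nucleotides just 5' of the uridine.
    """
    upstream = reverse_complement(loop_3p)    # must end right before the U
    downstream = reverse_complement(loop_5p)  # must start right after the gap
    hits = []
    for u, base in enumerate(target):
        if base != "U" or u < len(upstream):
            continue
        if (target[u - len(upstream):u] == upstream
                and target[u + gap:u + gap + len(downstream)] == downstream):
            hits.append(u)
    return hits


# Hypothetical example built so that the uridine at index 7 sits in the pocket.
rrna = "GG" + "CAUCG" + "U" + "A" + "GCUUA" + "CC"
print(find_pocket_targets(rrna, loop_5p="UAAGC", loop_3p="CGAUG"))  # [7]
```

Both short duplexes must match at the same time, and neither half alone is enough. This is what makes the short, bipartite H/ACA guides hard to detect computationally.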


Figure 2. Potential base-pairing interactions between novel mouse H/ACA snoRNAs and mouse rRNA (A) or snRNA (B). The snoRNA sequences in a 5′ to 3′ orientation are shown in the upper strands with the two H and ACA motifs boxed and the apical part of the snoRNA 5′ or 3′ hairpin domains schematized by a solid line. The two complementarities to the RNA target are always found within the large internal loop of one (or both) of their hairpin domains, invariably abutting its apex-proximal stem. Sequence coordinates in parentheses refer to snoRNAs for which the 5′ terminal sequence remains incomplete after database analysis. For other snoRNAs, nucleotides in the 5′ terminal sequence derived from databases are depicted as lower case letters. For rRNA or snRNA, sequence coordinates correspond to the respective human sequences to facilitate interpretation of data. Positions of pseudouridines are as reported by Maden (1990) for 18S rRNA, by Ofengand and Bakin (1997) for 28S rRNA and by Massenet et al. (1998) for mammalian snRNAs. Pseudouridines predicted to be directed by the snoRNA are denoted by Ψ, while other known pseudouridylation sites are indicated by U. In three cases, the uridine (indicated by an arrow) at the expected target position in the canonical, bipartite guide RNA duplex has not been reported to be pseudouridylated.


Nineteen entirely new specimens (Table II, group V) can collectively direct 27 of the 93–95 pseudouridylations identified in mammalian rRNAs. All but three of them display an identical location in human and mouse rRNAs (Figure 2A). As expected, the mouse homologues of seven already known human H/ACA snoRNAs show perfect sequence conservation of the previously proposed antisense elements (Ganot et al., 1997a), with one exception. It had been proposed that the 3′ pseudouridylation pocket of U69 could target Ψ36 in human 18S rRNA (Ganot et al., 1997a). This proposal is not phylogenetically supported: the mouse homologue, MBI-134, differs from the human sequence at three nucleotides within the presumptive bipartite 3′ antisense element of U69.

A large fraction of the novel rRNA pseudouridylation guides can each target two distinct modifications, through appropriate antisense elements in both pseudouridylation pockets. One of them, MBI-89, can even direct three distinct modifications: its 5′ pseudouridylation pocket contains an antisense element matching two distinct target sites, one in 18S rRNA and one in 28S rRNA. The novel mouse snoRNAs target 27 previously reported rRNA pseudouridines (Ofengand and Bakin, 1997). Intriguingly, they also appear to target three rRNA uridines that are not known to be pseudouridylated (Figure 2A). These uridines are bona fide targets for two of the novel H/ACA specimens: MBI-39 (through its 5′ pseudouridylation pocket) and MBI-164 (through both pseudouridylation pockets). In this regard, it is noteworthy that several rRNA 2′-O-ribose methylations were not identified until the detection of a cognate guide snoRNA prompted further scrutiny (Qu et al., 1999).

The set of 34 entirely novel H/ACA snoRNAs identified in mouse also includes four outstanding specimens able to target pseudouridylation onto snRNA instead of rRNA (Table II, group VI). MBI-57, MBI-100, MBI-114 and MBI-125 have the potential to target U2 or U6 snRNAs, through the formation of guide duplexes that appear perfectly canonical as compared with those matching rRNA targets (Figure 2B). Interestingly, the sequence of a likely homologue of MBI-57, which can direct both Ψ34 and Ψ44 in U2 snRNA, is present in a Xenopus laevis EST (BE507485), which could provide the basis for a direct experimental analysis of the elusive function of these two U2 pseudouridylations.

Novel mouse H/ACA RNAs also include 11 species for which we could not identify any reasonable target uridine in rRNAs or snRNAs (Table II, group VII). Two of these RNAs, MBI-79 and MBI-87, are encoded in introns of ubiquitously expressed genes, like all vertebrate rRNA modification guide snoRNAs characterized so far (see below). Searches involving antisense elements of the 11 novel RNAs were also negative for potential target uridines in other stable non-coding RNAs trafficking through the nucleolus, such as telomerase RNA, RNase P or SRP RNA, the pseudouridine content of which remains unknown. We cannot rule out with certainty that novel snoRNAs might still target these RNA species; however, they could also target other cellular RNAs such as mRNA, as suggested recently in the case of a C/D box snoRNA (Cavaillé et al., 2000). Finally, recent findings that telomerase RNA in vertebrates contains a typical H/ACA domain (Mitchell et al., 1999) and that human H/ACA snoRNPs and telomerase share evolutionarily conserved proteins (Pogacic et al., 2000) expand the structural and functional diversity of H/ACA box snoRNAs, suggesting that some of the novel snoRNAs in this group might have unanticipated functions.

Tissue-specific snoRNAs

We identified six C/D box snoRNAs (MBII-13, MBII-48, MBII-49, MBII-52, MBII-78 and MBII-85; Table III, group IX) and one H/ACA box snoRNA (MBI-36) that are expressed in mouse brain but not in any other tissue tested so far (heart, liver, kidney, testis or muscle). Human homologues of MBI-36, MBII-13, MBII-52 and MBII-85 have been identified (Cavaillé et al., 2000; Filipowicz, 2000). In human, the genes encoding MBII-52 and MBII-85 are present in multi-copy repeats on chromosome 15q11–13 and are located in introns of host genes that apparently cannot encode proteins (Cavaillé et al., 2000). The multi-copy arrangement of these two snoRNAs agrees with their high cellular abundance, as deduced from northern blot analysis; in our screen, these clones were identified 37 times (MBII-52) and 56 times (MBII-85), whereas most other cDNA clones encoding snoRNAs were found only once. The human homologue of MBII-13 maps as a single-copy gene to chromosome 15q11–13, and MBI-36 maps to the large intron 2 of the serotonin 5-HT2C receptor gene, consistent with its brain-specific expression pattern, with highest levels in the choroid plexus (Cavaillé et al., 2000).

None of the brain-specific snoRNAs exhibits complementarity to rRNAs or snRNAs within its antisense element(s), consistent with a role other than targeting these RNA species. Remarkably, the antisense element of MBII-52 is complementary to the serotonin 5-HT2C receptor mRNA (the same gene that hosts MBI-36, see above), and MBII-52 has been proposed to regulate editing or alternative splicing of this mRNA (Cavaillé et al., 2000). In addition, the MBII-13, MBII-52 and MBII-85 C/D box snoRNAs might be involved in the aetiology of Prader–Willi syndrome, a neurogenetic disorder, which would make them the first snoRNAs whose absence could cause a human disease (Cavaillé et al., 2000). So far, no potential targets have been identified for the remaining six brain-specific snoRNAs. The availability of the complete mouse and human genome sequences might reveal conserved target sites for these unusual snoRNA species.

Intronic localization of novel modification guide snoRNAs

Sequences encoding a large subset of the novel mouse snoRNAs (or their unambiguous orthologues in another mammalian species) are located within long genomic fragments of mammalian genomes deposited in sequence databases. Whenever exons and introns were annotated, the snoRNA coding region was found within an intron of a mostly ubiquitously expressed gene. This agrees with the pattern observed for vertebrate modification guide snoRNAs (Pelczar and Filipowicz, 1998; Smith and Steitz, 1998; Weinstein and Steitz, 1999; Bachellerie et al., 2000), which are usually processed from the debranched intron lariat by exonucleolytic trimming of excess intronic sequence (Kiss and Filipowicz, 1995). Characterization of our novel mouse snoRNAs greatly expands the repertoire of known host genes. Most of the novel host genes identified in this study encode ribosomal proteins, in agreement with previous observations. They include rpL3, rpL13, rpL23a, rpL37 and rpS12 for C/D box snoRNAs, and rpL23, rpL27a, rpL32-3A, rpP2, rpS12 and rpS16 for H/ACA snoRNAs. In addition, detection of the mouse homologues of human C/D box U58 snoRNA and H/ACA U68 snoRNA allowed us to identify their cognate host genes, rpL17 and rpL18a, respectively. Five novel ubiquitously expressed genes encoding non-ribosomal proteins were also identified as hosts for intronic snoRNAs. One is an uncharacterized 5′ TOP gene hosting the MBII-99 C/D box snoRNA, while the other four contain intron-encoded H/ACA snoRNAs (Tables I and II). Curiously, one of them encodes dyskerin (Heiss et al., 1998), the mammalian homologue of yeast Cbf5, the pseudouridine synthase thought to catalyse the snoRNA-guided isomerization of uridine (Lafontaine et al., 1998). For a more extensive analysis of the novel snoRNAs, see the supplementary data.
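Deciding whether a snoRNA coding region lies in an intron of an annotated host gene reduces to an interval-containment test against the gaps between exons. The sketch below assumes exon coordinates are given as inclusive (start, end) pairs on one genomic strand; the coordinates in the example are invented, and this is not the annotation pipeline used in the study.

```python
from typing import Optional


def locate_in_intron(exons: list[tuple[int, int]], start: int, end: int,
                     strand: str = "+") -> Optional[int]:
    """Return the transcript-order number (1-based) of the intron that fully
    contains [start, end], or None if the interval overlaps an exon or lies
    outside the gene. Coordinates are inclusive genomic positions."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    ordered = sorted(exons)  # ascending genomic coordinates
    n_introns = len(ordered) - 1
    for i in range(n_introns):
        intron_start = ordered[i][1] + 1
        intron_end = ordered[i + 1][0] - 1
        if intron_start <= start and end <= intron_end:
            # On the minus strand, the highest-coordinate intron is intron 1.
            return i + 1 if strand == "+" else n_introns - i
    return None


# Hypothetical three-exon host gene and a snoRNA placed in its first intron.
host_exons = [(100, 200), (300, 400), (500, 600)]
print(locate_in_intron(host_exons, 220, 280))
```

Strand handling matters because intron numbering follows the transcript, not the genomic coordinate axis.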

Novel RNAs that do not exhibit snoRNA motifs

Of the 201 ERNS, 88 could not be assigned to known classes of snmRNAs, and no function can be attributed to them at this point. However, hallmarks might exist at the level of secondary structure, as is the case for H/ACA box RNAs. Indeed, some of these RNA sequences can be folded into highly stable stem–loop structures. Since we are currently analysing cDNA libraries of small RNAs from other organisms, including Caenorhabditis elegans, Drosophila melanogaster and Arabidopsis thaliana, interspecies comparisons of the novel sequences might reveal conserved structural or sequence motifs and provide hints as to the cellular function of these RNA species.
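As a rough illustration of how a sequence can be screened for stem–loop potential, the sketch below implements Nussinov base-pair maximization, the simplest dynamic-programming folding scheme. It is a deliberately crude stand-in: it counts pairs and ignores stacking energies, so it cannot judge how stable a stem–loop is, which requires thermodynamic folding programs. The minimum hairpin loop of 3 nt and the toy sequences are assumptions.

```python
def nussinov_max_pairs(seq: str, min_loop: int = 3, allow_wobble: bool = True) -> int:
    """Maximum number of nested base pairs in `seq` (Nussinov algorithm).

    dp[i][j] holds the best pair count for seq[i..j]; a pair (i, k) must
    enclose at least `min_loop` unpaired nucleotides."""
    seq = seq.upper().replace("T", "U")
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    if allow_wobble:
        pairs |= {("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span so every sub-interval is ready when needed.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case: nucleotide i left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in pairs:
                    inside = dp[i + 1][k - 1]
                    outside = dp[k + 1][j] if k + 1 <= j else 0
                    best = max(best, inside + 1 + outside)
            dp[i][j] = best
    return dp[0][n - 1]


# A toy hairpin: a 4-bp G-C stem closing an AAAA loop.
print(nussinov_max_pairs("GGGGAAAACCCC"))
```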

Novel snmRNAs located within mRNA coding regions

Of the 88 novel ERNS of the non-snoRNA type, 26 map within known or predicted mRNA or heterogeneous nuclear RNA (hnRNA) sequences (Table IV, groups X and XI): 16 lie within the open reading frames of mRNAs, whereas 10 are located within 5′ or 3′ untranslated regions (UTRs). At this point, the function of these RNAs remains elusive. Notably, northern blot analysis confirmed expression as snmRNAs for some, but not all, ERNS of this group. While ERNS derived from coding regions might correspond to more or less stable intermediates in the degradation of hnRNAs or mRNAs, snmRNAs derived from the 5′ or 3′ UTRs of mRNAs could have regulatory functions. Such mRNA regions have previously been shown to act in cis in the control of mRNA stability and intracellular localization (Schuldt et al., 1998; Saunders et al., 1999).
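Assigning an ERNS to the open reading frame or to a UTR amounts to comparing its position on the mRNA with the coding sequence (CDS) boundaries. The minimal sketch below assumes 1-based inclusive mRNA coordinates with the stop codon counted as part of the CDS. The text does not say how fragments straddling a UTR/CDS boundary were counted, so this sketch reports them as a separate, hypothetical "boundary" category.

```python
def classify_region(cds_start: int, cds_end: int, start: int, end: int) -> str:
    """Classify an RNA fragment [start, end] relative to the CDS
    [cds_start, cds_end]; all coordinates are 1-based, inclusive, on the mRNA."""
    if start > end or cds_start > cds_end:
        raise ValueError("intervals must satisfy start <= end")
    if end < cds_start:
        return "5' UTR"
    if start > cds_end:
        return "3' UTR"
    if cds_start <= start and end <= cds_end:
        return "CDS"
    return "boundary"  # overlaps both a UTR and the CDS


# Hypothetical mRNA with its CDS at positions 101-400.
for fragment in [(20, 80), (150, 220), (420, 480), (90, 130)]:
    print(fragment, classify_region(101, 400, *fragment))
```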

Novel snmRNAs resembling repetitive elements

Clones MBI-2, MBI-56, MBI-160, MBII-133 and MBII-373 contain sequences derived from short interspersed repetitive elements (SINEs; Table V, group XII). What these clones have in common is sequence similarity to nuclear 4.5SH RNA, 4.5SI RNA or the related B1, B2 or B4 retronuons, which in turn are related to an ancestral SRP RNA or tRNA (Jurka, 2000). The small RNAs that served as templates for these cDNAs may be related to 4.5SH RNA or 4.5SI RNA, or may be directly transcribed B-type SINEs. Alternatively, they may be degradation or processing products of larger hnRNAs or mRNAs that harbour such sequences; B1 and B2 SINEs, for example, are often found in 3′ UTRs of mature mRNAs in both orientations (Brosius, 1999). Further work is needed to establish whether the clones in this category represent novel snmRNAs related to 4.5SH RNA, 4.5SI RNA, or B1, B2 or B4 RNAs.
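Because SINEs occur in mRNAs in both orientations, any similarity screen has to test a clone against a repeat reference on both strands. The sketch below uses a simple shared k-mer (containment) score for that purpose. It is an illustrative heuristic only, not the method used here; alignment tools such as BLAST or repeat annotators would be used in practice. The k-mer size of 8 and the toy sequences are assumptions, and no real B1, B2 or B4 consensus is included.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def kmers(seq: str, k: int) -> set[str]:
    """Set of distinct k-mers in seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(query: str, ref: str, k: int = 8) -> float:
    """Fraction of the query's distinct k-mers that also occur in ref."""
    q = kmers(query, k)
    if not q:
        return 0.0
    return len(q & kmers(ref, k)) / len(q)


def best_containment(query: str, ref: str, k: int = 8) -> tuple[float, str]:
    """Score the query on both strands; report the better score and its
    orientation ('+' = same strand as ref, '-' = reverse complement)."""
    fwd = containment(query, ref, k)
    rev = containment(revcomp(query), ref, k)
    return (fwd, "+") if fwd >= rev else (rev, "-")


# Toy reference and a fragment of it, tested in both orientations.
ref = "ACGTTGCAAGGCTTACCGATTGCA"
fragment = ref[3:19]
print(best_containment(fragment, ref), best_containment(revcomp(fragment), ref))
```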

Novel ERNS without known sequence or structural motifs

Of the 88 novel snmRNAs without snoRNA motifs, 57 did not exhibit any sequence or structural motif that would have allowed us to assign them a location within the mouse genome or a specific function. A notable exception is clone MBI-44, which maps to the mitochondrial pro/D loop (Table VI, group XIII). Northern blot analysis confirmed the expression of 20 of these 57 snmRNAs, whereas the expression of the remaining 37 could not be confirmed by this method. However, for three randomly chosen clones from the latter group, we could amplify cDNA fragments of the expected size by RT–PCR, demonstrating their expression. Again, we cannot at this point exclude that some of the cDNA sequences in this class represent degradation products of unknown hnRNAs or mRNAs rather than snmRNAs. Even if this turns out to be the case, these sequences remain useful as ESTs for novel mouse mRNAs. Further analysis of the human and mouse genomes should clarify whether these sequences represent novel snmRNAs, as we expect at least some of them to do.
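The category counts given in the Abstract and in the preceding paragraphs can be cross-checked with simple bookkeeping. The numbers below are taken from the text; the category labels are ours, and the sketch assumes the groups at each level are mutually exclusive, as the reported totals suggest.

```python
# Counts as stated in the text; category labels are our own shorthand.
TOTAL_ERNS = 201
TOP_LEVEL = {"snoRNA (C/D or H/ACA box)": 113, "non-snoRNA": 88}
SNORNA = {"homologues of known human snoRNAs": 30, "entirely novel": 83}
NON_SNORNA = {"within mRNA/hnRNA": 26, "SINE-related": 5, "no known motif": 57}
WITHIN_MRNA = {"open reading frame": 16, "5' or 3' UTR": 10}
NO_MOTIF = {"confirmed by northern blot": 20, "not confirmed by northern blot": 37}


def is_partition(total: int, parts: dict[str, int]) -> bool:
    """True if the sub-category counts add up exactly to the parent count."""
    return sum(parts.values()) == total


checks = {
    "all ERNS": is_partition(TOTAL_ERNS, TOP_LEVEL),
    "snoRNAs": is_partition(TOP_LEVEL["snoRNA (C/D or H/ACA box)"], SNORNA),
    "non-snoRNAs": is_partition(TOP_LEVEL["non-snoRNA"], NON_SNORNA),
    "mRNA/hnRNA-derived": is_partition(NON_SNORNA["within mRNA/hnRNA"], WITHIN_MRNA),
    "no known motif": is_partition(NON_SNORNA["no known motif"], NO_MOTIF),
}
for name, ok in checks.items():
    print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```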


This study represents a first unbiased look at the population of snmRNA species in a mammalian cell and provides a basis for a more comprehensive understanding of genomic, cellular and organismal function. With our experimental approach, we identified a large set of novel C/D box and H/ACA box snoRNAs that guide ribose methylation or pseudouridylation not only in rRNA, as expected, but also in snRNAs. For the first time, we report guide snoRNAs directing ribose methylations in U2 and U4 snRNAs, as well as snoRNA guides for pseudouridylations in U2 and U6 snRNAs. In addition, we identified a surprisingly large number of snoRNAs of both classes that, judging from their lack of appropriate complementarity, cannot target rRNAs or snRNAs. Especially intriguing was the identification of several brain-specific snmRNAs, all of which are snoRNAs. This may prompt searches for snoRNAs expressed specifically in tissues other than brain. One of the brain-specific snoRNAs (MBII-52) has been suggested to target the serotonin receptor 5-HT2C hnRNA or mRNA, which in turn is expressed specifically in brain (Cavaillé et al., 2000). This may point to a novel function of snoRNAs, namely the regulation of gene expression by binding to and/or modifying mRNAs or their hnRNA precursors via their antisense elements. At this stage, it is difficult to speculate about the function of potential snmRNAs of the non-snoRNA type. As shown above, some of these novel species are derived from hnRNAs or mRNAs and might therefore correspond to degradation products of larger transcripts. Alternatively, they could regulate the expression of mRNAs by as yet unknown mechanisms, especially when located within 5′ or 3′ UTRs. Their detection sets the stage for direct experimental tests of these hypotheses.

Materials and methods


Identification of novel RNA species

We prepared total RNA from mouse brain by the TRIzol method (Gibco-BRL). Total RNA was then fractionated on a denaturing 8% polyacrylamide gel (7 M urea, 1× TBE buffer). RNAs in the size ranges ∼50 to ∼110 nt (Fraction II) and ∼110 to ∼500 nt (Fraction I) were excised from the gel, passively eluted and ethanol precipitated. Subsequently, 5 μg of RNA were tailed with CTP using poly(A) polymerase, as described by DeChiara and Brosius (1987). RNAs were reverse transcribed into cDNAs using primer GIBCO1 (see supplementary data) and cloned into the pSPORT 1 vector using the GIBCO Superscript™ system (Gibco-BRL). cDNAs were amplified by PCR using primers FSP and RSP (see supplementary data). PCR products were robotically spotted at high density onto filters by the method of Schmitt et al. (1999) at the Resource Center of the German Human Genome Project (Berlin, Germany).

Filter hybridization and isolation of clones

To exclude the most abundant known small RNA species, we end-labelled oligonucleotides derived from their sequences (see supplementary data) with [γ-33P]ATP and T4 polynucleotide kinase, and hybridized them to the DNA arrays spotted on filters (see above). We performed hybridization in 0.5 M sodium phosphate pH 7.2, 7% SDS, 1 mM EDTA at 53°C for 12 h. We washed the filters twice at room temperature for 15 min in 40 mM sodium phosphate buffer pH 7.2, 0.1% SDS, exposed them to a phosphorimager screen and quantified hybridization signals by computer-aided analysis (Maier et al., 1994).
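The purpose of the oligonucleotide hybridization is to flag clones of abundant, known small RNAs so that only the remaining clones are analysed further. A minimal sketch of that selection step follows, assuming a hypothetical rule: a spot counts as hybridized when its signal reaches a fixed multiple of the median spot signal, which serves as a crude background estimate on the assumption that most spotted clones do not hybridize. The actual criteria of the image-analysis software cited above are not reproduced here, and the clone IDs and signal values are invented.

```python
from statistics import median


def select_unhybridized(signals: dict[str, float], factor: float = 3.0) -> list[str]:
    """Return IDs of clones whose signal stays below `factor` x median signal.

    Clones at or above the threshold are treated as known abundant RNAs
    (hypothetical rule) and excluded from further analysis."""
    if not signals:
        return []
    threshold = factor * median(signals.values())
    return sorted(clone for clone, s in signals.items() if s < threshold)


# Invented signal intensities for five spotted clones; c4 hybridizes strongly.
spots = {"c1": 100.0, "c2": 110.0, "c3": 95.0, "c4": 5000.0, "c5": 105.0}
print(select_unhybridized(spots))
```

Using the median rather than the mean keeps the background estimate from being inflated by a few very strong spots.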

Accession numbers of sequences

The sequence data have been submitted to the DDBJ/EMBL/GenBank databases under the accession numbers AF357317–AF357517.
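As a quick consistency check, the accession range AF357317–AF357517 contains exactly 201 consecutive numbers, one per ERNS reported in this study. The sketch below expands such a range; it assumes accessions consist of a letter prefix followed by a fixed-width number, and it makes no claim about which accession corresponds to which clone.

```python
import re


def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an accession range such as AF357317-AF357517 into a list.

    Assumes a letter prefix followed by a fixed-width number; both ends
    must share the prefix and the number width."""
    pattern = re.compile(r"([A-Z]+)(\d+)")
    m1, m2 = pattern.fullmatch(first), pattern.fullmatch(last)
    if not (m1 and m2) or m1.group(1) != m2.group(1) or len(m1.group(2)) != len(m2.group(2)):
        raise ValueError("incompatible accession range")
    prefix, width = m1.group(1), len(m1.group(2))
    lo, hi = int(m1.group(2)), int(m2.group(2))
    if lo > hi:
        raise ValueError("range end precedes range start")
    return [f"{prefix}{n:0{width}d}" for n in range(lo, hi + 1)]


accessions = expand_accession_range("AF357317", "AF357517")
print(len(accessions))
```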

Supplementary data

For additional methods see supplementary data available at The EMBO Journal Online. These data will also be available and periodically updated at our web page:


Acknowledgements

We would like to thank Dr Stefan Hennig for his support with computer analysis of data and Christine Mersmann for technical assistance during the initial phase of the project. This work was supported by the German Human Genome Project through the BMBF (#01KW9616 and #01KW9966) to J.B. and A.H., by an IZKF grant (Teilprojekt F3, Münster) to A.H. and a grant from the Association pour la Recherche sur le Cancer and laboratory funds from the Centre National de la Recherche Scientifique and Université Paul-Sabatier, Toulouse, to J.-P.B.



Supporting Information


Supplementary data

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.