Analysis of microRNA-size, small RNAs in Streptococcus mutans by deep sequencing


Correspondence: Heon-Jin Lee, Department of Oral Microbiology, School of Dentistry, Kyungpook National University, 2-188-1 Samduk-dong, Jung-gu, Daegu 700-412, South Korea. Tel.: +82 53 660 6832; fax: +82 53 425 6025; e-mail:


MicroRNAs (miRNAs) are important modulators of gene expression in eukaryotic cells. However, RNAs of the same size in bacteria have not been specifically discussed previously. Here, we provide a library of miRNA-size RNAs (msRNAs), which were registered by deep sequencing in Streptococcus mutans. Bioinformatic analysis of the whole set revealed more than 900 individual msRNA species. The cellular content of selected msRNAs was verified by quantitative RT-PCR and Northern blotting. The high abundance and discrete size of a subset of the registered msRNAs suggest their functional significance, although the precise biological role of the RNA species revealed in S. mutans, which is one of the principal causative agents of dental caries, remains to be elucidated.


MicroRNAs (miRNAs) are small noncoding RNAs that are c. 22 nt long. They are found in various species of plants, animals and viruses (Bushati & Cohen, 2007), and normally act as regulators in every major cellular event through inhibitory mechanisms (He & Hannon, 2004). It has long been known that bacteria contain noncoding small RNAs (sRNAs), distinct from miRNAs, that have regulatory functions (Gottesman, 2005; Waters & Storz, 2009). sRNAs are usually between 50 and 200 nt in length and have been predicted by computational searches in a variety of bacterial species (Livny & Waldor, 2007). Like miRNAs, sRNAs usually act as post-transcriptional regulators by interacting with the target mRNAs through a variety of mechanisms, including changes in RNA conformation and modulation of the stability of the specific targets (Waters & Storz, 2009). Smaller RNAs that are similar in size to miRNAs are not well understood in bacteria, although many of them may be found among sequence reads registered in the transcriptome of Escherichia coli (Dornenburg et al., 2010). Streptococcus mutans is the major causative agent of human dental caries and is considered to be the most cariogenic of all of the oral streptococci (Ajdić et al., 2002). The disease occurs when oral biofilms are perturbed by ecologically driven changes, and S. mutans is mainly responsible for the formation of these biofilms (Burne, 1998). The genome of S. mutans has been fully sequenced and contains c. 2 Mb and 1963 ORFs (Ajdić et al., 2002).

This study examined the existence of small (c. 26 nt) RNAs in S. mutans, which we subsequently isolated. For this purpose, a deep sequencing (next-generation sequencing) approach was used and more than 19 million sRNA clones were read. To differentiate these very small RNAs from the bacterial sRNAs (50–250 nt) and the well-studied eukaryotic miRNAs, we suggest the term ‘miRNA-size, small RNA’ (msRNA). Their origin and putative functional significance are discussed.

Materials and methods

Total RNA extraction

Streptococcus mutans (ATCC 25175) was inoculated into brain heart infusion broth (three independent cultures). The cultures were pooled and total RNA was extracted using the miRNeasy Mini kit (Qiagen, CA) according to the manufacturer's protocol.

sRNA cloning

RNA was processed and used for deep sequencing by LC Sciences (Houston, TX). An sRNA library was generated from the S. mutans RNA according to Illumina's sample preparation instructions for the Illumina Genome Analyzer IIx (Illumina Inc., San Diego, CA). The following gives a brief summary of the procedures performed.

The total RNA sample was size-fractionated on a 15% Tris–borate–EDTA (TBE)–urea polyacrylamide gel. The RNA fragments of c. 15–50 nt in length were eluted and ethanol-precipitated. The SRA 5′-adapter (Illumina) was ligated to the aforementioned RNA fragments with T4 RNA ligase (Promega, Madison, WI) in the ATP-free ligation buffer to decrease the proportion of degraded forms of RNAs. The ligated RNAs were size-fractionated on a 15% TBE–urea polyacrylamide gel and the RNA fragments of c. 41–76 nt in length were isolated. The SRA 3′-adapter (Illumina) ligation was then performed followed by a second size-fractionation using the same gel conditions as described above. The RNA fragments of c. 64–99 nt in length were isolated through gel elution and ethanol precipitation.
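The successive size windows quoted above can be checked with simple arithmetic. The following minimal sketch assumes that each window shift corresponds to the length of the ligated adapter; the adapter lengths are inferred here from the approximate window boundaries and are not stated in the protocol.

```python
# Size windows (nt) quoted in the protocol above; all values are approximate ("c.").
insert_window = (15, 50)     # gel-selected RNA fragments
after_5p_window = (41, 76)   # after 5'-adapter ligation
after_3p_window = (64, 99)   # after 3'-adapter ligation


def implied_shift(before, after):
    """Return the length added between two windows, or None if the
    lower and upper bounds shift by different amounts."""
    low = after[0] - before[0]
    high = after[1] - before[1]
    return low if low == high else None


# Inferred (not reported) adapter lengths implied by the window shifts.
five_prime_len = implied_shift(insert_window, after_5p_window)
three_prime_len = implied_shift(after_5p_window, after_3p_window)
print(five_prime_len, three_prime_len)
```

Both windows shift by a constant amount at each step, which is what would be expected if every fragment in the window received one adapter of fixed length.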

The ligated RNA fragments were reverse transcribed to single-stranded cDNAs using M-MLV reverse transcriptase (Invitrogen, Carlsbad, CA) with RT primers recommended by Illumina. The cDNAs were amplified with pfx DNA polymerase (Invitrogen) in 20 cycles of PCR using Illumina's sRNA primers set.

PCR products were run on a 12% TBE polyacrylamide gel and a slice of gel containing fragments of c. 80–115 bp in length was excised. This fraction was eluted and the recovered cDNAs were precipitated and quantified using a Nanodrop (Thermo Scientific, Rockford, IL) and a TBS-380 mini-fluorometer (Turner Biosystems, Sunnyvale, CA) with PicoGreen dsDNA quantitation reagent (Invitrogen). The sample concentration was adjusted to c. 10 nM and a total of 10 μL was used in the sequencing reaction.

The purified cDNA library was used for cluster generation on Illumina's Cluster Station and then sequenced on Illumina Genome Analyzer IIx following the supplier's instructions for running the instrument.

Bioinformatic analysis of the sequence data

Raw sequences were processed using Illumina's Pipeline software and then subjected to a series of data filtration steps using the ACGT101-miR software package (V3.5; LC Sciences). The reference databases of S. mutans UA159 and Rfam were used for msRNA mapping. Hairpin RNA structures were predicted from the adjacent 60–80 nt sequences in either direction using mfold software (Zuker, 2003).
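As an illustration of how candidate precursor sequences might be extracted before folding, the sketch below pulls the read together with its upstream or downstream flank from a genome string. The function name, the 0-based half-open coordinates and the default flank of 70 nt are assumptions made for this example; the folding itself (mfold in this study) is not reproduced here.

```python
def flanking_candidates(genome, start, end, flank=70):
    """Return two candidate precursors for a read at genome[start:end]:
    the read with its upstream flank and the read with its downstream flank.
    Coordinates are 0-based and half-open (an assumption of this sketch)."""
    left = max(0, start - flank)            # clip at the start of the genome
    right = min(len(genome), end + flank)   # clip at the end of the genome
    upstream = genome[left:end]
    downstream = genome[start:right]
    return upstream, downstream


# Toy genome: the read is embedded between two artificial flanks.
read = "TGGGACGCAAGGGAACACAC"
toy_genome = "A" * 100 + read + "C" * 100
upstream, downstream = flanking_candidates(toy_genome, 100, 120, flank=70)
print(len(upstream), len(downstream))
```

Each candidate would then be passed to an RNA-folding program, and the read would be retained only if it falls within one arm of a predicted hairpin.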

Real-time PCR

Real-time quantitative RT-PCR (qRT-PCR) was performed to verify the presence of several selected candidates within the fraction of purified cellular RNAs. The total RNA (50 ng) was reverse transcribed using a TaqMan microRNA Reverse Transcription kit. From the 15 μL of RT mixture, 2 μL was used for real-time PCR. qRT-PCR was performed with TaqMan universal master mix (Applied Biosystems, Foster City, CA). Seventeen msRNAs were selected and specific primer sets and TaqMan probes were designed by Applied Biosystems. Ten out of 17 custom-designed TaqMan probes and primer sets failed, which may be due to the small size or structure of verified RNA species. PCR was carried out in 96-well plates using the 7500 Real-Time PCR system (Applied Biosystems). The expression of each msRNA gene was determined from three replicates in a single qRT-PCR experiment.
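A minimal sketch of how triplicate Ct values could be summarized is given below. The msRNA names and Ct values are hypothetical placeholders, not data from this study; the only property relied upon is that a lower Ct indicates higher abundance.

```python
from statistics import mean, stdev

# Hypothetical triplicate Ct values; not data from this study.
ct_replicates = {
    "msRNA-A": [24.0, 24.5, 25.0],
    "msRNA-B": [30.1, 30.4, 30.7],
}


def summarize_ct(replicates):
    """Return {name: (mean Ct, sample standard deviation)} per msRNA."""
    return {name: (mean(values), stdev(values))
            for name, values in replicates.items()}


summary = summarize_ct(ct_replicates)
# Sort from lowest mean Ct (most abundant) to highest (least abundant).
ranked = sorted(summary, key=lambda name: summary[name][0])
print(ranked)
```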

Northern blotting

The total RNA (20 μg) was separated on a 15% urea-acrylamide gel and blotted onto nylon N+ membrane (Invitrogen). A duplicate gel was stained with SYBR Gold nucleic acid gel stain (Invitrogen, see Supporting Information, Fig. S1). After UV-cross-linking, the membrane was prehybridized in PerfectHyb plus hybridization buffer (Sigma, St Louis, MO) at 65 °C. A biotin-labeled antisense oligonucleotide (5′-GTGTGTTCCCTTGCGTCCCA-3′) probe was then added directly to the prehybridization buffer and incubated overnight at 37 °C. After hybridization, the membrane was washed twice with 0.1× SSC/0.1% SDS at room temperature. The signals were detected using the chemiluminescent nucleic acid detection module (Thermo Scientific) according to the manufacturer's protocol.
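The antisense design of the probe can be checked against the msRNA-428 sequence listed in Table 2. The sketch below is a simple consistency check written for this purpose, with both sequences in the DNA alphabet as they appear in the text.

```python
# Translation table for the DNA alphabet used in the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


msrna_428 = "TGGGACGCAAGGGAACACAC"   # msRNA-428, from Table 2
probe = "GTGTGTTCCCTTGCGTCCCA"       # Northern blot probe, 5'->3'

probe_is_antisense = reverse_complement(msrna_428) == probe
print(probe_is_antisense, len(msrna_428))
```

The probe is the exact reverse complement of the 20 nt msRNA-428 sequence, so it is expected to hybridize along the full length of the msRNA.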


Small-size cDNA libraries of S. mutans were analysed by deep sequencing, which gave 19 million sequence reads. The sequences composed of 15–26 nt were extracted as valid sRNAs and were compared with various RNA databases (NCBI and Rfam). The length distribution of all sRNAs (mappable reads) is shown in Fig. 1. sRNAs and their extended sequences (flanking sequences) were analysed for hairpin structure prediction and classification. Of these sequenced sRNAs, 17.6% (3 372 405 reads) and 6.5% (1 239 481 reads) were mapped to ribosomal RNAs (and others) and mRNAs, respectively (Table 1). The remaining reads did not match any reference RNA database in BLAST searches and may therefore represent a fraction of novel RNAs. sRNAs were considered putative msRNAs if they were able to form hairpins with flanking nucleotide sequences in the genome.
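The length filter and the collapsing of reads into unique sequences with clone counts can be sketched as follows. This is a toy illustration, not the ACGT101-miR pipeline used in the study; the function name and the example reads are assumptions, and only the 15–26 nt window is taken from the text.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 15, 26  # length window for valid sRNAs, as stated in the text


def collapse_reads(reads):
    """Keep reads of 15-26 nt and count how often each unique sequence occurs."""
    counts = Counter()
    for read in reads:
        seq = read.strip().upper()
        if MIN_LEN <= len(seq) <= MAX_LEN:
            counts[seq] += 1
    return counts


# Toy reads for illustration only.
toy_reads = [
    "TGGGACGCAAGGGAACACAC",   # 20 nt, kept
    "TGGGACGCAAGGGAACACAC",   # duplicate, raises the clone count
    "TAAGCTGTTAGATTTAGG",     # 18 nt, kept
    "ACGTACGT",               # 8 nt, too short
    "A" * 30,                 # 30 nt, too long
]
counts = collapse_reads(toy_reads)
print(counts.most_common())
```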

Figure 1.

Size distribution of sequenced msRNAs. The nucleotide (nt) lengths of cloned msRNAs are shown on the x-axis; the total number of reads obtained by deep sequencing is shown on the y-axis. Reads with length > 26 nt were not chosen for mapping.

Table 1. A summary of standard data analysis

| Category | Number of reads (a) | Percentage of mappable reads (b) | Number of unique msRNAs (c) |
| --- | --- | --- | --- |
| Raw | 19 153 470 | | |
| Mapped to mRNA | 1 239 481 | 6.47 | |
| Mapped to other RNAs (rRNA, tRNA and other RNA species) | 3 372 405 | 17.6 | |
| Total mapped RNAs of miRNA size (15–26 nt) | 4 006 218 | 20.9 | |
| msRNAs mapped to the genome within hairpins | 53 408 | | 922 |

a The total number of raw sequencing reads.
b Percentages of sequencing reads that were blasted to a reference database.
c The total number of unique sequences.
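The percentages in Table 1 can be reproduced from the read counts. The sketch below assumes that each percentage is expressed relative to the total number of raw reads, which is consistent with the reported values.

```python
raw_reads = 19_153_470

# Read counts taken from Table 1.
mapped_reads = {
    "mRNA": 1_239_481,
    "other RNAs": 3_372_405,
    "miRNA size (15-26 nt)": 4_006_218,
}


def percent_of_raw(n_reads, total=raw_reads):
    """Express a read count as a percentage of the raw reads."""
    return 100.0 * n_reads / total


percentages = {name: percent_of_raw(n) for name, n in mapped_reads.items()}
for name, pct in percentages.items():
    print(f"{name}: {pct:.2f}%")
```

The mRNA value is given as 6.47 in the table and rounded to 6.5% in the text.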

msRNAs with more than 100 clone counts are detailed in Table 2. Seven selected msRNAs were verified by qRT-PCR using specific TaqMan probe and primer sets (Fig. 2). This analysis revealed a rough correlation between the read counts of msRNAs identified by deep sequencing and their cellular content. Six of the seven tested candidates may form complementary duplexes with other msRNAs registered in this study (Fig. 2b). In animals, during typical miRNA biogenesis, one strand of an RNA duplex is preferentially selected for combining with a silencing complex, whereas the other one, known as the miRNA* strand, is inactivated or degraded (O'Toole et al., 2006). However, some miRNA* sequences were reported as guide miRNAs with abundant expression (Okamura et al., 2008; Jagadeeswaran et al., 2010). Although putative msRNA* sequences were revealed for certain msRNAs (Fig. 2b and Table 2), we were unable to verify msRNA* expression by qRT-PCR because the software failed to design specific TaqMan probe and primer sets, which may be due to their RNA structure or small size (Table 2).

Figure 2.

Validation of the deep sequencing data by qRT-PCR and Northern blot analyses. (a) Cellular abundance of seven msRNAs as estimated by qRT-PCR. The lower Ct values indicate higher msRNA expression (error bars indicate standard deviations). (b) Predicted secondary RNA structures formed by validated msRNAs (highlighted) and putative msRNA*s, registered by deep sequencing. (c) Northern blot analysis of RNA extracted from Streptococcus mutans using an msRNA-428-specific probe.

Table 2. List of msRNAs with high clone counts (see Table S1 for full listing)

| msRNA index | msRNA sequence | Copy number (a) | Validation | msRNA* (b) |
| --- | --- | --- | --- | --- |
| 213 | TAAGCTGTTAGATTTAGG | 1512 | TaqMan probe design failed | |
| 428 | TGGGACGCAAGGGAACACAC | 830 | qPCR and Northern blot | |
| 923 | AAGTGGTATCTGGATT | 519 | TaqMan probe design failed | |
| 462 | TGGGACGCAAGGGAACACACTGTGCT | 419 | TaqMan probe design failed | |
| 211 | TAAGGGCTGCATTAAC | 136 | TaqMan probe design failed | 861 |
| 829 | TAGTTATTTGTCGCATT | 100 | TaqMan probe design failed | 779 |

a Total copy number of msRNAs.
b Opposite msRNA strand of the duplex on the same hairpin loop sequence.

Although the validated msRNA-428 can also form a short hairpin structure with its extended sequence, the corresponding msRNA* was not found among the registered reads. msRNA-428 is encoded by the genomic region located in front of 16S rRNA genes (one or two mismatches with S. mutans UA159 genomic DNA). The cellular form of msRNA-428 was tested by Northern blotting (Fig. 2c), which revealed a single band of the expected size (20 nt). Smeared hybridization with longer products may reflect nonspecific interactions with multiple different RNAs or specific interaction with a highly processed/degraded precursor. The latter case assumes that msRNA-428 may be produced in the course of degradation; however, its discrete size and cellular abundance argue for controlled processing and putative independent functioning.

A complete data list, including ID, representative clone sequence, location in the 5′- and 3′-strand duplex of each msRNA hairpin loop, clone count, extended sequence and hairpin formation, is presented in Table S1, which can be viewed online.


Deep sequencing (next-generation sequencing) has given new opportunities to identify and quantify miRNAs (or sRNAs). With this technique, we analysed small-size, noncoding RNAs in an oral pathogen. By sequencing cDNA libraries prepared from size-fractionated S. mutans RNA, we identified more than 900 possible msRNAs.

Despite intensive studies of miRNAs in eukaryotic cells and viruses, the functions of sRNAs in bacteria remain largely uncharacterized except in E. coli. The c. 22 nt miRNAs employ well-established mechanisms to repress the mRNAs by short seed pairing (animal) or intensive pairing (plant) within the 3′ untranslated region (Bartel, 2009). In bacteria, sRNAs are often bound to the RNA chaperone protein Hfq, which stabilizes their folding (Gottesman, 2004). Bacterial sRNAs form complementary duplexes with their target RNAs most frequently at the 5′ end of the message, which is not usually the case for eukaryotic miRNAs (Gottesman, 2005). However, our knowledge of the functions of sRNAs has recently been extended by demonstrations that sRNAs can target not only the 5′ ends but also the 3′ ends, the internal part of RNAs, some combinations within the transcripts, and even proteins (see the reviews by Gottesman, 2005; Vogel & Wagner, 2007; Thomason & Storz, 2010). The functions of miRNAs have also been extended by the finding of miRNAs that bind to the promoter regions of DNA (Li et al., 2006; Schnall-Levin et al., 2010). Applying the uniform identification system used for miRNAs – a precursor structure that contains the c. 22 nt miRNA sequence within one arm of the hairpin (Ambros et al., 2003) – we show, using RNA-folding software, that the sequences surrounding msRNAs can form this fold-back structure and also code for miRNA*-like msRNA* sequences (see Fig. 2b for an example of the msRNA structure).

Although deep sequencing and Northern blot data show the existence of a family of msRNAs, the possibility that many of them originated from randomly degraded larger forms of RNAs cannot be excluded. However, a single, clear, unsmeared band of msRNA-428 revealed by the Northern blot (Fig. 2c) suggests that at least some of them may be specifically processed from the longer RNAs rather than produced in the course of random degradation. In this case, msRNAs may have functional activity in bacteria. The Dicer enzyme is critical for processing mature miRNAs from hairpin precursors in eukaryotic cells (Bartel, 2004). The same function may be carried out by certain RNA-restriction enzymes, such as MazF found in E. coli (Zhang et al., 2005). In this case, an msRNA-mediated bacterial model of gene expression regulation may be useful for understanding the evolution of miRNAs.

Recently, the secretory mechanisms of miRNAs (Zhang et al., 2010) and salivary miRNAs (Park et al., 2009) have been reported. Currently, it is not clear whether saliva, in addition to secreted miRNAs, contains msRNAs originating from oral bacteria, or whether interspecies actions of sRNAs on host gene expression are possible.

Although the functional significance of the revealed msRNAs remains to be elucidated, their identification highlights particular genomic regions that encode either sRNAs or their targets. Further studies of these msRNAs in S. mutans could lead to novel therapeutic strategies for dental caries.


We thank Dr Scott Young for helpful discussions and assistance with proofreading. We also thank Ji-Woong Choi for his excellent technical support. This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2010-0029460).