Key Laboratory of Animal Epidemiology and Zoonosis of Ministry of Agriculture, College of Veterinary Medicine, China Agricultural University, Beijing, China
Correspondence: Qingmin Wu, Key Laboratory of Animal Epidemiology and Zoonosis of Ministry of Agriculture, College of Veterinary Medicine, China Agricultural University, Beijing 100193, China. Tel.: +86 10 6273 3901; fax: +86 10 6273 3901;
Bacterial small RNAs (sRNAs) are regulators of gene expression involved in a wide array of biological processes. To identify sRNAs in Brucella abortus, we performed a genome-wide computational prediction that integrated the results of sipht and napp. In total, 129 sRNA candidates were identified, of which 112 were novel. Twenty novel sRNA candidates were tested by RT-PCR, and the expression of seven was verified. The putative targets of these sRNAs were also predicted and tested with a reporter assay. This study provides a resource for future studies of sRNAs and of how they influence B. abortus physiology and pathogenesis.
Small RNAs (sRNAs), which range from 40 to 500 nucleotides in length and are usually located in intergenic regions (Saito et al., 2009), play an important role in regulating bacterial gene expression. By binding to their target mRNAs or proteins, sRNAs are involved in biological processes such as regulation of outer membrane protein expression, iron homeostasis (Masse et al., 2005), bacterial motility, biofilm formation (Mika & Hengge, 2013), quorum sensing (Lenz et al., 2005), bacterial virulence (Toledo-Arana et al., 2007), and stress responses (Hoe et al., 2013). This functional importance motivates efforts to understand how sRNAs participate in other, similar processes.
Identifying sRNAs, however, poses several difficulties. First, sRNAs are heterogeneous in length and secondary structure. Second, because they are not translated, sRNA genes are insensitive to frameshift or nonsense mutations, which makes them difficult to find by genetic screening (Li et al., 2012). Current strategies for identifying bacterial sRNAs therefore often combine bioinformatic prediction with experimental validation (northern blot or reverse transcriptase PCR). For example, a recent study combined three bioinformatic programs (Rfam_scan, sipht, and sRNAscanner) with RT-PCR validation and identified 29 sRNAs in Burkholderia pseudomallei (Khoo et al., 2012).
Brucella abortus encounters harsh environments when interacting with host cells. To adapt to these stresses, B. abortus adjusts its gene expression at the transcriptional level, using classic transcription regulators (Dong et al., 2013), two-component systems (Liu et al., 2012), and ECF σ factors (Delory et al., 2006). However, much less is known about sRNAs and the complex post-transcriptional regulation they mediate. Only two sRNAs (AbcR1 and AbcR2) have so far been identified in B. abortus 2308 (Caswell et al., 2012).
In this study, two sRNA prediction programs (sipht and napp) were used for genome-wide prediction of sRNA candidates in the B. abortus 2308 genome. In total, 129 sRNA candidates ranging from 34 to 434 nt were predicted. To assess the accuracy of the prediction, the expression of 20 randomly chosen candidates was tested, and seven of the 20 were verified. We also predicted and tested the targets of these seven novel sRNAs. This study facilitates further investigation of sRNA function.
Materials and methods
Bacterial strains and culture conditions
Brucella abortus 2308 was routinely grown in tryptic soy broth (TSB) at 37 °C or on tryptic soy agar plates incubated at 37 °C under 5% CO2.
Prediction of sRNA candidates in B. abortus
To predict the sRNA candidates, we integrated the output of two published sRNA detection programs: sipht and napp.
The sRNA identification protocol using high-throughput technologies (sipht) identifies potential intergenic sRNA loci based on the colocalization of intergenic sequence conservation and rho-independent terminators. It also annotates each candidate locus with numerous features that indicate the strength of the prediction and/or its potential biological function (Livny et al., 2008).
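The colocalization idea can be illustrated with a minimal Python sketch. It is not sipht's implementation: it assumes plus-strand, 1-based inclusive coordinates only, takes intergenic regions, conserved blocks, and terminator predictions as plain (start, end) tuples, and uses a hypothetical `max_dist` of 60 nt between the 3′ end of a conserved block and the terminator.

```python
def colocalize(igrs, conserved, terminators, max_dist=60):
    """Return (start, end) spans that pair a conserved block with a
    terminator inside the same intergenic region (IGR).

    Coordinates are 1-based, inclusive, plus strand only (a simplifying
    assumption; sipht itself handles both strands and scores each locus).
    """
    hits = []
    for igr_start, igr_end in igrs:
        for c_start, c_end in conserved:
            # The conserved block must lie entirely inside this IGR.
            if c_start < igr_start or c_end > igr_end:
                continue
            for t_start, t_end in terminators:
                # The terminator must also lie inside the IGR and end at or
                # after the 3' end of the conserved block.
                if t_start < igr_start or t_end > igr_end or t_end < c_end:
                    continue
                # Accept overlap (negative gap) or a short downstream gap.
                if t_start - c_end <= max_dist:
                    hits.append((c_start, t_end))
    return hits
```

For example, a conserved block at 150–220 followed by a terminator at 240–270 inside an IGR spanning 100–400 yields the candidate span (150, 270); a terminator extending past the IGR boundary is ignored.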
Nucleic Acids Phylogenetic Profiling (napp) uses an efficient clustering method to identify sRNA elements in a bacterial genome (Ott et al., 2012). The intergenic regions of a reference genome are tiled into overlapping 50-nt segments, and all tiles and coding sequences are classified based on their occurrence profiles in 1000 other genomes. Tiles corresponding to actual sRNAs tend to cluster together with one another and with similar types of protein-coding genes; such clusters are termed ‘RNA-rich clusters’. Any nonannotated tile in these clusters can be considered a strong sRNA candidate (Marchais et al., 2011; Ott et al., 2012).
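The tiling step, and one very simple way to compare occurrence profiles, can be sketched as follows. The 25-nt step between tiles and the agreement score are assumptions for illustration only; napp's actual tile overlap and clustering procedure are described by Ott et al. (2012) and are not reproduced here.

```python
def tile(start, end, size=50, step=25):
    """Tile the region [start, end] (1-based, inclusive) into overlapping
    windows of `size` nt, advancing by `step` nt (step is an assumption).
    Regions shorter than `size` yield no tiles."""
    tiles = []
    pos = start
    while pos + size - 1 <= end:
        tiles.append((pos, pos + size - 1))
        pos += step
    return tiles


def profile_agreement(p, q):
    """Fraction of genomes in which two 0/1 occurrence profiles agree.
    A deliberately simple similarity, not napp's clustering metric."""
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(a == b for a, b in zip(p, q)) / len(p)
```

In napp the profiles span 1000 genomes; here a tile present in genomes 1, 2, and 4 of four would be written `(1, 0, 1, 1)`.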
sipht (http://newbio.cs.wisc.edu/sRNA/) searches were restricted to sRNA candidates between 30 and 550 nucleotides long. Parameters were modified as described (Livny et al., 2008): the maximum blast E-value, minimum TransTerm confidence value, maximum FindTerm score, and maximum RNAMotif score were set to ‘1e-15’, ‘87’, ‘-10’, and ‘-9’, respectively, and all other parameters were kept at their default values. To offset the inaccuracy of computational prediction, only candidates classified as ‘RNA’ by the qrna analysis within sipht were considered in subsequent analyses.
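The settings and the qrna filter above can be recorded as follows. This is a sketch under stated assumptions: the dictionary keys are illustrative labels, not sipht's own option names, and each candidate is assumed to be a hypothetical dict carrying a `"qrna"` field holding qrna's classification (qrna labels alignments as COD, RNA, or OTH).

```python
# Threshold values from the text; key names are illustrative only and are
# NOT sipht's own parameter names.
SIPHT_THRESHOLDS = {
    "max_blast_evalue": 1e-15,
    "min_transterm_confidence": 87,
    "max_findterm_score": -10,
    "max_rnamotif_score": -9,
}


def keep_qrna_rna(candidates):
    """Keep only candidates whose qrna classification is 'RNA'.
    Candidates lacking a 'qrna' field are dropped."""
    return [c for c in candidates if c.get("qrna") == "RNA"]
```

In this study, the equivalent filtering step retained 263 of the 555 sipht candidates.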
The napp database (http://napp.u-psud.fr/) was used to predict noncoding RNA elements. The genome ‘Brucella_melitensis_biovar_Abortus’ was selected from the list, and default parameters were used.
To identify sRNA candidates conserved within Brucella spp., the candidates were searched against ten other Brucella genomes using the blastn module of sipht.
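Deciding which candidates count as conserved can be sketched as a count of genomes with a qualifying hit. Assumptions: hits are available as hypothetical (candidate_id, genome_id, evalue) tuples, the E-value cutoff of 1e-5 is illustrative, and whether "conserved" requires a hit in all ten genomes or only some of them is left as the `min_genomes` parameter because the text does not specify it.

```python
from collections import defaultdict


def conserved_candidates(hits, genomes, max_evalue=1e-5, min_genomes=None):
    """Return sorted candidate IDs with a qualifying hit in at least
    `min_genomes` of the target genomes (default: all of them).

    hits: iterable of (candidate_id, genome_id, evalue) tuples.
    """
    genomes = set(genomes)
    if min_genomes is None:
        min_genomes = len(genomes)
    found = defaultdict(set)
    for cand, genome, evalue in hits:
        if genome in genomes and evalue <= max_evalue:
            found[cand].add(genome)
    return sorted(c for c, hit_genomes in found.items()
                  if len(hit_genomes) >= min_genomes)
```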
RNA isolation and reverse transcription polymerase chain reaction
Bacteria were grown in TSB at 37 °C to log phase. Total RNA was isolated and reverse-transcribed into cDNA using random primers as previously described (Liu et al., 2012). One microliter of cDNA or sterile water (negative control) was used as the PCR template. Primers were designed according to the sequences of the sRNA candidates (Supporting Information, Table S1). PCR products were analyzed by 1% agarose gel electrophoresis, and bands of the expected sizes were excised and sequenced by the Beijing Genomics Institute (Shenzhen, China). To exclude contamination with residual genomic DNA, a negative control without reverse transcriptase was performed according to the manufacturer's instructions; no PCR product was obtained (data not shown).
Prediction of the target genes for verified sRNAs
Target genes of the verified sRNAs were predicted using sTarPicker (http://ccb.bmi.ac.cn/starpicker/) with default parameters. Only the highest-scoring candidates were considered potential sRNA targets.
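Keeping only the top-scoring prediction per sRNA can be sketched as below. This assumes sTarPicker results have been exported as hypothetical (sRNA, target, score) tuples, which is not sTarPicker's own output format, and that "highest score" means the single best target per sRNA; ties keep the first target seen.

```python
def top_targets(predictions):
    """Map each sRNA to its highest-scoring predicted target.

    predictions: iterable of (srna_id, target_id, score) tuples.
    """
    best = {}
    for srna, target, score in predictions:
        if srna not in best or score > best[srna][1]:
            best[srna] = (target, score)
    return {srna: target for srna, (target, _) in best.items()}
```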
Verification of the target genes regulated by sRNAs and β-galactosidase assays
A previously described Escherichia coli-based reporter system (Caswell et al., 2012) was used, with some modifications, to verify the genes regulated by the sRNAs. Primers used to amplify the sRNAs and the target sequences are listed in Table S1. The sequences encoding the seven sRNAs were amplified by PCR using B. abortus 2308 genomic DNA as the template, digested with BamHI and KpnI, and cloned into the pUT18C plasmid (Table S2).
To construct a lacZ fusion plasmid with lacZ as the reporter gene, the lacZ ORF was excised from the pSV-β-galactosidase vector (Promega) with HindIII and BamHI and ligated into pMR10; the resulting plasmid was designated pMR-lacZ. The potential target regions of BAB1_0451, BAB1_0472, BAB1_1905, BAB1_0854, BAB1_1361, and BAB2_0187 were amplified, digested with KpnI and HindIII, and ligated into pMR-lacZ. Because the putative target sequence of BASRCI408 contained the 5′-UTR and the first 52 nt of the BAB1_2002 ORF, this region was cloned in-frame with lacZ to prevent a frameshift (Fig. S1).
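The in-frame requirement for the BASRCI408 fusion reduces to arithmetic on nucleotide counts: the ORF nucleotides retained, plus any nucleotides between them and the first lacZ codon, must total a multiple of three. The sketch assumes the retained ORF fragment starts at the first nucleotide of the start codon, and `linker_nt` is a hypothetical parameter covering restriction-site and other junction nucleotides. Because 52 mod 3 = 1, a junction of 2 (or 5, 8, ...) nucleotides would keep lacZ in frame.

```python
def fusion_in_frame(orf_nt_included, linker_nt=0):
    """True if the ORF fragment plus the junction keeps lacZ in frame.
    Assumes counting starts at the first nucleotide of the start codon."""
    return (orf_nt_included + linker_nt) % 3 == 0


def linker_padding(orf_nt_included, linker_nt=0):
    """Smallest number of extra junction nucleotides (0, 1 or 2)
    needed to restore the reading frame."""
    return (-(orf_nt_included + linker_nt)) % 3
```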
To determine whether the sRNAs interact with the putative target sequences, E. coli DH5α cells were co-transformed either with a target lacZ fusion plasmid and the empty pUT18C plasmid, or with a target lacZ fusion plasmid and the corresponding pUT18C-sRNA expression plasmid. The E. coli strains were cultured to log phase, and β-galactosidase assays were performed as described by Miller (1972).
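For reference, the commonly cited Miller (1972) formula converts absorbance readings into Miller units: 1000 × (OD420 − 1.75 × OD550) / (t × v × OD600), with reaction time t in minutes and culture volume v in mL. A minimal sketch, with a helper for the mean ± SD summary used for replicate experiments; any example readings are hypothetical.

```python
from statistics import mean, stdev


def miller_units(od420, od550, od600, minutes, volume_ml):
    """Miller units: 1000 * (OD420 - 1.75 * OD550) / (t * v * OD600)."""
    if minutes <= 0 or volume_ml <= 0 or od600 <= 0:
        raise ValueError("time, volume and OD600 must be positive")
    return 1000.0 * (od420 - 1.75 * od550) / (minutes * volume_ml * od600)


def summarize(replicates):
    """Mean and sample standard deviation of replicate activities
    (needs at least two replicates)."""
    return mean(replicates), stdev(replicates)
```

For instance, hypothetical readings of OD420 = 0.5, OD550 = 0, OD600 = 0.5, a 10-min reaction, and 0.1 mL of culture give 1000 Miller units.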
Prediction of the sRNA candidates in B. abortus
A total of 555 sRNA candidates were predicted by sipht: 403 on NC_007618 and 152 on NC_007624. To offset the inaccuracy of computational prediction, only candidates classified as ‘RNA’ by qrna (Rivas & Eddy, 2001) were carried forward. In total, 263 of the 555 candidates were predicted to encode RNAs with a conserved secondary structure, and these were used in the next step of the analysis.
According to the napp prediction, a total of 2548 elements (685 annotated genes and 1863 tiles) were identified in the genome of B. abortus 2308. Only elements annotated as tiles were considered potential sRNA candidates. Of the 1863 tiles, 73 were annotated in the Rfam database (http://www.sanger.ac.uk/resources/databases/rfam.html), four of which were annotated as noncoding RNAs (one 6S, one suhB, and two RsmY). After excluding these 73 tiles, the remaining 1790 tiles were compared with the 263 sipht candidates to identify overlapping loci.
To improve prediction accuracy, the results of the two programs were merged for combined analysis (Fig. 1), and candidates predicted by both programs were considered putative Brucella sRNAs. In total, 129 sRNA candidates (103 on NC_007618 and 26 on NC_007624) were found in B. abortus 2308 (Table S3). They were named according to the convention BASRC (B. abortus sRNA candidate), followed by I or II (chromosome number) and the candidate number, for example, BASRCI371.
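Finding the candidates shared by both programs amounts to an interval-overlap test. The sketch assumes both result sets are available as hypothetical (chromosome, start, end) tuples with 1-based inclusive coordinates, ignores strand, and counts a sipht candidate as shared if it overlaps any napp tile by at least `min_overlap` nucleotides; the 1-nt default is chosen for illustration because the study's exact overlap criterion is not stated.

```python
def overlap_len(a, b):
    """Overlap in nt between two (chrom, start, end) intervals,
    1-based inclusive; 0 if on different chromosomes or disjoint."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)


def shared_candidates(sipht_cands, napp_tiles, min_overlap=1):
    """sipht candidates that overlap at least one napp tile."""
    return [cand for cand in sipht_cands
            if any(overlap_len(cand, t) >= min_overlap for t in napp_tiles)]
```

A pairwise loop is adequate for a few hundred candidates and about 1800 tiles; a sorted sweep or interval tree would scale better to larger inputs.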
Characteristics of the predicted sRNA candidates
The lengths of the 129 sRNA candidates predicted by both sipht and napp ranged from 34 to 434 nt (Fig. 2a), and most candidates (64.3%) were 51–150 nt long. The G + C content of the candidates ranged from 32% to 63%, mostly between 45% and 55% (Fig. 2b). The average G + C content of the sRNA candidates therefore appears lower than that of the B. abortus genome (57.2%). In total, 103 and 26 sRNA candidates were found on chromosomes I and II, respectively.
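The G + C percentage and length-range statistics reported above can be computed as in this sketch; it assumes sequences are plain strings and counts ambiguous bases such as N in the denominator.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(1 for base in seq if base in "GC") / len(seq)


def length_fraction(lengths, lo=51, hi=150):
    """Fraction of candidate lengths within [lo, hi] nt, inclusive."""
    return sum(lo <= n <= hi for n in lengths) / len(lengths)
```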
To identify candidates that duplicate known sRNAs, the 129 sRNA candidates were searched by BLAST against two widely used sRNA databases: the Bacterial Small Regulatory RNA Database (BSRD; http://kwanlab.bio.cuhk.edu.hk/BSRD) and the Rfam database (http://rfam.sanger.ac.uk/). Seventeen candidates were highly homologous to previously identified sRNAs (Table 1), and the remaining 112 were considered potentially novel sRNA candidates.
Table 1. Comparison of predicted sRNA candidates with the BSRD and Rfam databases
cis-encoded antisense RNA
trans-encoded antisense RNA
trans-encoded antisense RNA
trans-encoded antisense RNA
Putative noncoding RNA
To detect putative conserved sRNA candidates in Brucella spp., ten other Brucella genomes representing different biotypes were also searched using the blastn module of sipht. A total of 104 sRNA candidates (82 on chromosome I and 22 on chromosome II) were identified as conserved in the genomes of Brucella spp. (Table S3).
Validation of B. abortus sRNA candidates
To validate the predictions, 20 novel sRNA candidates longer than 120 nt were randomly selected, and their expression was examined by RT-PCR using primers designed from the candidate sequences. To rule out false positives, a no-template negative control was included for each candidate. The expression of seven candidates was confirmed by RT-PCR (Fig. 3). The PCR products were sequenced by the Beijing Genomics Institute and matched the predicted sequences (data not shown). The chromosomal organization of the regions surrounding the verified sRNAs is shown in Fig. 4.
Prediction of target genes for the newly verified sRNAs
To identify the genes regulated by the seven verified sRNAs, we performed an in silico analysis with sTarPicker (http://ccb.bmi.ac.cn/starpicker/). BASRCI414 was predicted to regulate BAB1_0854, which encodes a tetracycline resistance protein with transmembrane transport activity. Two sRNAs (BASRCI385 and BASRCI337) were predicted to regulate genes involved in basic metabolism in B. abortus. The other four sRNAs were predicted to regulate putative uncharacterized proteins without GO annotation (Table 2). Details of the prediction results are listed in Data S1.
Table 2. Verification of the interaction between sRNAs and putative target sequences
Putative target genes
β-galactosidase activity (Miller units)
Data are expressed as means ± standard deviations (SD) of more than three independent experiments.
1521.65 ± 17.10
3.14 ± 0.10
1595.93 ± 49.07
1584.26 ± 48.11
51.12 ± 2.95
2.37 ± 0.05
815.02 ± 55.62
844.06 ± 15.85
9.49 ± 0.21
6.42 ± 0.14
5.14 ± 0.19
2.57 ± 0.09
Verification of the target genes regulated by the sRNAs
To verify the interaction between the sRNAs and their target sequences, the target sequences were cloned in-frame with lacZ into a low-copy plasmid (hereafter the lacZ fusion plasmid), and the sRNA-encoding sequences were cloned into the high-copy plasmid pUT18C. Either a lacZ fusion plasmid plus the empty pUT18C plasmid, or a lacZ fusion plasmid plus the corresponding pUT18C-sRNA expression plasmid, was transformed into E. coli DH5α. As shown in Table 2, for BASRCI408, BASRCI385, BASRCI414, and BASRCI153, the β-galactosidase activity of strains carrying both the sRNA expression plasmid and the corresponding target lacZ fusion plasmid was significantly reduced. In contrast, co-expression of BASRCI27 or BASRCI337 with the corresponding target lacZ fusion plasmid caused no significant difference in β-galactosidase activity. For BASRII26, no β-galactosidase activity was detected in either strain.
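A common way to express such reporter results is fold repression: activity with the empty pUT18C plasmid divided by activity with the sRNA plasmid. The sketch also gives a first-order error-propagation estimate of the ratio's SD, assuming independent errors. Neither calculation is part of the original analysis, and the example values are hypothetical. A strain with no detectable activity, as for BASRII26, has no defined ratio, which the guard reflects.

```python
import math


def fold_repression(control_mean, srna_mean):
    """Reporter activity without the sRNA divided by activity with it."""
    if srna_mean <= 0:
        raise ValueError("activity with the sRNA must be positive")
    return control_mean / srna_mean


def ratio_sd(a, sd_a, b, sd_b):
    """First-order (delta-method) SD of a / b, assuming independent errors."""
    r = a / b
    return abs(r) * math.sqrt((sd_a / a) ** 2 + (sd_b / b) ** 2)
```

For hypothetical means of 1000 and 10 Miller units, `fold_repression(1000.0, 10.0)` gives a 100-fold repression.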
Current bioinformatic methods for bacterial sRNA prediction can in principle be divided into four major categories: (1) comparative genomics, (2) secondary structure and thermodynamic stability, (3) ‘orphan’ transcriptional signals, and (4) ab initio methods independent of sequence or structure similarity (Sridhar & Gunasekaran, 2013). The qrna program, a typical comparative genomics method, was the first systematic method for detecting sRNAs among closely related organisms; it uses intergenic conservation to detect sRNA regions but is restricted to pairwise alignments. sipht searches for ‘orphan’ transcriptional signals with algorithms that integrate the locations of promoters/TFBSs, terminators, and sequence conservation. napp searches for ‘RNA-rich’ clusters in query genomes and is an ab initio method that does not rely on sequence or structure similarity (Sridhar & Gunasekaran, 2013).
According to a quantitative assessment of three sRNA prediction methods (Rfam_scan, sRNAscanner, and sipht), the sensitivity of each method was below 30% and the precision only 10% (Khoo et al., 2012). Because of this low accuracy, many researchers combine several bioinformatic methods to improve prediction accuracy, and many sRNAs have been identified in this way in different bacterial species (Khoo et al., 2012; Tesorero et al., 2013).
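The two metrics quoted above have standard definitions in terms of true positives (TP), false positives (FP), and false negatives (FN). A short sketch, with hypothetical counts in the checks:

```python
def sensitivity(tp, fn):
    """Fraction of true sRNAs that were predicted: TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """Fraction of predictions that are true sRNAs: TP / (TP + FP)."""
    return tp / (tp + fp)
```

By comparison, 7 of the 20 candidates tested by RT-PCR in this study were detected (35%). This is not a precision estimate in the strict sense, because undetected candidates may be true sRNAs that are not expressed under the tested condition.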
In this study, we combined two programs to improve the reliability of our prediction. Only 23.2% (129/555) of the sipht candidates and 7.2% (129/1790) of the napp tiles were retained as sRNA candidates, so the number of false positives is expected to be substantially lower than with either program alone. Of the 129 predicted candidates, 17 (one of which is a riboswitch) had homologs in the BSRD and Rfam databases. Among the 20 randomly selected candidates, seven novel sRNAs were validated by RT-PCR, whereas the remaining 13 could not be detected. These 13 may be false predictions, or true sRNAs that are not transcribed under the tested condition. Overall, our results suggest that combining the two prediction programs provides a reliable outcome.
Integrating the two programs came at the expense of prediction sensitivity. However, this trade-off is acceptable given that only two sRNAs have so far been identified in B. abortus 2308 (Caswell et al., 2012), so even a conservative candidate set substantially expands the known sRNA repertoire of this organism. Of these two sRNAs, only AbcR1 was predicted in our study (BASRCII133), whereas AbcR2 was not among the 129 candidates. This inconsistency suggests that some sRNAs may have been missed and that more effective prediction methods will be needed in the future.
In a recent study, three programs (sRNAPredict, eQRNA, and RNAz) were used to predict sRNA candidates in Streptococcus pyogenes, and 45 sRNA candidates were found when the three sets of predictions were integrated (Tesorero et al., 2013). Because such prediction methods are less expensive than next-generation sequencing (Gomez-Lozano et al., 2012; Miotto et al., 2012), they have become a preferred approach for genome-wide detection of bacterial sRNAs.
When an sRNA binds to the 5′-UTR of a target mRNA, expression of the target gene can be regulated either positively or negatively (Caswell et al., 2012). Binding to the 5′-UTR may relieve secondary structures in the ribosome binding site (RBS) region of the mRNA (Caswell et al., 2012) and allow translation to proceed. Alternatively, it may occlude the RBS (Fröhlich & Vogel, 2009) or decrease the stability of the target mRNA (Aiba, 2007), leading to lower gene expression. In our study, all putative target sequences of the seven sRNAs were located in the 5′-UTR, and four of the seven sRNAs negatively regulated their targets in the reporter assay. Further study is needed to understand the mechanisms by which these sRNAs regulate target gene expression.
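RBS occlusion can be illustrated with a toy base-pairing check. This sketch is not how sTarPicker predicts targets: it only looks for exact Watson-Crick complementarity over a hypothetical 7-nt seed between an sRNA and a 5′-UTR, ignores G·U wobble pairs, bulges, and free energy, and then asks whether any pairing site overlaps a given RBS window (0-based, inclusive).

```python
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def to_rna(seq):
    """Uppercase a sequence and convert T to U."""
    return seq.upper().replace("T", "U")


def revcomp_rna(seq):
    """Reverse complement of a DNA or RNA sequence, returned as RNA."""
    return to_rna(seq).translate(_COMPLEMENT)[::-1]


def pairing_sites(srna, utr, seed=7):
    """0-based start positions in `utr` where a `seed`-nt window is exactly
    complementary (antiparallel, Watson-Crick only) to a window of `srna`."""
    rc = revcomp_rna(srna)
    kmers = {rc[i:i + seed] for i in range(len(rc) - seed + 1)}
    utr = to_rna(utr)
    return [j for j in range(len(utr) - seed + 1) if utr[j:j + seed] in kmers]


def occludes_rbs(sites, rbs_start, rbs_end, seed=7):
    """True if any pairing window overlaps the RBS window."""
    return any(s <= rbs_end and rbs_start <= s + seed - 1 for s in sites)
```

With a made-up 5′-UTR `AAAAAGGAGGAAAAAUG` containing a Shine-Dalgarno-like GGAGG, the made-up sRNA `GCGUUCCUCCGCG` pairs at position 5, inside the RBS window.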
Brucella abortus 2308 is an intracellular pathogen that must adapt to a series of hostile intracellular environments, and bacterial sRNAs play an important role in stress responses (Hoe et al., 2013). In this study, we identified 112 novel sRNA candidates using a combination of sRNA prediction programs. These findings may contribute to understanding stress response and virulence regulation in Brucella spp. and lay the groundwork for future studies of the functions of these sRNAs.
This work was supported by the National Basic Research Program of China (973 Program) (2010CB530202) and the National Natural Science Foundation of China (no. 31372446).