The power of cooperation: Experimental and computational approaches in the functional characterization of bacterial sRNAs

Abstract Trans‐acting small regulatory RNAs (sRNAs) are key players in the regulation of gene expression in bacteria. There are hundreds of different sRNAs in a typical bacterium, which in contrast to eukaryotic microRNAs are more heterogeneous in length, sequence composition, and secondary structure. The vast majority of sRNAs function post‐transcriptionally by binding to other RNAs (mRNAs, sRNAs) through rather short regions of imperfect sequence complementarity. Besides, every single sRNA may interact with dozens of different target RNAs and impact gene expression either negatively or positively. These facts contributed to the view that the entirety of the regulatory targets of a given sRNA, its targetome, is challenging to identify. However, recent developments show that a more comprehensive sRNAs targetome can be achieved through the combination of experimental and computational approaches. Here, we give a short introduction into these methods followed by a description of two sRNAs, RyhB, and RsaA, to illustrate the particular strengths and weaknesses of these approaches in more details. RyhB is an sRNA involved in iron homeostasis in Enterobacteriaceae, while RsaA is a modulator of virulence in Staphylococcus aureus. Using such a combined strategy, a better appreciation of the sRNA‐dependent regulatory networks is now attainable.


| THE CHALLENG E OF IDENTIF YING THE REG UL ATORY TARG E TS OF BAC TERIAL S RNA S
Bacteria experience various metabolic and stress conditions they need to respond rapidly. Small trans-acting regulatory RNAs (sRNAs), which frequently but not always are noncoding, have been found at the heart of regulatory pathways that allow bacteria to regulate virulence gene expression, respond to stresses, sense the population density, modulate the cell surface composition, and adjust their metabolism (Carrier, Lalaouna, & Massé, 2018;Desgranges, Marzi, Moreau, Romby, & Caldelari, 2019;Holmqvist & Wagner, 2017;Radoshevich & Cossart, 2018). Therefore, it is of utmost interest to identify the RNA targets of these sRNAs, as many bacteria express hundreds of different sRNAs from differentially regulated genes. These sRNAs are very heterogeneous in length, sequence composition, and secondary structure. They can originate from their own genes or may be processed from the 5′ or 3′ UTRs of protein-coding genes (Chao et al., 2017;Lalaouna et al., 2019).
Some sRNAs, such as ArcZ in Escherichia coli (Mandin & Gottesman, 2010) or RprA in Salmonella enterica (Papenfort, Espinosa, Casadesus, & Vogel, 2015) are even further processed by RNase E, which resulted in different sRNA fragments. Although this processing is essential for the sRNA regulatory functions, it is not yet known whether the targets set of the sRNA is changed (Chao et al., 2017). In Bacillus subtilis, it has been described that an additional RNase Y-dependent processing of RoxS expanded the repertoire of its target mRNAs (Durand et al., 2015).
These sRNAs may not only interact with mRNAs but also with other sRNAs, tRNA precursors, or with proteins. However, the largest class of bacterial sRNAs frequently targets different mRNAs that are often functionally related. The sRNA-mRNA interaction relies on base pairings (including Watson-Crick and G-U) between complementary sequences stretches in the two molecules. To complicate things further, these RNA sequence elements involve usually short (between 8 and less than 50 nts) and noncontinuous base pairings, and can be involved in alternative intramolecular secondary structures competing with the intermolecular interaction. Moreover, interacting proteins such as the RNA chaperones Hfq (Dos Santos, Arraiano, & Andrade, 2019) and ProQ (Smirnov et al., 2016) frequently mediate these interactions in Enterobacteriaceae (Holmqvist & Vogel, 2018). All these factors make the simple bioinformatic search for complementary segments in a given sRNA versus the pool of coding sequences inadequate (Backofen & Hess, 2010). In this review, we have chosen two case studies from the two distant bacteria, E. coli and Staphylococcus aureus, which have evolved different helper proteins and ribonucleases associated with sRNA regulation.

| DIFFERENT E XPERIMENTAL APPROACHE S FOR THE IDENTIFI C ATI ON OF S RNA TARG E TS
Due to the shortcomings of simple bioinformatic searches, several new experimental RNA-seq approaches have been developed to enrich and identify the targetome of an sRNA (for a review, see (Saliba, C Santos, & Vogel, 2017)). Here, we will primarily focus on MAPS (MS2-affinity purification coupled with RNA sequencing) and RIL-seq (RNA interaction by ligation and sequencing).
In MAPS, the sRNA of interest is tagged usually at its 5′ end with the MS2 RNA aptamer and expressed in bacteria. Cytoplasmic extracts are purified by affinity chromatography followed by the sequencing of the enriched MS2-sRNA-RNA complexes (Lalaouna, Prévost, Eyraud, & Massé, 2017;Lalaouna, Desgranges, Caldelari, & Marzi, 2018). The main advantages of this approach include the sensitive detection of poorly expressed targets, the discrimination between direct and indirect targets, and finding sRNA targets that form base pairings . The approach was applied successfully to many sRNAs from Gram-negative bacteria such as E. coli and Salmonella Typhimurium, and from Gram-positive bacteria like S. aureus (see below).
To achieve a genome wide analysis of sRNA-RNA interactions, both CLASH (cross linking, ligation, and sequencing of hybrids) and RIL-seq rely on the association of sRNAs-targets with RNA-binding proteins (Melamed et al., 2016;Waters et al., 2017). In this application of the CLASH method, sRNA-target hybrids bound by RNase E were explored (Waters et al., 2017), while RIL-seq investigated interactions of RNAs associated with the RNA chaperone Hfq (Melamed et al., 2016). In both methods, the tagged proteins and the bound RNAs are purified by affinity chromatography, then the RNA hybrids are ligated and the resulting RNA chimeras are sequenced. RIL-seq has expanded the ensemble of known targets of sRNAs in E. coli and showed the dynamics of the regulatory networks under various stress conditions (Melamed et al., 2016). These two interactome methods are not limited to a specific sRNA and can simultaneously identify a great number of RNA-RNA interactions. The RIL-seq approach has been restricted until now to E. coli as Hfq in several Gram-positive bacteria has limited RNA chaperone activity (Zheng, Panja, & Woodson, 2016). Only in Listeria monocytogenes, Hfq has been found as a key partner of sRNAs (Nielsen et al., 2010). As Listeria expresses numerous sRNAs involved in virulence (Cerutti et al., 2017;Toledo-Arana et al., 2009), the RIL-Seq approach using Hfq might be an appropriate strategy to simultaneously probe RNA-sRNA interactions involved in stress tolerance and virulence, conditions that require Hfq (Nielsen et al., 2010).

| ADVAN CED ALGORITHMIC APPROACHE S IN THE IDENTIFI C ATI ON OF S RNA TARG E TS
Computational RNA-RNA interaction prediction methods could be a fast and cheap alternative or complement to the experimental approaches. These tools (reviewed in Wright, Mann, & Backofen, 2018) use a thermodynamic model of base pairing to find stretches of complementary bases in two RNA sequences that can form a stable intermolecular duplex. A major feature of more advanced methods is the consideration of the interaction site accessibility, that is, the question of whether intramolecular structures within sRNA and target RNA interfere with the intermolecular duplex formation (Backofen & Hess, 2010). There are two distinct tasks for bioinformatic tools: (a) finding the actual target site, that is, the interacting bases, for a given sRNA-target pair that was discovered experimentally. This site can be subsequently scrutinized by additional experiments, such as reporter gene assays in combination with point mutations in the predicted interaction regions of the mRNA and sRNA. (b) Predicting the full targetome of a given sRNA. In practice, this means to predict the best possible interactions between the given sRNA and all possible target sequences in the respective transcriptome. Then, the results can be ranked by the minimal energies of the predicted interactions and true positives should appear higher than false positives.
Tools like IntaRNA (Busch, Richter, & Backofen, 2008) or RNAplex (Tafer & Hofacker, 2008) are quite successful to cope with the first task. For instance, the true positive rate and the positive predictive value for the prediction of base pairs in 109 experimentally verified interactions is 0.62 and 0.64 for IntaRNA, respectively (Lai & Meyer, 2016). However, all existing noncomparative tools struggle more or less with the second task, the targetome prediction (Pain et al., 2015). To improve the computational performance for the full genome target prediction, the comparative CopraRNA algorithm was developed (Wright et al., , 2013. The general principle of CopraRNA can be described as follows: if an sRNA is conserved in different organisms, also the targets should be conserved to some extent. Thus, a true positive interaction should appear in multiple organisms, while a false positive interaction should be restricted to a single prediction. Simply, CopraRNA joins a set of organism-specific target predictions by IntaRNA into a combined prediction, which leads to an increased sensitivity and specificity. Indeed, a separate benchmarking study revealed that CopraRNA currently provides the best and most advanced computational strategy to predict bacterial sRNA targets (Pain et al., 2015). Furthermore, it was reported that CopraRNA has only a slightly worse specificity (2.7% vs. 4.7%) and sensitivity (52.7% vs. 53.4%) than the experimental RIL-seq method (Melamed et al., 2016).
CopraRNA not only produces a list of likely targets but also generates a complex set of additional information. The algorithm defines the regulatory domains of the sRNAs, performs functional enrichment of the predicted target mRNAs, and reconstructs the regulatory networks (Wright et al., , 2013. Originally, CopraRNA was successfully tested on 18 well-known Enterobacteriaceae sRNAs and on a small number of sRNAs from different bacteria revealing the previously identified mRNA targets as well as novel base pairing interactions (Wright et al., 2013). In the following, multiple studies confirmed the power of such approach in diverse bacterial groups including, for example, Cyanobacteria

| S TRENG TH S AND WE AK NE SS E S OF D IFFERENT ME THODS
Despite the excellent performance of the leading experimental (RILseq, MAPS) and computational (CopraRNA) methods, each individual approach nevertheless suffers from false positives (specificity) and struggles to recover the full targetome of a given sRNA leading to false negatives (sensitivity). Each method has its individual strengths and weaknesses.

| False positives
In case of computational predictions the tested sequence space is high, while the actual sRNA-target interactions are often short and involve noncontinuous base pairings. This leads to a large number of random interactions with similar or even lower minimal interaction energies than the true targets. This is the main reason for the lower performance of noncomparative bioinformatic tools and the resulting high number of false positives. While this problem has been resolved by considering comparative information, several examples studied in more detail have shown that the problem of false positives is often overrated due to incomplete knowledge about the "true" targetome of any given sRNA. This is because all predicted "true" target until they are independently validated, will be considered as false positive. Furthermore, a predicted (mRNA) target can act as a "sponge" that sequesters the sRNA to inhibit interactions with other targets (Figueroa-Bossi, Valentini, Malleret, Fiorini, & Bossi, 2009;Grüll & Massé, 2019;Miyakoshi, Chao, & Vogel, 2015). In this scenario, the mRNA is actually the regulator of the sRNA, which hinders the use of reporter gene assay for validation. For experimental in vivo methods such as MAPS or RILseq, the source of false positives is less obvious. Both methods require an actual physical interaction of the sRNA and the targets.
However, unspecific binding to any molecule involved in the respective enrichment protocols other than the sRNA cannot be excluded (e.g., MS2 aptamer, column, beads). The RIL-seq protocol includes a ligation step of the RNA heteroduplexes, which ensures that only RNAs that have been in close proximity are identified as interaction partners. MAPS interprets all RNAs that are purified by the MS2 tagged sRNA as interaction partners. However, indirect copurification with RNA binding proteins (e.g., Hfq, ProQ, or CsrA in Enterobacteriaceae) or with other RNAs might occur. An example would be an mRNA which is targeted by the MS2-tagged sRNA and also by another sRNA. RIL-seq uses a crosslinking step, which might lead to artifacts for molecules that are not directly interacting, but that are nevertheless in close vicinity in the cell.

| False negatives
The other side of the coin is the biologically "true" targets that are not detected by the respective method.
In case of CopraRNA, there are two major reasons for false negatives. The first reason is linked to the target sequence or the interaction region, which is not in the set of sequences that are scanned for interactions. Indeed, the searched sequence space was reduced to the most likely interaction regions, that is, the 5′UTR (often 200 nts upstream of the annotated start codon) and the first 100 nts in the coding region. This heuristic reduces the number of random, potential false positive, high-ranking predictions, but also prevents the detection of some targets. In many bacteria, majority of 5′UTRs is only 20-40 nucleotides long (Kim et al., 2012;Kopf et al., 2014;Ruiz de los Mozos et al., 2013;Voigt et al., 2014), and therefore, CopraRNA could potentially predict interactions with nontranscribed sequences. Other mRNAs possess very long 5′UTRs (e.g., ~570 nt for rpoS in E. coli and ~410 nt for sarA in S. aureus). Thus, part of these 5′UTRs is overlooked.
In the same vein, the coding sequence beyond the first 100 nt after the first nucleotide of the start codon and the 3′UTR are not taken into account, and some sRNAs are indeed pairing deep in the coding sequence (Gutierrez et al., 2013;Lalaouna, Morissette, Carrier, & Massé, 2015;Papenfort, Sun, Miyakoshi, Vanderpool, & Vogel, 2013). Finally, CopraRNA concentrates on protein-coding mRNAs, that is, noncoding transcripts such as other sRNAs or tRNAs are not yet considered. This should not be neglected since interactions between sRNAs (e.g., RsaI and RsaG (Bronesky et al., 2019)) and between sRNAs and tRNA precursors (e.g., RyhB and 3′ETS leuZ ) have been revealed first by MAPS technology, as well as by RIL-seq, and CLASH (reviewed in Grüll & Massé, 2019). The second reason for false negatives by CopraRNA is that the in silico predicted minimal interaction energy might not reflect the energy value of an in vivo setting. For some targets the strength of intramolecular interactions might be overrated in the prediction, because in vivo these internal structures may sometimes be disrupted by RNA-binding proteins such as Hfq, ProQ, or CsrA in Enterobacteriaceae (Dos Santos et al., 2019;Müller, Gimpel, Wildenhain, & Brantl, 2019;Smirnov et al., 2016). Furthermore, discontinuous interactions such as double kissing complexes, for example, between OxyS and fhlA in E. coli (Argaman & Altuvia, 2000) or between RNAIII and rot mRNA in S.
In the experimental targetome approaches, a major reason for false negatives is surely that the respective target is not or only weakly expressed in the used experimental setting. An exhaustive targetome search would thus require the repetition of the experiment with multiple relevant stress and growth conditions. Consequently, the individual results should be rather considered as a snapshot of dynamic events.
Additionally, both RIL-seq and CLASH cannot uncover RNA interactions that are independent of the respective RNA binding protein used for the pulldown of the RNA-RNA heteroduplexes. To identify only directly interacting partners, RIL-seq and CLASH use a crosslinking and a ligation step. However, their efficiency is pretty low, leading to the loss of a significant number of real RNA:RNA interactions. In case of MAPS, it cannot be excluded that the used MS2 RNA tag interferes with the secondary structure of the tested sRNA and the interaction to some targets. Furthermore, for MAPS some (weaker) targets might be lost during the affinity purification and clean up steps of the protocol.
In Enterobacteriaceae, many sRNA-dependent repressions lead to the rapid depletion of the target mRNAs. These mRNAs might be under-

| RyhB, an sRNA involved in iron homeostasis
The first test case was the well-characterized sRNA RyhB, involved in iron homeostasis in E. coli. The analysis of RyhB was initially used as a proof of principle for the MAPS technology , which has demonstrated to be very effective, as emphasized by the copurification of previously known targets like sodB, sdhC, and shiA mRNAs (Massé & Gottesman, 2002;Prévost et al., 2007;Vecerek, Moll, Afonyushkin, Kaberdin, & Bläsi, 2003), but also by the discovery of the 3′ETS leuZ , a tRNA-derived fragment acting as a sRNA sponge.
We compared the top 200 results taken from a slightly modified CopraRNA version and from MAPS. To complete the analyses, we added the results from the global Hfq-dependent RIL-seq interactome study with RyhB (Melamed et al., 2016) and used a set of 30 independently verified RyhB targets as reference (Figure 1a).
The benchmark shows that the methods are highly complementary ( Figure 1b) since only 10/30 targets are detected by all methods, while 4/30 targets are not predicted by any. Moreover, 18, 17, and 15 verified targets are in the top lists of CopraRNA, MAPS, and RIL-seq, with 2, 4, and 1 uniquely identified targets, respectively. Unsurprisingly, the top targets were highly enriched in mRNAs for iron-binding proteins or proteins related to iron homeostasis (iron transport, siderophore, and Fe/S cluster biosynthesis, iron storage; Figure 1c), with the majority of those not yet independently F I G U R E 1 Comparison of MAPS, CopraRNA, and RIL-seq data to characterize the RyhB targetome. The MAPS and RIL-seq data were taken from references  and (Melamed et al., 2016), respectively. (a) The cumulated number of wellcharacterized targets are plotted against the prediction rank for CopraRNA, MAPS, and RIL-seq as an estimate of sensitivity and specificity. It is hard to pin down the reasons for individual true targets not being detected by one or the other method. However, there is a likely explanation for some of them. For instance, MS2-RyhB MAPS was performed in exponential phase of growth (OD 600nm = 0.5) in LB-rich medium. Knowing that RyhB is mainly produced in response to iron starvation, some targets are weakly or not expressed under the tested conditions (e.g., nirB mRNA (Massé, Vanderpool, & Gottesman, 2005)). Therefore, while nirB could not be found using MAPS, CopraRNA successfully identified nirB as one of the best candidates regardless of its expression (Wright et al., 2013). Later on, nirB was also detected by RIL-seq, which was done at various conditions including iron starvation (Melamed et al., 2016). As mentioned above, the noncoding tRNA The only exception is the glyW-cysT-leuZ operon, which is the precursor of the 3′ETS leuZ and is not regulated by RyhB .

| RsaA, an sRNA that modulates virulence
RsaA is a sigma B-dependent regulatory RNA identified in S. aureus as an attenuator of acute infection and promoting chronic diseases (Romilly et al., 2014). Transcription of the rsaA gene leads to the synthesis of a short form (138 nts) and a long form (286 nts) most probably due to transcriptional readthrough. The longest form is less well expressed and appears to be less stable than the short version of RsaA (Geissmann et al., 2009). Using different approaches (proteomic and MAPS), the mRNA encoding the posttranscriptional repressor MgrA was characterized as the main target of RsaA (Romilly et al., 2014;Tomasini et al., 2017). By repressing mgrA, RsaA inhibits capsule formation and induces biofilm formation. In addition, MAPS allowed the identification of four other targets such as the SsaA-like enzymes (i.e., in Figure 2b shows the overlap of the three datasets. Targets that appeared in more than one dataset are of special interest, as they are more likely true targets. Four targets appear in all three datasets. These are the already known mgrA and ssaA2-3 mRNAs, and the so far not independently verified mRNAs encoding the hemolysin α (hly) and the general stress protein YdaG.
Returning to method-specific characteristics, we will here address selected errors and shortcomings of MAPS and CopraRNA.
Because it is a difficult task to detect false positives, we concentrate on targets that are missed by one or the other method (false negatives). Based on the rank 2 CopraRNA predictions, RsaA could interact with the 3′ end of icaA (icaD in Figure 2c). Using gel retardation assays, we validated the formation of an efficient and stable complex between RsaA and icaA-icaD mRNA fragment (nucleotides −239 to +175, from the AUG of icaD; Figure 2d,e). However, the polycistronic icaADBC mRNA was not considered in the first conservative analysis of the MS2-RsaA MAPS data due to the relatively low enrichment factor (icaA (FC 1.94, rank 28); icaD (FC 1.14)). The reason for this low enrichment is certainly due to a poor expression of icaADBC operon in tested conditions, as validated by transcriptomic analysis (Tomasini et al., 2017). CopraRNA in turn did not predict ssaA2_2 mRNA due to a straightforward reason. Prior to the combination of predictions from different organisms, CopraRNA needs to identify which mRNAs code for homologous proteins. This is done using the DomClust algorithm (Uchiyama, 2006). The two proteins SsaA2_2 and SsaA_2 are too similar (76.7% sequence identity) to be assigned to different clusters. In the following, only ssaA_2 with the best predicted interaction energy of −14.49 kcal/mol, and not ssaA2_2 (−11.97 kcal/mol), was F I G U R E 2 Integration of MAPS, CopraRNA, and proteomic data to estimate the RsaA targetome. The MAPS and proteomic data were taken from reference (Tomasini et al., 2017). (a) The cumulated numbers of known targets are plotted against the prediction rank for CopraRNA and MAPS as an estimate of sensitivity and specificity. (b) The Venn diagram shows the overlap of the top 200 MAPS and CopraRNA predictions and the proteins, which were significantly differentially synthesized in the proteome and secretome data. (c) Gel retardation assays using the short version of RsaA and a putative mRNA target. 5′end-radiolabeled RsaA(*) was incubated with increasing concentrations of icaA_icaD fragment (nts −239 to +175, from the AUG of icaD). (d) Schematic visualization of the icaADBC operon and the location of the verified interaction site with RsaA. (e) Gene names or locus tags of the genes that were detected by more than one method or targets that are unique to one method but belong to one of the indicated functional categories. Genes in black characters encode hypothetical proteins or proteins with functions unrelated to the major targets of RsaA. C is for CopraRNA, M for MAPS, P for proteome, and S for secretome. (f) Schematic drawing of the regulatory networks involving RsaA. The experimental data are taken from Romilly et al. (2014) and Tomasini et al. (2017). Black arrows (activation) or bars (repression) are for transcriptional control, red arrows (activation) or bars (repression) are for post-transcriptional control. Transcriptional regulators are in blue, regulatory RNAs are in red and other mRNAs regulated by RsaA are in grey. Dashed arrows are for potential RsaA targets for which experimental validation is required (e) (f) considered for the combination. Using the poor ssaA2_2 prediction would have resulted also in prediction rank 10.

| The power of cooperation
As a surprising outcome of the present analysis, we found that the general performance of the methods MAPS, CopraRNA and RIL-seq with respect to sensitivity and specificity are largely comparable, but that the overlap between each set of predictions is remarkably low.
Due to the different strengths and weaknesses, no single experimen-

| New findings due to cooperation
Combining the results of the different methods for the two sRNAs chosen as examples yielded several interesting new findings.
With more than 15 years research and numerous scientific studies, RyhB is well established as a regulator of iron homeostasis in E. coli and other γ-proteobacteria and presumably one of the best investigated bacterial sRNAs overall. Nevertheless, the combined CopraRNA, MAPS, and RIL-seq data indicate that the RyhB regulon is significantly larger than described so far ( Figure 1). In total, there are more than 30 suggested new targets, including the mRNAs of multiple iron containing proteins (e.g., napF, fdx), mRNAs with a clear link to iron import (e.g., exbB, fepD), and cyuP coding for a serine transporter, which is relevant for siderophore biosynthesis.
The sRNA RsaA is a key modulator of S. aureus virulence (Romilly et al., 2014;Tomasini et al., 2017). Besides MgrA, the major regulator of capsule formation, RsaA might also affect a multitude of major virulence regulatory systems. This includes the transcription factors of the Sar family (Rot, SarA, SarT, SarR, SarV, SarZ), the regulatory RNAIII, and several virulence associated secreted proteins or surface proteins. These factors include the hemolysin α (hly), the enterotoxin-like toxin X (selX), a complement inhibitor (scn_3), the surface protein G (sasG), and staphopain A (sspP). The third large group of predicted targets copes with the peptidoglycan/cell wall metabolism and biofilm formation. Examples are the icaADBC operon and the cell wall hydrolases. This study also indicates that the longest form of RsaA, even less expressed, regulates additional targets, expanding the regulon of RsaA (Figure 2). Even though these predictions still await independent validation, the combined results of MAPS and CopraRNA provide a promising basis for further investigations. This strategy is particularly useful in organisms where no major RNAbinding proteins have yet been found associated with sRNA such as in Gram-positive bacteria.
These new findings nicely illustrate what can be gained from the joint application of experimental and computational approaches on defining the targetome of a bacterial sRNA, a prerequisite to better decipher its roles in bacteria physiology, stress response, adaptation, and virulence. A major challenge will be to adapt these new methods to probe the dynamics of RNA-RNA interactions in more complexed ecological environments and in in vivo models of infection for pathogenic bacteria.

ACK N OWLED G EM ENTS
We would like to thank Pascale Cossart for her constant interest and support in our work. We are all grateful for the major findings and concepts she has highlighted in the field of RNA regulation in Listeria. No. 753137-SaRNAReg to DL. JG was supported by the DFG (GE 3159/1-1).

CO N FLI C T O F I NTE R E S T
The authors have no conflict of interest to declare.