Plants feature a particularly diverse population of short (s)RNAs, the central component of all RNA silencing pathways. Next generation sequencing techniques enable deeper insights into this complex and highly conserved mechanism and allow identification and quantification of sRNAs. We employed deep sequencing to monitor the sRNAome of developing tomato fruits covering the period between closed flowers and ripened fruits by profiling sRNAs at 10 time-points. It is known that microRNAs (miRNAs) play an important role in development but very little information is available about the majority of sRNAs that are not miRNAs. Here we show distinctive patterns of sRNA expression that often coincide with stages of the developmental process such as flowering, early and late fruit maturation. Moreover, thousands of non-miRNA sRNAs are differentially expressed during fruit development and ripening. Some of these differentially expressed sRNAs derive from transposons but many derive from protein coding genes or regions that show homology to protein coding genes, several of which are known to play a role in flower and fruit development. These findings raise the possibility of a regulatory role of these sRNAs during fruit onset and maturation in a crop species. We also identified six new miRNAs and experimentally validated two target mRNAs. These two mRNAs are targeted by the same miRNA but do not belong to the same gene family, which is rare for plant miRNAs. Expression pattern and putative function of these targets indicate a possible role in glutamate accumulation, which contributes to establishing the taste of the fruit.
Gene expression is regulated at several levels and one of the most recently discovered mechanisms in eukaryotes involves short non-coding RNAs (sRNAs) to silence gene expression (Brodersen and Voinnet, 2006). Plant sRNAs are generated from double-stranded RNA (dsRNA) by a member of the Dicer-like (DCL) family. Depending on the nature of the precursor dsRNA we distinguish microRNAs (miRNAs) that are produced from short hairpin structures and small interfering RNAs (siRNAs) that are processed from long dsRNA (Brodersen and Voinnet, 2006). Formation of long dsRNA requires the activity of one of the six RNA-dependent RNA polymerases (RDRs; Peragine et al., 2004; Vazquez et al., 2004). The DCL-produced short duplexes are unwound and one of the strands binds to one of the 10 Argonaute (AGO) proteins (Hutvagner and Simard, 2008). The AGO complexes are then guided by the incorporated sRNAs to target RNA or DNA that are recognised by sequence complementarities. Due to the diversification of the DCL, RDR and AGO gene families there are different classes of siRNAs (Brodersen and Voinnet, 2006). The trans-acting siRNAs (ta-siRNAs) are produced from non-coding RNAs that are cleaved by a miRNA and then made double stranded by RDR6 (Peragine et al., 2004; Vazquez et al., 2004). The dsRNA is processed by DCL4 and the ta-siRNAs are incorporated into AGO1 or AGO7 (Adenot et al., 2006). The natural antisense siRNAs (nat-siRNAs) are produced from overlapping antisense transcripts and generation of the primary 24 nt nat-siRNA requires a unique set of silencing proteins (DCL2/RDR6/SGS3/NRPD1a) (Borsani et al., 2005). The most abundant class of plant siRNAs are the heterochromatin siRNAs, which are 24 nucleotide long and generated from PolIV transcripts by RDR2 and DCL3 (Lu et al., 2006). 
Most but not all heterochromatin siRNAs are incorporated into AGO4 and mediate DNA and/or chromatin methylation, causing heterochromatin formation that leads to transcriptional silencing of retroelements, 5S rDNA arrays and other repetitive sequences (Mosher et al., 2008). miRNAs are produced by DCL1, incorporated into AGO1 and target mRNAs in trans either through cleavage (Reinhart et al., 2002) or by suppressing translation without cleavage (Brodersen et al., 2008).
Although mRNA expression has been profiled in various developmental systems, the dynamics of sRNA expression have not been analysed carefully. In fact, an extended time-course analysis of sRNAs has not been carried out in any plant tissue. We used fruit as a model tissue to study the dynamics of sRNA expression because it is a complex tissue, and we therefore expected it to reveal many coordinated changes. Tomato has been broadly used as a model plant for studying climacteric fruit development (Giovannoni, 2004), which can be divided into several stages: flowering and pollination, fruit set, cell division, cell expansion and ripening. Small RNA investigation in tomato has a relatively short history (Dalmay, 2010). Conserved miRNAs were found in tomato through sequence homology (Yin et al., 2008; Zhang et al., 2008) and new sRNA sequences were identified by traditional (Pilcher et al., 2007; Itaya et al., 2008) or high-throughput (Moxon et al., 2008a) sequencing. These sRNA sequences can be found in several databases (Bazzini et al., 2010; Fei et al., 2011; Zhan et al., 2010; and at http://silodb.cmp.uea.ac.uk/sdb_sly). Functional analysis has been carried out only for a few tomato miRNAs, and these studies mainly investigated their role in leaf development (Ori et al., 2007; Berger et al., 2009; Buxdorf et al., 2010).
We previously showed that conserved and non-conserved miRNAs regulate genes involved in fleshy fruit development (Moxon et al., 2008a). Here we describe the sRNA profile of tomato flower/fruit across 10 time points from closed flower bud to ripened fruit. We found that not just miRNAs and ta-siRNAs but many other sRNAs are differentially expressed during development and are therefore very likely to play a role in development. Some of these differentially expressed sRNAs map to transposons but many of them derive from protein coding genes or regions that are similar to protein coding genes.
Deep sequencing of short RNAs during fruit development
To assess the involvement of regulatory sRNAs in a complex developmental process such as fleshy fruit development, we profiled sRNA accumulation at 10 time points in the flower and fruit of the MicroTom cultivar of tomato. The 10 samples comprised two flower stages (closed flower bud and open flower), four pre-breaker stages (1–3, 5–7, 11–14 mm fruit diameter and mature green), the breaker stage (fruit turning red) and three post-breaker stages (3, 5 and 7 days after breaker). We will refer to these samples as Bud, Flower, F1-3, F5-7, F11-14, Fmg, Br, Br+3, Br+5 and Br+7 respectively throughout the manuscript. The sRNA fraction was subjected to deep sequencing on the Illumina 2G platform yielding between 1.38 and 5.2 million redundant sRNA reads each (Figure S1; GEO accession GSE18110). Using version 405 of the preliminary tomato genome (ftp://ftp.sgn.cornell.edu/tomato_genome/bacs), we could identify at least one perfect full-length alignment to the genome for 22.7–31.6% of total sRNA reads in all samples (Figure S2). Given the estimated percentage of completion of the tomato genome sequence used for the analysis (∼43%), this figure is equivalent to a genomic match rate of 53–73% of total sequences in a fully sequenced genome. The fact that genomic match rates were similar across the time series shows that the quality of the sRNA samples and sequence reads was similar across the series.
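The extrapolation from observed to expected match rate can be reproduced with a short calculation. The sketch below assumes, as the text implies, that matches scale linearly with the fraction of the genome that has been sequenced; the function name is illustrative only.

```python
def extrapolated_match_rate(observed_pct, genome_fraction):
    """Scale an observed match rate to a hypothetical complete genome.

    Assumes matches are spread evenly over sequenced and unsequenced regions.
    """
    if not 0 < genome_fraction <= 1:
        raise ValueError("genome_fraction must be in (0, 1]")
    return observed_pct / genome_fraction


# Observed range of perfect matches (22.7-31.6%) and estimated completion (~43%)
low = extrapolated_match_rate(22.7, 0.43)
high = extrapolated_match_rate(31.6, 0.43)
print(f"{low:.0f}-{high:.0f}% expected in a fully sequenced genome")
```

With the values quoted above, this gives the 53–73% range stated in the text.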
First, we analysed sRNA expression profiles throughout the developmental time series, using normalised counts of sRNA sequences to compare the 10 time points. As this is the most detailed dataset of sRNA expression levels in a developmental series in plants so far, we were interested in the pattern of changes in sRNA expression levels throughout the series. If sRNAs are indeed regulated in response to, or as triggers for, developmental processes, we would expect to see continuous trends rather than widespread chaotic changes between samples. To this end, we calculated the Jaccard index of similarity for the top 5000 most highly expressed sRNA reads of each sample. Intriguingly, not only did we observe a gradual change of abundant sRNA expression signatures, but the samples clearly grouped into three blocks: flowering, fruit growth stage (F1-3 to mature green), and fruit ripening (mature green to breaker stage +7 days) as shown in Figure 1. This was the first indication that changes in the sRNA expression profile of a developing fleshy fruit are not random but are precisely timed to coincide with developmental processes.
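The pairwise comparison can be illustrated with a minimal sketch of the Jaccard index applied to the most abundant reads of two samples. The sequences and counts are toy values, and the helper names are hypothetical; the real analysis used the top 5000 reads per sample.

```python
from collections import Counter


def top_n(counts, n):
    """Return the set of the n most abundant sequences in one sample."""
    return {seq for seq, _ in Counter(counts).most_common(n)}


def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Toy normalised counts for two samples (hypothetical values)
bud = {"AAA": 50, "CCC": 40, "GGG": 30, "TTT": 1}
flower = {"AAA": 45, "CCC": 5, "GGG": 35, "TTT": 60}

similarity = jaccard(top_n(bud, 3), top_n(flower, 3))
print(similarity)
```

Computing this value for every pair of samples yields a similarity matrix in which neighbouring time points are expected to score higher than distant ones.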
One possible cause of the observed changes in predominance of different sets of mature sRNAs could be differential activity of the various sRNA biogenesis pathways in different stages of fruit development. The size of mature sRNAs and the complexity of produced sRNA populations are two powerful characteristics that distinguish sRNA pathways. miRNA biogenesis, for example, is known for the precise excision of one predominant and well defined mature sRNA (resulting in a population of low complexity), which in most cases is ∼21 nt long. In contrast, heterochromatin-associated sRNAs are typically characterised by a more chaotic dicing pattern (high complexity) and a predominance of 24 nt sRNAs (Schwach et al., 2009). In plants, this 24 nt size class has often been reported as the overall most abundant class of sRNAs, which we also observed in our data (Figure S3). However, the size-class distribution was not constant across the developmental series (Figure 2; Figure S4). The bias towards 24-mers was particularly strong in bud and flower, then decreased gradually until F11-14 and exhibited another peak at the breaker stage. Twenty-one nucleotide sRNAs, on the other hand, showed an increased abundance in the late stages of fruit ripening. This observation indicated that different silencing pathways are independently regulated during fruit development. Furthermore, we noticed a surprisingly clear-cut distinction between early and late fruit stages by analysing the complexity of sRNA populations (Figure 3). There was a sharp drop in the complexity index, a measure for the ratio of unique versus total sequences, at the breaker stage. This finding means a shift towards a more homogenised population of sRNAs in the late stages of fruit development, indicating an increased contribution from sRNA biogenesis pathways that produce the more precisely defined sRNA species at those stages. 
It is worth pointing out that the complexity measure was not simply a consequence of the total read count in each dataset, since bud and breaker stage had very similar total read counts but very different complexity indices (Figures S1 and 3). Furthermore, the complexity index did not fluctuate chaotically but instead yielded a smooth curve that underpins the time-series character of our sRNA data and the gradual changes in the sRNAome that occur during fruit development.
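As a rough illustration, the complexity index can be sketched as the number of distinct sequences divided by the total number of reads. This follows the description of the index as a ratio of unique versus total sequences; the exact formula used for Figure 3 may differ, and the example counts are hypothetical.

```python
def complexity_index(counts):
    """Ratio of distinct sequences to total reads in one sample.

    counts maps each sRNA sequence to its read count.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    unique = sum(1 for c in counts.values() if c > 0)
    return unique / total


# A diverse population: many sequences, each seen once or twice
diverse = {"A": 1, "B": 1, "C": 2}
# A homogeneous population: one dominant sequence
homogeneous = {"A": 98, "B": 2}

print(complexity_index(diverse))      # 3 distinct / 4 reads
print(complexity_index(homogeneous))  # 2 distinct / 100 reads
```

A population dominated by a few precisely excised species, such as miRNAs, gives a low value, whereas a dispersed dicing pattern gives a value closer to one.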
Analysis of expression profiles of known and new miRNAs
One particularly well studied class of regulatory sRNAs are the miRNAs. To elucidate the potential regulatory roles of miRNAs in tomato fruit development, we analysed expression profiles of known miRNAs. These were identified in our dataset by perfect matches to sequences from the miRNA repository (miRBASE release 14; Griffiths-Jones et al., 2008). We computed the normalised counts of known miRNA sequences, plotted them across the time series and validated the results by northern blot analysis from biological replicates (Figure 4). Ten miRNA families showed differential accumulation during fruit development. MiR167, miR172 and miR390 were expressed mainly during the flower stages. MiR159, miR162 and miR165/166 were most abundant during early fruit development and the expression of miR156, miR164 and miR396 was increased during ripening. Northern blot analysis of these miRNAs was in broad agreement with sequencing results, i.e. the trends in expression level changes correlated well. One exception was miR319, for which the sequencing result and northern blot showed different patterns. The most likely reason for this is that northern blot analysis can detect molecules that are not necessarily classified as miR319 computationally. The changes in expression of the differentially expressed miRNAs through fruit development coincided with the three main developmental stages: flowering, fruit growth and fruit ripening, suggesting specific roles for these miRNAs in these stages.
Next, we identified new miRNAs in the 10 data sets using the miRCat program (Moxon et al., 2008b). Twenty candidate miRNA loci (representing thirteen different miRNAs) were predicted by miRCat. For seven of these loci we identified a miRNA*. These seven loci encode a total of six distinct new mature miRNAs, which were submitted to miRBASE (Table S1). Expression of four new miRNAs was validated by northern blot and they all showed differential expression during fruit development (Figure 5). miR-V and miR-Y showed a gradual decrease while miR-X and miR-W were mainly expressed during the growth phase of the fruit. Secondary structures and positions of mature miRNA and miRNA* of new miRNAs are shown in Figure S5.
Interestingly, miR-Y and miR-W* seem to be transitions between two known mature miRNAs. There are four mismatches between miR-W* and ptc-miR-472a and two mismatches between miR-W* and ptc-miR-482 (Figure S6). miR-Y is shifted by two nucleotides and also contains mismatches compared with miR-472 and miR-482. The ptc-miR-472 family and ptc-miR-482 were classified as different miRNAs because there are four mismatches between them. In addition to these two miRNAs there are several recently discovered other miRNAs (such as miR-2118, miR-3633*, miR-2089, miR-2517 and miR-4159-3p) that are similar to this group of miRNAs. The discovery of miR-Y and miR-W* might therefore make it necessary to combine these different miRNAs into one family. It is also worth pointing out that a strand-shift occurred since the mature Arabidopsis/poplar miR-472/482 is similar to the tomato miR-W* instead of the mature miR-W strand.
We predicted targets for the new miRNAs using our previously published tool (Moxon et al., 2008b) and selected seven for 5′-RACE assays. Five of them could not be validated, although unigene SGN-U579632 (annotated as alcohol-dehydrogenase 2) showed cleavage sites at and closely around the expected position but only a few in the exact cleavage site (Table S2). However, two of the predicted targets were validated and interestingly both targets were cleaved by the same miRNA (miR-W*). SGN-U573791 (annotated as membrane bound ATPase) and SGN-U585460 (annotated as membrane bound glutamate permease) showed all or most of the cleavage sites at the expected position (Table S2).
Identification of co-regulated clusters of sRNA loci
MiRNAs only represent one type of sRNA-producing genomic loci but plants in particular produce a highly diverse range of sRNA classes (Schwach et al., 2009). To identify sRNA-producing loci that could cooperatively act as regulators during fruit development and ripening, we looked for co-regulation patterns of sRNA loci across the series. To this end, we grouped sRNA reads that had a full match to the tomato genome on the scaffolds of the tomato genome assembly as described previously (Moxon et al., 2008b). This approach generated 47 176 sRNA-producing loci with lengths varying between 18 and 67 891 nt from 9.97 million reads mapping to the available genome (27% of the total sRNA reads). We removed copies of repetitive loci to avoid biasing the downstream analysis, thus reducing the number of sRNA loci to 43 336. We combined normalised individual sRNA expression levels in each locus to calculate a single expression level for each time point.
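The locus-level expression value can be sketched as follows. The example assumes that normalisation is to reads per million of the sample total and that the reads in a locus are simply summed; both are illustrative simplifications, and the function names are hypothetical.

```python
def reads_per_million(raw_count, total_reads):
    """Normalise a raw read count to reads per million of the sample total."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return raw_count * 1_000_000 / total_reads


def locus_expression(raw_counts, total_reads):
    """Sum the normalised counts of all sRNAs assigned to one locus."""
    return sum(reads_per_million(c, total_reads) for c in raw_counts)


# Two sRNAs in one locus, in a sample of 5 million reads (toy values)
expression = locus_expression([100, 400], 5_000_000)
print(expression)
```

Repeating the calculation for each of the 10 samples gives one expression profile per locus.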
We noticed that many of these sRNA loci stay surprisingly constant in their expression levels over several measurements, even at medium to high expression levels. For example, we found a 3.3 kb locus (C06HBa0109C03.1/79897–83195) with a narrow range of normalised expression levels from 88 to 129 reads per million throughout the entire time course, corresponding to up to 545 raw reads in a single sample (Figure S7). This finding is important as it not only highlights the surprisingly tight regulation of sRNA production from a genomic region the size of a protein-coding gene, but also greatly increased our confidence in the usability of our normalised data for quantitative analyses. Furthermore, ‘housekeeping’ sRNA loci, if they are common, could be instrumental in a wide variety of sRNA expression assays, just as genes with relatively constant expression are in transcriptomic analyses. To assess the stability of sRNA loci in our data, we grouped them by their average normalised expression level and plotted distributions of fold-changes between the average and the most extreme outlier in the whole time series, as well as two sub-series, corresponding to early and late fruit development (Figure 6). This revealed a large proportion of loci where the change between average sRNA expression and the most distant outlier was <1.5-fold, even at medium to high average expression levels: 1.1% of loci in the 100–500 read per million bin were in this category, rising to 51 or 41.8% when the analysis is restricted to the four early or late fruit samples, respectively. Intriguingly, a lower degree of stability resulted from four-sample sub-sets that spanned the flower-to-fruit transition or the breaker stage, which further indicates that these stages are characterised by more widespread changes in the sRNAome (data not shown).
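One way to express this stability criterion in code is shown below. The sketch assumes that the fold-change is taken between the mean of a profile and whichever value lies furthest from it in fold terms; this is an interpretation of the description above, not necessarily the exact procedure behind Figure 6.

```python
def max_fold_change(profile):
    """Largest fold-change between the profile mean and any single value.

    profile is a list of non-negative normalised expression levels.
    """
    mean = sum(profile) / len(profile)
    if mean <= 0:
        raise ValueError("profile has no expression")
    lowest, highest = min(profile), max(profile)
    up = highest / mean
    down = float("inf") if lowest == 0 else mean / lowest
    return max(up, down)


def is_stable(profile, threshold=1.5):
    """True if no time point deviates from the mean by the threshold or more."""
    return max_fold_change(profile) < threshold


steady = [100, 100, 100, 140]   # toy profile, reads per million
variable = [10, 10, 10, 50]

print(is_stable(steady), is_stable(variable))
```

The 1.5-fold threshold is the one quoted in the text; the profiles are invented for illustration.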
For the next analysis, we selected a subset of the most highly differentially expressed loci (among the topmost 5% in at least one of the transitions between the 10 time points). This yielded a set of 1185 sRNA loci. It is worth pointing out that this is not the complete set of differentially expressed sRNA loci; instead, we focussed on a subset that can be clustered reliably based on expression profiles. An interesting feature of this set of highly differentially expressed loci is the enrichment for strand-biased patterns of sRNA production (Figure S8). We defined strand bias as a measure ranging from zero (for sRNAs mapping equally to both strands of the genomic locus) to one (in which all sRNAs map to one strand). We limited this analysis to loci with at least 65 matching raw sRNA reads, a threshold established by using randomisation experiments and an entropy measure as described in Data S2. The strand bias of sRNAs in their genomic loci of origin may give clues to their biogenesis. In general, single-stranded transcripts that form hairpin structures (like miRNA precursors) will produce a highly strand-biased set of sRNAs, whereas a low strand bias is more likely to be associated with a biogenesis pathway that involves long double-stranded precursors that are the result of a polymerase activity. An example of a locus that gives rise to sRNAs almost equally from both strands (strand bias = 0.08) can be seen in Figure S9 (C08SLe0010M05.1_38938_40457). A genome browser screen shot of the locus reveals a dispersed sRNA production pattern and the transcript of the locus does not show any potential for clear-cut hairpin formation, thus pointing to the involvement of a polymerase and a long double-stranded precursor for these sRNAs.
In contrast, locus C02SLe0092M23.1_65939_67042 (Figure S9) has a strand bias of 0.84 and the most abundant sRNAs derive from a region in the locus that folds into a long hairpin structure, which is most likely to be the precursor of these sRNAs. However, there are also loci in between these extremes, such as C09HBa0170H21.1_45038_46528 (Figure S9), which has a high strand bias (0.92) but this is mainly due to a small region within the locus that gives rise to a small number of highly abundant sRNAs. While transcript from this region may form long hairpins, the highly abundant strand-biased sRNAs do not appear to derive from a hairpin region themselves. It is, therefore, not clear how these sRNAs are produced in the cell.
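A strand bias measure with the stated properties (zero for equal mapping to both strands, one for all reads on a single strand) can be sketched as the absolute difference between strand counts divided by their sum. This formula is an assumption chosen to match that description. The read counts below are invented to reproduce the bias values of 0.08 and 0.84 mentioned above.

```python
MIN_RAW_READS = 65  # minimum number of raw reads per locus quoted in the text


def strand_bias(plus_reads, minus_reads, min_reads=MIN_RAW_READS):
    """Return a bias between 0 (both strands equal) and 1 (one strand only).

    Returns None for loci with too few reads to give a reliable estimate.
    """
    total = plus_reads + minus_reads
    if total < min_reads:
        return None
    return abs(plus_reads - minus_reads) / total


balanced = strand_bias(54, 46)   # reads split almost evenly
biased = strand_bias(92, 8)      # reads mostly from one strand
too_few = strand_bias(30, 10)    # below the read threshold

print(balanced, biased, too_few)
```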
Next, we applied hierarchical and k-means clustering consecutively on the expression profiles of the differentially expressed loci to identify groups that show similar activity at corresponding time points in fruit development and could therefore be involved in controlling, or being controlled by, related events. As a proximity measure between expression profiles we tested Euclidean, Manhattan, and angular distances (data not shown) as well as the Pearson Correlation Coefficient (PCC). The latter emerged as the best choice due to the high degree of variability in the data. This method identified 63 clusters of sRNA loci with similar expression profiles (Table S3). Some of the clusters exhibited a very striking pattern of expression changes in certain stages of the series and we analysed these in more detail, including their sRNA size class distribution, enrichment for sequence motifs, genome annotation, locus complexity and strand bias distribution (Figure 7).
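A correlation-based proximity measure can be sketched as one minus the PCC. The example shows only the distance calculation, not the clustering itself, and uses invented profiles. It illustrates that the PCC compares the shape of two profiles rather than their absolute levels, which is one plausible reason for its suitability when expression magnitudes vary widely.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length of at least 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("PCC is undefined for a constant profile")
    return cov / (sd_x * sd_y)


def pcc_distance(x, y):
    """Distance from 0 (identical shape) to 2 (perfectly opposite shape)."""
    return 1 - pearson(x, y)


rising_low = [1, 2, 3, 4]        # toy profiles over four time points
rising_high = [10, 20, 30, 40]   # same shape, ten-fold higher level
falling = [4, 3, 2, 1]

print(pcc_distance(rising_low, rising_high))
print(pcc_distance(rising_low, falling))
```

A matrix of such distances could then serve as input to a hierarchical clustering routine.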
One group, cluster A, showed a very distinctive drop in sRNA levels in the flower stage. This group had a strong bias towards 24-mer sRNAs and a low strand bias. We identified 21 loci with similarity to retroelements (putative LTR/Gypsy and Copia long-terminal repeat) among the 28 member loci of this group. We also noticed that potential methylation sites are highly overrepresented in the genomic sequences in this group. Altogether, the sRNAs in cluster A may silence retroelements in cis, most likely inducing transcriptional silencing by DNA methylation as is common for this class of sRNA-producing loci. It is intriguing that silencing pressure on this group of retroelements appears to be relaxed particularly in flower tissue while cells undergo meiosis in the anthers and ovules. It is known, however, that retroelements may be reactivated very specifically under certain conditions, e.g. as a result of stress or in pollen (Grandbastien et al., 2005; Pérez-Hormaeche et al., 2008; Slotkin et al., 2009). In addition to retroelements, cluster A also contained two loci with homology (allowing one mismatch) to the known tomato miRNA sly-miR1916, which we had previously reported to be expressed at higher levels in mature fruits than in flowers or leaves (Moxon et al., 2008a).
In contrast, another group of 28 sRNA loci, designated cluster B, showed an almost steady decline in sRNA expression levels, particularly from buds through to early fruit stages. The sRNAs produced by members of this group were highly enriched for 24-mers and there was a much more pronounced strand bias in sRNA production than in cluster A. BLAST and Pfam searches (Finn et al., 2010) revealed that one of its members shows homology to β-1,3-glucanase, an enzyme that has been implicated in pathogen defence and shown to be induced prior to radicle penetration of the endosperm during tomato seed germination (Wu et al., 2001; Leubner-Metzger, 2003). The majority of the reads from this locus were 24 nt and did not show strand bias. This group also contained a locus with significant homology to AP2-domain containing transcription factors. These belong to a large multigene family, including APETALA2 in Arabidopsis, which are known to be involved in floral and seed development and are themselves regulated by miRNAs (Chen, 2004; Shigyo et al., 2006; Ohto et al., 2009). Reads from this locus were also dominated by 24 nt sequences with the dominant reads antisense to the mRNA. The predominance of 24 nt sequences suggests that sRNAs from both loci are produced through the RDR2/DCL3 pathway and are therefore likely to silence these loci in cis. A known tomato miRNA, sly-miR1919, also belongs to this expression-profile cluster.
Another cluster, cluster C, comprising 41 sRNA loci, was characterised by a more abrupt change in sRNA expression levels, which dropped markedly with the onset of fruit development. One of the member loci is 87% similar to the tomato unigene SGN-U601727, which is annotated as a homologue of EIN4, a member of a putative ethylene receptor family in Arabidopsis (Hua et al., 1998). Reads from this locus were mainly 24 nt with the dominant read antisense to the mRNA. Another member of cluster C exhibited homology to genes in the R1 family of late blight resistance proteins from wild potato (Kuang et al., 2005). Reads from this locus were mainly 21 nt and were generated from both strands indicating that they were not degradation products of the mRNA but likely to be products of a polymerase activity. Furthermore, five members of the cluster matched the miR167 family of known plant miRNAs (allowing up to two mismatches).
Cluster D (made up of 13 sRNA loci) had an expression profile similar to cluster C in that there was a clear distinction between higher levels of sRNA expression in bud and flower stages and decreased levels during fruit development. However, the drop in sRNA expression between flower and early fruit was much more pronounced than in cluster C. At the same time, the sRNA populations produced from this group are more complex than in cluster C. In addition, complexity increased dramatically after the flower stage, indicating that the high expression levels in buds and flowers are mostly due to specific sRNAs that are highly expressed at those stages. Clusters C and D also differed in their sRNA size-class composition, with cluster D showing a clear bias towards the production of 22-mer sRNAs. Interestingly, the shift to a more diverse sRNA population in early fruit development was accompanied by a shift from 22-mer to 24-mer sRNAs in cluster D loci. These findings indicate that some sRNA-generating loci can be processed by different sRNA biogenesis pathways at different times throughout a developmental process. We found the sequence motif CTG, a putative methylation site, to be highly enriched in members of this cluster, indicating that RNA-directed DNA methylation might be a possible mode of action of the sRNAs produced at these sites. Three sRNA loci in cluster D show homology to EIN3, an F-box domain ethylene receptor implicated in flower senescence/abscission and fruit ripening (Tieman et al., 2001). Reads from these loci were mainly 22-mers and showed no strand bias, suggesting that they were produced by DCL2 from an RDR-generated dsRNA. Yet another locus matched a tomato unigene, SGN-U583299 (97% similarity), encoding a fasciclin-like arabinogalactan family protein, which is a known regulator at many developmental stages in plants (Ellis et al., 2010). Reads from this locus were mainly antisense to the mRNA and predominantly 22 nt.
Intriguingly, we also found a group of sRNA loci with potential links to chloroplast genes. This group, designated cluster F, shows a striking up-regulation of sRNA levels at the mature green fruit stage, just before the fruit turns red in the breaker stage. Although sRNA levels decrease again, they generally stay above those in earlier stages. This change in expression levels again coincides with a change in the diversity of the sRNA population. A drop in the complexity measure indicates that some specific sRNAs are becoming predominant in this cluster at the mature green and later stages. BLAST analysis revealed that seven of the 24 member loci of this cluster cover known chloroplast genes: two loci have matches to Photosystem I P700 chlorophyll a apoprotein A1, one locus matches chloroplast ATP synthase subunit c, four loci match ycf2, an essential gene of unknown function (Drescher et al., 2000), and three loci have matches to the chloroplast DNA-directed RNA polymerase subunits alpha or beta. All these matches exhibit more than 95% similarity to the chloroplast genome (AM087200) and to BAC sequences. Due to the incomplete genome assembly it is not possible to establish whether these regions are present on both nuclear and chloroplast genome or only on the chloroplast. Interestingly, one locus alone spanned a 56-kb region that contained the apoprotein A1, the RNA polymerase subunit alpha and ycf2, with sRNAs deriving from all of the respective coding sequences. This region includes the large interspersed inverted repeat that is common to all chloroplast genomes (Kahlau et al., 2006). In addition, this cluster contains one locus with homology to the known miRNAs miR159 and miR319. Overall, cluster F is characterised by a bias towards 24-mer sRNAs and a strong strand bias but no clear enrichment for putative target motifs for RNA directed DNA methylation. Because of the potential link to the chloroplast, we analysed this cluster in more detail. 
To rule out that these are short degradation products rather than bona-fide sRNAs, we produced plots of sRNA production along the length of each member locus, which revealed distinct hot spots that are unlikely to be the result of random degradation (data not shown). Furthermore, we noticed a bias towards adenosine as the first base at the 5′-end of sRNAs in this cluster (Figure S10a), which is a known feature of 24 nt heterochromatin sRNAs in plants, although the bias is less pronounced than previously described (Havecker et al., 2010). In addition, we mapped all cluster F sRNAs to an assembled and annotated version of the tomato chloroplast (AM087200). The sRNAs mapped to gene and intergenic regions, as well as to rRNA and tRNA genes (Figure S10b). While sRNAs mapping to all genomic features showed the same overall expression profile as cluster F as a whole (Figure S10c,d), their biogenesis appears to be different: there was a pronounced strand-bias for sRNAs mapping to tRNA and rRNA genes but no bias in sRNAs mapping to coding and intergenic regions, which also differed in their sRNA size class distribution (Figure S10e,f). Based on this we can rule out that the sRNA reads mapping to genes are degradation products.
While it is preferable to analyse sRNA expression levels as biological units as done above, the unfinished character of the tomato genome meant that we probably miss many sRNAs that might show interesting expression patterns but cannot be assigned to genomic loci. In addition, finding biologically relevant sRNA locus boundaries is not a perfectly solved problem and we may ignore interesting expression patterns if sRNAs from unrelated precursors are incorrectly grouped. For this reason we extended the analysis of expression profile clusters to individual sRNAs but limited the analysis to those sequences that had a normalised abundance >150 in at least one of the 10 time points. We obtained 12 clusters and analysed their sRNA size-class distribution and any available functional annotation for some of the most striking patterns (Figures 8 and S11; and Table S4). Two clusters (G and M) were validated by northern blot analysis of 14 and 10 sRNAs, respectively, and Figures S12 and S13 show that sRNAs belonging to the same cluster indeed have very similar expression patterns. This method also identified locus cluster D described above (cluster G). The 353 sRNA members of this group include miRNAs 166 and 167. Next we selected clusters of sRNAs with expression profile changes in early fruit development and found six particularly striking examples (clusters H–M). All of these clusters shared the property of relatively low expression levels in both bud and flower and a sharp rise of sRNA production in the first fruit stage with a plateau of high expression at one or more stages of fruit development before the breaker stage. The clusters in this category had between 18 and 155 sRNAs, including miRNAs 156 and 157.
We also found clusters that had their most pronounced expression level changes in breaker stage or late fruit ripening, such as cluster O with a sharp rise at breaker stage and a very heavy bias towards 24-mer sRNAs, or cluster Q, which showed moderate expression levels throughout the time series but a very sharp peak shortly before the final ripening stage.
Intriguingly, most clusters overall had a clear bias towards one sRNA size class. For example, cluster I showed a clear dominance of 25-mer sRNAs, followed by 20-mers as the second most abundant group. Cluster K on the other hand (which showed a sharp rise in sRNA production at early stages of fruit development, followed by a steady decline) was heavily biased towards 24- and 23-mers. More than half of sRNAs belonging to cluster M, in contrast, were of the 20-mer class. It is worth pointing out that the sRNAs that form those clusters are only linked by expression profiles, not by sequence similarity. Therefore, the co-clustering of size classes is further evidence for highly specific and independent regulation of the various sRNA biogenesis pathways available to plants during fruit development.
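The size-class composition of a cluster can be summarised with a short tally, sketched below. The example weights each sequence by its abundance, which is an assumption; counting distinct sequences instead would be an equally valid choice. The sequences are placeholders chosen only for their lengths.

```python
from collections import defaultdict


def size_class_fractions(counts):
    """Fraction of total abundance contributed by each sRNA length."""
    totals = defaultdict(float)
    for seq, abundance in counts.items():
        totals[len(seq)] += abundance
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("cluster contains no reads")
    return {length: value / grand_total for length, value in sorted(totals.items())}


# Placeholder sequences: one 21-mer and two 24-mers with toy abundances
cluster = {"A" * 21: 30, "C" * 24: 60, "G" * 24: 10}
fractions = size_class_fractions(cluster)
print(fractions)
```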
Our knowledge of the sRNAome in plants is largely based on single time point studies in Arabidopsis (Rajagopalan et al., 2006; Kasschau et al., 2007). The main conclusion from these studies is that only a very small fraction of the sRNAs are miRNAs and ta-siRNAs and only these two classes of sRNAs regulate mRNAs. The large majority of the plant sRNAs are heterochromatic siRNAs and in Arabidopsis these sRNAs are almost exclusively generated from transposon and repeat regions but not from or around protein coding genes (Kasschau et al., 2007). We have presented here a comprehensive analysis of a plant sRNAome during a developmental process, in this case fruit development. The first important finding is that there are many genomic regions that give rise to differentially expressed sRNAs during fruit development and only a small fraction of these sRNAs are miRNAs. Furthermore, many of these loci show shared characteristics when grouped by their expression profiles, indicating that they are not random background noise. There are many examples for miRNAs’ role in development (for a review see Chen, 2009) but differential expression of heterochromatin siRNAs was only demonstrated in pollen (Slotkin et al., 2009). In that case, transposon regions in the vegetative nucleus are revealed by transcription and the generated sRNAs are transferred into the sperm cells to silence mainly the Athila retrotransposons. However, we have very little understanding about the function of most sRNAs in somatic cells. Our results indicate that at least in tomato, many other sRNAs (in addition to miRNAs and ta-siRNAs) are differentially expressed and therefore may play a role in development. We also found that in contrast to Arabidopsis, 24 nt sRNAs that are not strand biased (therefore look like heterochromatin siRNAs) are produced from protein coding genes or from regions that are similar to protein coding genes. 
The other important finding of our study is that sRNA expression profiles, when observed over a prolonged period of time in developing organs, are not chaotic; instead, they exhibit a high degree of order that points to an involvement of sRNAs in processes throughout fruit development.
Phases in sRNA expression profiles
Several of our analyses clearly revealed discernible phases in sRNA expression patterns. For example, the 5000 most abundant sRNAs naturally grouped the time points into flowering, fruit growth phase and ripening stages. These stages had strikingly different sRNA complexity indices, a measure for the diversity of the sRNA population, which also reflects the involvement of different biogenesis pathways. The size-class composition of our sRNA samples confirmed the differences between the three phases in that there were higher percentages of 24-mers in the first phase, contrasted by an increased percentage of 21-mer sRNAs in the late fruit stages. Intriguingly, 21-mer sRNAs are often associated with the more precisely excised sRNA species such as miRNAs and ta-siRNAs, which fits well with the observation that the increase in the 21-mer percentage coincides with a drop in the sRNA complexity measure. Together, our findings point to different sRNA biogenesis pathways contributing differently to overall sRNA production throughout fruit development. Such a degree of coordination likely indicates an important role for sRNAs in regulating the processes that lead to a ripe fruit. This finding is of particular interest given the economic significance of tomatoes as one of the most important fruit crops worldwide.
Housekeeping sRNA loci?
One of the main problems in analysing expression profiles of sRNAs is the fact that the biological unit of sRNAs is not as obvious as a distinct transcript or splice variant from a gene locus. While the precise mature products of miRNA loci may be treated like mRNA transcripts, most of the plant sRNAome is diverse and has low individual read counts, which makes expression profile analysis unfeasible for many mature sRNAs. Grouping of sRNAs by their genomic locus of origin is therefore often applied to this kind of analysis (Schwach et al., 2009). We found that some sRNA-producing loci can be remarkably constant throughout a long developmental process. On the one hand, this indicates that these sRNA loci are not involved in regulating genes specifically in fruit development. On the other hand, it raises the possibility that there are ‘housekeeping sRNAs’ that are kept within a narrow range of steady-state levels in the cell. One possible explanation for this would be that they derive from transcripts whose expression is tightly controlled and these sRNAs could influence the activity of their region of origin in a negative feed-back loop mechanism.
Identification of co-regulated sRNAs and sRNA-producing loci
We used the sRNA locus approach to find regions of sRNA production that share similar expression profiles throughout fruit development and may therefore be part of regulatory networks that react to the same triggers and/or regulate targets that are involved in related processes. We have shown that such clusters are not only readily identifiable but that most of them also show co-clustering of features such as enrichment for methylation target motifs, strand bias, or sRNA size class composition. In addition, many clusters exhibited prominent changes in expression levels at key stages of the flowering and fruit development process. Furthermore, there was generally good agreement between our sRNA expression profiles and hybridisation experiments performed for validation from biological replicates. It is not surprising though, that there are also cases where northern blot and sequence counts do not correlate well (e.g. miR319; Figure 4), because hybridisation probes may bind to immobilised RNA samples with mismatches and variants of sRNAs with slightly differing lengths may not always be distinguishable on a northern blot.
Links between sRNA co-expression clusters and fruit development
Grouping sRNA producing loci by their sRNA expression profiles and selecting patterns with pronounced changes during the time course of our experiment identified a striking number of links to proteins with known involvement in flower and fruit development. Such links were established either by retrieving the annotation of the genomic loci in question or performing additional similarity searches to identify putative gene products. Given that the sRNAs match the genomic regions perfectly, we would expect them to originate at these loci but also to be potential regulators of their locus of origin either by transcriptional or post-transcriptional silencing. In many cases the loci show homology to other genes, although these are clearly distinct loci and the sRNAs do not perfectly map to the homologous genes. However, it is possible that sRNAs would act post-transcriptionally in-trans on the homologous mRNAs when they are antisense to each other. We must point out that we cannot formally rule out that some of the sRNAs derived from gene (or any transcribed) regions are simply degradation products of transcripts rather than Dicer products. However, we believe that random degradation is unlikely to account for the presence of the gene-derived sRNAs in our study. Firstly, all loci with links to genes of known function presented here either had no strand bias or the majority of reads mapped antisense to the mRNA transcript. In contrast, degradation products of mRNAs should be highly strand-biased and map to the sense strand. In addition, the uneven match patterns and pronounced size-class distributions of the sRNAs in most loci further increase our confidence in them being bona-fide Dicer products. The final proof for sRNA biogenesis at the loci we describe can only come from experiments with Dicer deficient mutants.
Among the most interesting sRNA loci were three loci that show homology to two ethylene-responsive factors, EIN3 and EIN4. We do not know yet how the sRNAs are produced from these loci but the size class distribution and lack of strand bias suggest that an RDR generates dsRNA from the primary transcript, which is processed by DCL2. The role of DCL2 in producing endogenous sRNAs is not clear and these loci could be rare examples, although they need to be validated using a dcl2 mutant. The function of these sRNAs is not known but it is likely that they would be capable of regulating their region of origin in-cis or even other EIN3 and EIN4 mRNAs in-trans, thus contributing to complex regulatory networks in developmental processes. In some cases, this regulation may help to establish or focus localised expression of genes, a role fulfilled by sRNAs in plants (Chitwood et al., 2007; Kawashima et al., 2009).
Cluster F provides some particularly interesting leads to its putative function in fruit development. It includes sRNAs mapping to several photosynthesis-related genes and sRNAs in this group showed a peak of expression at the stage that marks the end of photosynthetic activity and the transition from chloro- to chromoplasts in the ripening fruit. While we are certain that these are indeed bona-fide sRNAs, we cannot be certain where in the cell they originate. The genes and genomic features they map to, among them a 56 kb fragment, are normally part of chloroplast genomes. However, due to the unfinished nature of the tomato genome we cannot formally rule out that large chunks of the chloroplast genome may have integrated into the nuclear genome in tomato. Few eukaryotic sRNAs have been reported that could derive from the bacteria-like chloroplast genome but those are poorly characterised (Lung et al., 2006; Morin et al., 2008). No components of the RNA silencing machinery are known to be present in the chloroplast and it is generally assumed that viroids capable of replicating in chloroplasts give rise to sRNAs only while passing through the cytosol (Martínez de Alba et al., 2002; Bolduc et al., 2010). If we assume that cluster F sRNAs are chloroplast-derived and/or have the ability to target chloroplast loci, then they would not only be able to regulate a component of the light-harvesting complex but could also impact chloroplast gene expression at a high level by targeting a subunit of the plastid-encoded DNA-dependent RNA-polymerase (PEP). While many chloroplast housekeeping genes are transcribed by a nuclear-encoded RNA polymerase (NEP), PEP has been reported to predominantly transcribe the photosynthesis-related genes (Hajdukiewicz et al., 1997). Considering these potential targets and the timing of the pronounced peak of sRNA expression in the developmental series, an involvement of cluster F in one of the key stages of tomato fruit development seems likely.
What could be the mode of action of these chloroplast-related sRNAs? DNA methylation was shown to be unchanged in chloroplasts between early and late fruit stages and therefore seems an unlikely candidate (Marano and Carrillo, 1991). Intriguingly, an in-depth study of the tomato transcriptome and proteome during fruit ripening revealed that transcriptional regulation is mostly responsible for global reductions in plastid gene expression prior to or at the onset of fruit formation whereas post-transcriptional events become predominant during chloroplast-to-chromoplast conversion at the breaker stage (Marano and Carrillo, 1992; Kahlau and Bock, 2008). These post-transcriptional events did not lead to significant reductions in mRNA levels, although mRNA stability was reported to be modestly affected for some genes (Marano and Carrillo, 1992). Therefore, if cluster F sRNAs are involved in this process, their mode of action could be translational inhibition with a slight effect on mRNA stability. Although more frequently associated with animal miRNAs, some plant sRNAs have also been shown to regulate their targets in this way (Brodersen et al., 2008). A further twist to the story may come from ycf2 (ORF2280), a large gene of unknown but essential function that is covered by four loci in cluster F. This gene was shown to give rise to several protein products of very different size, of which the largest (170 kDa) disappears between small green and mature green tomato fruit stages (Richards et al., 1994). Although highly speculative at this stage, it is plausible that sRNAs could be involved in the as yet unidentified processing steps that lead to the different sizes of the products of ycf2 and influence their expression patterns.
Targets of miR-W* may provide a link to an economically important trait
We identified two unigene transcripts that are targeted by miR-W*. SGN-U573791 is similar to membrane bound ATPases and SGN-U585460 is annotated as a membrane bound glutamate permease. It is very rare that a plant miRNA targets two mRNAs that do not belong to the same family. One exception is miR-395, which regulates the expression of ATP sulphurylases and a low affinity sulphate transporter (Kawashima et al., 2009). Although these targets are in different gene families they are all part of the sulphur metabolism network. It is tempting to propose that the two targets of miR-W* are also involved in the same process. Based on their annotations it is reasonable to hypothesise that these genes act together in ATP dependent glutamate transport. It is well established that glutamate is accumulated in tomato fruit during ripening (Mounet et al., 2007) contributing to the ‘umami taste’ of tomato (Bellisle, 1999). The increasing glutamate level could be explained by an increased activity of these genes after the breaker stage. Coincidentally, MIR-W gene expression is down-regulated at breaker, which would allow an increased accumulation of the transcripts of these target genes during ripening. This raises the possibility that miRNAs are involved in the ripening process not just by regulating transcription factors such as Colourless non-ripening (Cnr; Moxon et al., 2008a) but through the regulation of other cellular processes such as small-molecule transport.
By profiling short RNAs during an extended time period we obtained a snapshot of sRNA involvement in tomato fruit development and identified targets for further research, including genes that are known to be relevant for fruit development and appear to be originators and potential targets for sRNA-mediated regulation. Time course analyses like this will become more and more important to fully understand the complex regulatory potential of RNA silencing pathways, especially in organisms with highly diverse sRNAomes such as plants.
Solanum lycopersicum cv. MicroTom plants were grown in a growth chamber at 24°C/18°C (day/night) with 12 h of illumination per day. Flower and fruit tissues were collected and grouped into 10 samples. The flower tissues were put into two groups: closed flower bud and open flower. The green fruits were classified based on size rather than time from anthesis because it is more reproducible in different growing conditions. We created three non-overlapping classes during the growing phase: fruits with diameter between 1–3, 5–7 and 11–14 mm. These were followed by the mature green stage that represents the full size green fruit. The next stage is breaker when the first yellowish colour appears on the fruit. This stage was followed by three more groups: breaker plus 3, 5 or 7 days to capture the fruit ripening process.
Total RNA was extracted from the 10 samples using the mirVana kit (http://www.Ambion.com) and was used for sRNA library generation as we previously described (Szittya et al., 2008). The libraries were sequenced on a Genome Analyser II (Illumina) by BaseClear (http://baseclear.com/). Each library was sequenced on a separate lane without using bar coding. Northern blot analysis was carried out as previously described by Pilcher et al. (2007). 5′-RACE assay was carried out as described by Pantaleo et al. (2010) using RNA from flower and fruit tissues.
Expression levels and mapping of sRNAs
Sequencer read counts were used as expression levels for each unique sRNA sequence and normalised to total sample size as described in Mortazavi et al. (2008). The patman software (Prüfer et al., 2008) was used to map sRNAs (full-length, no mismatches) to version 405 of the preliminary tomato genome obtained from the SOL Genomics Network (Mueller et al., 2005; ftp://ftp.sgn.cornell.edu/tomato_genome/bacs). Expression levels of mapped sRNAs were weighted by the number of genome matches as described in Mosher et al. (2008). Mapped sRNAs were grouped into loci as described in Moxon et al. (2008b). Briefly, sRNAs mapping to the genome with <200 bp distance were pooled and regions were kept for further analysis if they contained at least 10 mapped sRNA reads. Expression levels for sRNA loci were calculated as the sum of weighted and normalised expression levels of their constituent sRNAs. Genome browser images showing sRNA alignments to regions of the tomato genome were produced using the UEA sRNA toolkit (Moxon et al., 2008b). Accession number of short RNA dataset: GEO accession GSE18110.
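The normalisation, weighting and locus-grouping steps described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. It assumes a per-million scaling factor (the text only states normalisation to total sample size), that weighting means dividing by the number of genome matches, and that the <200 bp distance is measured from the end of the current locus to the start of the next hit. All function names and the input layout are hypothetical.

```python
def normalise(count, library_total, scale=1_000_000):
    """Scale a raw read count to a common library size (scale is an assumption)."""
    return count * scale / library_total


def weight(expression, genome_matches):
    """Divide expression by the number of genome matches (assumed weighting)."""
    return expression / genome_matches


def group_into_loci(hits, max_gap=200, min_reads=10):
    """Pool sRNA hits on one chromosome into loci.

    hits: list of (start, end, raw_reads) tuples; a hypothetical input layout.
    A hit joins the current locus when its start lies less than max_gap bp
    beyond the locus end; loci with fewer than min_reads reads are dropped.
    """
    loci = []
    current = None
    for start, end, reads in sorted(hits):
        if current is not None and start - current["end"] < max_gap:
            # Extend the current locus and accumulate its read count.
            current["end"] = max(current["end"], end)
            current["reads"] += reads
        else:
            if current is not None:
                loci.append(current)
            current = {"start": start, "end": end, "reads": reads}
    if current is not None:
        loci.append(current)
    return [locus for locus in loci if locus["reads"] >= min_reads]
```

In this sketch the expression level of a locus would be obtained by summing `weight(normalise(...), ...)` over its constituent sRNAs, mirroring the description in the text.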
Differential expression and clustering of sRNAs/sRNA loci
To identify suitable candidates for clustering by expression profile, differentially expressed sRNAs and sRNA loci were identified at each transition between time points using an ‘offset fold change’ method. In brief: an offset of 20 was added to sRNA or sRNA locus normalised counts before calculating log-ratios of the counts in order to determine changes in expression level between time points. In essence, the offset serves as a background correction for low expression level counts and deliberately underestimates the magnitude of differential expression where the absolute read counts are low, thus avoiding false positives in the less reliable range of sRNA read counts. At higher counts, the impact of the offset on log-ratios becomes negligible (Figure S14). We compared this method to two alternative methods: ‘unusual ratio’ (Draghici, 2003) and ‘modified SAM’ (Fahlgren et al., 2009) and found a good agreement between all methods in the intersection of differentially expressed loci (74.7%). Formulas for all definitions of differential expression are given in Data S1 online.
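The effect of the offset can be seen in a short Python sketch. The offset of 20 is taken from the text; the base-2 logarithm and the function names are assumptions, and the exact definition used in this study is the one given in Data S1.

```python
import math


def offset_log_ratio(count_a, count_b, offset=20.0):
    """Log-ratio of two normalised counts after adding a fixed offset.

    Base 2 is an assumption; the text does not state the base of the logarithm.
    """
    return math.log2((count_b + offset) / (count_a + offset))


def transition_log_ratios(profile, offset=20.0):
    """Offset log-ratios for each transition between consecutive time points."""
    return [
        offset_log_ratio(a, b, offset)
        for a, b in zip(profile, profile[1:])
    ]
```

A fourfold change from 2 to 8 normalised reads gives an offset log-ratio well below 0.5, whereas the same fourfold change from 2000 to 8000 reads gives a value close to 2, so low-count changes are deliberately damped while high-count changes are almost unaffected.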
Differentially expressed sRNAs and sRNA loci, respectively, were then grouped into clusters, using the Pearson Correlation Coefficient as measure of similarity between expression profiles. Agglomerative hierarchical clustering was used to select the number of clusters, and the k-means clustering method (Xu and Wunsch, 2009) was then applied to obtain the final clustering by expression profile.
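As an illustration of clustering by expression profile, the following Python sketch assigns profiles to clusters using the Pearson correlation coefficient as similarity. It is a simplified stand-in rather than the implementation used here: the hierarchical step that selects the number of clusters is omitted, the starting centroids are supplied by the caller, centroids are plain means of the member profiles, and all names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either profile is flat."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def kmeans_by_correlation(profiles, start_centroids, iterations=20):
    """k-means-style clustering with Pearson correlation as similarity.

    Returns one cluster label per profile. The number of clusters equals
    the number of starting centroids (assumed to be chosen beforehand).
    """
    centroids = [list(c) for c in start_centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: pick the most highly correlated centroid.
        labels = [
            max(range(len(centroids)), key=lambda k: pearson(p, centroids[k]))
            for p in profiles
        ]
        # Update step: each centroid becomes the mean profile of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels
```

Because correlation ignores absolute abundance, profiles with the same shape but different expression levels fall into the same cluster, which is the property the similarity measure is chosen for.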
To identify suitable background sets of sRNA loci for the enrichment analyses of sequence motifs and strand bias in clustered sRNA loci, we applied an entropy measure described in detail in Data S1 online.
This work was supported by the BBSRC (grant numbers: BB/G008078/1 and BB/I00016X/1). S. L-G was supported by the Spanish Ministerio de Ciencia e Innovación. We thank Prof. John Wood for help with statistical problems.
Note added to proof:
The new miRNAs have been submitted to miRBase and the following names were assigned: sl-miR-V is sly-MIR5300; sl-miR-W is sly-MIR5301; sl-miR-T is sly-MIR5302; sl-miR-X is sly-MIR5303; sl-miR-Z is sly-MIR5304 and sl-miR-Y is sly-MIR482.