MicroRNAs (miRNAs) represent an important class of sequence-specific, trans-acting endogenous small RNA molecules that modulate gene expression at the post-transcriptional level. They function by binding to partially complementary cis-regulatory sites (miRNA binding sites) in their target mRNAs. Based on two recent observations from plant genome studies, namely that alternative splicing is a common phenomenon and that miRNAs regulate a significant proportion of the transcriptome, we hypothesize that there may be a mechanism for gene regulation that involves both processes. In the present study, we performed a systematic search in the model plant Arabidopsis thaliana using annotated gene models as well as publicly available high-throughput RNA sequencing data with a total of 570 million reads. Of the 354 high-confidence miRNA binding sites identified in Arabidopsis, at least 44 (12.4%) were affected by alternative splicing such that mRNA isoforms of the same miRNA target gene differ in the sequences encoding the miRNA binding sites. By simulation, we found that the frequency of alternative splicing at miRNA binding sites is significantly higher than at other regions. Comparative and functional analyses further indicated that the alternative splicing events are important for target gene expression and miRNA action. Together our results show that alternative splicing of miRNA binding sites is a plausible mechanism for attenuating miRNA-mediated gene regulation.
Temporal and spatial control of transcript abundance for expressed genes is crucial for many biological processes and developmental programs. In eukaryotes, gene expression regulation occurs at both the transcriptional and post-transcriptional levels. Concurrent with transcription, the primary transcripts are recognized and processed by the spliceosome to splice out introns and join exons. Alternative processing of the primary transcripts by use of different splice junctions can produce multiple mRNA isoforms from a single gene. Alternative splicing is increasingly being recognized as a major cellular mechanism generating transcriptome plasticity and proteome diversity in plants (Reddy, 2007). For example, 20–30% of transcripts were found to be alternatively spliced in both Arabidopsis thaliana and rice (Oryza sativa) through large-scale EST-genome alignments (Campbell et al., 2006; Wang and Brendel, 2006). Recently, deep sampling of the transcriptome using high-throughput RNA sequencing (RNA-Seq) has indicated that at least 40% of intron-containing genes in Arabidopsis are alternatively spliced (Filichkin et al., 2010).
At the post-transcriptional level, miRNAs are emerging as an important class of sequence-specific, trans-acting endogenous small RNA molecules that modulate gene expression (Bartel, 2009; Voinnet, 2009). The 20–24 nt mature miRNAs are processed from much longer primary transcripts via stem-loop-structured intermediates (Bartel, 2009). In plants, miRNA processing is performed in the nucleus, mainly by the endonuclease Dicer-like 1 (DCL1) (Papp et al., 2003). The mature miRNA is exported to the cytoplasm and integrated into the RNA-induced silencing complex, where it is used as a guide to recognize mRNA targets through base pairing with miRNA binding sites (MBSs) (Bartel, 2009). Interaction between the miRNA and its targets leads to repression of the target genes through cleavage (Llave et al., 2002; Reinhart et al., 2002) and translational inhibition (Brodersen et al., 2008) of the target transcripts, or miRNA-directed DNA methylation at the target loci (Wu et al., 2010). Currently, hundreds of miRNAs have been annotated in a broad spectrum of plant lineages (Kozomara and Griffiths-Jones, 2011), and dozens have been studied experimentally. It is well-established that many miRNAs are crucial for diverse plant development processes and responses to environmental challenges (Jones-Rhoades et al., 2006; Garcia, 2008; Voinnet, 2009).
We are keen to understand how the seemingly rigid miRNA–target interaction that exists when the miRNA is fully incorporated into the regulatory gene networks continues to evolve to allow fine-tuning of gene activity and adaptation. In the context of multi-layer gene regulation, it is worth noting that MBSs are present in mRNAs and must be transcribed and processed to be functional. Alternative processing of pre-mRNAs can conceivably eliminate or create functional MBSs in different mRNA isoforms of the same gene. Such alternative splicing events may provide a mechanism to bypass the strict constraint at MBSs while at the same time maintaining the interaction between miRNAs and their targets. Thus, global identification and analysis of alternative splicing events associated with MBSs is highly desirable to further understand the biological significance of this form of transcriptomic diversity and the intrinsic complexity of miRNA–target interactions in plants.
In the present study, we performed a series of analyses in the model plant Arabidopsis thaliana to assess alternatively spliced MBSs as a mechanism for regulating gene expression. By genome-wide examination, we identified 44 high-confidence alternative splicing events from annotated gene models and RNA-Seq data that produce mRNA isoforms of the same miRNA target gene that differ in the sequences required for miRNA binding. Comparative and functional studies indicate that these events are important for target gene expression and miRNA function. Together, our results reveal that alternatively spliced MBSs represent a plausible and prevalent mechanism for regulating miRNA–target gene circuits in plants.
Compilation of high-confidence MBSs
To perform a genome-wide survey of alternative splicing events that affect miRNA-based gene regulation in Arabidopsis, we aimed to compile a comprehensive list of valid MBSs. Because plant miRNAs have near-perfect complementarity to their binding sites within the target mRNAs (Schwab et al., 2005; Bartel, 2009), they typically induce target cleavage and produce relatively stable RNA intermediates with a 5′ phosphate (Llave et al., 2002). Cloning these cleavage intermediates based on the 5′ RACE technique has been widely used to validate miRNA targeting and to identify the corresponding MBSs in plants. We first obtained 109 individually validated MBSs from the Arabidopsis Information Resource (TAIR). We next utilized 510 putative MBSs identified from high-throughput sequencing of the 5′ ends of polyadenylated RNAs that are predicted to be compatible with miRNA-mediated mRNA decay (German et al., 2008). Using a recent algorithm specifically designed for predicting plant miRNA targets (Alves et al., 2009), we filtered out MBSs that are not capable of forming stable RNA duplexes with known miRNAs, and obtained 349 high-confidence MBSs. By combining the two sets, we identified a total of 354 non-redundant MBSs for downstream analyses (Figure 1a and Table S1).
Identification of alternative splicing events at MBSs
Conceptually, there are four possible models for alternative splicing to create a functional MBS in one mRNA isoform but not the other (Figure 1b). In model I, this is achieved by use of tandem splice donor or acceptor sites near the MBS; in model II, it is achieved by retention or removal of an intron either encompassing or encompassed in the MBS; in model III, it is achieved by differential initiation or termination; in model IV, it is achieved by an exon cassette or mutually exclusive exons (Figure 1b). Based on these models, we performed a systematic search using two sets of data to identify alternative splicing events relevant to the 354 MBSs of Arabidopsis. The first dataset contains 5885 alternatively spliced genes obtained from TAIR release 10 that produce 13 594 gene models. Mapping both the MBSs and the gene models to the genome revealed alternative splicing events that affect 14 MBSs. The second dataset involves publicly available high-throughput RNA-Seq data generated from 12 libraries with over 570 million reads (Table S2). Utilizing this dataset, we identified 30 additional MBSs affected by alternative splicing, bringing the total number of cases to 44 (Figure 1a). The corresponding miRNAs include members of the most conserved families (e.g. miR156) and those that have so far only been found in Arabidopsis (e.g. miR864) (Table S1). Thus, the data indicate that alternatively spliced MBSs are an experimentally supported phenomenon affecting a significant portion (12.4%) of the known miRNA target genes in Arabidopsis.
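The core test behind all four models can be sketched as follows: an isoform retains the MBS only if its spliced sequence still contains the binding-site sequence. This is a minimal illustration on a toy locus, not the pipeline used in the study; the genome string, coordinates, isoform names, and helper functions are all hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open coordinates on the toy genome

# Hypothetical toy locus: exon 1, an 11 nt intron, exon 2.
EXON1, INTRON, EXON2 = "ATGGCTGAGA", "GTAAGTTTCAG", "CTCTTCACTGA"
genome = EXON1 + INTRON + EXON2

# Hypothetical binding-site sequence that spans the exon 1/exon 2 junction,
# analogous to a site split across two exons.
site = EXON1[-4:] + EXON2[:4]


def spliced_sequence(genome: str, exons: List[Exon]) -> str:
    """Concatenate exon sequences in genomic order to form the mRNA."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def isoforms_lacking_site(genome: str,
                          isoforms: Dict[str, List[Exon]],
                          site: str) -> List[str]:
    """Return names of isoforms whose spliced sequence lacks the site."""
    return sorted(name for name, exons in isoforms.items()
                  if site not in spliced_sequence(genome, exons))


isoforms = {
    "iso1_spliced": [(0, 10), (21, 32)],     # intron removed
    "iso2_intron_retained": [(0, 32)],       # model II: intron retained
}
lacking = isoforms_lacking_site(genome, isoforms, site)
```

A sequence-level check is used here rather than a coordinate overlap test because a site that spans an exon junction exists only in the spliced product. A real analysis would also need to handle strand and near-matches.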
We use the DCL1 gene (At1g01040) as an example to illustrate the effect of pre-mRNA splicing on a functional MBS. The annotated DCL1 gene model (DCL1-1) contains 20 exons and possesses a functional binding site for miR162 that is split across exons 12 and 13 (Figure 2a,b) (Xie et al., 2003). Analysis of RNA-Seq data revealed that intron 12 is spliced in multiple ways (Figure 2a). Removal of the 84 nt intron generates DCL1-1 (Figure 2b). The presence of specific junction reads revealed another isoform (DCL1-2) that is derived from utilizing a wobble splice donor site within intron 12 and that differs from DCL1-1 by a GUU trinucleotide (Figure 2a–c). The sequencing reads also indicate retention of the entire intron, resulting in DCL1-3 (Figure 2b). Additionally, we performed RT-PCR to clone a portion of the DCL1 transcript (Figure 2c). By sequencing 19 independent clones (Figure 2d), we discovered a shorter DCL1 isoform that is derived from an in-frame cryptic splice acceptor site located within exon 13, 69 nt downstream of the known splice site of intron 12 (Figure 2b). Inspection of all four DCL1 gene models indicated that only DCL1-1 forms an mRNA:miRNA duplex that is compatible with miR162 targeting (Figure 2e). In previous studies (Xie et al., 2003; German et al., 2008), the 5′ ends of cleaved DCL1 mRNA resulting from miRNA-guided slicing all mapped to DCL1-1 (Figure 2e). Together, these results indicate that pre-mRNA processing generates sequence polymorphisms around the miR162 binding site in DCL1 such that only DCL1-1 possesses a functional MBS.
Alternative splicing of the MBS in non-coding TAS genes
Most plant MBSs are located in the open reading frame (ORF), which creates a constraint on splicing. To evaluate the effect of alternative splicing of MBSs without the constraint of an ORF, we examined the TAS genes, which produce non-coding transcripts. Eight TAS genes have been identified in Arabidopsis that each produce a series of trans-acting siRNAs after miRNA-directed cleavage of the primary transcript (Allen et al., 2005; Chen et al., 2007). RNA-Seq data indicate that intron removal results in exclusion of MBSs for TAS1A and TAS2, indicating that MBSs in non-coding genes are also subject to regulation by alternative splicing.
TAS1A (At2g27400) encodes a 930 nt transcript (TAS1A-1) that contains a functional binding site for miR173 (Yoshikawa et al., 2005; Montgomery et al., 2008). RNA-Seq data conclusively show that this gene is subject to alternative splicing (Figure 3a). Based on specific junction reads, we propose that 572 and 594 nt introns can be spliced out from TAS1A-1, resulting in TAS1A-2 and TAS1A-3, respectively (Figure 3b). Neither TAS1A-2 nor TAS1A-3 contains the MBS (Figure 3c). Indeed, scanning the sequences of all three TAS1A isoforms revealed no plausible binding site for miR173 except the original MBS in TAS1A-1 (Figure 3d). In contrast to the protein-coding gene DCL1 (Figure 2d), RT-PCR and sequence analysis revealed that TAS1A-2, an isoform without an MBS, is predominant in all organ types examined (Figure 3e). Similar results were also obtained when TAS2 was analyzed (Figure S1). These results indicate that, in the absence of an ORF, alternative splicing is a major determinant of the relative abundance of transcripts competent for miRNA-based regulation.
Frequency of alternative splicing increases at MBSs
We reasoned that increased frequency of alternative splicing at MBSs, relative to other transcribed regions of the genome, is indicative of selection and hence the functional importance of this form of gene regulation. To test this, we first randomly extracted 354 mRNA segments (the same number as the number of MBSs) that were 21 nt long (a typical length for MBSs), and identified splicing events that coincide with these sequences. For this analysis, we did not consider intron retention for the RNA-Seq data (corresponding to three cases of alternatively spliced MBSs) because of potential ambiguity (Figure S2). After repeating the process 1000 times, we found that the simulated frequency of splicing events affecting the random RNA segments forms a normal distribution, with mean and standard deviation of 27.1 and 3.96, respectively. The observed number of alternatively spliced MBSs (41) is significantly (P <0.001) greater than the mean of the simulated data (Figure 4a).
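The reported significance can be checked from the fitted parameters alone. The sketch below assumes a one-sided test against the fitted normal distribution; the text does not state exactly how the P value was computed, so this is an illustration rather than the authors' procedure, and the function name is hypothetical.

```python
import math


def upper_tail_p(observed: float, mean: float, sd: float):
    """Return (z, p) for observing a value >= `observed` under N(mean, sd^2)."""
    z = (observed - mean) / sd
    # Upper-tail probability of the standard normal via the complementary
    # error function: P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


# Values reported in the text: 41 observed events, simulated mean 27.1, SD 3.96.
z, p = upper_tail_p(41, 27.1, 3.96)
print(f"z = {z:.2f}, one-sided p = {p:.2g}")
```

Under this assumption the observed count lies roughly 3.5 standard deviations above the simulated mean, which is consistent with the reported P < 0.001.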
To rule out the possibility that the elevated alternative splicing frequency at MBSs is caused by an unusually high background of alternative splicing of miRNA target genes, we compared MBSs with flanking mRNA regions. After aligning all MBSs, we used a sliding-window approach to establish a trend line of the frequency of alternative splicing in the regions surrounding MBSs. This analysis clearly shows a decrease of splicing frequency in the regions surrounding the MBSs (Figure 4b), indicating that the observed high frequency of alternative splicing at MBSs is specific to the binding sites. We further analyzed the occurrence of the dinucleotides GT (the almost invariant sequence of splice donor sites) and AG (the almost invariant sequence of splice acceptor sites) in the regions upstream and downstream of MBSs, respectively. This analysis indicated that the frequency of the potential splice sites tracks that of observed splicing events very well (Figure 4c). Together, these results demonstrate an elevated splicing frequency at MBSs in Arabidopsis that correlates with the occurrence of potential splice sites immediately adjacent to the MBSs.
The increased frequency of alternative splicing at the MBSs suggests that splicing events may not be conserved in different species, due to divergence of sequences outside the MBSs. To test this prediction, we obtained and compared DCL1 sequences from 21 plant species. Although exon 13 is highly conserved, the in-frame alternative acceptor site is conserved only in the Brassicaceae and Rosaceae, and not in any other family (Figure S3). By contrast, the wobble donor site in intron 12 is only conserved in the Brassicaceae and Fabaceae (Figure S3). The lengths of intron 12 in all other species, unlike that in Arabidopsis (84 nt), are not multiples of three (data not shown). These introns are therefore unlikely to be retained, because retention would cause a frameshift. Together, these results indicate that specific alternative splicing events at MBSs may have a phylogenetic distribution that is more limited than the MBSs themselves.
Functional implication of an alternatively spliced MBS
The miR156-regulated SPL4 gene encodes a transcription factor that is a member of the SPL (squamosa-promoter binding protein-like) gene family (Guo et al., 2008a). The function of this gene circuit in regulating vegetative to reproductive development has been well characterized in Arabidopsis (Wu and Poethig, 2006; Wang et al., 2009; Wu et al., 2009). We therefore selected SPL4 as an example to demonstrate the role of alternatively spliced MBSs in gene expression and function. There are two annotated gene models from the SPL4 locus. At1g53160.1 (SPL4-1) has two exons and contains an experimentally confirmed miR156 binding site within its 3′ UTR (Figure 5a,b) (Wu and Poethig, 2006). Another annotated gene model, At1g53160.2 (SPL4-3), contains three exons and differs from SPL4-1 immediately downstream of the stop codon. The two mRNA isoforms encode identical proteins but the MBS is spliced out in SPL4-3 (Figure 5a,b). When sequencing the PCR amplicons derived from SPL4 mRNAs, we found a new isoform in which an upstream donor site for the last intron is used. Not only does this splicing event eliminate the MBS, but it also alters the C-terminal portion (last 18 amino acids) of the protein (Figure 5a,b). We have named the new gene model represented by this mRNA isoform SPL4-2. We then scanned the three gene models for possible miR156 binding sites and found only the original one in SPL4-1 (Figure 5c). Thus, alternative splicing produces multiple SPL4 mRNA isoforms with or without the binding site for miR156.
To examine the influence of alternative splicing on the regulation of SPL4, we compared the expression pattern of the three mRNA isoforms with that of miR156 in various organ types. RT-PCR analysis revealed the presence of three isoform-specific products in the seven examined organ types, albeit at varying abundance. SPL4-1 was the predominant isoform, with a higher expression level than SPL4-2 and SPL4-3 in essentially all examined samples (Figure 5d). Consistent with previous reports (Wu and Poethig, 2006), the expression level of SPL4-1 is highest in adult tissues (e.g. rosette and cauline leaf) and lowest in seedling and root. Such a pattern is clearly opposite to that of miR156, whose expression is high in seedling and root and low in adult leaves (Figure 5e). The only exception is the silique, in which miR156 is not expressed. By contrast, the other isoforms (especially SPL4-3) do not exhibit such an opposite expression pattern to miR156. We further compared wild-type and transgenic plants in which miR156 expression is driven by the constitutive CaMV 35S promoter. In the 35S::miR156 plants, the miR156 expression level is elevated in rosette leaves (Wu and Poethig, 2006). However, only SPL4-1 expression is down-regulated, not that of SPL4-2 or SPL4-3 (Figure 5f). These results indicate that alternatively spliced SPL4 transcripts with or without the miR156 binding site are regulated differently by miR156. We therefore conclude that alternative splicing at the MBS constitutes a mechanism for controlling the development-specific expression levels of SPL4.
SPL4 and its paralog SPL3 both contain a functional miR156 binding site (Figure S4a,b) (Wu and Poethig, 2006). However, only SPL4, but not SPL3, has been found to be capable of producing alternative mRNAs that differ in the miR156 binding site. We reasoned that if alternative splicing is important for the regulation of SPL4, its expression pattern should deviate further from that of SPL3 under conditions where miR156 is expressed. To test this hypothesis, we reconstructed the expression profiles of SPL3 and SPL4 during the vegetative to reproductive phase based on published microarray data (Balasubramanian et al., 2006). Because miR156 level decreases in this phase (Wu and Poethig, 2006; Wang et al., 2009; Wu et al., 2009), the steady-state levels of SPL3 and SPL4 both increase (Figure S4c). Importantly, it has been shown that expression of miR156 exhibits circadian regulation (Hazen et al., 2009). However, only SPL3, but not SPL4, shows a moderate clock-dependent expression pattern (Figure S4c), indicating that the endogenous SPL4 isoforms without the miR156 binding site contribute to the distinctive expression pattern of SPL4.
Because of genetic redundancy among the SPL genes, a loss-of-function mutation in SPL4 does not result in obvious development defects (Wu and Poethig, 2006). We therefore chose to demonstrate the importance of the alternative transcripts by a gain-of-function approach. We created transgenic lines over-expressing the three SPL4 isoforms driven by the CaMV 35S promoter (Figure 6a) and assayed their development phenotypes (Figure 6b). Over-expressing SPL4-1 has no significant effect on timing of flowering compared to wild-type (Figure 6b,c), but there is a slight reduction in the number of adult leaves (leaves with abaxial trichome) at flowering (Figure 6d). By contrast, the 35S::SPL4-2 and 35S::SPL4-3 plants whose SPL4 transcripts lack the MBS show an accelerated rate of flowering induction (Figure 6b,c) and significantly fewer adult leaves at bolting (Figure 6d). Thus, constitutive expression of SPL4 transcripts with or without the miR156 binding site resulted in distinguishable phenotypes.
As trans-acting gene regulators, miRNAs function by binding to the complementary MBS in their target mRNAs (Bartel, 2009). MBSs are present in mRNA molecules and are transcribed from DNA. In the present study, we show that mRNA splicing in the model plant Arabidopsis can generate transcript isoforms that differ in the MBS. This process affects a substantial proportion of MBSs (Figure 1a and Table S1), and involves all major types of alternative RNA processing (Figure 1b) (Reddy, 2007). Two observations led us to conclude that the reported cases of alternatively spliced MBSs are an under-estimation. First, the RNA-Seq data used in the current study were derived from limited tissue types and physiological conditions (Table S2), such that genes with low expression or rare isoforms may not be sufficiently covered (e.g. Figure S2). Second, different DCL1 isoforms were recovered by direct amplicon sequencing and whole-genome RNA-Seq (Figure 2d), indicating that neither analysis is exhaustive. miRNAs and miRNA target genes are increasingly being discovered in plants, and alternative splicing of mRNA sequences encoding functional MBSs has the potential to significantly enhance the regulatory complexity of miRNA-mediated gene networks.
A series of intriguing questions arose following the initial discovery. First, is alternative splicing of MBSs a regulated event or simply a coincidence between two separate molecular processes? MBSs are parts of mRNA molecules, so it is perhaps not surprising that some are alternatively spliced by chance. However, our results show that the frequency of alternative splicing events encompassing an MBS is significantly higher than for other portions of the transcriptome (Figure 4a). Within the miRNA target genes, the frequency of alternative splicing specifically increases at the MBS (Figure 4b). Importantly, we show that the increased frequency of alternative splicing tracks the occurrence of potential splice sites surrounding the MBS (Figure 4c). These results collectively suggest a scenario in which the presence of an MBS causes changes to local sequences, increasing the possibility of alternative mRNA splicing. Future experiments aimed at detecting the biochemical interactions between the nucleus-located RNA-induced silencing complex and components of the spliceosome (Bayne et al., 2008; Ohrt et al., 2008) should help to fully elucidate this potential new genetic mechanism.
Second, will alternatively spliced MBSs influence regulation of the corresponding miRNA target genes? Because plant MBSs are often located in the coding regions, it is important to distinguish effects on the encoded protein from effects on the MBS when assessing the influence of alternative splicing on miRNA target genes. This was one of the reasons that we chose SPL4, for which two mRNA isoforms (SPL4-1 and SPL4-3) encode an identical protein but differ with respect to the presence or absence of an miR156 binding site (Figure 5). We found that alternatively spliced SPL4 transcripts with or without the miR156 binding site are regulated differently by miR156 at the post-transcriptional level in a development-specific manner (Figure 5 and Figure S4). We also found that increasing the expression level of the SPL4 transcripts with or without the miR156 binding site by transgenic means resulted in distinguishable phenotypes (Figure 6). Together with the finding that at least two TAS genes are alternatively spliced at the MBS (Figure 3 and Figure S1), our results indicate that alternative splicing of MBSs constitutes a mechanism to specifically attenuate miRNA-based regulation of target genes.
Third, what is the influence of alternatively spliced MBSs on the function of miRNAs? In addition to birth and death of miRNA genes, MBSs may conceivably serve as an important determinant for the evolutionary dynamics of miRNA–target interactions. On the one hand, the short length of MBSs may have given them the opportunity to mutate and change specificity rapidly during evolution. On the other hand, many MBSs are deeply conserved across plant lineages (Axtell and Bowman, 2008), suggesting that they are under strong purifying selection. An investigation of paralogous gene families targeted by miRNAs in rice supports this view (Guo et al., 2008a). Conserved MBSs in human are also under stronger purifying selection than surrounding sequences (Chen and Rajewsky, 2006). Compared to mutation in either the MBS or the miRNA, alternative processing of the MBS at the mRNA level provides a more flexible mechanism by which an miRNA target gene may attenuate rigid miRNA regulation but at the same time maintain its competence for such regulation. The additional layer of regulation conferred by alternative splicing may link spliceosome activity to the regulation of certain miRNA–target interactions. A predicted advantage of this mechanism is that coordinated regulation by miRNAs and splicing of the target genes could allow plants to integrate multiple input signals to fine-tune target transcript abundance for precise and timely execution of developmental programs.
Finally, what is the implication of alternative splicing in the evolution of miRNA–target gene circuits? Given that many plant MBSs are located in the coding regions, selection at MBSs should have substantially influenced the evolution of miRNA target genes and hence the biological processes involving these genes. Our work shows that one such influence is increased occurrence of mRNA splicing sites near MBSs to generate transcript isoforms that bypass miRNA regulation (Figure 4). Consistent with reports from cross-species EST alignments (Wang et al., 2008), we show that the alternative splicing events affecting MBSs may be family- or species-specific (Figure S3). This means that mRNA processing could increase divergence in the miRNA-mediated gene networks beyond the presence of new miRNA genes across plant lineages. Our finding further implies that genetic changes in other parts of the gene where the selective constraints may be weaker (e.g. synonymous mutations that create/eliminate functional splice sites) may be used to modulate selected MBSs. The many possible ways to affect MBSs by alternative splicing (e.g. in-frame cryptic splice sites) suggest that this mechanism may constitute a means to differentially regulate orthologous genes that underlie numerous developmental processes. The divergence brought about by this mechanism may conceivably provide a genetic basis for the extensive physiological and ecological diversity among plants.
Taken together, our results indicate that alternative splicing of MBSs is a plausible and prevalent mechanism for regulating gene expression in Arabidopsis. Further identification of alternatively spliced MBSs in other plants is therefore important to fully elucidate the biological significance and intrinsic complexity of this form of gene regulation. Such studies will provide new insights into the gene networks that integrate both transcriptional and post-transcriptional regulation. Perhaps more importantly, these efforts may create opportunities to integrate different genetic components both within the genome (e.g. mRNA processing and miRNA function) and within the genes (MBSs and sequence changes that create/eliminate splice sites) to provide a holistic view of gene expression programs that underpin development and responses to environmental challenges in diverse plant species.
Set II miRNA target genes were assessed as previously described (Alves et al., 2009) to identify high-confidence MBSs. Briefly, the program RNAhybrid (Rehmsmeier et al., 2004; Kruger and Rehmsmeier, 2006) was used to predict energetically plausible miRNA:mRNA duplexes with plant-specific constraints: (i) perfect base pairing of the miRNA:mRNA duplex from nucleotides 8–12 counting from the 5′ end of the miRNA, (ii) loops and bulges no longer than one nucleotide in either strand, and (iii) end overhangs no longer than two nucleotides. In addition, G:U wobble base pairs were not treated as mismatches, but were considered to contribute less favorably to the overall free energy. The putative MBSs were further filtered using the ratio between the minimum free energy of the identified miRNA:mRNA duplexes and the minimum duplex energy of the miRNA when bound by perfectly matched targets as calculated in RNAhybrid. A cut-off value for minimum free energy/minimum duplex energy of 0.70 was used (Alves et al., 2009). Of the 510 set II MBSs, 349 were retained after the filtering steps. These MBSs were combined with the 109 set I MBSs to create a non-redundant collection of 354 high-confidence MBSs.
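The energy-ratio step of this filter can be illustrated with a short sketch. It assumes that duplexes with a ratio at or above 0.70 are retained (whether the cut-off is inclusive is not stated), and it takes the two energies as given rather than calling RNAhybrid. The record type, field names, and example energies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Duplex:
    mirna: str
    target: str
    mfe: float          # kcal/mol for the observed miRNA:mRNA duplex (negative)
    perfect_mfe: float  # kcal/mol for the miRNA with a perfectly matched target


def passes_energy_filter(d: Duplex, cutoff: float = 0.70) -> bool:
    """Keep a duplex if it reaches `cutoff` of the perfect-match energy."""
    if d.perfect_mfe >= 0:
        return False  # a non-negative reference energy makes the ratio meaningless
    return d.mfe / d.perfect_mfe >= cutoff


# Hypothetical energies chosen only to show one pass and one fail.
candidates = [
    Duplex("miR_A", "gene_1", mfe=-36.0, perfect_mfe=-42.0),  # ratio ~0.86
    Duplex("miR_A", "gene_2", mfe=-25.0, perfect_mfe=-42.0),  # ratio ~0.60
]
retained = [d.target for d in candidates if passes_energy_filter(d)]
```

Because both energies are negative, a larger ratio means the observed duplex is closer in stability to the perfect-match duplex.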
Identification of new splicing events from RNA-Seq data
The RNA-Seq data utilized in this study include 12 independent libraries with more than 570 million trimmed and filtered reads (Table S2). To identify new junction reads for the annotated gene models, TopHat, a spliced read mapping tool designed to align RNA-Seq reads to a reference genome without relying on known splice sites (Trapnell et al., 2009), was used. Using the built-in short-read alignment tool Bowtie (Langmead et al., 2009) in TopHat, more than 350 million reads were mapped to the Arabidopsis genome, yielding a mean coverage depth of 105 reads per base for each DNA strand. Five parameters were set to facilitate the mapping of reads to annotated Arabidopsis gene models: (i) maximal intron length of 2000 bases, as the vast majority of introns (99.5%) of annotated Arabidopsis gene models are smaller than 1200 bases; (ii) a maximum of two mismatches in read alignments; (iii) no mismatch in spliced alignments; (iv) a minimum of eight matched bases for a junction read at either end; (v) use of the classic splicing motifs GT-AG and GC-AG only. For intron retention, we only considered introns that were completely covered by mapped sequencing reads.
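Three of these criteria (maximal intron length, minimum anchor length, and canonical splice motifs) can be expressed as a simple acceptance check on a candidate junction. The sketch below is not TopHat and does not reproduce its options; it is a hypothetical helper for a plus-strand toy sequence, and the mismatch criteria are omitted.

```python
# Donor/acceptor dinucleotide pairs accepted as splice motifs.
CANONICAL_MOTIFS = {("GT", "AG"), ("GC", "AG")}


def junction_supported(genome: str, intron_start: int, intron_end: int,
                       left_anchor: int, right_anchor: int,
                       max_intron: int = 2000, min_anchor: int = 8) -> bool:
    """Check a plus-strand junction; the intron is genome[intron_start:intron_end].

    left_anchor and right_anchor are the numbers of read bases matched on
    either side of the junction.
    """
    intron = genome[intron_start:intron_end]
    if not 0 < len(intron) <= max_intron:
        return False
    if min(left_anchor, right_anchor) < min_anchor:
        return False
    return (intron[:2], intron[-2:]) in CANONICAL_MOTIFS


# Hypothetical toy locus: 10 nt exon, 11 nt GT...AG intron, 11 nt exon.
toy_genome = "ATGGCTGAGA" + "GTAAGTTTCAG" + "CTCTTCACTGA"
ok = junction_supported(toy_genome, 10, 21, left_anchor=10, right_anchor=11)
short_anchor = junction_supported(toy_genome, 10, 21, left_anchor=5, right_anchor=11)
bad_motif = junction_supported(toy_genome, 11, 21, left_anchor=10, right_anchor=11)
```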
Comparison of alternative splicing frequency at MBSs and other genome regions
To compare MBSs with other transcribed regions, a pool of 50 000 21 nt sequences was randomly generated from all gene models in TAIR release 10, excluding pseudogenes. These short sequences were searched against all annotated gene models and the RNA-Seq data to identify those that are affected by alternative splicing. A subset of 354 short sequences was then randomly selected from the pool, and the number of sequences affected by alternative splicing was calculated. After repeating the process 1000 times, a histogram of the frequency of alternatively spliced sequences was determined. A curve-fitting approach was used to determine the best-fitted normal distribution curve according to the equation:
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$
where μ, the mean value, was determined to be 27.1, and σ, the standard deviation, was determined to be 3.96.
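The resampling procedure described above can be sketched in a few lines. The number of affected sequences in the pool is not reported, so the value used here is a hypothetical placeholder, and the mean and standard deviation are estimated directly from the simulated counts rather than by curve fitting. The output is therefore illustrative and is not expected to reproduce the reported parameters.

```python
import random
import statistics

POOL_SIZE = 50_000     # 21 nt sequences in the pool (from the text)
SAMPLE_SIZE = 354      # sequences drawn per iteration (from the text)
N_ITER = 1000          # number of iterations (from the text)
N_AFFECTED = 3_828     # hypothetical count of pool sequences affected by splicing

# 1 marks a pool sequence affected by alternative splicing, 0 an unaffected one.
pool = [1] * N_AFFECTED + [0] * (POOL_SIZE - N_AFFECTED)


def simulate_counts(pool, sample_size, n_iter, seed=0):
    """Draw `sample_size` sequences without replacement, `n_iter` times,
    and return the number of affected sequences in each draw."""
    rng = random.Random(seed)
    return [sum(rng.sample(pool, sample_size)) for _ in range(n_iter)]


counts = simulate_counts(pool, SAMPLE_SIZE, N_ITER)
mu_hat = statistics.mean(counts)      # moment estimate of the mean
sigma_hat = statistics.stdev(counts)  # moment estimate of the standard deviation
print(f"mean = {mu_hat:.1f}, SD = {sigma_hat:.2f}")
```

Fixing the seed makes the run repeatable; fitting a normal curve to the histogram, as described in the text, would replace the last two estimates.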
To compare MBSs to other portions of the miRNA target transcripts, the transcript sequences were stacked and aligned at the MBS. Starting from the MBS, sliding windows (window size = 21 nt, step = 10 nt) were applied in both directions. In each window, the number of sequences affected by alternative splicing was obtained as percentage of the total sequences in that window. In each window, occurrence of the dinucleotides GT and AG was also recorded. The numbers were then normalized against the window at a symmetric position on the opposite side of the MBS.
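The windowing scheme can be illustrated on a single transcript. The sketch below generates 21 nt windows at 10 nt steps on both sides of the MBS start and counts a dinucleotide in each window. The function names and the toy sequence are hypothetical, and the normalization against the symmetric window is not included.

```python
def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count occurrences of a dinucleotide, allowing overlaps."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def sliding_windows(seq: str, mbs_start: int, window: int = 21, step: int = 10):
    """Return {offset: window sequence}, where offset is the window start
    relative to the first base of the MBS. Only full-length windows are kept."""
    windows = {}
    first = -(mbs_start // step) * step  # most upstream offset that stays in range
    for offset in range(first, len(seq), step):
        start = mbs_start + offset
        if start + window <= len(seq):
            windows[offset] = seq[start:start + window]
    return windows


# Hypothetical transcript: GT-rich upstream, a 21 nt site, AG-rich downstream.
transcript = "GT" * 10 + "A" * 21 + "AG" * 10
windows = sliding_windows(transcript, mbs_start=20)
gt_upstream = {off: count_dinucleotide(w, "GT") for off, w in windows.items() if off < 0}
ag_downstream = {off: count_dinucleotide(w, "AG") for off, w in windows.items() if off > 0}
```

In the full analysis these per-window counts would be summed over all aligned target transcripts before comparison with the splicing frequencies.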
The Arabidopsis plants used in this study were in the Columbia background. Seeds were grown on MS medium (Sigma-Aldrich, http://www.sigmaaldrich.com) and incubated at 4°C in the dark for 4 days, after which they were exposed to continuous white light (approximately 150 μmol m−2 sec−1) at 22°C. Seedlings were transferred to soil and maintained under continuous light until collection of plant organs at various development stages as indicated. Plant age was measured from the time seeds were exposed to light. Flowering time represents the time taken for the first open flower to appear. To generate transgenic plants over-expressing SPL4 isoforms, SPL4-1, SPL4-2 and SPL4-3 cDNA was PCR-amplified using Phusion DNA polymerase (New England Biolabs, http://www.neb.com) using the primers listed in Table S3. PCR products were cloned into the 35S-pKANNIBAL vector between the XhoI and SpeI restriction sites and sequenced. Plants were transformed with the sequence-confirmed constructs using the standard floral dipping method (Clough and Bent, 1998). Transformants were selected on MS medium containing 20 mg L−1 BASTA (bioWORLD, http://www.bio-world.com). T3 generation plants homozygous for individual transgenes were identified by PCR analysis of genomic DNA and used for all experiments.
Total RNA was extracted using TRIzol reagent (Invitrogen, http://www.invitrogen.com/) and treated with RNase-free DNase I. For RT-PCR analysis, total RNA was reverse-transcribed using SuperScript II reverse transcriptase (Invitrogen). The resultant cDNA was PCR-amplified using Phusion DNA polymerase. PCR products were cloned into the pCR4Blunt-TOPO vector (Invitrogen) and sequenced. Quantitative PCR analysis of the reverse-transcribed cDNA was performed using the ABI 7500 system and Power SYBR Green PCR master mix (Applied Biosystems, http://www.appliedbiosystems.com/). Actin7 was used as an endogenous control to normalize the relative expression level. At least three independent experiments were performed for each amplicon, and the data were analyzed using the ABI 7500 system SDS software (Applied Biosystems). Primer sequences are listed in Table S3. Low-molecular-weight RNA blotting was performed as previously described (Yang et al., 2011; Zhang et al., 2011). Probe sequences are listed in Table S3.
We thank Dr R. Scott Poethig for kindly providing the 35S::miR156 seed. We also thank Yan Hu and Robert Arthur for technical assistance. This work was supported by a grant from the US National Science Foundation Plant Genome Program (DBI-0922526) to L.L.