Alternative mRNA processing increases the complexity of microRNA-based gene regulation in Arabidopsis

Authors


(fax +1 434 982 5626; e-mail ll4jn@virginia.edu).

Summary

MicroRNAs (miRNAs) represent an important class of sequence-specific, trans-acting endogenous small RNA molecules that modulate gene expression at the post-transcriptional level. They function by binding to partially complementary cis-regulatory sites (miRNA binding sites) in their target mRNAs. Based on two recent observations from plant genome studies, namely that alternative splicing is common and that miRNAs regulate a significant proportion of the transcriptome, we hypothesize that there may be a mechanism of gene regulation that involves both processes. In the present study, we performed a systematic search in the model plant Arabidopsis thaliana using annotated gene models as well as publicly available high-throughput RNA sequencing data comprising a total of 570 million reads. Of the 354 high-confidence miRNA binding sites identified in Arabidopsis, at least 44 (12.4%) were affected by alternative splicing, such that mRNA isoforms of the same miRNA target gene differ in the sequences encoding the miRNA binding sites. By simulation, we found that the frequency of alternative splicing at miRNA binding sites is significantly higher than at other regions. Comparative and functional analyses further indicated that these alternative splicing events are important for target gene expression and miRNA action. Together, our results show that alternative splicing of miRNA binding sites is a plausible mechanism for attenuating miRNA-mediated gene regulation.

Introduction

Temporal and spatial control of transcript abundance for expressed genes is crucial for many biological processes and developmental programs. In eukaryotes, gene expression regulation occurs at both the transcriptional and post-transcriptional levels. Concurrent with transcription, the primary transcripts are recognized and processed by the spliceosome to splice out introns and join exons. Alternative processing of the primary transcripts by use of different splice junctions can produce multiple mRNA isoforms from a single gene. Alternative splicing is increasingly being recognized as a major cellular mechanism generating transcriptome plasticity and proteome diversity in plants (Reddy, 2007). For example, 20–30% of transcripts were found to be alternatively spliced in both Arabidopsis thaliana and rice (Oryza sativa) through large-scale EST-genome alignments (Campbell et al., 2006; Wang and Brendel, 2006). Recently, deep sampling of the transcriptome using high-throughput RNA sequencing (RNA-Seq) has indicated that at least 40% of intron-containing genes in Arabidopsis are alternatively spliced (Filichkin et al., 2010).

At the post-transcriptional level, miRNAs are emerging as an important class of sequence-specific, trans-acting endogenous small RNA molecules that modulate gene expression (Bartel, 2009; Voinnet, 2009). The 20–24 nt mature miRNAs are processed from much longer primary transcripts via stem-loop-structured intermediates (Bartel, 2009). In plants, miRNA processing is performed in the nucleus, mainly by the endonuclease Dicer-like 1 (DCL1) (Papp et al., 2003). The mature miRNA is exported to the cytoplasm and incorporated into the RNA-induced silencing complex, where it is used as a guide to recognize mRNA targets through base pairing with miRNA binding sites (MBSs) (Bartel, 2009). Interaction between the miRNA and its targets leads to repression of the target genes through cleavage (Llave et al., 2002; Reinhart et al., 2002) and translational inhibition (Brodersen et al., 2008) of the target transcripts, or through miRNA-directed DNA methylation at the target loci (Wu et al., 2010). Currently, hundreds of miRNAs have been annotated in a broad spectrum of plant lineages (Kozomara and Griffiths-Jones, 2011), and dozens have been studied experimentally. It is well established that many miRNAs are crucial for diverse plant developmental processes and for responses to environmental challenges (Jones-Rhoades et al., 2006; Garcia, 2008; Voinnet, 2009).

Recent studies indicate that lineage-specific miRNAs have continuously emerged during evolution (Rajagopalan et al., 2006; Fahlgren et al., 2007; Molnar et al., 2007; Zhu, 2008; Yang et al., 2011). The new miRNAs, once incorporated into gene regulatory networks through the formation of new MBSs, are thought to generate new gene circuits that expand the scope of miRNA-mediated cellular processes (Rajagopalan et al., 2006; Fahlgren et al., 2007). The many miRNAs that are highly conserved across plant lineages (Axtell and Bowman, 2008) usually regulate homologous targets at identical MBSs in every species in which they are found. The MBSs have also been shown to be subject to strong purifying selection (Guo et al., 2008b). Based on these observations, it has been argued that genetic changes resulting in beneficial miRNA–target interactions are maintained, while non-productive or deleterious changes continue to drift or are purged (Chen and Rajewsky, 2006; Axtell and Bowman, 2008).

We are keen to understand how the seemingly rigid miRNA–target interaction that exists once the miRNA is fully incorporated into regulatory gene networks continues to evolve to allow fine-tuning of gene activity and adaptation. In the context of multi-layer gene regulation, it is worth noting that MBSs are present in mRNAs and must be transcribed and processed to be functional. Alternative processing of pre-mRNAs can conceivably eliminate or create functional MBSs in different mRNA isoforms of the same gene. Such alternative splicing events may provide a mechanism to bypass the strict constraint at MBSs while still maintaining the interaction between miRNAs and their targets. Thus, global identification and analysis of alternative splicing events associated with MBSs is highly desirable to further understand the biological significance of this form of transcriptomic diversity and the intrinsic complexity of miRNA–target interactions in plants.

In the present study, we performed a series of analyses in the model plant Arabidopsis thaliana to assess alternatively spliced MBSs as a mechanism for regulating gene expression. By genome-wide examination, we identified 44 high-confidence alternative splicing events from annotated gene models and RNA-Seq data that produce mRNA isoforms of the same miRNA target gene that differ in the sequences required for miRNA binding. Comparative and functional studies indicate that these events are important for target gene expression and miRNA function. Together, our results reveal that alternatively spliced MBSs represent a plausible and prevalent mechanism for regulating miRNA–target gene circuits in plants.

Results

Compilation of high-confidence MBSs

To perform a genome-wide survey of alternative splicing events that affect miRNA-based gene regulation in Arabidopsis, we aimed to compile a comprehensive list of valid MBSs. Because plant miRNAs have near-perfect complementarity to their binding sites within the target mRNAs (Schwab et al., 2005; Bartel, 2009), they typically induce target cleavage and produce relatively stable RNA intermediates with a 5′ phosphate (Llave et al., 2002). Cloning these cleavage intermediates by 5′ RACE has been widely used to validate miRNA targeting and to identify the corresponding MBSs in plants. We first obtained 109 individually validated MBSs from the Arabidopsis Information Resource (TAIR). We next used 510 putative MBSs identified by high-throughput sequencing of the 5′ ends of polyadenylated RNAs and predicted to be compatible with miRNA-mediated mRNA decay (German et al., 2008). Using a recent algorithm specifically designed for predicting plant miRNA targets (Alves et al., 2009), we filtered out those MBSs that are not capable of forming stable RNA duplexes with known miRNAs, and obtained 349 high-confidence MBSs. By combining the two sets, we identified a total of 354 non-redundant MBSs for downstream analyses (Figure 1a and Table S1).
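The combination step can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the `MBS` record, the rule that two sites are redundant only if they share gene, miRNA and coordinates, and the `is_stable` callable (a hypothetical stand-in for a duplex-stability predictor such as the Alves et al. (2009) algorithm) are all assumptions.

```python
from typing import Callable, Iterable, List, NamedTuple


class MBS(NamedTuple):
    """A miRNA binding site in transcript coordinates (hypothetical record)."""
    gene: str   # target gene identifier
    mirna: str  # miRNA family name
    start: int  # 0-based start of the site on the mRNA
    end: int    # exclusive end of the site on the mRNA


def merge_sites(
    validated: Iterable[MBS],
    candidates: Iterable[MBS],
    is_stable: Callable[[MBS], bool],
) -> List[MBS]:
    """Union of validated sites and stability-filtered candidate sites.

    Sites sharing gene, miRNA and coordinates are treated as redundant;
    the first occurrence is kept, so validated sites take precedence.
    """
    merged = {}
    for site in validated:
        merged.setdefault((site.gene, site.mirna, site.start, site.end), site)
    for site in candidates:
        # Keep only candidates predicted to form a stable miRNA:mRNA duplex
        if is_stable(site):
            merged.setdefault((site.gene, site.mirna, site.start, site.end), site)
    return list(merged.values())
```

In practice, overlapping but not identical coordinates from the two sources would need a tolerance-based comparison rather than exact matching.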

Figure 1.

 Identification of alternatively spliced MBSs in the model plant Arabidopsis.
(a) Numbers of experimentally supported MBSs identified in the Arabidopsis genome (all MBSs) and alternatively spliced MBSs.
(b) Models for identifying alternatively spliced MBSs. Based on mRNA-to-genome alignment, MBSs are differentially spliced in model I by use of tandem splice donor or acceptor sites, in model II by retention or removal of an intron, in model III by differential initiation or termination, and in model IV by a cassette exon or mutually exclusive exons.

Identification of alternative splicing events at MBSs

Conceptually, there are four possible models by which alternative splicing can create a functional MBS in one mRNA isoform but not in another (Figure 1b). In model I, this is achieved by use of tandem splice donor or acceptor sites near the MBS; in model II, by retention or removal of an intron that either encompasses or is encompassed by the MBS; in model III, by differential initiation or termination; and in model IV, by a cassette exon or mutually exclusive exons (Figure 1b). Based on these models, we performed a systematic search using two datasets to identify alternative splicing events relevant to the 354 MBSs of Arabidopsis. The first dataset contains 5885 alternatively spliced genes from TAIR release 10 that produce 13 594 gene models. Mapping both the MBSs and the gene models to the genome revealed alternative splicing events that affect 14 MBSs. The second dataset consists of publicly available high-throughput RNA-Seq data generated from 12 libraries with over 570 million reads (Table S2). Using this dataset, we identified 30 additional MBSs affected by alternative splicing, bringing the total number of cases to 44 (Figure 1a). The corresponding miRNAs include members of the most conserved families (e.g. miR156) as well as miRNAs that have so far been found only in Arabidopsis (e.g. miR864) (Table S1). Thus, the data indicate that alternatively spliced MBSs are an experimentally supported phenomenon affecting a significant portion (44 of 354, 12.4%) of the known MBSs in Arabidopsis.
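Whatever the model, the criterion shared by all four is whether the intact MBS sequence survives in each mature isoform. The sketch below checks this by splicing exon coordinates and searching for the site; it is an assumption-laden illustration, not the mapping procedure used here. It handles plus-strand genes only (minus-strand genes would need reverse complementing), takes the MBS as its sense-strand DNA sequence, and does not assign events to models I to IV. All function names are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # 0-based, half-open genomic interval


def splice(genome: str, exons: List[Exon]) -> str:
    """Join exon sequences in transcript order (plus-strand genes only)."""
    return "".join(genome[start:end] for start, end in exons)


def mbs_presence(genome: str, isoforms: Dict[str, List[Exon]], mbs_seq: str) -> Dict[str, bool]:
    """For each isoform, report whether the intact MBS sequence is present.

    Searching the spliced sequence (rather than the genome) correctly handles
    sites that span an exon-exon junction.
    """
    return {name: mbs_seq in splice(genome, exons) for name, exons in isoforms.items()}


def is_alternatively_spliced(presence: Dict[str, bool]) -> bool:
    """True if the MBS is intact in some isoforms but not in others."""
    return len(set(presence.values())) == 2


# Toy locus: exon 1, an 8-nt intron, exon 2; the site spans the exon junction
genome = "AAAACCCC" + "GTAAAAAG" + "GGGGTTTT"
isoforms = {
    "spliced": [(0, 8), (16, 24)],
    "intron_retained": [(0, 24)],
}
presence = mbs_presence(genome, isoforms, "CCCCGGGG")
print(presence)  # {'spliced': True, 'intron_retained': False}
print(is_alternatively_spliced(presence))  # True
```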

We use the DCL1 gene (At1g01040) as an example to illustrate the effect of pre-mRNA splicing on a functional MBS. The annotated DCL1 gene model (DCL1-1) contains 20 exons and possesses a functional binding site for miR162 that is split across exons 12 and 13 (Figure 2a,b) (Xie et al., 2003). Analysis of RNA-Seq data revealed that intron 12 is spliced in multiple ways (Figure 2a). Removal of the 84 nt intron generates DCL1-1 (Figure 2b). Specific junction reads revealed another isoform (DCL1-2) that arises from use of a wobble splice donor site within intron 12 and that differs from DCL1-1 by a GUU trinucleotide (Figure 2a–c). The sequencing reads also indicated retention of the entire intron, resulting in DCL1-3 (Figure 2b). Additionally, we performed RT-PCR to clone a portion of the DCL1 transcript spanning the MBS (Figure 2c). By sequencing 19 independent clones (Figure 2d), we discovered a shorter DCL1 isoform that is derived from an in-frame cryptic splice acceptor site located within exon 13, 69 nt downstream of the known acceptor site of intron 12 (Figure 2b). Inspection of all four DCL1 gene models indicated that only DCL1-1 forms an mRNA:miRNA duplex compatible with miR162 targeting (Figure 2e). Consistent with this, in previous studies (Xie et al., 2003; German et al., 2008), the 5′ ends of cleaved DCL1 mRNA resulting from miRNA-guided slicing all mapped to DCL1-1 (Figure 2e). Together, these results indicate that pre-mRNA processing generates sequence polymorphisms around the miR162 binding site in DCL1 such that only DCL1-1 possesses a functional MBS.
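Mapping cleavage-fragment 5′ ends to an isoform rests on a simple positional rule: canonical miRNA-guided slicing falls between the target nucleotides paired with miRNA positions 10 and 11, so the 3′ fragment begins at the nucleotide paired with position 10. The sketch below, with hypothetical function names and an assumed zero-nucleotide tolerance by default, counts read 5′ ends that support slicing at a given site. Read positions must be expressed in the coordinates of the isoform being tested; because the isoforms differ around the MBS, the same read can support one isoform and not another.

```python
from typing import Iterable


def expected_cleavage_5prime(site_start: int, mirna_len: int) -> int:
    """0-based mRNA position where the 3' cleavage fragment should begin.

    The site occupies [site_start, site_start + mirna_len) on the mRNA, and
    miRNA position i (1-based, from its 5' end) pairs with mRNA position
    site_start + mirna_len - i. Slicing between the nucleotides paired with
    miRNA positions 10 and 11 leaves the 3' fragment starting at the
    nucleotide paired with position 10.
    """
    return site_start + mirna_len - 10


def supporting_reads(read_5prime_ends: Iterable[int], site_start: int,
                     mirna_len: int, tolerance: int = 0) -> int:
    """Count read 5' ends within `tolerance` nt of the expected cleavage position."""
    expected = expected_cleavage_5prime(site_start, mirna_len)
    return sum(1 for pos in read_5prime_ends if abs(pos - expected) <= tolerance)


# Hypothetical 21-nt site starting at position 100 of an isoform
print(expected_cleavage_5prime(100, 21))                   # 111
print(supporting_reads([111, 111, 112, 50], 100, 21))      # 2
print(supporting_reads([111, 111, 112, 50], 100, 21, 1))   # 3
```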

Figure 2.

 Alternative splicing of the miR162 binding site in DCL1.
(a) RNA-Seq reads mapped to the genomic region surrounding the MBS. Chromosome coordinates are shown above. Reads mapped to a single locus and those split into two loci are shown as gray and black horizontal bars, respectively. The copy number of reads retrieved more than once is indicated.
(b) Proposed splicing models of intron 12 in the DCL1 pre-mRNA. Exon and intron sequences of the annotated DCL1 gene model are shown in upper- and lower-case letters, respectively. The MBS is shown in bold upper-case letters. The two splice donor sites, which are separated by 3 nt, and the two acceptor sites, which are separated by 69 nt, are indicated by arrows.
(c) Partial gene structure of the four identified DCL1 isoforms. Exons and introns are shown as boxes and horizontal lines, respectively. The positions of the primers used for obtaining amplicons flanking the MBS are indicated by the horizontal arrows.
(d) Comparison of the results from RNA-Seq and amplicon sequencing. DCL1 amplicons were obtained by RT-PCR analysis of seedlings using the primers shown in (c).
(e) Predicted RNA duplex between miR162 and the four DCL1 isoforms. Base pairing is indicated by a short vertical line and G:U wobble pairing by a circle. Mapped 5′ ends of DCL1 mRNA resulting from miR162-guided cleavage are indicated by vertical arrows. Numbers are taken from Xie et al. (2003) and German et al. (2008).

Alternative splicing of the MBS in non-coding TAS genes

Most plant MBSs are located in the open reading frame (ORF), which constrains alternative splicing at these sites. To evaluate the effect of alternative splicing of MBSs without the constraint of an ORF, we examined the TAS genes, which produce non-coding transcripts. Eight TAS genes have been identified in Arabidopsis, each of which produces a series of trans-acting siRNAs after miRNA-directed cleavage of the primary transcript (Allen et al., 2005; Chen et al., 2007). RNA-Seq data show that, for TAS1A and TAS2, intron removal excludes the MBS, suggesting that MBSs in non-coding genes are also subject to regulation by alternative splicing.

TAS1A (At2g27400) encodes a 930 nt transcript (TAS1A-1) that contains a functional binding site for miR173 (Yoshikawa et al., 2005; Montgomery et al., 2008). RNA-Seq data clearly show that this gene is subject to alternative splicing (Figure 3a). Based on specific junction reads, we propose that introns of 572 and 594 nt can be spliced out of TAS1A-1, resulting in TAS1A-2 and TAS1A-3, respectively (Figure 3b). Neither TAS1A-2 nor TAS1A-3 contains the MBS (Figure 3c). Indeed, scanning the sequences of all three TAS1A isoforms revealed no plausible binding site for miR173 other than the original MBS in TAS1A-1 (Figure 3d). In contrast to the protein-coding gene DCL1 (Figure 2d), RT-PCR and sequence analysis revealed that TAS1A-2, an isoform without an MBS, is predominant in all organ types examined (Figure 3e). Similar results were obtained for TAS2 (Figure S1). These results indicate that, in the absence of an ORF, alternative splicing is a major determinant of the relative abundance of transcripts competent for miRNA-based regulation.
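Scanning an isoform for plausible binding sites can be illustrated with a simplified, ungapped scoring scheme: a mismatch costs 1, a G:U wobble costs 0.5, penalties at miRNA positions 2 to 13 are doubled, and windows scoring at or below 4 are reported. These weights and the cutoff are assumptions in the spirit of commonly used plant target-scoring rules; this is not the Alves et al. (2009) algorithm used in the study, and it ignores bulges and gaps. The miRNA sequence below is an illustrative toy sequence.

```python
from typing import List, Tuple

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_penalty(mirna_nt: str, target_nt: str) -> float:
    """0 for a Watson-Crick pair, 0.5 for a G:U wobble, 1 for a mismatch."""
    pair = (mirna_nt, target_nt)
    if pair in WATSON_CRICK:
        return 0.0
    if pair in WOBBLE:
        return 0.5
    return 1.0


def duplex_score(mirna: str, site: str, doubled: Tuple[int, int] = (2, 13)) -> float:
    """Penalty for an ungapped miRNA:site duplex (both RNA, written 5'->3').

    miRNA position 1 pairs with the 3'-most site nucleotide, so the site is
    read in reverse. Penalties at positions within `doubled` count twice.
    """
    if len(mirna) != len(site):
        raise ValueError("miRNA and site must have equal length")
    score = 0.0
    for pos, (m, t) in enumerate(zip(mirna, reversed(site)), start=1):
        penalty = pair_penalty(m, t)
        if doubled[0] <= pos <= doubled[1]:
            penalty *= 2
        score += penalty
    return score


def scan_transcript(mirna: str, transcript: str, cutoff: float = 4.0) -> List[Tuple[int, float]]:
    """Return (0-based start, score) for every window scoring at or below cutoff."""
    k = len(mirna)
    hits = []
    for start in range(len(transcript) - k + 1):
        score = duplex_score(mirna, transcript[start:start + k])
        if score <= cutoff:
            hits.append((start, score))
    return hits


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[nt] for nt in reversed(rna))


toy_mirna = "UGACAGAAGAGAGUGAGCAC"  # illustrative 20-nt sequence
toy_transcript = "AAAAA" + reverse_complement(toy_mirna) + "CCCCC"
hits = scan_transcript(toy_mirna, toy_transcript)
print(hits)  # includes (5, 0.0), the perfectly complementary site
```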

Figure 3.

 Alternative splicing of the miR173 binding site in TAS1A.
(a) RNA-Seq reads mapped to the genomic region surrounding the MBS. Chromosome coordinates are shown above. Reads mapped to a single locus and those split into two loci are shown as gray and black horizontal bars, respectively. The copy number of reads retrieved more than once is indicated.
(b) Proposed splicing model of TAS1A. The MBS is shown in upper-case letters. The optional splice donor site and the two acceptor sites are indicated by arrows.
(c) Gene structure of the three identified TAS1A isoforms. Exons and introns are shown as boxes and horizontal lines, respectively. The positions of the primers used for obtaining amplicons flanking the MBS are indicated by the horizontal arrows.
(d) Predicted RNA duplex of miR173 and TAS1A-1. Base pairing is indicated by a short vertical line and G:U wobble pairing by a circle.
(e) Developmental expression profiles of TAS1A-1 and TAS1A-2 for which amplicons were obtained by RT-PCR analysis using primers indicated in (c). Sequencing indicates that TAS1A-3 was not amplified. Actin7 was used as the loading control.

Frequency of alternative splicing increases at MBSs

We reasoned that an increased frequency of alternative splicing at MBSs, relative to other transcribed regions of the genome, would indicate selection and hence the functional importance of this form of gene regulation. To test this, we first randomly extracted 354 mRNA segments (the same number as the number of MBSs) that were 21 nt long (a typical length for MBSs), and identified splicing events that coincide with these sequences. For this analysis, we did not consider intron retention in the RNA-Seq data (corresponding to three cases of alternatively spliced MBSs) because of potential ambiguity (Figure S2). After repeating the process 1000 times, we found that the simulated number of splicing events affecting the random RNA segments approximately follows a normal distribution, with a mean and standard deviation of 27.1 and 3.96, respectively. The observed number of alternatively spliced MBSs (41, i.e. the 44 cases excluding the three intron-retention cases) is significantly greater than the mean of the simulated data (P < 0.001; Figure 4a).
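The resampling test can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: each transcript is represented as a hypothetical per-nucleotide list of flags marking positions affected by alternative splicing, and windows are drawn uniformly over all possible 21-nt positions. It reports both the normal-approximation z-score and an empirical P-value; with 1000 iterations the empirical value cannot fall below about 0.001, which is why a normal fit is useful for extreme observations.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def count_affected(profiles: Sequence[Sequence[bool]], n_segments: int,
                   seg_len: int, rng: random.Random) -> int:
    """Draw n_segments random windows and count those overlapping a splice event."""
    eligible = [p for p in profiles if len(p) >= seg_len]
    # Weight transcripts by their number of windows so every window is equally likely
    weights = [len(p) - seg_len + 1 for p in eligible]
    chosen = rng.choices(eligible, weights=weights, k=n_segments)
    hits = 0
    for profile in chosen:
        start = rng.randrange(len(profile) - seg_len + 1)
        if any(profile[start:start + seg_len]):
            hits += 1
    return hits


def simulate(profiles: Sequence[Sequence[bool]], observed: int, n_segments: int = 354,
             seg_len: int = 21, n_iter: int = 1000, seed: int = 0) -> Tuple[float, float, float, float]:
    """Return (mean, sd, z-score, empirical one-sided P) of the null distribution."""
    rng = random.Random(seed)
    null: List[int] = [count_affected(profiles, n_segments, seg_len, rng) for _ in range(n_iter)]
    mean = statistics.mean(null)
    sd = statistics.stdev(null)
    z = (observed - mean) / sd
    # Empirical P-value with the usual +1 correction
    p_empirical = (1 + sum(x >= observed for x in null)) / (1 + n_iter)
    return mean, sd, z, p_empirical


# Hypothetical toy transcriptome: 200 mRNAs of 1,000 nt, each with one
# alternatively spliced position at nucleotide 500
toy_profiles = [[i == 500 for i in range(1000)] for _ in range(200)]
mean, sd, z, p = simulate(toy_profiles, observed=41)
print(f"mean={mean:.1f} sd={sd:.2f} z={z:.1f} P={p:.4f}")
```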

Figure 4.

 Frequency of alternative splicing increases near MBSs.
(a) Simulation of the number of alternative splicing events affecting random 21 nt mRNA segments. Gray bars represent the simulated data comprising 1000 subsets of 354 randomly selected mRNA segments. The x axis shows the number of mRNA segments affected by alternative splicing and the y axis shows the frequency of observing such a number. The solid black line is the best-fit normal curve. The arrow indicates the observed number of alternatively spliced MBSs, which exceeds the simulated mean by 3.46 standard deviations (P < 0.001).
(b) Trend line of the frequency of alternative splicing in sequences flanking MBSs. After aligning all MBSs, the frequency of alternative splicing in the surrounding genomic regions was calculated using a sliding-window approach. The x axis indicates the position of a given window relative to the MBS, and the y axis indicates the observed frequency of alternative splicing for that window.
(c) Trend line of the frequency of potential splice sites in sequences flanking MBSs. After aligning all MBSs, the frequencies of GT and AG were calculated for the regions upstream and downstream of the MBSs, respectively, at varying distances from the MBSs. The x axis indicates the position of the flanking sequences, and the y axis indicates the observed frequency of the dinucleotides.

To rule out the possibility that the elevated alternative splicing frequency at MBSs is caused by an unusually high background of alternative splicing in miRNA target genes, we compared MBSs with flanking mRNA regions. After aligning all MBSs, we used a sliding-window approach to establish a trend line of the frequency of alternative splicing in the regions surrounding the MBSs. This analysis shows that splicing frequency decreases with distance from the MBSs (Figure 4b), indicating that the high frequency of alternative splicing observed at MBSs is specific to the binding sites. We further analyzed the occurrence of the dinucleotides GT (the nearly invariant 5′ end of introns at splice donor sites) and AG (the nearly invariant 3′ end of introns at splice acceptor sites) in the regions upstream and downstream of MBSs, respectively. The frequency of these potential splice sites closely tracked that of the observed splicing events (Figure 4c). Together, these results demonstrate an elevated splicing frequency at MBSs in Arabidopsis that correlates with the occurrence of potential splice sites immediately adjacent to the MBSs.
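Both trend lines can be computed from MBS-aligned data along the following lines. This sketch is illustrative only: it assumes each MBS region is encoded as an equal-length 0/1 vector marking splice-event positions, with the MBS occupying the same columns in every vector, and that flanking sequences are aligned at a common offset. The window and step sizes are arbitrary assumptions, since the values used in the study are not stated here.

```python
from typing import List, Sequence, Tuple


def sliding_window_frequency(profiles: Sequence[Sequence[int]], window: int = 10,
                             step: int = 5) -> List[Tuple[int, float]]:
    """Mean per-position splice-event frequency in sliding windows.

    `profiles` are equal-length 0/1 vectors, one per MBS, aligned so that
    the MBS occupies the same columns in every vector.
    """
    n = len(profiles)
    length = len(profiles[0])
    per_position = [sum(p[i] for p in profiles) / n for i in range(length)]
    return [
        (start, sum(per_position[start:start + window]) / window)
        for start in range(0, length - window + 1, step)
    ]


def dinucleotide_frequency(flanks: Sequence[str], dinuc: str) -> List[float]:
    """Fraction of aligned flanking sequences carrying `dinuc` at each offset."""
    length = min(len(f) for f in flanks)
    return [
        sum(f[i:i + 2] == dinuc for f in flanks) / len(flanks)
        for i in range(length - 1)
    ]


print(sliding_window_frequency([[0, 0, 1, 1, 0, 0], [0, 1, 1, 0, 0, 0]], window=2, step=2))
# [(0, 0.25), (2, 0.75), (4, 0.0)]
print(dinucleotide_frequency(["AGTA", "GTAA"], "GT"))
# [0.5, 0.5, 0.0]
```

Upstream flanks would be scanned for GT and downstream flanks for AG, matching the orientation of donor and acceptor sites relative to the MBS.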

Because the elevated splicing frequency at MBSs correlates with potential splice sites in the flanking sequences, which are free to diverge more than the MBSs themselves, we predicted that these splicing events may not be conserved across species. To test this prediction, we obtained and compared DCL1 sequences from 21 plant species. Although exon 13 is highly conserved, the in-frame alternative acceptor site is conserved only in the Brassicaceae and Rosaceae and not in any other family (Figure S3). The wobble donor site in intron 12 shows a different distribution, being conserved only in the Brassicaceae and Fabaceae (Figure S3). Moreover, unlike intron 12 in Arabidopsis (84 nt), intron 12 in all other species has a length that is not a multiple of three (data not shown), so its retention would cause a frameshift, making an in-frame retained-intron isoform such as DCL1-3 unlikely in these species. Together, these results indicate that specific alternative splicing events at MBSs may have a more limited phylogenetic distribution than the MBSs themselves.
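The reading-frame argument reduces to modular arithmetic on the length difference between isoforms. The sketch below applies it to the DCL1 events described above; the sign given to the GUU difference is an assumption (it does not affect the result), and the 85-nt intron is a hypothetical example of a length that is not a multiple of three. Note that an in-frame change alone does not rule out a premature stop codon, which would require checking the sequence itself.

```python
def frame_effect(delta_nt: int) -> str:
    """Classify a length change inside a coding region as in-frame or frameshifting."""
    return "in-frame" if delta_nt % 3 == 0 else "frameshift"


# Length differences relative to DCL1-1, as described in the text
dcl1_events = {
    "DCL1-2 (alternative donor, GUU)": 3,    # sign assumed; only the magnitude matters
    "DCL1-3 (intron 12 retained)": 84,
    "cryptic acceptor in exon 13": -69,
}
for name, delta in dcl1_events.items():
    print(f"{name}: {frame_effect(delta)}")  # all three are in-frame

# A hypothetical 85-nt intron 12, as in a species whose intron length is not a multiple of 3
print(frame_effect(85))  # frameshift
```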

Functional implication of an alternatively spliced MBS

The miR156-regulated SPL4 gene encodes a transcription factor belonging to the SPL (SQUAMOSA PROMOTER BINDING PROTEIN-LIKE) gene family (Guo et al., 2008a). The function of this gene circuit in regulating the transition from vegetative to reproductive development has been well characterized in Arabidopsis (Wu and Poethig, 2006; Wang et al., 2009; Wu et al., 2009). We therefore selected SPL4 as an example to demonstrate the role of alternatively spliced MBSs in gene expression and function. Two gene models are annotated at the SPL4 locus. At1g53160.1 (SPL4-1) has two exons and contains an experimentally confirmed miR156 binding site within its 3′ UTR (Figure 5a,b) (Wu and Poethig, 2006). The other annotated gene model, At1g53160.2 (SPL4-3), contains three exons and differs from SPL4-1 immediately downstream of the stop codon. The two mRNA isoforms encode identical proteins, but the MBS is spliced out in SPL4-3 (Figure 5a,b). While sequencing the PCR amplicons derived from SPL4 mRNAs, we found a new isoform in which an upstream donor site for the last intron is used. This splicing event not only eliminates the MBS but also alters the C-terminal portion (last 18 amino acids) of the protein (Figure 5a,b). We named the new gene model represented by this mRNA isoform SPL4-2. We then scanned the three gene models for possible miR156 binding sites and found only the original one in SPL4-1 (Figure 5c). Thus, alternative splicing produces multiple SPL4 mRNA isoforms with or without the binding site for miR156.

Figure 5.

 Alternative splicing of the miR156 binding site influences SPL4 expression.
(a) Proposed splicing model of SPL4 spanning the MBS, which is shown in upper-case letters. The optional splice donor site and the two acceptor sites are indicated by arrows.
(b) Gene structure of the three identified SPL4 isoforms. Exons and introns are shown by boxes and horizontal lines, respectively. Untranslated regions, coding regions and the miR156 binding site are shaded as indicated. The positions of the primers used for obtaining amplicons flanking the MBS are indicated by the horizontal arrows.
(c) Proposed miRNA:mRNA hybrid between miR156 and SPL4-1.
(d) RT-PCR analysis of the developmental expression profiles of SPL4-1, SPL4-2 and SPL4-3. Amplicons were obtained using primers indicated in (b). Actin7 was used as the loading control.
(e) Northern blot analysis of the expression profiles of miR156. U6 snRNA was used as the loading control.
(f) RT-PCR analysis of the relative expression levels of SPL4 isoforms in seedlings and rosette leaves of adult plants. WT, wild-type plants. 35S::miR156, transgenic plants in which expression of miR156 is driven by the constitutive CaMV 35S promoter.

To examine the influence of alternative splicing on the regulation of SPL4, we compared the expression patterns of the three mRNA isoforms with that of miR156 in various organ types. RT-PCR analysis revealed all three isoform-specific products in the seven organ types examined, albeit at varying abundance. SPL4-1 was the predominant isoform, with a higher expression level than SPL4-2 and SPL4-3 in essentially all samples examined (Figure 5d). Consistent with previous reports (Wu and Poethig, 2006), SPL4-1 expression is highest in adult tissues (e.g. rosette and cauline leaves) and lowest in seedlings and roots. This pattern is the inverse of that of miR156, whose expression is high in seedlings and roots and low in adult leaves (Figure 5e). The only exception is the silique, in which miR156 is not expressed. By contrast, the other isoforms (especially SPL4-3) do not show this inverse relationship with miR156. We further compared wild-type plants with transgenic plants in which miR156 expression is driven by the constitutive CaMV 35S promoter. In 35S::miR156 plants, the miR156 level is elevated in rosette leaves (Wu and Poethig, 2006), yet only SPL4-1 expression is down-regulated, whereas SPL4-2 and SPL4-3 expression is not (Figure 5f). These results indicate that alternatively spliced SPL4 transcripts with or without the miR156 binding site are regulated differently by miR156. We therefore conclude that alternative splicing at the MBS constitutes a mechanism for controlling the development-specific expression levels of SPL4.

SPL4 and its paralog SPL3 both contain a functional miR156 binding site (Figure S4a,b) (Wu and Poethig, 2006). However, only SPL4, and not SPL3, has been found to produce alternative mRNAs that differ in the miR156 binding site. We reasoned that if alternative splicing is important for the regulation of SPL4, its expression pattern should deviate further from that of SPL3 under conditions in which miR156 is expressed. To test this hypothesis, we reconstructed the expression profiles of SPL3 and SPL4 during the vegetative-to-reproductive phase transition from published microarray data (Balasubramanian et al., 2006). As miR156 levels decrease during this transition (Wu and Poethig, 2006; Wang et al., 2009; Wu et al., 2009), the steady-state levels of both SPL3 and SPL4 increase (Figure S4c). Importantly, miR156 expression has been shown to be under circadian regulation (Hazen et al., 2009). However, only SPL3, and not SPL4, shows a moderate clock-dependent expression pattern (Figure S4c), suggesting that the endogenous SPL4 isoforms lacking the miR156 binding site contribute to the distinctive expression pattern of SPL4.

Because of genetic redundancy among the SPL genes, a loss-of-function mutation in SPL4 does not result in obvious developmental defects (Wu and Poethig, 2006). We therefore used a gain-of-function approach to demonstrate the importance of the alternative transcripts. We created transgenic lines over-expressing each of the three SPL4 isoforms under the control of the CaMV 35S promoter (Figure 6a) and assayed their developmental phenotypes (Figure 6b). Over-expressing SPL4-1 had no significant effect on flowering time compared with wild-type (Figure 6b,c), but slightly reduced the number of adult leaves (leaves with abaxial trichomes) at flowering (Figure 6d). By contrast, the 35S::SPL4-2 and 35S::SPL4-3 plants, whose SPL4 transcripts lack the MBS, flowered earlier (Figure 6b,c) and had significantly fewer adult leaves at bolting (Figure 6d). Thus, constitutive expression of SPL4 transcripts with or without the miR156 binding site resulted in distinguishable phenotypes.

Figure 6.

 Functional analysis of the alternatively spliced miR156 binding site in SPL4.
(a) RT-PCR analysis of the relative transcript abundance of the three SPL4 isoforms in wild-type (WT) and transgenic plants over-expressing individual SPL4 isoforms.
(b) Morphology of wild-type (WT) and the transgenic plants at the bolting stage.
(c, d) Flowering time (days after planting) (c) and number of leaves with abaxial trichomes at bolting (d) in wild-type (WT) and transgenic plants expressing SPL4 with (35S::SPL4-1) or without the miR156 binding site (35S::SPL4-2 and 35S::SPL4-3). Error bars represent standard deviation (n = 12).

Discussion

As trans-acting gene regulators, miRNAs function by binding to complementary MBSs in their target mRNAs (Bartel, 2009). MBSs are part of mRNA molecules and are therefore transcribed from DNA and processed along with the rest of the transcript. In the present study, we show that mRNA splicing in the model plant Arabidopsis can generate transcript isoforms that differ in the MBS. This process affects a substantial proportion of MBSs (Figure 1a and Table S1) and involves all major types of alternative RNA processing (Figure 1b) (Reddy, 2007). Two observations suggest that the reported cases of alternatively spliced MBSs underestimate the true number. First, the RNA-Seq data used in the current study were derived from a limited range of tissue types and physiological conditions (Table S2), so genes with low expression or rare isoforms may not be sufficiently covered (e.g. Figure S2). Second, different DCL1 isoforms were recovered by direct amplicon sequencing and by whole-genome RNA-Seq (Figure 2d), indicating that neither analysis is exhaustive. As more miRNAs and miRNA target genes are discovered in plants, alternative splicing of mRNA sequences encoding functional MBSs has the potential to significantly increase the regulatory complexity of miRNA-mediated gene networks.

The initial discovery raises a series of intriguing questions. First, are alternatively spliced MBSs the result of a regulated process or simply a coincidence between two separate molecular processes? MBSs are parts of mRNA molecules, so it is perhaps not surprising that some are alternatively spliced by chance. However, our results show that the frequency of alternative splicing events encompassing an MBS is significantly higher than for other portions of the transcriptome (Figure 4a). Within the miRNA target genes, the frequency of alternative splicing specifically increases at the MBS (Figure 4b). Importantly, the increased frequency of alternative splicing tracks the occurrence of potential splice sites surrounding the MBS (Figure 4c). These results collectively suggest a scenario in which the presence of an MBS causes changes in the local sequence that increase the likelihood of alternative mRNA splicing. Future experiments aimed at detecting biochemical interactions between the nuclear RNA-induced silencing complex and components of the spliceosome (Bayne et al., 2008; Ohrt et al., 2008) should help to fully elucidate this potential new genetic mechanism.

Second, do alternatively spliced MBSs influence the regulation of the corresponding miRNA target genes? Because plant MBSs are often located in coding regions, any assessment of how alternative splicing affects miRNA target genes must distinguish effects on the encoded proteins from effects on the MBSs. This was one reason we chose SPL4, for which two mRNA isoforms (SPL4-1 and SPL4-3) encode an identical protein but differ in the presence or absence of the miR156 binding site (Figure 5). We found that alternatively spliced SPL4 transcripts with or without the miR156 binding site are regulated differently by miR156 at the post-transcriptional level in a development-specific manner (Figure 5 and Figure S4). We also found that increasing the expression of SPL4 transcripts with or without the miR156 binding site by transgenic means resulted in distinguishable phenotypes (Figure 6). Together with the finding that at least two TAS genes are alternatively spliced at the MBS (Figure 3 and Figure S1), our results indicate that alternative splicing of MBSs constitutes a mechanism to specifically attenuate miRNA-based regulation of target genes.

Third, what is the influence of alternatively spliced MBSs on the function of miRNAs? In addition to the birth and death of miRNA genes, MBSs may conceivably serve as an important determinant of the evolutionary dynamics of miRNA–target interactions. On the one hand, the short length of MBSs may have allowed them to mutate and change specificity rapidly during evolution. On the other hand, many MBSs are deeply conserved across plant lineages (Axtell and Bowman, 2008), suggesting that they are under strong purifying selection. An investigation of paralogous gene families targeted by miRNAs in rice supports this view (Guo et al., 2008b). Conserved MBSs in humans are also under stronger purifying selection than surrounding sequences (Chen and Rajewsky, 2006). Compared with mutation in either the MBS or the miRNA, alternative processing of MBSs at the mRNA level provides a more flexible mechanism by which an miRNA target gene may attenuate rigid miRNA regulation while maintaining its competence for such regulation. The additional layer of regulation conferred by alternative splicing may link spliceosome activity to the regulation of certain miRNA–target interactions. A predicted advantage of this mechanism is that coordinated regulation of target genes by miRNAs and splicing could allow plants to integrate multiple input signals to fine-tune target transcript abundance for precise and timely execution of developmental programs.

Finally, what is the implication of alternative splicing for the evolution of miRNA–target gene circuits? Given that many plant MBSs are located in coding regions, selection at MBSs should have substantially influenced the evolution of miRNA target genes and hence the biological processes involving these genes. Our work shows that one such influence is an increased occurrence of mRNA splice sites near MBSs, generating transcript isoforms that bypass miRNA regulation (Figure 4). Consistent with reports from cross-species EST alignments (Wang et al., 2008), we show that alternative splicing events affecting MBSs may be family- or species-specific (Figure S3). This means that mRNA processing could increase divergence in miRNA-mediated gene networks across plant lineages beyond that caused by the emergence of new miRNA genes. Our findings further imply that genetic changes in other parts of the gene, where selective constraints may be weaker (e.g. synonymous mutations that create or eliminate functional splice sites), may be used to modulate selected MBSs. The many possible ways in which alternative splicing can affect MBSs (e.g. through in-frame cryptic splice sites) suggest that this mechanism may constitute a means to differentially regulate orthologous genes that underlie numerous developmental processes. The resulting divergence may conceivably provide a genetic basis for the extensive physiological and ecological diversity among plants.

Taken together, our results indicate that alternative splicing of MBSs is a plausible and prevalent mechanism for regulating gene expression in Arabidopsis. Identifying alternatively spliced MBSs in other plants will therefore be important to fully elucidate the biological significance and intrinsic complexity of this form of gene regulation. Such studies will provide new insights into gene networks that integrate transcriptional and post-transcriptional regulation. Perhaps more importantly, these efforts may create opportunities to integrate different genetic components, both within the genome (e.g. mRNA processing and miRNA function) and within genes (MBSs and sequence changes that create or eliminate splice sites), to provide a holistic view of the gene expression programs that underpin development and responses to environmental challenges in diverse plant species.

Experimental procedures

Data sources

Annotated whole-genome, cDNA and intron sequences, and other gene model features, of Arabidopsis thaliana used in this study correspond to TAIR release 10 and were downloaded from ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release/. The 12 RNA-Seq libraries used to verify annotated alternative gene models and to detect new splicing events were downloaded from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) (accession numbers GSM613465, GSM613466, GSM284751, GSM623879, GSM623880, GSM607726, GSM607724, GSM607723, GSM607725, GSM607727, GSM607728 and GSM607729) (Table S2). 5′ RACE-validated miRNA targets (set I) were obtained from the Arabidopsis Information Resource at ftp://ftp.arabidopsis.org/home/tair/Genes/SmallRNAsCarrington/. Putative miRNA targets identified by high-throughput sequencing of 5′ RACE products (set II) (German et al., 2008) were obtained from the Arabidopsis Next-Gen sequence databases (http://mpss.udel.edu/at_pare/). The miRNA dataset for Arabidopsis was downloaded from miRBase release 16.0 (http://www.mirbase.org/). A BLAST search for homologous DCL1 sequences was performed using the nucleotide collection, EST and genomic survey sequences databases of the National Center for Biotechnology Information (http://blast.ncbi.nlm.nih.gov/Blast.cgi).

MBS filtration and compilation

Set II miRNA target genes were assessed as previously described (Alves et al., 2009) to identify high-confidence MBSs. Briefly, the program RNAhybrid (Rehmsmeier et al., 2004; Kruger and Rehmsmeier, 2006) was used to predict energetically plausible miRNA:mRNA duplexes with plant-specific constraints: (i) perfect base pairing of the miRNA:mRNA duplex at nucleotides 8–12, counting from the 5′ end of the miRNA, (ii) loops and bulges of no more than one nucleotide in either strand, and (iii) end overhangs of no more than two nucleotides. In addition, G:U wobble base pairs were not treated as mismatches, but were considered to contribute less favorably to the overall free energy. The putative MBSs were further filtered using the ratio between the minimum free energy of the identified miRNA:mRNA duplex and the minimum duplex energy of the miRNA bound to a perfectly matched target, both calculated in RNAhybrid. A cut-off of 0.70 for this ratio (minimum free energy/minimum duplex energy) was used (Alves et al., 2009). Of the 510 set II MBSs, 349 were retained after these filtering steps. These were combined with the 109 set I MBSs to create a non-redundant collection of 354 high-confidence MBSs.
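The central-pairing constraint and the energy-ratio filter can be illustrated with a minimal Python sketch. It assumes an ungapped miRNA:target duplex (so the loop, bulge and overhang limits are not modeled), counts G:U wobbles as paired in the central region, and takes the minimum free energies as inputs that would in practice come from RNAhybrid. The function names and the direction of the cut-off (sites with a ratio of at least 0.70 are retained) are our assumptions, not the published pipeline.

```python
# Minimal sketch of the MBS duplex filters; assumptions are noted in comments.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_types(mirna: str, site: str) -> list:
    """Classify each miRNA position as 'wc', 'gu' or 'mm' (mismatch).

    Both sequences are RNA written 5'->3'. Assumes an ungapped duplex, so
    miRNA position i pairs with the i-th base counted from the 3' end of the site.
    """
    if len(mirna) != len(site):
        raise ValueError("an ungapped duplex needs sequences of equal length")
    types = []
    for m, t in zip(mirna, reversed(site)):
        if (m, t) in WATSON_CRICK:
            types.append("wc")
        elif (m, t) in WOBBLE:
            types.append("gu")
        else:
            types.append("mm")
    return types


def central_region_paired(mirna: str, site: str, first: int = 8, last: int = 12) -> bool:
    """True if miRNA positions first..last (1-based, from the 5' end) have no mismatch.

    G:U wobbles count as paired here (an assumption), consistent with wobbles
    not being treated as mismatches.
    """
    types = pair_types(mirna, site)
    return all(t != "mm" for t in types[first - 1:last])


def passes_energy_filter(duplex_mfe: float, perfect_mfe: float, cutoff: float = 0.70) -> bool:
    """Keep a site if MFE(duplex) / MFE(perfect match) >= cutoff.

    Both energies are negative (kcal/mol), so a ratio near 1 means the site
    binds almost as stably as a perfect complement.
    """
    return duplex_mfe / perfect_mfe >= cutoff
```

For example, a perfectly complementary site for miR156a (UGACAGAAGAGAGUGAGCAC) passes the central-region check, whereas a single mismatch opposite miRNA position 10 fails it.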

Identification of new splicing events from RNA-Seq data

The RNA-Seq data used in this study comprise 12 independent libraries with more than 570 million trimmed and filtered reads (Table S2). To identify new junction reads for the annotated gene models, we used the spliced-read mapping tool TopHat, which aligns RNA-Seq reads to a reference genome without relying on known splice sites (Trapnell et al., 2009). Using Bowtie (Langmead et al., 2009), which TopHat uses for short-read alignment, more than 350 million reads were mapped to the Arabidopsis genome, yielding a mean coverage depth of 105 reads per base for each DNA strand. Five parameters were set to facilitate the mapping of reads to annotated Arabidopsis gene models: (i) a maximal intron length of 2000 bases, as the vast majority (99.5%) of introns in annotated Arabidopsis gene models are shorter than 1200 bases; (ii) a maximum of two mismatches in read alignments; (iii) no mismatches in spliced alignments; (iv) a minimum of eight matched bases on each side of the junction in a junction read; and (v) use of the classic splicing motifs GT-AG and GC-AG only. For intron retention, we only considered introns that were completely covered by mapped sequencing reads.
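The junction and intron-retention criteria listed above can be expressed as simple predicates. The following Python sketch is a hypothetical post-alignment filter, not TopHat's own implementation or command-line options; the JunctionRead record is illustrative, junctions are assumed to be reported on the forward strand with 0-based, half-open intron coordinates, and the two-mismatch limit for unspliced reads is not modeled.

```python
from dataclasses import dataclass

MAX_INTRON = 2000   # maximal intron length (bases)
MIN_ANCHOR = 8      # minimum matched bases on each side of a junction
ALLOWED_MOTIFS = {("GT", "AG"), ("GC", "AG")}  # donor/acceptor dinucleotides


@dataclass
class JunctionRead:
    """Hypothetical record for one spliced alignment (forward strand)."""
    left_anchor: int    # matched bases upstream of the junction
    right_anchor: int   # matched bases downstream of the junction
    mismatches: int     # mismatches anywhere in the spliced alignment
    intron_start: int   # 0-based first intron base
    intron_end: int     # 0-based, exclusive


def accept_junction(read: JunctionRead, genome: str) -> bool:
    """Apply the intron-length, mismatch, anchor and splice-motif criteria."""
    intron = genome[read.intron_start:read.intron_end]
    return (
        len(intron) <= MAX_INTRON
        and read.mismatches == 0
        and min(read.left_anchor, read.right_anchor) >= MIN_ANCHOR
        and (intron[:2], intron[-2:]) in ALLOWED_MOTIFS
    )


def intron_fully_covered(intron_start: int, intron_end: int, reads: list) -> bool:
    """True if the union of half-open read intervals covers every intron base."""
    covered_to = intron_start
    for start, end in sorted(reads):
        if start > covered_to:
            break  # gap: the base at covered_to is not covered by any read
        covered_to = max(covered_to, end)
    return covered_to >= intron_end
```

Sorting the read intervals lets coverage be checked in one pass: as soon as a read starts beyond the furthest covered position, the intron contains an uncovered base and would not be scored as retained.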

Comparison of alternative splicing frequency at MBSs and other genome regions

To compare MBSs with other transcribed regions, a pool of 50 000 21-nt sequences was randomly generated from all gene models in TAIR release 10, excluding pseudogenes. These short sequences were searched against all annotated gene models and the RNA-Seq data to identify those affected by alternative splicing. A subset of 354 short sequences was then randomly selected from the pool, and the number of sequences affected by alternative splicing was counted. After repeating this process 1000 times, a histogram of the frequency of alternatively spliced sequences was constructed. A curve-fitting approach was used to determine the best-fitting normal distribution curve according to the equation:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

where μ, the mean value, was determined to be 27.1, and σ, the standard deviation, was determined to be 3.96.
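The resampling procedure and the fitted curve can be sketched in Python. This is an illustration under stated assumptions: the pool is represented as a hypothetical list of booleans (True for a 21-nt sequence affected by alternative splicing), and the normal curve is fitted by the method of moments (sample mean and standard deviation) rather than by the curve-fitting routine used in the study.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_counts(flags: list, subset_size: int = 354, n_iter: int = 1000, seed: int = 0) -> list:
    """Repeatedly draw subsets (without replacement) and count affected sequences.

    flags: one boolean per 21-nt sequence in the pool; True means the
    sequence is affected by alternative splicing.
    """
    rng = random.Random(seed)
    return [sum(rng.sample(flags, subset_size)) for _ in range(n_iter)]


def fit_normal(counts: list) -> NormalDist:
    """Method-of-moments fit (sample mean and SD), standing in for curve fitting."""
    return NormalDist(mean(counts), stdev(counts))


def z_score(observed: float, dist: NormalDist) -> float:
    """Number of standard deviations the observed count lies above the mean."""
    return (observed - dist.mean) / dist.stdev


if __name__ == "__main__":
    # Hypothetical pool: 50 000 sequences, 3 830 of them flagged (about 7.7%).
    pool = [True] * 3830 + [False] * (50000 - 3830)
    fitted = fit_normal(simulate_counts(pool))
    print(f"simulated mu = {fitted.mean:.1f}, sigma = {fitted.stdev:.2f}")
    reported = NormalDist(27.1, 3.96)  # values reported in the text
    print(f"z for 44 MBSs = {z_score(44, reported):.2f}")
    print(f"one-sided tail probability = {1 - reported.cdf(44):.1e}")
```

Plugging in the reported values, an observation of 44 alternatively spliced MBSs lies (44 - 27.1)/3.96 ≈ 4.27 standard deviations above the simulated mean.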

To compare MBSs with other portions of the miRNA target transcripts, the transcript sequences were stacked and aligned at the MBS. Starting from the MBS, sliding windows (window size = 21 nt, step = 10 nt) were applied in both directions. In each window, the number of sequences affected by alternative splicing was expressed as a percentage of the total sequences in that window. The occurrence of the dinucleotides GT and AG in each window was also recorded. These values were then normalized against those of the window at the symmetric position on the opposite side of the MBS.
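The dinucleotide part of the sliding-window analysis can be sketched as follows. The window placement is an assumption: window k starts k × 10 nt downstream (k > 0) or upstream (k < 0) of the MBS start, window 0 is the MBS itself, and windows that run off either end of a transcript are skipped. Each flanking window's GT + AG count is then divided by the count of its mirror window (-k).

```python
from typing import Dict, List, Tuple

WINDOW = 21  # window size (nt)
STEP = 10    # step size (nt)


def count_dinucleotide(seq: str, dinuc: str) -> int:
    """Count occurrences of a dinucleotide in a sequence."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def window_bounds(mbs_start: int, k: int) -> Tuple[int, int]:
    """Half-open bounds of window k (k > 0 downstream, k < 0 upstream, 0 = MBS)."""
    start = mbs_start + k * STEP
    return start, start + WINDOW


def dinucleotide_profile(transcripts: List[Tuple[str, int]], max_k: int) -> Dict[int, int]:
    """Total GT + AG count per window across transcripts stacked at their MBS.

    transcripts: (DNA sequence, 0-based MBS start) pairs; windows that run off
    either end of a transcript are skipped for that transcript.
    """
    profile = {}
    for k in range(-max_k, max_k + 1):
        total = 0
        for seq, mbs_start in transcripts:
            start, end = window_bounds(mbs_start, k)
            if start < 0 or end > len(seq):
                continue
            window = seq[start:end]
            total += count_dinucleotide(window, "GT") + count_dinucleotide(window, "AG")
        profile[k] = total
    return profile


def normalize_to_mirror(profile: Dict[int, int]) -> Dict[int, float]:
    """Divide each flanking window by its mirror window on the other side of the MBS."""
    return {k: profile[k] / profile[-k]
            for k in profile if k != 0 and profile[-k] > 0}
```

The same window loop could tally, per window, the fraction of covering transcripts with an alternative splicing event; the mirror normalization would then be applied in the same way.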

Plant materials

The Arabidopsis plants used in this study were in the Columbia background. Seeds were sown on MS medium (Sigma-Aldrich, http://www.sigmaaldrich.com) and incubated at 4°C in the dark for 4 days, after which they were exposed to continuous white light (approximately 150 μmol m−2 sec−1) at 22°C. Seedlings were transferred to soil and maintained under continuous light until collection of plant organs at the developmental stages indicated. Plant age was measured from the time seeds were exposed to light. Flowering time represents the time taken for the first open flower to appear. To generate transgenic plants over-expressing SPL4 isoforms, SPL4-1, SPL4-2 and SPL4-3 cDNAs were PCR-amplified using Phusion DNA polymerase (New England Biolabs, http://www.neb.com) with the primers listed in Table S3. PCR products were cloned into the 35S-pKANNIBAL vector between the XhoI and SpeI restriction sites and sequenced. Plants were transformed with the sequence-confirmed constructs using the standard floral dipping method (Clough and Bent, 1998). Transformants were selected on MS medium containing 20 mg L−1 BASTA (bioWORLD, http://www.bio-world.com). T3 generation plants homozygous for individual transgenes were identified by PCR analysis of genomic DNA and used for all experiments.

RNA analyses

Total RNA was extracted using TRIzol reagent (Invitrogen, http://www.invitrogen.com/) and treated with RNase-free DNase I. For RT-PCR analysis, total RNA was reverse-transcribed using SuperScript II reverse transcriptase (Invitrogen). The resultant cDNA was PCR-amplified using Phusion DNA polymerase. PCR products were cloned into the pCR4Blunt-TOPO vector (Invitrogen) and sequenced. Quantitative PCR analysis of the reverse-transcribed cDNA was performed using the ABI 7500 system and Power SYBR Green PCR master mix (Applied Biosystems, http://www.appliedbiosystems.com/). Actin7 was used as an endogenous control to normalize relative expression levels. At least three independent experiments were performed for each amplicon, and the data were analyzed using the ABI 7500 system SDS software (Applied Biosystems). Primer sequences are listed in Table S3. Low-molecular-weight RNA blotting was performed as previously described (Yang et al., 2011; Zhang et al., 2011). Probe sequences are listed in Table S3.
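The text does not state how relative expression was computed from the Actin7-normalized data. One common choice is the comparative Ct (2^-ΔΔCt) method, sketched below in Python under the assumption of near-100% amplification efficiency for both amplicons; the calibrator sample (e.g. a reference developmental stage) and the function names are hypothetical.

```python
from statistics import mean


def mean_ct(replicates: list) -> float:
    """Average Ct over replicate reactions."""
    return mean(replicates)


def relative_expression(ct_target: float, ct_actin7: float,
                        ct_target_cal: float, ct_actin7_cal: float) -> float:
    """Comparative Ct (2^-ddCt) with Actin7 as the endogenous control.

    Assumes roughly equal, near-100% amplification efficiencies for the
    target and Actin7 amplicons. The calibrator is a hypothetical choice.
    """
    delta_ct_sample = ct_target - ct_actin7        # normalize sample to Actin7
    delta_ct_cal = ct_target_cal - ct_actin7_cal   # normalize calibrator to Actin7
    return 2.0 ** -(delta_ct_sample - delta_ct_cal)
```

For example, relative_expression(24.0, 18.0, 26.0, 18.0) returns 4.0: after normalization to Actin7, the target transcript is four-fold more abundant than in the calibrator.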

Acknowledgements

We thank Dr R. Scott Poethig for kindly providing the 35S::miR156 seed. We also thank Yan Hu and Robert Arthur for technical assistance. This work was supported by a grant from the US National Science Foundation Plant Genome Program (DBI-0922526) to L.L.
