Alternative splicing (AS) generates multiple types of mRNA from a single type of pre-mRNA by differential intron splicing. It can result in new protein isoforms or down-regulation of gene expression by transcript decay. The evolutionary conservation of AS events in plants is largely unexplored and only a small number of AS events have been identified as conserved between divergent species.
We performed a large-scale analysis of cDNA data from Brassica and Arabidopsis to identify and further characterize conserved AS events.
We identified 537 conserved AS events in 485 genes. Alternative donor and acceptor events are significantly overrepresented among conserved events, whereas intron retention and exon skipping events are underrepresented. Conserved AS events are significantly shorter, less likely to be in the 3′UTR, and they are enriched for genes whose products function in the chloroplast. AS modified a functional domain for about half of the genes with conserved events. We further characterized three genes with conserved AS events.
This study identifies many AS events that are conserved between Brassica and Arabidopsis, revealing features of conserved AS events. Many of the conserved AS events may have important, but uncharacterized, functions.
Alternative splicing (AS) is a cellular process producing multiple types of mRNA isoforms from the same type of pre-mRNA by differential inclusion or exclusion of intronic or exonic sequences. AS has been inferred to affect over 50% of genes in Oryza sativa and Arabidopsis thaliana; exact percentages depend on whether all genes or just intron-containing genes are counted (Lu et al., 2010; Marquez et al., 2012). By contrast, AS is more prevalent in mammalian genes, with over 95% having AS (Pan et al., 2008; Nilsen & Graveley, 2010). There are several types of AS, including exon skipping (ES), where an exon is excluded from the mature mRNA, intron retention (IR), in which a complete intron remains in the transcripts, and AS at the 5′ end of an intron (alternative donor (AltD)) or the 3′ end (alternative acceptor (AltA)), or at both ends (alternative position (AltP); reviewed in Reddy, 2007; Barbazuk et al., 2008). AS plays a role in the regulation of gene expression by degradation of some transcripts by the nonsense-mediated decay (NMD) pathway (Lewis et al., 2003; Hori & Watanabe, 2005; Arciga-Reyes et al., 2006; Kalyna et al., 2012). AS also plays a role in proteome diversity when AS isoforms are translated into proteins (Kazan, 2003). There are different effects of AS on proteins, such as loss or gain of function, altered subcellular localization, protein stability, and enzyme activity (reviewed in Stamm et al., 2005; Reddy, 2007; Barbazuk et al., 2008).
Studies of AS in plants have characterized functions of AS events in certain genes and gene families, for example, in the regulation of alternative splicing by members of the serine/arginine (SR) gene family (Lopato et al., 1996; Isshiki et al., 2006), in development by SR45 (Zhang & Mount, 2009), in gene regulation with a subgroup of MYB transcription factor genes (Li et al., 2006), in disease resistance with the tobacco N gene (Dinesh-Kumar & Baker, 2000), and in photosynthesis with the Rubisco activase (Werneke et al., 1989; Parry et al., 2008), among other functions. However, only a relatively small number of AS events in plants have been functionally characterized. Some AS events may be nonfunctional because they do not produce a functional protein or regulate gene expression.
The evolutionary conservation of AS patterns in plants has mostly been studied from the perspective of identifying conserved events between species, although comparisons between duplicated genes in A. thaliana have also been reported (Zhang et al., 2010). Most of the studies of evolutionary conservation between species consisted of comparing a specific gene or gene family, known to contain AS, between several species. For example, the evolution of AS in the serine/arginine protein family of alternative splicing factors has been extensively studied, revealing several cases of AS event conservation between eudicots and monocots (Iida & Go, 2006; Kalyna et al., 2006). Other examples of cases of AS conservation in a specific gene include two MYB genes that play a role in transcription regulation where conserved AS events were found between Arabidopsis and rice (Li et al., 2006), and the TTL gene where an AS event influencing the protein localization to peroxisomes was conserved between Arabidopsis and rice (Lamberto et al., 2010).
The first study in plants that compared conserved AS between species on a larger scale consisted of comparisons of AS in Arabidopsis and rice using cDNA and expressed sequence tag (EST) sequences (Wang & Brendel, 2006). Forty-one homologous genes have the same type of AS event conserved between Arabidopsis and rice, although not necessarily homologous events in the same position in the genes. In another study, conservation of AS between Arabidopsis, Medicago and rice was analyzed using EST sequences. Only six AS events were found to be conserved between at least two of the three species and only two AS events were conserved between the three species (Baek et al., 2008). In a more recent study comparing AS in orthologous genes between Arabidopsis and rice, and between rice and maize, 18 and 41 conserved AS events (same AS type at the same position) were identified, respectively (Severing et al., 2009). Conserved AS was also studied in legumes (Wang et al., 2008). In this study, the authors used cross-species EST alignments to find 22 completely conserved AS events between two or more legume species.
In this study, we investigated the conservation of AS events between orthologous genes in Brassica and A. thaliana by cross-species comparisons of cDNA sequences. We found a set of 537 conserved AS events, which is the largest number of conserved AS events found to date in angiosperms. We then compared the conserved AS events with AS events lacking evidence for conservation to determine which features are enriched in conserved AS events.
Materials and Methods
Data sets and sequence alignments
Arabidopsis thaliana genes, TAIR gene models and cDNA sequences were downloaded from TAIR10. EST sequences were downloaded from dbEST at the National Center for Biotechnology Information (NCBI) on 8 December 2010 and sorted with an in-house script to extract sequences from A. thaliana and all Brassica species. A summary of sequences used is given in Table 1.
To detect AS events not annotated in TAIR, A. thaliana gene models, cDNAs and ESTs were aligned against Arabidopsis genes using Gmap (Wu & Watanabe, 2005). We kept only best-hit alignments having similarity and coverage scores ≥ 0.95. We included only sequences spanning at least one intron to avoid ESTs that might actually represent DNA contamination.
Brassica ESTs were aligned on A. thaliana genes using Gmap and Sim4cc (Zhou et al., 2009). As it was a cross-species comparison, we kept only best-hit alignments having similarity and coverage scores ≥ 0.80. We kept ESTs for which the same structure was found by both Gmap and Sim4cc. Only Brassica ESTs aligned on A. thaliana genes detected as showing AS were retained. Brassica genomes contain large numbers of duplicated genes, many of which arose from an ancient genome triplication that is specific to the Brassica lineage and occurred after it diverged from the Arabidopsis lineage (Mun et al., 2009; Cheng et al., 2012). Thus, a single gene in Arabidopsis could have one, two, or three orthologs in Brassica (or more if tandem duplications occurred after the triplication). In our analysis, the paralogous genes derived from the triplication should align against the orthologous (unduplicated) gene in Arabidopsis. The Brassica rapa genome annotation used was downloaded from BRAD (Cheng et al., 2011).
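The filtering rule described above (best hits with similarity and coverage ≥ 0.80, and identical structure reported by both aligners) can be sketched as follows. This is only an illustration under stated assumptions: the `Alignment` record, its field names, and the dictionaries of best hits are hypothetical and do not correspond to the actual output formats of Gmap or Sim4cc.

```python
from typing import NamedTuple

class Alignment(NamedTuple):
    """Hypothetical best-hit record for one EST (not a real aligner format)."""
    est_id: str
    gene_id: str
    identity: float   # similarity score, as a fraction between 0 and 1
    coverage: float   # fraction of the EST that aligned, between 0 and 1
    introns: tuple    # ((start, end), ...) on the A. thaliana gene

def passes_threshold(aln, min_score=0.80):
    """True when both similarity and coverage reach the cutoff."""
    return aln.identity >= min_score and aln.coverage >= min_score

def consensus_alignments(gmap_hits, sim4cc_hits, min_score=0.80):
    """Keep ESTs whose best hits from both aligners pass the cutoff
    and agree on the target gene and on the intron structure."""
    kept = {}
    for est_id, g in gmap_hits.items():
        s = sim4cc_hits.get(est_id)
        if s is None:
            continue  # EST aligned by only one program
        if not (passes_threshold(g, min_score) and passes_threshold(s, min_score)):
            continue
        if g.gene_id == s.gene_id and g.introns == s.introns:
            kept[est_id] = g
    return kept
```

For the within-species A. thaliana alignments, the same function could be called with `min_score=0.95`, although only one aligner was used in that step.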
Detection of AS and conserved AS
To find genes with AS, we made pairwise comparisons of Gmap gene structures for each locus. In order to avoid false AS detection as a result of TAIR gene model predictions, we used only TAIR gene models where the AS was supported by at least one cDNA or EST sequence. AS events were defined as in Wang & Brendel (2006), in six classes when comparing intron and exon structures of two isoforms (Supporting Information, Fig. S1): IR, AltD, AltA, AltP, ES, and mutually exclusive exons (MEs).
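As an illustration of how pairwise comparison of intron structures can yield these classes, the sketch below classifies one intron of an isoform against the introns of a second isoform. It is a simplified reading of the definitions, not the procedure of Wang & Brendel (2006): coordinates are assumed to be on the plus strand with inclusive `(start, end)` pairs, the function name is hypothetical, and mutually exclusive exons are not handled.

```python
def classify_intron(intron, other_introns):
    """Classify an intron of isoform 1 relative to the introns of isoform 2.

    Plus-strand, inclusive (start, end) coordinates are assumed, so the
    donor site is `start` and the acceptor site is `end`.
    """
    if intron in other_introns:
        return None  # same intron in both isoforms: no AS event here
    start, end = intron
    overlapping = sorted(
        o for o in other_introns if o[0] <= end and start <= o[1]
    )
    if not overlapping:
        # No intron in isoform 2 overlaps: assumed to be retained there.
        return "IR"
    if len(overlapping) == 1:
        o_start, o_end = overlapping[0]
        if o_start == start:
            return "AltA"  # shared donor, different acceptor
        if o_end == end:
            return "AltD"  # shared acceptor, different donor
        return "AltP"      # both ends differ
    if overlapping[0][0] == start and overlapping[-1][1] == end:
        # One intron in isoform 1 spans several introns of isoform 2,
        # so isoform 1 skips the exon(s) lying between them.
        return "ES"
    return "complex"
```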
Conserved events were assessed by aligning ESTs from both species on the genes in Arabidopsis. We defined two categories of conserved AS events: the conserved position events (CPs) and the conserved junction events (CJs). CPs were defined as the same type of events (IR, etc.), having the same position in the orthologous genes between Brassica and Arabidopsis, that affect the homologous intron at the same exon–intron junctions. These events reflect true evolutionary conservation of AS events. CJs were defined as the same type of events (IR, etc.) affecting the same exon–intron junctions where the AS event is not in exactly the same position. Most analyses focused on the CP events. Other events, found only in A. thaliana, were classified as AS events having no evidence for conservation (NEC). Distributions of conserved and NEC AS were compared using G-tests. G-tests and all other statistical analyses (Wilcoxon tests, Fisher's exact tests) used in this study were performed in R (version 2.12.1, http://cran.r-project.org/).
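The three categories can be expressed as a small decision rule. The sketch below is a minimal illustration, assuming that each event has already been reduced to a record holding the A. thaliana locus, the event type, an identifier for the affected exon–intron junction, and the exact alternative splice coordinates; the `ASEvent` record and its field encoding are hypothetical.

```python
from typing import NamedTuple

class ASEvent(NamedTuple):
    """Hypothetical AS event record on A. thaliana gene coordinates."""
    gene: str        # A. thaliana locus the transcripts were aligned to
    kind: str        # "IR", "AltD", "AltA", "AltP", "ES" or "ME"
    junction: tuple  # identifier of the affected exon-intron junction
    coords: tuple    # exact coordinates of the alternative splice sites

def conservation_class(ath_event, brassica_events):
    """Return "CP", "CJ" or "NEC" for one A. thaliana AS event."""
    same_junction = [
        b for b in brassica_events
        if (b.gene, b.kind, b.junction)
        == (ath_event.gene, ath_event.kind, ath_event.junction)
    ]
    if any(b.coords == ath_event.coords for b in same_junction):
        return "CP"   # same type, same junction, same position
    if same_junction:
        return "CJ"   # same type and junction, different position
    return "NEC"      # no evidence for conservation
```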
Analysis of AS events: length, open reading frame (ORF) translation, position, and effects
The length of AS was computed when comparing two isoforms. For AltA and AltD, the length corresponded to the number of base pairs that were included in the exon region of one isoform and in the intron region in the other isoform. For ES, the length was the number of base pairs of the exon skipped in one isoform. For IR, the length was the number of base pairs of the intron retained in one isoform. The length of AS events was compared between conserved and NEC events for each AS category using Wilcoxon tests. Lengths of ME and AltP events were not compared because there was only one conserved ME event and the effect of AltP events on length is problematic to define.
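These length definitions amount to simple coordinate arithmetic, sketched below. The function name and argument layout are hypothetical, and inclusive 1-based coordinates are assumed; the coordinates in the usage example are invented for illustration.

```python
def as_length(kind, region=None, site_a=None, site_b=None):
    """Length in bp of an AS event, using inclusive 1-based coordinates.

    IR / ES: `region` is the (start, end) of the retained intron or the
             skipped exon.
    AltA / AltD: `site_a` and `site_b` are the positions of the two
             alternative splice sites; the length is the stretch that is
             exonic in one isoform and intronic in the other.
    """
    if kind in ("IR", "ES"):
        start, end = region
        return end - start + 1
    if kind in ("AltA", "AltD"):
        return abs(site_a - site_b)
    raise ValueError(f"length is not defined for event type {kind!r}")
```

For example, `as_length("IR", region=(101, 373))` gives 273, and `as_length("AltA", site_a=200, site_b=203)` gives 3.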
Locations of AS events were defined as being in the 5′UTR, coding region, or 3′UTR according to Wang & Brendel (2006). To determine the effects of each AS event on protein sequences, we considered only pairs of isoforms showing only one alternative splicing event. We defined the in silico translated AS isoform as the longest ORF when the AS event was taken into account. An event was defined as creating a premature termination codon (PTC) if the stop codon position in the AS isoform resulted in a shorter ORF compared with the constitutive isoform. In some cases, the possibility existed to have multiple possible translation start sites in the same reading frame because a PTC created by the AS event results in an alternative start codon downstream. Sometimes the second possible start codon is used for translation, for example, alternative isoforms of AtMYB59 and AtMYB48 use the second AUG start codon to initiate translation (Li et al., 2006). Thus two possible ORFs were present in some cases: one having the same start codon as the constitutive ORF but truncated, and the longest ORF starting from a downstream start codon, illustrated in Fig. S2. In these cases, we did a second analysis that considered both possible translations. We defined them as possible-PTC because it was not clear which translation start site was actually being used, if the transcripts are translated.
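The PTC criterion can be illustrated with a small in silico translation sketch. It is a simplified stand-in for the analysis described above, under these assumptions: sequences are spliced mRNA given as uppercase DNA strings, only the forward strand is scanned, and the helper names are hypothetical. Comparing `first_aug_orf` and `longest_orf` for the same isoform corresponds to the two possible translations considered for possible-PTC cases.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def orf_from(seq, start):
    """Return (start, end) of the ORF opened at `start`, with `end`
    exclusive and including the stop codon; None if no in-frame stop."""
    for i in range(start, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return (start, i + 3)
    return None

def all_orfs(seq):
    """All ORFs that begin with ATG and end at an in-frame stop codon."""
    orfs = []
    for i in range(len(seq) - 2):
        if seq[i:i + 3] == "ATG":
            orf = orf_from(seq, i)
            if orf is not None:
                orfs.append(orf)
    return orfs

def first_aug_orf(seq):
    """ORF starting at the first ATG that has an in-frame stop, if any."""
    orfs = all_orfs(seq)
    return orfs[0] if orfs else None

def longest_orf(seq):
    """Longest ORF in the sequence, or None when there is none."""
    orfs = all_orfs(seq)
    return max(orfs, key=lambda o: o[1] - o[0]) if orfs else None

def introduces_ptc(constitutive, alternative, pick=longest_orf):
    """True when the chosen ORF of the AS isoform is shorter than that of
    the constitutive isoform; None when either isoform lacks an ORF."""
    c, a = pick(constitutive), pick(alternative)
    if c is None or a is None:
        return None
    return (a[1] - a[0]) < (c[1] - c[0])
```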
G-tests were used to find significant differences between conserved and NEC AS events for position and PTC analyses. However, owing to the small number of AS events in ME, ES and AltP, Fisher's exact tests were also used.
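For readers unfamiliar with the two tests, the sketch below computes both for a 2 × 2 table (for instance, one AS class versus all others, in conserved versus NEC events). The original analyses were run in R; this standard-library Python version is only an illustration, the function names are hypothetical, and the G-test is shown without any continuity correction.

```python
import math

def g_test_2x2(a, b, c, d):
    """G statistic and P-value (1 degree of freedom) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    cells = ((a, row1, col1), (b, row1, col2), (c, row2, col1), (d, row2, col2))
    g = 0.0
    for obs, row, col in cells:
        if obs > 0:
            # observed * ln(observed / expected), expected = row * col / n
            g += 2.0 * obs * math.log(obs * n / (row * col))
    g = max(g, 0.0)
    # Chi-square survival function with 1 df: erfc(sqrt(x / 2)).
    return g, math.erfc(math.sqrt(g / 2.0))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test P-value for [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / math.comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at most as likely as the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)
```

The exact test is the safer choice when some cells are small, which is why it was applied to the ME, ES and AltP classes.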
Gene ontology and domain analyses
Gene ontology (GO) enrichment analyses were performed using the R topGO package version 2.2.0 (Alexa et al., 2006) from the Bioconductor project. We compared GO categories for genes having conserved AS with all genes having an AS event to identify overrepresented categories. The significance of overrepresented categories was computed using Fisher's test with the ‘weight’ algorithm implemented in topGO, and overrepresented categories had a P <0.05. The ‘weight’ algorithm tested the enrichment of each GO category in a gene set by examining the GO hierarchy in a bottom-up order. Once a GO category was determined to be significant, all genes associated with it were down-weighted for the remaining categories. We used this algorithm in order to limit issues from genes belonging to multiple ‘parent–child’ GO categories.
Protein domains were identified with hmmscan from the HMMER3 program suite (Durbin et al., 1998, http://hmmer.janelia.org/) using PfamA hidden Markov models downloaded from Pfam25.0 (Finn et al., 2010). Domains detected with an e-value < 1e−10 were taken into account.
Brassica rapa and A. thaliana were grown in a glasshouse and harvested at 3 wk after planting. For RNA extractions, DNase treatments, and reverse transcription of RNA into cDNA, we followed the procedures described in Zhou et al. (2011). PCR methods are listed in Table S1.
Sequence alignments and AS detection
We mapped ESTs from Brassica rapa, Brassica oleracea, and Brassica napus onto homologous genes in the genome of A. thaliana using Gmap and Sim4cc (see the 'Materials and Methods' section for mapping details). To limit false positives resulting from alignment errors, only Brassica sequences for which both programs gave the same structure after alignment on the Arabidopsis genome were used for further analyses. Among 1 026 869 Brassica ESTs, 416 993 (41%) were mapped to an Arabidopsis nuclear protein-coding locus. We also mapped 1 652 596 A. thaliana transcript sequences from TAIR gene models, cDNAs and ESTs to A. thaliana gene sequences to identify AS events not annotated by TAIR (Table 1). Seventy-two per cent of these sequences were aligned on a nuclear locus, and 71% to a protein-coding region. When comparing the results of the mapping from Brassica and Arabidopsis, 180 060 (18%) of the Brassica ESTs were mapped on one of 4726 genes out of the 5686 alternatively spliced genes in A. thaliana (Table 1).
Alternative splicing genes were characterized and classified in six categories: IR, AltD, AltA, AltP, ES, and ME (Fig. S1). For the A. thaliana dataset, a total of 10 572 AS events for 5686 genes were obtained using sequences from TAIR gene models, cDNA and ESTs (Fig. S3). Consistent with previous studies, IR events occurred most frequently (38.4%), followed by AltA events (25.3%), AltD (14.6%), ES (12.4%), AltP (9.1%), and ME (0.1%) (Ner-Gaon et al., 2004; Wang & Brendel, 2006).
Identification of 537 conserved AS events
Conserved AS events between A. thaliana and Brassica species were detected using Brassica EST sequences aligned on A. thaliana genes with AS. CP events were defined as the same type of events (IR, etc.), having the same position in the orthologous genes between Brassica and Arabidopsis, that affect the homologous intron at the same exon–intron junctions. Those events provide a strictly defined set of evolutionarily conserved AS events. In cases of genes duplicated by whole genome triplication in the Brassica lineage after it diverged from the Arabidopsis lineage (Mun et al., 2009; Cheng et al., 2012), if one of the duplicates in Brassica has a conserved AS event with its homolog in Arabidopsis, it was considered to be a conserved event (lack of a conserved event discovered in the other paralogs in Brassica could reflect divergence in AS patterns after gene duplication or it could be a result of missing EST data, as discussed later).
Five hundred and thirty-seven CP AS events were found in a total of 485 genes (Table S2). This is the largest set of conserved AS events between divergent plant species reported to date. Analyses of the CP AS events revealed 220 AS events not referenced in TAIR gene models and 139 Arabidopsis genes not described as alternatively spliced in TAIR. Sometimes AS events that are of the same type and which affect the same intron–exon junction are considered to be conserved, despite not being at exactly the same position (Wang & Brendel, 2006; Wang et al., 2008). Thus, we defined a second set of conserved events, CJ events, as an AS event detected in the same exon–intron junction between orthologs of Brassica and Arabidopsis (see the 'Materials and Methods' section). Because the CJ events are a less strict definition of evolutionarily conserved events, we focused our study on the CP events; however, the results from analyses of the CJ events are presented in the supporting information. When we included the CJ events in addition to the CP events, we found 694 conserved events in a total of 597 genes (Fig. S3). The remaining 9878 AS events mapped to 5426 genes and were classified as NEC events (Table 2, Fig. 1).
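The event counts reported above fit together arithmetically, as the short check below shows. All numbers are taken from the text; the only derived values are the number of events that are CJ but not CP and the share of all A. thaliana AS events that are CP, and the function name is hypothetical.

```python
def summarize_event_counts():
    """Bookkeeping for the AS event counts reported in the text."""
    total_events = 10572   # all AS events detected in A. thaliana
    cp_events = 537        # conserved position events
    cp_plus_cj = 694       # conserved position plus conserved junction events
    return {
        "cj_only": cp_plus_cj - cp_events,       # events that are CJ but not CP
        "nec": total_events - cp_plus_cj,        # no evidence for conservation
        "cp_percent": round(100 * cp_events / total_events, 1),
    }
```

The result reproduces the 9878 NEC events stated in the text and implies 157 additional events under the less strict CJ definition.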
Table 2. Alternative splicing (AS) distribution
IR, intron retention/exclusion; AltA, alternative acceptor; AltD, alternative donor; ES, exon skipping/retention; AltP, alternative positions; ME, mutually exclusive exons. ns, nonsignificant; ***, P <0.001.
It is important to note that the available Sanger-generated EST sequences from Brassica spp. at the time this project was performed were not comprehensive and thus there are likely to be many more conserved AS events between Brassica and Arabidopsis that we could not identify. However, the EST sequences come from a variety of organ types and growth conditions, giving us a chance to detect organ or condition-specific AS. Moreover, using EST sequences provides unambiguous detection of AS events because single sequences usually span the entire AS event. Many of the ESTs cover large regions of the transcript, allowing detection of real transcript isoforms containing multiple AS events, as well as allowing us to distinguish transcripts having multiple AS events from those having only one AS event. That facilitates the determination of the effects of conserved AS on the features of the proteins.
Comparisons of features between conserved and NEC AS events
Our set of 537 conserved AS events allows for comparison of conserved (CP) vs NEC events to determine features of the conserved events. When comparing the distribution of AS events by type of event, conserved AS events showed a significant difference compared with NEC events (Table 2, Fig. 1). Among the conserved AS events, the most common type was AltA (46.9%), followed by IR (24.6%), and then AltD (19.7%); by contrast, the most common type of AS event overall in Arabidopsis was IR. IR, ES and AltP events were significantly less common (P <0.001 for the three classes), whereas AltA and AltD were significantly more common (P <0.001 for both classes) in the set of conserved AS events compared with NEC AS events.
To compare the length of AS between conserved and NEC AS events, we calculated the length difference created by the AS for different types of AS event. For IR and ES, this length represented the length of the intron or exon retained or skipped. For AltA and AltD it represented the number of added or deleted base pairs at one end of an intron. IR and ES were the largest events among both conserved and NEC AS, whereas AltA and AltD events were the smallest (Table 3). IR events were significantly shorter among the conserved events, with a mean length of 108 bp compared with 156 bp for the NEC events (Table 3). AltA and AltD events also were significantly shorter among the conserved events (Table 3).
Table 3. Length of alternative splicing (AS) (bp)
IR, intron retention/exclusion; AltA, alternative acceptor; AltD, alternative donor; ES, exon skipping/retention. *, P <0.05; ***, P <0.001.
Analyses of AS effects were done using translated sequences from TAIR's A. thaliana gene models and cDNAs (we did not use a reconstruction of transcripts from multiple EST sequences in order to not generate isoforms with combinations of AS events that are not actually present). Isoforms having one AS event were analyzed to look specifically at the effect of AS in one isoform per gene. Following this description, we extracted 303 AS events out of the 537 CP AS events, and 4118 out of the 9878 NEC AS events, which were used for further analyses. When looking at the locations of AS positions relative to the ORF, conserved and NEC events had approximately the same distribution frequencies in the coding region and in the 5′UTR, with the vast majority being in the coding region (Table 4). These results are in agreement with previous reports on AS in A. thaliana where most of the AS events were found in the protein coding region and more AS events were found in the 5′UTR than in the 3′UTR (Campbell et al., 2006; Wang & Brendel, 2006). Conserved (CP) AS events were less likely to be in the 3′UTR than NEC events (P <0.001), and slightly more likely to be in the coding region (P <0.01), but there were no significant differences between conserved and NEC events in the 5′UTR (Table 4). Overall, 255 and 3165 AS were found in the coding region of conserved and NEC AS, respectively, and the corresponding genes were used to study effects of AS on coding regions.
We then checked the distribution of these AS events along the length of the coding region. We did not find any significant differences in distribution when comparing conserved (CP) events with NEC events (Fig. S4). We also compared the positions of CP and CJ events and observed that the CP events were more likely to be in the coding region and CJ events were more likely to be in 5′ and 3′UTR regions (Fig. S5). Differences between CP and CJ events in terms of their positions in the gene are probably explained by the fact that the coding regions tend to be more conserved than the UTR regions.
We determined whether the AS event introduced a PTC or a frameshift. The set of conserved AS events in this analysis was composed of 255 of the 537 conserved AS events, and the NEC set was composed of 3165 of the 9878 NEC AS events. Overall, no significant difference in distribution was found between conserved (CP) and NEC AS events in the PTC analysis (Table S3). When considering the first AUG as the real start codon, the majority of AS events led to a PTC in both sets (61.2 and 60.3% for conserved and NEC events, respectively). Most categories were not more likely to have a PTC in conserved than in NEC events, except for ES, where the number of conserved events was small. We obtained the same results when considering the longest ORF. Likewise, categories were not more likely to have a frameshift in conserved than in NEC events (Table S4).
Types of genes with conserved AS: enrichment in genes for chloroplast proteins
An analysis of enrichment of GO categories was conducted to determine whether some GO categories were overrepresented in conserved AS proteins compared with all Arabidopsis AS proteins, by performing a Fisher's exact test using the ‘weight’ method provided by topGO (Alexa et al., 2006). For the biological process category, response to stimuli (response to far red light, response to blue light, response to red light, P <0.001), biological regulation (regulation of proton transport, P <0.001) and other metabolic processes (light harvesting, photosynthesis, carbon fixation, histone H2B ubiquitination, cysteine biosynthetic process, P <0.001) were overrepresented (Table S5). For the cellular component category, chloroplast (chloroplast thylakoid, photosystem II or chloroplast stroma, P <0.001) and organelle part (organelle envelope, P <0.001) were overrepresented. For molecular function, ribulose-bisphosphate carboxylase activity and chlorophyll binding (P <0.001) were overrepresented.
Overall, results of the GO analysis of conserved AS events revealed that there are more conserved AS events in genes whose products are involved in chloroplast functions. Some of these genes encode proteins that function in the same complex. For instance, we found AS conserved for proteins involved in the light-harvesting chlorophyll protein complex (LHC): two proteins associated with photosystem I (LHCA1 and LHCA3) and four proteins associated with photosystem II (LHCB1.4, LHCB3, LHCB4.2 and LHCB5). All conserved AS events were IR except in LHCB5, where there was an AltD. We identified other proteins involved in photosystems I and II with conserved AS events, for example conservation of an IR in PsaK, two ES in PsbP and an AltD in PsbR (Table S2). It is possible that AS plays an important role in the function of chloroplast proteins or is a major part of the regulation of nuclear genes for chloroplast proteins.
PsbP is an interesting example of a photosynthesis gene having conserved AS. In both species, a new donor site (AltD) for intron 1 and a new acceptor site (AltA) for intron 3 result in conserved skipping of two exons (Fig. 2a). The skipped exons plus the AltA and AltD events reduce the size of the coding region from 263 aa to 95 and 94 aa for Arabidopsis and Brassica, respectively. We confirmed the alternative isoforms by RT-PCR (Fig. 2b). In both species, the major isoform (with four exons) was the same and the minor isoform was present at low levels. The minor isoform appears to be present at lower levels in Brassica than in Arabidopsis relative to the major isoform. PsbP contains a bipartite transit sequence containing the information for import across the chloroplast envelope as well as for targeting to the thylakoid (Wales et al., 1989; Schnell, 1998). Interestingly, the thylakoid-targeting sequence was lost in the alternative isoforms in both Arabidopsis and Brassica, whereas the chloroplast-targeting sequence remained, suggesting that the product localizes to the chloroplast stroma if the AS isoform is translated. Moreover, skipping of exons 2 and 3 disrupted the PsbP domain, which is the site of phosphorylation (Reiland et al., 2009).
Domain analysis of genes with conserved AS
We next analyzed the effects of AS on domains. Using the 255 AS events found in a coding region when comparing isoforms having only one AS, we found 141 cases (55.3%) where AS modified one or more domains in one isoform compared with the other. We defined modified as the complete loss of at least one domain or the modification of at least one domain region when comparing two isoforms. A few examples of genes with domain modifications caused by AS are listed in Table 5. In many cases, one domain was modified and another domain was not. By contrast, there were 76 cases (29.8%) where the protein domains remained the same with an AS event, 11 cases (4.3%) where it was unclear if the domain was affected by AS, mainly cases where there were two possible translation start sites, and 27 cases (10.6%) where no protein domains were identified.
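A comparison of this kind can be sketched as follows. The sketch assumes that domain hits for each translated isoform have already been parsed into tuples of domain name, matched start and end on the Pfam model, and e-value; the tuple layout, the function names, and the use of model coordinates to detect a modified domain region are illustrative assumptions, not the procedure actually used. The "unclear" category (for example, two possible translation start sites) is not modeled.

```python
E_VALUE_CUTOFF = 1e-10  # threshold stated in Materials and Methods

def significant_domains(hits, cutoff=E_VALUE_CUTOFF):
    """Sorted list of (name, model_from, model_to) for hits below the cutoff.

    `hits` is an iterable of (name, model_from, model_to, e_value) tuples
    (a hypothetical, already-parsed representation of hmmscan output).
    """
    return sorted(
        (name, m_from, m_to)
        for name, m_from, m_to, e_value in hits
        if e_value < cutoff
    )

def domain_effect(hits_isoform1, hits_isoform2):
    """Classify the effect of an AS event on the domains of two isoforms."""
    d1 = significant_domains(hits_isoform1)
    d2 = significant_domains(hits_isoform2)
    if not d1 and not d2:
        return "no domain"
    if d1 == d2:
        return "unchanged"
    # A domain was lost, gained, or matched over a different model region.
    return "modified"
```

A list is used rather than a dictionary keyed by domain name so that repeated domains, such as several RRM domains in one protein, are all kept.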
Table 5. Examples of conserved position (CP) alternative splicing (AS) modifying a known domain
Columns include AS length (bp) and the modified domain. Modified domains listed: SRF-type transcription factor (DNA-binding and dimerization domain), two entries; RNA recognition motif (also known as RRM, RBD, or RNP domain); Nuclear transport factor 2 (NTF2) domain.
An example of a conserved AS event that affected domains is the AT4G36690 locus encoding the U2AF65a protein. U2AF65a is the large subunit of the U2 snRNP auxiliary factor and a component of the 3′-splice-site-recognition complex, playing a key role in the assembly of splicing complexes (Wahl et al., 2009). U2AF65a is composed of three RNA recognition motif (RRM) domains and an N-terminal RS domain (Fig. 3a). The first two RRM domains interact with the intron polypyrimidine tract, while the C-terminal RRM domain interacts with the SF1 protein and helps the U2 snRNP to recognize the branch point sequence during spliceosome assembly (Domon et al., 1998; Wahl et al., 2009; Mackereth et al., 2011). We found conservation of two AS events involving intron 11 between Arabidopsis and Brassica. There is conserved retention of the 273 bp intron and an AltA of 173 bp (Fig. 3a). These alternative isoforms were confirmed by RT-PCR (Fig. 3b). In both species, the full-length isoform is the one that is most highly expressed. In Brassica, the middle-sized isoform may be present at lower levels, relative to the major isoform, than in Arabidopsis. In silico translation analyses of the Arabidopsis and Brassica sequences showed that both AS events cause PTCs that would lead to the loss of the C-terminal RRM domain. The conservation of the two different AS isoforms may play a role in regulating spliceosome assembly, although their exact functions have not been characterized.
Domain disruption and potential functional change caused by conserved AS in BPM1
The AT5G19000 locus coding for the BPM1 protein has an AS event conserved with Brassica. BPM1 has a meprin and TRAF homology (MATH) domain and a broad complex, tramtrack, bric-a-brac/POX virus and zinc finger (BTB/POZ) domain (Fig. 4a). The MATH domain of BPM proteins assembles with members of the ethylene response factor/Apetala2 (ERF/AP2) transcription factor family (Weber & Hellmann, 2009). The BTB/POZ domain interacts with CUL3 proteins to form E3 complexes involved in ubiquitination of proteins and can target them for degradation by the 26S proteasome (Hellmann & Estelle, 2002; Gingerich et al., 2005). AS events in Brassica and Arabidopsis, including a conserved retained exon, disrupt the BTB/POZ domain. Both Arabidopsis and Brassica contain an ES event of 82 bp in exon 3 (Fig. 4a). The presence of alternative isoforms was confirmed by RT-PCR (Fig. 4b). For both species, in silico translations of these ESTs revealed shorter proteins with deletions in the BTB/POZ domain. Disruption of the BTB/POZ domain prevents binding of CUL3 and thus ubiquitination and degradation of the transcription factor RAP2.4, a member of the ERF/AP2 family of transcription factors that are involved in mediating light and ethylene signaling (Lin et al., 2008), which binds to the MATH domain in BPM (Fig. 4c). In BPM1 in vitro assays, RAP2.4 binding can take place without the BTB/POZ domain (Weber & Hellmann, 2009) and thus RAP2.4 is not degraded when the BTB/POZ domain is disrupted in the AS form. Thus, if these alternative isoforms are translated into proteins, the AS event has the effect of preventing binding of CUL3 and subsequent degradation of RAP2.4.
Features of conserved alternative splicing events
In this study we identified 537 conserved (CP) AS events in 485 genes. Alternative donor and acceptor events are significantly overrepresented among conserved events, whereas intron retention and exon skipping events are underrepresented. AltA might be the most overrepresented type of AS among the conserved events because the largest number of AltA events are 3 bp long, which would not disrupt the reading frame if proteins are produced from the AS transcripts (Fig. S3; also seen in Campbell et al., 2006). It is possible that there is an overabundance of AltA and AltD types among the conserved events because the large majority are short (Table 3) and have smaller effects on the coding region, as long as they do not introduce a frameshift. Thus the protein structure and stability are more likely to be conserved if those events are translated. Moreover, short AS events that are located in a domain and do not create a frameshift could be a means of protein regulation (domain switch on/off) without affecting the protein structure. AS events that disrupt protein structure and function would likely be selected against and are less likely to be evolutionarily conserved. We found that conserved AS events are significantly shorter and less likely to be in the 3′UTR than are NEC events. Conserved AS events may be significantly shorter because, if the transcripts are translated, shorter events tend to have a smaller and less negative effect on protein sequence and structure than longer events.
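The reading-frame argument can be made concrete with a one-line check: an insertion or deletion within the coding region keeps the downstream frame only when its length is a multiple of three. The function name below is hypothetical, and the event lengths in the usage note are those reported in the Results.

```python
def preserves_frame(length_bp):
    """True when adding or removing `length_bp` bases within the coding
    region leaves the downstream reading frame unchanged."""
    return length_bp % 3 == 0
```

Applied to lengths given in the text, `preserves_frame(3)` is true for the common 3 bp AltA events, whereas `preserves_frame(82)` (the ES event in BPM1) and `preserves_frame(173)` (the AltA event in U2AF65a) are false. `preserves_frame(273)` is true for the retained intron in U2AF65a, which shows that frame preservation alone does not rule out a PTC, because the inserted sequence can itself contain an in-frame stop codon.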
We found that conserved AS events are enriched for genes whose products function in the chloroplast. It is possible that AS plays an important role in the regulation of nuclear genes for chloroplast proteins, or that it is involved in producing alternative isoforms of many chloroplast proteins. An AS event modified a functional domain for about half of the genes with conserved events. Thus, if the transcripts are translated, the resulting proteins might have functional changes or might be nonfunctional, depending on the effect of the functional domain modification.
Evolutionary conservation of alternative splicing in angiosperms
In this study we have identified the largest number of conserved AS events between two plant species to date. Brassica and Arabidopsis are in the same family (Brassicaceae), whereas some of the previous large-scale comparisons have been between the much more distantly related plants Arabidopsis and rice (Wang & Brendel, 2006; Baek et al., 2008; Severing et al., 2009). Thus one would expect to find a much higher number of conserved events between Arabidopsis and Brassica than between Arabidopsis and rice. However, some previous studies have compared AS event conservation between species within the same family, including three legume species (Medicago, Glycine, and Lotus in Wang et al., 2008) and rice and maize (Severing et al., 2009), but only a small number of conserved events were found. One possible explanation for those results is that the number of ESTs available from the legume species was relatively small compared with Brassica, Arabidopsis and rice.
Another factor that affects the detected amount of AS conservation is what is being compared: a specific AS event (i.e. the same type of event affecting the same intron in the same position), or the same type of event in homologous genes between species, even if it affects different introns. In this study we compared events of the same type affecting the same intron in the same position (conserved position), because these represent true evolutionary conservation of an AS event, as well as events of the same type that affect the same exon–intron junction (conserved junction). We did not analyze events of different types at the same exon–intron junction. Genes that have different types of conserved AS events that affect the same intron, or conserved AS events that affect different introns, might in some cases show similar effects of the AS event. For example, if homologous genes from two species both have an IR event that affects a different intron in each species, and both IR events create a PTC, the effect might be similar.
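The distinction between the comparison categories described above can be made concrete with a small sketch. This is an illustration under simplifying assumptions, not the pipeline used in this study: the `ASEvent` record, its fields, and the category labels returned are hypothetical, and the sketch assumes that introns have already been matched between the homologous genes and that each event is summarized by a single offset relative to the constitutive splice site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ASEvent:
    event_type: str    # e.g. "AltA", "AltD", "IR", "ES"
    intron_index: int  # hypothetical: index of the affected intron after matching homologous introns
    offset_bp: int     # hypothetical: shift of the alternative site from the constitutive junction


def classify_conservation(a: ASEvent, b: ASEvent) -> str:
    """Classify a pair of events from homologous genes in two species."""
    if a.event_type != b.event_type:
        # Events of different types were not analyzed in the text.
        return "not compared"
    if a.intron_index != b.intron_index:
        # Same type but different introns: not counted as a conserved event.
        return "different intron"
    if a.offset_bp == b.offset_bp:
        return "conserved position"
    # Same type, same junction, different position.
    return "conserved junction"


arabidopsis_event = ASEvent("AltA", intron_index=2, offset_bp=3)
brassica_event = ASEvent("AltA", intron_index=2, offset_bp=3)
category = classify_conservation(arabidopsis_event, brassica_event)
```

In this toy example the two events are of the same type, in the same intron and at the same offset, so they fall in the conserved position category.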
A drawback of the previous studies of AS conservation in angiosperms and the current study, each of which used Sanger-sequenced ESTs as a source of transcribed sequences, is that the ESTs are not comprehensive with regard to AS and thus many AS events are not detected. That almost certainly leads to an underestimate, perhaps a major underestimate, of the number of conserved AS events between species. Future studies using transcribed sequences derived from higher-throughput sequencing, such as 454 and paired-end Illumina reads, will be needed to estimate the degree of conservation of AS events between species more accurately.
Potential functional and regulatory importance of conserved AS events
Using cross-species analyses to detect conserved AS events is a way of finding AS events that create isoforms with potentially important functions, as opposed to AS events with no function or ones created by splicing noise (Boue et al., 2003; Ast, 2004; Sorek et al., 2004; Wang & Brendel, 2006; Reddy, 2007; Severing et al., 2009; Keren et al., 2010). In this way, our finding of evolutionary conservation of 537 AS events between Brassica and Arabidopsis suggests a functional importance of many of those AS events. Such events may be good candidates for functional studies, especially those uncharacterized AS events in genes whose functions (of the major or non-AS isoforms) are known. However, not all conserved events are likely to be functional, particularly if the event is conserved because it is created by a mechanism, such as secondary structure, that promotes AS and is conserved between species without having any effect on regulation or function. Likewise, events that are not evolutionarily conserved are not necessarily nonfunctional, because they could have species-specific functions.
What kinds of functional effects might be present for conserved AS events? At the level of transcriptome diversity, AS can regulate mRNA levels by creating transcripts that are targeted to the NMD pathway for degradation, thus lowering the total level of gene expression (Lewis et al., 2003; Hori & Watanabe, 2005; Arciga-Reyes et al., 2006; Kalyna et al., 2012). Several examples of gene regulation by AS and transcript decay have been reported (reviewed in Reddy, 2007; Barbazuk et al., 2008). For example, in the flowering time gene FCA in A. thaliana that controls the transition from the vegetative to the reproductive phase, there are three AS forms produced, none of which encodes a full-length protein, and AS of FCA limits the amount of FCA protein, both spatially and temporally, to prevent precocious flowering (Macknight et al., 2002). Transcript degradation by AS can also have an autoregulatory function. GRP8 and GRP7 in A. thaliana autoregulate and cross-regulate their own expression by AS and NMD (Staiger et al., 2003; Schöning et al., 2008). In this study we found that AS in GRP8 (AT4G39260) is conserved between Brassica and Arabidopsis (Table S2). Transcripts that contain PTCs, in particular those where the stop codon is > 50 bp upstream of the last exon–exon junction or > 350 bp upstream of the end of the transcript, have been considered to be likely targets for degradation by the NMD machinery (reviewed in Wang & Brendel, 2006; Reddy, 2007). However, a recent study of 270 genes in A. thaliana mutants for the NMD pathway revealed that many predicted NMD targets might not actually be degraded by NMD, with only 11–17% of analyzed putative NMD targets overrepresented when NMD was inhibited (Kalyna et al., 2012). Moreover, they found that many IR events that introduce a PTC do not appear to be degraded by NMD. 
That study highlights the difficulty in identifying real NMD targets in plant transcriptomes without mutant analyses, because the presence of a PTC does not necessarily indicate that a transcript will be degraded. Some of them might be translated to make proteins with altered functions, especially if functional domains are lost or disrupted. Future studies identifying transcripts that are targets of NMD on a genome-wide scale, using transcriptome sequencing of NMD mutants, will be very useful for showing how many of which types of AS events are involved in NMD.
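The positional rule cited above for predicting NMD targets (a stop codon more than 50 bp upstream of the last exon–exon junction, or more than 350 bp upstream of the end of the transcript) can be written as a simple heuristic. The sketch below is illustrative only; the function name, parameter names and coordinate convention are assumptions. As Kalyna et al. (2012) showed, such a prediction flags candidates and does not establish that a transcript is actually degraded.

```python
from typing import Optional


def is_putative_nmd_target(
    stop_codon_end: int,
    last_junction_pos: Optional[int],
    transcript_length: int,
    junction_threshold_bp: int = 50,
    utr_threshold_bp: int = 350,
) -> bool:
    """Apply the positional NMD rule to one transcript.

    All positions are transcript coordinates in bp (hypothetical convention):
    `stop_codon_end` is the last base of the stop codon, `last_junction_pos`
    is the position of the last exon-exon junction (None for a transcript
    without junctions), and `transcript_length` is the transcript end.
    """
    # Rule 1: stop codon far upstream of the last exon-exon junction.
    if last_junction_pos is not None:
        if last_junction_pos - stop_codon_end > junction_threshold_bp:
            return True
    # Rule 2: long stretch between the stop codon and the transcript end.
    return transcript_length - stop_codon_end > utr_threshold_bp


# Illustrative coordinates only, not taken from the data in this study.
ptc_candidate = is_putative_nmd_target(100, 200, 400)
normal_stop = is_putative_nmd_target(380, 400, 500)
```

Transcripts flagged this way would still need experimental confirmation, for example by comparing their abundance in NMD mutants and wild-type plants.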
At the level of proteome diversity, AS can have different effects on proteins, such as changing protein function, altering subcellular localization, or affecting protein affinity or stability (reviewed in Stamm et al., 2005; Reddy, 2007). Analyses of domains allowed us to determine that for about half of the genes containing a conserved AS event, the AS modified a functional domain. Three interesting examples, presented earlier, are BPM1, PsbP, and U2AF65a. We also found some cases of short conserved AS events located in predicted domains that did not disrupt the potential protein. Studies of such short AS events in animals (length < 50 nucleotides) revealed that, in some cases, these short nucleotide insertions or deletions modified the secondary protein structure and could, for example, influence the activity of an enzyme or modify the active site conformation (Oakley et al., 2001; Peneff et al., 2001; Wen et al., 2004).
An emerging class of small proteins, called microProteins (miPs), has been described as dominant-negative regulatory proteins that disturb the formation of functional protein dimers by forming nonfunctional, homotypic protein complexes with their targets (Staudt & Wenkel, 2011). MiPs could result from an AS event that deletes the catalytic, activation or other functional domain from the protein, but retains the protein binding domain. In these cases, the alternative protein could bind its target but create a nonfunctional dimer. Cases of miPs generated by AS that are functional regulators have been shown in animals (Yang et al., 2000; Laitem et al., 2009). Formation of miPs may be an important function of some AS events in plants, including events with evolutionary conservation. We found conservation of an AS event between Brassica and Arabidopsis for a recently reported miP created by AS, Jas1 (also called Jaz10; AT5G13220; Table S2), that was shown to be conserved among several plants (Chung et al., 2010). In the case of BPM1, reported here, the isoform generated by an exon skipping event could be an miP, if this isoform is translated into protein. It is possible that many of the AS transcripts containing PTCs that are not degraded by NMD might be translated into miPs.
Future studies of conserved alternative splicing events will benefit from genome-wide studies of NMD candidates to identify those that might be involved in the regulation of transcript abundance by NMD. Future studies could examine AS event conservation between species at different phylogenetic depths, as well as extending analyses of AS conservation to a genome-wide scale by using RNA-seq approaches to compare matched tissue-type samples from multiple species.
We thank Yichun Qiu in the Adams laboratory for performing the RT-PCR experiments. This study was supported by the Natural Sciences and Engineering Research Council of Canada.