Intron retention is a major phenomenon in alternative splicing in Arabidopsis


For correspondence (fax 972-8-9344181; e-mail


Alternative splicing (AS) combines different transcript splice junctions that result in transcripts with shuffled exons, alternative 5′ or 3′ splicing sites, retained introns and different transcript termini. In this way, multiple mRNA species and proteins can be created from a single gene, expanding the potential informational content of eukaryotic genomes. Searches for AS forms in a variety of Arabidopsis databases showed that they contained an unusually high fraction of retained introns (above 30%), compared with the 10% reported for humans. The preponderance of retained introns (65%) were either part of open reading frames, present in the UTR region or present as the last intron in the transcript, indicating that they would not be subject to nonsense-mediated decay. Interestingly, the functional distribution of the transcripts with retained introns is skewed towards stress and external/internal stimuli-related functions. A sampling of the alternative transcripts with retained introns was confirmed by RT-PCR, and these transcripts were shown to co-purify with polyribosomes, indicating their nuclear export. Thus, retained introns are a prominent feature of AS in Arabidopsis and as such may play a regulatory role.


Typically, alternative splicing (AS) will determine the differential inclusion of coding and non-coding sequence in a transcript. The resulting transcripts may differ in their stability or produce multiple polypeptide isoforms displaying specific chemical and biological activity. AS can thus play a major role in expanding the potential informational content of eukaryotic genomes. Recent evidence indicates a high incidence (above 35%) of AS in the human genome (Brett et al., 2000) compared with a much lower level (approximately 5%) in the plant Arabidopsis (Brett et al., 2002).

Splicing alterations can take various forms (Figure 1a): exon skipping occurs when an exon cassette is sometimes included or excluded from the final transcript. A constitutive exon can be extended or shortened yielding alternative 5′ or 3′ splicing sites. Differentially spliced start or end points will yield distinct transcript termini. An intron can be spliced out or retained. The latter type of AS presents a special case as mechanistically it does not involve a choice between sets of competing splice sites but rather a choice between splicing taking place or transcript transport (Black, 2003). Thus, the presence of retained introns represents a nuclear transport challenge as a nuclear/cytoplasmic checkpoint must be breached. Hence, partially spliced transcripts are retained in the nucleus, probably by the presence of spliceosomal components that are detected by the nuclear export apparatus. Nonetheless, in specific known instances, for example, retrovirus biology, the non-spliced form can override this checkpoint by using factors encoded by the host cell genome (Pasquinelli et al., 1997).

Figure 1.

Types of alternative splicing.
(a) Alternative splicing and retained introns at the transcript terminus or internal sections of the transcript include: exon skipping, alternative 5′ splice donor sites and alternative 3′ splice acceptor sites, intron retention, and alternative splicing in the transcripts termini.
(b) Detection of alternative splicing by the EST pairs gapped alignment (EPGA, see text) method showing the requirement for matches on both sides of an indel.

In humans, 2–5% of the genes have been reported to appear in the form of intron retentions (Clark and Thanaraj, 2002; Kan et al., 2002). Plant introns differ considerably from human introns in their average size (150 versus 740 in plants and humans, respectively). Furthermore, plant introns contain a less well-defined intron branch point (Tolstrup et al., 1997), less strict intron borders (Zhu et al., 2003) and more uridine and adenosine-rich (UA-rich) sequences (Brown et al., 2002; Lorkovic et al., 2000), and at least 9% of Arabidopsis genes may have introns in their UTRs (Haas et al., 2002; Zhu et al., 2003). In addition, plants have a considerably more complex family of spliceosome-associated serine/arginine-rich (SR) factors (Lorkovic and Barta, 2002). We hypothesized that these differences may lead to an altered splicing strategy in plants, and examined both public and self-generated databases for alternative transcript types in the model plant Arabidopsis. Unexpectedly, intron retention was found to account for a large fraction of alternatively spliced transcripts. Sampled retained introns were confirmed by RT-PCR to be part of transcripts and to be associated with ribosomal complexes, thus showing that intron retention is a major alternative transcript form in Arabidopsis.


Intron retention comprises a large proportion of alternatively spliced transcripts

TIGR features a tentative consensus sequence database that incorporates all predicted genes and expressed sequence tags (ESTs) from all Arabidopsis ecotypes into clusters. Currently, over 1588 tentative alternative splice forms have been detected (Quackenbush et al., 2000). We applied a genomic comparison search algorithm requiring strict homology alignment (see Experimental procedures) to those results to find pairs that share a common genomic location. The alignments were then parsed for their distribution of AS types. Indels (insertion or deletion of sequence in one transcript compared with the other) that were defined by alignment as introns were required to be larger than 49 bp and to include canonical borders (GT..AG, GC..AG, AT..AC) (Reddy, 2001). In this database, 1470 pairs matched those criteria, and intron retention comprised 17.7% of the alternative splice choices detected (TIGR-TC; Table 1).
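The intron filter described above (an indel larger than 49 bp with canonical borders) can be sketched as follows. This is a minimal illustration, not the authors' Perl filter; the function name and the assumption that the indel sequence is given in transcript orientation are hypothetical.

```python
# Border pairs named in the text: GT..AG, GC..AG and AT..AC.
CANONICAL_BORDERS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
MIN_INTRON_LENGTH = 50  # "larger than 49 bp"


def is_candidate_intron(indel_seq: str) -> bool:
    """Return True if an indel sequence passes the length and border criteria.

    Assumption: indel_seq is the full indel, 5' to 3' in the sense
    orientation of the transcript (hypothetical input convention).
    """
    seq = indel_seq.upper()
    if len(seq) < MIN_INTRON_LENGTH:
        return False
    return (seq[:2], seq[-2:]) in CANONICAL_BORDERS
```

For example, a 50-nt indel that starts with GT and ends with AG passes the filter, whereas the same borders around a 49-nt indel do not.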

Table 1.  Alternative splicing (AS) in the A. thaliana genome detected by EPGA and TIGR databases^a

| AS type | EPGA | TIGR-TC | TIGR-Atg |
|---|---|---|---|
| AS detected | 436 | 1470 | 1168 |
| Intron retention | 280 (64.2) | 261 (17.7) | 356 (30.5) |
| Exon skip | 14 (3.2) | 82 (5.6) | 37 (3.2) |
| Alt 5′ and/or 3′ | 126 (28.9) | 455 (30.9) | 419 (35.9) |
| Other^b | 16 (3.7) | 672 (45.7) | 356 (30.5) |

Values in parentheses represent percentage.

^a Types of AS in the EPGA database (detailed in the supplementary data Table S1) are compared with the TIGR-TC database (release April 24, 2003) and with the TIGR-Atg database (Haas et al., 2003; release June 2003).

^b Databases were compared by applying the same constraints used for parsing EPGA transcripts. ‘Other’ in EPGA includes transcript pairs with more than one type of AS in the same pair. ‘Other’ in both TIGR databases includes occurrences of more than one type of AS and AS that appears in the termini region.

An additional public AS database features the compilation of full-length cDNAs from three A. thaliana ecotypes (Columbia, Wassilewskija and Landsberg erecta) as well as EST alignments with the genome of the Arabidopsis ecotype Columbia (Haas et al., 2003). Applying to this database the strict requirement for genomic homology, followed by the parsing algorithm as above, we note that 30.5% of AS events are of the intron retention type (TIGR-Atg; Table 1).

Intron retention distribution using EST pairs gapped alignment

High intron retention rates are not a regular feature of animal splice variants, which raises the question of whether the high proportion of intron retention is a bona fide feature of Arabidopsis transcripts. We examined the possibility that heuristic algorithms of gene structure prediction or the process of contig building may influence the AS distribution. In addition, the use of ESTs arising from divergent ecotypes may introduce false AS because of natural sequence polymorphisms. For example, between Columbia and Ler the mean size of indels is 175 bp, and at least 747 indels larger than 100 bp have been detected (Jander et al., 2002). Therefore, we conducted a direct EST-driven search restricted to ecotype-specific ESTs (Columbia), to which a direct indel search algorithm was applied. As illustrated in Figure 1(b), EST pairs gapped alignment (EPGA) detects exon skips, intron retention and 5′ and/or 3′ alternative splice sites as long as the indel size is shorter than the EST size. The method cannot detect other types of AS, for example, transcripts with different termini. EPGA sensitivity for detecting AS is lower than that of TIGR for the following reasons. First, the database is smaller, as it is restricted solely to ESTs from the Columbia ecotype. Second, the average size of an EST is shorter than that achieved by contigs. Third, strict sequence alignment requirements are applied to prevent paralogous pairing. Paralogous pairing can potentially cause false positives in EPGA, as 65% of the genes in Arabidopsis belong to multigene families (Arabidopsis Genome Initiative, 2000). Thus, the stringent requirements for exact pairing of ESTs used in EPGA are intended to keep the interference from this source low. EPGA was applied to 191 293 ESTs (NCBI, November 2003) and detected 436 pairs containing indels that share a common genomic location.
As shown in Table 1, 280 (64.2%) of the AS events were intron retentions that were larger than 49 bp and had canonical borders (GT..AG, GC..AG, AT..AC). Only one retained intron has the rare AT..AC borders that may be indicative of U12-type spliceosome processing. The higher total percentage of intron retention in EPGA compared with the TIGR databases results from the lack of retrieval of AS at the transcript termini. As shown in Table 1, those AS types are approximately 30–46% of the total AS, which would indicate that the corrected proportion of the intron retention type in EPGA is above 44%.

Intragenic distribution of unspliced introns

The location of retained introns within a transcript is of interest in the context of its possible relevance to their formation, stability and biological function. In order to establish the intragenic distribution of retained introns, we compared the retained introns detected above with annotated gene structures. The annotated genome transcripts have been distributed into databases by their components, that is, introns, 3′ UTR, 5′ UTR and CDS (coding sequences) (TAIR; release 28 February 2004). If TAIR detected a transcript and its alternative, the components of both transcripts were distributed independently into the databases. For example, the sequence of a retained intron that contains a stop codon will be assigned to both the CDS and UTR databases as well as the intron database. Retained introns that were not detected by TAIR as appearing in alternative transcripts appear only in the intron database.

Retained introns recovered from the EPGA or TIGR-TC databases were aligned to those databases using BLAST (Table 2). Alternative introns that appear in what was annotated by TAIR as part of the CDS or CDS + UTR will yield a different CDS, that is, it may be longer or contain different start or end points for protein translation. If the retained intron is detected in what was annotated by TAIR as UTR regions, the length of the UTR will change. Approximately 50% of the retained introns in both the EPGA and TIGR-TC databases fall into those classes, while the remaining retained introns have been annotated exclusively as part of intron regions, and their retention in a transcript may cause premature termination and make the transcript subject to nonsense-mediated decay (NMD). Interestingly, in both the EPGA and TIGR-TC databases, one-third of the retained introns that appear exclusively in the intron database are either the only intron in the transcript (i.e. single intron) or the last intron in the transcript. The rest appear as internal introns (Table 2). Thus, the position of a retained intron may affect transcript stability.

Table 2.  Positional distribution of retained introns as detected in EPGA and TIGR-TC

| Intron location^a | EPGA | TIGR-TC | Predicted change in decoded transcript | Sampled transcripts (Table 3) |
|---|---|---|---|---|
| CDS | 32 (11.43) | 52 (19.92) | Longer CDS | At4g01690, At2g23550, At3g13300, At2g18690, At2g33340, At5g20160 |
| CDS + 3′ UTR | 16 (5.71) | 21 (8.05) | Earlier stop codon | At2g44920, At1g70180 |
| CDS + 5′ UTR | 1 (0.36) | 3 (1.15) | Later start codon | |
| 3′ UTR | 51 (18.21) | 41 (15.71) | Longer 3′ UTR, same reading frame | At2g19940, At1g28330, At1g70620, At1g79090, At3g14930, At2g24420 |
| 5′ UTR | 30 (10.71) | 27 (10.34) | Longer 5′ UTR, same reading frame | At4g17190 |
| Internal intron | 93 (33.21) | 74 (28.35) | Not determined as AS type in TAIR | At5g07890, At4g07410, At1g04440, At2g44650, At5g57860, At1g10760, At4g01290 |
| Last/single intron | 53 (18.93) | 39 (14.94) | Not determined as AS type in TAIR | At5g13190, At4g34150 |
| No hit | 4 (1.43) | 4 (1.53) | | |
| Number of introns | 280 | 261 | | |

Values in parentheses represent percentage.

^a Databases used for retained intron distribution are the introns, 3′ UTR, 5′ UTR and CDS from the TAIR BLAST databases. Intron location is as determined in the splice variant present in the TAIR database.

Distribution of biological functionalities of transcripts with retained introns

Examining the assigned global biological activity of transcripts that appear with alternatively spliced retained introns may serve to explain their biological function and suggest conditions that promote their expression. In Figure 2, global gene ontology groups were chosen if they contained at least 1% of the transcripts from either the total genome or the EPGA retained-intron database. The data are shown as the ratio of the percentage in the retained-intron database relative to the percentage in the total genome analysis. Bars greater or smaller than 1 represent overabundance or underrepresentation of EPGA transcripts, respectively, in relation to the total gene distribution. The difference in distribution of ontology between EPGA and the total genome is highly significant statistically (chi-square; d.f. = 8, P < 6 × 10^-5). Interestingly, the groups that are enriched in EPGA depict situations of physiological flux, including photosynthesis, stress response and stimuli response. In contrast, transcripts related to metabolism (∼40% of the transcripts) and cell maintenance (e.g. cell cycle) are particularly underrepresented in the intron retention group.
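The ratio plotted in Figure 2 can be sketched as below. This is an illustrative reconstruction under stated assumptions: the function name is hypothetical, percentages are taken over the total number of category assignments in each set, and the counts in the usage note are invented for illustration and are not the data of Table S2.

```python
def representation_ratios(subset_counts, genome_counts, min_percent=1.0):
    """Ratio of percentage in the subset to percentage in the genome.

    A category is kept only if it reaches min_percent in at least one
    of the two sets, mirroring the 1% inclusion rule in the text.
    Values > 1 suggest overabundance, values < 1 underrepresentation.
    """
    subset_total = sum(subset_counts.values())
    genome_total = sum(genome_counts.values())
    ratios = {}
    for category, genome_n in genome_counts.items():
        genome_pct = 100.0 * genome_n / genome_total
        subset_pct = 100.0 * subset_counts.get(category, 0) / subset_total
        if max(genome_pct, subset_pct) < min_percent:
            continue  # too rare in both sets to be plotted
        if genome_pct == 0:
            continue  # ratio undefined without genome annotations
        ratios[category] = subset_pct / genome_pct
    return ratios
```

With hypothetical counts such as `{"metabolism": 40, "response to stress": 40, "other": 20}` for the retained-intron set and `{"metabolism": 600, "response to stress": 200, "other": 200}` for the genome, the stress category has a ratio of 2.0 and metabolism a ratio below 1.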

Figure 2.

Gene ontology classification of transcripts with intron retention from the EPGA database in relation to the total Arabidopsis gene ontology distribution. The distribution is based on the first-level categories of the Gene Ontology classification of the biological processes of their protein products. Gene ontology may lead to the classification of one gene into multiple categories. Only classifications containing at least 1% abundance in one of the databases were included, as detailed in supplementary data Table S2. Transcripts listed as ‘biological process unknown’ were not included.

Retained introns are part of expressed transcripts and are associated with ribosomes

Transcripts with retained introns may represent bona fide transcripts, arise from cDNA libraries that are contaminated by genomic DNA, or be a result of incompletely spliced heterogeneous nuclear RNA (hnRNA). In addition, as shown in Table 2, their presence may contribute to nonsense-mediated transcript decay. In order to differentiate between these possibilities, 27 transcripts were examined by quantitative RT-PCR (see Experimental procedures). Among them were three transcripts with constitutively spliced introns and 24 transcripts with retained introns. Samples were chosen to include examples with high (>60%) and low (<30%) incidence of retained introns, as well as examples of intron retention at different positions in the transcript (Table 3). Primers that flank a retained intron were designed for simultaneous RT-PCR amplification of both the spliced and putative unspliced variants. Total RNA (lane t, Figure 3) was isolated from the whole plant and subjected to RT-PCR. DNase pre-treatment of the RNA samples eliminated all non-RT-dependent background (not shown). When the constitutively spliced transcripts were examined, only one fragment, indicative of a fully spliced transcript, could be detected (note fragment S in At5g17920; Figure 3, lane t). In contrast, as exemplified by At1g79090, two fragments corresponding to the upper unspliced and lower spliced transcripts can be detected (fragments I and S, Figure 3). Analysis of the AS candidates showed that retained introns were detected in 18 of the 24 transcripts analysed (Table 3). In some cases where the retained introns made up >50% of the ESTs, only the intron band was detected (e.g. At2g33340, At2g18690; Figure 3 and Table 3). Thus, the existence of introns in transcripts could be confirmed for 75% of the transcripts tested.

Table 3.  Detection of retained introns by RT-PCR examination

| Gene identifier^a | EST pair^b | Total RNA^c | Number of AS (EST) detected^d | Location of intron^e | Ribosome association^f |
|---|---|---|---|---|---|
| At2g19940 | AI997610, AA721836 | I, S | 2 (12) | 3′ UTR | S |
| At1g28330 | AV440763, AA728497 | I, S | 2 (3) | 3′ UTR | S |
| At1g70620 | AV524140, T41739 | I, S | 1 (4) | 3′ UTR | I, S |
| At1g79090 | AV555994, AV523464 | I, S | 2 (6) | 3′ UTR | I, S |
| At3g14930 | AV520930, AV521221 | I, S | 2 (5) | 3′ UTR | I, S |
| At2g24420 | AV561186, AV567152 | I | 7 (8) | 3′ UTR | I |
| At2g44920 | AV440813, AV522476 | I, S | 2 (3) | 3′ UTR-CDS | S |
| At1g70180 | AV537577, AI998760 | I, S | 1 (2) | 3′ UTR-CDS | S |
| At4g17190 | AV534971, BE038361 | I, S | 1 (2) | 5′ UTR | S |
| At4g01690 | AV538014, AI997032 | I | 4 (5) | CDS | I |
| At2g23550 | R90111, BE526909 | I | 2 (4) | CDS | N |
| At3g13300 | AV554924, AV554887 | I, S | 3 (6) | CDS | I, S |
| At2g18690 | AV552678, AV549043 | I | 6 (7) | CDS | I |
| At2g33340 | N96659, AV548843 | I | 5 (6) | CDS | I |
| At5g20160 | T20959, BG459274 | S | 3 (8) | CDS | ND |
| At5g13190 | AV544876, AV544995 | S | 2 (6) | Intron 2-last | ND |
| At4g34150 | AV547769, AV539092 | S | 2 (17) | Intron 6-last | ND |
| At5g07890 | BE520323, BE523070 | I, S | 3 (4) | Intron 3 | S |
| At4g07410 | AV530043, BE522268 | I, S | 2 (7) | Intron 1 | S |
| At1g04440 | H37476, AV553839 | I, S | 1 (2) | Intron 11 | N |
| At2g44650 | T76501, AA597793 | S | 2 (5) | Intron 2 | ND |
| At5g57860 | AI998080, AV552629 | S | 1 (2) | Intron 3 | ND |
| At1g10760 | AV524353, AV529779 | S | 1 (5) | Intron 30 | ND |
| At4g01290 | AV528467, AV529428 | I, S | 1 (2) | Intron 4 | S |
| At4g24190 | AV816779 | S | 0 (51) | No AS | S |
| At5g17920 | AV523685 | S | 0 (50) | No AS | S |
| At5g60390 | AV552498 | S | 0 (51) | No AS | ND |

^a EST pairs were aligned, using BLAST, to the TAIR database [AGI genes (+introns, +UTR), release 17/4/2003] and the best hit was considered to be the gene identifier.

^b A representative EST pair was chosen. In cases where no AS was detected, that is, constitutive splicing, a single EST is indicated.

^c Semi-quantitative RT-PCR reactions were conducted with total RNA and transcript-specific primers (Table S3). The oligonucleotide primers were designed for amplification of both the spliced and unspliced variants. I, recovery of an intron flanked by exon sequences; S, recovery of a spliced junction, that is, the joining of the two exons after splicing.

^d The number of ESTs detected that contain a retained intron is indicated, with the total number of ESTs in parentheses.

^e Intron location was determined as described in Table 2.

^f N indicates that the I and S fragments were not shifted by puromycin or were found only in the supernatant. ND, not done, as the retained intron was not detected in total RNA.

Figure 3.

RT-PCR of total and ribosome-associated RNA. For subcellular fractionation, cell extracts were loaded on 15 and 60% sucrose cushion gradients, with or without puromycin (±Pu) treatment. The RNA was extracted from the resultant pellet (p) and supernatant (s). The RT-PCR results of the ribosome fractions and of total RNA (t) are presented. Sizes in bp of the expected PCR products of the unspliced (I) and spliced (S) transcripts are shown. The identity of the introns was confirmed either by direct sequencing or by additional PCR using an intron-specific primer. The annotated gene structures and corresponding ESTs are shown on the right side. The gene depictions (upper inserts) show the annotated gene structure from TAIR. The full length of the ESTs that are responsible for the splicing variation is illustrated directly below each gene structure. A gap in an EST represents missing sequence, the result of intron splicing, and an EST without a gap represents a transcript that includes this intron. White boxes correspond to the protein-coding portion and black boxes correspond to the UTRs. Lines between the boxes represent introns.

The RT-PCR result for total RNA presented in Figure 3 (lane t) confirms the existence of transcripts in which intronic sequences are not removed. The unspliced transcript variants may reside in the nucleus as an RNA intermediate, for example, as slowly processed hnRNA. Alternatively, they may represent transcripts present in the cytoplasm in association with ribosomes. To distinguish between these possibilities, experiments were undertaken to determine whether splice variants reside on ribosomes, a marker for extranuclear, cytoplasmic localization. To this end, ribosomes were isolated on sucrose cushion gradients. Absorbance measurements of the pellet following sucrose density gradient fractionation revealed a typical multi-peak distribution of mono- and polyribosomes (upper insert, Figure 4). Pre-application of 1 mM puromycin disrupted the polyribosome–RNA complexes, as indicated by enrichment of the 80S, 60S and 40S moieties (lower insert, Figure 4). To follow the destination of transcripts with retained introns, RNA was extracted from the ribosome pellets and their corresponding supernatants (lanes p and s, Figure 3) that were prepared in the presence or absence of puromycin (+Pu and −Pu, Figure 3). RT-PCR analysis revealed that the constitutively spliced transcript At5g17920 was enriched in the polyribosome-containing fraction. By contrast, upon puromycin treatment, the transcripts were recovered preferentially in the supernatant, suggesting that the puromycin treatment released the transcripts from polyribosomes (Figure 3; compare fractions p and s). Similar analyses of the 27 transcripts are summarized in Table 3 and exemplified in Figure 3.

Figure 4.

Polyribosome fractionation pattern. Cell extracts were loaded on 15 and 60% sucrose cushion gradients and centrifuged. The resultant pellets were then loaded on 15–60% linear sucrose gradients with or without 1 mM puromycin. Gradient fractions were collected and the absorbance profiles of the gradients at 260 nm were determined (Barkan, 1998). Ribosomes from rabbit reticulocytes (80S) and from Thermus thermophilus (50S) were used as size markers.

The RT-PCR results for At1g79090, At3g13300, At2g18690 and At2g33340 (Figure 3 and Table 3) demonstrate that both transcripts, that is, the intron-retaining and the fully spliced fragments (when present), are enriched in polyribosome-containing fractions. In contrast, in At2g19940, At1g28330, At2g44920, At5g07890 and At4g07410, the intron-containing transcript is either not puromycin sensitive or does not appear in the pellet (Figure 3; Table 3). Thus, a subset of transcripts with retained introns is associated with ribosomes.


Intron retention in Arabidopsis and human databases

We have combined a variety of methods and databases to determine the types of AS in Arabidopsis. All methods used predict the presence of a large fraction (∼30%) of retained intron types in Arabidopsis AS. We confirm, using RT-PCR that at least 75% of the sampled intron retention candidates (18/24) can be found in transcripts. Cases in which introns were not confirmed may indicate that the databases contain genomic sequence contamination or that the intron-containing transcript is expressed under restricted conditions, for example, specific developmental stage or environmental stress not emulated here. Alternatively, the unspliced variant transcript may be present below the detection limit used here for linear-range quantitative RT-PCR experiments. In many cases (8/18), the variants were associated with polyribosomes. The location of other transcripts not associated with ribosomes is not known. They may be localized to the cytoplasm in non-ribosome complexes (Gebauer et al., 1998), or reside as RNA in the nucleus that awaits splicing or transport. The significance of retained introns has largely been ignored and relegated to the inclusion of incompletely spliced transcript in EST libraries (Zhu et al., 2003). Our experiments emphasize that retained introns are a genuine part of RNA metabolism.

AS in Arabidopsis is estimated to occur in 7–10% of the genes, and about 30% of these events are reported here as intron retention. Thus, 2–3% of the gene transcripts have retained introns. Bioinformatic analysis of 6400 known human genes with sufficient EST coverage showed that 17–28% of human genes contained at least one alternative transcript and approximately 5% of the genes contained a retained intron (Kan et al., 2002). Hence, on a genome-wide basis the total rate of intron retention in Arabidopsis and human appears to be similar despite the vastly reduced AS rate in plants. Interestingly, intron positions in human genes show the greatest similarity to Arabidopsis compared with C. elegans and D. melanogaster (Rogozin et al., 2003).
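The genome-wide estimate above follows from multiplying the AS rate by the intron-retention share of AS. A small worked sketch (the function name is hypothetical; the inputs are the percentages quoted in the text):

```python
def genome_wide_ir_rate(as_rate_percent, ir_share_percent):
    """Percentage of genes with a retained intron, given the percentage of
    genes showing AS and the percentage of AS that is intron retention."""
    return as_rate_percent * ir_share_percent / 100.0


# AS in 7-10% of genes, about 30% of it intron retention.
low = genome_wide_ir_rate(7, 30)
high = genome_wide_ir_rate(10, 30)
print(f"{low:.1f}-{high:.1f}% of genes")
```

The result, roughly 2 to 3% of genes, is the range quoted in the text and is of the same order as the approximately 5% reported for human genes.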

Regulatory implication of retained introns

A significant fraction of the retained introns appears as part of the CDS, or bridging both CDS and UTR regions (17.5%), indicating that the AS transcript is likely to be part of a new open reading frame. The resulting transcripts would yield two proteins that can differ in their activity (Golovkin and Reddy, 1996; Paterno et al., 2002). Interestingly, retained introns located solely in the UTR regions represent 29% of the AS distribution, although UTR regions represent only 10% of the total genomic transcript region and only 9% of the transcripts contain introns in those parts (Zhu et al., 2003). Thus, the preponderance of retained introns detected in these positions may indicate a regulatory role, for example, in affecting RNA stability or translational efficiency (Gebauer et al., 1998). For example, the presence of a retained intron in the 3′ UTR of the yeast HAC1 transcript blocks mRNA translation, and the block is released only after novel cytoplasmic splicing as part of the unfolded protein response (Ruegsegger et al., 2001). Such mechanisms of regulation have not been explored in plants.

Retained introns may contain in-frame stop codons that are followed by spliced sections that contain exon–exon junctions. In animals, if such a junction is present more than 50–55 nt downstream from a premature stop codon, the molecular phenomenon of NMD can come into play. Inspection of the positions of retained introns (Table 2) indicates that 29% of the retained introns are in the 3′ or 5′ UTR (i.e. are not within reading frames). A further 17.5% have open reading frames (CDS, CDS + 3′ UTR and CDS + 5′ UTR). A further 19% are present as last or single introns that do not have downstream exon–exon junctions. Thus, based on the animal model, only a minority of retained introns (<35%) would come under the ‘position-of-an-exon-exon-junction’ rule (Maquat, 2004), and in those cases NMD could play a role. However, seven such transcripts were examined directly, and in four cases (At5g07890, At4g07410, At1g04440, At4g01290) the intron-containing AS type was readily detected (Table 3). Thus, it remains to be seen what role NMD plays in plant mRNA quality control.
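The animal position-of-an-exon-exon-junction rule referred to above can be sketched as a simple positional test. This is an illustration only: the function name, the use of 0-based transcript coordinates and the default threshold of 50 nt (the text gives 50–55 nt) are assumptions, and nothing here implies that the rule holds in plants.

```python
def junction_rule_applies(stop_codon_end, exon_junctions, min_distance=50):
    """Return True if any exon-exon junction lies more than min_distance nt
    downstream of the end of a premature stop codon.

    stop_codon_end: transcript coordinate just after the stop codon.
    exon_junctions: transcript coordinates of exon-exon junctions.
    """
    return any(j - stop_codon_end > min_distance for j in exon_junctions)
```

A retained last or single intron leaves no downstream junction, so `junction_rule_applies(300, [])` is False, whereas an internal retained intron with a junction 120 nt past the stop codon, as in `junction_rule_applies(300, [100, 420])`, satisfies the rule.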

Stress, such as exposure to cold, heavy metals or anaerobiosis, affects the efficiency or patterns of splicing (Luehrsen et al., 1994; Simpson and Filipowicz, 1996). The Bronze2 (Bz2) locus in maize encodes glutathione-S-transferase and is highly induced by heavy metals such as cadmium (Marrs and Walbot, 1997). Treatment of maize seedlings with cadmium caused a specific 20-fold increase in Bz2 message accumulation and a 50-fold increase in the presence of the unspliced, intron-containing transcript (Marrs and Walbot, 1997). The mechanisms by which some types of stress influence splicing in plants are largely unknown. It is therefore of great interest that transcripts related to situations of stress or stimuli inputs comprise approximately 10% of the transcripts that contain retained introns (Figure 2; Table S2). Thus, if particular stimuli influence intron retention, then conceivably the presence or absence of the intron either stabilizes the transcript or serves to modify its biological function. In contrast, transcripts related to cellular maintenance are underrepresented globally, which may reflect a cellular need to avoid stress-dependent intron retention.

We have demonstrated using bioinformatics tools and direct biochemical techniques that intron retention is a major phenomenon in Arabidopsis AS. As predicted by the bioinformatics analysis and confirmed by RT-PCR, the transcripts containing introns can represent a considerable proportion of the total gene expression. The availability of a well-defined AS database that differentiates AS types will facilitate the elucidation of the cellular mechanisms that regulate the expression, transport and biological function of transcripts with retained introns.

Experimental procedures

Data sources

Arabidopsis ESTs were obtained from NCBI (November 2003) and separated by cultivar, based on their annotation in NCBI, to create a cultivar-specific database. For the Arabidopsis genome sequence, the 4 April 2003 version was obtained from the Arabidopsis Information Resource center.

Alternative splicing and cluster analysis

EPGA was developed to identify AS by searching for pairs of ESTs that share two sequentially matching regions which flank a discontinuity in the alignment. BLAT was used to align all the ESTs to each other. A pair was reported to indicate AS if it had a 20-bp alignment, followed by an indel of at least 4 bp, followed by a subsequent match of at least 20 bp. The total match between the two ESTs had to be at least 75 bp. As validation of EPGA in Arabidopsis, each pair suggested to indicate AS was matched against the Arabidopsis genome using BLAT, and analysed for authenticity as described in the text. The quality of the alignment was used to determine whether it was an authentic or a paralogous alignment. For identification, EST pairs were aligned using BLAST to the TAIR database ATH1_seq (17 April 2003). The best hit was considered to be the transcript origin.
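The pair-reporting thresholds described above can be sketched as a predicate over a gapped alignment. This is a minimal sketch, not the authors' Perl code: the `GappedAlignment` type is hypothetical, the first flank is treated as "at least 20 bp", and the total match is assumed to be the sum of the two flanking matches.

```python
from dataclasses import dataclass


@dataclass
class GappedAlignment:
    """Hypothetical summary of one EST-versus-EST alignment (lengths in bp)."""
    left_match: int   # matched region before the indel
    indel: int        # sequence present in only one EST of the pair
    right_match: int  # matched region after the indel


def indicates_as(aln, min_flank=20, min_indel=4, min_total=75):
    """Apply the EPGA thresholds: flanking matches on both sides of an
    indel, a minimum indel size and a minimum total matched length."""
    total_match = aln.left_match + aln.right_match  # assumption, see lead-in
    return (
        aln.left_match >= min_flank
        and aln.indel >= min_indel
        and aln.right_match >= min_flank
        and total_match >= min_total
    )
```

For instance, `indicates_as(GappedAlignment(40, 120, 40))` is True, while a pair with two 20-bp flanks fails the 75-bp total and a pair with a 3-bp gap fails the indel threshold. Candidate pairs would still need the genome-based check against paralogous alignments that the text describes.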

Positions of introns in transcripts were established using BLAST alignment to four TAIR databases: intron, 3′ UTR, 5′ UTR and CDS (28 February 2004). In this case, the potential genome transcripts have been annotated by their components, that is, introns, 3′ UTR, 5′ UTR and CDS (coding sequences). All filtering programs were written in Perl (available upon request from H. N-G). Gene ontology classification of transcripts was carried out using the first-level categories of the Gene Ontology classification of the biological processes of their protein products.

Polyribosome isolation and RT-PCR

All equipment was washed with 0.1 N NaOH and then with RNase-free water. Arabidopsis (strain Columbia) was grown on sterile Nitsch media for 10–13 days under a 16-h light/8-h dark cycle. Approximately 2 g fresh weight from whole plants was ground in liquid nitrogen in 4 ml of polysome extraction buffer [200 mM Tris–HCl, pH 9, containing 200 mM KCl, 35 mM MgCl2-hexahydrate, 25 mM EGTA, 200 mM sucrose, 500 μg/ml heparin, 100 mM β-mercaptoethanol, 1% (w/w) Triton X-100 and 2% (w/w) polyethylene-10-tridecyl-ether]. Crude extracts were incubated for 5 min on ice and then centrifuged for 10 min at 15 000 g at 4°C. The supernatant was then transferred to a new tube and 0.5% sodium deoxycholate was added, followed by incubation for 5 min on ice and centrifugation for 10 min at 15 000 g at 4°C. The supernatant was then loaded on four cushion sucrose gradients [1.4 ml 15% on top of 2.1 ml 60% in 40 mM Tris–HCl, pH 9, containing sucrose (1.75 M for the 60% or 0.5 M for the 15%), 200 mM KCl, 30 mM MgCl2-hexahydrate, 5 mM EGTA, 500 μg/ml heparin and 100 mM β-mercaptoethanol], and centrifuged for 3 h at 90 000 g at 4°C. Supernatants were discarded, and pellets were air-dried. For 15–60% linear sucrose gradients, the pellets were each resuspended in 500 μl of polysome buffer. The supernatants were collected and 400 mM KCl was added. The supernatant was divided and 1 mM puromycin was added where indicated. Tubes were incubated for 5 min at 37°C and an additional 5 min at room temperature and then centrifuged for 10 min at 15 000 g at 4°C. The supernatants were loaded onto two different 4 ml 15–60% sucrose gradients in 40 mM Tris–HCl, pH 9, containing sucrose, 20 mM KCl, 10 mM MgCl2-hexahydrate and 500 μg/ml heparin. Gradients were centrifuged for 1 h at 175 000 g at 4°C and were harvested from the top of the gradient. The absorption of the gradient at 260 nm was monitored using a UVIKON 930 spectrophotometer (Kontron, Zurich, Switzerland).

For RT-PCR, plant extracts were subjected to the two-step cushion sucrose gradient as described above with the following modifications. Extracts were incubated for 5 min on ice and then centrifuged for 10 min at 15 000 g at 4°C. The supernatant was collected and supplemented with 400 mM KCl. The supernatant was then divided and 1 mM puromycin was either added or omitted. Tubes were incubated for 5 min at 37°C and an additional 5 min at room temperature and then centrifuged for 10 min at 15 000 g at 4°C. To each tube 0.5% sodium deoxycholate was added. Tubes were incubated for 5 min on ice and then centrifuged for 10 min at 15 000 g at 4°C. Supernatants were loaded on two cushion sucrose gradients and centrifuged (as described for polyribosomes). The top 3 ml were collected from each gradient and pellets were air-dried.

RNA was extracted from the pellet or supernatant obtained from the two cushion gradients, using RNeasy (Qiagen, Hilden, Germany). RT-PCR was conducted as previously published (Savaldi-Goldstein et al., 2003). Oligonucleotide primers were designed for simultaneous amplification of both the spliced and unspliced variants. A complete list of the PCR primers used in this study is available as supplementary data Table S3.


This research was supported by the Israel Science Foundation grant no. 388/02 and BARD United States-Israel Binational Agricultural Research and Development Fund grant no. IS-3454-03.

Supplementary material

The following material is available from

Table S1 EPGA ESTs pairs

Table S2 Gene ontology classification distribution

Table S3 PCR primers