What makes a nucleotide sequence an exon (or an intron) is a question that still lacks a satisfactory answer. Indeed, most eukaryotic genes are full of sequences that look like perfect exons, but which are nonetheless ignored by the splicing machinery (hence the name ‘pseudoexons’). The existence of these pseudoexons has been known since the earliest days of splicing research, but until recently the tendency has been to view them as an interesting, but rather rare, curiosity. In recent years, however, the importance of pseudoexons in regulating splicing processes has been steadily revalued. Even more importantly, clinically oriented screening studies that search for splicing mutations are beginning to uncover a situation where aberrant pseudoexon inclusion as a cause of human disease is more frequent than previously thought. Here we aim to provide a review of the mechanisms that lead to pseudoexon activation in human genes and how the various cis- and trans-acting cellular factors regulate their inclusion. Moreover, we list the potential therapeutic approaches that are being tested with the aim of inhibiting their inclusion in the final mRNA molecules.
Towards the end of the 1970s, at the beginning of pre-mRNA splicing research [1,2], defining exons and introns was essentially based on observing the final composition of the mature mRNA molecule. In 1978, any sequence that was included in a mature mRNA became tagged as an ‘exon’, whereas all the intervening genomic sequences that were left out during the splicing process became defined as ‘introns’. However, this way of thinking did not explain what makes an exon an exon or an intron an intron. The discovery of the basic splice site consensus sequences during the same years [4,5], and later on of enhancer and repressor elements, has taken us a long way in the direction of discovering exon- and intron-definition complexes [6–8]. Nowadays, the characterization of the splicing signals that define exons/introns has been greatly aided by basic research, bioinformatic approaches and advanced sequencing tools [9,10]. In this regard, we certainly know much more about splicing regulation than we did 20 years ago. Considering that several reviews have been written recently on the subject, the reader is referred to them for further information on the latest discoveries [11–14]. Most important, in this respect, have been the initial observations that in alternative splicing processes the same nucleotide sequence could be defined by the spliceosome as an intron or an exon in response to specific signals [15,16]. It is now clear that these kinds of decision (What is an exon? What is an intron?) are of paramount importance in explaining genome complexity and evolutionary pathways [17–20]. However, the sum of this new knowledge does not necessarily mean that we are near the goal of understanding most splicing decisions. Indeed, even the latest attempts at ‘designing’ exons based on current state-of-the-art knowledge have basically demonstrated that there is still a long way to go before we can become as good as the spliceosome in deciding what is an exon and what is an intron.
Where do pseudoexon sequences come into the story?
Central to the issue of deciding what is an exon and what is an intron is the question of their origin, a field that remains much debated to this day and that basically deals with deciding the order of appearance of introns during evolution, whether first, early or late. Whatever the answer to this question turns out to be, it is now clear that many of the ‘new’ exons in our genome originate from the insertion of transposable sequence elements belonging to the SINE and LINE classes in the eukaryotic genome [23–25]. In particular, exonization of Alu elements (which are primate specific and represent the most abundant mobile elements in the human genome) through retrotransposition–mutation events is a prominent source of new exons in the eukaryotic transcriptome, as schematically depicted in Fig. 1 [26,27].
However, even if we ignore this particular class of exonization event, every in silico analysis shows that ‘false exons’ are very abundant in the intronic sequences of most genes [with this term we refer to any nucleotide sequence between 50 and 200–300 nucleotides in length with apparently viable 5′ and 3′ splice sites (5′ss and 3′ss) at either end]. Presently, there is evidence that inclusion of many of these sequences is actively inhibited due to the presence of intrinsic defects , the presence of silencer elements [29–31] or the formation of inhibiting RNA secondary structures . Even if a combination of all these elements succeeds in repressing the use of many of these pseudoexon sequences, we have to consider the possibility that there must be many exceptions to this rule.
First, it is probable that several of these pseudoexons are actually recognized only in particular circumstances, such as in response to particular external stimuli [33,34] or in a given tissue or developmental stage. Proof of this possibility is the observation that ‘novel’ exons keep being identified even in well-known and well-studied genes, such as the DMD gene.
Second, our failure to observe their use in normal conditions may also be due to the fact that their inclusion can intentionally lead to premature insertion of a termination codon in the mature mRNA and the consequent rapid degradation by nonsense-mediated decay (NMD) pathways  (Fig. 1). Such an occurrence has been described in the rat α-tropomyosin gene with a putative pseudoexon sequence localized downstream of two mutually exclusive exons: an upstream exon that is included only in smooth muscle tissue and a downstream exon that is included in most cell types . Experimental analysis has shown that, when this pseudoexon is included in the mRNA molecule together with the ubiquitously expressed downstream exon, the formation of a stop codon causes activation of the NMD pathway. On the other hand, when inclusion of this pseudoexon occurs with the upstream smooth muscle tissue-specific exon, then it can still be removed through a resplicing pathway (and a normally processed mRNA molecule can be generated). For this reason, the term ‘nonsense’ exon is now preferred to define these kinds of sequence, which according to bioinformatic analyses may be more prevalent in human genes than previously thought .
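The reasoning behind a ‘nonsense’ exon can be illustrated with a minimal sketch: when a pseudoexon is joined to the upstream coding sequence, it either carries an in-frame stop codon, shifts the reading frame, or both. The function name `pseudoexon_effect` and the toy sequences are hypothetical and serve only for illustration; the sketch checks the reading frame only and does not attempt to model NMD itself.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def pseudoexon_effect(upstream_cds, pseudoexon):
    """Report how including `pseudoexon` affects the reading frame.

    upstream_cds: coding sequence from the start codon to the insertion point.
    pseudoexon:   sequence that would be included in the mRNA.
    Returns (index of the first in-frame stop codon that overlaps the
    inserted sequence, or None; True if the insertion shifts the frame).
    """
    upstream = upstream_cds.upper().replace("U", "T")
    insert = pseudoexon.upper().replace("U", "T")
    joined = upstream + insert

    first_stop = None
    # Walk codon by codon in the frame set by the start of the upstream sequence.
    for i in range(0, len(joined) - 2, 3):
        if i + 3 <= len(upstream):
            continue  # codon lies entirely upstream of the insertion
        if joined[i:i + 3] in STOP_CODONS:
            first_stop = i
            break

    # A length that is not a multiple of three shifts the downstream frame.
    frameshift = len(insert) % 3 != 0
    return first_stop, frameshift


if __name__ == "__main__":
    # Toy example: ATG GCC | AAA TGA CC
    print(pseudoexon_effect("ATGGCC", "AAATGACC"))
```

In the toy example the inserted sequence introduces a TGA codon in frame and is also eight nucleotides long, so it would both terminate translation prematurely and shift the downstream frame.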
Nonetheless, from a human disease point of view, many pseudoexon intronic sequences seem poised on the brink of becoming exons (Fig. 1) and a comprehensive list of more than 60 published pathological pseudoexon events is presented in Table 1. Although briefly reviewed previously elsewhere , the recent advances in pseudoexon research warrant a second look at several pseudoexon-related issues, especially with regards to novel therapeutic approaches.
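To give a feeling for why ‘false exons’ are so abundant in silico, the working definition used above (a stretch of roughly 50 to 300 nucleotides flanked by apparently viable 3′ss and 5′ss) can be sketched as a deliberately naive scan. This is only an illustration under strong assumptions: a ‘viable’ splice site is reduced to the near-invariant AG and GT dinucleotides, whereas real prediction programs score far more sequence context. The function name and the toy sequence are hypothetical.

```python
def find_candidate_pseudoexons(intron, min_len=50, max_len=300):
    """Return (start, end) pairs, 0-based and end-exclusive, of candidates.

    A candidate is immediately preceded by 'AG' (minimal 3'ss) and
    immediately followed by 'GT' (minimal 5'ss). Assumption: these two
    dinucleotides stand in for full splice-site recognition.
    """
    seq = intron.upper().replace("U", "T")
    # A candidate exon would start right after each AG dinucleotide...
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # ...and end right before each GT dinucleotide.
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]

    candidates = []
    for start in starts:
        for end in ends:
            if min_len <= end - start <= max_len:
                candidates.append((start, end))
    return candidates


if __name__ == "__main__":
    toy_intron = "CCCAG" + "C" * 60 + "GTCCC"
    print(find_candidate_pseudoexons(toy_intron))
```

Because AG and GT dinucleotides occur by chance every few tens of nucleotides, a scan of this kind returns a very large number of candidates in any real intron, which is consistent with the view that additional signals (intrinsic defects, silencers, secondary structure) must account for their repression.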
Table 1. Pathological pseudoexon inclusion events in human disease. NA, not available; SRE, splicing regulatory element.
a Alu-derived pseudoexons. b LINE-2-derived pseudoexons.
As previously mentioned, most pathological pseudoexon inclusion events originate from the creation of new splicing donor or acceptor splice sites within an intronic sequence, followed by the subsequent selection of weaker ‘opportunistic’ acceptor or donor site sequences (Fig. 2A). A preliminary analysis of the strength of donor sites activated in pseudoexon inclusion events has highlighted their relatively high strength (according to in silico prediction programs) with respect to normally processed exons and to cryptic donor sites activated following normal donor site inactivation . In a slightly lower number of cases, pseudoexon activation has been observed following the creation of de novo acceptor sites (Table 1), whereas branch-point creation still represents a minority (probably owing to the fact that a new branch point needs to find both a viable acceptor and donor site nearby, rather than just one of them).
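The notion of donor site ‘strength’ can be made concrete with a very crude stand-in for the in silico prediction programs mentioned above: counting how many positions of a nine-nucleotide window match the 5′ss consensus MAG|GTRAGT (three exonic and six intronic positions, with M = A or C and R = A or G). This is not the scoring scheme used in the cited analysis, which is not specified here; the function name and example sequences are hypothetical.

```python
# IUPAC codes needed for the 5'ss consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}

# Last 3 exonic nucleotides followed by the first 6 intronic nucleotides.
DONOR_CONSENSUS = "MAGGTRAGT"


def donor_match_count(site):
    """Count positions of a 9-nt window that match the 5'ss consensus.

    Assumption: a simple match count, not a weighted or probabilistic
    score; it only ranks sites roughly.
    """
    site = site.upper().replace("U", "T")
    if len(site) != len(DONOR_CONSENSUS):
        raise ValueError("expected a 9-nucleotide window (3 exonic + 6 intronic)")
    return sum(base in IUPAC[code] for base, code in zip(site, DONOR_CONSENSUS))


if __name__ == "__main__":
    print(donor_match_count("CAGGTAAGT"))  # perfect match to the consensus
    print(donor_match_count("CAGGTTTGT"))  # two intronic mismatches
```

A single intronic point mutation that raises such a score, for example by creating the GT dinucleotide itself, is the type of event that underlies most of the de novo donor sites discussed here.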
In addition to de novo creation of strong donor, acceptor and branch site sequences, the other most frequent mechanism that may lead to pseudoexon activation involves the creation/deletion of splicing regulatory sequences, which will be discussed in more detail below (Fig. 2B). Finally, in two individual cases, the rearrangement of genomic regions through gross deletions (Fig. 2C) or genomic inversions (Fig. 2D) has also been described to give rise to pseudoexon inclusion events. This has come about either by bringing together viable splice sites that would normally be too far away from each other on the gene sequence or by activating exons in what would normally have been the antisense genomic strand.
In a few genes, a particularly interesting mode of pseudoexon activation has also been observed following the inactivation of naturally occurring upstream 5′ss (FAA, IDS, MUT) [42–45] or downstream 3′ss (BRCA2, CFTR) [46,47] (Fig. 2E). These findings suggest that the processivity of these mRNA transcripts is probably a factor capable of determining pseudoexon repression, in addition to influencing normal splicing levels.
On a more general note, a still underappreciated aspect of pseudoexon recognition that concerns the effect of cis-acting sequences is the potential influence of RNA secondary structure on splicing efficiency. Recently, it has been shown that donor site usage in the inclusion of two pseudoexon sequences in the ATM and CFTR genes is strongly dependent on their availability in a single-stranded region. Interestingly, the same conclusion was reached in a recent study by Schwartz et al. analysing the differences between exonized and nonexonized Alu elements. In this work, it was found that one of the major discriminating factors between these two classes of Alu elements was the potential availability of 5′ss sequences in an unstructured conformation.
Trans-acting factors in pseudoexon inclusion
Not many studies have focused on identifying the role played by trans-acting factors in pseudoexon inclusion. However, because of its significance, this is an area of research that would probably benefit from increased attention by researchers in the future.
In the case of nonpathologically related pseudoexons carrying nonsense codons, the presence of splicing regulatory elements may well provide a clue with regards to the possible roles played by these sequences. For example, in the case of the previously described tropomyosin pseudoexon , the specific binding of hnRNP H/F proteins has been described as a potential key modifier of this pseudoexon inclusion event . The fact that these proteins are particularly downregulated in cardiomyocytes may explain the cell-specific repression of the downstream ‘normal’ exon 3 that is otherwise present in all cell types (Fig. 3A).
Interestingly, repression of the tropomyosin nonsense exon was also observed following PTB overexpression. PTB is a well-known and powerful splicing modifier that plays a major role in alternative splicing regulation [8,53]. Recently, this protein has been reported to also downregulate the inclusion efficiency of a pathological pseudoexon in NF-1 intron 31 independently of the activating mutation that creates a very strong splicing acceptor site  (Fig. 3B). This finding suggests that silencer binding sites may be actively used by evolutionary mechanisms to decrease the probability that random activating mutations may determine the constitutive inclusion of pseudoexon sequences.
In this respect, one interesting molecular complex is U1snRNP, a ribonucleoprotein complex normally associated with 5′ss recognition in the normal splicing process . First, U1snRNP binding to an intronic splicing processing element has been found to inhibit pathological pseudoexon inclusion in intron 20 of the ATM gene (Fig. 3C). Inactivation of this element through a four nucleotide deletion causes pseudoexon inclusion and occurrence of ataxia telangiectasia in a patient . In a second case, binding of hnRNP E1 and U1snRNP to a weak 5′ss efficiently silences pseudoexon inclusion in the GHR gene , preventing the development of Laron syndrome (Fig. 3D).
Finally, it should also be noted that in a variety of pseudoexon inclusion events, the activating mutations potentially created new splicing enhancer sequences (Table 1). Although trans-acting factors binding to these elements were identified in very few of these cases, in silico and experimental analyses have shown that several of the newly created enhancer sequences strongly correlate with potential binding to the SR protein class of splicing regulators.
Therapeutic strategies aimed at correcting pseudoexon inclusion in genetic diseases
Therapeutic strategies based on antisense oligonucleotide (AON) chemistry, which uses base pairing to target specific sequences in RNAs, have been extensively employed to correct splicing disorders in human genes [58,59]. Interestingly, apart from these therapeutic applications, short nuclear RNAs may also play a similar functional role in physiologically regulating exon inclusion, as in the case of snoRNA HBII-52 in the regulation of exon Vb inclusion in the serotonin receptor 2C. AONs are thought to modulate the splicing pattern by sterically hindering the recruitment of splicing factors to the targeted splicing-competent cis-elements, thus forcing the machinery to use the natural sites. Dominski and Kole pioneered the antisense-mediated modulation of pre-mRNA splicing. In the earliest examples, AONs were aimed at activated cryptic splice sites in the β-globin and CFTR genes in order to restore normal splicing in β-thalassaemia and cystic fibrosis patients [61,62]. Since then, AON strategies have been used successfully to restore normal splicing in several disease models.
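Because AONs act through base pairing, the sequence of a candidate oligonucleotide is simply the reverse complement of the pre-mRNA region to be masked. The following minimal sketch shows this single step; the function name, the toy sequence and the coordinates are hypothetical, and the sketch deliberately ignores backbone chemistry, target accessibility and off-target binding, all of which matter in practice.

```python
# Complement table; U in the input is read as T, and the output uses the DNA alphabet.
COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_for(target, start, length):
    """Return the 5'->3' antisense sequence for target[start:start+length].

    target: sense-strand sequence of the pre-mRNA region (5'->3').
    start:  0-based index of the first nucleotide to be masked.
    length: number of nucleotides covered by the oligonucleotide.
    """
    if start < 0 or length <= 0 or start + length > len(target):
        raise ValueError("window falls outside the target sequence")
    window = target.upper()[start:start + length]
    # Complement each base, then reverse to write the oligo 5'->3'.
    return window.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy region containing a newly created donor-like site (GGTAAGTC).
    print(antisense_for("CCAAGGTAAGTCC", start=4, length=8))
```

In the studies summarized below, the masked windows correspond to the aberrant splice sites or to regulatory elements within the pseudoexon, rather than to any sequence shown in this toy example.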
Afibrinogenemia is caused by genetic abnormalities within any of the three genes that encode the fibrinogen molecule: FGA, FGB, FGG. Recently, Davis et al.  showed that a homozygous c.115–600A>G point mutation located deep within intron 1 of FGB causes pseudoexon inclusion. In this study, pseudoexon inclusion was corrected by targeting this mutation with an antisense phosphorodiamidate morpholino oligonucleotide.
In several forms of β-thalassaemia, two single nucleotide mutations (IVS2-705 and IVS2-654) in the β-globin gene have been reported to cause pathological pseudoexon insertion. In 1993, Dominski and Kole  successfully tested 2′-O-methylribose AONs to restore correct splicing. Later, Sierakowska et al.  also restored correct splicing and β-globin polypeptide production using a phosphorothioate 2′-O-methyl-oligoribonucleotide targeted to the aberrant 3′ss. More recently, Gorman et al. [65,66] engineered the U7 snRNA gene to correct pre-mRNA splicing by replacing the antihistone sequence with sequences targeting β-globin aberrant splice sites (Fig. 4A).
The congenital disorders of glycosylation are caused by defects in the PMM2 gene. Recently, Vega et al. studied a c.640–15479C>T deep intronic mutation that creates a new aberrant 5′ss in intron 7 and causes pseudoexon activation. Antisense morpholino oligonucleotides that targeted the aberrant 5′ss and 3′ss achieved 100% restoration of correctly spliced mRNA.
Pseudoexon-activating mutation 3849 + 10 kb C > T in intron 19 of the CFTR gene has been reported to frequently cause cystic fibrosis. In their study, Friedman et al.  reported that a cocktail of 2′-O-methyl phosphorothioate oligoribonucleotides against different regions of this pseudoexon abolished pseudoexon inclusion and partially restored production of normal mRNA and CFTR processed protein (Fig. 4B).
Mutations in the DMD gene are known to cause Duchenne and Becker muscular dystrophies. Recently, Gurvich et al.  demonstrated that 2′-O-methyl ribose phosphorothioate AONs restored normal splicing in primary myoblast cultures established from two individual patients carrying out-of-frame pseudoexon insertion mutations (Fig. 4C).
Methylmalonic acidaemia and propionic acidaemia are caused by different gene defects in the MUT, PCCA and PCCB genes. Ugarte et al. recently reported the identification of three novel deep intronic mutations in each of these genes that potentially lead to pseudoexon activation through diverse mechanisms. Treatment with antisense morpholino oligomers restored almost completely normal splicing, and the correctly spliced mRNA was effectively translated.
Ocular albinism type 1 involves mutations in the OA1 gene. Vetrini et al.  identified a deep intronic point mutation g.25288G>A that created a new acceptor splice site in intron 7 of this gene and resulted in pseudoexon inclusion. Treatment of a patient’s melanocytes with antisense morpholino AONs complementary to the mutant sequence rescued mRNA and protein expression levels.
Mutations in the NF-1 gene cause neurofibromatosis type 1. Recently, Pros et al.  identified six neurofibromatosis type 1 patients carrying three different deep intronic mutations that create new 5′ss leading to the activation of the pseudoexon in the mature mRNA. In this study, antisense morpholino oligonucleotides were targeted against these newly created 5′ss, effectively restoring normal NF-1 splicing.
All of these different therapeutic strategies are summarized in Table 2.
Targeting of antisense against aberrant splice sites, in-frame stop codon and predicted exonic splicing enhancers within pseudoexons
This review is part of a miniseries co-ordinated by Diana Baralle  to look at emerging topics in splicing research, such as the correct assessment of sequence variants as pathogenic mutations ; the development of novel splicing-based therapeutic agents to treat HIV-1 infections ; and new methods in the global analysis of alternative splicing profiles . We decided to examine the role of pseudoexons in recent research, as no specialized reviews have appeared in the past dealing with this particular kind of event.
From a basic science point of view, the possibility for researchers to look at the splicing process on a much more global scale than the single exon or the individual gene will clarify the issues examined in this review by helping to distinguish clearly between exons and pseudoexons [19,75,76]. In turn, this will provide a better appreciation regarding how the splicing process has evolved to define ‘exons’, how it distinguishes them from similar potentially pathological sequences (pseudoexons) and what is the preferential way it has chosen to repress their recognition. In this respect, pseudoexon research will also provide us with an unparalleled opportunity to understand evolutionary mechanisms that cause some of these sequences to become exons and, of course, vice versa.
Considering that aberrant pseudoexon inclusion events are increasingly being linked with disease, just the simple characterization of these sequences may have some very practical consequences. The studies reported in this review clearly highlight the feasibility of using AONs to correct these types of splicing defect (even in the absence of a complete or even partial understanding of the ‘basic science’ explaining their occurrence). From a therapeutic point of view, the major advantage of targeting pseudoexon inclusion events is provided by the supposition that AONs targeted against what would normally be intronic sequences would not be expected to remain bound to the mature mRNA (and thus interfere with later stages of RNA processing, such as export/translation). However, several factors will still need to be improved before human application becomes a reality. These range from basic studies aimed at optimizing gene/exon specificity (that will necessarily have to be made on an individual gene-specific basis) to the development of appropriate carrier systems. These systems will be absolutely necessary to achieve successful delivery, low toxicity and avoidance of undesired immune responses. Furthermore, even after achieving all of these aims, there will still remain the need to optimize recurrent administration protocols (this is an often overlooked consideration, as none of these methods will cause a permanent correction of mRNA splicing defects) and to determine the clearance/accumulation of AONs in human organs/tissues. However, notwithstanding all of these difficulties, AON technology [59,77] has already entered the clinical trial stage for diseases such as Duchenne muscular dystrophy (http://www.clinicaltrials.gov) and this represents a bright hope for the not too distant future.
This work was supported by Telethon Onlus Foundation (Italy) (grant no. GGP06147) and by a European community grant (EURASNET-LSHG-CT-2005-518238). We thank Professor F. E. Baralle for helpful discussion.