Correspondence to: R. Zhao, Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Aurora, CO 80045. E-mail: firstname.lastname@example.org
Pre-mRNA splicing is a critical event in the gene expression pathway of all eukaryotes. The splicing reaction is catalyzed by the spliceosome, a huge protein-RNA complex that contains five snRNAs and hundreds of different protein factors. Understanding the structure of this large molecular machine is critical for understanding its function. Although the highly dynamic nature of the spliceosome, in both composition and conformation, poses daunting challenges to structural studies, there has been significant recent progress in structural analyses of the splicing machinery using electron microscopy, crystallography, and nuclear magnetic resonance. This review discusses key recent findings from structural analyses of the spliceosome and its components, and how these findings advance our understanding of the function of the splicing machinery.
In eukaryotes, DNA is first transcribed into pre-mRNAs that often contain introns. The removal of introns (pre-mRNA splicing) is an essential step in the gene-expression pathway of all eukaryotes. In higher eukaryotes such as mammals, an average of 95% of the nucleotides in the primary transcript of a protein-encoding gene are introns. These introns need to be removed precisely by splicing before the mRNA can be transported out of the nucleus and translated. Alternative splicing greatly expands the gene coding capacity, and over 95% of human genes are alternatively spliced. It is also becoming increasingly clear that alternative splicing is a fundamental component of eukaryotic gene regulation, influencing cell differentiation, development, and many processes in the nervous system. Errors in splicing contribute to at least 15% of human genetic disorders, such as retinitis pigmentosa, spinal muscular atrophy, and myotonic dystrophy.[6-8] Aberrant splicing also plays a significant role in the onset and development of many other diseases.[6, 7] For example, alterations in the alternative splicing of several pre-mRNAs (such as WT1 and CD44) correlate with neoplastic conversion and metastatic potential in cancer.
A typical intron contains a conserved 5′ splice site (5′ ss), a branch point sequence (BPS) followed by a polypyrimidine tract (PYT), and a 3′ splice site (3′ ss).[9, 10] Introns are removed through two transesterification reactions. In the first, the 2′ hydroxyl group of the branch-point adenosine in the BPS attacks the phosphate at the 5′ ss, cleaving the 5′ exon from the intron and generating a lariat intermediate. In the second, the newly freed 3′ hydroxyl group of the cleaved 5′ exon attacks the phosphate at the 3′ ss, releasing the lariat intron and ligating the two exons.
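The two-step chemistry above can be traced with a toy model that treats RNA as a Python string. This is only a bookkeeping sketch under stated assumptions: the substrate sequence, the helper names (`first_step`, `second_step`), and the 0-based index conventions are hypothetical, and no chemistry or lariat geometry is modeled beyond recording the branch-point position.

```python
from dataclasses import dataclass

# Hypothetical toy substrate (not a real gene): 5' exon, intron, 3' exon.
EXON1 = "CAG"
INTRON = "GUAAGUUACUAACUUUUUUCAG"
EXON2 = "GCU"
PRE_MRNA = EXON1 + INTRON + EXON2


@dataclass
class LariatIntermediate:
    exon1: str          # cleaved 5' exon, ending in a free 3'-OH
    intron_exon2: str   # intron still attached to the 3' exon
    branch_index: int   # branch-point A within intron_exon2; the intron's
                        # 5' G is now joined to it (2'-5' linkage)


@dataclass
class SplicedProducts:
    mrna: str           # ligated exons
    lariat_intron: str  # excised intron, written linearly
    branch_index: int   # branch-point position within the intron


def first_step(pre_mrna: str, ss5: int, branch: int) -> LariatIntermediate:
    """Branch-point A 2'-OH attacks the 5' ss phosphate.

    ss5: index of the first intron nucleotide; branch: index of the
    branch-point adenosine, both in pre_mrna coordinates.
    """
    if pre_mrna[branch] != "A":
        raise ValueError("branch-point nucleotide must be an adenosine")
    return LariatIntermediate(pre_mrna[:ss5], pre_mrna[ss5:], branch - ss5)


def second_step(inter: LariatIntermediate, ss3: int) -> SplicedProducts:
    """Exon 1 3'-OH attacks the 3' ss phosphate, joining the exons.

    ss3: index of the first 3' exon nucleotide within inter.intron_exon2.
    """
    intron = inter.intron_exon2[:ss3]
    exon2 = inter.intron_exon2[ss3:]
    return SplicedProducts(inter.exon1 + exon2, intron, inter.branch_index)


# Usage: in this toy intron the branch A is the second A of "UACUAAC".
branch_pos = len(EXON1) + INTRON.index("UACUAAC") + 5
intermediate = first_step(PRE_MRNA, ss5=len(EXON1), branch=branch_pos)
products = second_step(intermediate, ss3=len(INTRON))
print(products.mrna)  # CAGGCU
```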
Pre-mRNA splicing is catalyzed by the spliceosome, a huge RNA/protein complex. The spliceosome contains five small nuclear RNAs (U1, U2, U4, U5, and U6 snRNAs), which form five small nuclear ribonucleoprotein particles (snRNPs) with their associated proteins, and numerous non-snRNP protein factors.[11, 12] Proteomic studies indicate that the Saccharomyces cerevisiae spliceosome contains ∼80 different proteins and the human spliceosome ∼170. Spliceosome components assemble on the pre-mRNA in a stepwise manner (Fig. 1). The first assembly step is the initial recognition of the intron by the spliceosome and the formation of the E complex: the 5′ ss is recognized by U1 snRNP, the BPS by SF1 (BBP in yeast), and the PYT by U2AF (MUD2 in yeast). The U2 snRNP then joins the spliceosome, replacing SF1 at the BPS, to form the A complex. Subsequent joining of the U4/U6.U5 tri-snRNP forms the B complex. Extensive structural rearrangements occur at this stage to activate the spliceosome. During activation, the base-pairing between the 5′ ss and U1 snRNA is disrupted, and the 5′ ss instead pairs with U6 snRNA, using largely the same nucleotides that base-paired with U1 snRNA. The base-pairing between U4 and U6 snRNAs is also disrupted, and new interactions between U2 and U6 snRNAs are formed that are mutually exclusive with those in the original U4/U6 complex. These rearrangements help convert the spliceosome into the catalytically active B* complex, which is ready for the first catalytic step. After the first transesterification reaction, the spliceosome repositions the substrate to form the C complex, which is ready for the second catalytic step. The second reaction is followed by postcatalytic rearrangements that liberate the mature mRNA for export, release the lariat intron for degradation, and allow the snRNPs to be recycled.
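The ordered assembly steps described above can be summarized as a simple forward state machine. This is a hypothetical, linear simplification: the `Complex` enum and `PATHWAY` table are illustrative names, intermediates discussed elsewhere (such as Bact) and the ATP-dependent helicase steps are omitted, and real spliceosomes do not follow a strictly deterministic path.

```python
from enum import Enum


class Complex(Enum):
    """Spliceosomal complexes named in the text (hypothetical enum)."""
    E = "E"
    A = "A"
    B = "B"
    B_STAR = "B*"
    C = "C"
    POST = "post-catalytic"


# (from, to, event) transitions, simplified from the description above.
PATHWAY = [
    (Complex.E, Complex.A, "U2 snRNP joins and replaces SF1 (BBP) at the BPS"),
    (Complex.A, Complex.B, "U4/U6.U5 tri-snRNP joins"),
    (Complex.B, Complex.B_STAR,
     "activation: U1/5' ss and U4/U6 pairings broken; U6/5' ss and U2/U6 formed"),
    (Complex.B_STAR, Complex.C, "first transesterification; substrate repositioned"),
    (Complex.C, Complex.POST, "second transesterification"),
]


def next_complex(current: Complex) -> tuple[Complex, str]:
    """Return the next complex and the event that produces it."""
    for src, dst, event in PATHWAY:
        if src is current:
            return dst, event
    raise ValueError(f"no forward step defined from {current.value}")


def walk(start: Complex = Complex.E) -> list[str]:
    """Follow the pathway from `start` to the post-catalytic complex."""
    names = [start.value]
    current = start
    while current is not Complex.POST:
        current, _event = next_complex(current)
        names.append(current.value)
    return names


print(" -> ".join(walk()))  # E -> A -> B -> B* -> C -> post-catalytic
```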
Spliceosome assembly, activation, and disassembly involve many ATP-dependent conformational rearrangements driven by DExD/H-box RNA helicases,[9, 10, 15] which may also be important for splicing fidelity.
The large size and highly dynamic nature (in both composition and conformation) of the spliceosome posed daunting challenges to structural studies. Nevertheless, there has been significant progress in structural analyses of the splicing machinery using a variety of approaches. This review focuses on recent progress in understanding the structural basis of pre-mRNA splicing and how it relates to the function of the splicing machinery. We highlight a number of electron microscopy (EM) structures that provide an overall, though less detailed, view of the shape and structural organization of the spliceosome, as well as crystallographic and nuclear magnetic resonance (NMR) studies that have provided high-resolution structures of individual components, fragments, and subcomplexes.
Overall Structural Organization of the Spliceosome Revealed by EM
Single-particle EM reconstruction remains the method of choice for determining the overall morphology and architecture of the spliceosome. Although usually limited to low or medium resolution, EM requires 100- to 1000-fold less sample than crystallography or NMR. Spliceosomes for EM studies are typically assembled in vitro by incubating a well-defined pre-mRNA substrate with HeLa nuclear extract or yeast whole-cell extract, and then purified by affinity chromatography using antibodies against spliceosomal components, affinity tags on protein splicing factors, biotinylated pre-mRNA, or pre-mRNA containing MS2-binding aptamers. Double affinity purification or gradient centrifugation is often used to further improve homogeneity, but sample heterogeneity remains the major obstacle to improving the resolution of EM structures. The resolution of most EM structures of spliceosomes is too low to identify secondary structures or individual components directly. Labeling with antibodies or genetic tags (often large fusion proteins) is therefore often needed to determine the spatial arrangement of components in the spliceosome.
Three-dimensional (3D) structures of several spliceosomal complexes have been determined, including the human A complex at 40–50 Å resolution, the BΔU1 complex at 40 Å resolution, the core C complex at 30 Å resolution from the Moore laboratory, and a larger C complex at 20–29 Å resolution from the Stark and Luhrmann laboratories (Fig. 2). The A complex (overall size ∼230 × 200 × 195 Å) has a globular main body with several smaller protruding elements, including a prominent head domain and foot-like protrusions [Fig. 2(a)]. The BΔU1 complex (overall size ∼370 × 270 × 170 Å) consists of a flexible head domain attached to a rigid triangular body domain that resembles the tri-snRNP in size and shape [Fig. 2(b)]. The head domain could accommodate the A complex. Labeling studies show that the 5′ exon, 3′ exon, intron, and SF3b155 (a U2 snRNP protein in close proximity to the BPS and 3′ ss) all map to distinct areas of the head domain. The C complex structure determined by Jurica et al. measures ∼270 × 220 × 240 Å; it is similar in size and shape to the triangular domain of the B complex and may represent a stable core C complex [Fig. 2(c)]. The core C complex contains a cylindrical top domain and a larger bottom domain with an extended arm domain. These three domains are arranged in a relatively open conformation surrounding a central cavity. U5 snRNP, U2 snRNP, and Prp19 (a core component of the NineTeen Complex, which is involved in spliceosomal activation) may occupy these different domains. Labeling the pre-mRNA with an RNA hairpin that binds the bacteriophage PP7 coat protein showed that the two exons lie close to each other, in the general area between the top and bottom domains. Actin filament labeling placed the intron region upstream of the branch point in the top domain.
The Luhrmann laboratory purified a larger C complex (∼360 × 340 × 270 Å) and determined its 3D EM structure; its larger size may reflect milder purification conditions than those used for the core C complex [Fig. 2(d)]. Labeling studies indicate that the 5′ end of the 5′ exon is located in domain h or the upper portion of domain 2. Fitting a postcatalytic 35S U5 snRNP and a salt-stable C complex core (containing all components of the 35S U5 snRNP plus U2 and U6 snRNAs) into the map indicates that the 35S U5 snRNP occupies the central and lower region of the C complex, and that the catalytic core containing the U2 and U6 snRNAs is likely located in domain 2a.
In addition to the 3D EM structures discussed above, 2D EM images have been obtained for human spliceosomal B and Bact complexes (Bact has lost U1 and U4 snRNAs relative to the B complex but has not yet undergone the Prp2-catalyzed conformational change that converts it into the B* complex), yeast spliceosomal B, Bact, B*, and C complexes,[25, 26] and fly spliceosomal B and C complexes. In general, the same complex from different species is well conserved in shape and size. For example, the B complexes from human, yeast, and fly all have a triangular or rhombic shape, with subtle differences in the head region. On the other hand, although all complexes are ∼400 Å in maximum dimension, the different spliceosomal complexes look markedly different, likely reflecting the compositional and conformational changes that occur during splicing. For example, the B complex has a triangular or rhombic shape, whereas the Bact and C complexes are progressively more compact. The difference between Bact and B*, before and after the Prp2-mediated conformational rearrangement, is also obvious.
Improving sample homogeneity will be key to improving the resolution of EM structures of spliceosomal complexes. The yeast spliceosome has fewer components, and many genetic approaches are available to arrest spliceosomal complexes at particular stages, so more EM structures of yeast spliceosomes are likely to appear in the future.
Structures of snRNPs Revealed by EM and Crystallography
EM structures of snRNPs
snRNPs are much more stable in composition and conformation than entire spliceosomal complexes, and several have been studied by EM at medium resolution. Human U1 snRNP, the smallest snRNP in the cell, is the best-studied snRNP both biochemically and structurally. Its cryo-EM structure has been determined at 10–14 Å resolution. The EM structure is highly consistent with the 5.5 Å crystal structure of U1 snRNP, whose structural details are discussed in the crystallographic section on U1 snRNP.
Human U2 snRNP contains U2 snRNA, seven Sm proteins, and 15 U2-specific proteins, most of which are components of the SF3a and SF3b subcomplexes. SF3b plays an important role in BPS recognition, and its p14 subunit crosslinks directly to the branch-point adenosine.[31-33] The 3D structure of the entire U2 snRNP is unknown, but the EM structure of SF3b has been determined at better than 10 Å resolution [Fig. 3(a)]. SF3b has a shell-like body roughly 125 × 150 × 160 Å in size that encloses a large cavity. At this intermediate resolution, several components with known structural folds could be tentatively placed without labeling, including SF3b49 and p14 [both containing RNA recognition motif (RRM) domains] and SF3b155 (containing 22 tandem HEAT repeats). Protein p14 is likely located in the central cavity of SF3b, and the 22 HEAT repeats of SF3b155 form the outer shell enclosing p14. Because p14 appears to sit in the interior of the closed SF3b structure, conformational changes in SF3b may be needed for p14 to access the BPS.
The U4/U6.U5 tri-snRNP is the largest snRNP in the spliceosome. EM structures have been determined for the human tri-snRNP (21 Å resolution, ∼305 × 200 × 175 Å) and its U5 (26–32 Å resolution, ∼265 × 150 × 120 Å) and U4/U6 (∼40 Å resolution) snRNP components, as well as for the yeast tri-snRNP and U6 snRNP [Fig. 3(b)]. The human and yeast tri-snRNPs have a similar triangular shape, with a head domain and the main body connected at the center. Labeling studies show that the U4/U6 snRNP is located in the arm domain and that the head domain contains the U5 protein Brr2, while Prp8 and Snu114 are located around the center of the tri-snRNP, where the head and body domains meet. Loop 1 of U5 snRNA, which is thought to align the two exons during the splicing reaction,[38-40] is located at the center of the tri-snRNP. The head and arm domains are flexible, which may have functional implications for spliceosomal activation.
Crystal structure of snRNPs
Human U1 snRNP is the largest spliceosomal subcomplex whose crystal structure has been determined, using particles either reconstituted from recombinant components or purified directly from HeLa cells. Human U1 snRNP contains U1 snRNA, the seven Sm proteins common to all snRNPs, and three U1-specific proteins (U1-A, U1-70K, and U1-C). Base-pairing between the 5′ end of U1 snRNA and the 5′ ss is critical for 5′ ss recognition. Extensive biochemical and structural studies of individual U1 snRNP components preceded the structure of the entire particle. Biochemical studies and computer modeling had predicted that U1 snRNA forms four stem-loops (SL1–4). U1-70K and U1-A bind to SL1 and SL2, respectively, through their RRM domains.[45, 46] Although U1-C does not bind U1 snRNA by itself, adding the first 97 residues of U1-70K enables U1-C to associate with U1 snRNP. Crystal structures of several Sm proteins led to a model in which the seven Sm proteins are arranged in a ring. The crystal structure of the U1-A RRM domain in complex with SL2 and the NMR structure of the U1-C Zn-finger domain were also determined. Building on these successes, Nagai and coworkers reconstituted the entire human U1 snRNP from recombinant proteins (omitting the nonessential U1-A) and in vitro transcribed U1 snRNA, and determined its crystal structure at 5.5 Å resolution (Fig. 4). The Sm proteins, the U1-C Zn-finger, and a homology model of the U1-70K RRM domain were fitted into the electron density map using anomalous peaks from zinc and Se-Met-substituted proteins as landmarks. The path of the long N-terminal arm of U1-70K was traced using seven Met substitutions engineered into its N-terminal region.
U1 snRNA upstream of the Sm binding site forms a four-way junction in which SL1 and SL2, as well as SL3 and helix H, coaxially stack [Fig. 4(a)]. The seven Sm proteins form a heptameric ring on the Sm binding site, which interacts with and stabilizes the four-way junction [Fig. 4(a)]. The RRM domain of U1-70K interacts with SL1 [Fig. 4(b)]. The N-terminal arm of U1-70K extends over 180 Å from its RRM domain, wraps around the core Sm domain, and finally contacts U1-C [Fig. 4(b)], explaining why the N-terminus of U1-70K is necessary and sufficient for U1-C binding to the U1 core domain. In the crystal, the 5′ end of U1 snRNA base-pairs with its counterpart from a neighboring molecule, mimicking the interaction between U1 snRNA and the 5′ ss [Fig. 4(b)]. Helix A of U1-C binds in the minor groove of this duplex. The loop between the two His residues of the U1-C Zn-finger is in close proximity to the strand that corresponds to the proposed pre-mRNA strand. Arg21/Lys22 on Helix A and Arg28/Lys29 in this loop can potentially interact with the RNA phosphate backbone. These structural features are consistent with the essential role of U1-C, and of R28/K29, in mediating the interaction between U1 snRNA and the 5′ ss that forms the E complex.
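The 5′ ss recognition described above rests on antiparallel base-pairing between the 5′ end of U1 snRNA and the 5′ ss. A minimal sketch of that bookkeeping follows, assuming an ungapped duplex and counting Watson–Crick pairs plus optional G·U wobble pairs; the function names and the two short strands are hypothetical illustrations, and the sketch says nothing about pairing energetics or the contribution of U1-C.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def can_pair(x: str, y: str, allow_wobble: bool = True) -> bool:
    """True if bases x and y form a Watson-Crick (or, optionally, G-U) pair."""
    pair = (x.upper(), y.upper())
    return pair in WATSON_CRICK or (allow_wobble and pair in WOBBLE)


def count_pairs(strand1: str, strand2: str, allow_wobble: bool = True) -> int:
    """Count base pairs in an ungapped antiparallel duplex.

    Both strands are written 5'->3', so strand1[i] faces strand2[-1 - i].
    """
    if len(strand1) != len(strand2):
        raise ValueError("this sketch only handles equal-length, ungapped duplexes")
    return sum(can_pair(a, b, allow_wobble)
               for a, b in zip(strand1, reversed(strand2)))


# Illustrative strands only: a 6-nt "snRNA" segment against a fully
# complementary "5' ss" segment, then against a version with one mismatch.
print(count_pairs("ACUUAC", "GUAAGU"))  # 6
print(count_pairs("ACUUAC", "GUAUGU"))  # 5
```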
Weber et al. also determined the crystal structure of human U1 snRNP purified from HeLa cells at 4.4 Å resolution, after in situ limited proteolysis in the crystallization drop. The overall structure is similar to that of the reconstituted human U1 snRNP. It revealed more detail about the organization of the Sm proteins on the RNA in the Sm core, although at this resolution the interpretation cannot yet reach the atomic level.
Leung et al. assembled Sm proteins on a fragment of U4 snRNA and determined a 3.6 Å resolution structure of the U4 snRNP core, providing significantly more detail on the structural organization of the Sm core. The structure shows that the Sm site heptad (AUUUUUG) resides inside the central channel of the heptameric Sm ring, interacting one-to-one with SmE-SmG-SmD3-SmB-SmD1-SmD2-SmF [Fig. 4(c)]. Each base of the Sm binding site interacts in a distinct manner with four key residues at equivalent positions in the L3 and L5 loops of the Sm fold. These interactions explain the specific requirement for an adenine at the first position of the heptad and for uridines at positions 2–4, as well as the tolerance of positions 5 and 6 for other bases. The same Sm proteins show some structural differences between U1 and U4 snRNPs (particularly in their terminal extensions), which may provide selectivity for snRNP-specific proteins during snRNP assembly.
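The sequence rules explained by the U4 core structure can be written as a small pattern check. In this sketch, which is an illustration rather than a validated Sm-site predictor, positions 1–4 are enforced as stated in the text, positions 5 and 6 are left unchecked, and requiring the consensus G at position 7 is an optional assumption (`require_g7`). Mapping heptad positions 1–7 onto the Sm proteins in the order listed in the text is also an assumption; all function names are hypothetical.

```python
# Sm proteins in the order the text lists them; treating this order as
# heptad positions 1-7 is an assumption of this sketch.
SM_ORDER = ("SmE", "SmG", "SmD3", "SmB", "SmD1", "SmD2", "SmF")


def matches_sm_heptad(seq: str, require_g7: bool = True) -> bool:
    """Check a 7-nt RNA against the Sm-site rules described in the text."""
    seq = seq.upper()
    if len(seq) != 7 or any(base not in "ACGU" for base in seq):
        return False
    if seq[0] != "A":          # adenine required at position 1
        return False
    if seq[1:4] != "UUU":      # uridines required at positions 2-4
        return False
    # Positions 5 and 6 tolerate other bases, so they are not checked.
    if require_g7 and seq[6] != "G":   # assumption: keep the consensus G7
        return False
    return True


def find_sm_sites(rna: str, require_g7: bool = True) -> list[int]:
    """Return 0-based start indices of all matching heptads in rna."""
    return [i for i in range(len(rna) - 6)
            if matches_sm_heptad(rna[i:i + 7], require_g7)]


def assign_sm_proteins(heptad: str) -> list[tuple[str, str]]:
    """Pair each heptad nucleotide with the Sm protein assumed to contact it."""
    if not matches_sm_heptad(heptad, require_g7=False):
        raise ValueError("not a valid Sm-site heptad")
    return list(zip(heptad.upper(), SM_ORDER))


print(find_sm_sites("CCAUUUUUGCC"))  # [2]
```

Because positions 5 and 6 are not checked, a variant such as `AUUUCAG` passes, whereas any heptad lacking the leading A fails.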
High Resolution Structures of Spliceosomal Components Revealed by Crystallography and NMR
Although the compositional and conformational dynamics of the spliceosome have made crystallographic studies of the entire complex difficult, crystallographic and NMR studies of its protein and RNA components continue to make important contributions to our understanding of the structure and function of the splicing machinery (Table 1; Supporting Information Table 1). Below we review high-resolution structural studies of representative classes of spliceosomal proteins and snRNAs.
Table 1. Major Crystallographic or NMR Structures (Complete or Partial) of Unique Spliceosomal Protein or RNA Components Determined to Date
Unique protein or RNA structures (complete or partial) determined
U1 snRNP, U1-70K, U1A, U1C, and Prp40
SF3a components (Prp9, Prp11, Prp21), SF3b components (SF3b155, SF3b14, SF3b2, SF3b4), U2AF35, U2AF65, PUF60, SPF45, U2 A', U2 B'', and BPS/U2 snRNA
Note: When more than one homologous structure of a protein is available, only the protein from one species is listed. More details of each structure are provided in Supporting Information Table 1.
Prp8 is arguably the most intriguing protein in the spliceosome. Prp8 is a component of the U5 snRNP and is also present in the tri-snRNP and the spliceosome (reviewed in Ref. 53). It is one of the largest proteins in the nucleus: Prp8 from all species is over 2000 amino acids long. It is also one of the most highly conserved nuclear proteins, with over 60% sequence identity between human and yeast Prp8. However, Prp8 has remarkably low sequence similarity to other proteins, making it difficult to deduce its function from sequence analysis. Functional studies, on the other hand, clearly place Prp8 at the center of the splicing reaction. UV crosslinking experiments demonstrate that Prp8 crosslinks extensively with the 5′ ss, 3′ ss, BPS, and U5 and U6 snRNAs in snRNPs and spliceosomes assembled in vitro. Mapping of Prp8 RNA-binding sites in vivo, by crosslinking intact yeast cells followed by next-generation sequencing, revealed extensive Prp8 footprints on U5 snRNA, followed by U6, U1, and U2 snRNAs and pre-mRNAs. The Prp8 footprints and crosslinking sites on U6, U2, and U5 snRNAs and pre-mRNA led to the hypothesis that Prp8 may help form or stabilize the catalytic core, or even contribute functional groups to the splicing reaction (reviewed in Refs. 53 and 54). Genetic analyses identified numerous Prp8 mutants that suppress or exacerbate mutations in other splicing factors, many of which are important for spliceosomal activation, leading to the hypothesis that Prp8 may also be a master regulator of spliceosomal activation.
Crystal structures of the C-terminal domain (CTD) of Prp8 from Caenorhabditis elegans and yeast (Residues 2147–2413) reveal a central MPN domain with N- and C-terminal extensions.[55, 56] This domain will be referred to as the MPN domain of Prp8. A subset of MPN domains, those containing a JAMM motif, coordinate a Zn++ ion and function as metalloproteases, although the general function of most MPN domains remains unknown. The Prp8 CTD contains only a partial JAMM motif, does not coordinate a Zn++ ion, and is unlikely to be a Zn-dependent metalloprotease. GST pull-down and yeast two-hybrid experiments, however, demonstrate that the Prp8 CTD interacts with Brr2 and Snu114, which may have implications for the role of Prp8 in regulating spliceosomal activation. A number of mutations in Prp8 are implicated in retinitis pigmentosa type 13, a human genetic disorder that leads to photoreceptor degeneration and eventual blindness. All retinitis pigmentosa type 13 mutations map to the C-terminal extension of the Prp8 CTD structure. Deleting the C-terminal extension reduces the binding of the Prp8 CTD to Brr2 or Snu114, suggesting a potential mechanism for the disease phenotype of these mutations.
The structure of the domain immediately upstream of the CTD (Residues 1822–2095 in yeast) has also been determined;[57-59] it contains an RNase H core and a prominent β-hairpin finger protruding from the middle of the domain. This domain will be referred to as the RNase H-like domain of Prp8. Canonical RNase H domains contain a conserved DDE active site that coordinates a Mg++ ion critical for catalysis. The RNase H-like domain of Prp8 lacks a complete DDE triad and does not coordinate a Mg++ ion. However, it seems to have retained the RNA-binding capacity of canonical RNase H domains: it binds a U2/U6 mimic and U4/U6 much better than arbitrary RNAs. Modeling RNA onto this domain places the 5′ ss in a cleft near the β-finger, close to the previously observed 5′ ss crosslinking site on Prp8. The protruding β-finger is strongly reminiscent of the many ribosomal proteins whose extensions (extended loops, α-helices, or β-hairpin fingers) protrude from a globular body. These protrusions insert into folded 16S or 23S rRNA and stabilize its structure. Mutations in the β-finger of Prp8 affect the conformational equilibrium between the first and second catalytic steps and suppress the cold sensitivity of U4-cs1, suggesting that Prp8 and its β-finger may interact extensively with RNAs in the spliceosome or tri-snRNP and stabilize these complexes. Mutations in the β-finger would alter these interactions, giving rise to the observed phenotypes.
The crystal structure of a large fragment of yeast Prp8 (Residues 885–2413, 176 kDa) in complex with Aar2, a U5 snRNP assembly factor, was recently determined (Fig. 5). This fragment contains a large ∼1000-residue domain (Residues 885–1824) that spans the entire length of the complex, the RNase H-like domain, and the MPN domain [Fig. 5(a)]. The latter two domains are connected by flexible linkers, fold back, and interact with the large domain through Aar2 (see the next paragraph for details) [Fig. 5(b)]. The large domain can be further divided into a large polymerase-like domain (Residues 885–1375), a linker domain, and a small type II restriction endonuclease-like domain (Residues 1650–1810). The polymerase-like domain is composed of the three canonical subdomains: palm, fingers, and thumb. In canonical polymerases, the palm subdomain contains four conserved motifs harboring three Asp residues that coordinate a Mg++ ion required for catalysis. Consistent with the theme observed in the Prp8 MPN and RNase H-like domains, the polymerase-like domain of Prp8 contains only one of the three conserved Asp residues and is unlikely to bind divalent metal ions. The palm subdomain of Prp8 was recently noted to share sequence similarity with the palm of the reverse transcriptase (an RNA-dependent DNA polymerase) encoded by bacterial group II introns. The polymerase-like domain is therefore more appropriately referred to as the reverse transcriptase domain. The type II restriction endonuclease-like domain (Residues 1650–1810) of Prp8 is structurally most similar to the endonuclease domain of the influenza virus polymerase acidic subunit, even though there is no detectable sequence similarity. In the influenza polymerase acidic endonuclease domain, two Glu residues, one Asp, and one His are involved in coordinating two metal ions critical for catalysis. Although these residues are all present in Prp8, mutating them has no effect on yeast viability. This large domain of Prp8 will be referred to as the reverse transcriptase/endonuclease (RT/En) domain.
The RT/En, RNase H-like, and MPN domains are connected by flexible linkers but form a large assembly through a network of interactions involving Aar2. For example, Aar2 interacts extensively with the En and linker subdomains, and the C-terminal helical domain of Aar2 interacts with the RNase H-like domain. The extreme C-terminus of Aar2 forms a remarkable intermolecular β-sheet that zips together the β-finger of the RNase H-like domain and the β-sheet of the MPN domain [Fig. 5(b)]. Although the MPN and RNase H-like domains on their own have little or no interaction with the RT/En domain, Aar2 holds these domains together in a large assembly. The relative orientations of these domains may differ when Prp8 is bound to Brr2 and/or Snu114 (its binding partners in the mature U5 snRNP).
The crystal structure of this large Prp8 fragment reveals a potential active-site cavity for the spliceosome. Prp8 crosslinks extensively to key components of the splicing reaction, including U2, U5, and U6 snRNAs and pre-mRNA,[53, 54] and almost all of these crosslinks map to the RT/En domain. In particular, UV crosslinks obtained from spliceosomes captured just before the second catalytic step map to the region between Residues 1585 and 1598, which is disordered in the crystal structure but lies in the thumb subdomain of the RT domain [Fig. 5(c)]. The thumb/En domains and the RNase H-like domain face each other and form a large cavity. Numerous Prp8 mutants that suppress mutations in the splice sites and branch point map to the surface of this cavity, on both the thumb/En domains and the RNase H-like domain, suggesting that this cavity holds the spliceosomal active site. As discussed above for the RNase H-like domain, multiple Prp8 suppressor mutations map to its β-finger, which may sense and regulate the spliceosomal conformational switch between the first- and second-step reactions; this idea is supported by the flexibility of the RNase H-like domain relative to the RT/En domain observed in different crystal forms.
The crystal structure of this large Prp8 fragment also suggests how Prp8 may regulate spliceosomal activation. Over 40 Prp8 mutants (clustered into Regions a–e of the primary sequence) were identified that suppress U4-cs1, a U4 snRNA mutation that hyperstabilizes U4/U6 by extending Stem 1 of U4/U6. The suppressors in Regions d and e lie within the crystallized fragment; most map to the same face of the RT/En domain, and the remaining five map to the RNase H-like domain. Prp8 mutants were also identified that suppress brr2-1, a Brr2 mutant with defective helicase activity.[60, 64] These brr2-1 suppressors map to the same face of the RT/En domain as the U4-cs1 suppressors; more specifically, they lie in the same region of the palm subdomain as some of the U4-cs1 suppressors. This face of the RT/En domain may therefore be an RNA- or protein-binding surface that is critical for U4/U6 unwinding and spliceosomal activation. Brr2 is a candidate binding partner for this face, especially the palm subdomain. Such binding would place Brr2 in an ideal position to feed U6 into the spliceosomal active site after U4/U6 unwinding.
The crystal structure of this large fragment of Prp8 also offers intriguing insight into the evolutionary origin of the spliceosome. The palm and thumb subdomains have considerable sequence and structural similarity to the bacterial and fungal group II intron-encoded proteins (IEPs). Group II introns are mobile genetic elements found in organellar and bacterial genomes that self-splice through two transesterification steps identical to those of the eukaryotic splicing reaction. Their self-splicing is facilitated by the maturase activity of their IEPs, which usually contain a reverse transcriptase and an endonuclease domain. The IEP binds to the spliced intron and targets it to a homing site in genomic DNA, where integration is achieved by reverse splicing. The opposite DNA strand is then cleaved by the endonuclease domain and used as a primer for reverse transcription of the intron by the IEP reverse transcriptase domain. Nuclear pre-mRNA splicing could have evolved from the IEP (the ancestral Prp8) together with group II intron RNAs carried on separate transcripts. Once group II introns ceased to be mobile elements, the selective pressure to maintain an active reverse transcriptase would have been lost, but the reverse transcriptase domain could have continued to serve as an assembly platform and maturase that facilitates splicing. The crystal structure of a group II intron reveals a tightly packed active-site core organized by a surrounding RNA scaffold, and biochemical studies indicate that the IEP binds to the group II intron catalytic core to promote splicing. The Prp8 active-site cavity is roughly the right size to accommodate the functional core of a group II intron RNA. This notion is supported by the observations that the spliceosomal RNA catalytic core crosslinks to the RT/En and RNase H-like domains and that the surfaces of these regions of Prp8 are remarkably conserved and highly positively charged. Prp8, perhaps like an IEP, may have replaced the RNA scaffold surrounding the functional core of a group II intron RNA, providing an intriguing evolutionary link between group II self-splicing introns and the spliceosome.
There are many RNA-binding proteins in the spliceosome. We will discuss in this section some major RNA-binding domains and their representative structures.
The RRM, sometimes referred to as the RNA-binding domain (RBD), is the most common RNA-binding motif in eukaryotes (reviewed in Ref. 65) and is highly abundant in spliceosomal proteins. The RRM is approximately 90 amino acids long and contains an RNP1 motif ([RK]-G-[FY]-[GA]-[FY]-[ILV]-X-[FY]) and an RNP2 motif ([ILV]-[FY]-[ILV]-X-N-L). The first RRM structure to be determined was that of the U1-A RRM. It consists of two α-helices packed against four antiparallel β-strands, with β1α1β2β3α2β4 topology (Fig. 6). RNP1 lies on β3 and RNP2 on β1. Typically, three aromatic side chains, from the first two conserved [FY] residues of RNP1 and the [FY] of RNP2, contact single-stranded RNA by stacking on the bases or packing against the ribose rings.
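The RNP1 and RNP2 consensus patterns above translate directly into regular expressions, and the β1 (RNP2) before β3 (RNP1) topology gives a simple ordering check. The following is a minimal sketch, assuming strict consensus matching: real RRMs often deviate from these patterns, so a scan like this will miss some, and the example sequence, helper names, and the 90-residue `max_gap` cut-off are hypothetical.

```python
import re

# Consensus patterns from the text as Python regexes (X = any residue -> ".").
RNP1 = re.compile(r"[RK]G[FY][GA][FY][ILV].[FY]")
RNP2 = re.compile(r"[ILV][FY][ILV].NL")

# Hypothetical, made-up sequence containing one consensus RNP2 and one RNP1.
EXAMPLE_RRM = ("MSKT" "IFVGNL" "SPETTEEDLRELFSKFGEIVSVKLVRDKETGK"
               "KGFGFVEF" "ENKEDAEKAIE")


def find_rnp_motifs(protein: str) -> dict[str, list[int]]:
    """Return 0-based start positions of non-overlapping RNP1/RNP2 matches."""
    protein = protein.upper()
    return {
        "RNP1": [m.start() for m in RNP1.finditer(protein)],
        "RNP2": [m.start() for m in RNP2.finditer(protein)],
    }


def looks_like_rrm(protein: str, max_gap: int = 90) -> bool:
    """Crude check: an RNP2 (on beta1) followed by an RNP1 (on beta3).

    max_gap is a hypothetical cut-off based on the ~90-residue domain size.
    """
    hits = find_rnp_motifs(protein)
    return any(0 < r1 - r2 <= max_gap
               for r2 in hits["RNP2"] for r1 in hits["RNP1"])


print(find_rnp_motifs(EXAMPLE_RRM))
print(looks_like_rrm(EXAMPLE_RRM))  # True
```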
A number of spliceosomal proteins contain multiple RRM domains. For example, the large subunit of splicing factor U2AF (U2AF65) contains three RRM domains. RRM1 and RRM2 recognize the polypyrimidine tract near the 3′ ss. A crystal structure was determined for an engineered U2AF65 RRM1–2 construct, with the linker shortened from 30 to 10 residues, bound to a 7-nt poly(U) RNA. It revealed specific hydrogen bonds between the protein and the uracil bases, providing a structural basis for the preference for polyuridine. A recent NMR study demonstrated that the two RRM domains adopt a closed conformation, in which RRM1 and RRM2 interact with each other, in the absence of RNA or in the presence of a short polypyrimidine tract. In the presence of long, high-affinity polypyrimidine tracts, the two RRM domains adopt an open conformation in which both contact the RNA. When bound to polypyrimidine tracts of different lengths and/or sequences, the two RRM domains undergo a population shift between the open and closed conformations. This equilibrium serves as a molecular rheostat that quantitatively links natural variation in polypyrimidine tract sequence and length to the efficiency with which U2 snRNP is recruited to the pre-mRNA. Mutations that shift the equilibrium without affecting RNA binding also alter splicing activity. This multidomain conformational selection represents an interesting mechanism for the recognition of degenerate nucleotide or amino acid motifs by multidomain proteins and may apply to other biological systems.
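The rheostat idea can be illustrated with a generic two-state equilibrium. In this model, the fraction of molecules in the open state is K/(1 + K), with K = exp(−ΔG/RT) and ΔG = G(open) − G(closed). The sketch below assumes this simple two-state model. The free-energy values are hypothetical stand-ins for weak, intermediate, and strong polypyrimidine tracts, not measured numbers, and the published analysis may rely on a more detailed model.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def fraction_open(delta_g_kj: float, temp_k: float = 298.15) -> float:
    """Fraction of molecules in the open state of a two-state equilibrium.

    delta_g_kj is G(open) - G(closed) in kJ/mol; negative values favor
    the open state.
    """
    k_eq = math.exp(-delta_g_kj / (R_KJ * temp_k))  # K = [open]/[closed]
    return k_eq / (1.0 + k_eq)


# Hypothetical free-energy differences (illustrative, not measured).
tracts = {"weak tract": 4.0, "intermediate": 0.0, "strong tract": -4.0}
for name, dg in tracts.items():
    print(f"{name:>12}: {fraction_open(dg):.2f} open")
```

In this picture, a tract that makes ΔG more negative shifts the population toward the open state gradually rather than flipping a switch. That graded response is what makes the equilibrium rheostat-like.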
Prp24, an essential component of the U6 snRNP, is another example of a splicing factor with multiple RRM domains (four in Prp24) that perform both RNA binding and other specialized functions. Prp24 promotes the annealing of U4 and U6 snRNAs during spliceosome assembly. The crystal structure of RRM1–3 has been determined. Solution structures have also been solved for RRM1–2, for RRM2 in complex with a U6 snRNA hexaribonucleotide, and for RRM4. Each of RRM1–3 adopts a canonical fold. RRM1 and RRM2 are tightly packed against each other, forming a single RNA-binding surface. RRM2 binds U6 snRNA using its canonical β-sheet RNA-binding face, whereas RRM1 binds noncanonically through a positively charged surface. RRM3 makes no stable contact with RRM2, and its RNA-binding properties are unclear. RRM4 assumes a noncanonical RRM fold with two additional flanking α-helices that occlude the β-sheet face (designated the occluded RRM, or oRRM). The flanking α-helices form a large positively charged surface. The oRRM binds to and unwinds the U6 ISL, the internal stem loop of U6 snRNA that must be unwound before U6 can anneal to U4.
Some RRM domains have also evolved to perform protein-binding functions. The crystal structure of the U2AF65/U2AF35 heterodimer and the NMR structure of the U2AF65/SF1 complex reveal that the RRM domain of U2AF35 and RRM3 of U2AF65 both interact with protein instead of RNA. These RRM-like domains are termed U2AF homology motifs (UHMs). UHMs often contain aliphatic residues in RNP1 and RNP2 instead of basic or aromatic ones. A Trp residue in the protein ligand inserts into a hydrophobic pocket formed between the α-helix and the RNP1- and RNP2-like motifs of the UHM. A conserved R-X-F motif in the α2/β4 loop contributes to the Trp-binding pocket. In addition, a series of acidic residues in α1 of the UHM interacts with basic residues at the N-terminus of the protein ligand. Additional proteins were subsequently found to contain UHMs that mediate protein-protein interactions.
In addition to RRM domains, other RBDs exist in spliceosomal proteins. For example, U1-C, a component of the U1 snRNP, contains a C2H2-type Zn-finger domain.[29, 50] The C2H2 Zn-finger is very common in transcription factors. It forms a simple ββα fold. In DNA-binding proteins, the α-helix typically sits in the major groove and makes specific contacts with the DNA bases. In U1-C, the α-helix of the Zn-finger and the loop between the two His residues lie close to the proposed duplex between the 5′ ss and U1 snRNA, and they could stabilize this interaction. Other Zn-finger-containing spliceosomal proteins include Prp9, a component of the SF3a complex, which is part of the U2 snRNP.
Another RBD found in spliceosomal proteins is the K homology (KH) domain, a fold of ∼70 amino acids that typically binds RNA or single-stranded DNA. The KH domain has a βααββα topology and is present in splicing factor 1 (SF1), which recognizes the BPS and promotes formation of the E complex. NMR studies demonstrated that the minimal BPS-binding region of SF1 contains a KH domain followed by a QUA2 (quaking homology 2) region. These studies used the preferred BPS sequence UACUAAC. The KH domain recognizes the 3′ end of the BPS (UAAC) through a hydrophobic cleft formed by the G-P-R-G motif and the variable loop of the KH domain, a recognition mechanism shared by other KH domains. The QUA2 region of SF1 augments the KH domain and recognizes the 5′ end of the BPS (ACU).
At least eight DExD/H-box proteins (UAP56, Prp5, Prp28, Brr2, Prp2, Prp16, Prp22, and Prp43) act at various steps of the splicing cycle, from spliceosome assembly and activation through disassembly (Fig. 1). DExD/H-box proteins belong to superfamily 2 (SF2) of helicases, enzymes that use the energy of NTP hydrolysis to unwind double-stranded (ds) DNAs or RNAs. All SF1 and SF2 helicases share a minimal helicase core, and some contain additional domains.[79-81] The minimal helicase core is composed of two RecA domains encompassing at least eight conserved helicase motifs. Key amino acids in Motifs I and II are highly conserved, including the “DExD/H” residues in Motif II.[78, 82] Other motifs are conserved within each family but not across the entire SF1 or SF2 superfamily, so they are used to further classify SF1 and SF2 helicases into subfamilies. Several spliceosomal DExD/H-box proteins have been shown to have weak helicase activity in vitro.[83-88] Two major unwinding mechanisms have been proposed for DExD/H-box RNA helicases. The first mechanism is used by DEAD-box RNA helicases, which bind dsRNA and unwind it by local strand separation without translocation. The second mechanism is used by the viral RNA helicases NPH-II and NS3 and potentially by other eukaryotic RNA helicases. These enzymes bind a single-stranded region adjacent to the duplex and translocate along the loading strand with a defined polarity (3′ to 5′ or 5′ to 3′), displacing the complementary strand.[91, 92] The spliceosomal DExD/H-box proteins have been proposed to be critical for rearranging many mutually exclusive RNA/RNA and RNA/protein interactions during splicing.
Kinetic proofreading by DExD/H-box proteins is thought to maintain splicing fidelity at every ATP-dependent transition throughout the splicing cycle.[93-97] In this model, timely progression of the reaction competes with discard of the substrate. Regulation of the activities of these DExD/H-box proteins is therefore likely to be important for splicing fidelity.
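The competition underlying this proofreading model can be sketched numerically. The sketch assumes that, at each checkpoint, forward progression and helicase-driven discard are competing first-order processes. Under that assumption, the fraction of substrates that proceed is k_forward/(k_forward + k_discard), and the fractions for independent checkpoints multiply. All rate constants below are hypothetical. They were chosen only to show how slower progression of a suboptimal substrate is amplified into discrimination across checkpoints.

```python
from math import prod


def pass_probability(k_forward: float, k_discard: float) -> float:
    """Chance that a substrate proceeds before being discarded at one checkpoint.

    Forward progression and discard are modeled as competing first-order
    processes, so the branching ratio is k_forward / (k_forward + k_discard).
    """
    return k_forward / (k_forward + k_discard)


def overall_yield(checkpoints: list[tuple[float, float]]) -> float:
    """Product of per-checkpoint pass probabilities (checkpoints independent)."""
    return prod(pass_probability(kf, kd) for kf, kd in checkpoints)


# Hypothetical rates (per second), for illustration only: the discard rate
# is the same for both substrates, but the suboptimal substrate progresses
# 10-fold more slowly at each of two checkpoints.
K_DISCARD = 0.1
correct = [(1.0, K_DISCARD), (1.0, K_DISCARD)]
suboptimal = [(0.1, K_DISCARD), (0.1, K_DISCARD)]

y_correct = overall_yield(correct)
y_sub = overall_yield(suboptimal)
print(f"correct: {y_correct:.3f}, suboptimal: {y_sub:.3f}, "
      f"discrimination: {y_correct / y_sub:.1f}x")
# prints: correct: 0.826, suboptimal: 0.250, discrimination: 3.3x
```

In this sketch, the discard rate is the same for both substrates, so discrimination comes entirely from the slower forward step of the suboptimal substrate. Each added checkpoint multiplies that discrimination further.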
Structural information is available for full-length proteins or domains of several spliceosomal helicases. These include the DECD protein UAP56,[98, 99] the DEAH/RNA helicase A (RHA) proteins Prp43 and Prp22 (CTD only), and the Ski2-type helicase Brr2. UAP56 is required for formation of the A complex. The structure of UAP56 consists essentially of the minimal helicase core: two RecA domains connected by an interdomain linker. UAP56 likely unwinds its RNA substrates through RNA bending and local strand separation, similar to other DEAD-box RNA helicases.[90, 101]
Prp2, 16, 22, and 43 belong to the DEAH/RHA helicase family. All members of this family contain two conserved CTDs following the helicase core: a helicase-associated domain (HA2) and a domain of unknown function. Prp43 is involved in the release of the intron lariat after the splicing reaction, as well as in ribosomal RNA biogenesis. The crystal structure of yeast Prp43 in complex with ADP and Mg++ revealed unexpected structural similarity to the Ski2-type DNA helicase Hel308[104, 105] [Fig. 7(a,b)]. Prp43 contains six domains:

- a Prp43-specific N-terminal domain (Domain 1);
- two RecA domains (Domains 2 and 3);
- two domains structurally homologous to the corresponding domains in Hel308;
- a CTD with an oligonucleotide-binding (OB) fold that differs from the CTD of Hel308.

Two features of Hel308 thought to be crucial for its processive unwinding of DNA are both present in Prp43. The first is a prominent β-hairpin in the second RecA domain, thought to separate the two nucleotide strands. The second is the ratchet helix in Domain 4, thought to drive single-strand translocation. The OB fold is generally associated with nucleic acid-binding activity. The nucleic acid-binding groove in the OB fold of Prp43 is solvent exposed and highly positively charged, and it is well positioned to interact with the RNA substrate entering the unwinding cavity. Indeed, Prp43 lacking the C-terminal OB fold retains basal ATPase activity but has significantly reduced RNA-binding affinity and RNA-stimulated ATPase activity. In addition, the C-terminal OB fold serves as a binding site for G-patch proteins (such as Pfa1). It is also responsible for G-patch protein-mediated stimulation of Prp43 ATPase and helicase activity.
The Prp43 structure revealed a general structural model for the DEAH/RHA family of RNA helicases, including the splicing helicases Prp2, Prp16, Prp22, and Prp43. These helicases are all likely to contain two RecA domains, winged helix (WH) and ratchet domains similar to those of Hel308, and a C-terminal OB-fold domain. The crystal structure of the C-terminal region of human Prp22, downstream of the two RecA domains, confirmed this model. It showed that this region of Prp22 also contains domains homologous to the WH and ratchet domains of Hel308, followed by an OB-fold domain. The N-terminal region upstream of the first RecA domain varies in length (∼100 residues for Prp2 and Prp43 but over 500 residues for Prp16 and Prp22) and seems to be unique to each DEAH/RHA helicase. The structural similarity between the DEAH/RHA family and Hel308 suggests that these RNA helicases use an unwinding mechanism similar to that of Hel308. In this mechanism, single-stranded RNA is threaded through an enclosure formed by four domains and translocated in a 3′ to 5′ direction, while the β-hairpin in the second RecA domain separates the dsRNA. This is consistent with the proposed function of Prp22, which may translocate along the mRNA and disrupt the mRNA/U5 interaction. Prp43 could likewise translocate along the lariat intron to remove proteins or RNAs associated with it. The OB fold facilitates RNA binding and mediates binding and stimulation by G-patch proteins. This has been shown at least for Prp43, which interacts with the G-patch protein Pfa1, and Prp2, which interacts with the G-patch protein Spp2.
Brr2, a U5 snRNP protein, is a large (250 kDa) and unique helicase that contains two tandem helicase cassettes. It is responsible for U4/U6 unwinding during spliceosomal activation and for U2/U6 unwinding during spliceosomal disassembly.[85, 108-110] Helicase motifs in the first, but not the second, helicase cassette are critical for ATPase activity, U4/U6 unwinding, and cell viability. The crystal structure of the C-terminal region (Sec63 domain) of the second helicase cassette revealed an unexpected resemblance to Domains 4 and 5 of the DNA helicase Hel308, with an additional fibronectin 3 CTD.[111, 112] This led to the hypothesis that full-length Brr2 is composed of an N-terminal domain followed by two consecutive Hel308-like cassettes (Hel308-I and -II). The crystal structure of a large fragment of human Brr2 encompassing both helicase cassettes (residues 403–2136) was also recently determined. It revealed that the two Hel308-like cassettes are tightly packed against each other [Fig. 7(c,d)]. The N-terminal helicase cassette can unwind U4/U6 on its own, albeit with much lower activity. The Hel308-DNA structure suggested the following model for Brr2. The RNA substrate threads through the enclosure formed by Domains 1–5 and translocates in a 3′ to 5′ direction with the help of the ratchet helix, and the duplex is unwound with the help of the β-hairpin in the second RecA domain. However, both the 3′ and 5′ ends of U4 and U6 are occupied by Sm proteins. The authors therefore proposed that the N-terminal helicase cassette can open its enclosure between the second RecA domain and the ratchet domain and bind single-stranded U4 snRNA internally, rather than threading it from a free end. The C-terminal helicase cassette retains the ability to bind ATP but does not hydrolyze it. The tight packing between the two cassettes may help the N-terminal cassette adopt the optimal conformation for catalysis. The large size and expanded surface of the C-terminal cassette offer an opportunity for long-distance regulation of the N-terminal cassette's activity, potentially by multiple protein factors.
Indeed, the second Hel308 cassette interacts with Prp8 and Snu114 in vitro and in vivo, potentially serving as a mediator for the regulation of Brr2's activity by Prp8. The C-terminal region of Prp8 (Prp8-CTR, including the MPN domain and the RNase H-like domain) facilitates the binding of the Brr2/Prp8-CTR complex to U4/U6, suggesting a potential role of Prp8-CTR as an auxiliary substrate binding and specificity domain for Brr2.
Sequence and structural analyses reveal that most spliceosomal helicases contain a minimal helicase core with additional domains. These additional domains stabilize the helicase core and interact with substrate RNA or with other proteins that modulate helicase function. The Hel308 fold seems to be present predominantly in splicing helicases. Despite low sequence similarity to Hel308, both the DEAH/RHA and Brr2 helicases adopt the Hel308 fold. This suggests that Brr2 and the DEAH/RHA family helicases (Prp2, 16, 22, and 43) may all be more processive than typical RNA helicases, which are thought to act through local strand separation. Confirming this, however, will require detailed biochemical analyses. The structures of a few spliceosomal helicases (for example, Prp5 and Prp28) remain unknown. Structural studies of these helicases will likely shed new light on their function, as studies of the other splicing helicases have done. In addition, although we know roughly at which step each splicing helicase acts, their precise targets are unknown. Two steps would be significant progress toward understanding the unwinding mechanism, function, and regulation of these helicases: identifying their physiological targets and determining their structures bound to these endogenous RNA targets.
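Processivity, as used here, can be pictured with a simple geometric model. The model assumes that, at each step, the helicase either advances (probability p) or dissociates (probability 1 − p). The mean number of steps per binding event is then p/(1 − p). If one base pair is separated per step, the chance of unwinding an n-bp duplex in a single binding event is p^n. The one-base-pair-per-step rule is an assumption made only for this sketch, and the per-step probabilities and duplex length below are hypothetical.

```python
import random


def mean_steps(p_step: float) -> float:
    """Expected number of steps before dissociation (geometric model)."""
    return p_step / (1.0 - p_step)


def p_full_unwinding(p_step: float, duplex_bp: int) -> float:
    """Chance of separating duplex_bp base pairs in one binding event,
    assuming (hypothetically) one base pair is separated per step."""
    return p_step ** duplex_bp


def simulate_run(p_step: float, rng: random.Random) -> int:
    """Simulate one binding event; return the number of steps taken."""
    steps = 0
    while rng.random() < p_step:  # advance with probability p_step
        steps += 1
    return steps  # loop exits when the enzyme dissociates


rng = random.Random(0)  # fixed seed for reproducibility
for p in (0.9, 0.99):  # hypothetical per-step probabilities
    simulated = sum(simulate_run(p, rng) for _ in range(20_000)) / 20_000
    print(f"p={p}: expected {mean_steps(p):.0f} steps, "
          f"simulated {simulated:.1f}, "
          f"P(unwind 20 bp) = {p_full_unwinding(p, 20):.2f}")
```

In this model, raising p from 0.9 to 0.99 increases the mean run length about elevenfold (from 9 to 99 steps). Small differences in per-step behavior can therefore translate into large differences in how far a helicase travels per binding event.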
The chemistry of the splicing reaction is identical to that of Group II self-splicing introns, and both reactions require Mg++ ions. The spliceosome has therefore been hypothesized to be essentially a ribozyme in which snRNAs play central roles in the splicing reaction. Phylogenetic, genetic, and biochemical analyses point to U6 snRNA as the component most likely to play a catalytic role. The structure of the highly conserved U6 intramolecular stem loop (ISL) from yeast was determined using NMR.[115-117] The Watson-Crick paired region of the stem has standard A-form helical geometry. U80 in the 3-nt internal loop of the ISL binds an Mg++ ion, which may be the Mg++ ion required for catalysis. The GCAUA pentaloop of the ISL adopts a GNRA-type fold, a structure that often mediates tertiary RNA interactions. This pentaloop was hypothesized to bring the metal ion bound in the ISL close to the 5′ ss. There are intriguing structural and functional similarities between the U6 ISL and domain V of the Group II intron [Fig. 8(a)]. The U6 ISL and Domain V both form stem-loops of similar length and secondary structure. Each stem-loop has two helices separated by an internal Mg++-binding bulge, and both stem-loops have a GNRA-type loop.
Butcher and coworkers[119, 120] also analyzed the structures of various yeast U2/U6 constructs (including the ISL region) using NMR and small-angle X-ray scattering (SAXS). The shorter U2/U6 constructs, truncated at Helix I, Helix II, or both, form a four-helix junction structure, which may be an artifact of truncation. The longer 110-nt U2/U6 construct containing full-length Helices I and II [Fig. 8(b)] forms a three-helix junction, consistent with extensive genetic studies. SAXS studies suggest that the 110-nt U2/U6 complex adopts a Y shape, with the U6 ISL and Helices Ia, Ib, and III forming a continuous stack [Fig. 8(c)]. Several components of this U2/U6 complex are critical for catalysis: the U80 metal-binding site, the AGC triad, and the pre-mRNA recognition site. All of them localize to one face of the structure, where they could interact with the pre-mRNA substrate or other cofactors.
Over the past several decades, we have made significant progress in understanding the structure and function of the spliceosome. However, high-resolution structures of the intact spliceosome and its large subcomplexes are still lacking. In the future, improving sample homogeneity will be key to improving the resolution of EM structures and enabling crystallographic studies of these large assemblies. Several approaches could improve sample homogeneity:

- purifying the spliceosome or snRNPs under more stringent conditions or from different species;
- trapping the spliceosome with small-molecule inhibitors or temperature-sensitive mutants;
- reconstituting spliceosomal subcomplexes from recombinant proteins.

Hybrid approaches combine multiple structural biology methods (EM, crystallography, and NMR) with biochemical and genetic data. They have made significant contributions to our understanding of other large cellular machines, such as the nuclear pore complex. Such a hybrid strategy is likely to remain the best route, in the foreseeable future, to a detailed and comprehensive view of this complicated molecular machine.
We apologize to our colleagues whose work is not cited due to space limitations.