Correspondence: Nicolás Toro, Grupo de Ecología Genética de la Rizosfera, Estación Experimental del Zaidín, Consejo Superior de Investigaciones Científicas, Profesor Albareda 1, 18008 Granada, Spain. Tel.: +34 958181600; fax: +34 958129600; e-mail: firstname.lastname@example.org
Group II introns are both catalytic RNAs (ribozymes) and mobile retroelements that were discovered almost 14 years ago. It has been suggested that eukaryotic mRNA introns might have originated from the group II introns present in the alphaproteobacterial progenitor of the mitochondria. Bacterial group II introns are of considerable interest not only because of their evolutionary significance, but also because they could potentially be used as tools for genetic manipulation in biotechnology and for gene therapy. This review summarizes what is known about the splicing mechanisms and mobility of bacterial group II introns, and describes the recent development of group II intron-based gene-targetting methods. Bacterial group II intron diversity, evolutionary relationships, and behaviour in bacteria are also discussed.
Group II introns are large catalytic RNAs (ribozymes) and mobile retroelements [reviewed by Pyle & Lambowitz (2006)] that splice by means of a lariat intermediate, in a mechanism similar to that of spliceosomal introns. The transfer of these introns either to intronless alleles (retrohoming) or to ectopic sites (retrotransposition) is mediated by ribonucleoprotein (RNP) complexes consisting of the intron-encoded protein (IEP) and the excised intron RNA lariat. Group II introns were initially identified in the mitochondrial and chloroplast genomes of lower eukaryotes and plants (Michel et al., 1989). They were later identified in bacteria (Ferat & Michel, 1993), and have recently been found in the archaeal order Methanosarcinales (Toro, 2003; Dai & Zimmerly, 2003; Rest & Mindell, 2003). It is thought that both nuclear spliceosomal introns and non-Long Terminal Repeat (LTR) retrotransposons evolved from mobile group II introns [Sharp, 1985; Cech, 1986; Cavalier-Smith, 1991; Eickbush, 1994; Zimmerly et al., 1995; recently reviewed by Koonin (2006)]. According to this hypothesis, group II introns originated in bacteria and invaded the nucleus of a primitive eukaryote, possibly from the alphaproteobacterial progenitor of the mitochondria; they were then fragmented to form the spliceosome (Cavalier-Smith, 1991; Stoltzfus, 1999). It has recently been suggested that the spread of group II introns and their decay into spliceosomal introns created a strong selective pressure, necessitating compartmentalization of the nucleus and cytoplasm (Martin & Koonin, 2006). In addition to their possible evolutionary significance in eukaryogenesis, mobile group II introns are of interest because they have been used to develop a new type of gene-targetting vector (Lambowitz & Zimmerly, 2004; Lambowitz et al., 2005), with potential applications in biotechnology and medicine.
A typical group II intron (Fig. 1) consists of a highly structured RNA with six distinct double-helical domains (DI to DVI) and an internally encoded (ORF within DIV) reverse transcriptase (RT) maturase. The IEP is required for in vivo folding of the intron RNA into a catalytically active structure (Michel & Ferat, 1995). Unlike organellar introns, most bacterial group II introns have an IEP (Lambowitz & Zimmerly, 2004). The IEPs of these introns (Fig. 1) have four conserved domains: an N-terminal RT domain; domain X, a putative RNA-binding domain associated with RNA splicing or maturase activity; a C-terminal DNA-binding domain (D); and a DNA-endonuclease domain (En). More than half of the bacterial group II IEPs annotated to date lack the En domain, and many also lack the D domain (Martínez-Abarca & Toro, 2000b; Zimmerly et al., 2001; Belfort et al., 2002; Toro, 2003; Lambowitz & Zimmerly, 2004). It has been suggested that all current group II introns are descended from an RT-encoding group II intron in bacteria (Toor et al., 2001). Three main phylogenetic subclasses (IIA, IIB and IIC) of group II introns have been described on the basis of IEP and conserved intron RNA structures (Michel et al., 1989; Toor et al., 2001; Zimmerly et al., 2001; Ferat et al., 2003; Toro, 2003), with the IIC subclass unique to bacteria. Bacterial group II introns differ markedly from organellar introns in not generally being located in conserved genes and in being mostly associated with mobile DNAs or located within intergenic regions (Dai & Zimmerly, 2002; Toro, 2003).
Group II introns, which may predate prokaryotes (Koonin, 2006), are still present in many bacteria. So, what role do group II introns play in the bacterial genome? Although they are generally seen as ‘selfish’ elements, studies of group II introns are continually discovering surprising features such as the association of some lineages with specific genetic signals or with the host replication machinery for mobility.
Bacterial group II intron splicing
Group II introns splice by means of a lariat intermediate, in a mechanism resembling that of spliceosomal introns. Intron excision from eukaryotic mRNA requires a set of host-cell snRNA molecules and proteins, whereas the catalytic functions involved in group II intron splicing reside within the RNA molecule itself. Nevertheless, the folding of the intron RNA into the catalytically active structure in vivo depends on the maturase activity of the IEP encoded within intron DIV, outside the catalytic core of the ribozyme.
Group II intron splicing generally involves two sequential transesterification reactions (Fig. 2a). The first involves a nucleophilic attack on the 5′ splice junction by the 2′-OH of a bulged nucleotide, typically an adenosine residue (bulging A) located in DVI near the 3′ end of the intron, releasing the 5′ exon and generating an intron-3′ exon branched intermediate. The second reaction involves a nucleophilic attack on the 3′ splice site by the free 3′-OH of the last nucleotide of the 5′-exon, yielding the ligated exons and the intron RNA lariat with a 2′–5′ phosphodiester bond as the major splicing products. Alternatively, the splicing reaction may be initiated by hydrolysis (Fig. 2b), followed by the second step, resulting in the excision of the intron as a linear molecule rather than a lariat. Most of the self-splicing bacterial group II introns identified to date display both excision pathways in vitro. The exceptions are some bacterial class C introns such as GBSi1, identified in some group B Streptococcus isolates, which splice solely via a linear intermediate molecule (Granlund et al., 2001). In addition, it has been reported that yeast intron aI5γ can also be excised as a true circular form in vitro (Fig. 2c). The circle seems to result from the formation of a 2′–5′ phosphodiester bond between the last and first residues of the intron, and it has been proposed that this reaction requires the 3′ exon to have been released first from precursor molecules by a trans-splicing reaction triggered by 5′ exon molecules previously generated by the so-called spliced exon reopening (SER) reaction. Subsequent to trans splicing, the 2′-OH of the terminal intron residue will attack the 5′ splice site, releasing the 5′ exon and an intron circle (Murray et al., 2001).
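As a bookkeeping aid, the branching and hydrolytic pathways described above can be written as a toy model that tracks only which RNA segments remain joined after each step. This is a minimal Python sketch under stated assumptions: the names `splice` and `SplicingResult` are hypothetical, the segments are arbitrary labels, and chemistry, kinetics and the circle-forming pathway are ignored.

```python
from dataclasses import dataclass


@dataclass
class SplicingResult:
    intermediate: tuple  # molecules present after step 1
    products: tuple      # molecules present after step 2
    intron_form: str     # "lariat" or "linear"


def splice(exon5: str, intron: str, exon3: str,
           pathway: str = "branching") -> SplicingResult:
    """Toy model of the two-step excision of a group II intron.

    pathway="branching":  the 2'-OH of the bulged A in DVI attacks the
                          5' splice junction, so the intron ends up as a lariat.
    pathway="hydrolysis": step 1 is a hydrolysis, so the intron ends up linear.
    """
    if pathway not in ("branching", "hydrolysis"):
        raise ValueError(f"unknown pathway: {pathway}")
    form = "lariat" if pathway == "branching" else "linear"
    # Step 1: the 5' exon is released; the intron is still joined to the 3' exon.
    intermediate = (exon5, f"{form}[{intron}]-{exon3}")
    # Step 2: the free 3'-OH of the 5' exon attacks the 3' splice site,
    # ligating the exons and releasing the intron.
    products = (exon5 + exon3, f"{form}[{intron}]")
    return SplicingResult(intermediate, products, form)
```

Calling `splice("E1", "I", "E2")` and `splice("E1", "I", "E2", pathway="hydrolysis")` gives the same ligated exons in both cases; only the form of the excised intron differs, which is the point made in the text.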
Group II ribozyme positioning on the precursor mRNA substrate and cleavage site fidelity are determined principally by base-pairing interactions between short-sequence elements located in loop structures of intron DI (exon-binding sites, EBS) and sequence stretches flanking the splice site (intron-binding sites, IBS). EBS1 and EBS2, each typically five to seven nucleotides long, recognize their partner sequences – IBS1 and IBS2 – at the 3′ end of the 5′ exon. Interestingly, IIC introns are self-splicing-competent, despite lacking the EBS2–IBS2 pairing (Granlund et al., 2001; Toor et al., 2006). The recognition of the 3′ intron–exon junction involves two additional base-pair interactions. The first is a single tertiary base-pair interaction between the last position of the intron (γ′) and another intron nucleotide (γ) between DII and DIII. The second involves the first one or several specific exon and intron nucleotides. The identity of this IBS for the 3′ exon can be used to differentiate between the two major subclasses of group II introns. In IIA introns, the sequence immediately upstream from EBS1 (δ sequence) base-pairs with one to three nucleotide residues (δ′ position) of the 3′ exon for intron splicing (3 nt in the Lactococcus lactis Ll.LtrB intron). In contrast, most IIB introns display no significant signs of δ–δ′ complementarity. These group II ribozymes recognize the first nucleotide in the 3′ exon, referred to as IBS3 rather than δ′, by canonical base-pairing with an intron nucleotide, referred to as EBS3 rather than δ, located in the so-called ‘coordination loop’ of DI (Costa et al., 2000; de Lencastre et al., 2005). In these introns, the δ position is involved in a new tertiary interaction with another residue (denoted δ′), also located in the coordination loop (Costa et al., 2000). These nucleotides in the loop are involved in aligning the two exons for the second step of splicing (δ–δ′ and IBS3–EBS3 interactions).
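Because EBS–IBS recognition is antiparallel base-pairing between two short sequences, complementarity can be checked mechanically. The Python sketch below is illustrative only: the function name `pairs_with` and the sequences in the usage note are hypothetical and are not taken from any real intron, and the optional G–U wobble setting is an assumption rather than something stated in the text.

```python
# Canonical RNA base pairs; G-U wobble pairs are kept separate so that
# they can be switched on explicitly.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairs_with(ebs: str, ibs: str, allow_wobble: bool = False) -> bool:
    """Return True if every EBS base pairs with the facing IBS base.

    Both sequences are written 5'->3'. A DNA target can be passed as-is:
    T is treated as U for the purpose of pairing.
    """
    ebs = ebs.upper().replace("T", "U")
    ibs = ibs.upper().replace("T", "U")
    if len(ebs) != len(ibs):
        return False
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    # Antiparallel duplex: the first EBS base faces the last IBS base.
    return all((e, i) in allowed for e, i in zip(ebs, reversed(ibs)))
```

For example, with the made-up pair `pairs_with("GUUGUG", "CACAAC")` the duplex is fully complementary, a single substitution in the IBS (`"CACAAG"`) breaks it, and the matching change in the EBS (`"CUUGUG"`) restores it.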
The three-dimensional arrangement of the active site of group II ribozymes can be further modelled by a complex network of tertiary interactions involving the various RNA domains (Michel & Ferat, 1995; Costa et al., 2000; Swisher et al., 2001 and references therein).
Thus, 5′ exon binding through DI interactions and the conserved DV are involved in the formation of the minimal catalytic core of the ribozyme, with DII and DIII also contributing to RNA folding and catalytic efficiency (Fedorova et al., 2003; for a review see Pyle & Lambowitz, 2006). The branch site of DVI, both exons, the essential catalytic regions of DV (bulge and AGC triad regions), conserved nucleotides in the joining region (J) between DII and DIII (J2/3), and the ɛ–ɛ′ substructure are all in close proximity before the first splicing step (de Lencastre et al., 2005). It has recently been shown that the branch site binds, with specific polarity, to the coordination loop substructure, thereby placing the nucleophilic adenosine in a position in which it can react with the 5′ splice site (Hamill & Pyle, 2006). The docking site for DV, a tetraloop receptor motif composed of the ζ–ζ′ and κ–κ′ interactions, is separated from the coordination loop by a short helical stem (reviewed by Lehmann & Schmidt, 2003). The choice of branch site is strictly controlled by a molecular caliper, which measures the distance between the base of DV and the nucleophilic 2′-hydroxyl in DVI (Chu et al., 2001). Thus, DV and the branch-site receptors function together to position the nucleophile correctly, so as to promote catalysis (Hamill & Pyle, 2006).
Alternative EBS-IBS interactions, shifted from the expected EBS-IBS pairings, have also been described in the bacterial group IIB introns B.a.I2 and RmInt1 (Robart et al., 2004; Costa et al., 2006a, b). As shown in vitro, the 3′ splice site of B.a.I2 is flexible but depends on potential γ–γ′ and IBS3–EBS3 pairings downstream from the wild-type site. Alternative splice-site usage either creates frame shifts within the exon-encoded ORF or joins the two exonic ORFs in frame, ultimately affecting the output of host mRNA translation in vivo (Robart et al., 2004). On the other hand, it has been suggested that an RmInt1 ribozyme reaction at a surrogate IBS sequence located 3′ to the intron might be responsible for the generation of unconventional self-splicing products along with the ligated exons and the excised intron. These unconventional self-splicing products include a 3′ exon truncated at its 5′ end, truncated variants of the linear and lariat forms of the intron –3′ exon intermediate, and putative circular molecules derived from this intermediate (Costa et al., 2006a). The self-splicing profiles of B.a.I2 and RmInt1 may therefore highlight mechanisms underlying the control of host-gene expression by bacterial group II introns.
Conserved RT and X domains are common to both Ll.LtrB and RmInt1 IEPs, with the X domains being associated with the splicing-promoting (maturase) activity of the proteins. However, most of the available information on the mechanisms involved in IEP-assisted bacterial group II intron splicing is based on the lactococcal intron (Matsuura et al., 1997, 2001; Saldanha et al., 1999; Wank et al., 1999; Singh et al., 2002; Cui et al., 2004). Ll.LtrB and its encoded protein, LtrA, are efficiently expressed and functional in Escherichia coli, facilitating the detailed biochemical characterization of both the intron RNA and the IEP (Matsuura et al., 1997). The purified LtrA protein binds to the unspliced precursor RNA via a rapid bimolecular reaction, followed by a slower unimolecular step involving a change in RNA conformation that ultimately promotes Ll.LtrB splicing when assayed in vitro under low salt concentrations shown to inhibit the self-splicing reaction (Saldanha et al., 1999). In certain specific conditions, LtrA also binds nonspecifically to other group II introns and RNA molecules. However, this nonspecific binding is inefficient for RNA splicing, and LtrA is therefore considered to be an intron-specific splicing factor (Saldanha et al., 1999). This specificity is based on protein recognition of a unique structural feature in the intron subdomain DIVa as a primary high-affinity binding site; this site contains the Shine–Dalgarno sequence and the initiation codon of the LtrA ORF (Wank et al., 1999; Singh et al., 2002). This binding, which also autoregulates LtrA translation, requires domain X and parts of the RT domain of the IEP, and extends further into the 5′ and 3′ conserved regions of the catalytic core and DIV through weaker secondary contacts that are sufficient to support residual, but inefficient, splicing in vivo when the high-affinity binding site in DIVa is deleted (Wank et al., 1999; Matsuura et al., 2001; Singh et al., 2002; Cui et al., 2004).
Further chemical probing and RNA footprinting experiments have demonstrated that this array of protein–RNA contacts promotes the generation of a novel set of conserved long-range tertiary interactions in the Ll.LtrB RNA, which are not stably formed in the absence of the maturase under low-salt conditions, but which are required to model the active conformation of the ribozyme core in vivo (Matsuura et al., 2001). The major protein-assisted splicing products of Ll.LtrB are the conventional linear and lariat forms of the intron, along with exons ligated in frame (Saldanha et al., 1999).
The maturase activity of the RmInt1 IEP has not been biochemically characterized. However, recent studies have demonstrated that RmInt1 splices in vivo leading to the formation of the intron lariat form, along with circular molecules in which the first and last intron residues are ligated (Molina-Sánchez et al., 2006). Genetic analysis of the conserved X domain of the RmInt1 IEP not only confirmed the requirement of the maturase for intron splicing in vivo but also supported an additional role for the IEP in controlling the balance between the two intron excision pathways (Molina-Sánchez et al., 2006). Thus, in vivo the presence of the IEP seems to ensure cleavage at the correct 3′ splice site or at position +1 in the 3′ exon before intron RNA circle formation. Circular RNAs had previously only been described as splicing products of group II introns in eukaryotic cells, in which they were thought to be linked to cellular senescence (Murray et al., 2001 and references therein). These novel findings therefore open up new perspectives for investigation of the functional diversity of bacterial group II ribozymes and their biological significance.
As described above, analyses of the splicing reaction, using the purified Ll.LtrB-encoded maturase and RNA, predict a remarkable self-sufficiency of bacterial group II introns in the promotion of their own splicing. However, it has also been shown that the efficiency of the in vivo splicing reaction of the group II intron RmInt1 decreases when the intron is expressed in a genetic background other than that of its natural bacterial host, Sinorhizobium meliloti (Martínez-Abarca et al., 1998, 1999). Several nuclear gene products have been identified as RNA chaperones involved in the splicing of group II introns from yeast, algae or higher plants (Lambowitz & Perlman, 1990; Lambowitz et al., 1999; Lehmann & Schmidt, 2003; Ostheimer et al., 2003). It remains unclear whether this is also the case for bacterial group II introns, and to what extent splicing efficiency depends on host functions.
Bacterial group II intron mobility mechanisms
Group II introns act both as large catalytic RNAs and as site-specific retroelements (more recently reviewed by Lambowitz & Zimmerly, 2004; Pyle & Lambowitz, 2006). After splicing, the intron RNA lariat and the IEP remain associated, forming a ribonucleoprotein particle (RNP) promoting intron insertion into DNA target sites identical to the splice site through retrohoming, which is the principal and most efficient group II intron RNA-based mobility pathway (Fig. 3). Retrohoming has been unequivocally demonstrated for the lactococcal Ll.LtrB and rhizobial RmInt1 introns, using engineered ‘twintron’ shuttle constructs in which the bacterial group II intron was interrupted by a DNA sequence encoding a splicing-competent group I intron (tdI). Consistent with a mobility pathway involving an RNA rather than a DNA intermediate, the group I intron was found to be absent from the homing products generated in plasmid-borne target sites as a result of tdI splicing in the precursor twintron RNA (Cousineau et al., 1998; Martínez-Abarca et al., 2004). Group II introns can also insert, at a lower frequency, into noncognate sequences resembling the homing site – a process known as ectopic transposition (retrotransposition) – which contributes to group II intron dispersal in nature (Cousineau et al., 2000; Martínez-Abarca & Toro, 2000a; Muñoz et al., 2001; Dai & Zimmerly, 2002; Ichiyanagi et al., 2002).
Of all the bacterial group II introns characterized as splicing-competent in vivo, only the L. lactis Ll.LtrB and S. meliloti RmInt1 introns have been shown to be efficient mobile genetic elements, leading to the use of these elements as model experimental systems for deciphering the bacterial group II intron mobility pathways and mechanisms (Lambowitz & Zimmerly, 2004 and references therein).
Retrohoming through target DNA-primed reverse transcription (TPRT)
The retrohoming pathway of bacterial group II introns was first investigated for Ll.LtrB, by plasmid-based genetic assays in vivo, in E. coli or L. lactis. It was then further dissected by characterization of the biochemical activities of the intron-encoded RNPs reconstituted with a purified LtrA expressed in E. coli and an Ll.LtrB RNA lariat excised from a precursor transcript in vitro (Matsuura et al., 1997; Cousineau et al., 1998; Saldanha et al., 1999; Aizawa et al., 2003). Neither the Ll.LtrB RNA nor the purified LtrA protein alone displays the reverse splicing or endonuclease activities of the RNP particle (Saldanha et al., 1999). Group II intron RNPs initially bind DNA nonspecifically and then search for DNA target sites (Aizawa et al., 2003). Mutagenesis experiments, DNA footprinting, and modification interference mapping have shown that Ll.LtrB-encoded RNPs recognize a relatively long DNA target site (Fig. 3a) extending from position −25 to +9 relative to the intron insertion site (Guo et al., 2000; Mohr et al., 2000; Karberg et al., 2001; Singh & Lambowitz, 2001). Retrohoming then occurs by a target DNA-primed reverse transcription mechanism (TPRT) involving several sequential reactions (Fig. 3b). The RNA component of the RNP cleaves the sense-strand precisely at the exon junction in the double-stranded DNA recipient allele, by a reverse splicing reaction that integrates the intron RNA at the target site, whereas LtrA uses its endonuclease domain (En) to cleave the antisense-strand at position +9 relative to the insertion site. The 3′ end of the cleaved antisense strand is then targetted to the active site of the reverse transcriptase, where it is used as a primer for reverse transcription of the inserted intron RNA by the RT domain of LtrA, generating a cDNA copy of the intron that is subsequently integrated into its new location by homologous recombination-independent repair mechanisms (Cousineau et al., 1998).
The target sequence is recognized principally by the RNA component of the RNP complex, through EBS2-IBS2, EBS1-IBS1 and δ-δ′ base-pairing interactions, which, for Ll.LtrB, involve positions −12 to −8, −6 to −1 and +1 to +3, respectively. The nucleotide residues within the IBS and δ′ regions contribute to DNA target-site recognition to different extents, but mutations in these nucleotides have an overall inhibitory effect on the reverse splicing activity of the reconstituted RNPs and Ll.LtrB homing frequencies in vivo; nonetheless, both can be restored to wild-type levels by introducing the corresponding complementary mutations into the intron RNA (Guo et al., 2000; Mohr et al., 2000). The variable DNA-binding domain (D) is involved in the recognition of a subset of key nucleotide residues (positions T-23, G-21 and A-20) in the distal 5′ exon region via major groove interactions bolstered by phosphate-backbone contacts (Singh & Lambowitz, 2001). LtrA may also be involved in the recognition of distal nucleotides of IBS2, probably on the complementary strand of the target site, as the restoration of EBS2-IBS2 base-pairing at these positions does not result in the recovery of wild-type reverse splicing levels or homing frequencies. Mutations at these key positions in the distal 5′ exon region block reverse splicing of the intron RNA into double-stranded, but not into single-stranded DNA, substrates. This suggests that the interaction of these positions with LtrA triggers local DNA unwinding, thereby placing the intron RNA in a position to base-pair with the IBS and δ′ sequences for reverse splicing into double-stranded DNA target sites (Mohr et al., 2000).
KMnO4 modification experiments revealed that RNPs reconstituted with intron RNAs carrying EBS/δ mutations that prevent base-pairing with the DNA target site do not trigger DNA unwinding. This is consistent with LtrA interaction and intron RNA base-pairing acting in concert, rather than sequentially, to promote efficient unwinding of DNA at cognate sites. Recent data indicate that before the reverse splicing of intron RNA, RNPs bend the target DNA by maintaining initial contacts with the 5′ exon while engaging in 3′ exon interactions, gradually bringing the scissile phosphate into position for bottom-strand cleavage (Noah et al., 2006). Second-strand cleavage, catalyzed by the conserved En domain of LtrA, occurs after a time lag, and requires additional interactions of the protein with fixed positions in the 3′ exon, the most critical of which is position T+5, which lies within a single-stranded region after initial DNA unwinding (Mohr et al., 2000; Singh & Lambowitz, 2001).
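The coordinates given above for the Ll.LtrB target site can be collected into a small lookup that extracts each recognition element from a candidate sequence. The Python sketch below is only an illustration: the helper names are hypothetical, the 34-nt `EXAMPLE_SITE` is invented and is not the real Ll.LtrB homing site, and the numbering convention (no position 0, with −1 the last 5′-exon nucleotide and +1 the first 3′-exon nucleotide) is an assumption.

```python
# Target site spans -25 to +9 relative to the intron insertion site.
# Assumed convention: there is no position 0.
SITE_START, SITE_END = -25, 9
SITE_LENGTH = 34  # 25 nt of 5' exon + 9 nt of 3' exon

# Regions recognized by base-pairing with the intron RNA (from the text).
RNA_ELEMENTS = {"IBS2": (-12, -8), "IBS1": (-6, -1), "delta_prime": (1, 3)}

# Positions reported as recognized by LtrA (from the text).
PROTEIN_CONTACTS = {-23: "T", -21: "G", -20: "A", 5: "T"}

# Hypothetical top-strand sequence, NOT the real Ll.LtrB homing site.
EXAMPLE_SITE = "AATAGATCCGTTAACCGGTTACGTA" + "CATGTGGCA"


def index_of(pos: int) -> int:
    """Convert a site coordinate (no zero) into a 0-based string index."""
    if pos == 0 or not SITE_START <= pos <= SITE_END:
        raise ValueError(f"position {pos} is outside the target site")
    # Negative positions map directly; positive ones skip the missing zero.
    return pos - SITE_START if pos < 0 else pos - SITE_START - 1


def element(site: str, name: str) -> str:
    """Return the sub-sequence of `site` covered by one RNA-recognized element."""
    if len(site) != SITE_LENGTH:
        raise ValueError(f"expected a {SITE_LENGTH}-nt site")
    start, end = RNA_ELEMENTS[name]
    return site[index_of(start): index_of(end) + 1]


def protein_contacts_ok(site: str) -> dict:
    """Report, per position, whether the LtrA-recognized base is present."""
    if len(site) != SITE_LENGTH:
        raise ValueError(f"expected a {SITE_LENGTH}-nt site")
    return {pos: site[index_of(pos)] == base
            for pos, base in PROTEIN_CONTACTS.items()}
```

A lookup of this kind only reports where the elements lie and whether the listed bases are present; it says nothing about binding strength or homing frequency.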
En-independent retrohoming is linked to DNA replication in bacteria
TPRT-based retrohoming requires the generation of a primer for reverse transcription of the intron RNA via second-strand cleavage, catalyzed by the conserved En domain of the IEP. Phylogenetic analysis based on the sequence and domain structure of group II intron IEPs, of both bacterial and eukaryotic origin, assigned a putative mitochondrial lineage to the Ll.LtrB intron, with most bacterial introns located elsewhere in the tree. Furthermore, the C-terminal En domain is found in only c. 40% of the bacterial group II IEPs currently annotated in databases (Martínez-Abarca & Toro, 2000b; Zimmerly et al., 2001; Dai & Zimmerly, 2002). The S. meliloti RmInt1 intron is the best characterized of the group II introns clustering on the bacterial branch. RmInt1 encodes a protein with conserved RT and maturase domains and a functionally uncharacterized C-terminal extension of 20 amino acid residues that appears to be unrelated to the DNA-binding (D) domain of LtrA (Martínez-Abarca et al., 2000; San Filippo & Lambowitz, 2002). Despite the absence of recognizable D and En domains, RmInt1 retrohomes very efficiently, with 100% of the intron-containing strains undergoing insertion events at plasmid-borne target sites (homing frequency), and 20–45% of target copies within a single cell invaded by the intron (homing efficiency) (Martínez-Abarca et al., 2000; Jiménez-Zurdo et al., 2003). However, experiments with engineered En− mutant Ll.LtrB derivatives and target sites have shown that Ll.LtrB can retrohome, albeit much less efficiently, without second-strand cleavage (D'Souza & Zhong, 2002; Zhong & Lambowitz, 2003). These findings suggest that there are alternative pathways operating in the mobility of En− bacterial group II introns linked, as described below, to DNA replication (Muñoz-Adelantado et al., 2003; Zhong & Lambowitz, 2003; Martínez-Abarca et al., 2004).
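The two measures quoted above for RmInt1 are easy to conflate: homing frequency counts strains, whereas homing efficiency counts target copies within a cell. A minimal Python sketch of the distinction follows; the function names and any counts passed to them are hypothetical and do not describe how the cited studies quantified homing.

```python
def homing_frequency(strains_with_insertion: int, strains_tested: int) -> float:
    """Percentage of intron-containing strains showing at least one insertion."""
    if strains_tested <= 0:
        raise ValueError("strains_tested must be positive")
    return 100.0 * strains_with_insertion / strains_tested


def homing_efficiency(invaded_targets: int, total_targets: int) -> float:
    """Percentage of target copies within a cell that carry the intron."""
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    return 100.0 * invaded_targets / total_targets
```

With made-up numbers, 50 of 50 strains showing an insertion is a homing frequency of 100%, while 3 invaded copies out of 10 target plasmids in a cell is a homing efficiency of 30%; the two values are independent of each other.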
RmInt1 mobility (Fig. 3a and c) depends on the intron RNA and the IEP, with the RmInt1-encoded RNP complex recognizing a DNA target site extending 20 nt into the 5′ exon and 5 nt into the 3′ exon, as inferred from genetic characterization of the intron and target site coupled to homing assays in vivo (Jiménez-Zurdo et al., 2003; Muñoz-Adelantado et al., 2003). Subsequent mutational analysis demonstrated that this site is recognized primarily by the intron RNA, through the characteristic pairing pattern of IIB intron splicing: EBS2-IBS2, EBS1-IBS1 and EBS3-IBS3, involving positions −13 to −9, −7 to −1 and +1 in the RmInt1 target site, respectively (Jiménez-Zurdo et al., 2003). Unlike Ll.LtrB, RmInt1 has less stringent requirements for recognition of the distal 5′ and 3′ exon regions, with only single nucleotide positions (T-15 and G+4) required for wild-type retrohoming (Jiménez-Zurdo et al., 2003). These positions were initially thought to be recognized by the RmInt1 IEP, but recent findings suggest that there may be an alternative intron–exon pairing between an intron segment overlapping with the EBS2 exon-binding site and a 5′ exon site located just distal of IBS2 relative to the splice junction (IBS2*; Costa et al., 2006a, b). Thus, pairing of the first A of EBS2 with the last T of IBS2* at position −15 of the intron target site might render the last T of IBS2* critical for homing. The removal of nucleotides −20 to −16 from the RmInt1 target site dramatically decreases homing efficiency (Jiménez-Zurdo et al., 2003). The presumably small number of IEP–target-site interactions raises questions about the identity of the IEP domains involved and mechanistic implications for DNA target-site recognition.
RmInt1-derived RNP particles have RT activity and promote the reverse splicing of intron RNA into both single-stranded and double-stranded DNA substrates, through RNA-mediated cleavage at the intron insertion site. However, they cannot carry out the En-dependent second-strand cleavage in double-stranded DNA substrates (Muñoz-Adelantado et al., 2003). The efficiency of the reverse splicing reaction is markedly lower with long double-stranded DNA substrates. RmInt1-RNPs have higher levels of RT activity on exogenous synthetic substrates than on the endogenous intron RNA template, the reverse transcription of which in vitro has no obvious priming bias, suggesting that positioning of the reverse transcriptase on the precursor RNA facilitates initiation of the RT reaction various distances downstream from the intron (Muñoz-Adelantado et al., 2003). According to this biochemical analysis, the RmInt1-encoded RNP complex should recognize the key nucleotide residues in both single-stranded and double-stranded DNA target sites. However, the limited contact with the IEP is insufficient for the promotion of local DNA unwinding, suggesting that the preferred retrohoming pathway of RmInt1 involves reverse splicing into a transiently single-stranded DNA target site and that there may be alternative, uncharacterized priming mechanisms for reverse transcription of the intron RNA (Muñoz-Adelantado et al., 2003).
Homing assays in vivo, using recipient plasmids in which the RmInt1 target site was inserted in both orientations relative to the origin of replication of the plasmid, have provided evidence for two distinct retrohoming pathways for RmInt1 (Fig. 3c). The main mechanism of mobility is favoured by cell division and involves the reverse splicing of the intron RNA into single-stranded DNA at DNA replication forks, with a bias towards the DNA strand that serves as a template for lagging strand synthesis (Martínez-Abarca et al., 2004). This preference suggests that the intron should be inserted once the replication fork has passed the DNA target site, avoiding the potentially disruptive passage of the DNA polymerase complex through the region occupied by the RNP. Reverse transcription of the intron RNA is then primed by the nascent lagging strand, using either the RNA primer synthesized by the primase or partially polymerized Okazaki fragments. There is a second possible but minor retrohoming pathway (Fig. 3c), independent of DNA replication, that may involve reverse splicing into either double-stranded or transiently single-stranded DNA target sites and alternative priming mechanisms, such as random nonspecific DNA nicks (Schäfer et al., 2003), nascent leading strand (Zhong & Lambowitz, 2003) or de novo initiation priming (Wang & Lambowitz, 1993). The blocking of Ll.LtrB-mediated second-strand cleavage by mutations in the En LtrA domain or substitutions of key nucleotide residues in the 3′ exon revealed a minor retrohoming pathway for the lactococcal intron, dependent on DNA replication, with preferential use of the nascent leading strand to prime reverse transcription – a bias opposite to that for RmInt1 retrohoming (Zhong & Lambowitz, 2003).
Other naturally occurring En− group II introns include those belonging to the IIC subgroup of bacterial introns. These introns are often found inserted at target sites with recognizable IBS1 but not IBS2 sequences. The IBS2 sequence is replaced by a palindromic Rho-independent transcription terminator motif or similar structures (Granlund et al., 2001; Centron & Roy, 2002; Dai & Zimmerly, 2002). Recent data suggest that exon recognition by IIC introns relies on IBS1 and IBS3 pairings, and putatively on a new nonpairing interaction with a stem-loop (Toor et al., 2006).
Dependence of retrohoming on host functions
Group II introns are genetic elements with molecular features allowing them to maintain themselves and to spread in a genome. However, their mobility depends, to a variable extent, on host genetic background, suggesting that retrohoming requires the recruitment of host functions for its successful completion (Karberg et al., 2001; Toro et al., 2003). In addition to requiring the replication machinery during key initial stages, the retrohoming pathway of En− bacterial group II introns requires, at later stages, repair functions that remain poorly characterized for both endonuclease-dependent and -independent retrohoming. A recent study of Ll.LtrB homing efficiency in various E. coli mutants with defective DNA/RNA repair functions led to the construction of a retrohoming completion model including exonucleases (RecJ, MutD and PolI) and RNases (RNase H) for DNA resection and removal of the intron RNA template after first-strand cDNA synthesis, together with complexes of DNA polymerases and repair polymerases (PolII, PolIII, PolIV and PolV) for synthesis and proofreading of the second strand of the intron cDNA (Smith et al., 2005). However, further functional characterization is required to determine unambiguously the role of these genes and proteins in the retrohoming pathway.
RNA-based retrotransposition was first investigated in bacteria using an Ll.LtrB twintron variant carrying the self-splicing group I td intron and a kanamycin resistance gene for the selection of mobility events (Cousineau et al., 2000). In contrast to the situation described for mitochondrial group II introns, these experiments suggested that there may be a major En-independent and RecA-dependent Ll.LtrB retrotransposition pathway in L. lactis, probably involving reverse splicing of the intron RNA into cellular RNA rather than into DNA targets (Cousineau et al., 2000). However, the results of other studies are not consistent with this RNA-targetted mechanism as the preferred general pathway for the retrotransposition of bacterial group II introns: (1) the S. meliloti RmInt1 intron displays RecA-independent insertion into an ectopic site within the oxi1 gene (Martínez-Abarca & Toro, 2000a); (2) many bacterial group II introns are located in nontranscribed intergenic regions (Martínez-Abarca & Toro, 2000a; Dai & Zimmerly, 2002); (3) for the group II introns analysed to date, the IEP has been shown to have an affinity for DNA, rather than for RNA.
Ll.LtrB retrotransposition was analysed further, using improved twintron donor constructs incorporating the kanamycin resistance gene as a retrotransposition indicator gene (RIG) inserted into intron DIV and interrupted by the td intron. Retrotransposition events can thus be reliably detected on the basis of kanamycin resistance once the td intron has been spliced out of the precursor RNA mobility intermediate (Ichiyanagi et al., 2002). The pattern of Ll.LtrB spread within the L. lactis genome, as shown by RIG analysis, is consistent with intron retrotransposition into double-stranded or single-stranded DNA targets through a homologous recombination-independent mechanism, as described for the mitochondrial and bacterial RmInt1 introns. Insertion events were biased toward the use of the lagging strand as a template. This suggests a possible major En-independent retrotransposition pathway involving reverse splicing of the intron RNA into transiently single-stranded DNA target sites at replication forks, and the use of the nascent lagging strand to prime reverse transcription of the inserted intron RNA, a mechanism similar to the retrohoming mechanism of En− bacterial group II introns (Ichiyanagi et al., 2002). An analysis of ectopic insertion sites for bacterial group II introns revealed that bona fide IBS1 sequences were conserved, whereas the IBS2 element and key nucleotide positions in the 5′ and 3′ exon sequences recognized by the protein component of the RNP in TPRT-based retrohoming events were not (Cousineau et al., 2000; Martínez-Abarca & Toro, 2000a, b; Dai & Zimmerly, 2002; Ichiyanagi et al., 2002). However, these target sites retain essential recognition elements for insertion events independent of second-strand cleavage. Nevertheless, recent studies have reported a preferential En-dependent Ll.LtrB retrotransposition mechanism in E. coli, suggesting that the retrotransposition strategies of bacterial group II introns may be largely influenced by the host (Coros et al., 2005). During the retrotransposition of the lactococcal group II intron Ll.LtrB in E. coli, insertion occurs preferentially within the Ori and Ter macrodomains of the chromosome. This bipolar distribution of retrotransposition events results from the presence of the IEP (LtrA) at the poles of the cell (Zhao & Lambowitz, 2005; Beauregard et al., 2006).
Bacterial group II intron retrotransposition has been shown to occur in natural bacterial populations and is favoured by targets in transmissible genetic elements, such as plasmids, providing further support for this mobility mechanism as the major dissemination strategy of bacterial group II introns in nature (Muñoz et al., 2001; Ichiyanagi et al., 2003).
Group II introns as biotechnological tools
Derivatives of the group II intron Ll.LtrB have been used for gene disruption in various gram-negative (E. coli, Shigella flexneri and Salmonella typhimurium; Karberg et al., 2001) and gram-positive (Lactococcus lactis, Clostridium perfringens and Staphylococcus aureus; Chen et al., 2005; Yao et al., 2006) bacteria.
A number of characteristics render group II introns suitable for use as biotechnological tools: (1) they are mobile elements, integrating with high efficiency into their DNA targets by a homologous recombination-independent process; (2) they recognize the target DNA mainly by base pairing, so target specificity can be changed by modifying the EBS sequences within the intron RNA; (3) they can mobilize foreign genetic information inserted within the intron; and (4) minimal host functions are required to support intron mobility and are probably provided by conserved housekeeping proteins.
Group II introns were first engineered to increase the frequency and efficiency of mobility (Guo et al., 2000; Nisa-Martínez et al., 2007; Plante & Cousineau, 2006). Initial modifications to Ll.LtrB included a deletion in the domain IV loop removing most of the LtrA ORF (ΔORF intron), with the intact LtrA ORF cloned and expressed downstream from the 3′ exon. This configuration increased the frequency of mobility to almost 100% (Guo et al., 2000). Smaller ΔORF Ll.LtrB intron derivatives were recently constructed in which LtrA was inserted upstream from the 5′ exon; these derivatives also had a higher homing efficiency (Plante & Cousineau, 2006). Similar engineering has been applied to the RmInt1 intron, resulting in a higher retrohoming efficiency than for the wild-type intron, with constructs in which the IEP has been deleted from domain IV and expressed either upstream from the 5′ exon or downstream from the 3′ exon. The retrohoming efficiency obtained was also higher (reaching 90%) if the constructs carried short stretches of wild-type 5′ and 3′ exons (−20/+5) flanking the ΔORF intron (Nisa-Martínez et al., 2007). The greater mobility of the engineered introns seems to be the result of greater stability of the intron RNA, probably owing to a decrease in susceptibility to degradation by host nucleases (Guo et al., 2000). In these constructs, the IEP promotes splicing of the ΔORF intron. However, after insertion into a new location, splicing of the ΔORF intron cannot occur in the absence of the IEP, resulting in disruption of the target gene. Introns targetted to the antisense strand lead to unconditional disruption of the gene. However, if ΔORF intron derivatives are retargetted to the sense strand, insertion of the intron generates a conditional disruption, as splicing can take place if the IEP is expressed in trans (Frazier et al., 2003; Nisa-Martínez et al., 2007).
Another type of conditional gene disruption has also been achieved with Ll.LtrB. This conditional mutagenesis is based on the temperature-sensitive splicing displayed by this group II intron. The essential gene hsa of S. aureus was disrupted by an Ll.LtrB-ΔORF derivative inserted in the sense orientation with respect to gene transcription. The intron could therefore be removed by RNA splicing, permitting the translation of Hsa. As IEP-assisted splicing is temperature-sensitive, hsa mutants can grow at 32°C, a temperature at which splicing can take place, but cannot grow at 43°C, a temperature at which splicing cannot occur (Yao et al., 2006).
Thus, it is currently possible to reprogram group II introns to insert into any desired target sequence. Guo et al. (2000) have developed a method based on a random intron library, making it possible to select introns inserting into the selected target DNA. Two plasmids are used in this system. The donor plasmid expresses a set of intron molecules from a ΔORF-IEP configuration with randomized EBS sequences that also contain a T7 promoter, replacing the ORF within DIV. The second plasmid (the recipient) harbours the selected target sequence upstream from a promoterless tetR gene. A sequence encoding an E. coli rrnB T1 transcription terminator capable of terminating transcription by both the E. coli and T7 RNA polymerases is inserted upstream from the target site, and an rrnB T2 terminator, which terminates transcription by the E. coli RNA polymerase, but not by the T7 RNA polymerase, is inserted between the target site and the tetR gene. A phage T7 TΦ terminator is inserted downstream from the tetR gene to act as a terminator for the T7 RNA polymerase. Insertion of the intron carrying the T7 promoter into the target site activates expression of the tetR gene. This system can be used to select modified introns from the randomized library that can insert into the chosen gene. These introns can be further modified to improve base-pairing with the target DNA. This system was designed for use in E. coli, but genes from other organisms could be cloned and used in this system to select the intron disrupting the cloned gene most efficiently. This technique has been used to select an intron disrupting the human CCR5 gene (coreceptor for human HIV-1 virus; Guo et al., 2000). The CCR5 gene, carried by a plasmid within embryonic kidney cells, was then disrupted with the selected intron, which was introduced into the human cells as RNPs.
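The logic of the recipient plasmid described above can be checked with a small toy model. The sketch below is illustrative only and is not part of the published method: it treats the plasmid as a linear list of elements, lets each promoter launch a transcript that runs until it meets a terminator effective against its polymerase, and reports which genes are reached. The `Element` class, the `expressed_genes` and `insert_intron` functions, and the upstream E. coli promoter (standing in for read-through transcription from the vector) are hypothetical names introduced for this example. The model also assumes that T7 RNA polymerase is available in the host.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    name: str
    kind: str                             # "promoter", "terminator", "site" or "gene"
    polymerases: frozenset = frozenset()  # recruited (promoter) or stopped (terminator)

def expressed_genes(elements):
    """Return the genes reached by at least one transcript in this toy model."""
    expressed = set()
    for i, start in enumerate(elements):
        if start.kind != "promoter":
            continue
        for pol in start.polymerases:
            # Follow the transcript downstream until a matching terminator.
            for el in elements[i + 1:]:
                if el.kind == "terminator" and pol in el.polymerases:
                    break
                if el.kind == "gene":
                    expressed.add(el.name)
    return expressed

ECOLI, T7 = "E. coli RNAP", "T7 RNAP"

# Recipient plasmid layout as described in the text; the upstream promoter
# is a hypothetical stand-in for read-through transcription from the vector.
recipient = [
    Element("upstream promoter (hypothetical)", "promoter", frozenset({ECOLI})),
    Element("rrnB T1", "terminator", frozenset({ECOLI, T7})),
    Element("target site", "site"),
    Element("rrnB T2", "terminator", frozenset({ECOLI})),
    Element("tetR", "gene"),
    Element("T7 TΦ", "terminator", frozenset({T7})),
]

def insert_intron(elements):
    """Place an intron carrying a T7 promoter at the target site (toy model)."""
    i = next(k for k, el in enumerate(elements) if el.kind == "site")
    intron = Element("intron T7 promoter", "promoter", frozenset({T7}))
    return elements[:i + 1] + [intron] + elements[i + 1:]

print(expressed_genes(recipient))                 # set()
print(expressed_genes(insert_intron(recipient)))  # {'tetR'}
```

In this model, tetR is reached only after the intron has inserted, because the rrnB T2 terminator stops the E. coli polymerase but not the T7 polymerase. This is the property that makes tetracycline resistance a readout of insertion.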
The retargetting of bacterial group II introns can be optimized with a computer algorithm, which scans the target sequence, selects the best position for IEP recognition, and then designs primers to modify the intron EBS1, EBS2 and δ sequences accordingly (Perutka et al., 2004). The IBS1 and IBS2 sequences in the 5′ exon of the donor plasmid are also modified to make them complementary to the retargetted EBS sequences for efficient RNA splicing. The IEP recognizes only a few positions, and does so with sufficient flexibility for several target sites to be identified in any gene. Introns integrated into the target DNA can be selected by colony PCR, or using a selectable marker, such as an antibiotic resistance gene. The selectable marker strategy is possible because group II introns can carry foreign DNA sequences within their RNA core. This heterologous DNA is carried in DIV of the group II intron, after removal of the IEP sequence. The presence of this foreign DNA generally has a negative effect on homing, but the modified introns nonetheless display detectable homing (Guo et al., 2000; Ichiyanagi et al., 2002; Frazier et al., 2003; Zhong et al., 2003). The main concern with conventional antibiotic resistance genes is that they are expressed from the donor plasmid independently of homing, which can hinder the selection of true homing events. The development of the retrotransposition-activated selectable marker (RAM) system has overcome this problem. The RAM system is based on RIG markers, interrupted by a splicing-competent tdI group I intron (Zhong et al., 2003). The selectable marker is thus expressed only if retrohoming occurs, facilitating the monitoring of group II intron dispersal through retrohoming. Almost all resistant, RAM-containing colonies display disruption of a single target. Subsequent modifications, in which the TpR gene was flanked by Flp recombinase target (FRT) sequences, have adapted the system for multiple gene disruptions (Broach et al., 1982). The inserted FRT-flanked RAM marker can thus be excised by expressing the Flp recombinase, with an excision efficiency of 100%. Once the marker has been excised, a second mutagenesis with the same RAM is possible. Engineered group II introns carrying both the RAM-Tp marker and randomized EBSs have been used to generate a library of E. coli mutants (Zhong et al., 2003). Most of the intron insertions were found to be located close to the origin of replication of the chromosome, indicating a bias favouring this region. However, the library was complex enough for the detection by PCR of insertions in less favoured regions. The integrated introns can be isolated and used to obtain single knockouts in any gene (Yao et al., 2005).
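The base-pairing step that underlies retargetting can be sketched in a few lines. The example below is a minimal illustration, not the algorithm of Perutka et al. (2004). It only derives the intron RNA sequences that would pair with chosen IBS1, IBS2 and δ′ segments of a DNA target. It does not scan a gene, score the positions recognized by the IEP, or design primers, and it ignores G-U wobble pairs. The function names and the example sequences are hypothetical.

```python
# DNA base in the target -> RNA base in the intron that pairs with it
# (Watson-Crick pairs only; wobble pairs are ignored in this sketch).
_DNA_TO_PAIRING_RNA = {"A": "U", "C": "G", "G": "C", "T": "A"}

def complementary_ebs(ibs_dna: str) -> str:
    """Return the RNA sequence (5'->3') pairing with a DNA segment (5'->3')."""
    ibs = ibs_dna.upper()
    unknown = set(ibs) - set(_DNA_TO_PAIRING_RNA)
    if unknown:
        raise ValueError(f"unexpected bases in target segment: {sorted(unknown)}")
    # Pairing strands are antiparallel, so read the target in reverse.
    return "".join(_DNA_TO_PAIRING_RNA[base] for base in reversed(ibs))

def retarget(ibs1: str, ibs2: str, delta_prime: str) -> dict:
    """Derive intron EBS1, EBS2 and delta sequences for the given target segments."""
    return {
        "EBS1": complementary_ebs(ibs1),
        "EBS2": complementary_ebs(ibs2),
        "delta": complementary_ebs(delta_prime),
    }

# Hypothetical target segments, chosen only to show the calculation.
print(retarget(ibs1="TTATGG", ibs2="ACGTC", delta_prime="CAT"))
# {'EBS1': 'CCAUAA', 'EBS2': 'GACGU', 'delta': 'AUG'}
```

Because splicing from the donor plasmid also depends on these pairings, the same calculation run in the other direction gives the IBS1 and IBS2 sequences that the donor 5′ exon must carry.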
The RNA structures of the group II intron ribozyme and the IEP seem to have coevolved, leading to the ‘retroelement ancestor hypothesis’, which suggests that all extant group II introns descended from an RT-encoding group II intron in bacteria with RNA structural features not conforming to canonical IIA or IIB structures (Robart & Zimmerly, 2005). Two different lineages, the canonical IIA and IIB, then became associated with organelles and the ORF was repeatedly lost, resulting in the formation of numerous ORF-less organellar introns (Toor et al., 2001; Lambowitz & Zimmerly, 2004; Robart & Zimmerly, 2005). ORF-less introns have also been found in Cyanobacteria (Nakamura et al., 2002), Bacillus (van der Auwera et al., 2005; Tourasse et al., 2006) and Archaea (Dai & Zimmerly, 2003), but they seem to be derived from ORF-carrying introns. Interestingly, truncated group II introns are also present in bacteria (Dai & Zimmerly, 2002; Fernández-López et al., 2005). Some have a 5′ truncation that could be accounted for by incomplete reverse transcription during TPRT. However, 5′ truncations account for only a minority of fragmented introns (Robart & Zimmerly, 2005). Introns like RmInt1 tend to be inactivated by fragmentation, with loss of the 3′ terminus. In some cases, such inactivation arises from the insertion of other mobile elements and further genetic rearrangements (Dai & Zimmerly, 2002; Fernández-López et al., 2005). The fragmentation of group II introns, together with the accumulation of mutations and natural selection, may well be responsible for the current status of self-splicing group II introns in bacteria, with the intrinsically large population size and genome simplification (genomic streamlining) also playing important roles (Lynch, 2006).
The IEP of full-length bacterial group II introns generally contains both the RT and maturase domains, but different groups of introns can be distinguished on the basis of the presence or absence of the D and/or En domains. The absence of the D and En domains may result from their specific loss in some introns. However, whole intron groups within the IIB class (Fig. 4), such as the IIB3 (also known as the D class, Zimmerly et al., 2001) and IIB5 lineages (also known as the E class, Toro et al., 2002; Lambowitz & Zimmerly, 2004; Robart & Zimmerly, 2005), seem to lack the D and En domains entirely (Toro, 2003). The IIC class of introns (Fig. 4), characterized by the absence of an EBS2 sequence and a shorter ribozyme DV (Toor et al., 2001; Toro, 2003), also lacks the En domain, but still has a putative D domain (San Filippo & Lambowitz, 2002). It is widely accepted that the ancestral RT probably lacked the D and En domains (Fig. 5) and that their acquisition and loss in some cases may have contributed to the evolution of the various bacterial group II intron lineages.
Phylogenetic analyses of IEP and the intron RNA structures (Martínez-Abarca & Toro, 2000b; Toor et al., 2001; Zimmerly et al., 2001; Toro, 2003) have shown the IIB and IIA classes to be sister lineages and the IIC class to be the most divergent lineage (Fig. 4). So which of these lineages of group II introns is the oldest? The data currently available provide no final answer to this question, but, according to the retroelement hypothesis, the earliest branching class of introns is most likely to be class IIC.
Regardless of the evolutionary relationships between the current subclasses of group II introns in bacteria, which require further investigation, it is also plausible that the ancestor of these introns was actually a catalytic RNA intron functioning in the absence of the RT protein (Fig. 5), perhaps in extreme environments capable of supporting such catalytic activity (Lambowitz & Zimmerly, 2004). The acquisition of the RT-maturase protein may have enabled group II introns to extend into bacterial populations, in which they survived and spread by inserting into nonessential genes, such as other mobile elements. Their mobility would have been linked to the replication machinery of the host cell, with insertion into single-stranded DNA target sites at a replication fork, using a nascent lagging DNA strand as the primer for reverse transcription. Subsequent acquisition of the D and En domains would then have enabled group II introns to insert efficiently by reverse splicing into double-stranded DNA target sites (Lambowitz & Zimmerly, 2004). This acquisition may have provided group II introns present in the alphaproteobacterium ancestor of the mitochondria, which invaded an archaeal host during eukaryogenesis (Fig. 5), with the features required for the massive invasion of the host-cell genome, leading to the creation of spliceosomal introns (Koonin, 2006).
Conclusions and perspectives
Since the discovery 14 years ago of group II introns in bacteria, information has accumulated about the architectural and functional organization of group II introns as catalytic RNAs, the mechanism of interaction between the IEP and the intron RNA catalytic core leading to RNP assembly, the mechanisms of intron mobility and DNA target site recognition, and the development of these retroelements as new gene-targetting vectors. These advances have provided new insight into the fundamental nature and applications of these ribozymes.
Improvements in our understanding of the course of evolution of group II introns in bacteria should also provide clues to the origin of the spliceosome and the possible role of these introns in eukaryogenesis. What role do group II introns play in bacterial genomes? Although some group II introns are inserted into nonessential regions of the genome, others are found in essential genes. The current situation may result from bacterial genome streamlining and population size, which decrease intron numbers or lead to the complete absence of such genetic elements in some bacterial species. There may also be unknown mechanisms in bacteria controlling intron expression and mobility. Despite extensive debate concerning the way in which introns have played a critical role in eukaryote evolution by inserting into protein-coding genes, very little is known about group II intron evolution in bacteria. Progress in bacterial genome sequencing programs may shed light on the evolution of bacterial group II introns, revealing lineages other than the currently known subclasses IIA, IIB and IIC, and possibly on group II intron RNAs functioning in the absence of the RT protein. This work may involve sequencing metagenomes from extreme environments. Studies on natural bacterial populations may also improve our understanding of the current contribution of group II introns to bacterial evolution and the possible functional role of these introns. The current view of group II introns as selfish elements maintained throughout prokaryotic evolution may turn out to be a simplistic assumption based on our current lack of knowledge of bacterial group II intron evolution.
Future studies on group II introns will focus not only on fundamental evolutionary aspects, but also on the development of gene-targetting vectors. The development of such tools based on other bacterial introns, with specific mechanistic intron mobility features different from those of Ll.LtrB, should make this technology more broadly applicable. The mobility mechanism of these bacterial introns does not depend on recombination. This feature makes these retroelements potential tools for genetic manipulation in higher eukaryotes with low levels of recombination, such as plants and animals. Like the splicing factors identified for group II introns in organelles, bacterial group II introns probably use certain host factors that might contribute to intron RNA folding and mobility. In the next few years, efforts should be made to identify such factors, which might increase the functionality of these bacterial group II introns in various prokaryotic and eukaryotic hosts.
Studies on the architectural and functional organization of group II introns will continue to expand our knowledge about the splicing mechanism of these ribozymes, and will also contribute to parallel investigations on related elements, such as spliceosomes and retrotransposons. The excision of group II introns as circles has been demonstrated to occur in vivo for yeast introns such as aI2 and for some plant mitochondrial introns. The recent detection of circular intron molecules in bacteria indicates that this particular mode of excision may be more widespread in nature than initially thought. The splicing mechanism involved in group II intron circle formation, and the biological role of such circles in intron spread remain key issues that we can expect to see resolved in the next few years.
Work at the authors' laboratory was supported by the Spanish Ministerio de Educación y Ciencia (BIO2005-02312) and Junta de Andalucía (CVI-01522). F.M.G.-R. holds an I3P postdoctoral fellowship (CSIC), and J.I.J.-Z. is a Ramón y Cajal hired scientist.