Persistence of a mobile DNA element in a population reflects a balance between the ability of the host to eliminate the element and the ability of the element to survive and to disseminate to other individuals. In each of the three biological kingdoms, several families of mobile DNA elements have been identified that encode a single protein that acts on nucleic acids. Collectively termed homing endonuclease genes (HEGs), these elements employ varied strategies to ensure their survival. Some members of the HEG families have a minimal impact on host fitness because they associate with genes having self-splicing introns or inteins that remove the HEGs at the RNA or protein level. The HEG and the intron/intein gene spread throughout the population by ‘homing’, a gene conversion process initiated by the HEG-encoded endonuclease in which the HEG and intron/intein genes are copied to cognate alleles that lack them. The endonuclease activity also contributes to a high frequency of lateral transmission of HEGs between species, as has been documented in plants and other systems. Other HEGs have positive selection value because the proteins have evolved activities that benefit their host organisms. The success of HEGs in colonizing diverse genetic niches results from the flexibility of the encoded endonucleases in adopting new specificities.
Mobile insertion elements have evolved strategies to ensure their long-term survival in natural populations. Clues to how these elements maintain their existence have come primarily from studies of transposable elements, the most ubiquitous and best-studied class of mobile genetic elements. Transposable elements cannot exist independently of their hosts, and so have evolved mechanisms to mitigate the deleterious effects of their insertion into host genomes, such as self-restriction of their copy number or insertion into non-coding regions. Accumulating evidence suggests that some transposable elements have also evolved several different ways of conferring selective advantages on their host organisms [1,2].
HEGs (homing endonuclease genes) are a class of mobile genetic elements that all encode DNA endonucleases or related nucleic acid processing proteins which exhibit some of the same survival strategies as transposable elements (reviewed in [3,4]). These highly invasive elements occur in organisms from each of the biological kingdoms, the archaea, bacteria and eukarya. In eukaryotic cells, HEGs have been identified within both the mitochondrial and chloroplast genomes, and also the nuclear genome. They can be found as intergenic freestanding genes, or within genes that encode introns or inteins (in-frame polypeptide sequences that self-splice from within precursor proteins). These elements may be considered the most streamlined DNA parasites because they consist of only a single gene that encodes a single protein. This minireview describes some ideas about how HEGs may have evolved the underlying molecular mechanisms that account for their proliferation.
2 Classification of enzymes encoded by homing endonuclease genes
The proteins encoded by HEGs introduce site-specific breaks within genomic DNA. This enzymatic activity underlies the efficient propagation of these DNA elements within a population and allows them to move laterally between organisms. The presence of conserved sequence motifs within the proteins permits three HEG families to be defined; one of these is present in Group I and Group II introns and the other two are associated with Group I introns only. The dissimilarity of the structures and sequences of the endonucleases indicates that the families arose independently, even though they utilize similar survival strategies. This minireview will focus on the elements found in Group I introns and inteins.
2.1 The LAGLIDADG family
This family includes over 130 known members and consists predominantly of endonucleases, but also includes RNA maturases that facilitate RNA splicing. The genes that encode these proteins occur in nearly every possible arrangement: as freestanding ORFs, in association with Group I and archaeal intron genes, and as part of intein genes. Proteins in this family were originally identified by the presence of either one or two conserved consensus motifs termed the LAGLIDADG or dodecapeptide motifs. Analysis of the family using a hidden Markov model reveals that only 8% of the residues are highly conserved, with most of these occurring in the LAGLIDADG sequences. As in the other endonuclease families, the 14–31 bp target sites recognized by LAGLIDADG enzymes are extremely long compared with those of type II restriction enzymes (4–8 bp). The enzymes require divalent cations as co-factors and generate four-nucleotide 3′-overhangs. The X-ray structures of three LAGLIDADG enzymes, I-CreI, I-DmoI and PI-SceI [7–9], have been determined.
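The practical consequence of such long target sites can be illustrated with a rough calculation. The sketch below is not part of the original analysis; it assumes an idealized random genome with equal base frequencies and exact, non-degenerate recognition, and the function name `expected_sites` and the 12 Mb genome size are hypothetical choices made only for illustration.

```python
def expected_sites(site_length_bp: int, genome_size_bp: float) -> float:
    """Expected occurrences of one specific site on one strand of a random
    sequence with equal base frequencies (an idealized assumption)."""
    return genome_size_bp / 4 ** site_length_bp


# Hypothetical genome size, chosen only for illustration.
GENOME_BP = 1.2e7

# 4-8 bp: type II restriction enzymes; 14-31 bp: LAGLIDADG enzymes (from the text).
for length in (4, 6, 8, 14, 18, 31):
    count = expected_sites(length, GENOME_BP)
    print(f"{length:>2} bp site: ~{count:.3g} expected occurrences")
```

Under these assumptions a 6 bp site is expected thousands of times in such a genome, whereas an 18 bp site is expected far less than once. Because only a subset of the nucleotides in a homing site is critical for cleavage, the real specificity is lower than this idealized bound suggests.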
2.2 The ββα-Me family
Two homing endonuclease families containing histidine-rich consensus motifs had been defined previously. Proteins belonging to the His-Cys box family contain a ∼30 amino acid region with two histidines and three cysteines, while those in the H-N-H family contain conserved histidine and asparagine residues spanning 30–33 amino acids [11,12]. It is now clear that these conserved residues comprise the active sites of proteins in both families and that the two families are structurally related. This has prompted the re-classification of the H-N-H and His-Cys box families as a single family (the ‘ββα-Me’ family). Proteins in this family are encoded by freestanding ORFs and by intron and intein genes. Those containing the H-N-H consensus sequence act on nucleic acids in diverse ways, including as DNA endonucleases and nuclease colicins. The motif is also present within enzymes that initiate homing of Group II introns. Unlike the Group I endonucleases, these enzymes form ribonucleoprotein complexes with their respective intron RNAs to effect double strand cleavage.
The recent structural comparison of a His-Cys box protein (the I-PpoI endonuclease), an H-N-H protein (the E9 DNase colicin), and a protein that belongs to neither class (the non-specific Serratia nuclease) indicates that the metal ion co-factor and the two β-strands and α-helix that comprise the active sites are superimposable among the three proteins [13,18]. The catalytically important histidine residues co-ordinate the phosphate molecules or the catalytic metal ion co-factor. Interestingly, the conserved asparagine of the H-N-H motif, which in the E9 colicin forms hydrogen bonds that stabilize the active site, is replaced in the His-Cys box protein I-PpoI by a histidine that co-ordinates a structural zinc atom.
2.3 The GIY-YIG family
The GIY-YIG module is 70–100 amino acids long and consists of five conserved sequence motifs, the first of which contains signature ‘GIY’ and ‘YIG’ tripeptides that flank a 10–11 amino acid segment. The genes that encode these endonucleases occur as intergenic ORFs or within Group I intron genes and have been detected in bacteriophage and fungi. Limited structural information is available for these enzymes, but it is known from NMR studies that the GIY-YVG residues of one enzyme from bacteriophage T4, I-TevI, are part of a three-stranded anti-parallel β-sheet that may play a role in positioning the DNA substrate for cleavage.
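The signature described above (‘GIY’, a 10–11 residue spacer, then ‘YIG’) lends itself to a simple pattern search. The following is a minimal sketch only: the function name `find_giy_yig` and the toy sequence are hypothetical, and a strict pattern like this would miss natural variants such as the GIY-YVG form of I-TevI, so it should be read as a first-pass filter rather than as a family classifier.

```python
import re

# 'GIY', then a 10-11 residue spacer, then 'YIG' (spacer length taken from the text).
GIY_YIG_MOTIF = re.compile(r"GIY[A-Z]{10,11}YIG")


def find_giy_yig(protein: str):
    """Return (start_index, matched_text) for each non-overlapping candidate motif."""
    return [(m.start(), m.group()) for m in GIY_YIG_MOTIF.finditer(protein.upper())]


# Made-up sequence for illustration only; it is not a real protein.
toy_protein = "MKS" + "GIY" + "A" * 10 + "YIG" + "STK"
print(find_giy_yig(toy_protein))
```

Relaxing the final tripeptide (for example to `Y[IV]G`) would admit the I-TevI variant, at the cost of more chance matches.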
3 Integration of HEGs into intron and intein genes
DNA elements that lessen the evolutionary survivability of the host might eventually be eliminated. HEGs are frequently found associated with genes that encode introns or inteins, an arrangement that stabilizes their relationship with the host organism. The RNA and protein splicing activities of introns and inteins excise the HEG information from the host genes, either at the mRNA or protein level, and thereby minimize its potentially negative effect on host fitness. The HEG-encoded endonuclease activity initiates a mechanism that provides for the stable replication and mobility of the endonuclease element and its associated intron or intein gene.
3.1 HEGs in introns
Endonuclease open reading frames are found in both Group I introns and archaeal introns, where they are translated from the precursor RNA or from the spliced product. Being situated within an intron renders the endonuclease ORF nearly invisible to natural selection. The intron and its associated ORF are propagated when the endonuclease promotes ‘intron homing’, a gene conversion event whereby the intron is transferred from an intron-containing allele (the donor) to the identical location in an intronless allele (the recipient) (Fig. 1). Unlike transposases, which play multiple roles in transposition, homing endonucleases apparently have the sole function of initiating homing by introducing a double strand break into the intronless allele. Host repair of this double strand break results in copying of the intron from the donor to the recipient, with concomitant co-conversion of flanking sequences. Any event that introduces a double strand break can initiate intron homing; indeed, the EcoRI restriction enzyme, which does not occur naturally as an intron-encoded ORF, stimulates intron homing from an artificial construct. Homing endonuclease open reading frames are also located within Group II introns, where they likewise initiate homing of their DNA elements, but by a different pathway from that used by the Group I HEGs.
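The logic of intron homing described above can be captured in a deliberately simplified string model. This is a toy sketch under stated assumptions, not a simulation of double strand break repair: the function `home`, the 8 bp site and all sequences are hypothetical, the intron is written in lowercase only to make it visible, and co-conversion of flanking sequences is not modeled. The model also relies on the fact that the intron interrupts the recognition site, so that intron-containing alleles are not cleaved.

```python
def home(donor: str, recipient: str, site: str, intron: str) -> str:
    """Toy gene conversion: copy the intron from donor to recipient if the
    recipient has an intact recognition site; otherwise return it unchanged."""
    half = len(site) // 2
    left, right = site[:half], site[half:]
    if left + intron + right not in donor:
        return recipient  # the donor carries no intron to copy
    if site not in recipient:
        return recipient  # no intact site, so the endonuclease cannot cut
    # The break falls within the site; repair templated by the donor places
    # the intron between the two half-sites.
    return recipient.replace(site, left + intron + right, 1)


# Hypothetical sequences for illustration only.
SITE = "ACGTTGCA"
INTRON = "gggaaaccc"
donor_allele = "TT" + "ACGT" + INTRON + "TGCA" + "CC"
recipient_allele = "TT" + SITE + "CC"

converted = home(donor_allele, recipient_allele, SITE, INTRON)
print(converted == donor_allele)
```

Applying `home` to an allele that already contains the intron leaves it unchanged, which mirrors the one-way nature of the conversion from intron-containing to intronless alleles.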
Several intron-encoded LAGLIDADG proteins function as RNA maturases (proteins that promote the RNA splicing of introns) reflecting the extent to which these proteins can adapt. Three maturases are encoded by introns within the yeast mitochondrial cytb gene, and point mutations in these ORFs result in splicing defects that can be complemented by expression of the maturase in trans (reviewed in ). That some LAGLIDADG proteins function as both DNA endonucleases and RNA maturases suggests an evolutionary relationship between these two activities. Yeast mtDNA intron aI4α codes for a homing endonuclease with a latent maturase activity that can be activated by a point mutation . Two other intron ORFs, one encoded by the second intron of the Saccharomyces capensis cytb gene and the other from the first intron of the cox1 gene of Schizosaccharomyces pombe function as both endonucleases and maturases in vivo , and both activities have been demonstrated in vitro for the intron-encoded protein of the Aspergillus nidulans apocytochrome b gene . From an evolutionary perspective, it seems likely that the maturase activities only evolved after pre-existing Group I introns acquired LAGLIDADG endonucleases. The evolution of the maturase activity may have stimulated mobility of the intron and the ORF if transposition involved a reverse splicing mechanism that required an RNA intermediate.
3.2 HEGs in inteins
In another display of the ingenious ways that endonuclease elements minimize their impact on host fitness, they are found inserted in over 100 instances within self-splicing intein genes. The combined intein gene and endonuclease ORF form a unit that is situated in-frame within a host gene. Protein splicing removes the intein and the accompanying endonuclease sequences at the protein level, and concomitantly generates a functional host protein. The endonuclease mobilizes both its own gene and the splicing element by initiating intein homing, which is similar to intron homing and involves cleavage of inteinless alleles and subsequent gene conversion (Fig. 1). Members of both the LAGLIDADG and ββα-Me families have been identified within inteins, but some non-canonical inteins have been described that contain sequences that bear little or no resemblance to the known families.
4 Evolutionary dynamics of endonuclease genes
Homing endonuclease genes can be considered to have a life cycle similar to one proposed for transposable elements that includes dynamic replication, inactivation, and eventual elimination [28,29]. The replication events not only include intron and intein homing, which are one form of intraorganismal propagation, but also horizontal transmission events that mobilize them to other phyla or species. Movement to another organellar compartment or organism counteracts the eventual degradative process that occurs within a particular niche over evolutionary time by providing new, potentially fertile environments to parasitize.
The homing endonuclease cleavage activity not only drives intein and intron homing, but also underlies the transposition of these elements within and between species. When an endonuclease creates recombinogenic free DNA ends within the genome, it creates the opportunity for its own gene to integrate into the cut site at some frequency, perhaps by illegitimate recombination. Consistent with this scenario is the observation that transposable Ty elements become inserted at the position of a double strand break in the yeast nuclear genome when the break cannot be repaired using homologous sequences . The observation that short mitochondrial DNA fragments can also become inserted at the break site is particularly intriguing because the mode by which this DNA travels to the nucleus may recapitulate how ancestral organellar HEGs transited and invaded the nuclear genome [31,32]. Intergenic regions and regions encoding introns and inteins are expected to be particularly amenable to invasion because no important function is disrupted, and indeed, these are the most frequently observed locations for HEGs. There is telltale evidence that intron invasion occurred at some time in the past because endonuclease recognition sequences flank the endonuclease gene within one of the mobile T4 phage Group I introns  as would be expected if HEGs inserted into pre-existing introns.
The HEGs and their associated intron or intein genes are likely to have had independent origins for several reasons. First, HEG endonucleases from very different families occur in closely related introns. For example, three homologous introns of phage T4 in the td, nrdB and sunY genes contain HEG endonucleases from the GIY-YIG and ββα-Me families. Similarly, HEG endonucleases from both the LAGLIDADG and ββα-Me families are found in inteins [5,11]. Second, HEGs from the same family are found inserted at different positions within the catalytic core of Group I introns. Third, HEG ancestry is independent of any particular niche because members of a single family (e.g. LAGLIDADG) occur within intergenic regions, intein genes and intron genes.
The structure of the PI-SceI intein graphically depicts the outcome when an HEG that encodes an endonuclease invades a pre-existing gene that encodes a protein splicing element to create a hybrid protein with both activities. In the primary structure of the protein, the protein splicing residues are separated into two segments by the endonuclease residues, which fold into a single endonuclease domain. The two splicing segments fold to create a normal splicing domain, despite being interrupted by the endonuclease residues (Fig. 2C).
Although intron and intein homing have been reproduced in the laboratory, neither transposition of HEGs to new sites in the same organism nor lateral transmission to new organisms has been duplicated. Instead, evidence that these events occur has come primarily from extensive phylogenetic comparisons. For example, a common LSU rRNA Group I intron HEG is found in the chloroplast of Chlamydomonas pallidostigmatica and in the mitochondrion of Acanthamoeba castellanii, suggesting both interphylum and interorganellar transfer. The full extent of the proliferative nature of HEGs is dramatically illustrated by a survey of >300 diverse land plants that reveals a recent, massive invasion of plant mitochondria by a Group I intron in the cox1 gene [36,37], which may have originated from a fungal donor. Several lines of evidence suggest that this invasion resulted from more than 30 separate lateral transfers of the intron and its associated HEG. First, the cox1 intron is sporadically distributed among the angiosperm cox1 genes, which would not be predicted by vertical transmission models. Second, the phylogenetic histories of the cox1 intron and of the organisms where it is located are incongruent; those introns that are most similar are located in distantly related plant species. Finally, the patterns of the co-conversion tracts in the exons flanking the intron insertion site, which are created during the double strand break repair process, are consistent with multiple independent invasion events having taken place. Both the recency and the pervasiveness of this lateral transfer are striking.
The high frequency of lateral transmission of HEGs, and their degradation and eventual loss, are also evident from a study of the ω genetic element in the mitochondrial LSU rRNA gene of yeast. Of 20 strains surveyed, 14 have the intron, five of which contain the HEG that encodes the I-SceI endonuclease. Three of the five HEGs are presumed functional while the remaining two are likely to be inactive due to insertions. As in the case of the plant study, extensive horizontal transmission is inferred from the sporadic distribution of the different states of the intron and the incongruency between the phylogenies of the intron or HEG and that of the host. However, the similarity of the intron and HEG phylogenies to each other indicates that once the two elements were associated, they traveled together during the invasive cycle. As a whole, the phylogenetic survey supports the notion of a ‘life cycle’ for these elements that includes (1) rapid invasion of a population by horizontal transmission, (2) slow degeneration of the elements, (3) eventual loss, presumably by precise excision, and (4) reinvasion into a naïve host (Fig. 3). In the case of the ω-element, horizontal transmission has been estimated to occur on a time scale of 10⁶–10⁷ years, which is remarkably frequent given that these elements do not exist in an extracellular form and therefore must await a chance interaction between species. Thus, the frequent transfer of ω to new organisms plays a key role in the long-term persistence of this element.
The same general conclusions drawn for intron-associated HEGs, their frequent transposition during evolution and their transposition to related species, apply to HEGs located within intein genes. A phylogenetic analysis of the LAGLIDADG HEGs in intein genes reveals that once they associate with a particular class of host element, either with an intein gene or intron gene, they do not move to a different class . As in the case of intron HEGs, degeneration of HEGs within intein genes is observed; a splicing proficient intein from certain yeast strains contains endonucleases that are active only under conditions of reduced specificity  or are presumed inactive due to active site mutations (unpublished observation concluded from ). Deletion of HEGs from intein genes occurs to give rise to ‘mini-inteins’ that catalyze protein splicing but not intein homing [5,26,41]. Deletion of the HEG removes a dispensable endonuclease activity and need not occur precisely as long as the splicing activity remains intact. By contrast, loss of the intact intein may be deleterious because the presence of remaining endonuclease and/or protein splicing residues may impair the activity of the host protein since intein elements are often situated in close proximity to critical active site residues .
5 Endonuclease genes can confer benefits to their hosts
Freestanding genes that exist outside the protective confines of introns or inteins are prone to deletion or mutation, but those that evolve functions that increase the fitness of the host organism will be positively selected and maintained. An example of the evolution of beneficial functions by HEG proteins is the HO endonuclease (F-SceII) that initiates mating type interconversion in yeast, whereby haploid α or a cells produce cells of the opposite mating type. Cells of opposite mating type can conjugate to form MATα/MATa diploids, which may be at a selective advantage relative to haploids in certain environments. Mating type switching resembles intein or intron homing in several respects. HO introduces a double strand break at the MAT locus to initiate double strand break repair that uses genetic information from one of two non-transcribed homologues on the same chromosome. However, unlike HEGs that initiate intron/intein homing, HO does not propagate its own gene by double strand break repair. The evolution of beneficial host functions by endonuclease families is not restricted to HEG proteins. For example, ancestors of type II restriction endonucleases have given rise to the MutH enzyme that participates in methyl-directed DNA mismatch repair.
Some members of the ββα-Me family in bacteria confer an advantage to their host by functioning as colicins, proteins that enable the bacteria to colonize the gut by killing competitor strains. During times of nutrient or environmental stress, these plasmid-encoded toxins are expressed by Enterobacteriaceae, enter the competing strains after binding to a receptor, and kill the bacteria by digesting the ribosomal RNA or chromosomal DNA. Thus, the nucleolytic function of these enzymes has been utilized to provide a practical benefit to the bacteria.
The splicing of HEG information from within introns or inteins may not entirely counter any negative effects that these elements exert on the host. In some cases, however, there is evidence that these genes confer positive benefits. Long-term mixed culture experiments that include both intron− and intron+ strains of Sulfolobus acidocaldarius result in the spreading of the intron through the population due to a selective advantage of intron+ cells. Furthermore, an intron-encoded endonuclease in the Bacillus subtilis phage SP82 cleaves the DNA of a related phage, SPO1, near the intron during mixed infections, thereby promoting SP82 propagation at the expense of its competitor, SPO1. Whether all HEGs found in introns and inteins confer a host advantage requires additional study.
6 Evolution of LAGLIDADG endonuclease structure
Clues to the evolution of endonuclease elements come predominantly from studies of the LAGLIDADG proteins since these are the most prevalent and the best characterized at the structural level. Evolution of these proteins has occurred at two levels; different subunit compositions have evolved within the protein family, and each protein has evolved its own distinct specificity. LAGLIDADG proteins exist both as single-motif proteins, such as I-CreI, which forms homodimers and binds to pseudopalindromic sites, and as two-motif proteins, such as PI-SceI or I-DmoI, which are monomers that recognize sites with little or no symmetry. A possible scenario for the evolution of this family is that the ancestral gene encoded a single-motif protein that underwent gene duplication to give rise to the two-motif proteins. The internal two-fold symmetry within two-motif proteins that would have resulted from gene duplication was first suggested for I-DmoI and I-PorI based on protein footprinting studies . The subsequent structural determination of the I-CreI, PI-SceI and I-DmoI enzymes directly reveals this symmetry [7–9] (Fig. 2). The I-CreI dimer closely resembles the I-DmoI monomer and the endonuclease domain of PI-SceI [9,49,50]. In all three proteins, the two LAGLIDADG motifs comprise two α-helices that pack tightly against one another and form the hydrophobic core. Interestingly, in spite of the structural similarity between I-CreI and PI-SceI, one sequence alignment analysis excludes I-CreI from being classified as a LAGLIDADG endonuclease , which may indicate that this protein is more closely related to ancestral one-motif enzymes. The evolution of fused two-domain endonucleases, such as I-DmoI and PI-SceI, permitted each subdomain to evolve binding specificities independently, thereby allowing these enzymes to recognize asymmetric rather than palindromic sequences. 
This feature may have contributed to the flexibility of these enzymes in adopting new binding specificities. The DNA binding activity of the PI-SceI intein evolved further through a short sequence, the DNA recognition region (DRR), that is inserted within the splicing domain, forms an extended subdomain and contains additional specificity determinants. The DRR is absent from the mini-inteins, which do not bind DNA, and from the structurally related hedgehog proteins.
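The distinction drawn in this section between pseudopalindromic sites (bound by homodimers such as I-CreI) and asymmetric sites (bound by two-motif monomers) can be made concrete by scoring how closely a site matches its own reverse complement. The sketch below is illustrative only: `symmetry_fraction` is a hypothetical helper, and apart from the well-known palindromic EcoRI site GAATTC, the example sequences are made up and are not real homing sites.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def symmetry_fraction(site: str) -> float:
    """Fraction of positions at which a site equals its own reverse complement.
    1.0 is a perfect palindrome; intermediate values are pseudopalindromes."""
    site = site.upper()
    rc = reverse_complement(site)
    return sum(a == b for a, b in zip(site, rc)) / len(site)


print(symmetry_fraction("GAATTC"))  # EcoRI site, a perfect palindrome
print(symmetry_fraction("GAATGC"))  # made-up pseudopalindrome
print(symmetry_fraction("AAAAAA"))  # made-up fully asymmetric site
```

For sites of odd length the central base can never equal its own complement, so the score for such sites is always below 1.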
HO may represent an example of a homing endonuclease that evolved an altered target specificity from a pre-existing protein through mutation. HO and PI-SceI, which are both LAGLIDADG endonucleases, are 34% identical, suggesting that they share a common ancestor . Moreover, the close proximity of the HO and PI-SceI genes on chromosome IV (<80 000 bp) suggests that they arose via gene duplication rather than from separate invasion events . Interestingly, HO contains the intein motifs except for one at the extreme C-terminal end  and lacks protein splicing activity. Taken together, this implies that HO is a degraded intein that may have evolved from PI-SceI. PI-SceI and HO are like most LAGLIDADG endonucleases in that their recognition sequences are extremely long (31 and 18 bp, respectively), but only a smaller subset of these nucleotides is critical for cleavage activity [54,56]. The HO and PI-SceI target sites are similar and within these sequences, most of the nucleotides required for cleavage by both enzymes are in common . In turn, this observation implies that these positions are contacted by homologous amino acids in both proteins. Thus, the evolution of non-overlapping specificities for the two proteins may be accounted for by mutation of only a limited number of specific contacts. The ability of these enzymes to acquire new specificities to related target sites reflects the extreme plasticity of homing endonucleases that not only underlies their success in evolving beneficial host functions, but also their ability to move laterally across species barriers.
One evolutionary question that remains unresolved is how an RNA splicing activity evolved from a DNA binding enzyme. Although much is known about the structure and binding interactions of LAGLIDADG endonucleases, the maturases are poorly understood. A recent study of the maturase encoded by the A. nidulans AnCOB Group I intron reveals that it binds tightly to the mRNA and facilitates its rapid RNA folding, but does not catalyze the RNA-mediated reaction . If the LAGLIDADG maturases and endonucleases are structurally analogous, it is likely that residues in maturases analogous to those located along the saddle-shaped DNA binding surface of the endonucleases establish RNA contacts. The single amino acid substitution that unmasks the aI4α maturase activity occurs C-terminal to the first LAGLIDADG motif , which is the same general region that mediates DNA binding of several endonucleases. Thus, a DNA endonuclease may evolve maturase activity when RNA binding contacts are acquired on the protein surface that binds DNA. Detailed mutational and structural analyses of maturases will be required to define the determinants of maturase activity.
Homing endonuclease genes are selfish DNA elements that have utilized diverse strategies to avoid being eliminated from the host genome. They have associated with introns and inteins, have transferred laterally between cellular compartments and between phyla, and have developed novel activities that benefit their host organism. This success in filling a variety of niches can be attributed to the flexibility of the encoded proteins in adopting new recognition site specificities. This flexibility offers the possibility that enzymes with altered specificity can be developed in the laboratory using combinatorial or other methods; such rare-cutting reagents are useful in a variety of protocols that involve the manipulation of complex genomes. Questions remain regarding the mechanism of horizontal transmission, including the nature of the vectors that transfer the elements, the identity of the nucleic acid intermediates that permit integration of the elements at foreign loci and the types of molecular changes that alter the specificity of the enzymes. Answers may be provided as additional organismal genomes are sequenced, and as structures of additional HEG proteins are determined.
I thank the two anonymous reviewers for improving the clarity of the review and for bringing to my attention the structural relatedness of the H-N-H and His-Cys box families.