Standard Article


Retrotransposition and Human Disorders

  1. Eric M Ostertag,
  2. Haig H Kazazian Jr

Published Online: 27 JAN 2006

DOI: 10.1038/npg.els.0005492

eLS

How to Cite

Ostertag, E. M. and Kazazian, H. H. 2006. Retrotransposition and Human Disorders. eLS.

Author Information

  1. University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA

This is not the most recent version of the article; the current version was published on 15 SEP 2011.

Introduction

Retrotransposons are a class of transposable elements that are extremely prevalent in humans, constituting approximately one-third of the genome (International Human Genome Sequencing Consortium, 2001). Retrotransposons are classified as autonomous or nonautonomous according to whether they encode the proteins required for their own retrotransposition. In the human genome, the most abundant retrotransposons are the autonomous L1 element and the nonautonomous Alu element. Many scientists consider retrotransposons to be molecular parasites that have expanded selfishly within the genome. Through this expansion over evolutionary time, retrotransposons have clearly played a substantial role in shaping the structure of the genome, and they may also serve some beneficial function for the host. However, their ability to occasionally cause genetic disease is irrefutable. The main mechanisms by which transposable elements cause disease are insertional mutagenesis and unequal homologous recombination. See also Short Interspersed Elements (SINEs); Telomeric and Subtelomeric Repeat Sequences.

Insertional Mutagenesis

Insertional mutagenesis occurs when a retrotransposon inserts into or near a gene and thereby abolishes or alters the gene's expression or results in the production of a mutant protein. Most of the retrotransposons in the human genome have acquired mutations sufficient to inactivate them, and therefore they cannot cause disease by insertional mutagenesis. However, some retrotransposons remain active. The first examples of human disease caused by insertional mutagenesis were reported in 1988 (Kazazian et al., 1988): in two patients with hemophilia A, L1 elements had independently inserted de novo into the factor VIII gene (F8). There are now 13 known recent or de novo L1 insertions that have resulted in independent cases of human disease.

Mechanism of retrotransposition

As autonomous retrotransposons, L1 elements encode the proteins required for their own mobilization. These elements produce a bicistronic ribonucleic acid (RNA), that is, a single transcript containing two open reading frames (ORFs). The first ORF encodes an RNA-binding protein, and the second encodes a protein with endonuclease and reverse transcriptase activities. Although both the ORF1 and ORF2 proteins are clearly required for autonomous retrotransposition, it is unknown whether additional host proteins are also required (Moran et al., 1996).
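As a rough illustration of what "bicistronic" means at the sequence level, the sketch below scans one strand for ATG-initiated open reading frames that end at a stop codon. It is a minimal sketch under simplifying assumptions: forward strand only, the standard stop codons, and a hypothetical toy transcript rather than real L1 sequence. The function name `find_orfs` and its `min_codons` cutoff are illustrative, not part of any published tool.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) of ATG-initiated ORFs on the forward strand.

    `end` is exclusive and includes the stop codon; ORFs without a stop are ignored.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            # Walk codon by codon until an in-frame stop codon is reached.
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # runs off the end without a stop codon
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after this ORF's stop codon
    return sorted(orfs)


# Hypothetical toy transcript with two short ORFs separated by a spacer.
toy = "ATGAAATAA" + "CC" + "ATGCCCGGGTGA"
print(find_orfs(toy))  # [(0, 9), (11, 23)]
```

On a real full-length L1 transcript, `min_codons` would need to be set high enough to ignore short chance ORFs so that only the two long ORFs remain.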

The mechanism of human L1 retrotransposition remains largely undetermined. However, studies of these elements in a cultured cell assay have provided some clues, and additional inferences can be drawn from studies of similar retrotransposons in other organisms (Figure 1). The L1 element is first transcribed from an internal promoter, probably by RNA polymerase II. The message is then translated in the cytoplasm, by an unknown mechanism, to produce the ORF1 and ORF2 proteins. After the RNA and its associated proteins gain access to the nucleus, the transcript is reverse transcribed and reintegrated into the genome. Reintegration is thought to occur by a process called target-primed reverse transcription (TPRT). During TPRT, the endonuclease domain of the ORF2 protein cleaves genomic deoxyribonucleic acid (DNA), creating a structure that the reverse transcriptase domain of the same protein can use to prime reverse transcription of the RNA (Luan et al., 1993) (Figure 2). The result is a DNA copy of the L1 element that produced the RNA transcript. Reintegration often creates a 7–20 base pair direct repeat of the endonuclease target site at each end of the inserted L1, called the target site duplication (TSD). The endonuclease domain of the L1 ORF2 protein has only a weak target site preference, so L1 insertions occur relatively randomly throughout the genome. Nevertheless, its preference for the AA|TTTT hexanucleotide and minor variants is strong enough that the TSD can be used as a genetic signature of an insertion mediated by the L1 endonuclease (Jurka, 1997).
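Because the preference is weak but recognizable, one simple way to look for candidate L1 endonuclease target sites is to scan both strands for the AA|TTTT hexanucleotide, optionally tolerating a few mismatches as a stand-in for the "minor variants". This is a minimal sketch, not a validated predictor: the mismatch tolerance, the function names and the example sequences are assumptions made for illustration.

```python
CONSENSUS = "AATTTT"  # AA|TTTT: cleavage falls between the AA and the TTTT


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def mismatches(a: str, b: str) -> int:
    """Hamming distance between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_en_sites(seq: str, max_mismatches: int = 0) -> list[tuple[int, str]]:
    """Return (top-strand index, strand) for hexamers close to the consensus.

    Strand "+" means the motif reads AATTTT on the top strand; "-" means it
    reads AATTTT on the bottom strand (AAAATT on the top strand).
    """
    seq = seq.upper()
    k = len(CONSENSUS)
    bottom = revcomp(CONSENSUS)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if mismatches(window, CONSENSUS) <= max_mismatches:
            hits.append((i, "+"))
        if mismatches(window, bottom) <= max_mismatches:
            hits.append((i, "-"))
    return hits


print(find_en_sites("GCAATTTTGC"))  # [(2, '+')]
print(find_en_sites("GCAAAATTGC"))  # [(2, '-')]
```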

Figure 1. Simplified schematic of the proposed mechanism of L1 retrotransposition. (1) A full-length active L1 element in the genome is first transcribed using an internal promoter. (2) The L1 transcript is exported to the cytoplasm. (3) The ORF1 and ORF2 proteins are translated and preferentially bind the RNA molecule that encoded them (cis preference). (4) The L1 RNA and associated protein(s) return to the nucleus by active transport or by entry during nuclear membrane breakdown at mitosis. (5) The L1 RNA is reverse transcribed and integrated into the genome by target-primed reverse transcription (TPRT). The process depicted results in a DNA copy of the original L1 element at a new genomic location. Note that the target site duplications flanking the original L1 element (black rectangles) will differ from those flanking the L1 copy at the new genomic location (shaded rectangles). The new L1 copy also often differs from the original because it is frequently truncated or rearranged during retrotransposition.

Figure 2. Target primed reverse transcription (TPRT). The L1 retrotransposon is thought to integrate by TPRT. (1) During L1 TPRT, the retrotransposon's endonuclease cleaves one strand of genomic DNA at its target site (rectangle), producing a 3′ hydroxyl (OH) at the nick. (2) The retrotransposon RNA hybridizes at the nick. (3) The retrotransposon's reverse transcriptase uses the free 3′ OH to prime reverse transcription. Reverse transcription proceeds, producing a cDNA of the retrotransposon RNA. (4) The endonuclease cleaves the second DNA strand of the target site to produce a staggered break. (5) The cDNA inserts into the break by an unknown mechanism. (6) Removal of RNA and completion of DNA synthesis produces a complete insertion flanked by target site duplications (TSDs).
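The sequence outcome of steps (4) to (6) can be modelled with plain string operations. In this simplified, strand-agnostic sketch, the two strands are cut at `left_cut` and `right_cut` (both given in top-strand coordinates); once the insert is joined and the single-stranded gaps are filled in, the segment between the two cuts appears on both sides of the insert. The function name, the coordinate convention and the toy sequences are assumptions for illustration, and the model ignores truncation and rearrangement of the insert.

```python
def insert_at_staggered_break(genome: str, left_cut: int, right_cut: int, insert: str) -> str:
    """Return the top strand after an insertion at a staggered double-strand break.

    One strand is cut at left_cut and the other at right_cut (top-strand
    coordinates, left_cut < right_cut). Filling in the single-stranded gaps
    copies genome[left_cut:right_cut], so it flanks the insert on both sides.
    """
    if not 0 <= left_cut < right_cut <= len(genome):
        raise ValueError("require 0 <= left_cut < right_cut <= len(genome)")
    return genome[:right_cut] + insert + genome[left_cut:]


# Toy example: lowercase marks the inserted element; "CCC" is the duplicated target site.
product = insert_at_staggered_break("AAACCCGGGTTT", 3, 6, "xxx")
print(product)  # AAACCCxxxCCCGGGTTT
```

The TSD length equals `right_cut - left_cut`, so the 7–20 base pair TSDs described above correspond to staggers of that size.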

In addition to the mutations caused by L1 insertions, insertions of nonautonomous retrotransposons that cause or are associated with disease include at least 18 Alu insertions (Deininger and Batzer, 1999) and two insertions of an element termed SVA (Kobayashi et al., 1998; Rohrer et al., 1999). Recently inserted nonautonomous elements are flanked by TSDs that resemble those created by the L1 endonuclease. Therefore, all reported de novo insertional mutagenesis by transposable elements in the human genome appears to result from the L1 ORF2 protein acting either in cis, to retrotranspose its own transcript, or in trans, to retrotranspose transcripts of nonautonomous retrotransposons. Recent studies of L1 retrotransposition in cultured cells have demonstrated that the L1 proteins strongly prefer to retrotranspose the transcript that encoded them (Esnault et al., 2000; Wei et al., 2001). Apparently, some nonautonomous retrotransposons have a mechanism to circumvent this strong cis preference.
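Conversely, once a new insertion has been located, its TSD signature can be recovered by looking for the longest sequence that both ends the left flank and begins the right flank. The sketch below assumes exact insert boundaries and a perfect repeat; real analyses must tolerate mismatches and ambiguous boundaries. The function name, the `max_len` cutoff and the example are hypothetical.

```python
def find_tsd(seq: str, ins_start: int, ins_end: int, max_len: int = 30) -> str:
    """Return the longest exact direct repeat flanking seq[ins_start:ins_end].

    The candidate TSD is the longest string that both ends the left flank and
    begins the right flank; an empty string means no flanking repeat was found.
    """
    left, right = seq[:ins_start], seq[ins_end:]
    for k in range(min(max_len, len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return left[-k:]
    return ""


# Toy example: "xxx" is the inserted element, flanked by a CCC duplication.
print(find_tsd("AAACCCxxxCCCGGGTTT", 6, 9))  # CCC
```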

Mechanism of disease

Nine of the 13 reported disease-causing L1 insertions lie directly within an exon. They include two independent insertions into the factor VIII gene (F8) causing hemophilia A, four independent insertions into the dystrophin gene (DMD) causing Duchenne muscular dystrophy or X-linked dilated cardiomyopathy, an insertion into the cytochrome b-245 beta polypeptide gene (CYBB) causing chronic granulomatous disease, an insertion into the factor IX gene (F9) causing hemophilia B, and an insertion into the adenomatosis polyposis coli gene (APC) causing colon cancer. In most of these cases, the disease mechanism is presumably the introduction of nonsense codons into the coding sequence or skipping of the disrupted exon during splicing. For example, one of the dystrophin insertions causes exon skipping and subsequent out-of-frame translation. In contrast, the L1 insertion causing X-linked dilated cardiomyopathy lies in the 5′ untranslated region of muscle exon 1 of the DMD gene; it did not disrupt the dystrophin reading frame but probably affected transcription or transcript stability. Disease-causing Alu insertions also frequently lie directly within exons. Examples include, but are not limited to, independent insertions into the F9 gene causing hemophilia B, an insertion into the eyes absent homolog 1 gene (EYA1) causing branchio-oto-renal syndrome, an insertion into the fibroblast growth factor receptor 2 gene (FGFR2) causing Apert syndrome, and an insertion into the hydroxymethylbilane synthase gene (HMBS) causing acute intermittent porphyria.
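The two coding-sequence mechanisms named above can be checked with simple arithmetic: skipping an exon preserves the downstream reading frame only if the exon length is a multiple of three, and an insertion causes premature termination if it places a stop codon in frame. The helper names and toy sequences below are hypothetical, and the sketch ignores cellular effects such as nonsense-mediated mRNA decay.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def skipping_shifts_frame(exon_length: int) -> bool:
    """Skipping an exon shifts the downstream frame unless its length is divisible by 3."""
    return exon_length % 3 != 0


def first_in_frame_stop(cds: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop codon, or None."""
    cds = cds.upper()
    for pos in range(0, len(cds) - 2, 3):
        if cds[pos:pos + 3] in STOP_CODONS:
            return pos // 3
    return None


normal = "ATGGCCAAAGGGTTT"                # hypothetical 5-codon coding sequence, no stop
mutant = normal[:6] + "TAA" + normal[6:]  # hypothetical insertion carrying an in-frame TAA
print(first_in_frame_stop(normal))  # None
print(first_in_frame_stop(mutant))  # 2
print(skipping_shifts_frame(100), skipping_shifts_frame(99))  # True False
```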

The four known L1 insertions into introns cause disease by introducing alternative splice sites that produce improperly spliced transcripts, by decreasing transcription of the gene, or by decreasing the stability of the primary transcript. For example, an L1 insertion into intron 5 of the CYBB gene in a patient with chronic granulomatous disease and an L1 insertion into intron 7 of the fukutin gene (FCMD) in two related patients with Fukuyama-type congenital muscular dystrophy both resulted in heterogeneous splicing, whereas insertions into intron 2 of the β-globin gene (HBB) in a β-thalassemia patient and into intron 1 of the retinitis pigmentosa 2 gene (RP2) in a retinitis pigmentosa patient caused low or absent messenger RNA (mRNA) levels without evidence of aberrant splicing. Likewise, Alu elements can cause disease when inserted into introns. Examples include an insertion into the neurofibromin 1 gene (NF1) of a neurofibromatosis patient, resulting in exon skipping and out-of-frame translation, and an insertion into the FGFR2 gene of a patient with Apert syndrome, also resulting in exon skipping. Apert syndrome is a dominant gain-of-function disorder, and in this case the Alu insertion resulted in ectopic expression of the mutant protein. Occasionally, an innocuous retrotransposon insertion into an intron can cause disease much later if it accumulates mutations that create functional splice sites. An ancient Alu element in intron 3 of the ornithine aminotransferase gene (OAT) had acquired a single-base mutation that created a donor splice site, resulting in aberrant splicing of the OAT mRNA (Mitchell et al., 1991). A patient with gyrate atrophy of the choroid and retina who was homozygous for this mutation, inherited from consanguineous parents, produced almost no normal mRNA.
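The OAT example turns on a single base change that creates a usable donor splice site. A crude way to see this is to compare 9-nt windows with the commonly cited donor consensus MAG|GTRAGT (M = A or C, R = A or G), requiring the nearly invariant GT at the start of the intron. This is a simplified consensus count, not a real splice-site predictor; the scoring rule, the threshold and the toy sequences (which are not the actual OAT Alu sequence) are assumptions.

```python
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "M": "AC", "R": "AG"}
DONOR_CONSENSUS = "MAGGTRAGT"  # exon MAG | intron GTRAGT


def donor_score(window: str) -> int:
    """Count positions matching the consensus; 0 if the intronic GT is absent."""
    window = window.upper()
    if len(window) != len(DONOR_CONSENSUS):
        raise ValueError("expected a 9-nt window")
    if window[3:5] != "GT":
        return 0
    return sum(base in IUPAC[code] for base, code in zip(window, DONOR_CONSENSUS))


def candidate_donors(seq: str, min_score: int = 7) -> list[int]:
    """Start indices of 9-nt windows scoring >= min_score (the GT begins at index + 3)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 8) if donor_score(seq[i:i + 9]) >= min_score]


before = "TTCAGGCAAGTTT"  # hypothetical intronic sequence: GC where a GT would be
after = "TTCAGGTAAGTTT"   # the same sequence after a single C->T change
print(candidate_donors(before))  # []
print(candidate_donors(after))   # [2]
```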

All but one of the L1 insertions occurred either before fertilization in the germ line or after fertilization, very early in embryogenesis. The exception, the insertion into the APC gene, was a somatic event in a colon cancer patient: it was present in the cancerous cells but not in the surrounding normal cells. This pattern of retrotransposition corresponds well with the known activity of the mammalian L1 promoter; all studies to date have detected L1 transcripts only in germ-line cells and certain transformed cells, not in normal somatic tissues. Because Alu elements are thought to use the L1 retrotransposition machinery for their mobility, it is not surprising that the Alu insertions also occurred in the germ line or early in development. One Alu insertion into the MLVI2 locus, associated with leukemia, may represent a somatic event, although normal tissue was not available for comparison. Since L1 can retrotranspose in cancerous cells, as the APC insertion shows, it is plausible that Alu elements can also be mobilized in cancerous cells.

Ten of the 13 reported disease-causing L1 insertions are in genes on the X chromosome. This is probably due in part to an ascertainment bias favoring the discovery of recent insertions in hemizygous genes. Consider, for example, genes that cause recessive disorders when mutated. A new L1 insertion into such a gene on the X chromosome will manifest phenotypically in the first male offspring who inherits it, because he has no wild-type copy of the gene, whereas a new L1 insertion into such a gene on an autosome will be phenotypically silent in both males and females unless the other copy of the gene is also mutated. The X chromosome does contain a higher density of L1 elements than the autosomes, which could reflect an insertion preference for the X chromosome. However, the increased density may instead reflect postinsertion selection, such as slower loss of L1 elements from the X chromosome because of its reduced recombination rate, or positive selection because L1 elements may participate in X-chromosome inactivation.
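The ascertainment argument reduces to a simple rule for a recessive loss-of-function allele: a hemizygous male shows the phenotype with one mutant X-linked copy, whereas an autosomal gene (or an X-linked gene in a female) needs both copies mutated. The sketch below encodes only that rule; the function name is hypothetical, and it deliberately ignores complications such as skewed X inactivation.

```python
def recessive_phenotype_shown(sex: str, chromosome: str, mutant_copies: int) -> bool:
    """Simplified rule: is a recessive loss-of-function phenotype expressed?"""
    if sex not in {"male", "female"}:
        raise ValueError("sex must be 'male' or 'female'")
    if chromosome == "X" and sex == "male":
        # Hemizygous: the single X copy determines the phenotype.
        return mutant_copies >= 1
    # Two copies (autosome, or X in a female): both must be mutated.
    return mutant_copies >= 2


# A new insertion is usually present in one copy only.
for sex, chrom in [("male", "X"), ("female", "X"), ("male", "autosome"), ("female", "autosome")]:
    print(sex, chrom, recessive_phenotype_shown(sex, chrom, mutant_copies=1))
```

Only the hemizygous male case prints `True`, which is why a single new insertion is far more likely to be noticed when it disrupts an X-linked gene.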

For a table summarizing the retrotransposon insertions resulting in human disease, visit the Website at the University of Pennsylvania Health System (see Web Links).

Unequal Homologous Recombination

Human disease can also occur when two similar transposable elements undergo unequal homologous recombination, causing a deletion or duplication. Not surprisingly, the most abundant transposable elements in the genome, L1 and Alu, are responsible for all reported cases of human disease caused by unequal homologous recombination between transposable elements. Three examples of disease caused by recombination between L1 elements have been reported: one produced a partial deletion of the paired type IV collagen genes COL4A5 and COL4A6, one a partial deletion of the gene encoding the beta subunit of phosphorylase kinase (PHKB), and one a partial deletion of the ataxia telangiectasia mutated gene (ATM) (Gatti, personal communication; Burwinkel and Kilimann, 1998; Segal et al., 1999). These L1-mediated deletions resulted in Alport syndrome with associated diffuse leiomyomatosis, glycogen storage disease due to phosphorylase kinase deficiency, and ataxia telangiectasia, respectively. A recent survey of disease-producing mutations caused by unequal homologous recombination between Alu elements reported 49 independent mutations occurring in both germ-line and somatic cells (Deininger and Batzer, 1999).
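The deletion and duplication products can be illustrated by treating a chromosome as an ordered list of segments and letting two misaligned, same-orientation copies of a repeat cross over. In this hypothetical sketch, `Alu1` and `Alu2` stand for two similar elements and `exon_B` for the gene segment between them; crossover within the repeats leaves one product lacking `exon_B` and one repeat copy, and the reciprocal product carrying `exon_B` twice. Segment names and the list representation are illustrative assumptions.

```python
def unequal_crossover(
    chrom_a: list[str], chrom_b: list[str], i: int, j: int
) -> tuple[list[str], list[str]]:
    """Reciprocal crossover between repeat chrom_a[i] and a misaligned repeat chrom_b[j].

    The crossover point lies inside the two similar repeats, so each product
    carries one hybrid repeat (written "first/second").
    """
    product_1 = chrom_a[:i] + [f"{chrom_a[i]}/{chrom_b[j]}"] + chrom_b[j + 1:]
    product_2 = chrom_b[:j] + [f"{chrom_b[j]}/{chrom_a[i]}"] + chrom_a[i + 1:]
    return product_1, product_2


chrom = ["exon_A", "Alu1", "exon_B", "Alu2", "exon_C"]
# Misalignment: Alu1 on one homolog pairs with Alu2 on the other.
deleted, duplicated = unequal_crossover(chrom, chrom, 1, 3)
print(deleted)     # ['exon_A', 'Alu1/Alu2', 'exon_C']
print(duplicated)  # ['exon_A', 'Alu1', 'exon_B', 'Alu2/Alu1', 'exon_B', 'Alu2', 'exon_C']
```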

It is unclear why Alu elements participate in many more unequal homologous recombination events than L1 elements do, especially because L1 elements are on average longer than Alu elements, providing longer stretches of sequence identity for recombination, and also make up a greater percentage of the genome by mass. Several explanations for this discrepancy have been proposed. First, Alu elements may contain sequences that make them more recombinogenic; however, at least one experiment in cultured cells did not support this. Second, the average genomic distance between two L1 elements is greater than that between two Alu elements, making any L1/L1 unequal recombination event both less likely to occur and more likely to cause a lethal mutation. Third, L1 elements tend to reside in AT-rich DNA, whereas Alu elements reside in GC-rich DNA. Because Alu elements are thought to use the L1 proteins for integration, this distribution is somewhat perplexing and may reflect postintegration selection. In any case, the tendency of L1 elements to reside in AT-rich, gene-poor DNA may indicate that L1/L1 unequal homologous recombination events occur more frequently than suspected but do not usually delete gene sequences.

Frequency of Disease Caused by Retrotransposons

Most of the several hundred thousand copies of L1 in the human genome are inactive and unable to cause insertional mutagenesis. L1 elements tend to be truncated or rearranged during insertion (11 of the 13 disease-causing L1 insertions were truncated during integration), producing inactive copies of the progenitor element. Over time, L1 elements also accumulate spontaneous mutations that may inactivate them. Inactive L1 elements that can still produce RNA are not retrotransposed in trans by active L1 elements because of the cis preference of the L1 proteins. Only full-length elements with intact ORFs are capable of insertional mutagenesis. Interestingly, 12 of the 13 reported recent disease-causing L1 insertions contain a short diagnostic nucleotide sequence indicating that they belong to a small, active subfamily of L1 elements termed the Ta subfamily. Current estimates indicate that the average human diploid genome contains between 40 and 70 active L1 elements (International Human Genome Sequencing Consortium, 2001; Sassaman et al., 1997). Similarly, the de novo Alu insertions belong to closely related subfamilies termed Ya5, Yb8 and Alu Y, indicating that only a small number of the Alu elements in the genome are currently capable of insertional mutagenesis (Deininger and Batzer, 1999).
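The structural criteria in this paragraph amount to a filter over annotated L1 copies: a copy can be a source of new insertions only if it is full length and both ORFs are intact, since truncated or mutated copies are neither active themselves nor mobilized in trans. Passing the filter is necessary, not sufficient, for activity. The record fields and copy names below are hypothetical annotations, not the output of any real annotation tool.

```python
from dataclasses import dataclass


@dataclass
class L1Copy:
    name: str
    full_length: bool  # not 5'-truncated or rearranged during insertion
    orf1_intact: bool  # no disabling mutation in ORF1
    orf2_intact: bool  # no disabling mutation in ORF2


def potentially_active(copies: list[L1Copy]) -> list[str]:
    """Names of copies passing the structural criteria (necessary, not sufficient, for activity)."""
    return [c.name for c in copies if c.full_length and c.orf1_intact and c.orf2_intact]


copies = [
    L1Copy("copy_1", full_length=True, orf1_intact=True, orf2_intact=True),
    L1Copy("copy_2", full_length=False, orf1_intact=False, orf2_intact=True),  # truncated
    L1Copy("copy_3", full_length=True, orf1_intact=True, orf2_intact=False),   # ORF2 mutated
]
print(potentially_active(copies))  # ['copy_1']
```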

The frequency of insertional mutagenesis caused by transposable elements has not yet been measured experimentally. However, it can be estimated by dividing the number of new mutations caused by transposable element insertions by the total number of characterized human mutations. Although such calculations may be biased in either direction, estimates by several groups indicate that L1 and Alu insertions each account for approximately 0.1% of human disease (Kazazian, 1999; Deininger and Batzer, 1999). Similar calculations indicate that between 1 in 4 and 1 in 100 people carry a new transposable element insertion somewhere in their genome, some of which will insert into and mutate genes. An additional 0.3% or more of human disease is caused by unequal homologous recombination between retrotransposons. Taken together, these data indicate that retrotransposons continue to be a notable cause of human disease.
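The estimate described here is a simple ratio. The sketch below shows the arithmetic with made-up placeholder counts, chosen only so that the result lands at the order of magnitude quoted in the text; they are not the counts used by Kazazian (1999) or by Deininger and Batzer (1999).

```python
def insertion_fraction(insertion_mutations: int, characterized_mutations: int) -> float:
    """Fraction of characterized mutations attributed to transposable element insertions."""
    if characterized_mutations <= 0:
        raise ValueError("characterized_mutations must be positive")
    if not 0 <= insertion_mutations <= characterized_mutations:
        raise ValueError("insertion_mutations must be between 0 and characterized_mutations")
    return insertion_mutations / characterized_mutations


# Placeholder counts for illustration only.
fraction = insertion_fraction(insertion_mutations=20, characterized_mutations=20_000)
print(f"{fraction:.1%}")  # 0.1%
```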

References
