Simple DNA repeats (trinucleotide repeats, micro- and minisatellites) are prone to expansion/contraction via formation of secondary structures during DNA synthesis. Such structures both inhibit replication forks and create opportunities for template-primer slippage, making these repeats unstable. Certain aspects of simple repeat instability, however, suggest additional mechanisms of replication inhibition dependent on the primary DNA sequence, rather than on secondary structure formation. I argue that expanded simple repeats, due to their lower DNA complexity, should transiently inhibit DNA synthesis by locally depleting specific DNA precursors. Such transient inhibition would promote formation of secondary structures and would stabilize these structures, facilitating strand slippage. Thus, replication problems at simple repeats could be explained by potentiated toxicity, where the secondary structure-driven repeat instability is enhanced by DNA polymerase stalling at the low complexity template DNA.
This minireview is dedicated to the FASEB-2012 meeting “Dynamic DNA Structures in Biology”, organized by Nancy Maizels and Sergei Mirkin.
Expansion of simple repeats: Genetic instability driven by propensity for secondary DNA structures
Population heterogeneity in copy number of simple DNA repeats, aptly called “dynamic mutation”, underlies many health conditions in humans 1. Trinucleotide repeat expansion is the mechanistic basis of such human diseases as Fragile X syndrome (CGG/CCG repeats), myotonic dystrophy, Huntington's disease (both CAG/CTG repeats), and Friedreich's ataxia (GAA/TTC repeats). While the position of repeats around the affected genes, as well as the mechanisms leading from repeat expansion to the disease state, differ 2–5, the common theme is repeat copy number instability (hence, “dynamic mutation”), especially their propensity to expand. The dynamic mutation phenomenon is not limited to trinucleotide repeats, but for unknown reasons the dinucleotide or tetranucleotide repeats, although numerous in the human genome, are associated with fewer disease-causing genes 3, 6. Some human minisatellite (much longer) repeats are also prone to expansion or contraction 7, 8, again without pathology in the absence of consequential genes nearby.
It was postulated early on that an important ingredient of the repeat propensity to expand is the ability of these sequences to form secondary structures in pure DNA in vitro – triple-stranded or slipped structures when in duplex DNA, hairpins and guanine quartet (G4) clusters when in single-stranded (ss) DNA 9, 10. Such secondary structures not only stabilize strand-slippage events, promoting contractions or expansions at simple repeats (Fig. 1), but may also interfere with DNA synthesis directly. Secondary structures in ssDNA are indeed known to interfere with DNA replication in vitro 11–13. In fact, out of the ten possible trinucleotides, the bulk of dynamic mutations associated with human diseases comes from CNG trinucleotides that readily form hairpins, whereas other trinucleotides that have lower propensity to form such structures are not known to be associated with diseases 9. In contrast to other possible trinucleotide repeats, the various CNG trinucleotide repeats are also unstable in yeast 14, suggesting that their primary DNA sequence makes them unstable in any system. Likewise, human minisatellite repeats still show expansion and contraction when inserted in the yeast genome 15, whereas their mutant variants that cannot form G4 clusters in vitro lose their “dynamic mutation” nature in this system 16.
Subsequent findings that replication forks stall at expanded repeats in vivo suggested that possible formation of these structures poisons DNA replication, offering insights into the mechanistic understanding of their pathogenesis 17–22. A generic scenario of repeat-promoted instability envisions replication fork passage generating ssDNA opportunities in the lagging strand template for simple repeats to form secondary structures (Fig. 2). The secondary structures then inhibit the replication fork, creating an opportunity for strand slippage (Fig. 2). These problems lead to additions/deletions in the newly synthesized DNA strands, which are then fixed by the next replication round as repeat expansions or contractions (Fig. 2). The replication-expansion connection is supported by experimental systems in which repeats expand in proliferating cells, but not in quiescent cells 23, and by those where patterns of repeat instability are dramatically influenced by changing replication origin position relative to repeats 24.
Although the general outline of the model is consistent with the majority of experimental observations 3, the tendency of simple repeats to form secondary structures during DNA synthesis is not the only determinant of genetic instability. Other factors also contribute to repeat expansion: (i) repeats of the same size expand with dramatically different probabilities in different individuals and families, indicating genetic modifiers of the expansion 25, 26; (ii) in the affected individuals, instability at one repeat is sometimes associated with instability at a different repeat, suggesting genome-wide increased expansion propensity 27, 28; (iii) the same repeats of a particular length, expanding with high probability in humans, show only small-scale copy-number variation in mice 29, 30; and (iv) the same repeats may show steady expansion in one cell type, while being stable in other cell types, some of them derived from the first one 31, demonstrating a tissue specificity of the expansion 32. Thus, there are common denominators that control repeat instability, which are highly variable between cell lines and tissues, or in a population, or between mammalian species. These critical determinants are not likely to be in the replication machinery itself, as this system is too important to show such variability, but they can be in systems that influence replication without being directly associated with replisomes. Below, several other phenomena are described that are poorly explained by the general model (Fig. 2) and that point to other possible factors in addition to the propensity for secondary structures, highlighting the mysterious role of DNA precursors and of the primary nucleotide sequence (or composition?) of repeats.
Low DNA complexity as an exacerbating factor
Lagging/leading strand versus expansion/contraction
The relationship between secondary structure interference with DNA synthesis (the process) and expansion or contraction (the final result) versus the replication fork terminology can be confusing. The replication fork has two template strands – for the leading strand and for the lagging strand synthesis – and two newly synthesized strands: the leading strand and the lagging strand (Fig. 3A). To stall a replisome, a repeat has to form a hairpin in either template strand (Fig. 3B). The same is true for repeat contraction: a hairpin formed in the template strand can be skipped by the replisome, leading to deletion (Fig. 3C). Conversely, repeat expansion can only happen if a hairpin forms in the nascent (“primer”) strands (leading or lagging; Fig. 3E). Somewhat confusingly, hairpins in the primer strands have to form behind DNA polymerases – therefore, this primer slippage is unlikely to inhibit the progress of the replication fork (Fig. 3D). That is, while repeat contractions may be mechanistically linked to replication fork inhibition at secondary structures, repeat expansions do not have to be. In other words, if repeat expansions still require replication inhibition in the repeat region, DNA synthesis has to be inhibited by something other than DNA secondary structures.
It is unclear what makes the repeat DNA single-stranded in the first place. The orientation dependence of the replication fork block usually suggests that during DNA replication secondary structures form in the lagging strand template 20, 21 that contains these single-strand gaps by default. However, there are situations in which a replication block 21, 33, repeat destabilization 34, or repeat expansion 35, 36 clearly assume formation of secondary structures in the leading strand template, which, due to its replication co-directionality with the overall fork movement, should have no ssDNA regions. Yet, single-stranded regions in the leading strand template may form if the leading strand polymerase is inhibited, while the replicative helicase charges on. Remarkably, this leading strand polymerase inhibition should be the cause, rather than the consequence, of the presumed formation of the secondary structure in the leading strand template, as, for example, is specifically acknowledged in the “template-push” model of repeat-stimulated contraction 33.
According to the model, simple repeat instability is due to formation of secondary structures in the template strands or due to increased slippage in the nascent strands (Figs. 1–3). However, it is not clear what makes these secondary structures resistant to ironing out by DNA polymerases. Trinucleotide repeats can form only partial duplex secondary structures featuring a lot of mismatches and single-strand mini-loops 3, 9 and should be easy to unwind/disassemble. Yet, DNA polymerases traverse these imperfect hairpins in the template strand with difficulty, as if there is something in their primary sequence that inhibits DNA synthesis. A related question is: what could additionally facilitate strand slippage at these repeats? The imperfect hairpin should not be energetically strong enough to drive the slippage in the vicinity of the replication point, and yet simple repeats are “extremely slippery”, as if polymerases are encouraged to slip strands at them due to some property of the template DNA itself.
There is a particular size of trinucleotide repeats (up to ∼30 copies) that is propagated faithfully, in contrast to the longer repeats that show instability as a function of repeat size above this threshold 2, 5. Such a high threshold level is counterintuitive. Indeed, if simple repeats cause problems by forming secondary DNA structures, then the smallest number of repeats that can form these structures (from 5 to 15 for trinucleotide repeats 9) should already cause detectable instability, since its probability of being completely covered by a single-stranded region is maximal. For example, the melting temperature of the structures formed by 10 versus 30 trinucleotide repeats is the same 37. Yet, the instability starts at much longer repeats, as if the probability of slippage or of becoming single-stranded does not even exist for smaller repeats, which does not make sense. Of course, it could be argued that replisomes simply “run out of steam” with longer repeats and cannot prevent DNA strands from slipping past each other to form secondary structures behind the replisomes, but then what exactly is the nature of this “steam” that drives DNA polymerization through the repeats?
Expansion in non-mitotic cells
The model (Figs. 2 and 3) frames repeat instability essentially as a replication fork phenomenon, and yet rapid and dramatic expansions of already long repeats, leading to disease, readily happen in specific types of post-mitotic cells 38, 39. In the absence of replication it is unclear what makes the repeat region single-stranded or what in general creates an opportunity for the secondary structures to form. Heavy transcription of the region is documented to enhance the expansion in the Drosophila model of trinucleotide repeat disease 40, as well as in replicating 41 or non-replicating human cells 42, 43, which adds to the mystery as transcription does not generate ssDNA outside RNA polymerase, but is known to make DNA vulnerable to damage. Interestingly, it has been proposed that excision repair of DNA damage within the trinucleotide repeat stimulates expansion via strand slippage during repair synthesis 44, 45. Importantly, non-mitotic cells have low levels of DNA precursors 46, 47, suggesting a link between repair synthesis, decreased dNTP pools and repeat expansion.
Complexity versus non-B-DNA structure
The ability of simple repeats to form secondary structures in ssDNA seems like another integral part of the expansion model (Figs. 1–3), but there are cases in which repeat expansion is efficient even when secondary structures cannot form. Expansion of certain tetranucleotide repeats, such as (TTTC)n, is one such case 48, while (TC) dinucleotide repeat expansion is another 49. An AT-rich 33-nucleotide-long minisatellite cannot form a secondary structure either, but it expands nevertheless, causing chromosome fragility 50. Mononucleotide (non-G) repeats are also involved in fragility 51 and cause polymerases to stall 52, 53 – but they again cannot form secondary structures. It looks like the low complexity of the template DNA itself causes replication problems, whereas the propensity for secondary structures is an aggravating factor.
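The notion of "low complexity" used here can be made concrete in more than one way. As a minimal sketch, assuming that the Shannon entropy of the base composition is an acceptable proxy (the text itself does not commit to a particular measure), the following Python function scores the repeat units named above. The function name `composition_entropy` is hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def composition_entropy(sequence: str) -> float:
    """Shannon entropy of the base composition, in bits per base.

    Assumed proxy for "DNA complexity": 0 for a homopolymer run,
    2 for a sequence that uses all four bases equally.
    """
    counts = Counter(sequence.upper())
    total = sum(counts.values())
    # p * log2(1/p) summed over the bases that actually occur.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


if __name__ == "__main__":
    # A repeat tract has the same composition as its unit, so the unit suffices.
    for unit in ("A", "TC", "TTTC", "CAG", "ACGT"):
        print(f"({unit})n: {composition_entropy(unit):.2f} bits per base")
```

Under this assumed measure a mononucleotide run scores lowest, followed by (TTTC)n and (TC)n, with trinucleotide repeats such as (CAG)n higher still. The measure ignores base order, so it says nothing about the ability to form hairpins; it only captures how unevenly the four bases are used.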
Local specific dNTP depletion should transiently stall repeat replication
The last observation, that non-G mononucleotide runs also cause fragility 51, provides a possible answer to all the above questions. A long poly-A tract or an expanded (AAAG)n repeat cannot form a hairpin when single-stranded, but synthesis of the complementary strand should cause transient local depletion of dTTP pools, resulting in short-lived polymerase stalling. By the same token, greatly expanded simple repeats comprising two complementary nucleotides (either AT/TA or GC/CG) should locally exhaust the pools of the corresponding DNA precursors. I propose that the resulting transient polymerase stalling due to the local DNA precursor pool depletion, rather than formation of secondary DNA structures, is the primary cause of inhibition during replication of low-complexity DNA repeats. The repeats then have more chances to mediate strand slippage in the nascent strands. This idea provides reasonable explanations for the six paradoxes above.
First, in the case of repeat expansion (requiring hairpin formation in the primer DNA strand), transient local depletion of the DNA precursor pools offers, perhaps, the only possible explanation of synthesis inhibition at low complexity DNA.
Second, the transient local depletion of the DNA precursor pools at replication forks will promote formation of single-strand gaps in both the leading and the lagging strands, giving the chance for trinucleotide repeats to fold into secondary structures in either template strand. In the case of the leading strand, the replicative helicase will continue DNA unwinding, and the single-strand gaps between this fast helicase and the stalling replisome of the leading strand should, therefore, widen. As mentioned above, the “template-push model” explains inhibition of DNA polymerase in vitro at the CAG/CTG repeat in terms of unspecified polymerase interaction with the primary DNA sequence of the trinucleotide template, rather than in terms of the thermodynamic stability of a hairpin this sequence can form 33.
Third, the imperfect secondary structures formed by simple repeats around the replisome should be stabilized against ironing out by DNA polymerase because the power of the enzyme to enforce the perfect template-primer pairing would be compromised by the transient lack of specific dNTPs. Likewise, this transient local depletion of specific dNTPs should stimulate strand slippage by polymerase stalling at simple repeats.
Fourth, the threshold size, beyond which repeat expansion becomes a problem, can be explained as representing the length of the simple repeat template, synthesis across which depletes the local pools of the utilized DNA precursors, leading to polymerase inhibition (“running out of steam”) and giving a chance for the secondary structure in the primer DNA strand to form. In fact, this idea further predicts that the longer the repeat, the deeper the depletion of the dNTP pools during its synthesis, explaining increase in repeat instability with repeat length. The downstream processes, like inhibition of mismatch repair at multiple (clustered) slip-outs, possible only at long repeats 54, should further contribute to the threshold phenomenon.
Fifth, repeat expansion requires DNA synthesis, but it does not have to come from chromosomal replication – a “long-patch” DNA repair synthesis covering a portion of the repeat should suffice. Curiously, simple repeats cause base-excision repair to switch from short-patch to long-patch mode 55, increasing chances of expansion due to DNA repair within the repeat and due to the documented problems of flap endonucleases with the displaced strands that can fold into secondary structures 56. Moreover, due to the significantly lower pools of DNA precursors in non-replicating cells 46, 47, DNA repair synthesis in these cells should be acutely susceptible to their local transient depletion. Finally, since repair synthesis affects one DNA strand only, repair synthesis across any di- or trinucleotide repeat will unbalance the local dNTP pool independently of the repeat sequence.
Sixth, the repeat length being equal, the lower the complexity of the repeat, the higher the probability of the local transient DNA precursor pool depletion during synthesis across the repeat. Thus, mononucleotide runs (of the same length) will inhibit replication better than dinucleotide or trinucleotide repeats, and they do not even have to form secondary structures for efficient slippage.
Finally, some of the trans-acting factors modulating repeat expansion may determine the level of DNA precursors, which is known to be the subject of significant variation between different cell lines 57, although this variation has not been systematically measured between individuals.
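The fourth and sixth points are kinetic arguments, and a toy calculation may help to see their shape. The Python sketch below is an illustration under stated assumptions, not a measured model: each incorporated nucleotide removes one molecule from a local pool, and after every step each pool recovers a fixed fraction of its deficit, which stands in for diffusion from the bulk. The function name `lowest_pool_fraction` and the parameters `resting` and `refill` are hypothetical, and their default values are arbitrary.

```python
# Watson-Crick complement: the dNTP consumed opposite each template base.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def lowest_pool_fraction(unit: str, copies: int,
                         resting: float = 100.0, refill: float = 0.02) -> float:
    """Lowest local dNTP level reached while copying a repeat tract.

    The result is expressed as a fraction of the resting level.
    `resting` (molecules per local pool) and `refill` (fraction of the
    deficit restored per incorporation step) are arbitrary illustrative
    values, not measurements.
    """
    pools = {base: resting for base in "ACGT"}
    lowest = resting
    for template_base in unit.upper() * copies:
        needed = COMPLEMENT[template_base]
        # Consume one molecule; clamp at zero, where a real polymerase would stall.
        pools[needed] = max(pools[needed] - 1.0, 0.0)
        lowest = min(lowest, pools[needed])
        # Partial recovery of every pool toward its resting level.
        for base in pools:
            pools[base] += refill * (resting - pools[base])
    return lowest / resting


if __name__ == "__main__":
    for unit, copies in (("CAG", 10), ("CAG", 30), ("CAG", 100), ("A", 90)):
        fraction = lowest_pool_fraction(unit, copies)
        print(f"({unit}){copies}: lowest pool = {fraction:.2f} of resting")
```

With these assumed parameters the depletion deepens as the tract lengthens, and a mononucleotide run depletes its single precursor more than a trinucleotide repeat of the same length in nucleotides. Whether real nuclei behave this way depends entirely on the actual ratio of incorporation to replenishment rates, which the sketch does not attempt to estimate.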
Observations consistent with this idea and predictions to test
The idea of replication inhibition by specific dNTP depletion at simple repeats is essentially a kinetic argument: the transiently inhibited DNA synthesis gives more chances for hairpins to form in the primer strand, and also stabilizes them against DNA polymerase attempts to iron them out or flap-endonuclease attempts to cut them off. General inhibition of replication elongation is known to enhance trinucleotide repeat expansion both in yeast 58 and in human cells 23.
This is not the first time that DNA precursor pool imbalances have been proposed to explain DNA rearrangements. For example, depletion of dCTP pools is thought to be responsible for long deletions in human mitochondrial DNA, by mechanisms similar to those discussed here, via replication inhibition enhancing template switching (a long-range strand slippage) at microhomologies 59, 60.
The link between DNA precursor pools and chromosome fragility has also been long recognized. As revealed in conditions limiting dCTP/dTTP in the DNA precursor pools, expanded CCG/CGG repeats form the basis of folate-sensitive rare chromosome fragile sites; likewise, expanded AT-rich minisatellites form the basis of BrdU-sensitive rare chromosome fragile sites, that are revealed by the poisonous BrdU incorporation in place of dT 61. Remarkably, this link between the DNA precursor metabolism and human chromosome fragility had been noted even before the trinucleotide repeat or micro/minisatellite nature of rare fragile sites became known, and correctly predicted to be due to low DNA complexity of fragile sites 62.
The idea of the transient local depletion of specific dNTPs upon replication of low complexity DNA assumes that the rate of replication in general and completion of individual Okazaki fragments in particular are sensitive to the levels of DNA precursors. In fact, the rate of replication fork progress in human cell lines is known to dramatically respond to the endogenous levels of DNA precursors 63–66. Of particular interest, purine deprivation leads to the accumulation of Okazaki fragments (in other words, single-strand gaps) in the replicated DNA 66.
Since nucleotide pool imbalances are known to be mutagenic 60, 67, the idea of local specific dNTP pool depletion during replication of low-complexity DNA also predicts increased mutagenesis in the region. The increased formation of point mutations in and around trinucleotide repeats has been recently demonstrated 58, 68. Late-replicating regions in the human genome, which are enriched for low-complexity DNA, also experience higher mutagenesis 69.
Finally, the idea generates testable predictions about genetic instability in low-complexity repeated DNA in response to increased or decreased DNA precursor pools in vivo. In several organisms, mutants with changes in the DNA precursor pools are now available, and behavior of some of them in this type of assay can be assessed. The general expectation from the idea is that the higher the dNTP pools, the fewer the problems with low-complexity DNA synthesis. It may be even simpler and more direct to test this idea in vitro, where polymerase stalling at trinucleotide repeats is predicted to be relieved by higher concentrations of specific dNTPs in the reaction.
As with any new idea, besides the supporting evidence there are also conflicting observations. One of the potential problems is the model's assumption that DNA precursor diffusion through the nucleus is either limited or slow, which is counterintuitive. The literature on the diffusion rates in the nucleus is scant, but would suggest that nucleotides, like any small molecules, should diffuse freely through the nucleus (or cytoplasm, for that matter). However, the rate of nucleotide diffusion, although never measured directly, turns out to be extremely low. The dTTP pool turnover in rapidly growing mouse fibroblasts is reported at ∼5 min 70. At the same time, equilibration of the intracellular DNA precursor pools with exogenous thymidine takes at least 30 min 70. Even if the nuclei of such cells are directly microinjected with radioactive dNTPs, in the amount of ≤10% of the cellular dNTP content to avoid unbalancing the endogenous pools, it takes cells ∼30 min to incorporate all this label into their DNA 71, confirming the extremely slow diffusion. Thus, the rate of DNA precursor incorporation at least matches, and may be several times faster than, the rate of DNA precursor diffusion throughout the nucleus. The situation is exacerbated by the fact that ribonucleotide reduction to dNDPs in eukaryotic cells happens in the cytoplasm, rather than in the nucleus 72, so DNA precursors have to diffuse into the nucleus. Thus, local transient depletion of specific dNTPs does not look improbable in the light of slow dNTP diffusion through the nucleus.
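The "several times faster" estimate follows from the two timescales quoted above. As a small worked example in Python, assuming that the ∼5 min turnover can be read as the time in which DNA synthesis uses up one pool's worth of dTTP (an interpretive assumption, not a statement from the cited work), the ratio of the two times gives the number of pool equivalents consumed while the pool is still equilibrating. The constant and function names are hypothetical.

```python
# Timescales quoted in the text for rapidly growing mouse fibroblasts, in minutes.
DTTP_POOL_TURNOVER_MIN = 5.0    # approximate dTTP pool turnover time
POOL_EQUILIBRATION_MIN = 30.0   # lower bound for equilibration with exogenous thymidine


def pool_equivalents_consumed(equilibration_min: float, turnover_min: float) -> float:
    """Pool equivalents used by DNA synthesis during one equilibration time.

    Assumes the turnover time is the time needed to consume one pool equivalent.
    """
    return equilibration_min / turnover_min


if __name__ == "__main__":
    ratio = pool_equivalents_consumed(POOL_EQUILIBRATION_MIN, DTTP_POOL_TURNOVER_MIN)
    print(f"Pool equivalents consumed per equilibration time: {ratio:g}")
```

Because the 30 min figure is a lower bound, the ratio obtained this way is a lower bound as well.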
Another problem is the well-known observation that chromosomal replication in eukaryotic nuclei is carried out at ∼150 replication factories, serving ∼1,500 replication bubbles (each bubble being composed of two forks moving away from the same origin), so each factory apparently services multiple replication bubbles 73. From this perspective, it is unclear why a transient depletion of specific DNA precursors at a particular fork should cause an imbalance in the pool of dNTPs supplying the entire replication factory. However, recently improved detection methods demonstrate that the number of replication factories is much greater, approaching the number of replication bubbles 74, so the transient DNA precursor depletion should affect only two forks of the same bubble. Curiously, the DNA precursor imbalance at the second fork in the bubble (that is not replicating through repeats) provides a possible explanation for the increased general mutagenesis in the DNA region surrounding repeats 58, 68.
It may be unclear why inhibition of DNA polymerase should cause strand slippage, and whether there is any evidence for these processes. There is indirect evidence for this in the form of trinucleotide repeat expansion in human cells promoted by replication inhibition with drugs 75 or by knockdown of the fork stabilization proteins 18. Template-primer slippage during DNA synthesis in vitro is quite common and can even be used to synthesize simple repeat DNA 76. An abasic site in template DNA blocks primer extension in vitro and induces efficient slippage by DNA polymerase I, causing massive repeat expansion when the “floating primer”, complementary to the repeat itself, is used 77. Interestingly, somatic expansion of the Huntington's disease repeat is dependent on the 8-oxo-guanine-removing glycosylase, OGG1 44, which mostly generates abasic sites 78. Repair of clustered 8-oxo-Gs in opposite strands could lead to a situation in which DNA repair synthesis within the repeat will encounter an abasic site, causing the slippage of the primer strand and repeat expansion.
Since strand slippage at trinucleotide repeats assumes hairpin formation, is there any evidence for hairpin formation at trinucleotide repeats in vivo? The strong genetic evidence for hairpin formation in the primer strands at the replication fork in vivo is the quasipalindrome-promoted mutagenesis 79, 80 and the strong effect of the mismatch repair status on the trinucleotide-repeat expansions 81. The effects of mismatch repair on the trinucleotide-repeat contractions 81 offer genetic evidence for hairpin formation in the template strand. Finally, expression of zinc-finger nucleases specific for trinucleotide-based hairpins offered the first direct demonstration of replication-dependent formation of such hairpins in human cells 75.
Can the primer slippage inhibit the replicating polymerase?
Mechanistically, the proposed trinucleotide repeat instability due to transient specific dNTP depletion represents a general phenomenon of template-primer slippage at simple repeats induced by polymerase stalling of any kind. In addition to depletion of specific dNTPs, the slippage-inducing stalling may be due to lesions in the template strand, misincorporation of a chain terminator into the primer strand, malfunctioning or poisoning of the DNA polymerase itself – in short, expanded trinucleotide repeats are simply prone to slippage 82, 83. Interestingly, although this slippage-inducing polymerase stalling explains potential for expansion, it does not explain why relatively short trinucleotide repeats (n ≤ 30) do not expand, while very long repeats expand with a probability approaching 1. In fact, the threshold phenomenon and, especially, the inevitable expansion of long trinucleotide repeats indicate a transition from a stochastic nature of expansion at short repeats to a deterministic nature at long repeats. Such a switch suggests some intrinsic vulnerability of the replication machinery to template-primer slippage at simple repeats that reached a particular length.
An intriguing possibility is offered by the ubiquitous feature of the interaction of the replicative polymerases with the DNA clamp, proposed for all systems in which it was modeled from bacteriophages to bacteria and archaea. In the replicative mode, there is a several-bp-long unconstrained DNA “window” between the polymerase and the clamp 84–86 (Fig. 4A). In other words, if the primer back-slippage generates a hairpin in the primer strand, this hairpin may be able to extrude right between the polymerase and the clamp (Fig. 4B). Notably, if this hairpin interferes with the clamp progress (certain sizes may not 87) while polymerase keeps synthesizing (Fig. 4C), the unmovable clamp will eventually pull the enzyme back, repeating the slippage and increasing the hairpin (Fig. 4D). A similar idea has been considered before 87. This futile cycle feeding hairpin growth should continue until the stalled clamp is removed by the clamp loader – this may explain why the expansions in long repeats become saltatory 88 or match the size of Okazaki fragments 58. This idea is directly testable in vitro. Parenthetically, the sum of the known footprint of the replicative polymerases (∼30 bp) 89 and the minimal size of clamp-blocking hairpins (∼30 nucleotides) 87 may define the minimal length (∼20 repeats) below which trinucleotide repeats cannot expand.
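The parenthetical estimate at the end of this paragraph is simple arithmetic, spelled out below as a Python sketch. It assumes that the polymerase footprint (quoted in bp) can be counted as nucleotides of the primer strand and added directly to the hairpin length. The function and parameter names are hypothetical; the default values are the approximate figures quoted in the text.

```python
import math


def minimal_expandable_repeats(polymerase_footprint_nt: int = 30,
                               min_blocking_hairpin_nt: int = 30,
                               repeat_unit_nt: int = 3) -> int:
    """Smallest repeat count that spans the footprint plus a clamp-blocking hairpin.

    Assumes the footprint and the hairpin lie end to end on the primer strand.
    """
    required_nt = polymerase_footprint_nt + min_blocking_hairpin_nt
    # Round up, since a partial repeat unit cannot supply the missing length.
    return math.ceil(required_nt / repeat_unit_nt)


if __name__ == "__main__":
    print(minimal_expandable_repeats())  # 60 nt / 3 nt per unit = 20 repeats
```

Both input figures are approximate, so the result is best read as an order-of-magnitude estimate rather than a sharp boundary.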
I propose that as DNA synthesis proceeds across expanded simple repeats, specific dNTP pools are depleted, and that this invites strand slippage, leading to repeat-length instability. This hypothetical phenomenon should be even more pronounced in out-of-S-phase cells because of their low levels of DNA precursors. The instability of simple expanded repeats is an example of potentiated toxicity, because the proposed inhibiting power of specific dNTP depletion is multiplied by the propensity of repeats to form non-B-DNA structures facilitating strand slippage. If an artificially induced increase of the DNA precursor pools does indeed suppress repeat instability in model systems in vivo, this will offer a potential way to slow down progression of trinucleotide repeat expansion diseases. Mechanistically, hairpin extrusion between the DNA polymerase and the clamp can be modeled in vitro, directly testing the proposed model of saltatory repeat expansion.
I wish to thank Bob Lahue, Kirill Lobachev and Sergei Mirkin for helpful suggestions and presubmission discussion. Unknown reviewers and Dr. Andrew Moore, BioEssays' Editor-in-Chief, immensely helped with strengthening the arguments and the logic of the presentation. Experimental work in this laboratory is supported by grant GM 073115 from the National Institutes of Health.