• cancer;
  • CpG transitions;
  • serial replication slippage;
  • hypermutability


  1. Top of page
  3. Introduction
  4. Materials and Methods
  5. Results and Discussion
  6. Conclusions and Perspectives
  7. Acknowledgment
  8. References
  9. Supporting Information

Tandem base substitutions (TBSs) are multiple mutations that comprise two or more contiguous nucleotide substitutions without any net gain or loss of bases. They have recently become recognized as a distinct category of human genomic variant. However, their role in causing human inherited disease has not so far been studied methodically. Here, using data from the Human Gene Mutation Database, we identified 477 events to be TBSs (doublets, 448; triplets, 16; and quadruplets to octuplets, 13). A comprehensive sequence pattern and context analysis implied the likely fundamental importance of translesion synthesis (TLS) DNA polymerases in generating these diverse TBSs but revealed that TLS polymerases may operate differently in generating TBSs of ≤3 bases (bypass of endogenous DNA lesions) than those of ≥4 bases (serial replication slippage). Moreover, GC was found to be the most frequently affected dinucleotide with GC/GC>AA/TT being the most frequent double TBS. Comparison with cancer genome mutational spectra allowed us to conclude that human germline TBSs arise predominantly through the action of endogenous mechanisms of mutagenesis rather than through exposure to exogenous mutagens. Finally, the rates of double and triple TBSs were estimated to be 0.2–1.2 × 10⁻¹⁰ and 0.8–4.8 × 10⁻¹² per base per generation, respectively.


The study of pathogenic mutations is not only central to molecular medicine but has also generated many useful insights into the underlying mutational mechanisms [Chen et al., 2007; 2005d; Cooper et al., 2011; Liu et al., 2012]. More recently, the role of concurrent clustered mutations in causing both cancer and inherited disease has become increasingly apparent [Chen et al., 2012b]. The term “clustered mutations” refers to the presence of two or more cis-linked mutations in a spatially localized genomic region, the precise meaning of the term “spatially localized” being necessarily context-dependent [Chen et al., 2012b]. Thus, for example, although two cis-linked single-base substitutions located 10-kb apart may not be regarded as being “spatially localized,” two similarly separated large-scale deletions, each removing millions of bases, may be considered to be so. “Concurrent” describes the process of generation of all the component mutations in a simultaneous or quasi-simultaneous manner within a single cell cycle, through such diverse mutational mechanisms as transient hypermutability [Chen et al., 2009; Drake et al., 2005], chromothripsis [Kloosterman et al., 2012; Stephens et al., 2011], and other replication-based mechanisms [Chen et al., 2005a; 2005b; Hastings et al., 2009; Lee et al., 2007; Liu et al., 2011; Sheen et al., 2007].

We previously performed a meta-analysis on type A closely spaced multiple mutations (CSMMs), those inherited pathological lesions that comprise at least two noncontiguous component mutations [Chen et al., 2009]. It was concluded that CSMMs in human genes comprising at least one pair of mutations and separated by ≤100 bp may constitute signatures of transient hypermutability. Further, we highlighted the potential importance of specialized DNA polymerases known as translesion synthesis DNA polymerases (TLS polymerases) in generating concurrent clustered mutations. In the current study, we have gone further by performing a meta-analysis of type B CSMMs [Chen et al., 2009], also known as tandem base substitutions (TBSs). TBSs denote those multiple mutations that comprise two or more contiguous nucleotide substitutions without the net gain or loss of bases, and irrespective of whether or not the component base substitutions occurred simultaneously (Fig. 1).


Figure 1. Schematic illustration of archetypal tandem base substitutions (TBSs). A: Some examples of TBS subtypes in accordance with the number of bases substituted. B: Alternative models for the generation of TBSs. The component substitutions of a given TBS can be acquired simultaneously (left panel) or independently and sequentially over different stages of cell division or even during different cell cycles (right panel). In the former case, the TBS is termed “concurrent,” whereas in the latter case, the TBS is termed “nonconcurrent.” Herein, a double TBS is used for illustration. The principle can, at least in theory, be extended to TBSs of ≥ 3 component substitutions. As indicated in the box, “substituted bases” and “substituting bases” were used to describe the wild-type and mutant sequences of a given TBS, respectively.

In a comparative evolutionary analysis, Averof et al. showed that two adjacent nucleotides can sometimes be simultaneously substituted; further, these workers suggested the potential involvement of this mutational mechanism in human genetic disease [Averof et al., 2000]. With the advent of whole genome and exome sequencing, TBSs have become recognized as a distinct category of human genomic variant [Rosenfeld et al., 2010]. Further, both the analysis of spontaneous mutations in transgenic Big Blue mice [Buettner et al., 1999; Hill et al., 2003] and the sequencing of 21 breast cancer genomes [Nik-Zainal et al., 2012] identified an excess of somatic double-nucleotide substitutions over and above that expected under a model of independently acquired alterations. In parallel with these developments, experimental evidence that TLS polymerases are capable of generating concurrent clustered mutations including TBSs has been steadily accumulating (see [Stone et al., 2012] and references therein).

To date, however, the role of TBSs in causing human inherited disease has not been evaluated in a systematic manner. Using data from the Human Gene Mutation Database (HGMD; [Stenson et al., 2012]), we sought to ascertain the patterns and derive the mutational signatures of TBSs causing human inherited disease. We also attempted to correlate the disease data with those obtained from other relevant sources, particularly cancer exome sequencing, with a view to improving our understanding of the molecular mechanisms of TBS generation in the human germline as well as in the human soma.

Materials and Methods

Collection and Analysis of TBSs Causing or Associated with Human Inherited Disease

TBSs causing or associated with human inherited disease were identified from the 1,847 “small indels” collated in the Professional version of HGMD (as of June 27, 2012). They were manually reviewed and systematically annotated by reference to the sense strands of their respective reference genes. The annotated TBSs (with 310-bp flanking sequences on both sides), together with the original publication references, are available upon request. Patterns and mutational signatures were examined separately for TBSs of ≥4 component base substitutions, triple TBSs, and double TBSs (see Results and Discussion for details).

Collection of Cancer Somatic TBSs

Reports of recent whole-exome sequencing (WES) or whole-genome sequencing (WGS) of cancer were manually reviewed. Somatic TBSs were extracted from the affiliated Supplementary Files wherever possible.

Results and Discussion

An Overview of TBSs Causing Human Inherited Disease

Distinguishing the TBSs from the “Indels”

Until now, TBSs have not been fully appreciated as a distinct category of mutation causing human inherited disease in their own right. Indeed, in HGMD, such mutations are still logged within the category of “small indels.” We visually inspected the genomic coordinates of all 1,847 “small indels” in HGMD (as of June 27, 2012) and identified 477 to be TBSs. Thus, TBSs account for 25.8% of the “small indels” and ∼0.4% of all variants (n = 123,656, also as of June 27, 2012) registered in HGMD (Professional version), respectively.
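The two proportions quoted above follow directly from the three counts given in the text. The short sketch below simply reproduces that arithmetic; the variable and function names are illustrative and are not part of any HGMD tooling.

```python
# Counts quoted in the text (HGMD Professional version, as of June 27, 2012).
n_tbs = 477               # events identified as TBSs
n_small_indels = 1847     # entries logged as "small indels"
n_all_variants = 123656   # all variants registered in HGMD


def percent(part, whole):
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


share_of_indels = percent(n_tbs, n_small_indels)
share_of_all = percent(n_tbs, n_all_variants)

print(f"TBSs among 'small indels': {share_of_indels:.1f}%")
print(f"TBSs among all variants:   {share_of_all:.1f}%")
```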

Authenticity of the collected TBSs

One might pose the question as to whether the component base substitutions of some collected TBSs could actually have occurred in trans (i.e., on two different alleles) instead of in cis. This concern may be of particular relevance in the era of WES or WGS, when huge numbers of samples are analyzed simultaneously. It is however unlikely to be a major issue in the case of the current disease dataset for two reasons. First, the authenticity of the collected TBSs was often ensured by means of disease-segregation and/or the sequencing of cloned DNA sequences. Second, rare ambiguous cases were almost invariably resolved by contacting the original authors before inclusion in HGMD.

Cellular origin of the collected TBSs

Mutations causing or associated with human inherited disease were routinely sought in genomic DNA prepared from peripheral blood, irrespective of the target tissue(s) of the disease. A resulting mutation was considered to have arisen in the germ cells, except in those cases where it was found to have occurred de novo and there was evidence of somatic mosaicism in the carrier. In the latter situation, the mutation would often have occurred postzygotically during early embryonic development. Of the 477 TBSs, 33 were experimentally demonstrated de novo mutations (Table 1) but none were reported to display somatic mosaicism. Consequently, in the absence of any evidence to the contrary, we considered all 477 TBSs to be bona fide germline mutations.

Table 1. General Description of the 477 Tandem Base Substitutions Causing Human Inherited Diseaseᵃ

Classification criteria                        Number       Total (%)
Number of bases substituted
  Doublet                                      448 (31)ᵇ    93.9
  Triplet                                      16 (1)ᵇ      3.4
  Octuplet                                     1 (1)ᵇ       0.2
Genes affected
  ncRNA (both doublets)                        2            0.4
Generation of component base substitutions
  Nonconcurrent (all doublets)                 15           3.1
Mutational mechanism
  See text                                     472          99.0
  Gene conversion (all doublets)               5            1.0

ᵃ Representing all such events registered in the Human Gene Mutation Database (Professional version, as of June 27, 2012).
ᵇ Numbers in parentheses refer to experimentally demonstrated de novo events.

Global distribution patterns of the collected TBSs

Whereas 475 events affected protein-coding genes, only two affected noncoding RNA genes (Table 1). More than 92% (n = 439) of these 475 events were located within the coding sequences and/or splice sites of protein-coding genes; this would appear to be due to clinical selection for functionally significant TBSs.

TBSs comprising two component substitutions (double TBSs or doublets) are the most prominent, accounting for 93.9% (n = 448) of the 477 events listed in HGMD. TBSs comprising three base substitutions (triple TBSs or triplets) constitute the second most frequent subtype but account for only 3.4% (n = 16) of the total. TBSs comprising 4–8 component substitutions (i.e., quadruplet, pentuplet, sextuplet, septuplet, and octuplet) are even rarer, together accounting for only 2.7% (n = 13) of the total (Table 1). However, the fact that such multicomponent substitutions occur at all, and can occur up to octuplet length, in our sample of 477 TBSs, argues strongly for the authenticity of TBSs as a discrete mutational category originating from a common underlying molecular mechanism.

General Points with Respect to Underlying Mutational Mechanisms

TBSs could, in principle, be the net consequence of the sequential accumulation of spontaneous point mutations independently generated during different stages of cell division within the same cell cycle or even during different cell cycles (Fig. 1B). In other words, the component substitutions of a given TBS need not have been generated in a simultaneous or near-simultaneous manner [Chen et al., 2009]. However, the probability for ≥4 tandem bases to be substituted wholly independently is extremely low. Therefore, all the observed TBSs comprising ≥4 component base substitutions were considered to be concurrent clustered mutations. As for the triple and double TBSs, see their respective sections below.

One well established mechanism capable of generating concurrent clustered mutations is gene conversion, which involves the unidirectional transfer of genetic material from a “donor” sequence to a highly homologous “acceptor” sequence [Chen et al., 2007]. We thus first sought to identify the TBSs that could potentially have been generated by this mechanism. Using previously established procedures [Chen et al., 2009; Chuzhanova et al., 2009], we found that five (all doublets) of the 477 TBSs were consistent with a mechanism of gene conversion (Table 1); in each case, the two substituting bases were present in the corresponding positions of the acceptor gene's highly homologous donor paralog(s). We excluded these five probable gene conversion events from further consideration. We also excluded the two events affecting ncRNA genes (both doublets, and classified as concurrent TBSs) from further consideration for reasons of simplicity.

Before providing a detailed description of the remaining 470 TBSs involving doublets, triplets and ≥4 component base substitutions, we would like to emphasize one point. As in the case of various other types of mutation [Chen et al., 2010], a given TBS may be potentially explicable in terms of two or more distinct mutational mechanisms. Indeed, any of the 477 TBSs reported here, if considered separately, could be explained by the error-prone nonhomologous end joining repair of DNA double-strand breaks by virtue of its mechanistic flexibility [Chen et al., 2010; Lieber, 2008]. Only the mechanism(s) considered to be the most likely will be discussed here in any detail.
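The figure of 470 remaining TBSs can be reconciled with the counts in Table 1. The sketch below is only an arithmetic cross-check; the variable names are chosen for illustration.

```python
# Counts taken from the text and Table 1.
total_tbs = 477
doublets, triplets, longer = 448, 16, 13   # longer = TBSs of 4-8 substitutions
gene_conversion = 5                        # all doublets, excluded
ncrna = 2                                  # both doublets, excluded
nonconcurrent_doublets = 15                # Table 1

# The three size classes should partition the full dataset.
assert doublets + triplets + longer == total_tbs

# Events left after removing gene conversion and ncRNA events.
remaining = total_tbs - gene_conversion - ncrna
remaining_doublets = doublets - gene_conversion - ncrna

# Doublets left once the nonconcurrent events are also set aside.
concurrent_doublets = remaining_doublets - nonconcurrent_doublets

print("Remaining TBSs:", remaining)
print("Remaining doublets:", remaining_doublets)
print("Concurrent doublets:", concurrent_doublets)
```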

TBSs of ≥4 Component Base Substitutions

First formal recognition

TBSs of ≥4 component base substitutions have not previously been identified or reported as a distinct mutational category in different types of study including (1) Big Blue mouse studies that focused on the analysis of TBSs [Buettner et al., 1999; Hill et al., 2003], (2) the in vitro characterization of mutational spectra associated with low fidelity DNA polymerases [Stone et al., 2012; Zhong et al., 2006], and (3) surveys of multinucleotide polymorphisms in the human genome [Rosenfeld et al., 2010]. Here, we identified 13 such events, including five quadruplets, three pentuplets, one sextuplet, three septuplets, and one octuplet (Table 1). Of these, one occurred within an exon/intron boundary (EYA1), one occurred within an intron affecting the second base of a consensus splice donor site and the following four bases (HMBS), whereas the remaining 11 TBSs occurred entirely within the coding sequence (Fig. 2).


Figure 2. The 13 pathogenic tandem base substitutions (TBSs) comprising 4–8 component substitutions. All TBSs are annotated in accordance with the sense strand of the affected genes. The bases being substituted (red) as well as 20 bp flanking sequences on both sides are shown for each TBS, with the substituting bases (blue) being indicated below. Coding sequences are in bold, 5′-untranslated region sequences are in purple, and intronic sequences are in normal font. In situ inversions are indicated by dotted crosses, neighboring sequence duplications are indicated by boxes, and balanced deletion and insertion TBSs are indicated by bars. Flanking short inverted repeats are underlined with green dotted lines, whereas flanking short direct repeats are underlined with green solid lines. See text for details.

Mutational signatures and mechanisms

We have previously reported that (1) ≥5 bp insertions occurring in the context of small scale complex rearrangements and (2) ≥5 bp inversions causing human inherited disease tend to be templated by the local DNA sequence environment [Chen et al., 2005a; 2005b; 2005c]. This also appears to be the case for all 13 TBSs of ≥4 component base substitutions examined here (Fig. 2). These 13 TBSs can be further classified into three subtypes.

Ten in situ inversion and neighboring sequence duplication TBSs

In 10 of the 13 TBSs, the substituting bases represent either perfect or near-perfect reverse complements of the substituted bases (ATRX, EYA1, PTCH1, HMBS, and LBR) or have perfect or near-perfect matches to nearby upstream sequences (KAL1, OCA2, GLA, COMP, and UBE3A) (Fig. 2). Here, such lesions are termed in situ inversion and neighboring sequence duplication TBSs, respectively. Of these 10 events, four were characterized by the presence of short flanking direct or inverted repeats on both sides (ATRX, EYA1, OCA2, and HMBS; Fig. 2) and hence are readily explicable by the model of serial replication slippage (SRS) [Chen et al., 2005a; 2005b; 2005c]. SRS invokes ≥2 template switches and accounts for the generation of complex rearrangements including inversions. By contrast, the canonical replication slippage model invokes a single template switch and accounts for the generation of simple deletions or duplications.
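The two sequence signatures defined above can be expressed as a simple string comparison. The following is a minimal sketch rather than the procedure actually used for Figure 2, which was based on manual inspection: the function names, the one-mismatch tolerance used to stand in for “near-perfect,” and the toy sequences are all assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_long_tbs(substituted, substituting, upstream, max_mismatch=1):
    """Assign a TBS to one of the two templated signatures, if possible.

    substituted  -- wild-type bases (sense strand)
    substituting -- mutant bases (sense strand)
    upstream     -- nearby upstream flanking sequence to search for a template
    max_mismatch -- assumed tolerance for a "near-perfect" match
    """
    substituted = substituted.upper()
    substituting = substituting.upper()
    upstream = upstream.upper()
    if len(substituted) != len(substituting):
        raise ValueError("a TBS involves no net gain or loss of bases")

    # In situ inversion: mutant bases are the reverse complement of the originals.
    if mismatches(reverse_complement(substituted), substituting) <= max_mismatch:
        return "in situ inversion"

    # Neighboring duplication: mutant bases match a window of the upstream flank.
    n = len(substituting)
    for start in range(len(upstream) - n + 1):
        if mismatches(upstream[start:start + n], substituting) <= max_mismatch:
            return "neighboring sequence duplication"

    return "unclassified"


# Toy sequences (hypothetical, not taken from Fig. 2).
print(classify_long_tbs("AACG", "CGTT", "TTTTTTTT"))
print(classify_long_tbs("AACC", "GTCA", "TTGTCATT"))
```

As noted for the triple TBSs, the shorter the tract, the more likely such a match arises by chance, so a check of this kind is only meaningful for the longer events.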

SRS is also known as serial template switching, depending upon the context of the discussion. Serial template switching is certainly associated with serial DNA polymerase switching, although serial DNA polymerase switching may not necessarily be associated with serial template switching (e.g., DNA polymerases may switch several times upon the same template). Both SRS and the fork stalling and template switching model [Lee et al., 2007] assume serial replication slippage [Gu et al., 2008] and are contingent upon the action of high-fidelity replicative DNA polymerases. However, these models did not consider the potential impact of low fidelity TLS polymerases, which are able to bypass DNA damage that would normally block replication-fork progression by high-fidelity DNA polymerases [Arana and Kunkel, 2010; Loeb and Monnat, 2008; Sharma et al., 2012; Sutton, 2010].

Incorporation of specific TLS polymerases into the SRS model could potentially account for the generation of the remaining six in situ inversion and neighboring sequence duplication TBSs. Take the UBE3A septuplet (Fig. 2) as an example: upon encountering a damaged base(s), the replicating lagging strand appears to have misaligned with the newly synthesized leading strand through a 2 base-pair (bp) microhomology (i.e., ag); we surmise that continued DNA synthesis by a TLS polymerase would have firstly led to the deletion of a single base (i.e., t) and then to the synthesis of a neo or cryptic microhomology (i.e., cg) with respect to the original template strand; the replicating lagging strand would then have realigned to its original template through this neo microhomology while DNA synthesis would continue to be executed by a high-fidelity replicative DNA polymerase (Fig. 3A). This process need only be modified slightly to account for the generation of the KAL1 quadruplet and COMP sextuplet, both of which are characterized by the presence of a single short direct repeat (Fig. 2).


Figure 3. The serial replication slippage (SRS) model as modified by incorporating a role for a translesion synthesis (TLS) DNA polymerase. This extended SRS model is illustrated in the context of two tandem base substitutions (TBSs), the UBE3A septuplet (A) and LBR septuplet (B) (refer to Fig. 2). In both A and B, the leading strand is in blue and the lagging strand is in red; horizontal arrows indicate the direction of replication; the damaged base(s) blocking replication fork progression by high-fidelity replicative DNA polymerase is indicated by a solid oval; the box indicates the deletion of an internal base. In A, the short direct repeats used for strand misalignment are indicated by green blocks, whereas the neo-microhomology proposed to be generated by TLS DNA polymerase, and used to align back to the original template strand, is indicated by a purple block. In B, the inverted repeats are indicated by green and grey blocks. See text for details.

Let us take the LBR septuplet (Fig. 2) as another example. Upon encountering a damaged base(s), the replicating lagging strand appears to have become misaligned with the leading strand through a 3 bp inverted microhomology; continued DNA synthesis by a TLS polymerase could have generated an inversion flanked by the 3 bp inverted microhomology, with one internal base being deleted; the replicating lagging strand could then have become realigned to its original template through the 3 bp inverted repeat with continued DNA synthesis being executed by a high-fidelity replicative DNA polymerase (Fig. 3B).

The last two events of the 10 in situ inversion and neighboring sequence duplication TBSs, the PTCH1 quadruplet and GLA pentuplet, are not flanked by inverted or direct repeats at all (Fig. 2). They could nevertheless still be explicable in terms of the extended SRS model, if both the first and second misalignments were mediated by neo-microhomologies generated by TLS polymerase(s).

Two balanced deletion and insertion TBSs

Whereas the IDS pentuplet can be alternatively interpreted as the deletion of a cytosine (within a 2 bp cytosine repeat) and the insertion of an adenine (within a 3 bp adenine repeat), the HBB septuplet can be alternatively interpreted as the insertion of a thymine and the deletion of a cytosine (within a 2 bp cytosine repeat) (Fig. 2). These two lesions have here been termed balanced deletion and insertion TBSs. To the best of our knowledge, this type of mutation has never before been recognized. On the basis of the mutational spectrum resulting from the action of TLS polymerases during the bypass of endogenous DNA lesions [Harfe and Jinks-Robertson, 2000; Stone et al., 2012; Zhong et al., 2006], we are obliged to conclude that these two balanced deletion and insertion TBSs could have been generated in a similar way. There exist, however, two possible scenarios. The two component changes in each case may have been due to the error-prone bypass of two closely spaced DNA lesions. Alternatively, error-prone bypass of a single DNA lesion may have predisposed to the generation of the second component change, by analogy to the misincorporation slippage model for generating complex frameshifting mutations by TLS polymerase zeta (see Fig. 2 in [Harfe and Jinks-Robertson, 2000]).

The remaining OFD1 octuplet

The OFD1 octuplet (Fig. 2) is the longest TBS reported to date. Interestingly, four of the eight substituting bases represent an inverse complement of four of the eight substituted bases. This, taken together with the aforementioned in situ inversion TBSs, prompted us to postulate that this mutation could also be explained by the extended SRS model, with the neo-microhomologies flanking the four-base template being synthesized by TLS polymerase(s). Alternatively, it may represent the end result of the simultaneous repair of two closely spaced double-strand breaks by nonhomologous end joining, with the deleted sequence tract being recaptured in an inverted orientation [Chen et al., 2010].

Triple TBSs

Frequency of triple TBSs by reference to double TBSs

We identified a total of 16 triple TBSs, ten of which affected two codons, five affected single codons, whereas the remaining TBS occurred within an exon/intron boundary (Fig. 4). Unlike the TBSs of ≥4 component base substitutions, triple TBSs have previously been described in several different contexts, including somatic mutations in Big Blue mice [Buettner et al., 1999; Hill et al., 2003], multinucleotide polymorphisms in the human genome [Rosenfeld et al., 2010], and clustered mutations generated by human TLS polymerase eta [Matsuda et al., 2001] and yeast TLS polymerase zeta [Stone et al., 2012]. Obviously, it is meaningless to compare directly the total numbers of triple TBSs found in these very heterogeneous studies. Hence, to perform a cross-comparison, we employed a relative ratio, namely the ratio of the number of triple TBSs to the number of double TBSs. Here it should be pointed out that the disease data are strongly biased toward coding sequence variants, as a direct consequence of clinical selection. Therefore, to maximize relevance in the human context, we considered only disease-causing TBSs that affected exonic sequences and splice sites, and accordingly the TBSs that emanated from the Rosenfeld whole exome sequencing study [Rosenfeld et al., 2010]. The ratio of the number of triple TBSs to the number of double TBSs in our study was 4%, which is comparable both to the human exome data (3%) and the yeast TLS polymerase zeta data (3.8%) (Table 2).

Table 2. Frequency of Triple Tandem Base Substitutions (TBSs) by Reference to Double TBSs in Different Studies

Study                                                                    Number of triple TBSs (a)   Number of double TBSs (b)   (a/b) %   Reference(s)
Somatic TBSs in the Big Blue mice                                        1                           65                          1.5       [Buettner et al., 1999; Hill et al., 2003]
Multinucleotide polymorphisms in human exomeᵃ                            5                           164                         3         [Rosenfeld et al., 2010]
Clustered mutations generated by human translesion DNA polymerase eta    5                           84                          6         [Matsuda et al., 2001]
Clustered mutations generated by yeast translesion DNA polymerase zeta   4                           105                         3.8       [Stone et al., 2012]
Exonic TBSs associated with human inherited disease                      16                          396ᵇ                        4         This study

ᵃ Average from eight individual exomes.
ᵇ Excluding the 30 TBSs that occurred within introns (refer to Table 3).
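The relative ratios in Table 2 are simply the number of triple TBSs divided by the number of double TBSs in each study. The sketch below recomputes them from the tabulated counts; the dictionary keys are shortened labels introduced here for convenience.

```python
# (triple TBSs, double TBSs) per study, as listed in Table 2.
studies = {
    "Big Blue mice (somatic)": (1, 65),
    "Human exome polymorphisms": (5, 164),
    "Human TLS polymerase eta": (5, 84),
    "Yeast TLS polymerase zeta": (4, 105),
    "This study (exonic, disease)": (16, 396),
}

# Ratio of triple to double TBSs, expressed as a percentage.
ratios = {name: 100.0 * triple / double
          for name, (triple, double) in studies.items()}

for name, ratio in ratios.items():
    print(f"{name:30s} {ratio:.1f}%")
```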

Figure 4. The 16 pathogenic triple tandem base substitutions (TBSs) and their predicted effect on amino acid sequences and splicing. The shading delimits the substituted bases (upper), with the substituting bases (blue) being indicated below. Coding sequences are in bold and intronic sequences are in normal font. The affected codons are highlighted in red or green. In situ inversions are shown by dotted crosses and potential templates in the immediate vicinity of the substituting bases are boxed.

All triple TBSs were classified as concurrent

TBSs of ≥4 component base substitutions have been invariably considered to be concurrent clustered mutations based upon the assumption that the probability of ≥4 contiguous bases being substituted in a noncontemporaneous manner would be extremely low. The probability of occurrence for three adjacent bases would be somewhat higher. Intuitively, a triple TBS in which one of the components corresponds to a polymorphism in the general population is more likely to be nonconcurrent than a triple TBS where all of the component substitutions were found exclusively in the affected subject. We therefore evaluated the component substituting bases of each triple TBS in the context of their corresponding reference gene sequences with respect to the SNP data added by NCBI during August 2012. In line with our previous study [Chen et al., 2009], we classified all 16 triple TBSs as concurrent because none of the substituting bases was registered as a SNP in a control population dataset (e.g., 1000 Genomes or ESP_Cohort_Populations).

Mutational signatures and mechanisms

As stated above, all 13 TBSs of ≥4 component base substitutions appear to have been templated by the local DNA sequence environment, and all the putative template sequences are located within 15 bp of the flanking sequences (Fig. 2). We therefore sought to establish whether some of the 16 triple TBSs might have potential templates in such a sequence context. Here, it is necessary to point out that the smaller the number of base-pairs involved, the more likely it is that a viable “template” sequence could be found by chance alone. Even with this caveat in mind, no potential “template” sequences could be found for around half of the triple TBSs (Fig. 4). Thus, whereas most of the TBSs of ≥4 component base substitutions were likely to have been templated changes, most of the triple TBSs appeared to be nontemplated changes. Taking the information given in Table 2 into consideration, the most parsimonious conclusion to be drawn appears to be that most of the triple TBSs resulted from incorporation of incorrect nucleotides during bypass of damaged base(s) by TLS polymerases. As discussed previously in the subsection headed “Two balanced deletion and insertion TBSs,” different scenarios could also be envisaged; the three component changes in a given case may have been due to the error-prone bypass of a three-base DNA lesion or alternatively, error-prone bypass of a single base or two-adjacent-base DNA lesion may have predisposed to the generation of the remaining component change(s).

Double TBSs

Classification of double TBSs as nonconcurrent or concurrent

As compared with triple TBSs and TBSs of ≥4 component base substitutions, double TBSs have the highest probability for their components to occur nonconcurrently. Using the same procedures as described for triple TBSs, we classified 15 double TBSs as nonconcurrent (Table 1), each comprising a common SNP and a rare mutation. Here, it is important to point out that the only experimentally demonstrated nonconcurrent double TBS, a CC>TG mutation in the JAG1 gene causing Alagille syndrome [Krantz et al., 1998], is among the 15 events; whereas the 3′ C>G change is a common polymorphism (G allele frequency is 0.043 from 1000 Genomes Project data) and was inherited from the father, the 5′ C>T change was found to have occurred de novo.

Of the 426 double TBSs classified as concurrent (excluding the five probable gene conversion events and the two events affecting noncoding RNA genes), five contained two polymorphic components with an identical allele frequency, whereas all the remaining TBSs comprised two components that were found only in the affected subjects. It is worth mentioning that all 31 experimentally demonstrated de novo double TBSs (both component substitutions were absent from the parents, thereby distinguishing them from the JAG1 case) were found within this dataset. These 426 TBSs were further divided into several subclasses in accordance with their locations within the protein-coding genes (Table 3).
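The classification rule applied to the triple and double TBSs can be summarized in a few lines. This is a minimal sketch under stated assumptions: each component substitution is annotated with its allele frequency in a control dataset, or with None when it is not registered as a SNP. The function name and data layout are illustrative, and the handling of components with differing frequencies is our assumption rather than a rule stated in the text.

```python
def classify_concurrency(allele_freqs):
    """Classify a TBS from the control-population status of its components.

    allele_freqs -- one entry per substituting base: the allele frequency in a
                    control dataset, or None if the base is not a known SNP.
    """
    registered = [f for f in allele_freqs if f is not None]

    # No component seen in controls: all changes are private to the patient.
    if not registered:
        return "concurrent"

    # Every component is polymorphic at the same frequency: the TBS as a
    # whole behaves like a single polymorphic event.
    if len(registered) == len(allele_freqs) and len(set(registered)) == 1:
        return "concurrent"

    # Otherwise at least one component is a SNP that exists independently of
    # the other(s); differing frequencies are treated the same way (assumption).
    return "nonconcurrent"


# JAG1 CC>TG: 5' C>T de novo, 3' C>G a polymorphism (G allele frequency 0.043).
print(classify_concurrency([None, 0.043]))
```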

Table 3. Breakdown of the 426 Concurrent Double Tandem Base Substitutions in Accordance with Their Locations within the Affected Genesᵃ

Location                                        Number   Total (%)
Entirely within coding sequences                374      87.8
Intron/exon boundary                            9        2.1
Exon/intron boundary                            6        1.4
5′-UTR/translation initiation codon boundary    1        0.2
Promoter and/or 5′-UTR                          6        1.4

ᵃ Not including the five probable gene conversion events and the two events affecting ncRNA genes.

Validation of classification by reference to the CpG transition rate

To provide some support for the validity of the above classification, we compared the two datasets by assessing the relative proportions of CpG transition mutations. CpG transitions are C>T (or G>A on the other strand) changes that are compatible with a model of methylation-mediated deamination of 5-methylcytosine. As 5-methylcytosine in the human genome is almost exclusively confined to the CpG dinucleotide, this mechanism accounts for this dinucleotide being a hotspot for spontaneous single base-pair substitutions. We have previously claimed that the proportion of CpG transition mutations, as manifested by the component substitutions from a given set of multiple mutations, can be used as a crude indicator of the relative likelihood of contemporaneous mutation generation; the lower the proportion of CpG transition substitutions, the higher the likelihood that the multiple mutations will have arisen concurrently [Chen et al., 2009]. As a reference point, CpG transitions account for 17% (n = 855) of the 4,933 human spontaneous de novo germline point mutations collated by [Kong et al., 2012], whereas the comparable proportion of CpG transitions among all missense/nonsense mutations in HGMD is ∼20% [Chen et al., 2009].
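The definition of a CpG transition given above translates into a simple test on a substitution and its immediate neighbors. The sketch below is illustrative only: the function names and the toy doublet are assumptions, and it checks each component against the wild-type context without attempting to assign the four types shown in Figure 5.

```python
def is_cpg_transition(ref, alt, prev_base, next_base):
    """True if a single substitution is compatible with 5-methylcytosine
    deamination: C>T within CpG, or G>A within CpG (other strand)."""
    ref, alt = ref.upper(), alt.upper()
    if ref == "C" and alt == "T" and next_base.upper() == "G":
        return True
    if ref == "G" and alt == "A" and prev_base.upper() == "C":
        return True
    return False


def cpg_components(five_prime, wild_type, mutant, three_prime):
    """For a double TBS, report which component (first, second) would qualify
    as a CpG transition in the wild-type sequence context."""
    first = is_cpg_transition(wild_type[0], mutant[0], five_prime, wild_type[1])
    second = is_cpg_transition(wild_type[1], mutant[1], wild_type[0], three_prime)
    return first, second


# Toy doublet CG>TT flanked by A on both sides (hypothetical example).
print(cpg_components("A", "CG", "TT", "A"))

# Reference proportion among de novo germline point mutations [Kong et al., 2012].
cpg_share = 100.0 * 855 / 4933
print(f"CpG transitions among de novo point mutations: {cpg_share:.0f}%")
```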

To perform this comparison properly, we used only the double TBSs that occurred entirely within coding sequences, mainly due to their predominance in both the nonconcurrent (86.7% [13/15]) and concurrent (87.8% [374/426]; Table 3) datasets. As shown in Figure 5, the CpG transitions in double TBSs can be subdivided into four types, whose distributions appear to be qualitatively different between the two datasets. Specifically, the nonconcurrent double TBSs comprised primarily two CpG transition types: (1) either one of their two component substitutions is a CpG transition (type B) or (2) the first substitution predisposed the second by forming a new CpG dinucleotide (type D). In terms of the relative proportion of double TBSs comprising at least one putative CpG transition, the two datasets exhibited a statistically significant difference (85% vs. 11%; χ² test, P < 10⁻⁷).


Figure 5. Types and relative frequencies of CpG transitions in the nonconcurrent and concurrent double tandem base substitution (TBS) datasets. Type A, both component base substitutions represent CpG transitions and occurred within two different CpG dinucleotides. Type B, either of the two component mutations could have occurred as a CpG transition within a single CpG dinucleotide. Type C, one of the two component mutations represents a CpG transition. Type D, the first substitution predisposed the second by forming a new CpG dinucleotide. In the illustrated examples, CpG dinucleotides are boxed and base changes are highlighted in color. The lowest panel indicates the distribution of the different CpG transition types in the nonconcurrent and concurrent datasets. The frequency of double TBSs comprising at least one putative CpG transition in each of the two datasets is also shown in the bottom right-hand box.

To provide further support for the idea that “concurrent” TBSs did indeed occur concurrently, we went on to evaluate the ratio of CpG transitions to CpG transversions. This analysis was based upon the observation that, among spontaneous single base substitutions, transitions at CpG occur much more frequently than transversions at CpG. In this regard, the most accurate and relevant data came from the recent analysis of human spontaneous de novo germline point mutations; the observed number of CpG transitions was 11.7-fold higher than that of CpG transversions (855 vs. 73; [Kong et al., 2012]). If the component base substitutions of the 374 “concurrent” double TBSs were to have arisen predominantly as spontaneous single point mutations in a nonsimultaneous manner, we might reasonably expect to observe many more CpG transitions than CpG transversions. Of the 374 “concurrent” double TBSs that occurred entirely within coding sequences, 38 comprised at least one CpG transition, whereas 46 comprised a CpG transversion; the observed ratio of CpG transitions to CpG transversions is thus 1:1.21, a very significant deviation from the expected 11.7:1 ratio (χ² test, P < 10⁻⁷).
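
As a rough check on the scale of this deviation, the comparison can be sketched as a two-class χ² goodness-of-fit test against the 855:73 reference ratio. This is an illustrative calculation under that assumption, not necessarily the exact procedure used in the study; the function name is hypothetical.

```python
import math


def chi_square_gof_two_classes(observed, expected_ratio):
    """Chi-square goodness-of-fit for two classes against a reference ratio.

    Returns (chi2, p_value). With two classes there is 1 degree of freedom,
    for which the chi-square survival function is erfc(sqrt(chi2 / 2)).
    """
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Observed: 38 CpG transitions vs. 46 CpG transversions among the 374 TBSs.
# Expected ratio: 855 transitions to 73 transversions [Kong et al., 2012].
chi2, p = chi_square_gof_two_classes(observed=(38, 46), expected_ratio=(855, 73))
print(f"chi2 = {chi2:.1f}, p = {p:.1e}")
```

Under these assumptions the resulting P value is many orders of magnitude below the 10⁻⁷ threshold quoted in the text.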

Patterns and mutational signatures associated with concurrent double TBSs

In this section, we focus our analysis on those concurrent double TBSs that occurred entirely within coding sequences, exon/intron boundaries and intron/exon boundaries (Table 3). The other TBSs (i.e., promoter and/or 5′-untranslated region [5′-UTR], 5′-UTR and translation initiation codon boundary, and intron) were not analyzed for two reasons. First, their numbers are often limited (Table 3). Second and most importantly, the functional motifs disrupted in these cases are often not well-defined, which made it difficult to draw meaningful conclusions. The 374 double TBSs that occurred entirely within coding sequences were further divided into seven subtypes (Fig. 6); only the first three subtypes (A, B, and C; n = 366) were employed for detailed analysis.


Figure 6. Subtypes and number of concurrent double tandem base substitutions (TBSs) that occurred entirely within coding sequences. Different codons are shaded in different colors. The three component bases of each codon are indicated by filled circles, with the substituted bases being highlighted in red. In D to G, horizontal lines indicate introns.

GC is the most frequently affected dinucleotide and GC/GC>AA/TT is the most frequent double TBS

There are 16 possible dinucleotides, each with the potential of being subjected to nine possible substitutions (each of the two positions can change to any of the three other bases). Altogether, therefore, there are 144 possible dinucleotide changes. Following this scheme, we first evaluated the distribution of the 366 concurrent double TBSs (subtypes A, B, and C; Fig. 6). As shown in Table 4, GC stands out as being the most frequently substituted dinucleotide (28.2%). In addition, GC>TT (n = 35) and GC>AA (n = 29) are the two most frequent TBSs. Since (1) GC>TT in the sense strand could have originated as GC>AA in the antisense strand, whereas GC>AA in the sense strand could have originated as GC>TT in the antisense strand and (2) the causative strand for each mutation could not be ascertained, they are collectively termed GC/GC>AA/TT TBSs. Obviously, these disease-derived findings cannot be readily generalized to the human germline because of clinical selection. We nevertheless addressed this issue by reevaluating these TBSs in their respective A, B, and C subtypes. The hypothesis behind this attempt is that were these findings to be attributable mainly to clinical selection rather than to their intrinsic sequence context, they would be unlikely to hold across the three subtypes. In other words, clinical selection might affect a dinucleotide in a particular codon context but not in a quite different codon context. To this end, we used nonsense mutations as an indicator of clinical selection because of their unequivocal effect on protein function. Importantly, GC>AA and GC>TT represent the top two TBSs in each of the three subtypes both before and after exclusion of the nonsense mutations (Supp. Table S1). In sharp contrast, the third most frequent TBS, TC>AA (n = 19; Table 4), is present predominantly as subtype B (n = 14), and 10 of these 14 events resulted in nonsense mutations (Supp. Table S1B). The overrepresentation of GC substitutions is certainly not due to an overrepresentation of GC in human coding sequences genome-wide; based on data calculated from a set of 19,250 human genes, GC ranks 5th and 8th among the 16 possible dinucleotides with respect to the first two and the last two bases of codons, respectively, per thousand human codons (Supp. Table S2).
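
The counting scheme used here (16 dinucleotides × 9 substitutions = 144 changes) and the pooling of strand-equivalent changes such as GC>TT and GC>AA can be illustrated with a short sketch. The helper names and the label format are assumptions for illustration; the text itself writes the pooled class as GC/GC>AA/TT.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def all_double_tbs():
    """All (wild-type, mutant) dinucleotide pairs in which both bases change."""
    dinucs = ["".join(p) for p in product(BASES, repeat=2)]
    return [(ref, alt) for ref in dinucs for alt in dinucs
            if ref[0] != alt[0] and ref[1] != alt[1]]


def strand_class(ref: str, alt: str) -> str:
    """Label that is identical for a change and its opposite-strand equivalent."""
    fwd = (ref, alt)
    rev = (reverse_complement(ref), reverse_complement(alt))
    return "/".join(f"{r}>{a}" for r, a in sorted({fwd, rev}))


changes = all_double_tbs()
classes = {strand_class(r, a) for r, a in changes}
print(len(changes), len(classes))  # 144 78
print(strand_class("GC", "TT"))    # GC>AA/GC>TT
```

Because the causative strand cannot be ascertained, the 144 changes collapse into 78 strand-independent classes, of which GC>AA/GC>TT is one.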

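Table 4. Distribution of the 366 Concurrent Double Tandem Base Substitutionsᵃ

Rows, wild-type allele; columns, mutant allele.

| Wild-type allele | AA | AC | AG | AT | CA | CC | CG | CT | GA | GC | GG | GT | TA | TC | TG | TT | Total | % |
| AA | - | - | - | - | - | - | 1 | - | - | 3 | - | - | - | - | 1 | 1 | 6 | 1.6 |
| AC | - | - | - | - | 1 | - | - | 2 | 6 | - | - | 1 | 2 | - | - | - | 12 | 3.2 |
| AG | - | - | - | - | - | 3 | - | 2 | 1 | 1 | - | - | - | - | - | 2 | 9 | 2.5 |
| AT | - | - | - | - | 2 | 2 | - | - | 2 | 2 | - | - | 2 | - | - | - | 10 | 2.7 |
| CA | - | - | 4 | 1 | - | - | - | - | - | - | - | 2 | - | - | - | 2 | 9 | 2.5 |
| CC | 7 | - | 4 | 1 | - | - | - | - | 5 | - | 3 | 2 | 5 | - | 2 | 7 | 36 | 9.8 |
| CG | 1 | 2 | - | 1 | - | - | - | - | 3 | 1 | - | 1 | 1 | 1 | - | 3 | 14 | 3.8 |
| CT | 1 | - | 2 | - | - | - | - | - | - | - | 5 | - | 3 | 2 | 1 | - | 14 | 3.8 |
| GA | - | 1 | 3 | 7 | - | - | - | 2 | - | - | - | - | - | 1 | 2 | 8 | 24 | 6.6 |
| GC | 29 | - | 1 | 7 | 1 | - | 2 | 14 | - | - | - | - | 7 | - | 7 | 35 | 103 | 28.2 |
| GG | 7 | 1 | - | 6 | - | 3 | - | 3 | - | - | - | - | 2 | 4 | - | 11 | 37 | 10.1 |
| GT | 3 | - | 4 | - | - | - | - | - | - | - | - | - | 1 | 2 | 1 | - | 11 | 3.0 |
| TA | - | - | - | 4 | - | - | 1 | - | - | 1 | - | - | - | - | - | - | 6 | 1.6 |
| TC | 19 | - | 2 | 8 | 1 | - | - | 6 | 4 | - | - | 3 | - | - | - | - | 43 | 11.8 |
| TG | 3 | 1 | - | 5 | 1 | 2 | - | 4 | - | 1 | - | 2 | - | - | - | - | 19 | 5.2 |
| TT | 3 | 3 | - | - | 1 | 1 | - | - | - | 4 | 1 | - | - | - | - | - | 13 | 3.6 |

ᵃ Corresponding to subtypes A, B, and C in Figure 6.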

In the absence of sufficiently large germline mutation datasets for comparison, we turned to somatic mutations from recent WES or WGS of cancer. Manual inspection of the data from such studies revealed that TBSs have often not been reported, apparently due to the inherent technical difficulty in validating heterozygous TBSs (without cloning, one can never be certain that two somatic adjacent nucleotide substitutions are in cis) but probably also to the relative lack of awareness of the distinctness of this mutation category. Of the eight types of cancer with informative TBS data, five were analyzed by WES and three by both WES and WGS. In the latter three cases, we opted to use only the WES data for analysis because WES typically analyzed many more samples than WGS (breast cancer, 103 [Banerji et al., 2012] vs. 21 [Nik-Zainal et al., 2012]; prostate cancer, 112 [Barbieri et al., 2012] vs. 7 [Berger et al., 2011]; and melanoma, 135 [Hodis et al., 2012] vs. 25 [Berger et al., 2012]). This served to minimize sample selection bias, a particular concern given the striking heterogeneity of cancer.

We extracted somatic double TBS data from the eight cancer studies [Cancer Genome Atlas Network, 2011; Cancer Genome Atlas Network, 2012; Banerji et al., 2012; Barbieri et al., 2012; Hodis et al., 2012; Imielinski et al., 2012; Pugh et al., 2013; Stransky et al., 2011] and evaluated their distribution patterns (Supp. Table S3). These cancers may be divided into three classes in accordance with the most frequent TBS type (Table 5). The first class includes only melanoma, which is characterized by prominent CC/GG>TT/AA, the well-established mutational signature associated with UV exposure [Brash et al., 1991]. The second class includes lung adenocarcinoma, head and neck squamous cell carcinoma, ovarian carcinoma and colorectal cancer, which are characterized by a preponderance of CC/GG>AA/TT. An excess of G>T transversions is the primary characteristic of the mutagenic signature of DNA damage by constituents of tobacco smoke in lung cancer [Govindan et al., 2012; Pfeifer et al., 2002]. Therefore, by analogy to the UV-light-induced mutational signatures (prominent C>T transitions and CC>TT TBSs) in melanoma [Brash et al., 1991], it is tempting to attribute, at least in part, the prominent GG>TT (or CC>AA in the other strand) TBS observed in lung cancer to smoking. Smoking may also be a causative factor in head and neck cancer (most cases of which are squamous cell carcinomas that develop in the upper aerodigestive epithelium [Argiris et al., 2008]), ovarian carcinoma and colorectal cancer [Cogliano et al., 2011]; we note in passing that there is a correlation between the estimated relative risk conferred by smoking [Parkin, 2011] and the frequency of CC/GG>AA/TT in these cancers (Table 5). The third class includes breast cancer, high-risk neuroblastoma and prostate adenocarcinoma. Prostate adenocarcinoma is not known to be associated with exposure to any exogenous mutagens [Cogliano et al., 2011], and exogenous mutagen exposure can also be effectively excluded as a causative factor for neuroblastoma, an embryonal malignancy of early childhood [Maris, 2010]. As for breast cancer, evidence suggesting an association with tobacco smoking is somewhat limited [Cogliano et al., 2011]. Remarkably, the most frequent TBS type in these three cancers is GC/GC>AA/TT, as we observed in the inherited disease mutation dataset. In contrast to CC/GG>TT/AA and CC/GG>AA/TT, which are associated with the action of exogenous mutagens in melanoma and lung cancer, GC/GC>AA/TT may be more likely to be associated with endogenous mechanisms of mutagenesis.

Table 5. Number of Somatic Single-Base Substitutions and Double Tandem Base Substitutions (TBSs) Detected in Eight Cancer Whole Exome Sequencing Studiesᵃ

| Cancer whole exome sequencing (reference) | Number of single-base substitutions (a) | Number of double TBSs (b) | (b/a)% | Number of the most frequent TBSs (c) | (c/b)% | Established mutagen exposure |
| 135 melanomas [Hodis et al., 2012] | 220,459 | 5,040 | 2.3 | CC/GG>TT/AA (3,867) | 76.7 | UV light |
| 183 lung adenocarcinomas [Imielinski et al., 2012] | 62,676 | 1,044 | 1.7 | CC/GG>AA/TT (457) | 43.8 | Estimated relative risks conferred by smoking (male, 21.3; female, 12.5) [Parkin, 2011] |
| 92 head and neck squamous cell carcinomas [Stransky et al., 2011] | 9,398 | 61 | 0.6 | CC/GG>AA/TT (15) | 24.6 | Estimated relative risks conferred by smoking (male, 10.9; female, 5.1) [Parkin, 2011] |
| 489 ovarian carcinomas [Cancer Genome Atlas Network, 2011] | 18,297 | 217 | 1.2 | CC/GG>AA/TT (18) | 8.3 | Estimated relative risks conferred by smoking (2.1) [Parkin, 2011] |
| 224 colon and rectal carcinomas [Cancer Genome Atlas Network, 2012] | 87,399 | 240 | 0.27 | CC/GG>AA/TT (16) | 6.7 | Estimated relative risks conferred by smoking (male, 1.24; female, 1.30) [Parkin, 2011] |
| 103 primary breast carcinomas [Banerji et al., 2012] | 4,653 | 14 | 0.30 | GC/GC>AA/TT (2) | 14.3 | Evidence of association with smoking is limited [Cogliano et al., 2011] |
| 240 high-risk neuroblastomas [Pugh et al., 2013] | 4,956 | 39 | 0.78 | GC/GC>AA/TT (5) | 12.8 | Not associated with smoking [Cogliano et al., 2011] |
| 112 prostate adenocarcinomas [Barbieri et al., 2012] | 5,642 | 13 | 0.23 | GC/GC>AA/TT (4) | 30.8 | Not associated with smoking [Cogliano et al., 2011] |
| HGMD data (as of June 27, 2012) | 80,298ᵇ | 390ᶜ | 0.49 | - | - | - |
| De novo germline mutations from whole genome sequencing in individuals with autism [Michaelson et al., 2012] | 581 | 2 | 0.34 | - | - | - |

ᵃ Human disease data and a relevant human study are also given in the last row to allow estimation of the TBS mutation rate in the human genome.
ᵇ Including 68,773 missense mutations and 11,525 splicing mutations.
ᶜ Excluding the 6 promoter or 5′-UTR mutations and 30 intron mutations (refer to Table 3).

Identification of a sequence context predisposing to GG>TT

All six double TBSs that occurred within exon/intron boundaries affected a GG dinucleotide. Somewhat surprisingly, four of these resulted in GG>TT changes (Fig. 7). Moreover, half of the six double TBSs that occurred within intron/exon boundaries and affected a GG dinucleotide also resulted in TT substitutions (Fig. 7). These observations cannot be explained solely in terms of clinical selection bias because the substitution of the second G or the first base of the canonical splice donor site in the former cases, and the substitution of the first G or the second base of the canonical splice acceptor site in the latter cases, by any other bases is predicted to result in aberrant splicing. Rather, these observations revealed a strong sequence context effect, with GG>TT occurring preferentially at WGGT (W = T or A).


Figure 7. Pathogenic concurrent double tandem base substitutions occurring at exon/intron and intron/exon boundaries. The shading delimits the substituted bases (upper), with the substituting bases being indicated below. Coding sequences are in bold, whereas intronic sequences are in normal font.

To test the generality of the predisposition of the WGGT motif to GG>TT substitutions, we reevaluated the sequence contexts of the 37 GG-affecting and 36 CC-affecting TBSs among the 366 coding TBSs (Table 4). Having modified the WGGT motif to WGGW and combined the two datasets, we noted a significant difference between the number of GG>TT mutations occurring within WGGW and the number occurring within SGGS (S = C or G) [43.7% (14/32) vs. 20.8% (11/53); χ² test, P = 0.02].
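
The context comparison can be sketched in two steps: classifying the bases flanking a GG (or a CC read from the opposite strand), and testing the reported counts in a 2 × 2 table. This is a minimal illustration; the function names are hypothetical, the "mixed" category is an assumption (how sites with one W and one S flank were handled is not stated), and an uncorrected Pearson χ² test is assumed.

```python
import math

WEAK, STRONG = set("AT"), set("CG")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gg_context(tetramer: str) -> str:
    """Classify the flanks of a central GG; a central CC is read from the
    opposite strand. Returns 'WGGW', 'SGGS', or 'mixed'."""
    t = tetramer.upper()
    if t[1:3] == "CC":
        t = t.translate(COMPLEMENT)[::-1]  # reverse complement
    if t[1:3] != "GG":
        raise ValueError("central dinucleotide must be GG or CC")
    flanks = {t[0], t[3]}
    if flanks <= WEAK:
        return "WGGW"
    if flanks <= STRONG:
        return "SGGS"
    return "mixed"


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].
    Returns (chi2, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# GG>TT vs. other changes: 14 of 32 at WGGW, 11 of 53 at SGGS.
chi2, p = chi_square_2x2(14, 32 - 14, 11, 53 - 11)
print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```

With these counts the uncorrected test gives a P value of roughly 0.02, in line with the figure reported above.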

We then tested whether the WGGW sequence context might also pertain in the case of smoking-associated cancers, wherein CC/GG>AA/TT is the most frequent TBS type. We selected lung adenocarcinoma, ovarian carcinoma and colorectal cancer for this analysis because (1) the first of these exhibits the highest ratio (43.8%) and the latter two have the lowest ratios (8.3% and 6.7%) of CC/GG>AA/TT out of all double TBSs (Table 5) and (2) each has a reasonable number of TBSs. Analysis of the first 200 GG-affecting somatic double TBSs in lung adenocarcinoma showed that GG>TT occurred at essentially the same frequency at WGGW and SGGS [69.3% (52/75) vs. 68.0% (85/125)]. By contrast, GG>TT occurred about 2–2.5 times more frequently at WGGW than at SGGS, both in ovarian carcinoma [40.0% (10/25) vs. 20.5% (8/39)] and colorectal cancer [50.0% (9/18) vs. 20.6% (7/34)], which is quite similar to our findings from the inherited disease-derived data.

Mutational mechanisms underlying double TBS generation in the human germline

Having examined the inherited disease data in the context of codon usage and cross-checked them with the somatic mutation data, we were able to conclude with a reasonable degree of confidence that the distribution pattern of the disease-derived double TBSs in Table 4 essentially reflects that of double TBSs generated in the human genome. That the human germline double TBS distribution pattern differs significantly from that of somatic mutations in those cancers that were associated with exposure to exogenous mutagens (UV and smoking), yet displays considerable similarity to that of mutations from cancers without exposure to any known exogenous mutagens, strongly suggests an endogenous cause for TBS generation in the germline and for the generation of at least a proportion of somatic TBSs. Indeed, in addition to exposure to a variety of exogenous genotoxic agents, cells are constantly exposed to DNA-reactive substances of endogenous origin. For example, reactive oxygen species interact with DNA, resulting in the formation of highly mutagenic 7,8-dihydro-8-oxoguanine (8-oxo-G) lesions [van Loon and Hubscher, 2009]; and lipid peroxidation-derived enals such as acrolein and crotonaldehyde cause DNA damage through forming 1,N²-propanodeoxyguanosine adducts [Chung et al., 1999]. It is possible that bypass of these damaged DNA base(s) by TLS polymerases could sometimes lead to the generation of double TBSs.

Rate of TBSs in the Human Germline

Disease data have traditionally been used to estimate the rate of different types of genomic variant in the human genome, including single-base substitutions [Haldane, 1935; Kondrashov, 2003], L1-mediated insertions [Kazazian, 1999] and copy number variations [Lupski, 2007]. In terms of the rate of single-base substitution, the earlier estimates have been essentially validated by more recent genome-wide approaches, particularly the direct detection of de novo mutations [Campbell et al., 2012; Conrad et al., 2011; Kong et al., 2012; Sun et al., 2012], with the average de novo mutation rate being 1.20 × 10⁻⁸ per nucleotide per generation. In addition, as mentioned earlier, the proportion of single-base substitutions from the inherited disease data that are CpG transitions is 20% [Chen et al., 2009], a figure very close to the 17% obtained from the genome-wide detection of de novo mutations [Kong et al., 2012]. Here we attempted to infer the rate of double TBSs in the human genome by comparing their number to that of single-base substitutions. The inherited disease-derived ratio of double TBSs to single-base substitutions is 0.49%, close to the ratios derived from cancer genome-derived data (Table 5). Data from melanoma and lung adenocarcinoma should be excluded from consideration here because these tumors are known to be strongly associated with the action of exogenous mutagens. Recent whole-genome sequencing in individuals with autism identified two de novo double TBSs and 581 de novo single-base substitutions [Michaelson et al., 2012], the ratio of the former to the latter being 0.34%. The highly consistent findings between different mutation contexts suggest that the rate of double TBSs should be 0.2%–1% of that of single-base substitutions, namely 0.2–1.2 × 10⁻¹⁰ per base per generation. Accordingly, the rate of triple TBSs (Table 2) may be inferred to be 0.8–4.8 × 10⁻¹² per base per generation and that of TBSs of ≥4 bases even lower.
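
The arithmetic behind this estimate can be reproduced with a short sketch. The counts are those listed in Table 5 (melanoma and lung adenocarcinoma excluded, as stated above); the variable names and dataset labels are illustrative assumptions.

```python
SNV_RATE = 1.20e-8  # single-base substitutions per nucleotide per generation (from the text)

# (number of single-base substitutions, number of double TBSs), from Table 5
datasets = {
    "head and neck SCC": (9398, 61),
    "ovarian carcinoma": (18297, 217),
    "colon and rectal carcinoma": (87399, 240),
    "breast carcinoma": (4653, 14),
    "high-risk neuroblastoma": (4956, 39),
    "prostate adenocarcinoma": (5642, 13),
    "HGMD (inherited disease)": (80298, 390),
    "autism de novo (WGS)": (581, 2),
}

# Ratio of double TBSs to single-base substitutions in each dataset
ratios = {name: tbs / snv for name, (snv, tbs) in datasets.items()}
for name, ratio in ratios.items():
    print(f"{name:28s} {ratio:.2%}")

# The 0.2%-1% band adopted in the text, scaled by the single-base rate
low_fraction, high_fraction = 0.002, 0.01
rate_low = low_fraction * SNV_RATE
rate_high = high_fraction * SNV_RATE
print(f"double TBS rate: {rate_low:.1e} to {rate_high:.1e} per base per generation")
```

The lower bound evaluates to 2.4 × 10⁻¹¹, which corresponds to the 0.2 × 10⁻¹⁰ quoted in the text at one significant figure. Most of the individual ratios fall inside the 0.2%–1% band, with ovarian carcinoma slightly above it.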

Conclusions and Perspectives

In the current analysis, TBSs were identified as being an important and distinct category of genomic variant causing human inherited disease. For the first time, the pattern and mutational mechanisms underlying human germline TBSs were systematically investigated. By reference to cancer mutational spectra, we concluded that human germline TBSs are caused predominantly by endogenous sources of DNA damage arising as a consequence of cellular metabolism rather than by exposure to exogenous mutagens. We highlighted the likely fundamental importance of TLS polymerases in generating these diverse TBSs. However, detailed sequence context analysis revealed that TLS polymerases may operate differently in generating TBSs of ≤3 bases than those of ≥4 bases. In the former, TLS polymerases probably incorporate incorrect bases against the damaged base(s) but in the latter, TLS polymerases function through SRS or serial template switching. In support of this idea, some TLS polymerases have been shown to be able to create and then extend primers containing two and sometimes even three consecutive terminal mismatches [Stone et al., 2012]. The extended SRS model incorporating TLS polymerases may find wide application in the generation of complex genomic rearrangements most of whose duplication junctions are associated with microhomologies.

Interestingly, we noted an approximate correlation between the estimated risk conferred by smoking and the frequency of CC/GG>AA/TT in several cancers (Table 5). Although CC/GG>AA/TT may form part of the mutational signature for tobacco smoking in lung adenocarcinoma (and perhaps also in head and neck cancer), its significance remains to be clarified in ovarian carcinoma and colorectal cancer. In this regard, it is perhaps pertinent to mention that GG>TT was identified as being a unique signature mutation caused by the action of the air pollutant peroxyacetyl nitrate in the lung of Big Blue® mice [DeMarini et al., 2000]. Provided that the patterns of mutation in cancer are informative with regard to the DNA damage and repair processes, it is also worth noting that CG>TA stands out as one of the most frequent TBS types in colorectal cancer; indeed, its number (n = 16) equals that of CC/GG>AA/TT (Supp. Table S3). It is tempting to speculate that CG>TA may represent a mutational signature associated with colorectal cancer, where diet is clearly a major risk factor. In addition, aristolochic acid-associated urothelial cancer in Taiwan is characterized by A>T transversions, which are generally rare in other cancers [Chen et al., 2012a]. By analogy to UV- and smoking-associated mutational signatures (prominent C>T and CC>TT in the former and an excess of G>T and GG>TT in the latter), we might expect to see dominant AA>TT in aristolochic acid-associated urothelial cancer once TBSs have been characterized. Indeed, following the same line of reasoning, it may well be that TBSs have the potential to be used to classify human germ-cell mutagens in the future [DeMarini, 2012].

Finally, we note that TBSs were not considered in the recent integrated map of genetic variation derived from 1,092 human genomes [1000 Genomes Project Consortium, 2012], impeding a full appreciation of the variation landscape in the human genome. We hope that our current analysis will draw attention to this unique type of genomic variant, with a view not only to improving our understanding of the underlying mutational mechanisms but also to improving our ability to evaluate its impact on health and disease.


Acknowledgment

We thank Emmanuelle Masson (Brest, France) for help in extracting the tandem base substitution mutations causing human inherited disease.

Disclosure statement. The authors are not aware of any conflict of interest.


References

  • 1000 Genomes Project Consortium, Abecasis GR, Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA. 2012. An integrated map of genetic variation from 1,092 human genomes. Nature 491:5665.
  • Arana ME, Kunkel TA. 2010. Mutator phenotypes due to DNA replication infidelity. Semin Cancer Biol 20:304311.
  • Argiris A, Karamouzis MV, Raben D, Ferris RL. 2008. Head and neck cancer. Lancet 371:16951709.
  • Averof M, Rokas A, Wolfe KH, Sharp PM. 2000. Evidence for a high frequency of simultaneous double-nucleotide substitutions. Science 287:12831286.
  • Banerji S, Cibulskis K, Rangel-Escareno C, Brown KK, Carter SL, Frederick AM, Lawrence MS, Sivachenko AY, Sougnez C, Zou L, Cortes ML, Fernandez-Lopez JC, et al. 2012. Sequence analysis of mutations and translocations across breast cancer subtypes. Nature 486:405409.
  • Barbieri CE, Baca SC, Lawrence MS, Demichelis F, Blattner M, Theurillat JP, White TA, Stojanov P, Van Allen E, Stransky N, Nickerson E, Chae SS, et al. 2012. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat Genet 44:685689.
  • Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, Sboner A, Esgueva R, Pflueger D, Sougnez C, Onofrio R, Carter SL, et al. 2011. The genomic complexity of primary human prostate cancer. Nature 470:214220.
  • Berger MF, Hodis E, Heffernan TP, Deribe YL, Lawrence MS, Protopopov A, Ivanova E, Watson IR, Nickerson E, Ghosh P, Zhang H, Zeid R, et al. 2012. Melanoma genome sequencing reveals frequent PREX2 mutations. Nature 485:502506.
  • Brash DE, Rudolph JA, Simon JA, Lin A, McKenna GJ, Baden HP, Halperin AJ, Ponten J. 1991. A role for sunlight in skin cancer: UV-induced p53 mutations in squamous cell carcinoma. Proc Natl Acad Sci USA 88:1012410128.
  • Buettner VL, Hill KA, Halangoda A, Sommer SS. 1999. Tandem-base mutations occur in mouse liver and adipose tissue preferentially as G:C to T:A transversions and accumulate with age. Environ Mol Mutagen 33:320324.
  • Campbell CD, Chong JX, Malig M, Ko A, Dumont BL, Han L, Vives L, O'Roak BJ, Sudmant PH, Shendure J, Abney M, Ober C, et al. 2012. Estimating the human mutation rate using autozygosity in a founder population. Nat Genet 44:12771281.
  • Cancer Genome Atlas Network. 2011. Integrated genomic analyses of ovarian carcinoma. Nature 474:609615.
  • Cancer Genome Atlas Network. 2012. Comprehensive molecular characterization of human colon and rectal cancer. Nature 487:330337.
  • Chen JM, Chuzhanova N, Stenson PD, Férec C, Cooper DN. 2005a. Complex gene rearrangements caused by serial replication slippage. Hum Mutat 26:125134.
  • Chen JM, Chuzhanova N, Stenson PD, Férec C, Cooper DN. 2005b. Intrachromosomal serial replication slippage in trans gives rise to diverse genomic rearrangements involving inversions. Hum Mutat 26:362373.
  • Chen JM, Chuzhanova N, Stenson PD, Férec C, Cooper DN. 2005c. Meta-analysis of gross insertions causing human genetic disease: novel mutational mechanisms and the role of replication slippage. Hum Mutat 25:207221.
  • Chen JM, Stenson PD, Cooper DN, Férec C. 2005d. A systematic analysis of LINE-1 endonuclease-dependent retrotranspositional events causing human genetic disease. Hum Genet 117:411427.
  • Chen JM, Cooper DN, Chuzhanova N, Férec C, Patrinos GP. 2007. Gene conversion: mechanisms, evolution and human disease. Nat Rev Genet 8:762775.
  • Chen JM, Férec C, Cooper DN. 2009. Closely spaced multiple mutations as potential signatures of transient hypermutability in human genes. Hum Mutat 30:14351448.
  • Chen JM, Cooper DN, Férec C, Kehrer-Sawatzki H, Patrinos GP. 2010. Genomic rearrangements in inherited disease and cancer. Semin Cancer Biol 20:222233.
  • Chen CH, Dickman KG, Moriya M, Zavadil J, Sidorenko VS, Edwards KL, Gnatenko DV, Wu L, Turesky RJ, Wu XR, Pu YS, Grollman AP. 2012a. Aristolochic acid-associated urothelial cancer in Taiwan. Proc Natl Acad Sci USA 109:82418246.
  • Chen JM, Férec C, Cooper DN. 2012b. Transient hypermutability, chromothripsis and replication-based mechanisms in the generation of concurrent clustered mutations. Mutat Res 750:5259.
  • Chung FL, Nath RG, Nagao M, Nishikawa A, Zhou GD, Randerath K. 1999. Endogenous formation and significance of 1,N2-propanodeoxyguanosine adducts. Mutat Res 424:7181.
  • Chuzhanova N, Chen JM, Bacolla A, Patrinos GP, Férec C, Wells RD, Cooper DN. 2009. Gene conversion causing human inherited disease: evidence for involvement of non-B-DNA-forming sequences and recombination-promoting motifs in DNA breakage and repair. Hum Mutat 30:11891198.
  • Cogliano VJ, Baan R, Straif K, Grosse Y, Lauby-Secretan B, El Ghissassi F, Bouvard V, Benbrahim-Tallaa L, Guha N, Freeman C, Galichet L, Wild CP. 2011. Preventable exposures associated with human cancers. J Natl Cancer Inst 103:18271839.
  • Conrad DF, Keebler JE, DePristo MA, Lindsay SJ, Zhang Y, Casals F, Idaghdour Y, Hartl CL, Torroja C, Garimella KV, Zilversmit M, Cartwright R, et al. 2011. Variation in genome-wide mutation rates within and between human families. Nat Genet 43:712714.
  • Cooper DN, Bacolla A, Férec C, Vasquez KM, Kehrer-Sawatzki H, Chen JM. 2011. On the sequence-directed nature of human gene mutation: the role of genomic architecture and the local DNA sequence environment in mediating gene mutations underlying human inherited disease. Hum Mutat 32:10751099.
  • DeMarini DM. 2012. Declaring the existence of human germ-cell mutagens. Environ Mol Mutagen 53:166172.
  • DeMarini DM, Shelton ML, Kohan MJ, Hudgens EE, Kleindienst TE, Ball LM, Walsh D, de Boer JG, Lewis-Bevan L, Rabinowitz JR, Claxton LD, Lewtas J. 2000. Mutagenicity in lung of big Blue® mice and induction of tandem-base substitutions in Salmonella by the air pollutant peroxyacetyl nitrate (PAN): predicted formation of intrastrand cross-links. Mutat Res 457:4155.
  • Drake JW, Bebenek A, Kissling GE, Peddada S. 2005. Clusters of mutations from transient hypermutability. Proc Natl Acad Sci USA 102:1284912854.
  • Govindan R, Ding L, Griffith M, Subramanian J, Dees ND, Kanchi KL, Maher CA, Fulton R, Fulton L, Wallis J, Chen K, Walker J, et al. 2012. Genomic landscape of non-small cell lung cancer in smokers and never-smokers. Cell 150:11211134.
  • Gu W, Zhang F, Lupski JR. 2008. Mechanisms for human genomic rearrangements. PathoGenetics 1:4.
  • Haldane JBS. 1935. The rate of spontaneous mutation of a human gene. J Genet 31:317326.
  • Harfe BD, Jinks-Robertson S. 2000. DNA polymerase ζ introduces multiple mutations when bypassing spontaneous DNA damage in Saccharomyces cerevisiae. Mol Cell 6:14911499.
  • Hastings PJ, Ira G, Lupski JR. 2009. A microhomology-mediated break-induced replication model for the origin of human copy number variation. PLoS Genet 5:e1000327.
  • Hill KA, Wang J, Farwell KD, Sommer SS. 2003. Spontaneous tandem-base mutations (TBM) show dramatic tissue, age, pattern and spectrum specificity. Mutat Res 534:173186.
  • Hodis E, Watson IR, Kryukov GV, Arold ST, Imielinski M, Theurillat JP, Nickerson E, Auclair D, Li L, Place C, Dicara D, Ramos AH, et al. 2012. A landscape of driver mutations in melanoma. Cell 150:251263.
  • Imielinski M, Berger AH, Hammerman PS, Hernandez B, Pugh TJ, Hodis E, Cho J, Suh J, Capelletti M, Sivachenko A, Sougnez C, Auclair D, et al. 2012. Mapping the hallmarks of lung adenocarcinoma with massively parallel sequencing. Cell 150:11071120.
  • Kazazian HH, Jr. 1999. An estimated frequency of endogenous insertional mutations in humans. Nat Genet 22:130.
  • Kloosterman WP, Tavakoli-Yaraki M, van Roosmalen MJ, van Binsbergen E, Renkens I, Duran K, Ballarati L, Vergult S, Giardino D, Hansson K, Ruivenkamp CA, Jager M, et al. 2012. Constitutional chromothripsis rearrangements involve clustered double-stranded DNA breaks and nonhomologous repair mechanisms. Cell Rep 1:648655.
  • Kondrashov AS. 2003. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat 21:1227.
  • Kong A, Frigge ML, Masson G, Besenbacher S, Sulem P, Magnusson G, Gudjonsson SA, Sigurdsson A, Jonasdottir A, Wong WS, Sigurdsson G, Walters GB, et al. 2012. Rate of de novo mutations and the importance of father's age to disease risk. Nature 488:471475.
  • Krantz ID, Colliton RP, Genin A, Rand EB, Li L, Piccoli DA, Spinner NB. 1998. Spectrum and frequency of jagged1 (JAG1) mutations in Alagille syndrome patients and their families. Am J Hum Genet 62:13611369.
  • Lee JA, Carvalho CM, Lupski JR. 2007. A DNA replication mechanism for generating nonrecurrent rearrangements associated with genomic disorders. Cell 131:12351247.
  • Lieber MR. 2008. The mechanism of human nonhomologous DNA end joining. J Biol Chem 283:15.
  • Liu P, Erez A, Nagamani SC, Dhar SU, Kolodziejska KE, Dharmadhikari AV, Cooper ML, Wiszniewska J, Zhang F, Withers MA, Bacino CA, Campos-Acevedo LD, et al. 2011. Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements. Cell 146:889–903.
  • Liu P, Carvalho CM, Hastings PJ, Lupski JR. 2012. Mechanisms for recurrent and complex human genomic rearrangements. Curr Opin Genet Dev 22:211–220.
  • Loeb LA, Monnat RJ, Jr. 2008. DNA polymerases and human disease. Nat Rev Genet 9:594–604.
  • Lupski JR. 2007. Genomic rearrangements and sporadic disease. Nat Genet 39:S43–S47.
  • Maris JM. 2010. Recent advances in neuroblastoma. New Engl J Med 362:2202–2211.
  • Matsuda T, Bebenek K, Masutani C, Rogozin IB, Hanaoka F, Kunkel TA. 2001. Error rate and specificity of human and murine DNA polymerase η. J Mol Biol 312:335–346.
  • Michaelson JJ, Shi Y, Gujral M, Zheng H, Malhotra D, Jin X, Jian M, Liu G, Greer D, Bhandari A, Wu W, Corominas R, et al. 2012. Whole-genome sequencing in autism identifies hot spots for de novo germline mutation. Cell 151:1431–1442.
  • Nik-Zainal S, Alexandrov LB, Wedge DC, Van Loo P, Greenman CD, Raine K, Jones D, Hinton J, Marshall J, Stebbings LA, Menzies A, Martin S, et al. 2012. Mutational processes molding the genomes of 21 breast cancers. Cell 149:979–993.
  • Parkin DM. 2011. 2. Tobacco-attributable cancer burden in the UK in 2010. Br J Cancer 105(Suppl 2):S6–S13.
  • Pfeifer GP, Denissenko MF, Olivier M, Tretyakova N, Hecht SS, Hainaut P. 2002. Tobacco smoke carcinogens, DNA damage and p53 mutations in smoking-associated cancers. Oncogene 21:7435–7451.
  • Pugh TJ, Morozova O, Attiyeh EF, Asgharzadeh S, Wei JS, Auclair D, Carter SL, Cibulskis K, Hanna M, Kiezun A, Kim J, Lawrence MS, et al. 2013. The genetic landscape of high-risk neuroblastoma. Nat Genet 45:279–284.
  • Rosenfeld JA, Malhotra AK, Lencz T. 2010. Novel multi-nucleotide polymorphisms in the human genome characterized by whole genome and exome sequencing. Nucleic Acids Res 38:6102–6111.
  • Sharma S, Helchowski CM, Canman CE. 2012. The roles of DNA polymerase zeta and the Y family DNA polymerases in promoting or preventing genome instability. Mutat Res, doi: 10.1016/j.mrfmmm.2012.11.002.
  • Sheen CR, Jewell UR, Morris CM, Brennan SO, Férec C, George PM, Smith MP, Chen JM. 2007. Double complex mutations involving F8 and FUNDC2 caused by distinct break-induced replication. Hum Mutat 28:1198–1206.
  • Stenson PD, Ball EV, Mort M, Phillips AD, Shaw K, Cooper DN. 2012. The Human Gene Mutation Database (HGMD) and its exploitation in the fields of personalized genomics and molecular evolution. Curr Protoc Bioinformatics Chapter 1:Unit 1.13.
  • Stephens PJ, Greenman CD, Fu B, Yang F, Bignell GR, Mudie LJ, Pleasance ED, Lau KW, Beare D, Stebbings LA, McLaren S, Lin ML, et al. 2011. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144:27–40.
  • Stone JE, Lujan SA, Kunkel TA. 2012. DNA polymerase zeta generates clustered mutations during bypass of endogenous DNA lesions in Saccharomyces cerevisiae. Environ Mol Mutagen 53:777–786.
  • Stransky N, Egloff AM, Tward AD, Kostic AD, Cibulskis K, Sivachenko A, Kryukov GV, Lawrence MS, Sougnez C, McKenna A, Shefler E, Ramos AH, et al. 2011. The mutational landscape of head and neck squamous cell carcinoma. Science 333:1157–1160.
  • Sun JX, Helgason A, Masson G, Ebenesersdottir SS, Li H, Mallick S, Gnerre S, Patterson N, Kong A, Reich D, Stefansson K. 2012. A direct characterization of human mutation based on microsatellites. Nat Genet 44:1161–1165.
  • Sutton MD. 2010. Coordinating DNA polymerase traffic during high and low fidelity synthesis. Biochim Biophys Acta 1804:1167–1179.
  • van Loon B, Hubscher U. 2009. An 8-oxo-guanine repair pathway coordinated by MUTYH glycosylase and DNA polymerase λ. Proc Natl Acad Sci USA 106:18201–18206.
  • Zhong X, Garg P, Stith CM, Nick McElhinny SA, Kissling GE, Burgers PM, Kunkel TA. 2006. The fidelity of DNA synthesis by yeast DNA polymerase zeta alone and with accessory proteins. Nucleic Acids Res 34:4731–4742.

Supporting Information

Disclaimer: Supplementary materials have been peer-reviewed but not copyedited.

humu22341-sup-0001-suppmat.pdf (600K): supplementary material