Predicting mutation frequencies in stem–loop structures of derepressed genes: implications for evolution



This work provides evidence that, during transcription, the mutability (propensity to mutate) of a base in a DNA secondary structure depends both on the stability of the structure and on the extent to which the base is unpaired. Zuker's DNA folding computer program reveals the most probable stem–loop structures (SLSs) and negative energies of folding (–ΔG) for any given nucleotide sequence. We developed an interfacing program that calculates (i) the percentage of folds in which each base is unpaired during transcription; and (ii) the mutability index (MI) for each base, expressed as an absolute value and defined as follows: MI = (% total folds in which the base is unpaired) × (highest –ΔG of all folds in which it is unpaired). Thus, MIs predict the relative mutation or reversion frequencies of unpaired bases in SLSs. MIs for 16 mutable bases in auxotrophs, selected during starvation in derepressed genes, are compared with 70 background mutations in lacI and ebgR that were not derepressed during mutant selection. All the results are consistent with the location of known mutable bases in SLSs. Specific conclusions are: (i) Of 16 mutable bases in transcribing genes, 87% have higher MIs than the average base of the sequence analysed, compared with 50% for the 70 background mutations. (ii) In 15 of the mutable bases of transcribing genes, the correlation between MIs and relative mutation frequencies determined experimentally is good. There is no correlation for 35 mutable bases in the lacI gene. (iii) In derepressed auxotrophs, 100% of the codons containing the mutable bases are within one codon's length of a stem, compared with 53% for the background mutable bases in lacI. (iv) The data suggest that environmental stressors may cause as well as select mutations in derepressed genes. The implications of these results for evolution are discussed.


Gene-specific transcription-induced mutations have been described in growing cells of prokaryotes (Brock, 1971; Herman and Dworkin, 1971; Beletskii and Bhagwat, 1996) and eukaryotes (Datta and Jinks-Robertson, 1995), as well as in starving cells of Escherichia coli (Lipschutz et al., 1965; Wright, 1996; 1997; Wright and Minnick, 1997; Rudner et al., 1999; Wright et al., 1999). Recent reviews have discussed the mechanisms by which starvation and gene derepression can enhance non-random mutations (Foster, 1999; Wright, 2000; Bridges, 2001; Rosenberg, 2001). Previous studies have demonstrated a correlation between mutation rates and transcription as determined by measuring levels of specific mRNA (Wright et al., 1999). However, accurate rates of mRNA synthesis can only be determined when both mRNA concentration and half-life are known. Current studies on mRNA turnover (J. M. Reimers et al., submitted) have confirmed the conclusions of the earlier investigations (Wright et al., 1999).

There are at least two mechanisms by which transcription can increase mutation rates in localized areas of the genome: (i) separation of the two DNA strands exposes the non-transcribed strand, which is then vulnerable to mutations (Lindahl and Nyberg, 1972; 1974; Coulondre et al., 1978; Singer and Sterns, 1982; Fix and Glickman, 1987; Frederico et al., 1990; 1993; Skandalis et al., 1994); and (ii) the advancing RNA polymerase complex drives localized supercoiling (Balke and Gralla, 1987; Liu and Wang, 1987; Dayn et al., 1991; Dayn et al., 1992; Rahmouni and Wells, 1992; Krasilnikov et al., 1999), which creates and stabilizes stem–loop structures (SLSs) containing vulnerable unpaired and mispaired bases (Ripley, 1982; Todd and Glickman, 1982; Pananicolaou and Ripley, 1991). Thus, these two mechanisms, in concert, are ideal for obtaining the variants most likely to survive each kind of stress while minimizing random genomic damage (Wright, 2000). Evidence from many laboratories has demonstrated that transcription drives localized supercoiling and that stem–loop conformations contain mutable bases. However, no direct evidence has yet been presented to document a correlation in specific genes between increased rates of transcription and enhanced mutation localized in unpaired bases of SLSs.

The highest concentration of supercoiling and SLSs occurs in the wake (5′) of the transcription bubble, and SLSs can form in either or both strands (cruciforms). DNA secondary structures not only expose mutable bases, but also increase the time of their exposure by periodically blocking the progress of transcribing RNA polymerase complexes during transcript elongation. Transcription-enhanced mutations should therefore occur preferentially at pause sites, and mutation rates should be increased by activators such as guanosine tetraphosphate (ppGpp), which has been shown to lengthen the duration of these pauses (Kingston et al., 1981). Specific effects of stress on transcription are essential to the mechanisms of non-random mutations in evolution (see Discussion; Wright, 2000).

To explore the possible correlation of secondary structure formation and sites of mutation, Zuker's DNA folding computer program (Benham, 1992; SantaLucia, 1998; Zuker et al., 1999) was used. This program folds a segment of ssDNA, predicts the most thermodynamically stable secondary structure that could form and calculates the negative energy of folding (–ΔG) for each structure (http://www.bioinfo.rpi.edu/applications/mfold/old/dna). For example, a 200 nucleotide (nt) sequence containing a known mutable base is copied as input to the program. The program is then instructed to fold a series of 30 nt segments that include the mutable base, and then to show the structure with the highest –ΔG for that base. The theoretical methods used in screening DNA sequences for sites susceptible to superhelical strand separation and transition to alternative structures have been verified experimentally in other systems (Benham, 1992). A computer program that interfaces with Zuker's program has been developed in this laboratory in order to obtain additional information regarding the mutability of individual bases. This new program calculates the extent to which each base is unpaired during transcription (see Experimental procedures).

In four Escherichia coli auxotrophs (leuB, argH, malT and lacZ) studied in our laboratory, mutant sequences and number of mutations have been determined. Sequences and mutation frequencies of seven trpA auxotrophs were found in the literature (Isbell and Fowler, 1989; Bhamre et al., 2001). As controls, data were also obtained from the literature for background (‘spontaneous’) mutations not isolated in derepressed genes, namely lacI and ebgR (Schaaper et al., 1986; Hall, 1999). The presence of loss-of-function mutations in these repressor genes is revealed indirectly by allowing the constitutive expression of β-galactosidase.


The location of mutable bases in high and low –ΔG areas of a gene

In order to correlate the location of mutable bases in SLSs with the –ΔG of their structures, about 130 nt of the non-transcribed strand, including the mutation(s) in leuB, argH, malT or lacZ, were folded in successive, overlapping 30 nt segments, beginning at each fifth nt (Fig. 1). Thus, each nucleotide was included in six successive folds. The magnitudes of the –ΔG values indicate the relative stability of the structures formed by each segment: the greater the –ΔG value, the more likely a stable secondary structure will form and create a pause site. Figure 1 shows that mutations generally occur in structures with high –ΔGs. That is, in lacZ, leuB, argH, malT and codon 49 of trpA, mutations occur in the highest –ΔG peaks present. In the other three codons of interest in trpA, mutations occur in three of the four highest peaks. It should be pointed out that a high –ΔG results primarily from the stability of the stem(s), and does not necessarily indicate the presence of an unpaired base vulnerable to a chemical change (see Discussion). Presumably, mutations usually occur when an unpaired base exists in its highest –ΔG structure. The SLSs that form at peak –ΔG values in selected genes are described below.
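The windowing scheme described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the program used in this work: the function names `window_starts` and `coverage` are hypothetical, positions are 0-based, and no folding is performed.

```python
def window_starts(seq_len, window=30, step=5):
    """Start positions of successive overlapping windows that fit in the sequence."""
    return list(range(0, seq_len - window + 1, step))


def coverage(position, seq_len, window=30, step=5):
    """Number of windows that include the given 0-based position."""
    return sum(
        1
        for start in window_starts(seq_len, window, step)
        if start <= position < start + window
    )


starts = window_starts(130)
print(len(starts))        # number of 30 nt windows in a 130 nt segment
print(coverage(60, 130))  # an interior nucleotide falls in window/step = 6 folds
```

Nucleotides close to either end of the segment are included in fewer than six windows, which is one reason to centre the mutable base within the sequence analysed.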

Figure 1.

The location and –ΔG values of 16 known mutable bases in E. coli auxotrophs. Beginning at each fifth nucleotide, 130 nt (lacZ, including codon 63; leuB, including codon 286; argH, including codon 4; malT, including codon 545; and trpA, including codon 49) or 260 nt (trpA, including codons 183, 211 and 234) of the non-transcribed strands were folded in successive, overlapping 30 nt windows. Each bar indicates the beginning of a window, and the number of mutable bases in each fold appears above each bar. The entire trpA gene was included in the analysis, but a 340 nt segment without mutable bases was omitted from the following sequence and is indicated by dotted lines. Codon numbers are in parentheses, and mutable bases are capitalized. …gct aat tga agc cgg tgc tga gcg ctg GAg (49) tta ggc atc ccc ttc tcc gac cca ctg gca gat ggc ccg acg att caa … gca ggc gtg aCc (183) ggc gca gaa aac cgc gcc gcg tta ccc ctc aat cat ctg gtt gcg aag ctg aaa gag tac aac gct gca cct cca ttg cag GGa (211) ttt ggt att tcc gcc ccg gat cag gta aaa gca gcg att gat gca gga gct gcg ggc gcg att tct GGt (234) tcg ccc…

The DNA folding program appears to describe a biochemical process that is robust. That is, although the structures formed in the 30 nt window are not as large and stable as those in 40 nt and 50 nt windows (not shown), all three analyses show a peak of high negative energy values and structures with a similar conformation at the site of the mutable base. This suggests a reliable and buffered system for the dynamic interconversion of secondary structures. Thus, despite probable continuous variations in the length of segments folded during transcription, mutation and (presumably) pause sites generally coincide with peak –ΔG values. The most stringent condition used, folding 30 nt (compared with 40 and 50 nt), shows the most specific correlation of maximum peak –ΔG values with the presence of each mutant nucleotide, reflecting, perhaps, the average number of nucleotides folded in vivo (Fig. 1). Although the 50 nt foldings had peaks at the mutation sites, they also had higher peaks elsewhere, and were therefore not as specifically correlated with the mutation sites. Very large windows generate very complex structures but, in choosing a smaller window, we attempted to simulate transcription conditions in silico. Therefore, a 30 nt window was chosen for analysing the mutant sequences. However, in the analysis of lacZ, it was also found necessary to use a 40 nt window in order to understand the mechanism of the triple mutation that occurred (see below).

In the trpA gene (Yanofsky et al., 1981; Isbell and Fowler, 1989; Bhamre et al., 2001), the locations of seven mutable bases in four codons are known, and the sequences containing these bases are shown (Fig. 1). These seven mutations are also located in high –ΔG peaks. In contrast, Fig. 2 shows an apparently random distribution of mutable bases in lacI and ebgR in which transcription was not induced. Note, especially in the large lacI data set (Fig. 2 legend), that most of the 414 mutations that were sequenced are in the first and second positions of the codons (see Discussion).

Figure 2.

The location and –ΔG values of mutable bases in the E. coli repressors, lacI and ebgR. Beginning every fifth nt, 30 nt windows were folded. Each bar indicates the nucleotide at the beginning of the window, and the number of mutable bases in each fold appears above each bar. Wild-type lacI sequence; 57 mutable bases (capitalized) have been found:
…gtg aaa cca gta ACg tta tac gat gTc GCa gag tat Gcc ggt gTc tCt Tat CAG ACc GTt TCc CGc GTg gTg Aac CAg gcc agc cac GTt tCt gcg aaa Acg cgg gaa aaa GTg Gaa gcg GCg atg Gcg gag ctg aat TAC att cCc aAC cgc gTg GCa CAA CAa cTg gCg Ggc Aaa Cag tCg ttg ctg att ggc gtt gcc acc tcc agt cTg gcc…
Wild-type ebgR sequence; 13 mutable bases (capitalized) have been found:
…gac atc gca atc gaa gCt ggc gta Tcc ctg gcg aca Gta tcc AGg gtc tta aat gac gAT ccg aca ttg Aat Gtg aaa gaa gag aCg aaa cat cgc att ctc Gag atc gcc Gaa aag ctg gag taC aag acc…

Calculation of a mutability index for mutable bases

DNA folding analyses have suggested that mutable bases in selected, derepressed genes are preferentially located in high –ΔG SLSs, in contrast to mutable bases in non-induced genes. The mutability of a base should depend both on the stability (–ΔG) of its SLS and on whether the base is unpaired in these structures (Lindahl and Nyberg, 1972; 1974; Coulondre et al., 1978; Singer and Sterns, 1982; Fix and Glickman, 1987; Frederico et al., 1990; 1993; Skandalis et al., 1994). Therefore, a program has been developed that will provide this information for each base (see Experimental procedures). To describe and evaluate these assumed causes of base mutability, we have defined a new term, called the mutability index (MI), and express it as an absolute value:

  • MI = (% total folds in which the base is unpaired) × (highest –ΔG of all folds in which it is unpaired).

Thus, MIs indicate the relative mutation frequencies of bases in predicted SLSs. The highest –ΔG term gives the most weight to the most stable fold containing the unpaired base. That is, primary importance is given to the structure in which the base is exposed for the longest period of time and therefore has the highest probability of mutation. In addition, the average –ΔG and MI of all the bases included in the sequence analysed are calculated, for comparison with individual base –ΔGs and MIs. This information is necessary, as mutable bases exist in different –ΔG regions. For example, the average –ΔG value in the lacZ analysis is 4.5, whereas this value for leuB is 2.4 (Fig. 1).
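The definition can be made concrete with a short sketch. It is not the program used in this work: the `Fold` record and the `mutability_index` function are hypothetical names, the example values are invented for illustration, and the percentage is assumed here to enter as a fraction between 0 and 1, a scaling that the definition above does not specify.

```python
from dataclasses import dataclass


@dataclass
class Fold:
    neg_dg: float    # -ΔG of the folded structure, as a positive magnitude
    unpaired: bool   # True if the base of interest is unpaired in this fold


def mutability_index(folds):
    """MI = (fraction of folds with the base unpaired) x (highest -ΔG among those folds)."""
    if not folds:
        raise ValueError("at least one fold is required")
    unpaired = [f for f in folds if f.unpaired]
    if not unpaired:
        return 0.0  # the base is paired in every fold
    fraction_unpaired = len(unpaired) / len(folds)
    return fraction_unpaired * max(f.neg_dg for f in unpaired)


# Hypothetical folds for one base: unpaired in 2 of 4 folds.
example = [Fold(5.0, False), Fold(4.0, True), Fold(2.0, True), Fold(1.0, False)]
print(mutability_index(example))  # 0.5 * 4.0
```

Note that the most stable fold overall (5.0 in this invented example) does not contribute, because the base is paired in it; only folds in which the base is exposed set the energy term.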

For each of the 16 mutable bases in auxotrophs and the 70 mutable bases in control genes, the MI of the wild-type base was compared with the average MI in the –ΔG region of that base. Of the auxotroph bases, 87% have above-average MIs, compared with 50% of the bases in lacI and ebgR.

Comparing MIs with mutation frequencies

If mutable bases are located in unpaired bases of SLSs predicted by the Zuker DNA folding program, MI values may predict relative mutation frequencies. Sequences and reversion frequencies have been determined for bases in the same codons of seven trpA mutant alleles (Isbell and Fowler, 1989; Bhamre et al., 2001). In Fig. 3, the SLSs with the highest –ΔG for wild type and three mutant alleles in codon 49 are shown. In the wild type, a strong stem with five C*G basepairs is formed. As the G in the first position of the codon is in the stem, its MI is very low (0.3) in this 30 nt SLS, as well as in smaller windows (not shown). However, in higher –ΔG folds of 35 or 40 nt, more stable stems occur, and the MI is zero, i.e. this G is paired in all folds. Therefore, the mutation of this G to an A or T, giving rise to alleles 11 and 88, respectively, is predicted to occur in relatively small SLSs. These latter SLSs have only three C*G pairs in their stems and thus lower –ΔG values (see bar graph in Fig. 3). However, as they are unpaired much of the time, their MIs are much higher (2.3 and 2.5) than that of the wild-type G (0.3). The relative MI values calculated for the three mutable bases in these mutants compare favourably with their relative reversion frequencies (Isbell and Fowler, 1989; Bhamre et al., 2001; see Fig. 3, table inset). Mutation of the middle base in codon 49 (from an A to a T) retains the strong stem (allele 3), and the MI of that T reverting to wild-type A is the highest of all (3.5). These relative MIs predict that the reversion frequency of allele 3 should be higher than that of 11 and 88, and that the latter alleles should have comparable reversion frequencies. The table inset compares MIs with reversion frequencies. Figure 4 summarizes similar data for four more mutant alleles in codons 211 and 234 of trpA. 
Again, the relative MIs compare well with their relative reversion frequencies, and MIs of the mutant bases are higher than those of the wild-type base to which they revert.

Figure 3.

Predicted SLSs at peak –ΔG values are shown for wild type and three mutant alleles of codon 49 of trpA. The predicted MI values for the mutations (arrows) are compared in the table inset with mutation frequencies determined experimentally in two data sets. See Table 1 and text for details.

Figure 4.

Predicted SLSs at peak negative energy values for wild type and two mutant alleles for codons 211 and 234 of trpA. Predicted MIs and mutation frequencies determined experimentally are compared in the table inset. See Table 1 and text for details.

Figures 3 and 4 also show that the highest –ΔG SLSs of wild-type sequences are often the same as that of their mutant alleles. For example, in codon 49, the most stable SLS of allele A 3 is the same as that of wild type. Similarly, A 23 and A 26 in codon 211 and A 78 and A 58 in codon 234 have the same SLSs as their respective wild types. Transcription should enhance the mutation frequency of all mutable bases, including those of wild type, by localized increases in the number of SLSs.

The most probable SLSs formed by the non-transcribed strands (NTSs) at peak –ΔG values were compared with those formed by the transcribed strands (TSs) and usually found to mirror one another. In such genes, therefore, mutations are equally probable in the NTS and TS. However, in the argH gene (and, to a lesser extent, leuB), different structures form in 30 nt folds of the two strands, as shown in Fig. 5A. This occurs because of nearest neighbour interactions, the asymmetry of the base stacking rules and because a stem structure containing a G*T bond forms in the TS (SantaLucia, 1998; Zuker et al., 1999). This stem cannot form in the NTS, which has a C and an A in a comparable region of the SLS. The longer stem present in the TS bestows a higher –ΔG and higher MIs to that SLS compared with the NTS, and predicts that mutations are more likely in the TS. In both the TS and the NTS, the MIs of the mutable bases are somewhat higher than the MIs of the wild-type bases (not shown). In the NTS, folding 40 nt gives similar MIs for the first two codon positions but increases the MI in the third position from 1.5 to 2.8. In all folds above 20 nt, the third codon position has the highest MI value, consistent with the number of mutations determined by sequencing 23 revertants (Fig. 5A, table inset).

Figure 5.

Predicted SLSs of argH (A) and leuB (B) genes. The predicted MI of each mutable base is indicated by an arrow. Inset table shows the MI of each mutable base compared with the number of mutations (No.) determined by sequencing 23 revertants in the case of argH and 36 in the case of leuB. See text for discussion.

In Fig. 5B, a qualitative comparison of MIs with the number of mutation events is shown for the leuB mutant. In most cases, 30 nt SLSs have higher –ΔG values than 20 nt SLSs but, in this case, the stems are the same, and the long sequences at the ends of the 30 nt SLSs may well be paired with their complements. As shown in the table inset of Fig. 5B, the MIs of the mutable bases are comparable and consistent with the results of sequencing 36 revertants (Wright and Minnick, 1997).

Table 1 summarizes the comparisons in the same codon between calculated MIs and reversion frequencies for the seven trpA auxotrophs, and Table 2 summarizes sequencing data for the argH, leuB and lacZ auxotrophs. Data are also available for a comparable analysis of 414 background mutable bases in lacI (Schaaper and Dunn, 1991; Fig. 6). In this sequence, all mutable bases within a codon in which at least one base had three or more mutational events were analysed, giving a total of 35. The number of mutants shown above each base was counted, and the results are summarized in Table 3. Mutation frequencies (trpA) and number of mutations (lacI) are plotted against MIs to obtain linear regression analyses of the data in Tables 1 and 3 (Fig. 7). The R² value for the trpA alleles is 0.57 and for the lacI data set is 0.002. Although the qualitative comparisons of MIs and mutation events are good for the mutable bases in Table 2, there are too few data points for this type of analysis.
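For readers who wish to repeat this kind of comparison, the coefficient of determination for a straight-line fit can be computed as sketched below. The function name `r_squared` is hypothetical and the numbers in the usage lines are invented to show the two extremes; they are not data from Tables 1 or 3.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    # For a single predictor, R^2 equals the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)


# Invented values: a perfectly linear relation and an uncorrelated one.
print(r_squared([1, 2, 3, 4], [2, 4, 6, 8]))
print(r_squared([1, 2, 3], [1, 2, 1]))
```

With only a handful of points, as for the alleles in Table 2, a value computed this way is too unstable to interpret, which is why no regression is reported for that table.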

Table 1. Comparisons in the same codons between MIs and reversion frequencies in seven mutable alleles of the derepressed trpA gene.

Codon   Mutant allele   MI     Revertants/10⁸ cells
                               Expt Aᵃ     Expt Bᵇ
 49     trpA 3
 49     trpA 11
 49     trpA 88         2.5    NDᶜ         0.3
211     trpA 23         3.6    1.1         ND
211     trpA 46         3.1    0.8         ND
234     trpA 58
234     trpA 78         3.4    ND          1.8

Table 2. Comparisons between MIs and relative numbers of mutations determined by sequencing in eight mutable alleles of three derepressed genes.

Mutant   Codon   Base change         MI       No. of mutations
argH       4     T to G, C, or A      0.8      8
           4     G to T or C          0.8      2ᵃ
           4     A to C, G, or T      1.5     13
leuB     286     T to A or G          2.6     20
         286     T to C               2.6     16
         286     G to A, C or T      Not observed
lacZ      63     T to C, G or A      13.9      7ᵇ
          63     A to G or C         13.9     19
          63     G to C or T         13.9     12
          63     TAG to AGC          Triple    9

  a. This number of revertants is lower than expected, but mutation of G to A results in a stop codon that would not be observed.

  b. Theoretically, mutations in the first position should occur as frequently as those in the second and third positions. This value may be low due to (observed) lower growth rates of these revertants, just as the A to G revertants (wild type) in the second position may be high because these revertants have the highest growth rate.

Figure 6.

The number of background mutations in lacI (Schaaper and Dunn, 1991).

Table 3. Comparisons in the same codon between MIs and the number of background mutations in 35 mutable bases of lacI.

Allele   MI   No. of mutations

Figure 7.

Linear regression analyses for the data in Tables 1 and 3. Mutation frequencies or number of mutations are plotted against MIs for seven mutable alleles in the derepressed trpA gene (Isbell and Fowler, 1989; Bhamre et al., 2001), and for 35 background mutable alleles in lacI (Schaaper and Dunn, 1991).

The triple mutant

The lacZ stop codon mutant and its triple revertant are shown in Fig. 8. As mentioned previously, a 30 nt window was chosen for the analysis of SLSs. However, when the very long and stable stem was observed, it was necessary to increase the window to 32 nt in order to include the TAG codon (Fig. 8). In the course of sequencing 47 of the revertants, we observed many types of single base revertants, a four codon deletion and, on nine occasions, a triple mutation (Table 2). As seen in Table 2, a number of amino acid substitutions can result in an active enzyme, although different growth rates are observed. This triple mutation must have arisen by a templating mechanism. The reversion frequency of this stop codon is ≈ 10⁻⁹, and three simultaneous independent mutations would have the unlikely frequency of (10⁻⁹)³. Inspection of the sequences at the end of the 32 nt SLS revealed that, were the structure extended to 40 nt, a bubble would appear in which templating from the complementary strand could change the TAG stop codon to the triple mutant shown in Fig. 8. There is precedent for this mechanism in the literature (Ripley, 1982), and the existence of this templated triple mutation provides compelling evidence for the role of SLSs in vivo.
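The improbability of three independent events can be written out explicitly. The estimate below assumes, as an idealization used only for this illustration, that each of the three base changes would arise independently at the single-site reversion frequency of about one in a billion quoted above:

```latex
P_{\text{triple}} \approx \left(10^{-9}\right)^{3} = 10^{-27}
```

By contrast, the triple mutant was isolated in 9 of the 47 revertants sequenced, which a single templated event explains far more readily than three coincident point mutations.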

Figure 8.

The 32 nt and 40 nt SLSs of lacZ. When 40 nt are folded, it can be seen that the triple mutation, AGC, could arise by templating to the complementary TCG sequence in the opposite strand.
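The base-pairing relationship named in the legend can be checked with a few lines of code. This is only a sketch of Watson–Crick complementarity; the helper name `complement` is hypothetical, and strand orientation (5′ to 3′ versus 3′ to 5′) is deliberately ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base Watson-Crick complement, written in the same left-to-right order."""
    return "".join(COMPLEMENT[base] for base in seq.upper())


print(complement("AGC"))  # TCG
```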


Evidence in the literature indicates that DNA sequence can determine the location of a background mutation, but the frequency with which it occurs depends primarily upon various kinds of DNA-destabilizing metabolic events. When growth is inhibited, DNA replication is minimized and transcription of specific derepressed genes in response to an inhibitor or condition of stress may then be a major cause of mutations (Wright, 2000). In 11 strains of leuB and argH auxotrophs, isogenic except for relA, a correlation was found between mutation rates, ppGpp and mRNA levels (Wright, 1996; Wright and Minnick, 1997; Longacre et al., 1999; Wright et al., 1999). Similar results have been obtained in B. subtilis, in which reversion rates of three different amino acid auxotrophs were higher in strains able to produce the transcriptional activator guanosine tetraphosphate (Rudner et al., 1999). Current studies with malT and lacZ auxotrophs also indicate that those strains able to activate transcription (cya+, crp+) have higher reversion rates than the strains (cya and crp mutants) unable to activate transcription (unpublished data). The presence of these mutations in unpaired bases of predicted SLSs implicates supercoiling and is consistent with the literature.

Background point mutations in unpaired bases occur by known chemical mechanisms having finite, significant activation energies under physiological conditions, and such lesions can subsequently be immortalized by replication. The thermodynamic properties of hydrolytic reactions that occur in nucleic acids under these conditions are such that the rate of deamination of C is much greater than that of A, and the rate of depurination of G or A is much greater than that of depyrimidination of C or T (reviewed by Singer and Sterns, 1982). Also, the oxidation potential of a G located 5′ to another G is greater than it is next to a C or T (reviewed by Sugden and Stearns, 2000). Because of its size, an A is more likely than a C or a T to replace a G. The lacI data set (Fig. 6) is sufficiently large to document these statements, i.e. 75% of the G mutations are to A, and 88% of the C mutations are to T. A recent analysis (Wright et al., 2002) of 14 000 mutations in the human p53 gene shows a similar pattern, with values of 74% and 91% respectively.

A significant source of unpaired and mispaired bases subject to the kinds of mutations described above is the loops of SLSs, which can often elude detection and repair (Moore et al., 1999). These secondary DNA structures are naturally present because of the frequent occurrence of inverted complements in DNA sequences (Lilley, 1980; Papyotatos and Wells, 1981). It should be pointed out that, in general, specific positions in these SLSs (e.g. a base near a stem) are uniquely vulnerable to mutation, regardless of which base is present at that position. During transcript elongation and the probable continuous variations in the length of the nucleotide segments folded in vivo, the stems, which create the SLSs, are the invariant part. Thus, the most reliable location for an unpaired base that evolved to be mutable is near a stem. In prokaryotes, the most mutable bases are in codons that are within a codon's length of a stem (Figs 3–5 and 8). In the eukaryotic p53 tumour suppressor gene, the most mutable bases are located immediately next to stems in stable SLSs. The predicted MIs correlate well (R² = 0.76) with those observed for 14 000 human cancers, whereas no such correlation (R² = 0.0005) is seen for nearby control bases (Wright et al., 2002).

Do measured mutation rates really represent mutation rates in vivo? An appropriate gene to consider in discussing the true frequency of background mutations (whether or not they are silent) is lacI, in which 25% of the bases are known to be mutable (Fig. 6). However, 90% of these mutations occur in the first two codon positions, as changes in the third position are usually silent, i.e. do not alter the encoded amino acid. As mutations are also undoubtedly occurring in the third position, these can be added to the 57 observed mutated bases. Thus, about 40% of the bases in lacI are vulnerable to mutation. This does not include all the other silent mutations that do not inactivate this repressor. Thus, the majority of all bases appear to be mutable, but many of these are not detectable.

Although SLSs are similar for mutable bases in the auxotrophs and in lacI, a number of differences have been noted. In lacI, mutable bases occur more frequently at a greater distance from a stem, or in an unlooped, single strand 8–11 bases from a stem. The existence of such a structure may be unlikely, as that length of ssDNA would probably be bonded to its complement as dsDNA. The same mechanisms that give rise to the lacI mutations must of course occur at some frequency in derepressed auxotroph genes.

The analysis of background mutable bases in lacI indicates that as many mutations occur in high as in low –ΔG SLSs. In this system, all mutations have equal probabilities of being detected and sequenced. However, this is not the case with revertants of auxotrophs selected in derepressed genes during starvation. Transcription enhances localized supercoiling and increases the concentration of SLSs that harbour unpaired bases susceptible to mutation. Therefore, these vulnerable bases located in high –ΔG SLSs are selected during starvation. Such bases will mutate the most frequently, enrich the mutant population with the most progeny and have the highest probability of being isolated.

Many kinds of environmental challenge can target specific genes for supercoiling and enhanced mutagenesis. Data summarized in this article indicate that nutritional stress can localize enhanced mutability in SLSs resulting from selective transcription. These circumstances can result in ‘forward’ mutations beneficial to evolution. For example, during starvation for ribitol in the presence of xylitol, two existing genes are modified to regulate the transport and use of xylose as a new carbon source (Lerner et al., 1964; Wright, 2000).

Other stressors, such as extreme osmotic conditions, heat shock, host–pathogen interactions or metal toxicity, could direct mutations to related genes potentially able to alleviate these problems (Borowiec and Gralla, 1987; Higgins et al., 1988; Pruss and Drlica, 1989; Ansari et al., 1992; Dorman, 1995; Jordi et al., 1995; Lefstin and Yamamoto, 1998; Massey et al., 1999).

Specific areas of the genome are targeted for localized transcription and supercoiling, increasing the concentration of SLSs and unpaired bases vulnerable to mutation. This mechanism provides a continuous feedback loop between the ever-changing environment, productive targets for localized enhanced mutation frequencies and selection of the fittest. As most mutations are detrimental, directing mutations to those genes regulating the cell's response to each type of stress minimizes genome-wide genetic damage while creating the most appropriate variants for accelerating evolution.

Just as DNA secondary structures may be considered an additional dimension of gene expression (Wells et al., 1980), so may the mutations that occur because of their location in these structures. These structures are present many thousands of times more frequently than expected by chance (Lilley, 1980) and, in viral DNAs, they are over-represented in regions that appear to have regulatory significance (Müller and Fitch, 1982). In higher organisms, these hypermutable sequences have apparently become localized to increase variants in genes of critical importance to the survival of the organism (Forsdyke, 1995). However, the same mechanisms by which stress can cause mutations and accelerate evolution in microorganisms may cause cancer in multicellular organisms (Rady et al., 1992; Ionov et al., 1993; Skandalis et al., 1994; Panagopoulos et al., 1997; Colman et al., 2000), where survival of the fittest mutant can result in abnormal growth.

There is reason to believe that the evolution of the DNA world was paralleled by the development of systems such as supercoiling to help maintain the integrity, geometry and functionality of DNA (López-García, 1999). The essential role of negative supercoiling in the regulation of all aspects of DNA behaviour, especially in response to environmental stress, is becoming increasingly apparent. If specific conditions of stress target particular areas of the genome for negative supercoiling, the formation of secondary structures will be facilitated; if stress first activates transcription, this will in turn drive supercoiling. By either scenario, the resulting increase in concentration of SLSs will result in localized mutations, and a number of circumstances can apparently converge to make responses to particular adverse conditions quite specific.

In bases within the same codon, differential effects of other variables affecting mutability should be minimal. These contiguous bases obviously have a very similar microenvironment with respect to supercoiling domains (Sinden and Pettijohn, 1981; Krasilnikov et al., 1999), ssDNA binding proteins, accessory proteins involved in transcription, repair systems and so on. Therefore, comparing their relative mutation frequencies or events is ideal for evaluating the relevance of the factors used in calculating MI, and for testing the validity of our proposed model for predicting mutability. In fact, all our results are consistent with the location of mutable bases in SLSs. Based on this assumption, MIs for mutable bases in selected genes have, with unanticipated success, predicted 15 relative mutation frequencies (trpA) or events (argH, leuB and lacZ). This correlation occurred in spite of the fact that an MI depends upon only two variables: (i) the highest –ΔG of all the SLSs in which it is unpaired; and (ii) the extent to which the base is unpaired in all its foldings during transcription.

Experimental procedures

The analysis of DNA secondary structures

A new software tool called ‘mfg’ was developed for use in this paper for predicting mutation frequencies. Given an input sequence, mfg generates, for each base, the set of all subsequences containing that base. From the folded structures of this set, mfg calculates the percentage of folds in which the base is single-stranded and the energy of the most stable structure in which the base is unpaired. These are multiplied to produce the MI value for that base. A full description of this program and instructions for its use can be found at http://biology.dbs.umt.edu/wright/upload/mfg.html. The program currently runs on the Windows platform only, but source code is available for porting to other platforms.
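The MI calculation described above can be illustrated with a minimal Python sketch. This is not the mfg source code: the `Fold` record, the dot-bracket representation of each fold and the example energies are assumptions made for illustration only, and the folding step itself (carried out with Zuker's program) is not reproduced. The sketch only shows how the percentage of folds in which a base is unpaired and the highest –ΔG among those folds combine into an MI.

```python
from dataclasses import dataclass


@dataclass
class Fold:
    """One folded subsequence (hypothetical record, not the mfg data format)."""
    start: int       # 0-based position of the subsequence in the full sequence
    structure: str   # dot-bracket string: '.' = unpaired, '(' or ')' = paired
    delta_g: float   # free energy of folding in kcal/mol (negative = stable)


def mutability_index(base_index, folds):
    """Return (percent_unpaired, highest_neg_dg, mi) for one base.

    percent_unpaired: % of folds containing the base in which it is unpaired
    highest_neg_dg:   highest -deltaG among the folds in which it is unpaired
    mi:               product of the two values
    """
    covering = [f for f in folds
                if f.start <= base_index < f.start + len(f.structure)]
    if not covering:
        raise ValueError("no fold contains this base")
    unpaired = [f for f in covering
                if f.structure[base_index - f.start] == "."]
    percent_unpaired = 100.0 * len(unpaired) / len(covering)
    if not unpaired:
        # The base is paired in every fold, so its MI is taken as zero here.
        return percent_unpaired, 0.0, 0.0
    highest_neg_dg = max(-f.delta_g for f in unpaired)
    return percent_unpaired, highest_neg_dg, percent_unpaired * highest_neg_dg


# Invented example: four overlapping folds that all contain base 3.
example_folds = [
    Fold(start=0, structure="((..))", delta_g=-2.0),  # base 3 unpaired
    Fold(start=1, structure="(...).", delta_g=-1.5),  # base 3 unpaired
    Fold(start=2, structure="((..))", delta_g=-4.0),  # base 3 paired
    Fold(start=3, structure="(....)", delta_g=-1.0),  # base 3 paired
]

percent, neg_dg, mi = mutability_index(3, example_folds)
print(percent, neg_dg, mi)  # 50.0 2.0 100.0
```

In this invented example, the most stable fold overall (–ΔG = 4.0) does not contribute to the MI of base 3 because the base is paired in that fold; only folds in which the base is unpaired determine the energy term.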

Mutation rate conditions used to obtain revertants for sequencing

Mutation rates for argH and leuB were determined as described previously (Wright and Minnick, 1997). The reversion rate of codon 63 of lacZ was determined as follows: E. coli CSH2 cells from a 12- to 24-h-old rich medium plate were diluted in 75 ml of minimal medium, consisting of 50 mM sodium phosphate buffer, pH 6.5, 1.0 g l−1 (NH4)2SO4, 1.0 g l−1 MgSO4, 0.3 mM glucose and 0.3 mM tryptophan to a cell density of 5 × 10−4 ml−1. This inoculum was divided into 45 test tubes (2 cm diameter) at 1.5 ml per tube and grown for 18–28 h with shaking, at a 45° angle at 37°C. Cell growth was followed by hourly sampling of companion tubes at a 1:10 dilution in buffered saline. When the cell concentration reached the end of log growth, serial dilutions were made from two companion cultures, and 10−6 and 10−7 dilutions were plated on rich nutrient agar plates and incubated for 24 h at 37°C to determine viable counts. The contents of 40 culture tubes were distributed on selective [5 g l−1 lactose, salts (as above), 0.3 mM tryptophan, 0.1 mM NaCl] agar plates. Plates were incubated for 40–48 h at 37°C and evaluated for revertants (Luria and Delbrück, 1943). Although mutation rates per se are not reported in this paper, the above procedures were used to obtain the revertants sequenced (Tables 2 and 3).
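For readers unfamiliar with fluctuation tests, the arithmetic behind such an experiment can be sketched as follows. The sketch assumes the null-class (p0) estimate associated with Luria and Delbrück (1943), in which the mean number of mutational events per culture is m = –ln(p0), where p0 is the fraction of cultures yielding no revertants. The text above does not state which estimator was applied, and the colony counts, plated volume and function names below are hypothetical values chosen only to make the example concrete.

```python
import math


def viable_count_per_ml(colonies, dilution, volume_plated_ml):
    """Viable cells per ml of undiluted culture from one dilution plate."""
    return colonies / (dilution * volume_plated_ml)


def p0_mutation_rate(cultures_without_revertants, total_cultures,
                     cells_per_culture):
    """Null-class (p0) estimate of mutation rate per cell per generation.

    Assumes at least one culture, but not necessarily all, is free of
    revertants; with no revertant-free cultures the estimate is undefined.
    """
    if not 0 < cultures_without_revertants <= total_cultures:
        raise ValueError("p0 method needs at least one culture without revertants")
    p0 = cultures_without_revertants / total_cultures
    m = -math.log(p0)  # mean mutational events per culture
    return m / cells_per_culture


# Hypothetical numbers: 120 colonies on a 10^-6 dilution plate, 0.1 ml plated,
# 1.5 ml cultures, and 20 of 40 selective plates without revertants.
cells_per_ml = viable_count_per_ml(120, 1e-6, 0.1)
cells_per_culture = cells_per_ml * 1.5
rate = p0_mutation_rate(20, 40, cells_per_culture)
print(f"{cells_per_ml:.2e} cells per ml, rate = {rate:.2e}")
```

The p0 estimate is only usable when some, but not all, cultures lack revertants, which is one reason culture volume and final cell density are chosen with care in experiments of this kind.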

Sequencing of revertants

Revertants were isolated from mutation rate plates containing a single colony at 72 h, restruck on selective plates and grown for 24 h at 37°C. An inoculum was transferred to 2 ml of Luria–Bertani (LB) broth and grown for 8 h with shaking at 37°C. The culture was transferred to a microcentrifuge tube and pelleted at 10 000 g for 10 min. A Qiagen DNeasy tissue kit was used to extract chromosomal DNA. The lacZ DNA was amplified by polymerase chain reaction (PCR) in a 50 µl volume using 500 ng of template and the primer pair: LacZ5′ (5′-ATGACCATGATTACGGATTC-3′) and LacZ3′ (5′-TTATTTTTGACACCAGACCAAC-3′) used at a concentration of 50 pmol per reaction. Other components of the PCR, all from New England BioLabs, were 10× polymerase buffer, VentR DNA polymerase, MgSO4 and dNTPs. After 30 cycles, 2 µl of the product was visualized on a 1% agarose gel containing 0.5 µg ml−1 of ethidium bromide, and the remainder was cleaned with a Qiaquick PCR purification kit (Qiagen). Sequencing was performed at the Murdock Molecular Biology Facility using a BigDye Terminator ready reaction kit with an Applied Biosystems Model 373A stretch DNA sequencer.


We are indebted to Drs Robert G. Fowler, Bryn Bridges, Roel M. Schaaper and Scott Samuels for valuable criticisms of and suggestions for the manuscript. We thank Dr George Card for the CSH2 E. coli strain. This work was supported by National Institutes of Health grants R15CA88893 and R55CA99242 and the Stella Duncan Memorial Research Institute.