RATES OF MOLECULAR EVOLUTION IN BACTERIA ARE RELATIVELY CONSTANT DESPITE SPORE DORMANCY


Errata

This article is corrected by:

  1. Errata: Correction for Maughan 2007 Volume 66, Issue 3, 945, Article first published online: 9 February 2012

Present address: Department of Zoology, Life Sciences Center, 2350 Health Sciences Mall, University of British Columbia, Vancouver, BC, Canada V6T 1Z3; E-mail: maughan@zoology.ubc.ca

Abstract

Rates of molecular evolution are known to vary considerably among lineages, partly because of differences in life-history traits such as generation time. The generation-time effect has been well documented in some eukaryotes, but its prevalence in prokaryotes is unknown. Because many species of Firmicute bacteria spend long periods of time as metabolically dormant spores, which could result in fewer DNA substitutions per unit time, they present an excellent system for testing predictions of the molecular clock hypothesis. To test whether spore-forming bacteria evolve more slowly than their non-spore-forming relatives, I used phylogenetic methods to determine whether rates of amino acid substitution differ between spore-forming and non-spore-forming lineages of Firmicute bacteria. Although rates of evolution do vary among lineages, I find no evidence for an effect of spore formation on evolutionary rate; furthermore, evolutionary rates are similar to those calculated for enteric bacteria. These results support the notion that variation in generation time does not affect evolutionary rates in bacterial lineages.

The observation that sequences of homologous proteins diverge at a constant rate, first noted by Zuckerkandl and Pauling (1962, 1965) and subsequently addressed most notably by Ohta (1972, 1987, 1993), Kimura (1983, 1987), and Gillespie (1991), sparked considerable controversy over whether a molecular clock exists. The notion of a global molecular clock in eukaryotes has been largely rejected because generation time and other life-history traits influence evolutionary rates (Li 1993; Martin and Palumbi 1993; Mooers and Harvey 1994). Generation-time effects on evolutionary rates are well documented in mammals, where synonymous sites have repeatedly been found to evolve more slowly in primates than in rodents, which have short generation times (Wu and Li 1985; Li et al. 1987; Li 1993; Ohta 1993; Elango et al. 2006). Although the opposing force of selection makes this effect weaker at nonsynonymous sites than at synonymous sites, a subtle generation-time effect is still seen at nonsynonymous sites (Wu and Li 1985; Ohta 1993). Generation-time effects may not be found in all eukaryotes; in plants, for example, the results are conflicting (Gaut et al. 1996; Whittle and Johnston 2003). Generation time does, however, affect evolutionary rates in some viruses (Hanada et al. 2004).

Whether a molecular clock exists in bacteria is not known, partly because there is little fossil evidence with which to calibrate divergence dates. Using dates of ecological events for calibration, Ochman and Wilson (1987) found that substitution rates of 16S rDNA in several bacterial lineages are remarkably similar. A later study by Ochman et al. (1999) showed that although the rate of synonymous substitution in proteins varies among bacterial lineages, 16S rDNA diverges at a constant rate. These studies leave unresolved whether differences in generation time affect evolutionary rates in bacteria. Environmental and genetic effects make generation times highly variable both among and within bacterial species, and this variability contributes to the difficulty of determining whether generation-time effects apply to the rate of molecular evolution in bacteria.

Firmicute bacteria are an excellent system for testing whether generation time affects the rate of evolution in bacteria. Many Firmicute bacteria are able to form spores, which are morphological structures resistant to many environmental assaults (reviewed in Nicholson et al. 2000). Spore formation results in metabolic dormancy, which enables these bacteria to spend long periods of time without replicating DNA and thus without generating mutations through replication errors (Sneath 1962; Kennedy et al. 1994; Nicholson et al. 2000). Furthermore, spore-forming bacteria are thought to be found most often as spores in nature (Priest and Grigorova 1990). Assuming that long periods of dormancy are an important part of the life history of spore-forming bacteria, these observations suggest that spore-forming bacteria may have a lower mutation rate, and thus a lower substitution rate, per unit time because of the generation-time effect. Although the proportion of time spore-forming bacteria spend as spores (vs. vegetative cells) is not known, it is reasonable to assume that the average generation time of a spore-forming taxon is longer than that of a non-spore-forming taxon.

I tested the hypothesis that spore-forming bacteria evolve more slowly than closely related non-spore-forming bacteria. The results show that rates of protein evolution do vary, but this variation cannot be attributed to the ability to form spores. The average rate of substitution was calculated for Firmicutes, and the variance surrounding this rate includes the rate calculated for enteric (i.e., intestinal) bacterial lineages (Ochman and Wilson 1987). This similarity in amino acid substitution rate between distantly related bacteria with very different life-history strategies suggests that molecular evolution proceeds at a similar rate in these two groups of bacteria. This result has implications for our understanding of the relative contributions of replication-dependent (RD) and replication-independent (RI) mutation to long-term substitution patterns in bacteria.

Materials and Methods

OBTAINING THE ORTHOLOG SET

The complete genome sequences of 23 Firmicute bacteria and one cyanobacterial outgroup were downloaded from the NCBI website (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi). The species and the accession numbers of their genomes are listed in Table 1. A set of orthologous loci common to all 24 genomes was identified using reciprocal smallest distance (RSD) (Wall et al. 2003). RSD uses the Basic Local Alignment Search Tool (BLAST) to identify all pairs of homologous loci between two genomes, aligns each homolog pair, and then calculates the distance between the members of each pair. The homolog pair separated by the shortest distance is assumed to be homologous because of a speciation event (orthology) rather than a duplication event (paralogy) (Wall et al. 2003). All 23 ingroup genomes were queried against the outgroup genome, and the resulting 23 output files were combined into one file containing the 75 genes shared by all genomes (gene list is available for download from the author's website: http://www.zoology.ubc.ca/~redfield/whoHMaughan.html).
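The reciprocal smallest-distance criterion described above can be sketched in a few lines of Python. This is a minimal illustration, not the RSD software of Wall et al. (2003): it assumes the BLAST searches, alignments, and distance estimates have already been computed and stored in a dictionary keyed by (gene in genome A, gene in genome B). The gene identifiers and distances are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical input: (gene in genome A, gene in genome B) -> estimated distance.
Distances = Dict[Tuple[str, str], float]


def closest_match(query: str, distances: Distances, query_in_a: bool) -> Optional[str]:
    """Return the gene in the other genome with the smallest distance to `query`."""
    best_gene: Optional[str] = None
    best_dist = float("inf")
    for (a_gene, b_gene), dist in distances.items():
        if query_in_a and a_gene == query and dist < best_dist:
            best_gene, best_dist = b_gene, dist
        elif not query_in_a and b_gene == query and dist < best_dist:
            best_gene, best_dist = a_gene, dist
    return best_gene


def reciprocal_smallest_distance(distances: Distances) -> List[Tuple[str, str]]:
    """Keep (a, b) only when each gene is the other's smallest-distance match."""
    orthologs = []
    for a_gene in sorted({a for a, _ in distances}):
        b_gene = closest_match(a_gene, distances, query_in_a=True)
        if b_gene is not None and closest_match(b_gene, distances, query_in_a=False) == a_gene:
            orthologs.append((a_gene, b_gene))
    return orthologs


example_distances = {
    ("a1", "b1"): 0.20, ("a1", "b2"): 0.50,
    ("a2", "b1"): 0.10, ("a2", "b2"): 0.60,
    ("a3", "b3"): 0.30,
}
orthologs = reciprocal_smallest_distance(example_distances)
# a1's closest match is b1, but b1's closest match is a2, so a1 gets no ortholog.
```

Running this for each ingroup-versus-outgroup comparison and intersecting the resulting ortholog lists would give the genes shared by every genome.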

Table 1.  Taxa used in this study, their spore-forming ability, and the GenBank accession number for their complete genome sequence (http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi).
Taxon | Spore-forming? | GenBank accession
Bacillus anthracis Ames | Yes | AE016879
Bacillus cereus ATCC 14579 | Yes | AE016877
Bacillus thuringiensis serovar konkukian strain 97-27 | Yes | AE017355
Bacillus subtilis 168 | Yes | AL009126
Bacillus licheniformis DSM 13 | Yes | AE017333
Bacillus halodurans C-125 | Yes | BA000004
Oceanobacillus iheyensis HTE831 | Yes | BA000028
Listeria innocua CLIP 11262 | No | AL592022
Listeria monocytogenes EGD-e | No | AL591824
Staphylococcus aureus subsp. aureus MW2 | No | BA000033
Staphylococcus epidermidis ATCC 12228 | No | AE015929
Clostridium acetobutylicum ATCC 824 | Yes | AE001437
Clostridium perfringens 13 | Yes | BA000016
Clostridium tetani E88 | Yes | AE015927
Thermoanaerobacter tengcongensis MB4 | No | AE008691
Enterococcus faecalis V583 | No | AE016830
Lactobacillus plantarum WCFS1 | No | AL935263
Lactobacillus johnsonii NCC533 | No | AE017198
Lactococcus lactis subsp. lactis IL1403 | No | AE005176
Streptococcus pneumoniae R6 | No | AE007317
Streptococcus pyogenes M1 GAS | No | AE004092
Streptococcus agalactiae 2603V/R | No | AE009948
Streptococcus mutans UA159 | No | AE014133
Synechocystis sp. PCC 6803 | No | BA000022

After each gene was aligned separately in ClustalX (http://www.ebi.ac.uk/clustalw), all gaps were removed, neighbor-joining (NJ) trees were inferred, and 1000 bootstrap pseudo-replicates were performed in PAUP* 4.0b10 (Swofford 2003). The 75 resulting gene trees were compared, and nine genes were eliminated because their gene trees clearly disagreed in phylogenetic structure with the others, most likely because of horizontal transfer. After these nine genes were excluded, the alignments of the remaining 66 genes were concatenated into one dataset totaling 18,867 amino acid sites. An additional screen produced a second dataset from which genes evolving in a non-clock-like manner (possibly because of positive selection) were eliminated. The distances between ingroup and outgroup ortholog pairs, which are part of the RSD output, were used to identify genes that showed excessive divergence from the outgroup taxon relative to the other ortholog pairs within and among genomes. This screen excluded 19 genes, giving a second dataset of 47 genes whose alignments were concatenated into one dataset totaling 13,918 amino acid sites.
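Assuming that "all gaps were removed" means deleting every alignment column that contains a gap in any taxon (the text does not say which convention was used), the gap removal and concatenation steps can be sketched as follows. Taxon labels and sequences are made up for illustration.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned protein sequence


def strip_gap_columns(aln: Alignment, gap: str = "-") -> Alignment:
    """Remove every alignment column in which any taxon has a gap."""
    taxa = list(aln)
    length = len(aln[taxa[0]])
    if any(len(seq) != length for seq in aln.values()):
        raise ValueError("aligned sequences must all have the same length")
    keep = [i for i in range(length) if all(aln[t][i] != gap for t in taxa)]
    return {t: "".join(aln[t][i] for i in keep) for t in taxa}


def concatenate(alignments: List[Alignment]) -> Alignment:
    """Join per-gene alignments end to end into a single supermatrix."""
    taxa = sorted(alignments[0])
    for aln in alignments:
        if sorted(aln) != taxa:
            raise ValueError("every gene must contain the same set of taxa")
    return {t: "".join(aln[t] for aln in alignments) for t in taxa}


gene1 = {"Bsub": "MK-LV", "Spne": "MRALV"}
gene2 = {"Bsub": "GG", "Spne": "G-"}
gene1_clean = strip_gap_columns(gene1)  # third column dropped
supermatrix = concatenate([gene1_clean, strip_gap_columns(gene2)])
```

Requiring identical taxon sets mirrors the ortholog screen, which kept only genes present in all 24 genomes.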

To ensure that the amino acid sites were not saturated with substitutions, pairwise amino acid divergences were plotted against pairwise 16S rDNA divergences. The plot showed a linear relationship between the two divergence estimates (r² = 0.83; data not shown), which suggests that because 16S sites are not saturated, neither are the amino acid sites. The 16S rDNA data were not used for further analyses because they could not resolve phylogenetic relationships in the Firmicute bacteria using the methods described below (data not shown).
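The r² reported above is the squared correlation of a least-squares line through points that each represent one pair of taxa (16S rDNA divergence on one axis, amino acid divergence on the other). The underlying data are not shown in the article, so the sketch below only illustrates the calculation on hypothetical inputs.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation, i.e. r^2 of an ordinary least-squares line."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

A high r² on its own does not rule out curvature, so plotting the points, as done here, remains the more direct check for a saturation plateau.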

PHYLOGENETIC INFERENCE

The 47-gene concatenated dataset was used to infer phylogenetic relationships among the taxa using NJ (1000 bootstrap pseudo-replicates) and parsimony (1000 bootstrap pseudo-replicates) in PAUP* 4.0b10 (Swofford 2003), maximum likelihood in ProtML from PHYLIP (Felsenstein 2002), and Bayesian inference in MrBayes 3.0 (Huelsenbeck and Ronquist 2001). All methods produced trees with the same topology except NJ, which differed only in its placement of the Listeria and Staphylococcus clade. The NJ tree was compared with the trees obtained using MrBayes and ProtML in phylogenetic analysis by maximum likelihood (PAML) (Yang 1997) and had a significantly lower likelihood score than the MrBayes tree. The 66-gene dataset was analyzed with all of the same methods except Bayesian inference. Bayesian inference was not performed on this dataset because the three other methods produced trees identical to those from the 47-gene dataset, and ProtML branch lengths were almost identical between the two datasets.

The phylogeny used for further analyses and presented in Figure 1 was obtained from the 47-gene dataset using MrBayes (Huelsenbeck and Ronquist 2001), with runs set up as follows. Before the MrBayes analyses, ProtTest (Abascal et al. 2005) was used to choose the best-fitting amino acid substitution model, rtREV (Dimmic et al. 2002); MrBayes runs were performed under both rtREV and JTT, and the results did not change. Individual loci were placed in separate character partitions, except for the 18 ribosomal protein loci, which were combined into a single partition. Four chains searched parameter space for 1 million generations, and trees were sampled every 100 generations after a 10,000-generation burn-in. Several thousand generations were always sufficient for the Ln-likelihood values of the four chains to converge.
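The partitioning scheme described above (one partition per locus, with all ribosomal protein loci pooled) can be expressed as MrBayes `charset` and `partition` commands. The sketch below generates those commands from gene lengths in concatenation order; the gene names and lengths are hypothetical, and this is an illustration of the scheme, not the author's actual input file.

```python
from typing import Dict, List, Set, Tuple


def mrbayes_partition_commands(
    gene_lengths: List[Tuple[str, int]],
    pooled_genes: Set[str],
    pooled_name: str = "ribosomal",
) -> str:
    """Build MrBayes charset/partition commands for a concatenated alignment.

    Genes in `pooled_genes` share one character set; every other gene gets its own.
    Positions are 1-based and follow the order of `gene_lengths`.
    """
    spans: Dict[str, List[str]] = {}
    start = 1
    for gene, length in gene_lengths:
        end = start + length - 1
        name = pooled_name if gene in pooled_genes else gene
        spans.setdefault(name, []).append(f"{start}-{end}")
        start = end + 1
    lines = [f"charset {name} = {' '.join(ranges)};" for name, ranges in spans.items()]
    lines.append(f"partition by_gene = {len(spans)}: {', '.join(spans)};")
    lines.append("set partition = by_gene;")
    return "\n".join(lines)


# Hypothetical gene names and lengths (amino acid sites), for illustration only.
genes = [("rplB", 270), ("gyrA", 800), ("rpsC", 210), ("recA", 340)]
commands = mrbayes_partition_commands(genes, pooled_genes={"rplB", "rpsC"})
print(commands)
```

The substitution model and chain settings (four chains, 1 million generations, sampling every 100 generations) would be specified with further MrBayes commands such as `prset` and `mcmc`, which this sketch does not generate.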

Figure 1.

Phylogenetic tree depicting the evolutionary relationships among the Firmicute bacteria. All clades have a posterior probability of 1.0 unless otherwise noted. Black and gray branches represent non-spore-forming and spore-forming taxa, respectively.

STATISTICAL TESTS FOR DIFFERENCES IN RATES OF EVOLUTION

Likelihood ratio tests were used to determine whether models without a clock fit the data significantly better than global and local clock models; the tests used Ln-likelihood values from nested PAML runs (Yang 1997).
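A likelihood ratio test compares twice the difference in -Ln likelihood between a restricted (clock) model and the full (no-clock) model against a chi-squared distribution. The sketch below computes the chi-squared tail probability with the regularized incomplete gamma function (series and continued-fraction evaluation) so that it needs only the Python standard library. The degrees of freedom are an assumption, since the text does not report them: for a strict global clock on a rooted tree of n taxa versus an unrooted no-clock tree, the usual count is n - 2, which would be 22 for the 24 taxa here.

```python
import math


def _log_prefactor(a: float, x: float) -> float:
    # log of x^a * exp(-x) / Gamma(a), shared by both evaluation methods below
    return -x + a * math.log(x) - math.lgamma(a)


def _lower_gamma_series(a: float, x: float) -> float:
    """Regularized lower incomplete gamma P(a, x) by series (used when x < a + 1)."""
    term = total = 1.0 / a
    ap = a
    for _ in range(10_000):
        ap += 1.0
        term *= x / ap
        total += term
        if abs(term) < abs(total) * 1e-15:
            break
    return total * math.exp(_log_prefactor(a, x))


def _upper_gamma_contfrac(a: float, x: float) -> float:
    """Regularized upper incomplete gamma Q(a, x) by Lentz's continued fraction."""
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(_log_prefactor(a, x)) * h


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail probability of a chi-squared variable with `df` degrees of freedom."""
    if stat <= 0:
        return 1.0
    a, x = df / 2.0, stat / 2.0
    if x < a + 1.0:
        return 1.0 - _lower_gamma_series(a, x)
    return _upper_gamma_contfrac(a, x)


def likelihood_ratio_test(neg_lnl_restricted: float, neg_lnl_full: float, df: int):
    """Return (2 * delta -Ln likelihood, P-value) for nested models."""
    stat = 2.0 * (neg_lnl_restricted - neg_lnl_full)
    return stat, chi2_sf(stat, df)


# 47-gene global clock vs. no clock, -Ln likelihoods from Table 2; df = 22 is assumed.
stat, p = likelihood_ratio_test(326914.402210, 326709.565820, df=22)
```

For this row the statistic is about 409.7, which rounds to the 410 listed in Table 2; for very large statistics the tail probability underflows to 0.0, which is simply reported as P < 0.0001.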

Absolute rates of molecular evolution were calculated by importing MrBayes or maximum-likelihood branch lengths into r8s, a program that relaxes the assumption of a molecular clock when calculating rates of evolution and divergence times by using rate-smoothing techniques (Sanderson 1997, 2002, 2003). To avoid the variance associated with fossil-based dates, the tree was calibrated for the CAIC analyses (see below) by arbitrarily setting the age of the root to 1.0, so that all node ages are expressed relative to the root. For rate comparisons with other bacteria, the root was also calibrated at 3.5, 3.0, and 2.5 billion years ago. These dates encompass many fossil-based and molecular estimates of the appearance of Cyanobacteria and of the divergence between Cyanobacteria and other Eubacteria (Knoll 2003). Penalized likelihood rate smoothing with a smoothing factor of 0.0001 (optimized using the cross-validation commands) was used in r8s (Sanderson 1997, 2002, 2003).
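Fixing the root age at 1.0 yields rates in substitutions per site per unit of relative time; calibrating the root at an absolute age rescales those rates by the root age. The sketch below shows only this unit conversion for a single hypothetical relative rate; it does not implement the penalized-likelihood smoothing performed by r8s. Because the rate is inversely proportional to the assumed root age, rate multiplied by root age should be roughly constant across the three calibrations, which matches Table 4 (0.96 × 3.5 ≈ 1.12 × 3.0 ≈ 1.35 × 2.5 ≈ 3.4).

```python
def rate_per_50_myr(relative_rate: float, root_age_gyr: float) -> float:
    """Convert a rate per unit of relative time (root age = 1.0) into a rate
    per 50 million years, given an assumed absolute root age in billions of years."""
    if root_age_gyr <= 0:
        raise ValueError("root age must be positive")
    root_age_myr = root_age_gyr * 1000.0
    # One relative time unit spans root_age_myr million years.
    return relative_rate / root_age_myr * 50.0


relative_rate = 1.0  # hypothetical placeholder, not a value from this study
rates = {age: rate_per_50_myr(relative_rate, age) for age in (3.5, 3.0, 2.5)}
```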

Independent contrast analysis performed in the comparative analysis by independent contrasts package (CAIC; Purvis and Rambaut 1995) was used to test for variables that could explain a significant amount of variation in evolutionary rate. Contrast data from CAIC output were imported into JMP 1.0.1.2 (SAS Institute, Inc., Cary, NC) for linear regression analysis with the regression line being forced through the origin.
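Independent contrasts are conventionally regressed through the origin because the direction of each contrast is arbitrary (contrasts are typically oriented so that the predictor contrast is positive). The sketch below shows a least-squares fit without an intercept and the uncentered r² commonly reported for such models; JMP's output may be computed differently, and the example values are hypothetical.

```python
from typing import Sequence, Tuple


def regression_through_origin(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope of y = b * x (no intercept) and the uncentered r^2."""
    if len(x) != len(y) or not x:
        raise ValueError("need two non-empty sequences of equal length")
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    slope = sxy / sxx
    r2 = sxy * sxy / (sxx * syy)  # equivalent to 1 - SSE / sum(y^2)
    return slope, r2


# Hypothetical contrasts: predictor contrasts paired with rate contrasts.
slope, r2 = regression_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

With only two contrasts, as in the analysis of spore formation reported below, such a fit has very little power.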

Relative rate tests were performed for each gene in the 66-gene dataset in RRTree (Robinson et al. 1998). Taxa were partitioned into group 1 (spore-forming) and group 2 (non-spore-forming), and for each of the 66 genes a relative rate test was conducted to estimate the rate of evolution in each group, taking the topology of the phylogeny into account. An additional set of tests was performed for each of the 66 genes with all Clostridium species excluded from group 1. From this output, genes were assigned to one of two groups, evolving more quickly in spore-formers or evolving more quickly in non-spore-formers, regardless of whether RRTree indicated a significant difference in rate between the two. Gene-length weights for each group were calculated by summing the amino acid sites of the genes in that group and dividing by the total number of amino acid sites in the 66-gene dataset. A sign test was performed to determine whether the difference in the number of genes between the two groups was statistically significant.
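The gene-length weights and sign test described above can be sketched as follows. The sign test here is the standard exact two-sided binomial test on gene counts, under the null hypothesis that each gene is equally likely to evolve faster in either group. The text does not state exactly how the test was computed (for example, whether length weights entered it), so this sketch is not guaranteed to reproduce the P-values in Table 3. The gene lengths and outcomes in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple


def length_weights(gene_lengths: Sequence[int], faster_in_spore_formers: Sequence[bool]) -> Tuple[float, float]:
    """Fraction of all amino acid sites in genes faster in spore-formers vs. non-spore-formers."""
    total = sum(gene_lengths)
    spore = sum(n for n, fast in zip(gene_lengths, faster_in_spore_formers) if fast)
    return spore / total, (total - spore) / total


def sign_test_two_sided(n_first: int, n_second: int) -> float:
    """Exact two-sided sign test: P of a split at least this uneven under Binomial(n, 0.5)."""
    n = n_first + n_second
    k = min(n_first, n_second)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2.0 * tail)


lengths = [100, 200, 300, 400]             # hypothetical amino acid sites per gene
faster_in_sf = [True, False, True, False]  # hypothetical RRTree outcomes
weights = length_weights(lengths, faster_in_sf)
p_value = sign_test_two_sided(8, 0)  # e.g. 8 genes faster in one group, 0 in the other
```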

SPORULATION GENES

The numbers of genes known to function in sporulation (from experimental work in Bacillus subtilis and C. acetobutylicum) were taken from Onyenwoke et al. (2004). Sporulation genes were counted from Tables 2–5 of Onyenwoke et al. (2004), and the data were analyzed with a Student's t-test and a Wilcoxon rank-sum test in JMP version 1.0.1.2 (SAS Institute, Inc.).
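As an illustration of the nonparametric comparison, the sketch below computes the Wilcoxon rank-sum statistic with average ranks for ties and a two-sided P-value from the normal approximation without a tie correction. JMP may use a different variant, and no counts from Onyenwoke et al. (2004) are reproduced here; each input value would be one species' number of sporulation genes.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values 1..n, giving tied values the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Rank-sum W for sample x and a two-sided normal-approximation P-value."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    w = sum(ranks[:n1])
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    var_w = n1 * n2 * (n1 + n2 + 1) / 12.0  # no correction for ties
    z = (w - mean_w) / math.sqrt(var_w)
    return w, math.erfc(abs(z) / math.sqrt(2.0))
```

For small samples an exact permutation distribution is usually preferred over the normal approximation used here.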

Results

PHYLOGENETIC RECONSTRUCTION AND THE EVOLUTION OF SPORE FORMATION

Free-living Firmicute bacteria with complete genome sequences available at the start of this project were chosen for study (Table 1). Spore-forming species include six species of Bacillus, one of Oceanobacillus, and three of Clostridium. Non-spore-forming species include several representatives of Streptococcus, Listeria, Lactobacillus, and Staphylococcus, in addition to single species from several other genera (Table 1). Although both Bacillus and Clostridium species can form spores, the two genera differ in many ways. They differ in certain aspects of the sporulation process, but they use homologous genes and processes to develop the spore. Bacillus species are commonly found in soil and water, whereas Clostridium species are found in soil and mammalian intestines. The inability of Clostridium species to grow in the presence of oxygen is commonly used to distinguish and define species of Bacillus and Clostridium.

A set of 75 genes present in all 24 genomes (Table 1) was identified, and nine genes that showed evidence of horizontal transfer were removed, resulting in a 66-gene dataset. A second dataset (the 47-gene dataset) was obtained by excluding 19 genes that showed evidence of accelerated evolution in at least one taxon (see Materials and Methods). The 66-gene dataset therefore retained genes with accelerated evolution, had more variation in evolutionary rate, and was used to increase the power to detect an association between spore formation and rate of evolution. However, the molecular clock hypothesis applies only to neutral substitutions, and because the accelerated evolution in those 19 genes could be due to positive selection on particular proteins, the 47-gene dataset was also used. Because these datasets consist of amino acid sequences, the observed amino acid substitutions may have been influenced by selection (positive or negative); however, because the genes used have highly conserved housekeeping functions, most of the substitutions that are allowed to occur are likely to be nearly neutral functionally.

To test whether the rate of amino acid substitution differs between spore-forming and non-spore-forming bacteria, it was first necessary to determine the phylogenetic relationships of these species, because lineages compared for the effect of spore formation on rates of evolution must be phylogenetically independent; shared history can confound statistical analyses (Felsenstein 1985; Harvey and Purvis 1991). The phylogeny inferred from amino acid sequences using Bayesian methods in MrBayes (Huelsenbeck and Ronquist 2001) is shown in Figure 1. Although the internal branches joining the Bacillus, Listeria, and Streptococcus clades are very short, the relationships among taxa are fully resolved, and all nodes except one have a posterior probability of 1.0 (as noted in Fig. 1). Within this sample of otherwise non-spore-forming Firmicute bacteria there are two monophyletic clades of spore-forming bacteria, the Bacillus clade and the Clostridium clade, shown as gray branches in Figure 1.

Central to the hypothesis that spore-forming bacteria evolve more slowly than related non-spore-forming bacteria is the notion that the non-spore-forming Firmicute bacteria either have never been spore-formers or have spent most of their evolutionary history as non-spore-formers (i.e., lost the ability to sporulate long ago). A parsimony argument would favor two independent origins of sporulation within the Firmicute bacteria, one at the base of the Bacillus clade and another at the base of the Clostridium clade (see Fig. 1). There are, however, other reasons to believe that sporulation arose once and was subsequently lost (a minimum of three times) in the ancestors of non-spore-forming taxa. Sporulation is a very complex trait for bacteria to develop, with at least 200 genes shown experimentally to be involved in the best-characterized spore-former, B. subtilis (Piggot and Losick 2002). The molecular process of sporulation is well understood in B. subtilis but has been studied only superficially in Clostridium species. The similarity of sporulation in Bacillus and Clostridium species suggests either that sporulation arose only once, diverged subsequently in each clade, and was lost in other clades, or that this similarity is a remarkable case of convergent evolution. Furthermore, horizontal transfer of a functional sporulation pathway seems unlikely because the genes required for sporulation are scattered around the chromosome, so several horizontal transfer events would be required before sporulation could function as a developmental program. Whichever scenario is correct, the work presented here depends critically on the non-spore-forming taxa truly being unable to form spores.

Assuming that genes involved in sporulation are lost along with the ability to sporulate, I compared the average number of sporulation genes in the genomes of non-spore-forming and spore-forming bacteria. If the non-spore-forming bacteria have few sporulation genes, then even if they were once able to form spores, they have spent most of their evolutionary history as non-spore-formers. Onyenwoke et al. (2004) recently tabulated the number of sporulation genes in a diversity of Firmicute bacteria. Because complete genomes were not available for many of the species they used, the number of sporulation genes they examined is lower than the 200 mentioned above. Onyenwoke et al. (2004) found that many distantly related lineages, including proteobacteria and even some Archaea, contain a number of genes homologous to sporulation genes, suggesting that the sporulation pathway evolved by co-opting genes that initially had other functions in the cell. Using their data, I found that the average number of sporulation genes in non-spore-forming Firmicute taxa is not significantly different from that in distantly related bacteria and Archaea, and is significantly lower than that in spore-forming taxa (Fig. 2; Student's t-test, P = 0.01; Wilcoxon, P < 0.0004). Thus, if sporulation arose once and was subsequently lost in the non-spore-forming Firmicute bacteria, it was lost long ago, and these bacteria have spent most of their evolutionary history unable to sporulate.

Figure 2.

The mean number of “sporulation genes” (data taken from Onyenwoke et al. 2004) in phenotypic classes related to sporulation. The number of species sampled for each phenotypic class is within each bar (to the side of the bar for the Archaea) and letters above the bar represent significantly different statistical classes. Error bars correspond to the 95% confidence intervals. Spore-formers: Firmicute bacteria that are known to form spores; non-SF-Firmicute: Firmicute bacteria that are not known to form spores; non-SF-other bacteria: species that are not in the Firmicute bacteria and do not form spores; Archaea (these divisions were made by the author, not by Onyenwoke et al. 2004).

EVOLUTIONARY RATE VARIATION

To determine whether evolutionary rates differ between spore-forming and non-spore-forming bacteria, it is critical to establish that significant variation in evolutionary rate exists between the bacterial lineages used for this study. PAML (Yang 1997) allows one to determine the likelihood of the data given a phylogenetic tree and the rate of amino acid substitution for each lineage can be forced to be the same (global clock), allowed to be different (no clock), or a mixture of both (local clock). Various clock models were fit to the data to determine whether there was significant rate variation among lineages. In PAML (Yang 1997) a model enforcing a global clock was significantly worse at explaining the data than a model that allowed all branches to have their own rate of evolution (Table 2). Two separate local clock models were also applied to the data: (1) spore-forming and non-spore-forming taxa were partitioned into different rate categories, and (2) the Streptococcus clade was assigned its own rate of evolution. The Streptococcus clade was chosen because it appears to have longer branches in general (Fig. 1). These local clock models also fit the data more poorly than a model allowing each branch to have its own rate of evolution (Table 2). Therefore, the results from the PAML analyses show that there is variation in evolutionary rate between these Firmicute lineages, as the model allowing all branches to have their own rate of evolution fit the data best (Table 2).

Table 2.  Results from phylogenetic analysis by maximum likelihood (PAML) analyses.
Dataset | Model | -Ln likelihood | Chi-squared* | P-value
47-gene | No clock | 326709.565820 | n/a | n/a
47-gene | Global clock | 326914.402210 | 410 | <0.0001
47-gene | Spore-formers have own rate | 328615.239771 | 3811 | <0.0001
47-gene | Streptococcus clade has own rate | 327819.367787 | 2219 | <0.0001
66-gene | No clock | 469224.682979 | n/a | n/a
66-gene | Global clock | 471884.803837 | 5320 | <0.0001
66-gene | Spore-formers have own rate | 471573.389162 | 4697 | <0.0001
66-gene | Streptococcus clade has own rate | 470429.202370 | 2409 | <0.0001

* Chi-squared values were determined by calculating two times the difference in -Ln-likelihood values between the “no clock” model and the various clock models.

Given that evolutionary rate varies, I then determined whether this variation correlates with spore-forming ability. Independent contrast (IC) analysis (CAIC; Purvis and Rambaut 1995) was used to remove correlations between characters that arise from phylogenetic relationships rather than from biologically interesting processes. Because of the structure of the phylogeny and the way spore-forming ability maps onto it, only two phylogenetically independent contrasts were available: one between the Bacillus clade and the Listeria/Streptococcus clade, and the other between the Clostridium clade and the T. tengcongensis lineage (see Fig. 1). Spore-forming ability did not explain a significant portion of the variation in rate of evolution (47-gene dataset: r² = 0.08, P = 0.81; 66-gene dataset: r² = 0.04, P = 0.88). However, with so few contrasts, this analysis had little power to detect a difference in rates, if one existed. Therefore, a method with greater statistical power was used to test for the effects of spore formation on evolutionary rate.

Taxa were partitioned into spore-forming and non-spore-forming groups, and relative rate tests, implemented in RRTree (Robinson et al. 1998), were performed on each of the 66 genes (see Materials and Methods). Each gene was then assigned to one of two groups: faster evolution in spore-formers or faster evolution in non-spore-formers. Contrary to the prediction, a sign test showed that in the 66-gene dataset significantly more genes evolved faster in spore-forming taxa than in non-spore-forming taxa (Table 3).

Table 3.  Results of relative rate and sign tests.
Dataset | Genes faster in spore-forming taxa | Genes faster in non-spore-forming taxa | Length weight, spore-forming¹ | Length weight, non-spore-forming¹ | Sign test P-value
47-gene | 29 | 18 | 0.63 | 0.37 | 0.0433
66-gene | 40 | 26 | 0.63 | 0.37 | 0.0167
47-gene² | 16 | 31 | 0.26 | 0.74 | 3.11 × 10⁻⁴
66-gene² | 24 | 42 | 0.29 | 0.71 | 1.91 × 10⁻⁴

¹ The number of amino acid sites in each category was divided by the total number of amino acid sites analyzed.
² Clostridium taxa excluded.

To determine whether the Bacillus or the Clostridium spore-formers (or both) were driving the result that more genes evolve faster in spore-formers, I repeated the relative rate tests after excluding taxa from the Clostridium clade. Excluding the Clostridium taxa reversed the result: a sign test showed that more genes evolved faster in non-spore-forming taxa than in spore-forming taxa (i.e., the Bacillus clade) (Table 3), as predicted. Thus, the spore-forming Clostridium taxa are responsible for the observation that spore-forming taxa evolve faster. Because the two spore-forming clades show opposite patterns, with the Clostridium clade evolving faster and the Bacillus clade more slowly than non-spore-formers, the variation in rates of evolution cannot be attributed to spore formation and must have some other cause.

OTHER VARIABLES POTENTIALLY INFLUENCING AMINO ACID SUBSTITUTION RATES

The effect of generation time on evolutionary rate was originally noted to be most evident at synonymous sites (Wu and Li 1985; Ohta 1993). Synonymous sites were saturated in this dataset, so amino acid sequences rather than nucleotide sequences were used for the phylogenetic analyses. Because both purifying and positive selection act on amino acid substitutions, differences in effective population size among the taxa in this dataset could confound any variation in evolutionary rate that is due to the generation-time effect. Selection is more effective in populations of large effective size, so such populations are expected to have fixed fewer deleterious mutations and more advantageous mutations. Data on the effective population sizes of the taxa used in this study are not available; therefore, assuming that bacterial populations of large effective size show a higher degree of selected codon usage bias than populations of smaller effective size, selected codon usage bias (Sharp et al. 2005) was used as a proxy for effective population size. IC analysis showed that selected codon usage bias does not explain a significant amount of variation in evolutionary rate (r² = 0.00; P = 0.7544), suggesting that differences in effective population size are not the cause of variation in evolutionary rates.

Other variables that could potentially explain variation in evolutionary rate were also examined using IC analysis, and none of them explained a significant portion of rate variation (similar results were obtained for the 66- and 47-gene datasets). These variables were GC content at the third codon position, DNA repair enzyme content, and genome size, which could affect patterns of mutation, rates of mutation, and rates of amino acid substitution, respectively (IC: P(GC3) = 0.7447; P(DNA repair) = 0.3366; P(genome size) = 0.9558). The amino acid composition of proteins has been shown to influence their evolutionary rates, as some amino acids are more mutable and others more stable (Graur 1985, but see Tourasse and Li 2000). Again using IC analysis, amino acid frequency did not explain a significant amount of variation in evolutionary rate (P-values for the effect of each of the 20 amino acids on rates of evolution ranged from 0.09 to 0.98).

CALIBRATING RATES OF EVOLUTION

Bacteria occupy almost every conceivable terrestrial niche, and their life-history strategies are highly variable. Amino acid substitution has been shown to occur more quickly in lineages of the intracellular symbionts of some insects (Clark et al. 1999; Wernegreen and Moran 1999), but overall it is unclear how ecological and biological differences affect the rate of amino acid substitution in free-living bacteria. I have shown that amino acid substitution rates vary within the Firmicute bacteria, but Figure 1 shows that they are not extremely variable (i.e., there are no extraordinarily long branches). Therefore, to test for the existence of a rough molecular clock, I used rate-smoothing techniques implemented in r8s (Sanderson 1997, 2002, 2003) to calculate an average amino acid substitution rate for the Firmicute bacteria. Determining a rate of evolution per unit time requires dating one or more divergence times in the phylogeny. Three dates that encompass the estimated divergence times for the major groups of bacteria from fossil data (reviewed in Knoll 2003) were applied to the root of the tree: 3.5, 3.0, and 2.5 billion years ago. For each date, a rate of amino acid substitution was calculated and compared to the amino acid substitution rate determined for enteric bacteria (Ochman and Wilson 1987). The rates for Firmicute and enteric bacteria differed by less than 2-fold (Table 4). Given the variation surrounding the estimated rate of amino acid substitution, the rate calculated for Firmicutes is remarkably similar to the rate reported for enteric bacteria, suggesting that rates of amino acid substitution are not highly variable between very distantly related bacterial groups (Table 4).

Table 4. Average rate of amino acid substitution (substitutions per site per 50 million years).

Dataset                                  Amino acid substitution rate (95% CI)
This article: root = 3.5 bya¹            0.96 (0.49–1.4)
This article: root = 3.0 bya¹            1.12 (0.57–1.67)
This article: root = 2.5 bya¹            1.35 (0.69–2.0)
Ochman and Wilson (1987)                 0.72²

¹ bya, billion years ago.
² Nonsynonymous substitution data averaged from table 4 of Ochman and Wilson (1987).
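The "less than twofold" comparison can be checked directly from the point estimates in Table 4. The short Python sketch below is illustrative only (the names are hypothetical, it is not part of the r8s analysis, and it ignores the confidence intervals): it computes the fold difference between each Firmicute estimate and the enteric rate, and the product of each rate with the root age assumed for it. The products are nearly constant (about 3.4), which is what one would expect if the mean rate scales roughly inversely with the age assigned to the root, and the largest fold difference (about 1.9) occurs with the youngest root age.

```python
# Point estimates from Table 4, in substitutions per site per 50 million years.
FIRMICUTE_RATES = {3.5: 0.96, 3.0: 1.12, 2.5: 1.35}  # root age (bya) -> rate
ENTERIC_RATE = 0.72  # Ochman and Wilson (1987), averaged nonsynonymous data


def fold_difference(a: float, b: float) -> float:
    """Ratio of the larger to the smaller of two positive rates."""
    return max(a, b) / min(a, b)


def summarize(rates: dict, reference: float) -> list:
    """For each root age, return (age, rate, fold vs. reference, rate x age).

    If the estimated rate scales inversely with the age assigned to the root,
    the product rate x age should be roughly the same for every calibration.
    """
    rows = []
    for age, rate in sorted(rates.items(), reverse=True):
        rows.append((age, rate, fold_difference(rate, reference), rate * age))
    return rows


for age, rate, fold, product in summarize(FIRMICUTE_RATES, ENTERIC_RATE):
    print(f"root = {age} bya: rate = {rate:.2f}, "
          f"{fold:.2f}-fold vs. enteric, rate x age = {product:.2f}")
```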

Discussion

EVOLUTIONARY RATES AND THE GENERATION-TIME EFFECT IN BACTERIA

The results from this work show that although the rate of amino acid substitution in bacteria does vary, this variation is not associated with spore formation, and furthermore, that the average rate of amino acid substitution is similar between the Firmicute and enteric bacteria. From these results, I conclude that the long periods of dormancy associated with spore formation do not influence rates of molecular evolution. This could be due to two nonmutually exclusive possibilities: either there is a generation-time effect that was undetectable, or there is a considerable RI component to the substitution process in bacteria. Each of these possibilities will be discussed in turn.

It is possible that a generation-time effect exists in bacteria but was undetectable here. This could be because spore-forming bacteria spend less time dormant as spores than is thought, or because generation time has varied greatly throughout the evolutionary history of all bacterial lineages. The proportion of time spore-forming bacteria spend as spores is not well known; it is thought that spore-forming bacteria exist in nature most often as spores, but concrete data are lacking (Slepecky and Leadbetter 1984; Priest and Grigorova 1990).

Variation in generation time within all bacterial lineages could also explain why a generation-time effect was not detected. In multicellular organisms, generation time is usually a fixed phenotype that does not vary within species. In bacteria, generation times within species are far from fixed and vary depending on the degree to which any particular niche promotes growth and reproduction. It is well known that the generation time of any species of bacteria can vary greatly in the laboratory, depending on the nutrients available in the growth medium. This variation in generation time is surely greater in natural environments, where nutrient availability is likely to be more sporadic in time and space than in the laboratory.

In addition to variation in generation time, variation in effective population size and strength of selection surely exists among the studied species and within species through time. Given the similarity in amino acid substitution rates between the Firmicute and enteric bacteria (Table 4), variation in effective population size and strength of selection may not be large enough to produce a detectable effect, or it may cancel out generation-time effects if they are indeed present.

It is also possible that a large proportion of mutations that contribute to substitution occur when cells are not growing (i.e., are not replicating DNA). The presence of a generation-time effect in mammals is consistent with the majority of mutations in mammals occurring during DNA replication. The results presented here suggest that many of the mutations that are substituted during divergence in bacterial lineages are not associated with DNA replication but with other mutagenic processes that occur when cells are not growing. This is not unexpected, as RI mutations, caused by radiation, chemicals, and other environmental assaults, are known to occur in all organisms.

Although the evidence in eukaryotes suggests that RD mutations make up most of the mutations that contribute to substitution on long evolutionary time scales, it is plausible that this is not the case in bacteria. In mammals, where generation-time effects seem to be strongest, germ cells are sequestered from most environmental assaults while replicating and reproducing to form gametes. In bacteria, however, the germ cell is the organism, and cells cycle between periods of growth during feast and periods of starvation and stress during famine. During these stressful times, mutation is surely occurring and contributing to the long-term evolutionary substitution pattern.

Although the work presented here cannot determine whether spore-forming bacteria spend the majority of their time as spores or whether RI mutations contribute significantly to substitution patterns, I calculated what proportion of time spore-formers must spend dormant and what proportion of mutations must be RI to explain the data presented here. Assume that RD and RI mutation contribute 0.8 and 0.2 mutation units per day, respectively (these units are arbitrary). While spore-forming bacteria are dormant as spores, the 0.8 daily units of RD mutation do not occur. If we calculate how many mutations occur over 10 days in spore-forming versus non-spore-forming bacteria, where spore-forming bacteria are spores for 90% of the time (i.e., 9 days), we find that 10 mutations have occurred in the non-spore-forming bacteria, whereas only 2.8 mutations have occurred in the spore-forming bacteria (non-spore-forming = 10 days × [0.8 RD mutations/day + 0.2 RI mutations/day] = 10 mutations; spore-forming = 1 day × [0.8 RD mutations/day + 0.2 RI mutations/day] + 9 days × 0.2 RI mutations/day = 2.8 mutations). Varying the amount of time spore-forming bacteria spend as spores and the relative contributions of RD and RI mutation shows that approximately equal rates of substitution in spore-forming and non-spore-forming lineages would occur if RD and RI mutation contributed equally to substitution and if spore-forming bacteria spent 10–30% of their time dormant as spores. Increasing (decreasing) the contribution of RD mutation decreases (increases) the proportion of time that spore-forming bacteria can spend as spores while still yielding approximately equal rates. Which of these scenarios is actually occurring in nature remains to be tested.
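The arithmetic in this paragraph follows a simple linear model: growing cells accumulate both RD and RI mutation, whereas spores accumulate only RI mutation. A minimal Python sketch of that model, assuming constant daily rates and using hypothetical function names, reproduces the 10 versus 2.8 example and shows that, with equal RD and RI contributions, spending 10–30% of the time as spores lowers the spore-former rate by only about 5–15%.

```python
def expected_mutations(days, rd_per_day, ri_per_day, dormant_fraction=0.0):
    """Mutation units accumulated over `days` under the simple two-source model.

    Growing cells accumulate both replication-dependent (RD) and
    replication-independent (RI) mutation; dormant spores accumulate RI only.
    """
    if not 0.0 <= dormant_fraction <= 1.0:
        raise ValueError("dormant_fraction must lie in [0, 1]")
    growing_days = days * (1.0 - dormant_fraction)
    dormant_days = days * dormant_fraction
    return growing_days * (rd_per_day + ri_per_day) + dormant_days * ri_per_day


def relative_rate(rd_share, dormant_fraction):
    """Spore-former mutation rate as a fraction of the non-spore-former rate.

    rd_share is the proportion of all mutation that is RD; the rest is RI.
    Algebraically this equals 1 - rd_share * dormant_fraction.
    """
    ri_share = 1.0 - rd_share
    spore = expected_mutations(1.0, rd_share, ri_share, dormant_fraction)
    non_spore = expected_mutations(1.0, rd_share, ri_share)
    return spore / non_spore


# Worked example from the text: 0.8 RD + 0.2 RI units per day over 10 days.
print(round(expected_mutations(10, 0.8, 0.2), 6))                        # 10.0
print(round(expected_mutations(10, 0.8, 0.2, dormant_fraction=0.9), 6))  # 2.8

# Equal RD and RI contributions, with 10-30% of time spent as spores.
for f in (0.1, 0.2, 0.3):
    print(f"{f:.0%} dormant: relative rate = {relative_rate(0.5, f):.2f}")
```

Because the relative rate reduces to 1 - (RD share) × (dormant fraction), a larger RD share requires a smaller dormant fraction to keep the two rates close, which is the trade-off stated in the paragraph.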

Regardless of variation in generation time, the relative contributions of RD and RI mutation, and the amount of time spore-forming bacteria spend dormant, amino acid substitution rates are remarkably similar between Firmicute and enteric lineages. Therefore, these results support the notion of a rough molecular clock for amino acid substitution in bacteria.

Associate Editor: K. Crandall

ACKNOWLEDGMENTS

The author wishes to thank B. Birky, M. Worobey, M. Herron, W. Nicholson, M. Nachman, N. Moran, H. Ochman, R. Redfield, the Ochman/Moran Laboratory, the Nachman Laboratory, and the Redfield Laboratory for conceptual discussions concerning this work; M. Herron, M. Worobey, T. Gilbert, J. Good, R. Redfield, and A. Cameron for reading a previous version of this article; D. Wall and M. Sanderson for technical assistance with RSD and r8s, respectively; and S. Miller for assisting with bioinformatics methods. The author also wishes to thank two anonymous reviewers for suggesting improvements to the statistical analyses and other general comments. Financial support was provided by a National Science Foundation-Integrative Graduate Education and Research Traineeship (NSF-IGERT) fellowship in Genomics at the University of Arizona and National Institutes of Health (NIH).
