Life-history traits vary substantially across species, and have been demonstrated to affect substitution rates. We compute genome-wide, branch-specific estimates of male mutation bias (the ratio of male-to-female mutation rates) across 32 mammalian genomes and study how these vary with life-history traits (generation time, metabolic rate, and sperm competition). We also investigate the influence of life-history traits on substitution rates at unconstrained sites across a wide phylogenetic range. We observe that increased generation time is the strongest predictor of variation in both substitution rates (for which it is a negative predictor) and male mutation bias (for which it is a positive predictor). Although less significant, we also observe that estimates of metabolic rate, reflecting replication-independent DNA damage and repair mechanisms, correlate negatively with autosomal substitution rates, and positively with male mutation bias. Finally, in contrast to expectations, we find no significant correlation between sperm competition and either autosomal substitution rates or male mutation bias. Our results support the important but frequently opposite effects of some, but not all, life-history traits on substitution rates.
Point mutation rates, approximated by substitution rates at unconstrained sites, vary both across species (Lanfear et al. 2010) and between males and females in the same species (Ellegren 2007). Differences in life-history traits across species have long been implicated in this variation. Focusing on interspecific variation in substitution rates at unconstrained sites, first, species with longer generation times are expected to have lower substitution rates, because, per unit of time, they may have fewer germline cell divisions, entailing fewer rounds of DNA replication, and fewer opportunities for DNA errors to occur (Laird et al. 1969; Smith and Donoghue 2008; Welch et al. 2008). Second, both increasing metabolic rate and increasing mass, a known correlate of metabolic rate (Kolokotrones et al. 2010), have been found to negatively correlate with substitution rate (Martin and Palumbi 1993; Bromham 2002; Welch et al. 2008). However, there has been some debate about whether it is metabolic rate alone (specifically related to changes in the concentration of reactive oxygen species, ROS [Dowling and Simmons 2009]), mass alone, or a combination of the two that is driving the observed correlation with substitution rate (Froehle and Schoeninger 2006; Lanfear et al. 2007; Lanfear et al. 2010). Recent results indicate that variation in oxidative damage due to differences in metabolic rate may not, in fact, directly impact DNA substitution rates in mammalian genomes, but instead that mechanisms that repair oxidative DNA damage contribute to substitution rate differences between mammals (Hwang and Green 2004). Further, the fact that species with long generation times are also usually large has made it challenging to disentangle the effects of metabolic rate versus generation time on substitution rate (Herreid 1964; Martin and Palumbi 1993; Mooers and Harvey 1994; Speakman 2005).
Third, theory suggests that increased sperm competition may occur at the expense of a higher mutation rate, leading to a higher substitution rate in species with stronger sperm competition (Blumenstiel 2007), but a study of synonymous substitution rates in six primate lineages did not have the power to detect any clear trends (Wong 2010). This progress notwithstanding, interspecies variation in substitution rates has not been studied for a large number of mammals on a genome-wide scale.
A higher mutation rate in males than in females, frequently observed in mammals, is called male mutation bias (Li et al. 2002; Ellegren 2007). According to the male mutation bias hypothesis, the mutation rate is higher in males because of a greater number of rounds of replication experienced by male gametes, sperm, than by female gametes, eggs (Miyata et al. 1987). This line of reasoning assumes that most mutations result from errors during DNA replication. In species with male heterogamety, such as mammals, the substitution rate at unconstrained sites (used to estimate the point mutation rate) is expected to correlate with the amount of time each type of chromosome spends in the male versus female germline; it should be highest on the Y, intermediate on the autosomes (A), and lowest on the X. This is because the Y spends all of its time in the male germline, accumulating the greatest number of substitutions, the X spends only one-third of its time in the male germline, and the autosomes spend equal amounts of time in male and female germlines. If male mutation bias is important in generating interchromosomal variation in substitution rate at unconstrained sites within the genome, then the male-to-female ratio of mutation rates (α), approximated from comparisons of X/A, X/Y or Y/A substitution rates, is expected to reflect the ratio of the number of germline cell divisions between the male and female germlines (Miyata et al. 1987). In agreement with this expectation, α values in primates and rodents are similar to c, the ratio of male and female germline cell divisions in those taxa (∼6 and ∼2, respectively [Gibbs et al. 2004; Mikkelsen et al. 2005; Taylor et al. 2006]).
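The expected ordering of rates (Y highest, autosomes intermediate, X lowest) follows directly from the time each chromosome type spends in the male germline. The sketch below is illustrative only: it sets the female mutation rate to 1 and the male rate to α, and the function name is hypothetical rather than taken from any published software.

```python
def expected_ratios(alpha):
    """Expected substitution-rate ratios for a given male-to-female
    mutation rate ratio (alpha), assuming male heterogamety.

    Rates are expressed relative to a female mutation rate of 1, so the
    male rate equals alpha. Each chromosome type is weighted by the
    fraction of time it spends in the male versus female germline.
    """
    y = alpha                   # Y: always in the male germline
    a = (alpha + 1.0) / 2.0     # autosomes: half male, half female
    x = (alpha + 2.0) / 3.0     # X: one-third male, two-thirds female
    return {"X/A": x / a, "Y/A": y / a, "X/Y": x / y}


if __name__ == "__main__":
    # Illustrative values only (alpha of about 2 and about 6 are the
    # rodent and primate values quoted in the text).
    for alpha in (1.0, 2.0, 6.0):
        print(alpha, expected_ratios(alpha))
```

With α = 1 all three ratios equal 1. As α grows, X/A falls toward 2/3 and Y/A rises toward 2, which is why X/A carries progressively less information about α when male bias is strong.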
Although many studies have observed varying degrees of male mutation bias in mammals (Li et al. 2002; Taylor et al. 2006; Elango et al. 2008), birds (Garcia-Moreno and Mindell 2000; Axelsson et al. 2004), fish (Ellegren and Fridolfsson 2003), and flies (Bachtrog 2008), the forces influencing this variation are yet to be thoroughly investigated. Several life-history traits likely influence the magnitude of male mutation bias across species; here we focus on mammals. First, weaker male mutation bias might be expected in species with shorter generation times, because the sperm of short-lived species undergo fewer cell divisions (and thus rounds of replication) before conception than the sperm of long-lived species, assuming that the eggs of all mammals undergo a similar number of cell divisions (Li et al. 2002). In agreement with this, primates have the longest generation time, and have the highest α estimates (Taylor et al. 2006), whereas rodents, with a very short generation time, have very low α estimates (Makova et al. 2004), and perissodactyls, whose generation times are in between rodents and primates, have intermediate values of α (Goetting-Minesky and Makova 2006). Second, the association of increased sperm competition with the magnitude of male mutation bias in mammals has not, to our knowledge, been investigated, except in the great apes (Presgraves and Yi 2009). Experiments indicate that sperm competition is associated with larger testes and a higher proportion of sperm-producing tissues, as observed in birds, as well as a shorter duration of spermatogenesis (Parapanov et al. 2007; Ramm and Stockley 2010). Together these observations imply that species with more intense sperm competition (estimated by the ratio of testes mass to body mass) produce larger quantities of sperm at an elevated rate, leading to an increased number of cell divisions in spermatogenesis and thus to higher male mutation bias. 
Third, although generation time and sperm competition may directly affect differences in the number of germline cell divisions before reproduction between males and females, changes in metabolic rate are expected to act on the DNA regardless of the number of divisions. Sperm exist in a much more ROS-rich environment than eggs (they are motile whereas eggs are not), and also have a less-dense cell membrane (Velando et al. 2008). Thus, it is expected that increases in metabolic rate may adversely affect DNA in sperm more than in eggs, resulting in a greater number of substitutions in the male germline than in the female germline, that is, higher male mutation bias.
The availability of 32 eutherian mammal sequences in a 44-way genome alignment (Miller et al. 2007) makes it possible to consider genome-wide variation in substitution rates and in male mutation bias across a variety of mammalian taxa with diverse life-history traits. Although previous studies have investigated the relationships between life-history traits and molecular evolution (Martin and Palumbi 1993; Mooers and Harvey 1994; Bromham 2002; Hwang and Green 2004; Lanfear et al. 2007; Nabholz et al. 2008; Welch et al. 2008), and a few studies have investigated the factors that correlate with male mutation bias across species (Bartosch-Härlid et al. 2003; Goetting-Minesky and Makova 2006; Presgraves and Yi 2009), to date, none have had the ability to complete such a genome-wide assessment with a number of species sufficient for statistical analysis. Thus, studies inferring trends using only a few species may identify putative correlations that, when studied in a larger context, disappear. One important factor that must be kept in mind when looking at correlations between life-history traits and molecular evolution is that the data points (both species sequences and life-history traits) are related to one another by common descent. In traditional statistical analyses, the datapoints are independent of one another, whereas species with similar life-history traits share similar evolutionary trajectories, and this phylogenetic dependence can skew results by artificially increasing the strength of regressions (Pagel 1999).
Here, we ask how life-history traits are associated with variation in substitution rates and, separately, with the magnitude of male mutation bias among 32 mammalian species. To address these questions, we use a genome-wide dataset, which is largely immune to regional, locus-specific effects (because they are averaged out), to compute substitution rates across the autosomes and the X chromosome. Autosomal substitution rates and α are each used, in conjunction with the set of predictors of three life-history traits (generation time, metabolic rate, and sperm competition), to investigate all possible models in a multiple regression framework. Then, correcting for phylogenetic dependence, we investigate trends and significant correlations, first, between life-history traits and autosomal substitution rates, and then between life-history traits and α, across mammalian species.
COLLECTING LIFE-HISTORY TRAITS
We analyzed sequences from all 34 mammalian genomes (32 eutherian, one marsupial, and one monotreme) sequenced at the time of this research: human (Homo sapiens), chimpanzee (Pan troglodytes), gorilla (Gorilla gorilla gorilla), orangutan (Pongo pygmaeus abelii), rhesus (Macaca mulatta), marmoset (Callithrix jacchus), tarsier (Tarsius syrichta), bushbaby (Otolemur garnettii), mouse lemur (Microcebus murinus), tree shrew (Tupaia belangeri), mouse (Mus musculus), rat (Rattus norvegicus), kangaroo rat (Dipodomys ordii), guinea pig (Cavia porcellus), squirrel (Spermophilus tridecemlineatus), rabbit (Oryctolagus cuniculus), pika (Ochotona princeps), shrew (Sorex araneus), hedgehog (Erinaceus europaeus), cat (Felis catus), dog (Canis lupus familiaris), horse (Equus caballus), megabat (Pteropus vampyrus), microbat (Myotis lucifugus), dolphin (Tursiops truncatus), cow (Bos taurus), alpaca (Vicugna pacos), armadillo (Dasypus novemcinctus), sloth (Choloepus hoffmanni), tenrec (Echinops telfairi), elephant (Loxodonta africana), rock hyrax (Procavia capensis), opossum (Monodelphis domestica), and platypus (Ornithorhynchus anatinus). For each of the 32 eutherian mammals, we collected data on eight biological features that are thought to be related to three life-history traits (metabolic rate, generation time, and sperm competition; Table 1) and can thus be used as proxies for them.
Table 1. Biological features approximating three life-history traits in 32 eutherian mammals. Abbreviations for each predictor are included in parentheses.
- Age at sexual maturity (SexMat)
- Life span
- Interlitter interval (Inter)
- Basal metabolic rate (BMR)*
- Body temperature (Temp)
- Testes-to-body mass ratio (TBR)**
- Mating pattern (mating)
Species with available information:
*missing G. gorilla (gorilla)
**missing T. syrichta (tarsier), P. vampyrus (megabat), and C. hoffmanni (sloth)
Each of the eight features was initially included because information about it was available for more than three quarters of the species studied. Ultimately, we were limited to using only 28 of the 32 eutherian mammals because four species had missing data on some of the traits and our analysis did not allow for missing traits (testes-to-body mass ratio, TBR, is missing in T. syrichta, P. vampyrus, and C. hoffmanni, and basal metabolic rate, BMR, is missing in G. gorilla).
Species-specific data were initially collected from three databases: the Animal Diversity Web (http://animaldiversity.ummz.umich.edu), AnAge (de Magalhaes et al. 2009), and PanTHERIA (Jones et al. 2009). All of the references from the databases were confirmed with the original papers or updated with new results and are cited in Table S1. Most estimates for age at sexual maturity and litter size come from Asdell's Patterns of Mammalian Reproduction (Hayssen et al. 1993). All of the estimates for body mass came from a database for which the authors compiled hundreds of references to arrive at a sex-averaged estimate of body mass for each mammal (Smith et al. 2003).
PRINCIPAL COMPONENT ANALYSIS (PCA)
For a set of variables, PCA extracts the linear components that together explain the maximum amount of variance in the given space. For the life-history traits, the PCA space consisted of all eight predictors (eight-dimensional space). The PCA was conducted using the princomp function in R (Team 2005), using the correlation matrix so as to be unaffected by the different units measuring each predictor.
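To illustrate why the correlation matrix removes the effect of measurement units, the following is a minimal NumPy sketch of a correlation-matrix PCA. It is not the princomp implementation used in the analysis; the function name and the synthetic 28-by-8 input are assumptions made purely for illustration.

```python
import numpy as np


def pca_correlation(data):
    """PCA via eigendecomposition of the correlation matrix.

    data: array of shape (n_species, n_features).
    Returns eigenvalues (descending), loadings (columns are components),
    component scores, and the fraction of variance explained.
    """
    X = np.asarray(data, dtype=float)
    # Standardize each feature so that units (years, grams, degrees C)
    # no longer influence the components.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)        # feature-by-feature correlations
    eigvals, eigvecs = np.linalg.eigh(R)    # eigh: R is symmetric
    order = np.argsort(eigvals)[::-1]       # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs
    explained = eigvals / eigvals.sum()
    return eigvals, eigvecs, scores, explained


if __name__ == "__main__":
    # Synthetic stand-in for 28 species by 8 features (not real data).
    rng = np.random.default_rng(0)
    toy = rng.normal(size=(28, 8))
    vals, loadings, scores, frac = pca_correlation(toy)
    print("variance explained by PC1 and PC2:", frac[:2])
```

Features that load similarly on the first two components would plot close together, which is the kind of grouping the PCA is used to assess here.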
Mammalian multiple alignment files (MAFs) were downloaded from the 44-way vertebrate alignment in Galaxy (Blankenberg et al. 2007) corresponding to each of 22 human autosomes and the X chromosome. Y chromosome sequence was not considered in this analysis because it is not yet available for most of the mammals in the 44-way alignment (aside from human and chimpanzee, all other mammals sequenced for the 44-way alignment were females).
Several partitions of the X chromosome (including various strata, and both the X-added and X-conserved regions) were tested, with remarkable agreement of the regression results (Table S9). Thus, we present the results based on the largest amount of X-linked sequence (the whole X, excluding all pseudoautosomal regions defined in human (Ross et al. 2005), horse (Raudsepp and Chowdhary 2008), cow (Van Laere et al. 2008), and dog (Wong et al. 2010), and excluding the human-specific X-transposed region (Page et al. 1984)).
To ensure the high quality of data for analysis, particularly because many of the sequences had low coverage (2X), we filtered out alignment columns in which at least one species had a base with a PHRED quality score below 40 (equal to 99.99% base-calling accuracy [de Jong et al. 2001]). The quality data were retrieved from the UCSC Genome Browser (Fujita et al. 2011). To minimize the effects of selection, we limited our analysis to noncoding sites, excluding coding exons as well as 2 kb upstream and downstream of genes. Further, both interspersed repetitive elements and simple repeats, as identified by RepeatMasker (Smit et al. 1996–2004), are difficult to align confidently, and thus were also excluded. All of the site classes were assigned based on the human (hg18) sequence, downloaded from the UCSC Genome Browser (Fujita et al. 2011), and filtered using tools available in Galaxy (Blankenberg et al. 2007).
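The relationship between a PHRED score and base-calling accuracy, and the column-level filter described above, can be sketched as follows. The function names are hypothetical, and the sketch assumes that every aligned base in a column has a quality score available; how missing scores were treated is not stated here.

```python
def phred_to_accuracy(q):
    """Convert a PHRED quality score to base-calling accuracy.

    PHRED is defined as Q = -10 * log10(P_error), so accuracy is
    1 - 10 ** (-Q / 10).
    """
    return 1.0 - 10.0 ** (-q / 10.0)


def keep_column(quality_scores, min_q=40):
    """Keep an alignment column only if every species' base meets the
    threshold; a single base below min_q removes the whole column."""
    return all(q >= min_q for q in quality_scores)


if __name__ == "__main__":
    print(phred_to_accuracy(40))        # accuracy at the Q40 threshold
    print(keep_column([40, 52, 60]))    # all bases pass
    print(keep_column([40, 39, 60]))    # one low-quality base fails the column
```

Because one low-quality base removes the entire column, the filter becomes stricter as more low-coverage genomes are included in the alignment.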
SUBSTITUTION RATES AND CONTEXT DEPENDENCIES
We used the mammalian-specific portion of the published phylogenetic tree (Murphy et al. 2007) to define the topology of the mammalian species studied here. AMBIORE (Hwang and Green 2004), a software program implementing a flexible Bayesian MCMC model, was used to compute the substitution rate in each alignment interval. The masked alignments were grouped into sets that summed to less than or equal to 1 Mb of total sequence, to run efficiently in AMBIORE. For the overall substitution rates, each interval took up to 24 h to run in AMBIORE. Utilizing parallel processing on an IBM x3450 1U Rackmount Server with 64 shared compute nodes, dual 3.0 GHz Intel Xeon E5472 (Woodcrest) quad-core processors and 64 GB of ECC RAM, the analysis in AMBIORE for the 34-way mammalian genome alignments (excluding the Y) took approximately one week to run.
The substitution rates were averaged over all intervals of all autosomes, weighted by the length of each interval, and separately over the intervals on the X chromosome in a similar fashion. From X- and autosome-specific substitution rates, branch-specific estimates of male mutation bias (α) were calculated using the X/A divergence ratio according to Miyata's formula (Miyata et al. 1987): α = (3(X/A) − 4)/(2 − 3(X/A)). The 95% confidence intervals (CI) for autosomes, X, and α were computed using the bootstrap method. Namely, using the intervals for X and autosomes described above, we randomly selected intervals 1000 times with replacement, and computed the weight-averaged substitution rates from each pseudosample.
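The calculation described above can be sketched in a few lines. This is an illustration under stated assumptions, not the pipeline actually used: the function names are hypothetical, intervals are represented as (rate, length) pairs, the CI is taken as simple percentiles of the bootstrap distribution, and α is reported as infinite whenever a pseudosample gives X/A at or below 2/3 (the lower limit of the formula).

```python
import random


def alpha_from_xa(xa):
    """Miyata's formula: alpha = (3(X/A) - 4) / (2 - 3(X/A)).

    Assumption: X/A <= 2/3 is treated as an unbounded (infinite) alpha.
    """
    if 3.0 * xa <= 2.0:
        return float("inf")
    return (3.0 * xa - 4.0) / (2.0 - 3.0 * xa)


def weighted_rate(intervals):
    """Length-weighted mean substitution rate over (rate, length) pairs."""
    total_length = sum(length for _, length in intervals)
    return sum(rate * length for rate, length in intervals) / total_length


def bootstrap_alpha_ci(x_intervals, a_intervals, n_boot=1000, seed=0):
    """Percentile 95% CI for alpha from resampled alignment intervals."""
    rng = random.Random(seed)
    alphas = []
    for _ in range(n_boot):
        # Resample intervals with replacement, separately for X and autosomes.
        xs = rng.choices(x_intervals, k=len(x_intervals))
        autos = rng.choices(a_intervals, k=len(a_intervals))
        alphas.append(alpha_from_xa(weighted_rate(xs) / weighted_rate(autos)))
    alphas.sort()
    return alphas[int(0.025 * n_boot)], alphas[int(0.975 * n_boot) - 1]


if __name__ == "__main__":
    # Made-up (rate, length in bp) intervals, for illustration only.
    x = [(0.010, 900_000), (0.012, 1_000_000), (0.011, 800_000)]
    a = [(0.013, 1_000_000), (0.014, 950_000), (0.0135, 700_000)]
    print(alpha_from_xa(weighted_rate(x) / weighted_rate(a)))
    print(bootstrap_alpha_ci(x, a))
```

Treating X/A at or below 2/3 as infinite is one way an upper confidence bound of ∞ can arise when the X-to-autosome ratio is close to its theoretical minimum.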
The substitution rates and confidence intervals for the analysis of high coverage (>6X) genomes only were conducted using HyPhy (Pond et al. 2005) as implemented in Galaxy (Blankenberg et al. 2007). Ultra-conserved elements included highly conserved elements produced by phyloHMM (from the “most conserved” track in the UCSC Genome Browser (Fujita et al. 2011)), elements with high ESPERR regulatory potential, RP, (from the “7X Reg Potential” track in the UCSC Genome Browser; [Fujita et al. 2011], with a cutoff or RP ≥ 0.05), predicted enhancers (downloaded from the VISTA enhancers browser, http://enhancer.lbl.gov), both computationally predicted and experimentally assessed CTCF binding sites (downloaded from the Ren lab at the Ludwig Institute, http://licr-renlab.ucsd.edu/download.html), and experimentally assessed estrogen receptor and RNA polymerase II binding sites (downloaded from the Brown lab at Harvard University, http://research.dfci.harvard.edu/brownlab/datasets/).
Diversity within a population at the time of speciation (ancestral polymorphism) may affect divergence, and thus α estimates (Li et al. 2002; Makova and Li 2002). Therefore, for species that diverged recently, defined as less than 10 million years ago (human, chimpanzee, and gorilla [Stauffer et al. 2001]), we considered a correction for ancestral polymorphism as described in Makova and Li (2002). For the pairwise analysis, we followed the previously described method (Makova and Li 2002), but for branch-specific estimates, to avoid over-correcting, we applied the correction to the human branch using only one-half π (Table S5), similar to another published method (Presgraves and Yi 2009).
PHYLOGENETIC DEPENDENCE AND REGRESSION ANALYSIS
To account for phylogenetic dependence in our analysis, we used the phylogenetic generalized least squares method (PGLM), developed by Mark Pagel (Pagel 1999) and implemented in the BayesTraits software package available from his webpage (http://www.evolution.reading.ac.uk/BayesTraits.html). All but one of the biological features we analyzed are continuous, and the one discrete trait (mating pattern) can be encoded as a binary variable (polyandrous vs. nonpolyandrous). PGLM uses a phylogeny to construct a variance–covariance matrix that is then used to weight the datapoints according to the shared ancestry implied by the phylogeny. This method assumes that the features under study evolve according to a Brownian motion model of evolution. Thus, we applied PGLM, as implemented in the Continuous module of BayesTraits, to each of the eight biological features under investigation as well as to both the autosomal substitution rates and the male mutation bias estimates, and, as a result, obtained values “corrected” by their phylogenetic associations. These corrected values were then used as predictors in regression models explaining variation in male mutation bias.
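The idea of weighting datapoints by a phylogenetic variance–covariance matrix can be illustrated with a generic generalized least squares estimator. This is a textbook sketch, not the BayesTraits implementation: the function name and the four-species toy tree are assumptions. Under Brownian motion, the covariance between two species is proportional to the branch length they share from the root to their most recent common ancestor.

```python
import numpy as np


def pgls(y, X, V):
    """Generalized least squares with covariance matrix V.

    y: response, shape (n,); X: design matrix with intercept, shape (n, k);
    V: phylogenetic variance-covariance matrix, shape (n, n).
    Returns coefficient estimates, standard errors, and t-values.
    """
    y, X, V = (np.asarray(m, dtype=float) for m in (y, X, V))
    V_inv = np.linalg.inv(V)
    XtVi = X.T @ V_inv
    cov_unscaled = np.linalg.inv(XtVi @ X)
    beta = cov_unscaled @ XtVi @ y              # (X'V^-1 X)^-1 X'V^-1 y
    resid = y - X @ beta
    n, k = X.shape
    sigma2 = (resid @ V_inv @ resid) / (n - k)  # residual variance
    se = np.sqrt(np.diag(sigma2 * cov_unscaled))
    return beta, se, beta / se


# Toy tree ((A,B),(C,D)) with root-to-tip length 1.0; sister species share
# 0.6 of their history. Values are illustrative, not from the real phylogeny.
TOY_V = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])
```

When V is the identity matrix (a star phylogeny with no shared history), the estimator reduces to ordinary least squares, which is the independence assumption that phylogenetic correction is meant to relax.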
We analyzed multiple regression models either with α as the response, to investigate male mutation bias, or with the weighted autosomal substitution rate as the response, to investigate genome substitution rates across species. For each response, we computed regressions using all possible combinations of the eight biological features as predictors, all “corrected” for phylogenetic dependence using PGLM implemented in BayesTraits (Pagel 1999). The data for human were excluded from regressions using α as the response, because the human α estimate is an outlier among all α values (Fig. 1). Likelihood ratio tests were performed for each model to test whether the reduced model, using a default of λ = 1 (indicating complete phylogenetic dependence), was a better fit than the alternative model, where λ was estimated (indicating incomplete phylogenetic dependence). In all cases, the best models indicated complete phylogenetic dependence. To determine which models should be investigated further, t-values were determined for every predictor in all models, and models were retained if at least one predictor had a significant two-tailed P value (P < 0.05). Finally, to determine the best models, considering both the number of predictors and the significance, the AICc was computed for each model, and models were ranked in order of increasing AICc.
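The model-comparison steps can be sketched with the standard definitions of the likelihood ratio statistic and AICc. The function names, the parameter-counting convention for k, and the use of a chi-square reference distribution with one degree of freedom for the λ test are assumptions of this sketch; the log-likelihoods themselves would come from the fitted models.

```python
CHI2_CRIT_1DF_05 = 3.841  # chi-square critical value, 1 df, P = 0.05


def lambda_lr_statistic(loglik_lambda_fixed, loglik_lambda_estimated):
    """Likelihood ratio statistic comparing lambda = 1 (reduced model)
    with an estimated lambda (alternative model)."""
    return 2.0 * (loglik_lambda_estimated - loglik_lambda_fixed)


def aicc(log_lik, k, n):
    """AIC corrected for small samples.

    AICc = -2 lnL + 2k + 2k(k + 1) / (n - k - 1),
    with k estimated parameters and n observations (species).
    """
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n - k - 1 <= 0")
    return -2.0 * log_lik + 2.0 * k + (2.0 * k * (k + 1)) / (n - k - 1)


def rank_models(models, n):
    """models: iterable of (name, log_lik, k). Returns (AICc, name) pairs
    sorted so that the best-supported model (lowest AICc) comes first."""
    return sorted((aicc(ll, k, n), name) for name, ll, k in models)


if __name__ == "__main__":
    # Hypothetical log-likelihoods for two candidate models, n = 28 species.
    candidates = [("one_predictor", -10.0, 2), ("three_predictors", -9.0, 4)]
    for score, name in rank_models(candidates, n=28):
        print(f"{name}: AICc = {score:.2f}")
```

With only about 28 species, the small-sample penalty grows quickly with k, so an additional predictor must improve the likelihood appreciably to lower the AICc.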
Results and Discussion
GENOMIC SEQUENCE DATA
The 32 eutherian mammals (Fig. 1 and Table S1) were chosen because genomic sequences are available for them all, and because these were included in the 44-way genome alignment available at the UCSC Genome Browser (Fujita et al. 2011). The opossum (a marsupial) and platypus (a monotreme) were also included in the alignment as outgroups for computing substitution rates, but were excluded from the regression analysis because of limited (marsupial) or absent (monotreme) homology between their sex chromosomes and those of eutherians ([Veyrunes et al. 2008]; see Methods). After filtering, a total of 403.8 Mb of alignment columns across autosomes and 16.5 Mb on the X chromosome were used for further analysis (see Methods; the number of bases shared across successively divergent species is reported in Table S2). Variation in substitution rates across species was studied using autosomal substitution rates, weighted by alignment length (see Methods). To study male mutation bias, α was computed using the substitution rate on the X chromosome and the weight-averaged autosomal substitution rate (Fig. 1; see Methods).
VARIATION IN SUBSTITUTION RATES
As expected, autosomal substitution rates (represented as the branch lengths in Fig. 1; see Table S3 for 95% CIs) vary substantially across the genome sequences of mammals included in this study. The combination of the low sequence coverage in some species and our stringent filtering requirements (see Methods) may have resulted in the retained sequence being highly conserved. However, this affects all branches of the phylogenetic tree, so, although the absolute value of the substitution rates may be decreased, the variation observed among estimates across the tree should remain the same, and should not affect the conclusions of this research (to evaluate this concern more specifically, see also our analysis excluding ultraconserved elements presented below). Confirming this expectation, the observed variation in substitution rates is consistent with previous studies that observed a relative increase along the rodent lineage and decrease along the primate lineage (e.g., Gibbs et al. 2004).
ESTIMATES OF MALE MUTATION BIAS
We present genome-wide, branch-specific estimates of mammalian α values (represented above branches in Fig. 1; see Table S4 for 95% CIs), which show considerable variation across species, in contrast to suggestions that male mutation bias is relatively constant (Patterson et al. 2006). Although different methods were used, we find a remarkable agreement between previous pairwise estimates of α and the branch-specific values presented here. As in previous studies, we observe high α estimates in primates, low estimates in rodents, and intermediate estimates in perissodactyls. Our branch-specific estimate of α in horse, 3.53 (95% CI = 2.82–4.48), is very similar to that computed previously for perissodactyls (3.88 [95% CI = 2.90–6.07]; Goetting-Minesky and Makova 2006), and is based on the analysis of whole-genome sequences. We also find that α values along the external mouse and rat branches (2.19 [95% CI = 1.23–6.27] and 1.92 [95% CI = 1.39–2.91], respectively; Fig. 1) are similar to the previous mouse-rat pairwise estimate of 1.91 (95% CI = 1.65–2.27) (Makova et al. 2004). In fact, nearly all estimates of α along the rodent lineage hover around 2 (Table S1). Curiously, the α estimate for the dog in the present study, 2.04 (95% CI = 1.64–2.60), is slightly lower than the previously reported branch-specific value of 2.8 (Lindblad-Toh et al. 2005), but as no confidence intervals were provided in the latter manuscript, it is possible that the ranges overlap. Finally, the rhesus-specific estimate of α (2.88 [95% CI = 2.31–3.68]) is nearly identical to what was observed for the rhesus-human comparison (2.87 [95% CI = 2.37–3.81]; Rhesus Macaque Genome Sequencing and Analysis 2007).
Our results also provide novel data on the magnitude of male mutation bias in eutherian species for which this phenomenon has not been investigated previously (Fig. 1). For instance, male mutation bias observed for the only marine mammal in our study, dolphin, is relatively high, 3.88 (95% CI = 2.94–5.47). Therefore, if the difference is due to a longer generation time, as we discuss below, male mutation bias may be high in long-lived whale species. On the other end of the spectrum, the branch-specific estimates of α in the hedgehog and shrew are close to one (0.97 [95% CI = 0.62–1.41] and 1.06 [95% CI = 0.88–1.26], respectively), indicating that there may be little difference in the number of germline cell divisions between males and females before reproduction in these species. Similar to dolphin, estimates of α across primates excluding the great apes, which range from 2.38 (95% CI = 1.91–3.01) in the wide-eyed bushbaby to 3.02 (95% CI = 2.28–4.21) in the tarsier, are larger than most α values reported in the tree. These values are also increased for the great apes (discussed below) and together provide strong evidence against the supposition that there is no increase in α in the primate lineage (Patterson et al. 2006).
The magnitude of α values in the great apes has been of great interest to the scientific community (Makova and Li 2002; Patterson et al. 2006; Taylor et al. 2006; Wakeley 2008; Presgraves and Yi 2009). The branch-specific estimates we provide here are based on nearly an order of magnitude more sequence (for both X and autosomes) than previously studied (Presgraves and Yi 2009) and, unlike the previous dataset (Patterson et al. 2006), we excluded low-quality and coding sites, both of which might affect estimates of the substitution rate. Still, the confidence intervals of α for both gorilla and chimpanzee (2.53 [95% CI = 1.44–4.54] and 3.81 [95% CI = 2.54–5.28], respectively) overlap with the corresponding previously reported values (see Table S5 for comparison). After applying a correction for ancestral polymorphism (by one-half π, see Methods), in addition to overlapping confidence intervals, the trend for the great apes is the same between our results and previous estimates, with the smallest α in gorilla, 2.0 (95% CI = 1.04–4.00), intermediate α in chimpanzee, 2.8 (95% CI = 1.91–4.36), and the highest α estimates in human (see Table S5 for comparison with Presgraves and Yi 2009). The difference in magnitudes of α measured here and previously may be attributed to differences in the quantity and quality of the sequences analyzed. Strikingly, both the uncorrected and corrected human α estimates, 20.09 (95% CI = 8.34–∞) (Fig. 1) and 22.7 (95% CI = 7.58–∞) (Table S5), respectively, are larger than expectations provided by the ratio of the number of male versus female germline cell divisions, c. In humans c is estimated to be 6 if paternal age is 20 years, and 10 if paternal age is 25 years (Li et al. 2002). This large human α is closer to direct estimates of male mutation bias made by comparing point mutations of paternal and maternal origin for hemophilia (α = 15) (Oldenburg et al. 1993).
We note that an alternate explanation for high α values in human may be a statistical fluctuation; although the autosomal substitution rates are similar between human and chimpanzee and the estimated substitution rate for human X is lower than that for chimpanzee (Table S3), both these values are quite low, meaning that small alterations in either estimate may produce large alterations in the estimate of α. Thus, to avoid any biases due to this potential outlier, the human branch is excluded from the regression analysis for the study of male mutation bias only. See also our pairwise human–chimpanzee analysis presented below.
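The sensitivity of α to small changes in the X/A ratio follows from Miyata's formula, whose derivative with respect to X/A is −6/(3(X/A) − 2)², which grows without bound as X/A approaches 2/3. The sketch below uses made-up X/A values purely to illustrate that behavior; the function names are hypothetical and the formula is valid only for X/A between 2/3 and 4/3.

```python
def alpha_from_xa(xa):
    """Miyata's formula, valid for 2/3 < X/A < 4/3."""
    return (3.0 * xa - 4.0) / (2.0 - 3.0 * xa)


def d_alpha_d_xa(xa):
    """Derivative of alpha with respect to X/A: -6 / (3(X/A) - 2)^2."""
    return -6.0 / (3.0 * xa - 2.0) ** 2


if __name__ == "__main__":
    # Illustrative X/A ratios, not estimates from this study.
    for xa in (1.00, 0.80, 0.71, 0.70):
        print(f"X/A = {xa:.2f}  alpha = {alpha_from_xa(xa):6.2f}  "
              f"d(alpha)/d(X/A) = {d_alpha_d_xa(xa):8.1f}")
```

In this illustration, moving X/A from 0.71 to 0.70 raises α from roughly 14.4 to 19, whereas the same absolute change near X/A = 1 would shift α by only about 0.06.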
We assembled data on eight biological features reflecting three life-history traits—generation time, metabolic rate, and sperm competition (Table 1, Table S1)—for nearly all the 32 eutherian mammalian species using a combination of primary literature, personal communications, and previously compiled databases (see Methods; all data and references are available in Table S1). We approximate generation time with three biological traits: the age at sexual maturity, similar to the age at first breeding (Bartosch-Härlid et al. 2003), maximum life span, and interlitter interval. Sperm competition is measured by one continuous variable, the ratio of testes mass to body mass (Harcourt et al. 1981) and one binary variable, mating pattern. Mating patterns were classified into one of two categories: either as those in which sperm from multiple males have to compete to fertilize eggs in the female (in polyandrous and polygynandrous species) or those where little or no sperm competition occurs (in monogamous and polygynous species). Thus some species that are not socially monogamous may be grouped in the latter category because females rarely mate with more than one male during the same periovulatory period, such as humans, which exhibit only weak sperm competition (Martin 2007). Finally, to estimate metabolic rate we collected data on three biological features: basal metabolic rate (BMR), body mass, and temperature.
To test for correlations among the biological features studied, we conducted a PCA (Fig. 2; see Methods for description), expecting that the features collected as predictors of each life-history trait should group together. Analysis of the first two, and the only significant, principal components indicated that features collected to represent each life-history trait do, indeed, cluster together, explaining similar variation. Namely, all three predictors of generation time (life span, age at sexual maturity, and interlitter interval) group closely together, and both predictors of sperm competition (mating pattern and testes-to-body mass ratio) group together. Two of our predictors of metabolic rate (mass and basal metabolic rate, BMR) group very closely together, highlighting their similar predictive power. The outlier is temperature, the third presumed predictor of metabolic rate. Temperature does not group closely with the other predictors of metabolic rate, perhaps because no consistently measured dataset for temperature (e.g., measured at the same time of day or activity level) is available for all 32 mammals, and because body temperature can vary as much within an individual of some species (at different points during a day) as it varies across all species considered here. For instance, the body temperature of an individual tree shrew (Tupaia belangeri) can vary by 5°C in one day (Refinetti and Menaker 1992).
TRENDS BETWEEN LIFE-HISTORY TRAITS AND SUBSTITUTION RATES
Next, we assessed how each biological feature varies with either the magnitude of male mutation bias or the autosomal substitution rate across mammals (Figs. 3 and 4). Consistent with the results of the PCA, we observed that the features used to predict each life-history trait show similar trends, being either positively or negatively associated with α or the autosomal substitution rate. Curiously, for each life-history trait, the correlation with the autosomal substitution rate was in the opposite direction to the correlation with α. These trends will be discussed in the context of the regression results below.
Because closely related species may, by virtue of their shared evolutionary history, have more similar biological traits than more distantly related ones (phylogenetic dependence), treating species as statistically independent units may artificially increase the significance of results (Felsenstein 1985). BayesTraits is a powerful suite of tools for detecting and correcting for phylogenetic dependence that incorporates and improves upon pairwise methodologies, and integrates a phylogenetic tree to correct for divergence times between species (Pagel 1999) (see Methods). We used BayesTraits to account for phylogenetic dependence in our dataset and to regress each of two response variables against each of the eight biological features of interest (Table 2). The first response variable is the weight-averaged autosomal substitution rate (at presumably unconstrained sites) in each species, used to observe how variation in life-history traits affects the underlying point mutation rate across mammals. The second is the male-to-female mutation rate ratio, α, used to estimate the impact of life-history traits on male mutation bias. Several relationships were significant, and several others were, surprisingly, not significant. For both response variables, we investigated all possible combinations of the eight biological features as predictors (e.g., eight traits taken eight at a time, eight traits taken seven at a time, and so on), as well as an interaction term between Mass and BMR. However, none of the multiple regression models was a better fit than the simple regressions, as judged by the Akaike Information Criterion corrected for small sample sizes (AICc) and the significance of predictors in the models (see Methods; Table S6), and thus multiple regression models are not discussed in detail here.
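The all-subsets comparison described above can be illustrated with a minimal sketch. It is not the BayesTraits workflow used in the study: the feature labels, the function names, and the log-likelihood passed to `aicc` are hypothetical, the Mass × BMR interaction term is omitted, and the AICc expression is the standard small-sample correction, AICc = AIC + 2k(k + 1)/(n − k − 1).

```python
from itertools import combinations

# Hypothetical labels for the eight biological features (names are assumptions).
FEATURES = [
    "age_at_sexual_maturity", "life_span", "interlitter_interval",
    "testes_to_body_mass", "mating_pattern",
    "bmr", "mass", "temperature",
]


def candidate_models(features):
    """Yield every non-empty subset of predictors (eight taken 1..8 at a time)."""
    for size in range(1, len(features) + 1):
        for subset in combinations(features, size):
            yield subset


def aicc(log_likelihood, n_params, n_obs):
    """Akaike information criterion with the small-sample correction.

    n_params is the number of estimated parameters (k); n_obs is the number
    of observations (n). The correction is undefined when n - k - 1 <= 0.
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc requires n_obs > n_params + 1")
    aic = 2 * n_params - 2 * log_likelihood
    return aic + (2 * n_params * (n_params + 1)) / (n_obs - n_params - 1)


if __name__ == "__main__":
    models = list(candidate_models(FEATURES))
    print(len(models), "candidate predictor sets")
    # Made-up log-likelihood, only to show the call; lower AICc is preferred.
    print(round(aicc(log_likelihood=-10.0, n_params=2, n_obs=32), 3))
```

Under these assumptions, each candidate predictor set would be fitted separately and the resulting AICc values compared, with the lowest value indicating the preferred model.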
Interestingly, although many features were significant predictors in the simple regressions (Table 2), when they were combined into multiple regression models, in general only the predictors of generation time maintained their significance for either response variable (Table S6). This is in concordance with previous results for protein-coding substitutions (Welch et al. 2008) and highlights the importance of replication-dependent mutations in explaining variation in substitution rates and male mutation bias among species. Below we focus on the trends and correlations observed in the simple regression models.
Table 2. Regression results (for simple models) using either the autosomal substitution rate or α as the response. R2 values and coefficients (Betas) for the simple regression models are shown. Two-tailed P values are presented and marked in bold if significant after Bonferroni correction. The Akaike information criterion corrected for small sample size (AICc) for each model is also shown.
CORRELATIONS BETWEEN AUTOSOMAL SUBSTITUTION RATES AND LIFE-HISTORY TRAITS
Predictors of generation time are significantly negatively correlated with substitution rates across species, as was observed in previous studies (Nikolaev et al. 2007). Life span explains 43.1% of the variation in rates across species, whereas the other two predictors of generation time, age at sexual maturity and interlitter interval, explain 42.5% and 30.6% of the variation, respectively (Fig. 3 and Table 2). These results are consistent with the expectation that species with shorter generation times undergo more rounds of replication per unit of time, and therefore accumulate more mutations arising from replication errors (Laird et al. 1969; Hwang and Green 2004).
We observe that all predictors of metabolic rate (mass, BMR, and temperature) are negatively correlated with substitution rate (although temperature is not significant and BMR loses significance after Bonferroni correction; Fig. 3 and Table 2), similar to some other studies (Bromham 2002; Welch et al. 2008). These results are inconsistent with expectations derived from the supposition that higher metabolic rates will induce more oxidative damage to DNA (Martin and Palumbi 1993), and instead suggest that higher metabolic rates might somehow shield DNA from damage. However, upon closer inspection of the multiple regression models, none of the predictors of metabolic rate retains its significance in accounting for variation in autosomal substitution rates when included alongside predictors of generation time (Table S6). Thus, although the simple regressions of metabolic rate are significant, results from the multiple regressions suggest that these replication-independent factors do not have the predictive strength of the estimators of generation time, and support studies of context-dependent substitution rate variation, which indicated that metabolic rate does not have a direct influence on substitution rates across species (Hwang and Green 2004).
Both predictors of sperm competition, the testes-to-body mass ratio and mating pattern, are positively, but not significantly (even prior to Bonferroni correction), correlated with substitution rates across mammals (Fig. 3 and Table 2). These results agree with previous studies of several loci in birds (Moller and Cuervo 2003; Nadeau et al. 2007) and of synonymous substitution rates in primates (Wong 2010), which also failed to find a significant correlation between postcopulatory sexual selection and the sex-averaged substitution rate (although the trends seem to be different at protein-coding sites [Wong 2010]). The weak positive trend observed here may provide some support for the theoretical expectation that increased sperm competition will result in a higher mutation rate (Blumenstiel 2007). However, confirming the presence or absence of a relationship between sperm competition and sex-averaged substitution rates will require more species and more quantitative measurements of sperm competition. Alternatively, it is possible that the magnitude of sperm competition in mammals has relatively little effect on the sex-averaged substitution rate, either because females have evolved a mechanism to compensate for any increase in the mutation rate in males, or because sperm competition may not influence the mutation rate in males at a detectable level.
CORRELATING LIFE-HISTORY TRAITS AND MALE MUTATION BIAS
Using the branch-specific estimates of α generated from genome-wide data corrected for phylogenetic dependence, we observe that generation time is a strong positive predictor of the magnitude of male mutation bias: age at sexual maturity explains 35.5% of the variation in α values across mammals whereas life span and interlitter interval explain 33.2% and 31.9%, respectively (Fig. 4 and Table 2). Whereas previous, smaller scale studies could only speculate about the positive correlation between generation time and male mutation bias based on fluctuations in α between species (Goetting-Minesky and Makova 2006), we are able to present statistically significant evidence supporting this observation.
The magnitude of male mutation bias is significantly positively correlated with two predictors of metabolic rate, mass and BMR, in the simple regressions (these regressions lose significance after Bonferroni correction; Fig. 4 and Table 2). Mass and BMR explain 27.8% and 26.4% of variation in α, respectively. The effect of such replication-independent factors on male mutation bias has not been previously investigated. Differences between DNA damage in males and females may result if, as has been suggested (Velando et al. 2008), sperm are more susceptible to changes in metabolic rate because they live in an oxygen-rich environment, while eggs have a denser cell membrane reducing oxidative stress. It would be desirable for future experiments to measure the differential effects of oxidative stress on sperm and eggs. Temperature, a positive (but not a significant) predictor, can also affect the viability of both sperm and eggs (McDaniel et al. 1995). It is possible that we did not have the power to detect correlations between temperature and male mutation bias, due to the fluctuations in when and how body temperatures were measured. Alternatively, such a correlation may not exist if species evolved mechanisms to keep body temperature from differentially affecting DNA damage in the male and female germlines. This alternative seems plausible given that many male mammals have external testes and species with internal testes evolved mechanisms to keep their testes cool: male dolphins use their dorsal fins to control a countercurrent heat exchange to cool their abdominal testes (Rommel et al. 1994), in adult male rats the temperatures of abdominal testes are always below rectal temperatures (Kormano 1967), and it has even been proposed that species with abdominal testes may have cooler body temperatures overall (Wislocki 1933). It should be noted that although mass and BMR predict some variation in α (as described above; Table 2), and show clear, positive trends (Fig. 4), they lose their significance and add little to the explanatory power of the model when included in a multiple regression with predictors of generation time (Table S6), suggesting that replication-independent mechanisms may have a smaller influence on male mutation bias than replication-dependent mechanisms.
In contrast to expectations and a pattern observed for birds (Bartosch-Härlid et al. 2003) and the great apes (Presgraves and Yi 2009), the correlation between the magnitude of male mutation bias and the strength of sperm competition is, surprisingly, not significant for either the testes-to-body mass ratio or for mating pattern across the broad range of mammals studied here (Table 2). Males whose sperm must compete to fertilize eggs produce higher quantities of sperm than males from species without this competition (Short 1979). This is expected to lead to an increase in male mutation bias if a greater number of cell divisions occurs to accommodate the increased sperm production in the former class; theoretical work confirms this expectation (Blumenstiel 2007). Unfortunately, observations regarding specific mating patterns are few and often do not provide information on the exact number of partners per periovulatory period. Further, the other predictor of sperm competition, the testes-to-body mass ratio, may be confounded by when the measurements were taken (e.g., testes mass may vary for species that mate only seasonally, between dominant and nondominant males, and if measured in younger vs. more mature adults). Alternatively, the impact of generation time may greatly outweigh any influence of sperm competition, as was also suggested in the study of male mutation bias in apes (Presgraves and Yi 2009). Finally, the consequences of increased sperm competition have only been investigated in the male germline, not in the female germline. If mating patterns consistent with increased sperm competition affect the number of rounds of replication during both spermatogenesis and oogenesis, then increased sperm competition would not affect the magnitude of male mutation bias, highlighting the need to study both the female and male germlines in future experiments. However, as we also do not observe a significant association between sperm competition and the sex-averaged mutation rate, this last explanation seems unlikely to have a predominant influence.
POTENTIAL EFFECTS OF SEQUENCE COVERAGE AND SELECTION ON OUR ESTIMATES
The low coverage (∼2×) of many of the currently sequenced mammalian genomes, combined with the alignment method used for the 44-way alignment (pairwise to human), limited our study to regions that were (1) sequenced with enough coverage (or long enough reads) to be confidently aligned and (2) conserved enough to be aligned to human, resulting in more conservative estimates of the substitution rates across species. Indeed, previous research indicates that estimates of the substitution rates at conserved noncoding sequences are similar to or even lower than rates at nonsynonymous sites in the majority of eutherian species (Nikolaev et al. 2007). However, due to our stringent requirements for filtering by quality, the regions considered should be equally conserved across all species, and, as the trends we discovered are consistent with previous studies (Gibbs et al. 2004), the overall patterns of variation between species do not seem to be affected (Fig. 2).
To alleviate concerns that sequence coverage, quality, and selection (potentially acting even at the noncoding regions analyzed here) affect our estimates, we reran the analysis using only species with high coverage genomes (6× or higher), both including and excluding ultra-conserved elements, UCEs (see Table S7 for substitution rates and α values). We had much less power to detect trends because fewer species were considered; however, any significant correlations observed in the full regression models described above (including all available mammalian sequences) were confirmed in the models with fewer species (Table S8). Although our results should be reconfirmed when the higher coverage sequences of all of the studied mammals become available, this restricted analysis suggests that the main results presented here are not affected by the inclusion of low coverage genomes or by UCEs.
To further test how our results might depend on the evolutionary history of the X chromosome, we also recomputed all regressions using several partitions of the X chromosome (stratum 1, stratum 2, stratum 3, stratum 4, XAR, and XCR) in addition to the entire X chromosome that we considered for the main analysis. There is a remarkable agreement among the regressions, suggesting that our results are also robust to the particular partition of the X chromosome chosen to compute α (Table S9).
OTHER CONTRIBUTING FACTORS
The results presented here suggest that errors in DNA replication account for a large proportion of variation in substitution rates and in male mutation bias among species. However, differences in the amount of replication errors alone may be insufficient to explain the variation in substitution rates between the X, Y, and autosomes; specific properties and genomic landscapes of these chromosomes might significantly contribute to this variation as well (Bartosch-Härlid et al. 2003; Malcom et al. 2003). To investigate how such interchromosomal differences affect estimates of male mutation bias, we reran the substitution rate analysis using only the human and chimpanzee genomes (which now both have completed Y chromosome sequence [Skaletsky et al. 2003; Hughes et al. 2010]), and incorporated higher quality genome builds than were available for similar analyses previously (Taylor et al. 2006). Using the same filtering methods described for the multiple alignment (see Methods) led to a much larger dataset (1.21 Gb, 46.5 Mb, and 2.45 Mb for autosomes, X, and Y, respectively) than our multispecies dataset above, because more high-quality nucleotides are retained when only two species are aligned. α estimates were computed directly from the substitution rates and corrected for ancestral polymorphism (by subtracting the human diversity from both the autosomal and X-linked rates [Makova and Li 2002]). The corrected pairwise estimates (αX/A = 3.6, αY/A = 6.8, and αX/Y = 3.5; Table S10) differ depending on which chromosomes are used for the comparisons, similar to previous work (Taylor et al. 2006; Pink et al. 2009). This result highlights the need for more sequencing projects to include the Y chromosome (as such pairwise analyses do not provide branch-specific resolution), and agrees with suggestions that there are also replication-independent mechanisms of mutation (Pink et al. 2009); for example, recombination might be mutagenic (Strathern et al. 1995). However, genome-wide estimates of recombination rate are not yet available for most species. Note also that the recently published analysis of 1000 sequenced human genomes casts doubt on the mutagenicity of recombination (Durbin et al. 2010).
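To illustrate how pairwise α estimates can be derived from chromosome-specific substitution rates, the following minimal sketch assumes the standard relations of Miyata et al. (1987), which the text above does not spell out: X/A = (2/3)(2 + α)/(1 + α), Y/A = 2α/(1 + α), and Y/X = 3α/(2 + α). The function names are hypothetical, and the polymorphism correction is reduced to a simple subtraction of diversity from divergence, as described above for the autosomal and X-linked rates.

```python
def correct_for_polymorphism(divergence, diversity):
    """Subtract within-species diversity from the observed divergence."""
    return divergence - diversity


def x_over_a(alpha):
    """Expected X-to-autosome rate ratio for a given alpha (assumed Miyata relation)."""
    return (2.0 / 3.0) * (2.0 + alpha) / (1.0 + alpha)


def alpha_from_x_over_a(ratio):
    """Invert X/A = (2/3)(2 + alpha)/(1 + alpha); defined for 2/3 < ratio."""
    return (4.0 - 3.0 * ratio) / (3.0 * ratio - 2.0)


def alpha_from_y_over_a(ratio):
    """Invert Y/A = 2*alpha/(1 + alpha); defined for ratio < 2."""
    return ratio / (2.0 - ratio)


def alpha_from_y_over_x(ratio):
    """Invert Y/X = 3*alpha/(2 + alpha); defined for ratio < 3."""
    return 2.0 * ratio / (3.0 - ratio)


if __name__ == "__main__":
    # Round trip: the X/A ratio implied by alpha = 3.6 maps back to 3.6.
    ratio = x_over_a(3.6)
    print(round(ratio, 4), round(alpha_from_x_over_a(ratio), 4))
```

Because the three ratios approach their limits (2/3, 2, and 3, respectively) as α grows, small differences in the underlying rates can translate into large differences in α, which is one reason the three pairwise estimates need not agree.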
We present the first genome-wide analysis of correlations between life-history traits and both autosomal substitution rates and male mutation bias, corrected for phylogenetic dependence. First, we determined that predictors of generation time reflecting replication-dependent processes explain over 30% of the variation in male mutation bias, and over 40% of the variation in autosomal substitution rates across mammals. Second, our results demonstrate that predictors of metabolic rate correlate significantly with male mutation bias and substitution rate, but surprisingly are minimally important when considered alongside predictors of generation time. Third, our results suggest that, contrary to expectations, there appears to be relatively little influence of sperm competition on either the underlying substitution rate at unconstrained sites or on male mutation bias in mammals.
This study highlights that, even in the genomic era, many basic biological questions remain unanswered: What is the length of spermatogenesis and oogenesis in all sequenced mammals? How does metabolic rate affect both sperm and eggs? How many partners, on average, do males and females of each mammalian species have? What is the daily range in body temperature for a given species? Genomic analyses will be useful only when taken in the context of the existing biology, and the opportunities for future research at the interface of genomics and species biology are legion.
Associate Editor: A. Cutter
We would like to thank the following for helping collect species life-history traits: Dr. F. Weaker, Dr. R. Stanyon, D. Foreman, M. Savio, S. Byers, P. Christiansen, T. Dewey, and Pennsylvania State University's Interlibrary Loan system. We also thank G. Ananda and D. Blankenberg for assistance using Galaxy, and B. Dickins, H. Goto, C. Park, and Y. Kelkar for discussions about the article. We gratefully acknowledge funding for this project through the NSF Graduate Research Fellowship to MAWS, and by NIH grants RO1 GM072264–05S1 and RO1 GM072264 to KDM. This work was supported in part through instrumentation funded by the National Science Foundation through grant OCI-0821527.