Reconstructing the ancestral characteristics of species is a major goal in evolutionary and comparative biology. Unfortunately, fossils are not always available and sufficiently informative, and phylogenetic methods based on models of character evolution can be unsatisfactory. Genomic data offer a new opportunity to estimate ancestral character states, through (i) the correlation between DNA evolutionary processes and species life-history traits and (ii) available reliable methods for ancestral sequence inference. Here, we assess the relevance of mitochondrial DNA – the most popular molecular marker in animals – as a predictor of ancestral life-history traits in mammals, using the order of Cetartiodactyla as a benchmark. Using the complete set of 13 mitochondrial protein-coding genes, we show that the lineage-specific nonsynonymous over synonymous substitution rate ratio (dN/dS) is closely correlated with the species body mass, longevity and age of sexual maturity in Cetartiodactyla and can be used as a marker of ancestral traits provided that the noise introduced by short branches is appropriately dealt with. Based on ancestral dN/dS estimates, we predict that the first cetartiodactyls were relatively small animals (around 20 kg). This finding is in accordance with Cope's rule and the fossil record but could not be recovered via continuous character evolution methods.
Mapping ancestral characters onto a phylogeny is of fundamental importance in the study of evolutionary processes, providing a unique opportunity to formulate ecological or evolutionary scenarios that could hardly be tested otherwise (Coddington, 1988; Donoghue, 1989; Cunningham et al., 1998). Yet obtaining ancestral values of a particular trait is a difficult problem in comparative biology as ancient species are no longer observable. Palaeontology provides invaluable morphological information about extinct species and their ecological characteristics. The so-called Cope's rule (Alroy, 1998), for instance, which states that ancestors tend to be of smaller size than current species, was deduced from palaeontological data. Unfortunately, fossils are scarce and not always easily connected to a specific lineage of extant species, so in most cases, they are insufficient for depicting the complete phylogenetic history of morphological and ecological traits.
The lack of direct observations has prompted the development of character evolution models and phylogenetic methods for estimating ancestral states in a maximum likelihood or Bayesian framework. For continuous characters, the most basic model postulates that the trait of interest evolves under unidimensional Brownian motion (BM) at a constant rate along the branches of the underlying phylogeny (Felsenstein, 1985). The development of more complex models relaxing some of the BM assumptions has also been tried. For example, BM with a directional trend seeks a preferential trajectory in the random walk (Hunt, 2006), whereas the stable model allows for selection episodes by relaxing the assumption of neutral evolution (Elliot & Mooers, 2013). However, by analysing data sets that correspond to a single realization of the modelled process, these methods are hampered by a limited statistical power and are particularly sensitive to model misspecification (Oakley & Cunningham, 2000). A number of discordances with respect to the fossil record have already been reported in such studies (Finarelli & Flynn, 2006; Slater et al., 2012).
Quickly accumulating molecular data could provide a new perspective regarding ancestral trait reconstruction. Molecular evolution is influenced by a number of factors related to species biology and ecology, such as mutation rate, effective population size and mating systems (Lynch, 2007). Genomic sequences are expected to keep track of the various selective or neutral pressures that species have undergone through time. The idea would therefore be to estimate ancestral traits indirectly based on ancestral molecular reconstructions. The strength of this approach is that there are numerous molecular characters, which considerably increases the power and accuracy of phylogenetic inference methods. Moreover, contrary to life-history traits where any change overwrites the previous value, DNA accumulates in its sequence the signal from past substitutions; by using comparative methods, this allows these changes or rates of change to be allocated to the different branches of the phylogenetic tree (Lanfear et al., 2010). In prokaryotes, for example, the GC content of ribosomal RNAs and the amino acid composition of proteins have been shown to be correlated with species optimal growth temperature, therefore serving as a molecular thermometer to infer past environmental temperatures on earth (Galtier et al., 1999; Boussau et al., 2008; Groussin & Gouy, 2011).
In mammals, several molecular evolutionary features have been linked to species life-history traits (LHT), including nucleotide and amino acid substitution rates (Martin & Palumbi, 1993; Bromham et al., 1996; Nabholz et al., 2008; Welch & Waxman, 2008) and genomic GC content (Romiguier et al., 2010, 2013; Lartillot, 2013). The nonsynonymous (= amino acid changing, dN) to synonymous (dS) substitution rate ratio is of specific interest, as it has been found to be positively correlated with the species body size, longevity and generation time in mammals (Nikolaev et al., 2007; Popadin et al., 2007; Romiguier et al., 2012). These relationships were interpreted as reflecting a decreased effective population size (Ne) in large mammals (Damuth, 1981, 1987; White et al., 2007), as compared to small ones. According to the nearly neutral theory of molecular evolution (Ohta, 1987), purifying selection against slightly deleterious mutations is less efficient in small populations, in which the random effects of genetic drift dominate. Assuming that a majority of nonsynonymous changes are deleterious, an increased proportion of nonsynonymous substitutions could therefore be expected in low-Ne species.
Recently, two studies took advantage of these correlations between species LHT and molecular evolutionary processes to estimate ancestral character states in mammals using phylogenetic approaches. Using 787 nuclear genes, Romiguier et al. (2013) reported that the estimated dN/dS ratio and GC content dynamics during early divergence in placental mammals were comparable to those of current long-lived mammals, and very different from those of current short-lived mammals, supporting the assignment of a relatively long-lived ancestor to this group. Lartillot and Delsuc (2012) analysed a smaller data set using a more sophisticated method in which the correlation between molecular rates and LHT was explicitly modelled (Lartillot & Poujol, 2011). Their results were largely consistent with those of Romiguier et al. (2013). Unfortunately, these estimates of ancestral LHT, which concern early placental ancestors, can hardly be compared to palaeontological data due to the uncertainty regarding placental origins in the fossil record.
The two studies reviewed above were based on nuclear DNA. Here, we investigate the ability of mitochondrial DNA (mtDNA), an alternative source of molecular data, to unveil ancestral characteristics of organisms. As with nuclear DNA, the evolutionary dynamics of mtDNA in mammals appear to be influenced by species LHT (Popadin et al., 2007; Nabholz et al., 2008; Welch & Waxman, 2008). As mtDNA data are easily sequenced and intensively used in phylogenetic and phylogeographical studies, they are available for a wide variety of groups and species. They therefore offer a much finer taxonomic resolution than nuclear data and provide an opportunity to explore taxa in which no or few nuclear genomes have been sequenced. However, some doubts remain regarding the power of this marker for ancestral reconstruction. First, the mammalian mitochondrial genome contains only 13 protein-coding genes and roughly 4000 codons, so the total amount of information is limited. Second, given its specific function (cellular respiration and metabolism), mtDNA might experience a distinct selective pressure in distinct groups (e.g. Shen et al., 2009), which might influence dN/dS irrespective of Ne and LHT. Finally, mtDNA in mammals evolves much faster than nuclear DNA (Nabholz et al., 2008). This fast rate leads to an increased risk of homoplasy and substitutional saturation, especially at synonymous positions, which might bias the estimation of the dN/dS ratio upward in fast-evolving lineages. Recently, Nabholz et al. (2013) used mtDNA to reconstruct ancestral LHT in birds and mammals. They concluded that, at this timescale, the mtDNA dN/dS ratio was not a reliable marker of LHT evolution because of the saturation of dS in long branches.
In this study, we evaluated the performance of the 13 protein-coding mammalian mitochondrial genes for estimating ancestral LHT in mammals. We used cetartiodactyl mammals as a benchmark, as this group is characterized by a wide spectrum of life-history traits ranging from a few kilograms (mouse-deer, dwarf antelopes) to whales, the largest animals on earth, whose weight can exceed a hundred tons. Moreover, this order benefits from a recent extensive mtDNA sequencing effort (Hassanin et al., 2012) – the complete mitochondrial genomes of 201 of the 332 recognized Cetartiodactyla species are available at this time (and just five nuclear genomes). Interestingly, the recent nuclear DNA-based reconstructions of former traits are suggestive of a relatively small short-lived Cetartiodactyla ancestor (Lartillot & Delsuc, 2012; Romiguier et al., 2013), a finding that contrasts with the large size of most of its current representatives. The independent information provided by mitochondrial DNA offers a good opportunity to better understand and more thoroughly explore the case of this peculiar mammalian order.
Two methodological approaches to the problem of DNA-aided ancestral state reconstruction are available in current literature. First, Lartillot and Poujol (2011) introduced an integrative, Bayesian method based on explicit modelling of the correlated evolution between continuous characters and substitution rates along the tree. This is arguably the statistically most elegant approach, but its robustness to potential departure from the underlying assumption of a multivariate Brownian process of character evolution has not been assessed so far. Alternatively, the problem can be addressed in a stepwise way, first mapping synonymous and nonsynonymous substitutions to the branches of the tree (Romiguier et al., 2012), then performing a regression analysis of dN/dS to LHT in terminal branches (Romiguier et al., 2013). This approach is expected to be robust to the process of character evolution, but due to the equal weight it puts on every terminal branch of the tree, it might be hampered by excessive variance in dN/dS estimation in short branches – especially in the context of taxon-rich sampling and limited sequence length – and faces the issue of phylogenetic nonindependence of data points.
In this study, we took advantage of these two complementary approaches to characterize the relationship between dN/dS and LHT in Cetartiodactyla, perform mtDNA-aided ancestral LHT reconstruction and compare our results to Brownian model-based reconstructions that do not make use of any molecular information. These reconstructions, based or not on sequence data, were compared to the palaeontological data to assess the plausibility of their ancestral estimates. This work validates mtDNA as a marker of ancestral LHT in Cetartiodactyla. It supports the hypothesis of a small common ancestor to extant Cetartiodactyla and the repeated evolution of large-sized lineages in this group, in agreement with Cope's rule on body mass evolution.
Materials and methods
Sequences were retrieved from the GenBank nucleotide database using the following query: ‘Cetartiodactyla and mitochondrion and complete and genome’. Sequences corresponding to incomplete mitochondrial genomes were excluded, and a single sequence was retained for each species, such that the complete mitochondrial genome of 201 Cetartiodactyla, of 332 recognized species, was finally obtained. Five out-group species were retrieved separately (Equus hemionus, Rhinoceros unicornis, Panthera tigris, Ursus arctos and Hipposideros armiger).
Alignment and phylogeny
Two alignments were performed: a whole mitochondrial genome alignment and an alignment restricted to the 13 protein-coding mitochondrial genes. The former was used for reconstructing phylogenetic trees and the latter for estimating dN/dS values. The alignment intended for phylogeny was performed using MUSCLE software (default parameters) (Edgar, 2004) and then cleaned with Gblocks (default parameters) (Castresana, 2000). We recovered an alignment of roughly 15 kb with very little missing data.
A phylogenetic tree was then inferred with the PhyloBayes program (Lartillot & Philippe, 2004) using a CAT GTR GAMMA 4 model with two Markov chain Monte Carlo (MCMC) chains running simultaneously.
Tree topology was compared with recently published phylogenies. Our Cetartiodactyla phylogeny was congruent at the subfamily level with that of Hassanin et al. (2012), with the exception of the monospecific Antilocapridae family, which was the sister group to Giraffidae in our inferred tree, but branched more deeply within Pecora in Hassanin et al. (2012). We manually corrected the Antilocapridae branching of our topology to match that of Hassanin et al. (2012). We also found that Tylopoda instead of Suina was the most basal group of Cetartiodactyla, a branching which is supported by almost all recently published Cetartiodactyla phylogenies (Price et al., 2005; Agnarsson & May-Collado, 2008) (see Appendix S1 for Newick tree). For each species, protein-coding genes were extracted using GenBank annotations and aligned with MUSCLE. All 13 gene alignments were then concatenated into a single one.
Species body mass, maximum longevity and age of sexual maturity were extracted from two databases: AnAge (build 12) (De Magalhães & Costa, 2009) and PanTHERIA (Jones et al., 2009). When two values were available (or three as male and female sexual maturity are given separately in AnAge), their mean was calculated. We obtained a measure of body mass in 178 of our 201 Cetartiodactyla species, longevity in 157 species and sexual maturity in 135 species. All values were log10-transformed.
Modelling the correlated evolution between dN/dS and body mass
We used the coEvol method version 1.3 (Lartillot & Poujol, 2011) to directly estimate the correlated evolution between life-history traits and mitochondrial dN/dS in Cetartiodactyla. The program models the correlated evolution of the two types of variables by assuming a multivariate Brownian diffusion process governed by a covariance matrix. All parameters are estimated in the Bayesian framework using Markov chain Monte Carlo (MCMC). The strength of the coupling between dN/dS and an LHT is returned as a posterior probability (pp) of a positive (pp close to 1) or negative covariation (pp close to 0).
We performed one run with all three LHT simultaneously along with both dS and dN/dS as dependent variables, and another run using a single LHT variable – body mass. In each case, two chains were run independently, and more than 8000 and 14 000 points, respectively, were sampled from the posterior distribution of parameters (burn-in: 2000 points).
dN/dS calculation and species-level correlation analyses
As an alternative to the coEvol method, we took a stepwise approach to the correlation between dN/dS and LHT. For each branch of each tree, a dN/dS value was calculated from protein-coding gene alignments using the substitution mapping procedure (Romiguier et al., 2012). This consisted of first fitting the YN98 codon model to the whole data set, as implemented in the bppML program (Dutheil & Boussau, 2008). The estimated model parameters were used to map the two types of substitutions, that is, synonymous and nonsynonymous, onto every branch of the tree, thus allowing the calculation of a branch-specific dN/dS. The estimated dN/dS ratios of terminal branches were correlated with the species life-history traits, and parametric Pearson's correlation tests were performed with R version 2.14.1 (The R Foundation for Statistical Computing, Vienna, Austria). Terminal branches to which < 10 total substitutions had been assigned were discarded.
To evaluate the potential negative impact of the numerous short branches, for which the low amount of substitutions increases the sampling variance of the dN/dS estimate, we recalculated the dN/dS–body mass correlation coefficient in Cetartiodactyla after removing species associated with a terminal branch of length below a certain threshold value of synonymous substitutions (Table 1; Fig. S1).
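The filtering and correlation steps described above can be sketched as follows. This is a minimal illustration, not the authors' R code: the dictionary keys (`dnds`, `n_total`, `n_syn`, `log10_mass`) and the toy values are hypothetical, and correlating untransformed dN/dS with log10 body mass is an assumption about how the log-linear relationship was fitted.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def terminal_branch_r2(branches, min_total=10, min_syn=0):
    """Return (r squared, number of branches kept) after filtering.

    Branches with fewer than `min_total` mapped substitutions, or fewer
    than `min_syn` synonymous substitutions, are discarded first.
    """
    kept = [b for b in branches
            if b["n_total"] >= min_total and b["n_syn"] >= min_syn]
    r = pearson_r([b["log10_mass"] for b in kept],
                  [b["dnds"] for b in kept])
    return r * r, len(kept)

# Toy data (hypothetical): three well-supported branches and one short,
# noisy branch carrying only four mapped substitutions.
example = [
    {"dnds": 0.02, "n_total": 50, "n_syn": 40, "log10_mass": 1.0},
    {"dnds": 0.03, "n_total": 60, "n_syn": 45, "log10_mass": 2.0},
    {"dnds": 0.04, "n_total": 80, "n_syn": 60, "log10_mass": 3.0},
    {"dnds": 0.50, "n_total": 4, "n_syn": 3, "log10_mass": 1.5},
]

r2_filtered, n_kept = terminal_branch_r2(example, min_total=10)
r2_all, n_all = terminal_branch_r2(example, min_total=0)
```

Raising `min_syn` step by step and recomputing r² mimics the threshold analysis, in which progressively more short branches are excluded.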
Table 1. Impact of different manipulations on the dN/dS–body mass correlation in Cetartiodactyla.
To cope with this problem of excessive variance in dN/dS estimates in short branches introduced by the high-taxon sampling of our mtDNA data set, we combined the signal from distinct, closely related branches into a single data point. To achieve this, we defined monophyletic clusters of species sharing a relatively similar body mass and calculated one dN/dS ratio per cluster. Based on the tree topology and the distribution of body mass within Cetartiodactyla, clusters correspond in our main analysis to the tribe level in Ruminantia (with the exception of the Moschidae family and Capreolinae subfamily) and suborder level in the other lineages (with the exception of the Hippopotamidae family) (Table S1). Cluster name abbreviations are used in Figs 1 and 2 (aep: Aepycerotini, alc: Alcelaphini, ante: Antilocapridae, anti: Antilopini, bos: Boselaphini, bov: Bovini, cape: Capreolinae, capi: Caprini, cep: Cephalophini, cer: Cervini, gir: Giraffidae, hipe: Hippopotamidae, hipi: Hippotragini, mos: Moschidae, mun: Muntiacini, mys: Mysticeti, neo: Neotragini, odo: Odontoceti, ore: Oreotragini, red: Reduncini, sui: Suina, trai: Tragelaphini, trae: Tragulidae, tyl: Tylopoda). To control for this arbitrary choice, alternative clustering schemes were also explored (Table 1). The dN/dS value of a cluster was obtained by summing the nonsynonymous and synonymous substitution counts across its terminal branches and calculating the ratio. The body mass, longevity and sexual maturity of a cluster were defined as the mean of the log10-transformed values of its representatives.
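The cluster-level pooling can be sketched as below. This is a hedged illustration only: the field names are hypothetical, and the substitution counts are assumed to be already normalised per synonymous or nonsynonymous site, a detail the text does not specify.

```python
import math
from collections import defaultdict

def cluster_summary(branches):
    """Pool terminal-branch counts per cluster.

    Returns {cluster: (dN/dS, mean log10 body mass or None)}.
    dN/dS is the ratio of summed nonsynonymous to summed synonymous
    counts; body mass is averaged on the log10 scale over the species
    for which a value is available.
    """
    nonsyn = defaultdict(float)
    syn = defaultdict(float)
    log_masses = defaultdict(list)
    for b in branches:
        c = b["cluster"]
        nonsyn[c] += b["n_nonsyn"]
        syn[c] += b["n_syn"]
        if b.get("mass_kg") is not None:
            log_masses[c].append(math.log10(b["mass_kg"]))
    summary = {}
    for c in syn:
        masses = log_masses[c]
        mean_log_mass = sum(masses) / len(masses) if masses else None
        summary[c] = (nonsyn[c] / syn[c], mean_log_mass)
    return summary

# Toy example (hypothetical counts) for a single cluster.
toy = [
    {"cluster": "bov", "n_nonsyn": 2, "n_syn": 100, "mass_kg": 100.0},
    {"cluster": "bov", "n_nonsyn": 4, "n_syn": 200, "mass_kg": 1000.0},
    {"cluster": "bov", "n_nonsyn": 0, "n_syn": 0, "mass_kg": None},
]
bov_dnds, bov_log_mass = cluster_summary(toy)["bov"]
```

Summing counts before taking the ratio, rather than averaging per-branch ratios, lets long branches dominate, so a short branch with very few substitutions cannot produce an extreme value on its own.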
Ancestral trait estimations
Ancestral life-history trait values were estimated by projecting internal branch dN/dS values onto the regression line fitted to the higher-order taxa. We took advantage of the independence between substitution events occurring in distinct branches to combine the dN/dS value from several connected internal branches with the ‘meta’ R package, which turns a set of independent estimators of a given quantity into a single point estimate and confidence interval.
We applied several controls to the higher-order taxa dN/dS – body mass correlation. The analysis was re-conducted using slightly different options or design to assess the influence of (i) tree topology, (ii) taxonomic level of species clustering and (iii) particular taxa suspected to behave as outliers (i.e. aquatic taxa and poorly sampled taxa) (Table 1; Table S2). The effect of phylogenetic inertia was tested through the method of phylogenetically independent contrasts (Felsenstein, 1985) with the ‘ape’ R package.
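A minimal sketch of this two-step estimation is given below. It rests on assumptions: log10 body mass is regressed on dN/dS by ordinary least squares, and the branch-specific estimates are pooled by fixed-effect inverse-variance weighting. The text does not state which pooling model of the ‘meta’ package was used, and the function names and numbers are hypothetical.

```python
import math

def ols_fit(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def pool_fixed_effect(estimates, std_errors, z=1.96):
    """Inverse-variance pooling of independent estimates.

    Returns (pooled estimate, lower bound, upper bound) of an
    approximate 95% confidence interval under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se

# Step 1: regression on cluster-level data (toy values, hypothetical).
slope, intercept = ols_fit([0.02, 0.03, 0.04], [1.0, 2.0, 3.0])

# Step 2: predicted log10 body mass for an internal branch.
predicted = intercept + slope * 0.025

# Step 3: pool two branch-specific predictions with equal uncertainty.
pooled, lower, upper = pool_fixed_effect([1.0, 2.0], [1.0, 1.0])
```

Because the estimates are on the log10 scale, the pooled value and its bounds would be back-transformed with `10 ** value` to express them in kilograms.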
Models of continuous character evolution
Ancestral body mass was also reconstructed based on body mass data only, that is, not using sequence data. This was achieved with three different continuous character evolution models: a standard Brownian motion-based model (BM), BM with directional trend (program BayesTraits, Pagel, 1999) and the ‘stable model’ (program StableTraits; Elliot & Mooers, 2013). Uncertainty on reconstructed ancestral values (95% confidence interval) was calculated in the Bayesian framework for all three models.
Continuous character reconstruction
Ancestral body mass at the root of Cetartiodactyla was estimated based just on body mass data using three continuous character evolution models: the standard Brownian motion with constant rate model (BM), BM with a directional trend (DT) and the Stable model (SM), which allows shifts in the evolutionary rate. BM predicted a large ancestor for Cetartiodactyla (156 kg point estimate), much larger than our mtDNA-based estimate (24 kg). Importantly, the 95% confidence interval of the BM analysis essentially covered the whole range of noncetacean Cetartiodactyla body masses (15 kg–1.6 t). This observation could be explained by the limitation of this simplistic model to recover an accurate ancestral state so deeply in the phylogeny when its value lies at an extremity of the range of its descendants. To account for this problem, we used the DT model, which was designed to detect a potential general trend towards an increase or decrease in the trait under consideration. We obtained an unrealistically high body mass value (132 t) for the Cetartiodactyla ancestor, comparable to the largest whales. Reconstructions using SM, finally, yielded ancestral body mass estimates similar to those of BM, that is, a point estimate of 148 kg with a wide confidence interval (13.5 kg–1.5 t).
dN/dS vs. body mass correlation in Cetartiodactyla
Using mtDNA sequences, we obtained a good correlation between dN/dS and body mass using higher-order taxa: r² = 0.57 (P < 2 × 10⁻⁵, Fig. 1), thus confirming the influence of body mass on the substitution dynamic of mitochondrial genes in Cetartiodactyla. The negative influence of short terminal branches (for which the dN/dS ratio may be badly estimated) when species are considered separately and the benefit of grouping them were confirmed by our control analysis where we excluded the shortest branches. We observed that the correlation between dN/dS and body mass at the species level increases with the threshold for minimum terminal branch length, reaching r² = 0.53 (P < 1 × 10⁻⁴) when only branches longer than 1000 synonymous substitutions are kept (Fig. S1), that is, a value similar to the one obtained with the higher-order taxa analysis (in which the total number of synonymous substitutions per data point is generally above 1000, with only two exceptions). Due to the large amount of noise introduced by the numerous terminal branches resulting from the rich taxon sampling, we observed a still significant but much weaker dN/dS–body mass correlation at the species level if no terminal branches were excluded (Pearson's test: r² = 0.10, P < 2 × 10⁻⁵, Fig. 1).
The higher-order taxa level analysis corrects to some extent for the problem of phylogenetic nonindependence of data points. To further address this problem, we applied the method of phylogenetically independent contrasts (Felsenstein, 1985) and found that the cluster-based correlation was robust (r² = 0.38, P < 0.002). We performed several other control analyses in Cetartiodactyla on the influence of tree topology, taxonomic level of species clustering, potential outlier taxa and choice of out-groups (Table 1; Appendix S2). The dN/dS–body mass correlation levels were essentially unchanged in all cases. In particular, we noted that as soon as a sufficient level of clustering is reached, the regression slope converges towards a constant value, therefore leading to similar body mass ancestral reconstructions irrespective of the specific definition of higher-level taxa.
Ancestral body mass prediction in Cetartiodactyla
Ancestral body masses in Cetartiodactyla were predicted by projecting internal branch dN/dS ratios on the regression slope of the higher-order taxa dN/dS–body mass correlation. Branch-specific estimated body masses are shown along the phylogenetic tree with a colour code in Fig. 2. Interestingly, the most basal five branches of the tree were all suggestive of relatively small-sized ancestors. Confidence intervals around each branch-specific body mass estimates were wide but, by combining the independent dN/dS value from these five independent branches, we predicted that early Cetartiodactyla ancestors had a body mass ranging from 7.3 to 78.4 kg (95% confidence interval), with a point estimate of 24.0 kg. Each of the five basal branches exhibited a low dN/dS. Among the 128 extant species heavier than, for example, 30 kg, only 23 (18%) showed a dN/dS ratio below 0.029, that is, similar to those obtained in the five basal branches. The probability of observing such a low dN/dS value five times independently under the hypothesis of large ancestors is therefore minute.
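The order of magnitude of this probability can be illustrated with a back-of-the-envelope calculation. It assumes, purely for illustration, that each of the five basal branches behaves as an independent draw from the extant species heavier than 30 kg; the variable names are hypothetical.

```python
# Counts taken from the text: 23 of the 128 extant species heavier
# than 30 kg show a dN/dS ratio below 0.029.
n_low_dnds = 23
n_large_species = 128
n_basal_branches = 5

# Probability that one large-bodied lineage shows such a low dN/dS.
p_single = n_low_dnds / n_large_species

# Probability that all five basal branches do so, assuming independence
# (an illustrative assumption, not a formal test from the study).
p_all_five = p_single ** n_basal_branches
```

Under these assumptions `p_all_five` is on the order of 10⁻⁴, consistent with the statement that large ancestors are very unlikely given five independently low dN/dS values.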
Thus, although the Cetartiodactyla order almost exclusively comprises large animals (some exceptionally large, like whales and hippos), according to our predictions, they likely descended from a relatively small-sized ancestor, implying several independent increases in body mass in Tylopoda, Suina, Hippopotamidae, Cetacea and several Ruminantia lineages. Figure 2 even suggests a moderately sized ancestor for Cetacea and a parallel increase in body mass in Mysticeti and Odontoceti. According to our reconstruction, the few small-sized clades of Cetartiodactyla, for example Tragulidae, Moschidae or Neotragini, which are currently exceptions in terms of body mass, would actually be close to the ancestral state of the group, in agreement with Cope's rule.
Contrasting life-history traits
Besides body mass, dN/dS was correlated with the species maximum longevity and age of sexual maturity at the higher-order taxa level. All three life-history traits in Cetartiodactyla displayed similar levels of correlation with mtDNA dN/dS (r² = 0.57, P < 2 × 10⁻⁵ for body mass, r² = 0.52, P < 1 × 10⁻⁴ for longevity and r² = 0.58, P < 2 × 10⁻⁵ for sexual maturity). Based on these correlations, we predicted for the Cetartiodactyla ancestors an age of sexual maturity of 432 days (95% confidence interval: 294–635 days) and a longevity of 16.5 years (95% confidence interval: 12.0–22.5 years). It should be noted that, these traits being correlated across species, it is unclear which one best explains the variation in effective population size, and ultimately in dN/dS, among species and lineages.
Modelling the correlated evolution between dN/dS and body mass
The coEvol analysis detected a significantly positive but relatively weak correlation between body mass and dN/dS evolution in Cetartiodactyla: r² = 0.07 (pp = 0.99) when only body mass was used and r² = 0.09 (pp = 1) when the two other LHT were included, similar to the species-level correlation analysis. This weak coupling between the two variables resulted in a relatively wide confidence interval for the body mass reconstruction at the root of Cetartiodactyla (4.7–116 kg, with a mean of 35.3 kg). This estimate nevertheless agrees with those obtained from the mtDNA-based stepwise approach and is much more accurate than the purely Brownian-based reconstructions.
In this study, we explored the relationships between mitochondrial DNA evolution and several major life-history traits in mammals, by taking the order Cetartiodactyla as a benchmark. The potential of this marker to indirectly inform about phenotypic characteristics of ancestors was assessed, and our reconstructions were compared to other methods based on models of continuous character evolution. Finally, we propose a scenario for the evolution of body mass in this group, which appears congruent with Cope's rule.
mtDNA as a marker of species life-history traits
The ability of mtDNA to infer LHT changes through effective population size was confirmed by our findings in Cetartiodactyla of strong correlations between dN/dS and three LHTs. These relationships were only strongly highlighted after we accounted for the noise introduced by short terminal branches, in which the dN/dS ratio was estimated with poor accuracy. We solved this problem by grouping similar-sized species into monophyletic clusters, and considering clusters as data points. The slope of the log-linear relationship between dN/dS and body mass was essentially unchanged in the analysis based on higher-order taxa (Fig. 1), and the r² reached a value similar to that obtained by removing short branches (Fig. S1), strongly suggesting that this analysis revealed the true biological relationship. Species grouping appeared to be a necessary step in this case of rich taxon sampling to compensate for the very weak substitution signal of numerous short terminal branches. Noisy short branches might contribute to explaining the relatively low correlation coefficients reported in previous studies of the dN/dS–body mass relationship in mammals (Popadin et al., 2007; Romiguier et al., 2012; Nabholz et al., 2013).
We noted that the coEvol method, which models the continuous coevolution of dN/dS and LHT along the tree, should not in principle be affected by the problem of short branches, but still failed to detect the strong impact of species LHT on mtDNA evolution. Therefore, it is possible that merging branches also improved the correlations by partly erasing the effect of short-term fluctuations of Ne and body mass, or any alternative source of overdispersion. At any rate, our results indicated that the average dN/dS of a group of monophyletic species predicted the average body mass of this group better than the dN/dS of a species predicts its body mass. To further improve the inference method, it would be helpful to understand the biological or statistical causes of this overdispersion.
Surprisingly, the coEvol analysis returned an estimate of ancestral body mass of Cetartiodactyla close to the one obtained in our regression analysis, with a reasonably narrow credibility interval. We recall that the current version of coEvol jointly analyses dN/dS and dS as predictors of phenotypic traits, whereas the substitution mapping approach solely considers dN/dS. We speculate that the relatively good performance of coEvol in reconstructing ancestral body masses while detecting only low level of correlation between dN/dS and body mass might be explained by an influence of the dS variable (N. Lartillot, pers. comm.).
Determining how to best depict and model the heterogeneity of the substitution process across branches in large phylogenetic datasets has recently attracted much attention and led to many developments (e.g. Blanquart & Lartillot, 2006; Boussau & Gouy, 2006; Dutheil & Boussau, 2008; Jayaswal et al., 2011; Zhang et al., 2011). In this study, we chose to merge species into clusters based on their body mass because we had strong prior expectations regarding its relationship with dN/dS, and also because we were interested in the first place in the reconstruction of ancestral body mass. Another approach could have been to cluster branches solely based on their molecular evolutionary pattern, as proposed by Dutheil et al. (2012).
The correlation coefficients we obtained in Cetartiodactyla from mtDNA data were lower than those obtained from nuclear markers in previous studies (r² = 0.86 between dN/dS and species longevity in Romiguier et al. (2013), 36 placental mammals). MtDNA has numerous atypical functional and evolutionary features – strong mutational pressure, no recombination, maternal inheritance, respiratory function – that could lower its power to inform about Ne as compared to nuclear DNA. From a statistical standpoint, the small size of mtDNA (only 13 protein-coding genes) presumably limits its power, as compared to the hundreds of genes that can be used in nuclear studies. However, at relatively recent phylogenetic scales, it is likely that nuclear markers partly lose their superiority because of their lower substitution rate, making fast-evolving mtDNA a valuable substitute.
Despite the risk of limited signal, good taxon sampling has the advantage of avoiding long branches (particularly terminal ones) and the associated bias in dN/dS estimation due to saturation. If basal branches were affected by saturation, this might lead to an underestimated dS and therefore an overestimated dN/dS, so our findings of a relatively low dN/dS in deep branches of the Cetartiodactyla phylogeny would appear to be robust to saturation effects. However, it was recently suggested that the nonsynonymous rate dN could be affected by saturation too due to limitations of classical codon substitution models to account for multiple substitutions when the amino acid frequency spectrum varies across sites, thus resulting in potential underestimation of the dN/dS ratio in long branches (Dos Reis & Yang, 2013). Our data set shows evidence for multiple substitutions occurring at the same site across the tree, which is typical of mtDNA (Fig. S2). However, the average estimated branch length in our tree was only 0.068 substitutions per nucleotide site, that is, a very low value. Furthermore, we found only a weak effect of terminal branch length on dN/dS estimate in Cetartiodactyla (dN/dS vs. synonymous branch length, r² = 0.05, P < 1 × 10⁻⁵), suggesting that our analysis is not strongly affected by saturation.
Cetartiodactyla is an interesting and difficult case for the study of body mass reconstruction because ancestral representatives are thought to have been significantly smaller than most extant species. Our study demonstrates that mtDNA could recover this signal in Cetartiodactyla thanks to a rich taxon sampling. However, it is unclear how well this analysis would translate to other mammalian groups. Although we can expect LHT to affect dN/dS evolution in all placental mammals, the grouping procedure required to deal with short branches may not be suitable in certain cases, for example if the distribution of body masses does not follow the phylogeny. Further exploration in other well-sampled mammalian clades should help to establish the broad applicability of our methodology.
Body mass evolution in Cetartiodactyla
Thanks to the strong dN/dS–body mass correlation in Cetartiodactyla, we were able to predict the body mass of extinct species, with a specific interest in the earliest ancestors of the group. However, we note that defining robust confidence intervals with this approach is not easy. Several sources of uncertainty arise in the course of the analysis and should in principle be accounted for to obtain a complete, integrated confidence interval. For example, the uncertainty on dN/dS estimation, in particular for deep internal branches, was not taken into account in our reconstructions, and we also assumed that the dN/dS–body mass correlation observed in extant species has remained unchanged throughout Cetartiodactyla evolution. Here, we mainly reported the uncertainty resulting from the strength of the correlation observed in extant species. This is why we do not pay specific attention to every point estimate returned by the method, but only discuss the main trends that were robust to the various methodological options we have explored.
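The kind of uncertainty reported here can be illustrated with a simple regression of log body mass on dN/dS in extant groups, followed by a prediction interval for an ancestral branch. The sketch below is an illustration under stated assumptions, not our actual pipeline: the function names, the eight data points and the ancestral dN/dS of 0.034 are hypothetical, and the interval uses a normal approximation rather than a Student t quantile. As in the text, it ignores the error on the ancestral dN/dS estimate itself.

```python
import math
from statistics import NormalDist, mean

def fit_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, math.sqrt(sse / (len(x) - 2))

def predict_interval(x, fit, x_new, level=0.95):
    """Approximate prediction interval for a new observation at x_new."""
    a, b, sd = fit
    mx = mean(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation
    se = sd * math.sqrt(1 + 1 / len(x) + (x_new - mx) ** 2 / sxx)
    centre = a + b * x_new
    return centre - z * se, centre, centre + z * se

# Hypothetical extant groups: dN/dS and log10 body mass (grams).
dnds = [0.030, 0.035, 0.040, 0.045, 0.050, 0.055, 0.060, 0.065]
log_mass = [4.0, 4.4, 4.6, 5.1, 5.3, 5.8, 6.0, 6.5]

fit = fit_ols(dnds, log_mass)
intercept, slope, resid_sd = fit

# Hypothetical ancestral dN/dS estimate for a deep internal branch.
low, centre, high = predict_interval(dnds, fit, 0.034)
print(f"predicted mass: {10 ** centre / 1000:.1f} kg "
      f"(interval {10 ** low / 1000:.1f} to {10 ** high / 1000:.1f} kg)")
```

Because the interval is computed on the log scale, it is asymmetric once back-transformed to kilograms, and it widens as the ancestral dN/dS moves away from the range observed in extant species.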
Using internal branch dN/dS, we predicted small ancestors for all Cetartiodactyla even though the clade is now mostly made up of large species. This result is congruent with published nuclear reconstructions (Lartillot & Delsuc, 2012; Romiguier et al., 2013), which were based on an independent source of information, confirming that this somewhat counter-intuitive finding is robust to all scales and dataset types.
Palaeontology also suggests small-sized ancestors of Cetartiodactyla, with groups such as dichobunids or helohyids, in agreement with our reconstruction. The oldest known fossil assigned to this order, Diacodexis (Rose, 1982), lived during the early Eocene and was a mouse-deer-like animal of 2–3 kg, which is even smaller than our estimate. However, fossils cannot always be connected to the phylogeny of extant species with certainty, and it is unclear whether Diacodexis actually corresponds to one of the ancestral branches highlighted in Fig. 2, or is a distinct lineage that diverged from one of these branches. At any rate, this fossil gives credit to the idea that early Cetartiodactyla could have been much smaller than suggested by the typical size of extant species. Our result of a delayed increase in size for this group is also in agreement with the oldest fossils of extant cetaceans, dating back to the Eocene, such as the wolf-sized Pakicetus (Thewissen et al., 2001), Indohyus (weighing around 10 kg) and other raoellid artiodactyls (Thewissen et al., 2007).
On the other hand, continuous character evolution models, using either a simple Brownian motion or more elaborate assumptions, failed to retrieve the small size of Cetartiodactyla ancestors. Perhaps not so surprisingly, these methods returned ancestral estimates close to the mean and median of the distribution of current body mass. Even the model that assumes a directional trend (DT) for the evolving trait did not detect any trend towards a body mass increase, but rather reconstructed considerably larger ancestors than simple Brownian motion did. Furthermore, the confidence intervals were quite wide, which suggests that the information on ancestral states has simply been lost. This is reminiscent of the findings of the Igic et al. (2006) study of ancestral mating systems in Solanaceae, in which a method based on models of continuous character evolution failed to recover the strong unidirectional trend (from outcrossers to self-fertilizers), which was only detected by molecular-aided reconstructions. These methods would presumably greatly benefit from the inclusion of fossils in their estimations (Slater et al., 2012), but we note that such information is not necessary when using molecular data.
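The behaviour of the simple Brownian motion model can be understood from the form of its root estimate, which is a weighted average of the tip values and therefore cannot fall outside the range observed in extant species. The sketch below illustrates this with the standard pruning recursion on a toy four-taxon tree; the tree encoding, the function name and the trait values are hypothetical, and this is not the software used in our analyses.

```python
def bm_root_estimate(node):
    """Brownian motion estimate at the root of a subtree.

    A leaf is a number (the trait value). An internal node is a pair
    ((child, branch_length), (child, branch_length)).
    Returns (estimate, extra_variance), where extra_variance is the
    uncertainty to add to the branch above this node.
    """
    if isinstance(node, (int, float)):
        return float(node), 0.0
    (left, bl_left), (right, bl_right) = node
    x1, e1 = bm_root_estimate(left)
    x2, e2 = bm_root_estimate(right)
    v1, v2 = bl_left + e1, bl_right + e2
    # Inverse-variance weighted average of the two descendant estimates.
    estimate = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    return estimate, v1 * v2 / (v1 + v2)

# Hypothetical tips: log10 body mass (grams) of four extant species.
tips = [4.0, 5.5, 6.0, 7.0]
small_clade = ((tips[0], 1.0), (tips[1], 1.0))
large_clade = ((tips[2], 1.0), (tips[3], 1.0))
tree = ((small_clade, 1.0), (large_clade, 1.0))

root, _ = bm_root_estimate(tree)
print(root)  # 5.625, the mean of the tips for this symmetric tree
```

With this symmetric toy tree the root estimate coincides with the arithmetic mean of the tips. An ancestor smaller than every sampled extant species, as suggested by the fossil record, is out of reach of such an estimator unless additional information, such as fossil values, is included.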
Compared with published nuclear DNA analyses, our species-rich mitochondrial data set allowed us to propose a more detailed scenario for the evolution of body mass in Cetartiodactyla. Our analysis predicted an independent increase in body mass in most of the main lineages of this group – Tylopoda, Suina, Hippopotamidae, Cetacea, Odontoceti, Mysticeti and numerous groups of Ruminantia – with no strong cases of reversal towards smaller sizes (Fig. 2). The few current small-sized groups of Cetartiodactyla (Tragulidae, Moschidae, Neotragini) were within the confidence interval of estimated basal body mass, suggesting that these lineages have mostly remained unchanged since the ancestral character state – but note that the details of the evolutionary history of body mass during the radiation of Ruminantia are largely unresolved. This peculiar pattern is in agreement with Cope's rule, which postulates that body size tends to increase in evolutionary lineages. According to this hypothesis, a large body size would confer a higher fitness at the population level, for example a better ability to escape predation or to compete for resources and sexual partners (Hone & Benton, 2005), but large species would be hampered by a higher probability of extinction than small ones, perhaps because of their reaching a higher level of ecological specialization (Stanley, 1973; Lihoreau & Ducrocq, 2007), or because of their reduced population size and subsequent accumulation of deleterious mutations (Lynch & Blanchard, 1998; Popadin et al., 2007). The relevance and level of generality of Cope's rule remain controversial in mammals as the evidence seems to essentially be derived from fossil analyses (Alroy, 1998; Van Valkenburgh et al., 2004), with little or no corroboration from studies using extant species body mass (Monroe & Bokma, 2010). Of course, one strong argument against the generality of Cope's rule is the existence of small-sized groups (e.g. muroid rodents), in which body size has clearly not increased over time since they appeared in the fossil record (Stanley, 1973).
Several possible reasons could be put forward to explain the trend towards an increased body mass in Cetartiodactyla. The physiological adaptation to rumination, for example, that appeared independently in Ruminantia and Tylopoda has been shown to be more effective in larger animals (Bae et al., 1983), and the large size of whales and hippos is presumably associated with their aquatic lifestyle (Vislobokova, 2013). It is therefore unclear whether the forces pushing towards a larger size in Cetartiodactyla are the generic ones typically invoked to explain Cope's rule (see above) or are specific to this clade. Additional nuclear data from key Artiodactyla taxa should help in further testing the conclusions of this study by increasing the accuracy of our reconstruction of the body mass evolutionary history in this group.
Our analysis revealed that the mtDNA dN/dS ratio is a valuable marker of ancestral LHT evolution in mammals, provided that (i) a large number of species is available, and (ii) the overdispersion of the dN/dS signal across short branches is properly accounted for. Our mtDNA-aided ancestral reconstructions supported small-sized Cetartiodactyla ancestors, in agreement with the fossil record, whereas methods based on models of continuous character evolution yielded estimates with a very large degree of uncertainty. We suggest that molecular data hold a great potential for ancestral trait reconstruction, opening new prospects for unveiling characteristics of ancient organisms in groups lacking conclusive palaeontological data.
The authors thank Nicolas Lartillot and Frédéric Delsuc for thoughtful discussions and are grateful to Simon Ho and one anonymous reviewer for their helpful comments on the manuscript. This work was supported by ANR-10-BINF-01-02 ‘Ancestrome’ and by European Research Council Advanced Grant ERC 232971 ‘PopPhyl’.