Trait-dependent diversification and the impact of palaeontological data on evolutionary hypothesis testing in New World ratsnakes (tribe Lampropeltini)


  • R. A. PYRON,

    1. Department of Biological Sciences, The George Washington University, Washington, DC, USA

    1. Department of Biology, The Graduate School and University Center, The City University of New York, New York, NY, USA
    2. Department of Biology, The College of Staten Island, The City University of New York, Staten Island, NY, USA

R. A. Pyron, Department of Biological Sciences, The George Washington University, 2023 G St. NW, Washington, DC 20052, USA.
Tel.: +1 202 994 6616; fax: +1 202 994 6100; e-mail:


For studies investigating trait evolution, there are at least two important questions. First, have traits under consideration influenced cladogenesis and extinction in the group? Second, how do fossil data alter inferences about trait evolution or diversification-rate dynamics? However, relatively few studies have assessed these questions. Here, we use recently developed methods to test for trait-dependent diversification in the New World colubrid snake tribe Lampropeltini. We also integrate data from fossil taxa into phylogenetic estimation of evolutionary parameters using a simple Monte Carlo randomization test. These analyses suggest that ecological conditions in temperate regions are tied to higher rates of cladogenesis, but that body size is not related to diversification in the group. We also find that the inclusion of fossil taxa alters absolute estimates of body size and of the rate of size evolution, as well as estimates of evolutionary rates (particularly extinction), but does not alter the overall pattern of ecomorphological diversification.


The apparent evolutionary history of a group implied by extant taxa can differ significantly from that represented by palaeontological data. This difference may be quite large if extinction and cladogenesis have biased the distribution of traits in extant taxa (Maddison, 2006). Integrating neontological and palaeontological data in evolutionary analyses has demonstrably positive effects on phylogenetic inference (Wiens et al., 2010), divergence-time estimation (Pyron, 2011), diversification-rate estimation (Paradis, 2004; Etienne & Apol, 2009) and ancestral character state reconstruction (Laurin, 2004; Finarelli & Flynn, 2006; Albert et al., 2009). In contrast, biases can result when estimating ancestral states and diversification rates using only molecular phylogenies (Cunningham et al., 1998; Finarelli & Flynn, 2006; Liow et al., 2010; Rabosky, 2010), particularly if traits of interest impact rates of cladogenesis and extinction (FitzJohn, 2010). This correlation between diversification rates and trait values can impact historical inference by biasing the probability that a particular character state will be observed in the extant taxa (Maddison, 2006; Paradis, 2008; FitzJohn, 2010). Thus, for any study testing hypotheses regarding trait evolution, there are two important questions. First, have traits under consideration influenced diversification in the group (Maddison, 2006; Freckleton et al., 2008)? Second, how do fossil data alter inferences about evolutionary dynamics (Finarelli & Flynn, 2006; Albert et al., 2009; Liow et al., 2010)?

Here, we test for the presence of trait-dependent diversification in a group of New World (NW) colubrid snakes (tribe Lampropeltini), and evaluate whether the inclusion of fossil data alters conclusions about evolutionary dynamics in the group. Given the strong apparent links between ecological specialization, climate, species richness and body size in the group (Pyron & Burbrink, 2009a,d; Burbrink & Pyron, 2010), we hypothesize that size and niche may be significantly related to diversification rates. Body size has been shown by several authors to affect diversification rates in mammals (Cardillo et al., 2005b; Liow et al., 2008; FitzJohn, 2010), although whether ectotherms such as snakes exhibit similar patterns is unclear. In contrast, rates of morphological evolution are decoupled from cladogenesis in other groups (Adams et al., 2009). Additionally, ecological variables have also been shown to impact diversification rates in numerous other groups (Cardillo et al., 2005a; Jansson & Davies, 2008; Kozak & Wiens, 2010), which could provide an alternate explanation for temperate diversity peaks in Lampropeltini (Pyron & Burbrink, 2009b).

Incorporating fossils into phylogenetic comparative analyses may be one of the only ways to accurately reconstruct the evolutionary history of clades and traits, as fossils represent direct observations of past character states and extinction events (Polly, 2001; Webster & Purvis, 2002; Laurin, 2004; Finarelli & Flynn, 2006; Albert et al., 2009). A detailed fossil record is also available for Lampropeltini (Holman, 2000), with representatives from most major lineages for which morphological data are available (e.g. vertebral length, from which body size can be estimated; Head & Polly, 2007; Head et al., 2009). Thus, given a dated chronogram for the group, we can generate a distribution of simulated trees containing fossil taxa randomly placed on branches consistent with their age, representing the potential placement of these extinct species (Fig. 1).

Figure 1. Schematic diagram of the proposed Monte Carlo procedure for randomly simulating fossil taxa to generate a distribution of estimated rates and ancestral character states.

We use this distribution to assess error in estimates of diversification rates (Keiding, 1975) and ancestral character states (Schluter et al., 1997) with respect to the extant species, the extinct fossil taxa, and the phylogenetic uncertainty associated with the potential relationships of the fossil species. This can be used to assess confidence in evolutionary hypothesis testing, such as whether ancestral values of body size were large or small compared to the sizes of extant taxa (Polly, 2001; Laurin, 2004; Finarelli & Flynn, 2006; Albert et al., 2009). This strategy allows us to incorporate fossil species into tests of historical evolutionary hypotheses, rather than relying solely on extant taxa (see Fig. 1). We assess the impact of fossil data on estimated rates of cladogenesis and extinction, and the direction and magnitude of changes in body size and ecological niche through time.

First, we test for the signature of trait-dependent cladogenesis and extinction rates stemming from body size and climatic niche in the tribe Lampropeltini, using the recently developed QuaSSE algorithm (FitzJohn, 2010) in the R package ‘diversitree’. We find little support for any impact of body size on rates of cladogenesis and extinction in the group, although both are significantly correlated with ecological niche. Second, we use a simple Monte Carlo randomization test to create a simulated empirical distribution of potential ancestral states for body-size and diversification-rate estimates. We find that the fossil data significantly alter ancestral reconstructions of body size, although this does not affect previous conclusions about patterns of body-size evolution. Finally, diversification-rate estimates are altered substantially when fossils are included, further bolstering recent concerns that molecular phylogenies alone may be insufficient for estimating diversification rates (Paradis, 2004; Liow et al., 2010; Rabosky, 2010). This study highlights the importance of integrating fossil morphology with molecular phylogenetics to better understand evolutionary trajectories and ultimately, the potential for trait-dependent variation in rates of cladogenesis and extinction. Dependence of evolutionary rate on trait values may significantly affect reconstruction of evolutionary history. The impact of palaeontological data is likely to be crucial and rarely insignificant and should be an important consideration for any study attempting to test historical evolutionary hypotheses.

Materials and methods

Phylogenetic, morphological and ecological data

Previous studies have yielded a well-resolved, time-calibrated phylogeny for the majority (31) of the known species (approximately 40) in the tribe Lampropeltini, based on nine genes (six mitochondrial, three nuclear), totalling 8294 bp of sequence data (Pyron & Burbrink, 2009d). This tree represents the mean of the estimated divergence times, used for clarity and simplicity; however, it would be possible to employ the analyses below across the posterior distribution of trees to take into account temporal and topological uncertainty. It should also be noted that the dated chronogram was conditioned on temporal information from some of these fossils (four of the seven fossils used here were employed as constraints) from which trait data were derived. Although the fossils were used as soft-bounded priors (e.g. parameterized with a lognormal distribution without a zero offset, and not hard minimum or maximum bounds), there may still be some circularity in the analyses regarding the temporal and phylogenetic placement of these fossil taxa on the time-calibrated phylogeny. This is likely to be unavoidable in most studies of this kind, unless trees are calibrated without any fossil data (e.g. using biogeographic constraints or known rates of substitution). However, given the consistency of the chronogram with the fossil history of the group (e.g. Holman, 2000), this should not be particularly problematic. Additionally, previous cross-validation analyses indicate that divergence-time estimates in the group are robust with respect to calibration choice (Burbrink & Lawson, 2007). We exclude the outgroup Coronella austriaca and test only the ingroup in all subsequent analyses.

The morphological data set consists of snout–vent length (SVL) measurements from 757 specimens of the same 31 species pooled across sex (Pyron & Burbrink, 2009a; Pyron et al., 2011). These measurements were obtained through the analysis of museum specimens and published records and were corrected for potential allometric scaling problems by removing juveniles (see Pyron & Burbrink, 2009a). These have been used to (i) estimate ancestral body sizes for the group and (ii) infer patterns of ecomorphological diversification (Burbrink & Pyron, 2010). We use these data to test hypotheses about the impact of body size on rates of diversification in the group. The ecological data set comprises 4564 presence locality records for the 31 lampropeltine species. We extracted the 19 BIOCLIM variables (Hijmans et al., 2005), measuring averages, ranges and extremes in temperature and precipitation. These were reduced to the major climate axes using principal component analysis (PCA) based on the correlation matrix of the 19 variables, of which the first (PC1) is applied in all subsequent analyses, with the scores used calculated as the mean within species of the scores for the 4564 records. The PCA was performed in R (R Core Development Team, 2010).
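The niche-axis construction described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study; the function name `pc1_species_means` and the toy data are hypothetical. It assumes PC1 scores are computed on z-scored variables (equivalent to a PCA of the correlation matrix) and then averaged within species.

```python
import numpy as np

def pc1_species_means(X, species):
    """Species-mean PC1 scores from a correlation-matrix PCA.

    X: (n_records, n_variables) array of climate values, one row per locality.
    species: sequence of n_records species labels.
    Returns ({species: mean PC1 score}, fraction of variance on PC1).
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(species)
    # z-score each variable so the PCA operates on the correlation matrix
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort so the largest comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, 0]                # PC1 score for every locality
    explained = float(eigvals[0] / eigvals.sum())
    means = {str(s): float(scores[labels == s].mean()) for s in np.unique(labels)}
    return means, explained

# Toy data only: 60 localities, 5 climate variables, 3 hypothetical species.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 5))
X_demo[:, 1] += 2.0 * X_demo[:, 0]            # induce some correlation
species_demo = ["a"] * 20 + ["b"] * 20 + ["c"] * 20
means_demo, explained_demo = pc1_species_means(X_demo, species_demo)
print(means_demo, explained_demo)
```

The sign of a principal component is arbitrary, so which end of the axis is "temperate" has to be read from the loadings rather than from the sign of the scores.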

This measurement of niche has previously been shown to represent the greatest dimension of ecological separation among extant species in the group along the latitudinal gradient from the temperate regions in North America (NA), where they exhibit peak richness (> 30 species), to the Neotropics, where as few as one species is observed in most areas (Pyron & Burbrink, 2009b). The first PC axis contains 33.5% of the total variation in ecological niche among the sampled extant species, with tropical species (e.g. Pseudelaphe) scoring highly negatively (< −10) and the most temperate species (e.g. Pantherophis vulpinus) highly positively (> 10). The ecological gradient is seen in the highest (±0.50) factor loadings of the BIOCLIM variables; the most positive are BIO7 (temperature annual range; 0.84), BIO4 (temperature seasonality; 0.71) and BIO2 (mean diurnal range; 0.64); the most negative are BIO13 (precipitation of wettest period; −0.87), BIO16 (precipitation of wettest quarter; −0.87), BIO12 (mean annual precipitation; −0.80), BIO6 (minimum temperature of coldest period; −0.68), BIO19 (precipitation of coldest quarter; −0.64), BIO11 (mean temperature of coldest quarter; −0.61), BIO18 (precipitation of warmest quarter; −0.55) and BIO3 (isothermality; −0.50).

Trait-dependent diversification

We test for trait-dependent diversification related to body size and ecological niche in the lampropeltines using the recently developed Quantitative State Speciation and Extinction (QuaSSE) algorithm, implemented in the R package ‘diversitree’ (FitzJohn, 2010). This algorithm takes a phylogeny and set of trait measurements for the tip species and fits a series of birth–death models in which the cladogenesis and extinction probabilities along branches vary as a function of the trait values, under a Brownian motion (BM) model of diffusion (FitzJohn, 2010). This allows for a comparison of models in which diversification and trait evolution are independent to those in which rates of diversification are directly affected by trait values. The exact mechanism for this would presumably be a biological process that links the trait to the fitness of the organism (Rabosky & McCune, 2010). Simulations have shown this method to be relatively robust under most circumstances (FitzJohn, 2010), although any such algorithm may have difficulty teasing apart cladogenesis and extinction on a molecular phylogeny (Rabosky, 2010).

We used the log-transformed measurements of mean body size (SVL) and ecological niche (PC1) for the 31 measured species, and the empirical standard errors of the measurements (Pyron & Burbrink, 2009a,b). We fit models in which diversification rates were a constant (i.e. rate is invariant with respect to trait value, the null expectation), linear (rate varies as an increasing or decreasing linear function of trait value), sigmoidal (logistic relationship between rate and trait value) or hump-shaped (rates peak at intermediate trait values) function of body size and niche. We compared three sets of models: one in which only cladogenesis varied, one in which both cladogenesis and extinction varied and one in which only extinction varied. These were compared using Akaike information criterion (AIC) values to determine the best-fit model. The QuaSSE algorithm requires starting values for likelihood optimization of the parameters; these were the maximum-likelihood (ML) estimates of diversification rate and the mean trait values.
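To make the four response shapes and the AIC comparison concrete, the sketch below writes each rate function explicitly and ranks a set of fitted models. It is an illustration only: the parameterizations are common textbook forms and are not claimed to match those used internally by ‘diversitree’, and the log-likelihoods and parameter counts in `hypothetical_fits` are invented for demonstration.

```python
import math

def constant(x, c):
    """Rate does not depend on the trait value x."""
    return c

def linear(x, c, m):
    """Rate changes linearly with x; truncated at zero (an assumption) so it stays non-negative."""
    return max(c + m * x, 0.0)

def sigmoid(x, y0, y1, xmid, r):
    """Logistic transition from y0 to y1, centred on xmid with steepness r."""
    return y0 + (y1 - y0) / (1.0 + math.exp(r * (xmid - x)))

def hump(x, y0, y1, xmid, s2):
    """Gaussian-shaped peak of height y1 at xmid over a baseline y0; s2 sets the width."""
    return y0 + (y1 - y0) * math.exp(-((x - xmid) ** 2) / (2.0 * s2))

def aic(log_lik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_lik

def rank_by_aic(fits):
    """fits: {name: (log_lik, n_params)} -> [(name, AIC, delta AIC)], best model first."""
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    best = min(scores.values())
    return sorted(((n, s, s - best) for n, s in scores.items()), key=lambda t: t[1])

# Invented values, for illustration only.
hypothetical_fits = {
    "constant": (-120.0, 3),
    "linear": (-119.2, 4),
    "hump": (-113.5, 6),
}
for name, score, delta in rank_by_aic(hypothetical_fits):
    print(f"{name:10s} AIC={score:7.2f} dAIC={delta:5.2f}")
```

A more complex model is preferred only when its gain in log-likelihood outweighs the penalty of two AIC units per additional parameter.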

Diversification-rate estimates

Diversification rates are typically estimated using information from the branch lengths of molecular phylogenies (Nee et al., 1994), diversity data (Etienne & Apol, 2009) or both (Rabosky et al., 2007). For trees containing both extinct and living taxa, however, most phylogenetic estimators are inappropriate, as they assume all lineages have survived to the present, whereas diversity-based estimators discard potentially useful branch-length information. The best strategy to analyse such trees would thus be methods combining phylogenetic and diversity estimation for a birth–death process, resulting in observations of both extant and extinct taxa. Such estimators have previously been shown to have sufficient power to estimate diversification rates for trees containing fossils (Paradis, 2004). The likelihood function for the birth–death process resulting in B_T births and D_T deaths during the time interval from 0 to T, where X_t is the number of species living at time t (between 0 and T), has typically been given in terms of λ and μ, the rates of cladogenesis and extinction (Keiding, 1975). Here, we re-parameterize it in terms of r and ε, the net diversification rate (λ − μ) and extinction fraction (μ/λ), respectively, with r̂ and ε̂ denoting the ML estimators and S_T calculated as the sum of the branch lengths of the phylogeny (the total time lived by all species). We use this estimator for the trees containing extinct species and compare the results to the phylogenetic and diversity-based estimates presented for the lampropeltines in previous studies (Burbrink & Pyron, 2010). Code implementing this model in R is available as supporting online material.
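A sketch of the re-parameterization, assuming the standard form of the likelihood for a continuously observed birth–death process (Keiding, 1975). The expressions below are derived from that assumption and from the definitions of r and ε given above; they are not quoted from the authors' own display.

```latex
% Likelihood in terms of cladogenesis (\lambda) and extinction (\mu)
L(\lambda,\mu) \;\propto\; \lambda^{B_T}\,\mu^{D_T}\,
  \exp\!\bigl[-(\lambda+\mu)\,S_T\bigr],
\qquad S_T=\int_0^T X_t\,dt

% Substituting \lambda = r/(1-\varepsilon) and \mu = \varepsilon r/(1-\varepsilon)
L(r,\varepsilon) \;\propto\;
  \left(\frac{r}{1-\varepsilon}\right)^{B_T+D_T}\varepsilon^{D_T}\,
  \exp\!\left[-\,\frac{r\,(1+\varepsilon)}{1-\varepsilon}\,S_T\right]

% ML estimators, from \hat{\lambda}=B_T/S_T and \hat{\mu}=D_T/S_T
\hat{r}=\frac{B_T-D_T}{S_T},
\qquad
\hat{\varepsilon}=\frac{D_T}{B_T}
```

The estimators follow from the invariance of maximum likelihood under re-parameterization: r̂ = λ̂ − μ̂ and ε̂ = μ̂/λ̂.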

Incorporating fossil species

Even fossil taxa for which the available characters are not extensive enough to reconstruct trees still contain affinities with living groups and provide phylogenetic information in the form of their age and stratigraphic range, restricting their placement on a molecular chronogram. A fossil of a given age can thus only be placed at a point along the branches of the dated phylogeny that is older than that fossil. Our protocol consists of randomly selecting and breaking a branch within this age range for that taxon. The fossil is then placed on a branch equal to its total stratigraphic range, plus the gap between the first appearance of the taxon and the randomly selected break point (Fig. 1). This creates randomly distributed fossilization gaps: the time between the actual divergence of the fossil and its first appearance in the fossil record (Foote et al., 1999; Marshall, 2008). Depending on the sampling density of the fossil record (see Marshall, 2008), it may be advantageous to weight the sampling probability, such as by the length of the branch, or to place a distribution on the fossilization gaps (see Foote et al., 1999). Note that we maintain the same root age and relative position of the branching events between the extant species, so that comparisons are equivalent with the extant-only analyses. The intent here is not to randomize the impact of the fossils on the divergence times between extant species (which are assumed to be robust), but to randomize the hypothetical placement of the known set of extinct species within the existing temporal framework given by the chronogram. In some cases, however, it may be advantageous to use stratigraphic methods for assigning fossils and re-estimating divergence times between extant species (Marjanovic & Laurin, 2007).
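The placement step can be sketched as below. This is a hedged illustration, not the authors' R implementation: the function `place_fossil`, the representation of the chronogram as a list of (older age, younger age) branch intervals, and the toy branch ages are all hypothetical. It also assumes that eligible branches are chosen with equal probability and that the break point is drawn uniformly along the eligible part of the branch, choices the text leaves open.

```python
import random

def place_fossil(branches, first_app, last_app, rng=random):
    """Randomly attach one fossil to a dated tree.

    branches: list of (older_age, younger_age) in Ma, one tuple per branch.
    first_app, last_app: first and last appearance of the fossil (Ma).
    Returns (branch_index, break_age, fossil_branch_length).
    """
    # A branch is eligible if some part of it is older than the first appearance.
    eligible = [i for i, (old, _young) in enumerate(branches) if old > first_app]
    if not eligible:
        raise ValueError("no branch is older than the fossil's first appearance")
    i = rng.choice(eligible)                    # unweighted choice (assumption)
    old, young = branches[i]
    lower = max(young, first_app)               # break point cannot postdate the fossil
    break_age = rng.uniform(lower, old)         # uniform break point (assumption)
    gap = break_age - first_app                 # fossilization gap
    strat_range = first_app - last_app          # total stratigraphic range
    return i, break_age, strat_range + gap

# Hypothetical four-branch chronogram; fossil ages taken from Arizona voorhiesi (10.3–4.9 Ma).
toy_branches = [(25.0, 12.0), (12.0, 0.0), (12.0, 3.0), (3.0, 0.0)]
generator = random.Random(42)
replicates = [place_fossil(toy_branches, 10.3, 4.9, generator) for _ in range(1000)]
print(replicates[:3])
```

Weighting the branch choice by branch length, as the text suggests may sometimes be preferable, would only require replacing `rng.choice` with a weighted draw.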

Repeating this procedure for each of the extinct species in the data set yields a tree representing the inferred relationships of the extant taxa based on molecular sequence data, and a random placement of the extinct species. Generating hundreds or thousands of such trees thus yields a simulated distribution of the potential placement of these species within the appropriate temporal window. Using the trait values for the living and extinct taxa, it is thus possible to use these simulated trees to generate a distribution of reconstructed ancestral character states representing the value of a character through time while including information from the extinct taxa. Similarly, combined phylogenetic and taxonomy estimators of diversification rate (Rabosky et al., 2007) can be used to generate rate distributions for cladogenesis, extinction and morphological evolution. These simulated distributions can then be used to assess phylogenetic uncertainty in evolutionary hypothesis testing while incorporating data from the fossil record, and accounting for potential biases introduced by extinction. Similar nonphylogenetic methods have been proposed by other authors (Marjanovic & Laurin, 2008). Code for implementing the method described here in R is available on request from R.A.P.

Fossil traits

Given data on the body sizes of the extinct taxa (Table 1), we use the method described here to test whether body size is significantly related to diversification patterns in the group. The fossil record of NA snakes is fairly detailed (Holman, 2000), and there are a number of extinct species known from the Miocene on, potentially more than 10. Previous studies recognized eight species (Burbrink & Pyron, 2010), although only seven have been described in detail (Holman, 2000). Using measurements of vertebral length, we estimated mean SVL for the fossil species by multiplying vertebra length by the mean number of ventral scales in the extant species to which each fossil bears the closest putative phylogenetic affinity based on previous descriptions (e.g. Holman, 2000; Table 1). Ventral scales show a one-to-one correspondence with vertebral number in most snakes (Voris, 1975), and similar methods are commonly used to estimate the length of fossil snakes (Head & Polly, 2007; Head et al., 2009).

Table 1.   Body-size estimates (snout–vent length; SVL) for extinct species from extant relatives.

Fossil species            Age (Ma)    Vertebra (mm)  SVL (cm)  Similar extant species    Mean ventral scale no.
Arizona voorhiesi         10.3–4.9    3.6            76.3      Arizona elegans           212
Pseudocemophora antiqua   20.6–16.3   2.8            48.2      Cemophora coccinea        172
Pantherophis buisi        5.3–4.9     9.5            226.1     Pantherophis obsoletus    238
Pantherophis kansensis    16.3–5.3    7.0            142.8     Pantherophis vulpinus     204
Pantherophis pliocenica   4.1–3.0     7.5            153.0     Pantherophis vulpinus     204
Lampropeltis similis      14.5–5.3    3.7            70.3      Lampropeltis triangulum   190
Stilosoma vetustum        7.2–5.3     1.9            47.5      Lampropeltis extenuata    250

Data given are mean estimates from known specimens (Ernst & Ernst, 2003; Holman, 2000; Schulz, 1996).
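The size estimates in Table 1 follow from a single multiplication, which the short sketch below reproduces. The vertebra lengths and ventral scale counts are taken from Table 1; the function and variable names are illustrative.

```python
# (vertebra length in mm, mean ventral scale count of the similar extant species)
TABLE_1 = {
    "Arizona voorhiesi": (3.6, 212),
    "Pseudocemophora antiqua": (2.8, 172),
    "Pantherophis buisi": (9.5, 238),
    "Pantherophis kansensis": (7.0, 204),
    "Pantherophis pliocenica": (7.5, 204),
    "Lampropeltis similis": (3.7, 190),
    "Stilosoma vetustum": (1.9, 250),
}

def estimate_svl_cm(vertebra_mm, ventral_count):
    """SVL = vertebra length x number of vertebrae (approximated by ventral scales), in cm."""
    return vertebra_mm * ventral_count / 10.0   # mm -> cm

svl_estimates = {name: estimate_svl_cm(v, n) for name, (v, n) in TABLE_1.items()}
for name, svl in svl_estimates.items():
    print(f"{name:26s} {svl:6.1f} cm")
```

Rounded to one decimal place, these values match the SVL column of Table 1.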

Some error is introduced in this procedure because the intracolumnar placement of the vertebrae and the soft tissue between vertebrae are not used in size estimates, but this method has previously provided robust estimates of fossil snake size based solely on vertebral length (see Head et al., 2009). Additionally, size estimates appear to correspond well to previous estimates for some taxa. For instance, Pantherophis buisi was estimated to exceed 2000 mm SVL based on overall size of the available skeletal elements (Holman, 1973; Schulz, 1996; exact methods not given), similar to the empirical measurements here (Table 1). Although the fossil taxa here are represented by small amounts of temporally homogeneous material, it may be necessary to evaluate assumptions about whether or not a range of fossil materials represent a single chronospecies that is distinct from the ancestral lineage, or any similar extant lineages. Similarly, it may be important to analyse fossil taxa for any temporal variation in measured traits, as characteristics like body size may show long-term changes within species.

For each species, we noted the extant species to which the fossil showed the greatest overall morphological resemblance, and putative phylogenetic affinity. Given the conserved nature and strong relationship between SVL and somatic pleiomerism in snakes (Voris, 1975; Head & Polly, 2007; Head et al., 2009), this should provide relatively robust results. To avoid confusion, note that the best-guess phylogenetic placement of each species may not be sister to the species of greatest morphological similarity. For instance, Pantherophis kansensis is very similar morphologically and previously proposed as ancestral to Pantherophis vulpinus (Holman, 2000), but is somewhat (∼5 Ma) older than that species (Table 1). Thus, the SVL estimate is based on morphological similarity, and the potential phylogenetic placement is restricted temporally. The variation observed in ventral scale number (1.25-fold) among extant species considered here is much less than that of the vertebral length from the fossils (3.4-fold); thus, the results should be relatively robust to the extant taxa used for estimation.

Body size has previously been shown to fit a BM model in the lampropeltines (Burbrink & Pyron, 2010). We reconstruct ancestral states under BM with ML (Schluter et al., 1997) using the ‘getAncStates’ command in the GEIGER package (Harmon et al., 2008). For the molecular phylogeny, we use the ‘birthdeath’ command in the APE package (Paradis et al., 2004) to estimate diversification rates, and the ML estimator described above for trees containing fossil taxa. Previous phylogenetic estimates of extinction were zero (Burbrink & Pyron, 2010), despite at least seven extinct species described from the Miocene and Pliocene (Holman, 2000). We calculated ancestral states and diversification rates for the molecular phylogeny (Pyron & Burbrink, 2009d), the simulated distribution of fossil placements (1000 replicates) and the tree containing the best-guess placement of the fossil taxa based on their hypothesized relationships and stratigraphic origin and range (Holman, 2000). This best-guess phylogeny places the fossil taxa as close as possible to the extant species of the closest phylogenetic affinities as proposed by previous authors (Holman, 2000).
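For the root node, the ML reconstruction under BM reduces to a generalized least-squares mean, which can be sketched in a few lines. This is an illustration of that identity only, not the GEIGER implementation; the function name `bm_root_ml` and the toy covariance matrix are hypothetical. The matrix C holds, for each pair of tips, the branch length shared between the root and their most recent common ancestor, so it accommodates non-ultrametric trees containing fossil tips.

```python
import numpy as np

def bm_root_ml(C, x):
    """ML root state and rate under Brownian motion.

    C: (n, n) matrix of shared root-to-ancestor path lengths between tips.
    x: length-n vector of tip trait values.
    Returns (root_state, sigma2), with sigma2 the ML (not REML) rate estimate.
    """
    C = np.asarray(C, dtype=float)
    x = np.asarray(x, dtype=float)
    ones = np.ones(len(x))
    c_inv_ones = np.linalg.solve(C, ones)
    root = float(c_inv_ones @ x) / float(c_inv_ones @ ones)   # GLS mean
    resid = x - root
    sigma2 = float(resid @ np.linalg.solve(C, resid)) / len(x)
    return root, sigma2

# Toy three-tip tree: tips 1 and 2 share 3 time units of history; tip 3 is an
# extinct lineage that stops 1 unit before the present (all values hypothetical).
C_demo = np.array([[5.0, 3.0, 0.0],
                   [3.0, 5.0, 0.0],
                   [0.0, 0.0, 4.0]])
x_demo = np.array([4.2, 4.4, 3.9])
print(bm_root_ml(C_demo, x_demo))
```

Because tips on short branches close to the root receive more weight, an old fossil such as one placed near the base of the tree can shift the root estimate substantially.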


Results

Trait-dependent diversification

Fitting models in which cladogenesis and extinction rates vary as a function of body size (SVL) and ecological niche (PC1) reveals potential links between climatic niche and diversification rate, but not between body size and diversification rate (Table 2). A constant-response model for body size fits significantly better than any variable-response model, either for cladogenesis or extinction, or both. There does not appear to be any significant relationship between body size and diversification rate through time in the lampropeltines. Thus, evolutionary hypothesis tests about body-size evolution based on data from extant species are not likely to have been affected by biases in the topology or branch lengths of the phylogeny due to variable cladogenesis or extinction related to body size. However, the inclusion of data from fossil species may still have a significant effect on ancestral reconstructions and phylogenetic comparative analyses (Fig. 2).

Table 2.   Trait-dependent diversification models fit to body-size (snout–vent length; SVL) and niche (PC1) data.
  1. Asterisk indicates significance of model.

  2. Italic models are significant, while the bold model is the best-fit model chosen. ‘#P’ indicates the number of free parameters of the model. All models are compared against the constant model, yielding the calculated AIC scores.

Body size (SVL)
Niche (PC1)
Figure 2. Phylogeny of tribe Lampropeltini (Pyron & Burbrink, 2009c), and stratigraphic and putative phylogenetic placement (i.e. branching points) of extinct species (Holman, 2000). Plots show estimates of the root state and diversification rate (with the estimated extinction fraction, ε) for the simulated trees (‘Mean’, the seven extinct taxa randomly shuffled across all branches older than their first appearance), the molecular phylogeny (‘Extant’), and the best-guess placement of the fossil taxa (‘Fossil’, indicated by labelled branching points). The addition of fossils decreases estimates of the ancestral state at the root and estimates of net diversification rate (cladogenesis minus extinction), whereas ε increases.

In contrast, all three classes of models show significant responses of diversification rates to ecological niche (PC1). The best-fit model is one in which cladogenesis rate is a hump-shaped function of niche, with constant extinction (Table 2). A sigmoidal response, where cladogenesis rates asymptotically approach minimum and maximum values at the extremes of the trait, is only marginally nonsignificant (P = 0.06). A model in which both cladogenesis and extinction exhibit a hump-shaped response is also significant, as is a positive linear-response variable-extinction model (Table 2). We repeated the analysis for the best-fit model using PCA scores derived using the phylogenetically corrected method (Revell, 2009), which produces highly similar PC scores, and found that the hump-shaped variable-cladogenesis model remains the best fit (ΔAIC = 1.75), although the magnitude of significance is somewhat reduced (P = 0.05). This method also assumes independence of the phylogeny from the trait and uses only the means within species, rather than the full 4564 locality data set. Given the strong apparent influence of the observed niche PC values on the tree structure, we interpret the hump-shaped variable-cladogenesis model as the best explanation for the observed patterns.

Effects of fossil data

The ML estimate of ancestral body size (SVL) at the root under a BM model is 82.8 cm. In contrast, the mean estimate from the simulated-placement trees was 69.7 cm, with a 95% confidence interval (CI) of 65.5–77.9 cm (Fig. 2). The estimate from the best-guess fossil-placement tree was 67.6 cm. Thus, the estimated ancestral value using the extant taxa is significantly larger than those estimated when taking fossil information into account. The smaller ancestral-size estimates derived from the combined data indicate that the extinct species contribute information on the evolution of body size that cannot be derived from the extant taxa alone, even though neither estimate has a relationship with diversification rates (Fig. 4). The estimated rates of morphological evolution (σ2) also increase: the rate from the extant-only tree is 0.008, whereas the best-guess fossil-placement tree and the 1000 random trees yield mean estimates of 0.018–0.019 (Fig. 2), more than doubling when fossils are included.
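One simple way to summarize a simulated distribution of this kind is a percentile interval, sketched below. The source does not state how the 95% CI was computed, so the percentile method is an assumption, and the draws used here are synthetic numbers rather than the study's replicates.

```python
import numpy as np

def percentile_ci(values, level=0.95):
    """Two-sided percentile interval of a simulated distribution."""
    tail = (1.0 - level) / 2.0 * 100.0
    lower, upper = np.percentile(values, [tail, 100.0 - tail])
    return float(lower), float(upper)

def outside_ci(point, values, level=0.95):
    """True if a point estimate falls outside the percentile interval."""
    lower, upper = percentile_ci(values, level)
    return point < lower or point > upper

# Synthetic stand-in for 1000 root-state estimates (cm); not the study's data.
rng = np.random.default_rng(1)
synthetic_roots = rng.normal(loc=70.0, scale=3.0, size=1000)
print(percentile_ci(synthetic_roots))
print(outside_ci(82.8, synthetic_roots))
```

Comparing the extant-only estimate with such an interval is the sense in which the two estimates are said to differ significantly.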

As the temporal placement of fossils can significantly restrict their phylogenetic placement, some fossils (e.g. Pseudocemophora antiqua, which occurs near the root) may exert a disproportionate influence over the results. Therefore, we repeated this procedure without P. antiqua, which yielded a 95% CI of 79.4–97.3 cm, including the estimate from the extant-only tree. The mean rate of morphological evolution when P. antiqua is excluded remains 0.019, indicating that absolute size estimates, but not estimated rates, change significantly when this species is not included. Thus, individual fossil taxa and assumptions about their phylogenetic affinity (e.g. as a member of the ingroup) can have a strong effect on these analyses. However, morphology of vertebrae as well as jaw elements and teeth appears to strongly group P. antiqua with the crown-group lampropeltines (Auffenberg, 1963; Holman, 2000), suggesting that the patterns observed from its inclusion are legitimate representations of the history of the group.

Using the simulated distribution, we corroborate previous studies suggesting that an intermediate-sized ancestor subsequently underwent a bidirectional divergence (Pyron & Burbrink, 2009a). This conclusion is not substantially altered by the inclusion of fossil information. Extant taxa range from ∼30 to 140 cm SVL, so the MRCA of the group was intermediate between them, rather than approaching either extreme of large or small body size. As noted, climatic information is not available for the fossil species, so this analysis cannot be used to assess variation in ecological niche reconstructions for the group. Accordingly, we are also unable to generate a distribution of correlation coefficients between the two traits, although they are not significantly related using either tip values (Spearman’s rank correlation coefficient rs = −0.04, P = 0.81) or phylogenetically independent contrasts (rs = −0.09, P = 0.69). These correlations were performed in R (R Core Development Team, 2010).

Finally, diversification-rate estimates are also significantly affected by the inclusion of fossil data. Diversity-based estimators of cladogenesis and extinction under a constant-rate model (Etienne & Apol, 2009) yielded λ = 0.19 and μ = 0.04 (r = 0.15, ε = 0.21), whereas the best-fit model for the phylogenetic data was a 2-rate Yule process, transitioning from r = 0.134 to r = 0.032 at 5.2 Ma, with ε = 0 (Burbrink & Pyron, 2010). Fitting a single-rate birth–death model to the phylogenetic data gives r = 0.094 and ε = 0. Given that there are possibly more than 10 extinct species and approximately 40 extant species, extinction has apparently not been zero in the group, further corroborating recent studies suggesting that extinction rates cannot usually be correctly inferred from molecular phylogenies (Quental & Marshall, 2009; Liow et al., 2010; Rabosky, 2010). The mean estimate of r from the simulated trees was 0.082, with a 95% CI of 0.011–0.033, whereas ε was 0.189 (Fig. 2). The estimates from the best-guess fossil-placement tree were r = 0.088, ε = 0.189. Thus, the inclusion of the fossil data predictably and significantly changes estimates of diversification rates. In particular, extinction is estimated to be nonzero, presumably a much better estimate as evidenced by the numerous extinct taxa.
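The two parameterizations used in this paragraph are interchangeable, and converting between them is a useful check on reported values. The helper names below are illustrative; the input values λ = 0.19 and μ = 0.04 are those quoted above.

```python
def to_r_eps(lam, mu):
    """(cladogenesis, extinction) -> (net diversification r, extinction fraction eps)."""
    return lam - mu, mu / lam

def to_lam_mu(r, eps):
    """(r, eps) -> (cladogenesis, extinction); requires eps < 1."""
    lam = r / (1.0 - eps)
    return lam, eps * lam

r_div, eps_div = to_r_eps(0.19, 0.04)
print(round(r_div, 2), round(eps_div, 2))

# Round trip back to the original rates.
lam_back, mu_back = to_lam_mu(r_div, eps_div)
print(lam_back, mu_back)
```

Applied to the diversity-based estimates, the conversion gives r = 0.15 and ε = 0.21 after rounding, matching the values in the text.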


Diversification in Lampropeltini

Ecomorphological and species diversity in the group is strongly related to both body size and ecological niche (Pyron & Burbrink, 2009a,b; Burbrink & Pyron, 2010). Of these, ecological niche is directly correlated with rates of cladogenesis and extinction, with the highest diversification rates occurring in intermediate (warm temperate) climates (Table 2; Fig. 3). Lineages with PC scores from −3.15 to −0.59 on the first PC of ecological niche measured using the 19 BIOCLIM variables experience significantly elevated rates of cladogenesis (Fig. 3). This climatic niche range characterizes extant species primarily found in warm temperate climates in the eastern and western United States (e.g. Pantherophis (part), Pituophis (part), Lampropeltis triangulum, L. zonata, L. calligaster, Cemophora coccinea), and southwestern deserts (e.g. Bogertophis rosaliae, Pituophis deppei, L. mexicana). Moreover, undiscovered phylogeographic diversity (i.e. new species) may be even higher in these regions than included here (Pyron & Burbrink, 2009c,e; Burbrink et al., 2011). In contrast, lineages found in tropical climates (e.g. Pseudelaphe flavirufa) and colder temperate environments (e.g. Pantherophis vulpinus) are predicted to have experienced decreased rates of cladogenesis (Fig. 3). There also appears to be an elevational aspect to this response (Fig. 3) as climate varies along elevational gradients. Both of these are well-known drivers of cladogenesis (Kozak & Wiens, 2006, 2007).

Figure 3.

 Phylogeny of extant Lampropeltini represented as a ‘traitgram’, where the height of the tips corresponds to the trait value (PC1 score for ecological niche), and the height of the internal nodes corresponds to the ML reconstructions under the Brownian motion model. The plot on the left shows the estimated cladogenesis rate as a hump-shaped function of niche from the best-fit QuaSSE model. The inset map shows the 4564 climate localities for the 31 ingroup species, with greener, smaller dots corresponding to lower predicted rates of diversification based on scores for PC1, and redder, larger dots to increased predicted rates, illustrating a mostly temperate increase.
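To make the notion of a hump-shaped rate function concrete, the sketch below evaluates a Gaussian-shaped curve in which the cladogenesis rate rises from a baseline to a peak at an intermediate niche value. This is one common way to write such a function and is offered only as an illustration: the function name and all parameter values are hypothetical and are not the fitted values from the best-fit model.

```python
import math


def hump_rate(x, base, peak, x_mid, width):
    """Gaussian-shaped rate: `base` far from x_mid, rising to `peak` at x_mid."""
    return base + (peak - base) * math.exp(-((x - x_mid) ** 2) / (2.0 * width ** 2))


# Hypothetical parameters (assumed for illustration only).
BASE, PEAK, X_MID, WIDTH = 0.05, 0.20, -2.0, 1.0

for pc1 in (-6.0, -4.0, -2.0, 0.0, 2.0):
    rate = hump_rate(pc1, BASE, PEAK, X_MID, WIDTH)
    print(f"PC1 = {pc1:+.1f}  rate = {rate:.3f}")
```

With a function of this shape, lineages near the midpoint receive the highest predicted rate, and lineages at either extreme converge on the baseline.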

Thus, previous analyses (Pyron & Burbrink, 2009b) suggesting that diversification rates did not appear to vary latitudinally (i.e. with climate) and that niche conservatism limited dispersal and diversification outside of temperate regions may have been confounded by these correlated evolutionary dynamics. However, we note that predictions of increased diversification related to ecological niche in temperate zones might be difficult to distinguish from the effect of originating and existing in temperate areas longer than tropical regions (e.g. due to decreased dispersal). In other words, the underlying cause of the link between climate and diversification rate is still unclear, but may still be related to phylogenetic niche conservatism if failure to adapt to highly temperate or tropical climates limits diversification in those regimes and normal cladogenesis rates continue in the preferred climate (Wiens & Donoghue, 2004; Wiens et al., 2006). Although numerous studies have investigated the correlation between climatic history and species diversification (Evans et al., 2009; Pyron & Burbrink, 2009c; Vieites et al., 2009), few have tested for links between ecology and rates of cladogenesis and extinction. This approach may prove powerful for investigating the underlying causes of global species richness, particularly with regard to untangling increased diversification due to the impacts of climate alone from those due to niche conservatism and area of origin.

In contrast to ecological niche, body size does not appear to be significantly related to diversification rates. However, the integration of body-size data from fossil taxa significantly alters the absolute value of ancestral-size estimates and rates of morphological evolution (Fig. 4), although this does not strongly contradict the inference of a bidirectional diversification towards the present-day extremes from an intermediate ancestor (Pyron & Burbrink, 2009a). Ultimately, accurate estimates of ancestral character states may be unattainable in groups with a poor fossil record. In groups where fossils are available, comparison of the tip data with estimates generated with and without fossil species should permit consideration of more general hypotheses, such as whether an ancestor was ‘large’ or ‘small’ compared to modern species (Finarelli & Flynn, 2006; Albert et al., 2009). Importantly, the evolutionary portrait of body-size diversification based solely on extant taxa differs significantly from the one offered by the extinct species (Fig. 4), highlighting the importance of palaeontological data.

Figure 4.

 Phylogeny of the extant (a) and extant plus the best-guess placement of the extinct (part; b) species represented as a traitgram, where the height of the tips corresponds to the trait value (ln[svl]), and the height of the internal nodes corresponds to the ML reconstructions under the Brownian motion model. A bidirectional divergence towards major extant clades exhibiting ‘large’ (red and blue), ‘intermediate’ (magenta) and ‘small’ (orange) body sizes (Pyron & Burbrink, 2009a) from an intermediate ancestor can be seen on the left. The addition of fossil data (b) increases the variance through time, a pattern that is not evident when considering only the extant taxa.

Incorporating fossil data

Estimates of diversification rate from the molecular phylogeny also appear to be biased when fossil species are excluded, whether they use phylogeny-based (Nee et al., 1994) or diversity-based estimators (Etienne & Apol, 2009). Including phylogenetic data from fossils and using appropriate estimators (Keiding, 1975; Paradis, 2004) yields far more reasonable estimates of net diversification and extinction (Fig. 2). Importantly, the ML estimator of the extinction fraction ε under a single-rate birth–death model where extinct taxa are observed is simply the number of extinctions divided by the number of cladogenesis events. This can provide at least a preliminary estimate of extinction for any group with extinct species and can be used as a more robust minimum bound on ε, rather than the value of 0 as estimated here for these snakes and commonly in other groups, regardless of the prevalence of fossils (Rabosky, 2010).
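The count-based estimator described above can be sketched in a few lines. When every cladogenesis and extinction event on a tree is observed, the birth–death ML estimates are the event counts divided by the summed branch length, so their ratio reduces to extinctions over cladogenesis events. The counts and total branch length below are hypothetical and are not taken from the Lampropeltini trees.

```python
def birth_death_ml(n_births, n_deaths, total_branch_length):
    """ML estimates for a fully observed single-rate birth-death process.

    n_births: number of cladogenesis events on the tree
    n_deaths: number of extinction events (extinct tips)
    total_branch_length: summed duration of all lineages (e.g. in Myr)
    """
    if n_births <= 0 or total_branch_length <= 0:
        raise ValueError("need at least one birth and a positive branch length")
    lam = n_births / total_branch_length   # cladogenesis rate
    mu = n_deaths / total_branch_length    # extinction rate
    return {
        "lambda": lam,
        "mu": mu,
        "r": lam - mu,                     # net diversification
        "epsilon": n_deaths / n_births,    # extinction fraction = mu / lambda
    }


# Hypothetical counts (assumed for illustration only).
estimates = birth_death_ml(n_births=40, n_deaths=8, total_branch_length=320.0)
for name, value in estimates.items():
    print(f"{name} = {value:.3f}")
```

Because ε depends only on the two counts, it can be obtained even when branch lengths are too uncertain to estimate λ and μ separately.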

Even when fossil data are considered in an explicitly phylogenetic context, as here, some temporal and phylogenetic uncertainty (as seen with Pseudocemophora antiqua and Lampropeltis vetustum) and even allometric variation can increase the potential variance in ancestral estimation. All fossil specimens were assumed to represent adults, but this is not based on direct morphological evidence (e.g. fused neurocentral structures). The strategy proposed here is an attempt to incorporate this variance for a better estimate of palaeontological uncertainty in historical evolutionary analyses. For instance, if an extinct species existed long past its final occurrence in the fossil record, this will cause rates to be overestimated, and vice versa if it is placed on a long branch relative to its true duration. However, the addition of any fossil species is likely to give a clearer picture of the true potential distribution of rates, compared to molecular phylogenies alone (Paradis, 2004; Etienne & Apol, 2009; Rabosky, 2010). Credible intervals generated with palaeontological data allow for evolutionary hypotheses to be evaluated with some measure of confidence relative to the fossil record.

Trait-dependent diversification and effects of fossil data

Although traits of interest and the underlying phylogeny are often considered to be independent, inferences about trait evolution and diversification in a group based on this assumption can be drastically misled if it is violated (Maddison, 2006; Paradis, 2008; FitzJohn, 2010). Thus, for any study attempting to answer historical evolutionary questions, fossil data are likely to be crucial for hypothesis testing and estimating diversification rates (Laurin, 2004; Paradis, 2004; Finarelli & Flynn, 2006; Albert et al., 2009). Nevertheless, there are several remaining issues concerning fossil data, historical inference and strategies such as those discussed and presented here. Some of these problems are well known, such as the incompleteness of the fossil record for many groups or that fossil taxa typically only provide data for a few types of traits, which may bias phylogenetic interpretation (Sansom et al., 2010). Furthermore, it is not possible at present to test for trait-dependent diversification with strategies such as QuaSSE using trees containing fossil taxa. The current implementations of these algorithms are conditioned on lineage survival to the present (i.e. molecular phylogenies of contemporaneous taxa) and have not been extended to allow extinct tips (FitzJohn, 2010). However, some methods may be able to partially account for this (Marjanovic & Laurin, 2008).

Here, we have shown that historical evolutionary inferences can be misled by the effects of extinction in two ways: (i) even when characters are not correlated with diversification (such as body size), stochastic extinction can alter the signature of trait evolution and rates of diversification through time and (ii) when characters are correlated with diversification (e.g. ecological niche), this can produce misleading inferences about dynamics of trait evolution, particularly patterns related to species richness. New methods now allow trait-dependent diversification to be addressed using molecular phylogenies without fossil data (Maddison et al., 2007; Paradis, 2008; FitzJohn, 2010). Strategies such as the Monte Carlo randomization procedure described here can create test distributions of parameters based on data from both the fossil record and molecular phylogenies. This provides at least a preliminary method for incorporating fossil data to address their impact on historical evolutionary analyses.
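The general logic of building a test distribution by randomization can be sketched as follows: draw the uncertain fossil inputs at random, recompute the parameter of interest for each draw, and summarize the resulting distribution with a mean and a percentile interval. The sketch below is a simplified stand-in, not the procedure used in this study. It assumes that each extinct lineage's duration is drawn uniformly between hypothetical bounds and that net diversification is estimated as (cladogenesis events − extinctions) divided by total branch length.

```python
import random
from statistics import mean


def percentile_interval(samples, level=0.95):
    """Central interval from sorted samples, trimming equal tails."""
    ordered = sorted(samples)
    k = int((1.0 - level) / 2.0 * len(ordered))
    return ordered[k], ordered[len(ordered) - 1 - k]


def simulate_r(n_reps, n_births, n_deaths, extant_length, fossil_bounds, seed=1):
    """Recompute r for random draws of each extinct lineage's duration."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_reps):
        # Assumption: durations are uniform between (shortest, longest) bounds.
        fossil_length = sum(rng.uniform(lo, hi) for lo, hi in fossil_bounds)
        total = extant_length + fossil_length
        estimates.append((n_births - n_deaths) / total)
    return estimates


# Hypothetical inputs (assumed for illustration only).
bounds = [(1.0, 6.0), (0.5, 4.0), (2.0, 9.0)]   # Myr per extinct lineage
draws = simulate_r(1000, n_births=40, n_deaths=3,
                   extant_length=300.0, fossil_bounds=bounds)
low, high = percentile_interval(draws)
print(f"mean r = {mean(draws):.4f}, 95% interval = {low:.4f} to {high:.4f}")
```

An estimate from a tree without fossils can then be compared against this interval to judge whether it falls outside the range supported by the fossil record.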


This research was funded by NSF grant DBI-0905765 awarded to R.A.P. We would like to thank R.G. FitzJohn, E. Paradis and L.J. Revell for analytical advice; S.J. Steppan, M.A. McPeek, P.D. Polly and D. Marjanović for comments that significantly improved this manuscript; and K.L. Krysko for providing measurements of the holotype of Stilosoma vetustum (UF6467).