• Daniel L. Rabosky

    1. Department of Ecology and Evolutionary Biology, Cornell University, Ithaca, New York 14853
    2. Cornell Laboratory of Ornithology, 159 Sapsucker Woods Road, Ithaca, New York 14850
    3. Department of Integrative Biology, University of California, Berkeley, California 94720
    4. E-mail: drabosky@berkeley.edu


Molecular phylogenies contain information about the tempo and mode of species diversification through time. Because extinction leaves a characteristic signature in the shape of molecular phylogenetic trees, many studies have used data from extant taxa only to infer extinction rates. This is a promising approach for the large number of taxa for which extinction rates cannot be estimated from the fossil record. Here, I explore the consequences of violating a common assumption made by studies of extinction from phylogenetic data. I show that when diversification rates vary among lineages, simple estimators based on the birth–death process are unable to recover true extinction rates. This is problematic for phylogenetic trees with complete taxon sampling as well as for the simpler case of clades with known age and species richness. Given the ubiquity of variation in diversification rates among lineages and clades, these results suggest that extinction rates should not be estimated in the absence of fossil data.

Time-calibrated molecular phylogenies contain information about both the timing and rate of species diversification and thus provide a complementary window into macroevolutionary processes that are often obscured by the incompleteness of the fossil record. Numerous studies have used molecular phylogenies to characterize the pattern of species diversification through time (e.g., Nee et al. 1992; Harmon et al. 2003; Ruber and Zardoya 2005; McPeek 2008) and to quantify differences in diversification rates among lineages (e.g., Mooers and Heard 1997; Moore et al. 2004; Moore and Donoghue 2007; Rabosky et al. 2007).

One of the most intriguing applications of molecular phylogenies involves the inference of extinction rates from data on extant taxa only (Nee et al. 1994a; Paradis 2003; Maddison et al. 2007; Ricklefs 2007). It may seem counterintuitive that living species could provide any information on historical extinction rates, but this is indeed possible, because the shape of phylogenetic trees is influenced both by the net rate of lineage diversification through time and by the ratio of the extinction rate μ to the speciation rate λ. This parameter (μ/λ), denoted by ɛ, is also known as the relative extinction rate and is critically important in determining the distribution of speciation times that occur in a molecular phylogeny. Differences in the relative extinction rate can result in different phylogenetic tree shapes, even for clades diversifying under precisely the same net diversification rate (denoted by r, where r = λ − μ). This phenomenon occurs because high relative extinction rates lead to high lineage turnover through time, which changes the “age structure” of nodes in a phylogenetic tree. When ɛ is low, many species will be relatively old, but when ɛ is high, most species will be young, simply as a function of high lineage turnover. This leads to the appearance of a temporal acceleration in the rate of diversification through time, a phenomenon that has been explored by numerous prior studies (Nee et al. 1994b; Rabosky 2006b).
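The relationship between r, ɛ, λ, and μ can be made concrete with a small sketch. The function name and the numerical values below are illustrative assumptions, not taken from the study; the algebra follows directly from r = λ − μ and ɛ = μ/λ.

```python
def birth_death_rates(r, eps):
    """Convert net diversification r = lam - mu and relative extinction
    eps = mu / lam into speciation (lam) and extinction (mu) rates."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    lam = r / (1.0 - eps)   # from r = lam * (1 - eps)
    mu = eps * lam
    return lam, mu

# Same net rate, very different turnover (hypothetical values).
for eps in (0.0, 0.5, 0.9):
    lam, mu = birth_death_rates(0.1, eps)
    print(f"eps={eps:.2f}  lambda={lam:.3f}  mu={mu:.3f}")
```

Clades with identical r can therefore differ greatly in λ and μ, which is why ɛ can shape a tree independently of the net rate.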

Methods for analyzing species diversification rates often make assumptions about the constancy of diversification rates through time (Magallon and Sanderson 2001) or among lineages (Pybus and Harvey 2000; Rabosky 2006b). For example, many studies have inferred diversification rates from clade age and species richness data with the explicit assumption that rates have been constant through time within clades (Hunt et al. 2007; Rabosky et al. 2007; Wiens 2007; Alfaro et al. 2009; Magallon and Castillo 2009). However, Rabosky (2009a) showed that this assumption is often violated, leading to potentially incorrect inferences about the causes of variation in species richness among clades.

In this article, I explore the consequences of among-lineage rate variation for estimates of extinction rates. I demonstrate that, when rates vary across the branches of phylogenetic trees or among clades, estimators that assume rate-constancy among lineages perform poorly. In many cases, the problem is sufficiently severe that extinction should not be estimated at all.



To assess the effects of among-clade variation in net diversification rates (r) on estimates of extinction, I used a general simulation protocol of (1) drawing diversification rates for a set of clades from a distribution with a specified mean and variance (see below) and under a known relative extinction rate ɛ; (2) simulating clade diversity under those rates; and (3) estimating ɛ for each set of clades. The central question is thus whether increased variance in net diversification rates among clades can lead to erroneous inference on ɛ.

Given a set of clade ages with associated species richness data, we can use the theory of the birth–death process to estimate both r and ɛ. Under the simple birth–death process (rates constant through time and among lineages), we can compute the probability that a number of ancestral lineages, a, will result in a total of n surviving lineages, after some amount of time t, given a particular speciation and extinction parameterization. This probability can be difficult to compute when a is large, but for the case in which information is available on stem clade ages (a = 1), it is simply (Raup 1985)

$$\Pr(n \mid r, \varepsilon) = (1 - \beta)\,\beta^{\,n-1} \tag{1}$$

$$\beta = \frac{e^{rt} - 1}{e^{rt} - \varepsilon} \tag{2}$$

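Note that (1) has been conditioned on clade survival to the present. If data are available for many clades, (1) can be used to find the likelihood that a particular birth–death parameterization has generated the observed data (Ricklefs 2007). For N clades with ages t_i and species richness n_i, where β_i denotes β evaluated at t_i, the likelihood of the full data is given by

$$L(D \mid r, \varepsilon) = \prod_{i=1}^{N} (1 - \beta_i)\,\beta_i^{\,n_i - 1} \tag{3}$$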

and is maximized with respect to r and ɛ (Bokma 2003).
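A minimal sketch of the constant-rate likelihood follows. It assumes the standard stem-age form in which richness is geometric with parameter 1 − β and β = (e^{rt} − 1)/(e^{rt} − ɛ). Function names and the example data are hypothetical and are not the author's R code.

```python
import math

def log_one_minus_beta(r, eps, t):
    """log(1 - beta), using 1 - beta = (1 - eps) / (exp(r t) - eps)."""
    return math.log(1.0 - eps) - math.log(math.exp(r * t) - eps)

def log_beta(r, eps, t):
    """log(beta), with beta = (exp(r t) - 1) / (exp(r t) - eps)."""
    return math.log(math.expm1(r * t)) - math.log(math.exp(r * t) - eps)

def prob_richness(n, r, eps, t):
    """Probability of n >= 1 extant species from one stem lineage of age t,
    conditioned on survival of the clade to the present."""
    return math.exp(log_one_minus_beta(r, eps, t) + (n - 1) * log_beta(r, eps, t))

def log_lik_constant(r, eps, ages, richness):
    """Log-likelihood when a single (r, eps) pair is shared by all clades."""
    return sum(
        log_one_minus_beta(r, eps, t) + (n - 1) * log_beta(r, eps, t)
        for t, n in zip(ages, richness)
    )

# Hypothetical clade ages and richness values, for illustration only.
ages = [10.0, 20.0, 30.0]
richness = [3, 12, 40]
print(log_lik_constant(0.1, 0.5, ages, richness))
```

Working on the log scale avoids the loss of precision that occurs when β is very close to 1 in old or fast-diversifying clades.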

I conducted simulations using diversification rates and clade ages parameterized from an avian families dataset (Sibley and Ahlquist 1990) that has been the subject of previous analyses of diversification (Paradis 2003; Ricklefs 2003). The data consist of 127 avian families with at least two species and include their respective species richness and stem-clade ages inferred from DNA hybridization studies. The decision to use this dataset over any other is arbitrary and was made to facilitate meaningful comparisons between the variance in rates used in simulations and the variance in rates observed among real clades. This tree is known to conflict with more recent phylogenetic analyses (e.g., Barker et al. 2004), but is expected to be adequate in the present context, as a rough framework for parameterizing simulations.

I assumed that rates for each family are drawn from a single gamma distribution; gamma is a reasonable choice here, as it is defined on the appropriate interval (0, ∞), has great flexibility in shape, and is widely used to model evolutionary rate variation in molecular evolutionary studies (Yang 1993). Given such a model of rate variation, one possible method for estimating the variance in rates (σ²ᵣ) observed among avian families is to estimate r for each clade independently using stem-clade estimators for r that have been used in previous studies (Raup 1985; Magallon and Sanderson 2001). A gamma distribution can then be fitted to these inferred rates; the shape (k) and scale (θ) parameters of this fitted distribution specify the mean (kθ) and variance (kθ²) of the distribution of r. This approach assumes that diversification rates within each clade can simply be calculated from the observed species richness and clade age. However, diversification in the birth–death framework is a stochastic process: even if all clades have the same rate, this approach will estimate different rates for most or all clades, simply due to stochastic variation in species richness. This may lead to a problem of overparameterization, because estimation of separate rates for each clade is effectively a model with the same number of parameters as data points. The statistical and biological consequences of assuming this model are poorly known, and model selection procedures are rarely used to test for overparameterization.
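The point about stochastic variation can be illustrated with a short sketch. It assumes the stem-age estimator r = ln[n(1 − ɛ) + ɛ]/t (the form given by Magallon and Sanderson 2001) and a method-of-moments gamma fit, which is a simplification chosen here for brevity and not necessarily the fitting procedure used in the study. Clade ages, the true rate, and the seed are hypothetical.

```python
import math
import random

def stem_rate_estimate(n, t, eps):
    """Per-clade stem-age estimate of r under an assumed eps."""
    return math.log(n * (1.0 - eps) + eps) / t

def draw_richness(r, eps, t, rng):
    """Draw n >= 1 from a geometric distribution with parameter 1 - beta."""
    ert = math.exp(r * t)
    b = (ert - 1.0) / (ert - eps)
    u = 1.0 - rng.random()                 # uniform on (0, 1]
    return 1 + int(math.log(u) / math.log(b))

rng = random.Random(1)
true_r, eps = 0.1, 0.0                     # every clade shares the same rate
ages = [rng.uniform(10.0, 40.0) for _ in range(127)]
est = [stem_rate_estimate(draw_richness(true_r, eps, t, rng), t, eps) for t in ages]

mean = sum(est) / len(est)
var = sum((x - mean) ** 2 for x in est) / (len(est) - 1)
theta = var / mean                         # method-of-moments scale
k = mean / theta                           # method-of-moments shape
print(f"mean={mean:.3f}  var={var:.5f}  k={k:.2f}  theta={theta:.4f}")
```

Although every simulated clade has the same true rate, the per-clade estimates differ, so a gamma fitted to them reports apparent rate variation that is purely stochastic.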

A more appropriate framework might be to assume that species richness data are the observed outcomes of a stochastic process in which each clade has a rate r_i drawn from an overall gamma distribution with parameters k and θ. The rates themselves are not directly observed, but we can integrate over all possible rates given a particular gamma(k, θ) parameterization to find the likelihood of the data. In this framework, the likelihood of a particular species richness value n becomes

$$L(n \mid k, \theta, \varepsilon) = \int_0^{\infty} \Pr(n \mid r, \varepsilon)\, f(r \mid k, \theta)\, dr$$

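where Pr(n | r, ɛ) corresponds to equation (1) and f(r | k, θ) is a gamma density. The likelihood of the full data D becomes

$$L(D \mid k, \theta, \varepsilon) = \prod_{i=1}^{N} \int_0^{\infty} (1 - \beta_i)\,\beta_i^{\,n_i - 1} \left\{ \frac{r^{k-1} e^{-r/\theta}}{\Gamma(k)\,\theta^{k}} \right\} dr \tag{4}$$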

where β is given by equation (2) and the expression in curly braces is the probability density of the gamma distribution. By integrating over all possible rates, we can describe rate variation among clades in terms of the two parameters of the gamma distribution (scale and shape), and this fitted model can be directly compared to models that assume homogeneity of rates or that allow rates to vary over time (Rabosky 2009a). I refer to this model as a “relaxed-rate” model of rate variation, analogous to relaxed-clock methods of modeling molecular evolutionary rate variation (Thorne et al. 1998; Drummond et al. 2006).
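The integration over rates can be sketched numerically. The version below uses a simple midpoint rule on a truncated interval. The integration scheme, the truncation point, and all function names are assumptions for illustration, and the study's own numerical integration may differ.

```python
import math

def gamma_logpdf(r, k, theta):
    """Log density of a gamma distribution with shape k and scale theta."""
    return (k - 1.0) * math.log(r) - r / theta - math.lgamma(k) - k * math.log(theta)

def log_prob_richness(n, r, eps, t):
    """log[(1 - beta) * beta**(n - 1)] with beta = (e^{rt} - 1)/(e^{rt} - eps)."""
    denom = math.exp(r * t) - eps
    log_1mb = math.log(1.0 - eps) - math.log(denom)
    log_b = math.log(math.expm1(r * t)) - math.log(denom)
    return log_1mb + (n - 1) * log_b

def clade_likelihood(n, t, k, theta, eps, steps=4000):
    """Midpoint-rule integral of Pr(n | r, eps) * f(r | k, theta) over r."""
    upper = k * theta + 12.0 * math.sqrt(k) * theta   # assumed truncation: mean + 12 SD
    h = upper / steps
    total = 0.0
    for j in range(steps):
        r = (j + 0.5) * h
        total += math.exp(log_prob_richness(n, r, eps, t) + gamma_logpdf(r, k, theta))
    return total * h

def log_lik_relaxed(k, theta, eps, ages, richness):
    """Relaxed-rate log-likelihood: one integral per clade, summed on the log scale."""
    return sum(math.log(clade_likelihood(n, t, k, theta, eps))
               for t, n in zip(ages, richness))

# Hypothetical data; k and theta are the eps = 0.5 values reported in Table 1.
print(log_lik_relaxed(3.67, 0.067, 0.5, [10.0, 20.0, 30.0], [3, 12, 40]))
```

Because the per-clade rates are integrated out, the model has only two rate parameters (k and θ) regardless of the number of clades.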

I estimated the distribution of net diversification rates that best described the variation in species richness among avian families assuming that all clades diversified under a common relative extinction rate, and I considered five relative extinction scenarios: ɛ= 0, 0.25, 0.5, 0.75, and 0.95. Numerical integration was used to compute the likelihoods in (4), and likelihoods were maximized using the Nelder–Mead algorithm. Results of fitting the relaxed-rate model to the avian families dataset are given in Table 1; maximum log-likelihoods of the data under the constant rate model (eq. 3) are included for comparison and demonstrate a substantial improvement in model fit by accommodating among-lineage variation in r.

Table 1.  Parameters and log-likelihoods inferred for the avian families dataset under a relaxed-rate model of rate variation for five relative extinction rates (ɛ). LogL gives the log-likelihood of the data under the relaxed-rate model. For comparison, LogL-CR* gives the log-likelihood under a model with constant rates across all lineages (eq. 3). Parameters are the shape (k) and scale (θ) parameters of the fitted gamma distribution, and S2obs is the variance of the fitted distribution. Note that the large increase in the log-likelihood of the data under the relaxed-rate model relative to the constant-rate model comes with the addition of just a single parameter.
ɛ      LogL     LogL-CR*   Parameters          Mean rate   S2obs
0      −656.8   −781.9     k=4.5, θ=0.066      0.298       0.02
0.25   −655     −772.1     k=4.16, θ=0.067     0.276       0.018
0.5    −652.6   −759.2     k=3.67, θ=0.067     0.25        0.016
0.75   −649.1   −739       k=2.87, θ=0.069     0.198       0.014
0.95   −643.5   −705.5     k=1.69, θ=0.059     0.099       0.006

The observed means and variances (S2obs) in rates for each ɛ class were used to parameterize simulations. Each simulation entailed drawing a rate for each clade from a gamma distribution with the same mean as the avian families data and variance vS2obs, where v is a multiplier of the observed variance. One can view v as the variation in diversification rates used for a particular simulation, relative to the variation observed in the avian families dataset. Simulations were conducted under five variance multipliers: v= 0, 0.5, 1.0, 1.5, and 2.0. Thus, a dataset simulated under v= 1.0 had rates drawn from a gamma distribution with variance identical to that observed for the avian families, and simulations with v= 2.0 had rates drawn from a distribution with twice the observed variance. To rescale the gamma distribution with respect to v, we first compute the scale parameter θ by dividing the target variance (vS2obs) by the target mean, as θ is the ratio of the variance (kθ²) to the mean (kθ). The target mean is then divided by the new θ to give the updated shape parameter k. A scenario in which all clades have the same rate corresponds to v= 0. Figure 1 illustrates a fitted gamma distribution of rates (under ɛ= 0) as well as the corresponding curves with half and twice the observed variance (v= 0.5 and v= 2.0, respectively).
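The rescaling step can be written out as a short sketch. The function name is hypothetical; the example inputs are the ɛ = 0 mean rate and S2obs from Table 1.

```python
def rescale_gamma(mean, s2_obs, v):
    """Return (k, theta) for a gamma with the given mean and variance v * s2_obs."""
    if v <= 0:
        raise ValueError("v = 0 is the constant-rate case; no gamma is needed")
    theta = (v * s2_obs) / mean     # scale = variance / mean
    k = mean / theta                # shape = mean / scale
    return k, theta

for v in (0.5, 1.0, 1.5, 2.0):
    k, theta = rescale_gamma(0.298, 0.02, v)
    # k * theta recovers the mean; k * theta**2 recovers the target variance.
    print(v, round(k, 3), round(theta, 4), round(k * theta, 3), round(k * theta ** 2, 4))
```

Increasing v lowers the shape parameter and raises the scale, so the mean stays fixed while the distribution becomes more dispersed and right-skewed.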

Figure 1.

Fitted gamma density of net diversification rates for avian families assuming ɛ= 0 (black curve) and under alternative variance scenarios considered in simulations (gray curves). All curves have an identical mean (see Table 1); the dashed curve corresponds to a distribution of rates with 50% of the observed variance (v= 0.5), and the solid gray curve denotes a distribution with twice the observed variance (v= 2.0).

A single simulation consisted of drawing 127 rates from the fitted gamma distribution and pairing each rate uniquely with one of the 127 avian families. The rates were then used to simulate species richness data (see Rabosky 2009a) given the ages of those families. Note that the distribution of species richness for a given diversification process is geometric with parameter 1 − β (eq. 1). It follows that species richness can be simulated by drawing from this distribution until the first nonzero value is obtained, as we have conditioned on survival to the present. I investigated 25 evolutionary scenarios in all (five ɛ classes, each with five v classes), with 5000 simulations conducted under each scenario.
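One simulated dataset can be sketched as follows. Instead of redrawing until a nonzero value appears, this version samples the survival-conditioned (n ≥ 1) geometric distribution directly by inverting its cumulative distribution, which targets the same distribution. That substitution, the function name, and the example ages are assumptions made for illustration.

```python
import math
import random

def simulate_richness(ages, k, theta, eps, rng):
    """Draw one gamma-distributed rate per clade, then one richness value per clade."""
    richness = []
    for t in ages:
        r = rng.gammavariate(k, theta)                # shape k, scale theta
        one_minus_b = (1.0 - eps) / (math.exp(r * t) - eps)
        if one_minus_b >= 1.0:                        # r * t numerically zero
            richness.append(1)
            continue
        u = 1.0 - rng.random()                        # uniform on (0, 1]
        # Inverse-CDF draw: P(N > m) = beta**m for m = 0, 1, 2, ...
        richness.append(1 + int(math.log(u) / math.log1p(-one_minus_b)))
    return richness

rng = random.Random(42)
ages = [5.0, 10.0, 20.0, 40.0]                        # hypothetical stem ages
print(simulate_richness(ages, 4.5, 0.066, 0.0, rng))
```

Repeating the call with a fresh set of gamma draws gives an independent replicate under the same scenario.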

For each set of clade ages and simulated richness data, ɛ was estimated using the simple constant-rate estimator described above (eq. 3). To guard against the possibility of recovering local rather than global optima, 50 optimizations were performed on each simulated dataset, using random starting parameters. Initial ɛ values for each optimization were drawn uniformly on the interval [0, 0.99]. In the Supporting information, I demonstrate that increasing the number of independent optimizations does not change the maximum likelihood estimates of parameters.
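The multiple-start strategy can be sketched with a simple optimizer. The study used the Nelder–Mead algorithm; to stay self-contained, the sketch below substitutes a basic compass (pattern) search. The starting range for r, the step sizes, and the evaluation cap are all assumptions.

```python
import math
import random

def neg_log_lik(params, ages, richness):
    """Negative log-likelihood of the constant-rate model for (r, eps)."""
    r, eps = params
    if r <= 0.0 or not 0.0 <= eps < 1.0:
        return math.inf                               # outside the allowed region
    total = 0.0
    for t, n in zip(ages, richness):
        denom = math.exp(r * t) - eps
        total += math.log(1.0 - eps) - math.log(denom)                      # log(1 - beta)
        total += (n - 1) * (math.log(math.expm1(r * t)) - math.log(denom))  # (n - 1) log(beta)
    return -total

def compass_search(f, start, step=0.1, tol=1e-4, max_evals=4000):
    """Derivative-free descent: try +/- step on each coordinate, halve step on failure."""
    x, fx, evals = list(start), f(start), 1
    while step > tol and evals < max_evals:
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                evals += 1
                if f_trial < fx:
                    x, fx, improved = trial, f_trial, True
        if not improved:
            step /= 2.0
    return x, fx

def multistart_fit(ages, richness, n_starts=50, seed=0):
    """Run the local search from random starts and keep the best optimum found."""
    rng = random.Random(seed)
    f = lambda p: neg_log_lik(p, ages, richness)
    best = None
    for _ in range(n_starts):
        start = [rng.uniform(0.01, 0.5), rng.uniform(0.0, 0.99)]  # r range is assumed
        fit = compass_search(f, start)
        if best is None or fit[1] < best[1]:
            best = fit
    return best
```

If the best value found stops changing as `n_starts` grows, that suggests (but does not prove) that the reported optimum is global and not local.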


To investigate the effects of among-lineage variation in diversification rates on estimates of ɛ, I constructed a phylogenetic tree simulation algorithm that allows speciation rates (λ) to evolve along branches of the phylogenetic tree (Heard 1996). Simulations were conducted in continuous time; λ values for each branch were drawn from a lognormal distribution with a mean equal to the value of the parent branch and a standard deviation equal to Tσ, where σ is the standard deviation of the process of rate evolution and T is the length of the parent branch. As σ increases, the magnitude of heterogeneity in rates among branches increases. I conducted simulations under four relative extinction rates: ɛ= 0, ɛ= 0.25, ɛ= 0.5, and ɛ= 0.75. For simulations with nonzero extinction, the relative extinction rate ɛ was kept constant by recalculating μ on each branch once a new speciation rate had been drawn.
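The rate-inheritance step can be sketched for a single branch. The text does not state whether the lognormal mean and standard deviation are on the natural or the log scale. The sketch assumes the log scale, so the parent rate is the median of the daughter distribution. The function name is hypothetical.

```python
import math
import random

def daughter_rates(parent_lambda, parent_length, sigma, eps, rng):
    """Draw a speciation rate for a new branch and rescale mu to hold eps fixed."""
    log_sd = parent_length * sigma            # SD grows with parent branch length T
    if log_sd == 0.0:
        lam = parent_lambda                   # sigma = 0: rates are inherited unchanged
    else:
        # Assumption: log(lambda) ~ Normal(log(parent_lambda), T * sigma).
        # Subtracting log_sd**2 / 2 from the location would instead make the
        # arithmetic mean equal to the parent rate.
        lam = rng.lognormvariate(math.log(parent_lambda), log_sd)
    mu = eps * lam                            # keeps mu / lam = eps on every branch
    return lam, mu

rng = random.Random(7)
print(daughter_rates(0.102, 5.0, 0.03, 0.5, rng))
```

Because the standard deviation scales with branch length, long parent branches allow larger jumps in rate between parent and daughter.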

Initial speciation rates for each ɛ category were chosen to result in an expected value of N = 50 surviving lineages after 50 time units, corresponding to λ= 0.064, 0.078, 0.102, and 0.151 for ɛ= 0, 0.25, 0.5, and 0.75, respectively. The parameter of among-lineage rate variation σ was varied from 0 to 0.06 in increments of 0.005, with 2000 trees generated per σ and ɛ. I estimated ɛ by fitting a constant rate birth–death model to the distribution of speciation times (Nee et al. 1994b). All simulations and analyses were conducted in R, with some code modified from the Ape (Paradis et al. 2004), Geiger (Harmon et al. 2008), and Laser (Rabosky 2006a) packages. Source code (in R) for the simulation of species richness, phylogenetic trees, and model fitting is available as Supporting information.



Among-clade variation in diversification rates exerts a potent effect on inferences about ɛ based on the birth–death model (Fig. 2). For clades simulated under constant r among lineages (v= 0), estimates of ɛ are generally consistent with the simulation model (Fig. 2; Table 2). However, as the variance in rates among clades is increased, estimates of ɛ become inconsistent with the simulation model. Any variation in r among clades leads to inaccurate estimates of ɛ, and this problem is especially acute for intermediate relative extinction rates. For example, when the extinction rate is one-half of the speciation rate (ɛ= 0.5), the estimated ɛ values are reasonable for simulations with constant rates among clades (Table 2; Fig. 2, top row). However, these estimates become nonsensical with even minor among-lineage rate variation: median and modal estimates of ɛ are 0.16 and 0.01 for the scenario with v= 0.5.

Figure 2.

Increased variance in net diversification among clades leads to incorrect estimates of relative extinction, even when extinction is not present. Each histogram represents the distribution of estimated ɛ values. Histograms in the same column were simulated under identical ɛ values (arrows at top), whereas those in the same row were simulated under identical magnitudes of among-clade rate variation. The magnitude of rate variation among clades is given by v. As v increases, estimates of ɛ tend to 0 or 1. Simulation models with v= 0 correspond to constant rates for all clades, and simulations with v= 1.0 were conducted with among-clade rate variation identical to that estimated for the avian families dataset.

Table 2.  Estimates of relative extinction rates for species richness data simulated under a model with constant rates among clades (v=0; Fig. 2, top row) and under five relative extinction ratios. Modes of each distribution were inferred using kernel density estimation with a Gaussian smoothing kernel. Quantiles give the 2.5 and 97.5 percentiles of the distribution of estimates. Note that the distribution of estimated ɛ values is bimodal under ɛ=0.25.
Simulation model (ɛ)   Mean   Median   Mode   Quantiles
–0.55

The problem is exacerbated as the variation in rates among clades increases. As v increases, the distribution of estimated ɛ values becomes bimodal, with peaks on both boundaries (ɛ= 0 and ɛ= 1), with few estimates falling between those values. Thus, minor variation in diversification rates among clades results in severely compromised inference. When the data are analyzed with methods that assume constant rates among clades, we are left with the erroneous perception that a birth–death model with extremely high (ɛ= 1) or low (ɛ= 0) relative extinction provides an explanation for observed patterns of species richness.


For phylogenetic trees with complete taxon sampling, estimates of ɛ are reasonably unbiased in the absence of among-lineage rate variation (Table 3), although confidence intervals are large. However, a pronounced upward bias in estimates of ɛ is noted when rates vary among branches. Even under a simulation model with ɛ= 0, increased rate heterogeneity among branches inflates estimates of ɛ (Fig. 3). For simulations lacking extinction (Fig. 3A), mean and median estimates of ɛ are 0.56 and 0.60, respectively. A positive bias in estimates of ɛ occurs with increasing heterogeneity in diversification rates across all extinction scenarios considered.

Table 3.  Estimates of relative extinction rates for phylogenies with complete taxon sampling simulated under a model with constant rates among lineages (σ=0; Fig. 3A). Quantiles give the 2.5 and 97.5 percentiles of the distribution of estimates.
Simulation model (ɛ)   Mean   Median   Mode   Quantiles
–0.95–0.99

Figure 3.

Variation in diversification rates among branches of phylogenetic trees leads to inflated estimates of ɛ. Histograms depict frequency distributions of estimated ɛ values, as a function of σ. Simulations were conducted under (A) ɛ= 0, (B) ɛ= 0.25, (C) ɛ= 0.5, and (D) ɛ= 0.75 (see arrows at right). As σ increases, there is an increase in the number of simulated trees that appear to have high relative extinction rates.


Previous studies have shown that confidence intervals on relative extinction rates as inferred from phylogenetic data are generally large (Nee et al. 1994a,b), and that this parameter may be difficult to infer in the absence of fossil data (Paradis 2004; Maddison et al. 2007). My results suggest that the problem is even more severe than typically thought, because among-lineage variation in diversification rates results in erroneous or directionally biased estimates of ɛ. This applies not only to phylogenies with complete taxon sampling (Fig. 3), but also to clade age and species richness data. Even comparatively minor variation in rates among clades leads to profound error in the estimation of ɛ from clade age and species richness data (Fig. 2).

For clade age and species richness data, the estimator appears to be fundamentally unstable with respect to even minor violations of model assumptions (e.g., v= 0.5). It is clear that rate variation among lineages results in estimates of ɛ that approach 0 or 1, regardless of the true relative extinction rate. It is especially interesting that, even within a single parameterization (e.g., v= 2.0, ɛ= 0.5), estimates of ɛ are bimodal and centered on the endpoints of the distribution of possible values. In the Supporting information, I show that this bimodality appears to be related, in part, to the ratio of species richness in old clades relative to young clades (Fig. S5). Because this ratio will vary as a result of stochasticity, simulations conducted with identical parameterizations can show dramatic differences in estimates of relative extinction. This bimodality is not limited to scenarios with high among-lineage rate variation, but is also observed for constant-rate diversification processes with small numbers of clades (Figs. S2 and S3). The latter occurs because, when only few clades are considered, the ratio of richness in young and old clades is driven by stochasticity: in some cases, young clades might even be more diverse than old clades, simply due to chance alone, and this has important consequences for estimated relative extinction rates (Fig. S4).

For phylogenetic trees with complete taxon sampling, the upward bias in estimates of ɛ with respect to rate variation has to do with the fact that the birth–death model predicts a geometric distribution for species richness with a modal value of n= 1. This is best illustrated by an example: suppose a clade is diversifying under a pure-birth process, with λ= 1 lineage/million years (my). Imagine the effects of a large rate decrease occurring in a lineage at 1 my before present: if this lineage undergoes a complete cessation of diversification in the recent past, such that λ= 0, we will observe that the lineage will have left only a single descendant in the present: itself. This is not an especially unlikely event under the pure-birth model; even with λ= 1, the probability of observing just a single descendant after 1 my is approximately 0.37 (e^(−λt) = e^(−1)). However, suppose that this same lineage has undergone a fivefold increase in the speciation rate at 1 my before present. In this case, the new speciation rate is λ= 5, leading to a substantial increase in the number of lineages in the past million years. With λ= 5, the probability of leaving a single lineage is small (P= 0.006), and the 2.5% and 97.5% quantiles on the expected number of progeny lineages are 4 and 545, with a mean of 148. One can easily imagine that such increases and decreases would have contrasting effects on the shape of lineage-through-time plots (Fig. 4). A recent rate increase, by generating a pulse of diversification, would suggest a massive rise in total diversity toward the present, but a severe rate decrease would scarcely change the shape of the plot.
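The numbers in this example can be checked against the closed-form pure-birth result. The sketch assumes that, starting from one lineage, the number of descendants after time t is geometric with success probability e^(−λt). The function name is hypothetical.

```python
import math

def yule_summary(lam, t):
    """Return P(N = 1), the mean, and the 2.5% / 97.5% quantiles of N."""
    p_single = math.exp(-lam * t)             # lineage never speciates
    mean = math.exp(lam * t)

    def quantile(q):
        # Smallest n with P(N <= n) = 1 - (1 - p)**n >= q.
        return max(1, math.ceil(math.log(1.0 - q) / math.log(1.0 - p_single)))

    return p_single, mean, quantile(0.025), quantile(0.975)

for lam in (1.0, 5.0):
    p_single, mean, lo, hi = yule_summary(lam, 1.0)
    print(f"lambda={lam}: P(N=1)={p_single:.4f}, mean={mean:.1f}, quantiles=({lo}, {hi})")
```

The analytical values are close to those quoted in the text. Small differences in the last digit, for example in the upper quantile, would be expected if the quoted figures were obtained by simulation or rounding.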

Figure 4.

Contrasting effects of lineage-specific increases and decreases in the speciation rate (λ) on the shape of lineage-through-time (LTT) curves. (A) Expected LTT curve under a pure-birth process (μ= 0) with λ constant among lineages. (B) Expected LTT curve when a single lineage undergoes a major decrease in λ. (C) Expected lineage through time plot when a single lineage undergoes a major increase in λ. The rapid rise in the number of lineages toward the present leads to estimates of high ɛ, even when extinction is not present. The effects of a single rate decrease should be statistically indistinguishable from a constant-rate process. However, a rate increase can exert a large effect on the shape of LTT curves, because all descendants of the high-rate lineage inherit an elevated λ and thus contribute disproportionately to clade diversity. The slope of the LTT curve approaches and, given sufficient time, becomes equal to the new (elevated) λ.

In short, decreases in rates along individual branches do not have a large effect on patterns of lineage accumulation through time: assuming the lineage does not go extinct, a decrease in a single lineage, at its most severe, still results in a single descendant surviving to the present. This is expected to be reasonably common even in lineages that do not undergo declines in rates. In contrast, a rate increase that affects only a single branch of a phylogenetic tree can yield a large spike in the number of lineages, because all of the descendants of this lineage will inherit the high rate and thus continue to diversify.

This study does not paint a promising picture of fossil-free extinction estimates, and I have not even considered the additional complications introduced by assuming constant rates through time within clades. This assumption is almost certainly violated in many cases (Ricklefs 2007; Ricklefs et al. 2007; Rabosky 2009a) and is likely to pose additional problems for inferences about extinction (Rabosky 2009a). The problem should be especially severe for large clades, because larger clades should contain a more heterogeneous mixture of lineages with different diversification rates.

However, it is impossible to ignore the observation that extinction rates estimated from smaller species-level phylogenies tend toward zero (Nee 2006; Bokma 2008; Purvis 2008). Why these estimates are low remains unclear (Purvis 2008), and even analyses that allow both λ and μ to vary independently through time frequently converge on low estimates for μ (Rabosky and Lovette 2008). Previous authors have found this surprising because the fossil record strongly suggests that relative extinction rates have been high throughout the history of life (Stanley 1979; Gilinsky 1994; Alroy 2000, 2008a,b). The results presented here render this matter somewhat murkier still, because among-lineage variation in rates biases estimates in the opposite direction.

A possible explanation for this was proposed in Rabosky (2009b), which demonstrated that phylogenetically patterned extinction can lead to molecular phylogenies that contain little signature of high background extinction. If extinction events are phylogenetically clustered, then even a high rate of extinction (e.g., ɛ= 1) can fail to leave a signature in the distribution of speciation times for phylogenetic trees of extant species only (Rabosky 2009b). This is likely to be problematic, because a growing body of evidence suggests that extinction rates are phylogenetically conserved (Vamosi and Wilson 2008; Roy et al. 2009). Another possible explanation is implicit in a recent study by Crisp and Cook (2009), who demonstrated that mass extinction events could leave patterns in phylogenetic trees consistent with ɛ= 0. Although their results did not directly address problems in the estimation of ɛ, they showed that mass extinctions could lead to lineage accumulation patterns suggestive of “early burst” diversification; these early burst patterns in turn typically lead to estimates of low ɛ (Rabosky and Lovette 2008; but see Quental and Marshall 2009).

What is the way forward from here? One possibility is to explicitly estimate ɛ after relaxing assumptions about homogeneous diversification rates among lineages. This is the essence of the relaxed-rate model for variation in r developed in this article. However, estimating ɛ with a model similar to that described in equation (4) is a dubious endeavor; minimally, it requires assumptions about: (1) the true distribution of diversification rates among clades; (2) constancy of rate variation through time; and (3) the extent to which extinction is phylogenetically conserved. Results presented here demonstrate that assumptions about the nature of among-lineage rate variation can lead to highly misleading estimates of extinction. Moreover, a previous paper found that estimation of ɛ under a constant-rate birth–death model when rates have varied through time led to high estimates of ɛ with extremely high confidence (Rabosky 2009a). Yet when the data were analyzed under a more appropriate model, there was little ability to discriminate between low and high ɛ.

In summary, the estimation of meaningful extinction rates from data on living species only is a challenging problem that may be insoluble. The results of this study argue strongly for better integration of paleontological data with molecular phylogenetic studies of diversification. At present, molecular phylogenetic studies use the fossil record primarily for dating trees, but combining these complementary sources of information should generate a richer perspective on the dynamics of speciation and extinction.

Associate Editor: J. Vamosi


I thank S. Heard, I. Lovette, A. McCune, R. Ricklefs, J. Vamosi, M. Alfaro, T. Quental, and L. Harmon for discussion of these topics and/or comments on the manuscript. This research was supported by National Science Foundation (NSF-OSIE-0612855), NSF-DEB-0814277, and the Miller Institute for Basic Research in Science at the University of California, Berkeley.