A stochastic rate-calibrated method for time-scaling phylogenies of fossil taxa

Authors


Correspondence author. E-mail: dwbapst@uchicago.edu

Summary

  1. Applying phylogeny-based analyses of trait evolution and diversification in the fossil record generally involves transforming an unscaled cladogram into a phylogeny scaled to geologic time. Current methods produce single time-scaled phylogenies with no indication of the uncertainty in the temporal relationships and, under some methods, artificial zero-length branches.
  2. Here, I present a stochastic algorithm for time-scaling phylogenies of fossil taxa by randomly sampling node ages from a constrained distribution, with the ultimate goal of producing large samples of time-scaled phylogenies for a given data set as the basis for phylogeny-based analyses. I describe how this stochastic approach can be extended to consider potential ancestral relationships and resolve polytomies.
  3. The stochastic selection of node ages in this algorithm is weighted by the probability density of the total inferable unobserved evolutionary history at single divergence events in a tree, a distribution dependent on rates of branching, extinction and sampling in the fossil record.
  4. The combined time-scaling method must be calibrated with explicit estimates of three rates: branching, extinction and sampling, and thus is named the cal3 time-scaling method, included in the R library paleotree. I test the time-scaling capabilities of the cal3 and older time-scaling methods in simulations. cal3 produces samples of time-scaled trees that better bracket the uncertainty in the true node ages than existing time-scaling methods. This is true even in simulations under a ‘terminal-taxon’ model of differentiation that violates many of the assumptions of the cal3 method.
  5. The cal3 method provides a new approach for time-scaling palaeontological cladograms, calibrated to estimated sampling and diversification rates, allowing for better estimates of uncertainty in the phylogenetic time-scaling. The cal3 method is robust to relaxation of at least some model assumptions. Additional work is needed to analyse the impact of time-scaling approaches on macroevolutionary analyses and to integrate time-scaling with phylogenetic inference.

Introduction

Phylogeny-based analyses of diversification and trait evolution, commonly referred to as phylogenetic comparative methods, are a powerful route for understanding macroevolutionary tempo and mode. These methods appear to have even greater potential when data sets are not limited to a particular moment in time, such as the present, but include additional information from the fossil record on ancestral and extinct lineages (Finarelli & Flynn 2006; Slater, Harmon & Alfaro 2012). However, such analyses require time-scaled phylogenies, that is, branching diagrams (trees) that accurately describe the temporal relationships among lineages, including ancestor–descendant relationships. A necessary prerequisite for applying these analyses in the fossil record is time-scaled phylogenies of fossil taxa, but hypothesized relationships among extinct organisms are typically available only in the form of a cladogram, a branching diagram unscaled to time which depicts only the nesting relationships among morphologically differentiated taxon units (‘morphotaxa’; Fig. 1). Although some methods consider information on temporal occurrence of fossils simultaneously with inferring relationships from morphological characters (Fisher 1991, 1994; Wagner 1998; Marcot & Fox 2008; Pyron 2011; Ronquist et al. 2012), most palaeobiological trees are not constructed using these approaches. Thus, methods are needed for integrating inferred cladograms with temporal data to approximate the true time-scaled phylogeny (Fig. 2a).

Figure 1.

Incomplete sampling in the fossil record records a partial account of evolutionary history. (a) The true evolutionary history of an imagined set of morphotaxa, with various patterns of taxonomic differentiation. (b) Evolutionary history with sampling events distributed among taxa. Note that some morphotaxa are never sampled. (c) The temporal ranges of taxa that would be observed given the sampling events in (b). (d) The best possible cladistic topology that could be created for the sampled taxa in (b). These example data are also used for Fig. 2.

Figure 2.

The true phylogeny of observed morphotaxa (a) can differ considerably from the phylogeny obtained via the basic time-scaling method (b). (a) The true time-scaled phylogeny represents the actual evolutionary relationships of the sampled taxon ranges in Fig. 1c. (b) The inferred time-scaled phylogeny produced by the basic method, where clades are as old as their earliest sampled member. Note that this produces several zero-length branches, which look like polytomies. The one actual polytomy was not resolved prior to time-scaling. (c) The basic method time-scaled tree in (b) with branches extended to show the zero-length branches (dotted lines). This is conceptually similar to some common fixes for the basic method, where branches are constrained to be some minimum length.

A frequently applied approach for integrating temporal and cladistic data to produce a time-scaled phylogeny was formalized by Norell (1992) and Smith (1994), hereafter referred to as the ‘basic’ time-scaling method. In this method, clades are as old as the first appearance date of their earliest descendant (Fig. 2b). Although some workers recommend treating plesiomorphic taxa as ancestors, which can appear before branching events (Smith 1994), this is often not done in recent time-scaling attempts, as the cladograms used are supertrees lacking information on apomorphies. As the first appearing lineage can be nested relative to other lineages on the cladogram, this method can cause the branch lengths between successive nodes to collapse into zero-length branches (‘ZLBs’; dashed lines in Fig. 2c). These ZLBs are potentially unrealistic artefacts resulting from gaps in the pattern of evolutionary relationships (Hunt & Carrano 2010). In addition, their inclusion in a tree can cause issues for analyses of trait evolution, as any evolutionary change across a ZLB will appear to be instantaneous. Some comparative methods are unable to evaluate such trees because the necessary phylogenetic variance–covariance matrix (Garland & Ives 2000) can become singular and thus unusable for analytical operations. To avoid the theoretical and methodological issues of including ZLBs, many workers first calculate node ages using the basic method and then extend branch lengths under various algorithms, such as restricting branches to some minimum length (Laurin 2004; Brusatte et al. 2008; Laurin, Canoville & Quilhac 2009). Ignoring the issues with zero-length branches, the basic time-scaling method and the various derivations with branch length extensions do not allow for uncertainty in node ages. 
Furthermore, the basic method assumes that the phylogeny of morphotaxa exactly matches the cladogram used, even though a large number of phylogenies with ancestor–descendant relationships are consistent with any given topology (Platnick 1977; Wagner & Erwin 1995; Bapst 2013).

Here, I propose a general algorithm for stochastic time-scaling of palaeontological phylogenies which I call the ‘zipper’ method, where node ages are sampled randomly. This stepwise process of drawing node ages should be repeated many times to generate large numbers of time-scaled phylogenies, approximating the potential range of time-scaled phylogenies for a given data set. Macroevolutionary analyses should then be applied across such samples of phylogenies, rather than a single tree, as the stochastic quality of the time-scaling method makes any single time-scaled tree a potentially poor predictor of the temporal relationships. This stochastic approach is similar to methods for dealing with the uncertainty arising from soft polytomies or appearance times known only from discrete intervals (see Appendix S1 for more detail on solutions for discrete interval data and uncertain times of observation). The zipper method is extended to allow potential ancestor–descendant relationships and resolve soft polytomies.

A probability distribution is required to describe the random sampling of branching times for each node under the zipper method. Rather than randomly assigning node ages with uniform probability between some set of bounds, the stochastic sampling in this implementation is weighted relative to a distribution defined by a probability function of unsampled phylogenetic history. This model predicts the amount of unobserved evolutionary history as a function of branching, extinction and sampling rates, and thus the combination of this model with the zipper method is referred to here as the three-rate-calibrated time-scaling method (‘cal3’).

The zipper algorithm for stochastic time-scaling

The zipper algorithm requires only three ingredients: (i) an unscaled cladogram (Fig. 1d) potentially containing polytomies; (ii) continuous-time ranges for the taxa on the cladogram, with the times of observation given as the range endpoints (if they differ from the last appearance dates); and (iii) a model for the probability density of node ages relative to the first appearance dates of observed taxa. The zipper algorithm initially time-scales the unscaled cladogram such that node ages of each clade are equivalent to the first appearance date of the oldest taxon in that clade. Starting with the root node, the zipper algorithm evaluates each node individually, randomly sampling a new age using the given distribution, adjusting the phylogeny to assign this age and moving to the next node, always from the deepest to the most shallow nodes. A time-scaled phylogeny is produced once every node has been assigned a node age using the zipper algorithm, and a sample of many such trees can be generated by repeating the algorithm.

Divergence dates do not have a clear lower (i.e. older) bound. The only certainty is that successive nodes occur in order: ancestral nodes must occur before any descendants. Node ages sampled stochastically must obey this constraint, met in the zipper method by only considering nodes in sequence from the root (the oldest branching event) to the most shallow. Thus, nodes are always traversed upwards from the root within the zipper algorithm. As daughter nodes must occur after their ancestral nodes, this protocol provides the necessary lower bounds on the position of every node except the root. By adjusting the node ages individually in an upwards tree traversal, node ages are always congruent with each other and almost all divergence times gain a nonarbitrary lower bound. The lower bound for the divergence time of the root node is set to an arbitrarily ancient time before the first appearance of the oldest taxon. As long as a probability distribution for sampling node ages is specified that considers an extreme amount of unobserved evolutionary history to be unlikely, this root age constraint should have little effect on the resulting tree.
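
The root-to-tip traversal can be sketched as follows. This is an illustration only: the tree encoding and function names are hypothetical, ancestors and polytomies are ignored, and a uniform draw stands in for whatever probability distribution is supplied to the algorithm. Dates are time units before present, so the 'lower' (older) bound is the larger number.

```python
import random

def oldest_fad(tree, fad):
    """Oldest first appearance among the taxa in a clade (or the tip itself)."""
    if isinstance(tree, str):
        return fad[tree]
    return max(oldest_fad(child, fad) for child in tree)

def zipper(tree, fad, lower_bound, rng, ages=None):
    """Assign node ages from the root upwards.

    Each sampled age becomes the older bound for the daughter nodes, so
    ancestral nodes always precede their descendants.
    """
    if ages is None:
        ages = {}
    if isinstance(tree, str):
        return ages
    upper_bound = oldest_fad(tree, fad)           # youngest permissible age
    # Placeholder distribution (an assumption): uniform between the bounds.
    age = rng.uniform(upper_bound, lower_bound)
    ages[tree] = age
    for child in tree:
        zipper(child, fad, age, rng, ages)
    return ages

fad = {"A": 10.0, "B": 12.0, "C": 8.0}
tree = ("C", ("A", "B"))
# The root's older bound is arbitrary; here 20 time units before the oldest taxon.
ages = zipper(tree, fad, lower_bound=32.0, rng=random.Random(1))
```

Repeating the call with different random seeds would yield a sample of time-scaled trees whose node ages differ but are always mutually consistent.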

If a node of interest has only two daughter lineages, there are three branches directly connected to this node: a lower stem extending to that node's lower bound and two daughter branches continuing up to the first taxon appearance on each branch (i.e. the minimum unsampled evolutionary history on each daughter branch). A freely varying age for the node of interest can be thought of metaphorically as a zipper, where adjustments upwards shorten the daughter branches and lengthen the stem just as pulling a zipper up brings together the teeth, while downwards adjustment has the opposite effect (Figs 3 and 4). When we do not allow for the stochastic assignment of ancestors, the node age is given an upper bound by the oldest first appearance time of a taxon along either daughter branch.

Figure 3.

A phylogeny of fossil taxa can be time-scaled by stochastically drawing node ages, given some upper and lower bounds. (a) An example scenario with a rooted three-taxon tree and unknown node ages. (b) In this case, the node is the root node and thus the lower bound is some arbitrarily old date and the upper bound is fixed at the first appearance time of the clade AB, as this clade appears before taxon C. The node ages are randomly sampled, and the selected node age is indicated by an arrow. (c) The branch lengths are adjusted so the node is placed at its new stochastically chosen age from (b).

Figure 4.

The zipper algorithm can be conceptually extended to stochastically consider potential ancestral morphotaxon relationships. (a) Following the example in Fig. 3, the zipper algorithm traverses the nodes of the tree upwards, with the newly selected node age becoming the lower bound for any daughter nodes. (b) By allowing the upper bound of the node age to be after the first appearance of the earlier-appearing taxon (taxon A), the zipper algorithm can stochastically assign A as a paraphyletic ancestor to the later appearing lineage (taxon B). (c) The node ages stochastically selected in (b) would produce this time-scaled tree, while a different stochastic run might produce the time-scaled tree in (d), which has very different node ages.

The zipper algorithm discretizes the time span between these lower and upper bounds, producing a sequence of dates with arbitrarily fine increments from which to sample (the default increment is 0·1 time units; Fig. 3b). Using the specified distribution, the probability density is calculated for each of these potential node ages and rescaled so they sum to one. These rescaled densities are used to weight the random selection of a new node age. Discontinuous modifications can be made to these discrete probabilistic weights, such as with ancestor weights (see below). Once a new divergence time is selected, the branch lengths of the time-scaled tree are adjusted accordingly (Fig. 3c), and the algorithm moves to the next highest node.
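
A minimal sketch of this discretized draw is given below. The function name is hypothetical, and the exponentially decaying density in the usage lines is an arbitrary placeholder rather than the distribution used by cal3. Dates are time units before present.

```python
import math
import random

def sample_node_age(upper, lower, density, rng, step=0.1):
    """Draw a node age from a grid between `upper` (youngest) and `lower` (oldest).

    `density` maps a candidate age to an unnormalized probability density.
    """
    n = int(round((lower - upper) / step))
    grid = [upper + i * step for i in range(n + 1)]
    weights = [density(age) for age in grid]
    total = sum(weights)
    probs = [w / total for w in weights]          # rescaled to sum to one
    age = rng.choices(grid, weights=probs, k=1)[0]
    return age, grid, probs

rng = random.Random(42)
# Placeholder density (an assumption): older ages are progressively less likely.
decay = lambda age: math.exp(-0.2 * (age - 12.0))
age, grid, probs = sample_node_age(12.0, 22.0, decay, rng)
```

Keeping the grid and the rescaled weights separate from the draw makes it straightforward to modify the weights before sampling, as is done for ancestor weights.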

Extensions for the zipper method: ancestral relationships and resolving polytomies

The zipper algorithm can be extended to consider divergence times occurring after the first appearance of one of the daughter lineages (Fig. 4); that is, that the earlier-appearing lineage is ancestral to the other daughter lineage. By allowing node ages to be sampled later than the first appearance of a daughter taxon, each stochastically time-scaled phylogeny generated may find different sets of potential ancestral taxa. Many such trees should approximate our uncertainty in the frequency and placement of ancestor–descendant relationships. Ancestral relationships are only considered under the zipper algorithm if the earliest-appearing daughter lineage of a node is composed only of a single morphotaxon; when that lineage is a clade, the upper bound remains at the first appearance time of the oldest taxon in that clade (as in Fig. 3). For assigning ancestor relationships, the later appearing daughter lineage (the potential descendant) might be a single morphotaxon or an entire clade.

When the earliest-appearing daughter lineage at a node is a single morphotaxon, and ancestor assignment is allowed by the user, the upper bound for node ages is either the first appearance time of the potential descendant or the time of observation of the potential ancestor, whichever occurs first (this is the former in Fig. 4). This protocol allows short-lived taxa to be stochastically assigned as ancestors sampled before the origination of their descendants. If selected as such by the zipper algorithm, these ancestors become connected to the phylogeny by zero-length branches, matching our expectation for true trees (Fig. S2). Such zero-length branches should be removed from the tree prior to any analysis of lineage diversification, as they do not represent real phylogenetic offshoots.
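
The rule for the upper bound can be written compactly. The sketch below is one reading of the rule stated above, with a hypothetical function name and argument names; dates are time units before present, so the event that 'occurs first' is the one with the larger value.

```python
def node_age_upper_bound(fad_early, obs_early, fad_late,
                         early_is_single_taxon, allow_ancestors):
    """Youngest permissible node age for a dichotomous node.

    fad_early / obs_early: first appearance and time of observation of the
    earlier-appearing daughter; fad_late: first appearance of the other daughter.
    """
    if allow_ancestors and early_is_single_taxon:
        # Whichever occurs first in time, i.e. the larger age before present.
        return max(fad_late, obs_early)
    # Otherwise the node can be no younger than the oldest first appearance.
    return fad_early

# Long-lived potential ancestor: bound is the descendant's first appearance.
long_lived = node_age_upper_bound(12.0, 6.0, 9.0, True, True)
# Short-lived potential ancestor: bound is its own time of observation.
short_lived = node_age_upper_bound(12.0, 11.0, 9.0, True, True)
# Ancestor assignment switched off.
no_ancestor = node_age_upper_bound(12.0, 6.0, 9.0, True, False)
```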

The potential assignment of ancestral relationships under the zipper algorithm presumes that the morphotaxa used have at least some small chance of being paraphyletic. Ancestors are assigned independently of apomorphy information, so the zipper algorithm may unintentionally generate trees requiring additional character reversals, thus adding to the number of necessary character steps under parsimony. If a researcher considers ancestral taxa to be problematic on theoretical grounds or has strong evidence of monophyly, ancestral assignment can be modulated on a per-taxon basis by adjusting the ancestor ‘weight’. Densities for node ages after the first appearance date of the older daughter taxon are multiplied by this value prior to rescaling for use as weights in the random sampling. By default, ancestor weights are set at one and have no effect on the probability distribution used to weight node age selection. If the ancestor weight for a taxon is set to zero, that taxon can never be assigned as an ancestor to any other lineage. A user who only wishes to allow plesiomorphic taxa as potentially ancestral could selectively set the ancestor weight for all apomorphic taxa to zero.
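
The effect of the ancestor weight on the sampling weights can be illustrated as follows. This is a sketch with a hypothetical function name and a flat placeholder density; dates are time units before present, so ages 'after' the first appearance of the older daughter are the smaller values.

```python
def weighted_probs(grid, densities, fad_older, anc_weight=1.0):
    """Rescale densities to sampling weights, down-weighting ancestral scenarios.

    Ages younger than `fad_older` imply that the older daughter taxon is
    ancestral; their densities are multiplied by `anc_weight` before rescaling.
    """
    scaled = [d * anc_weight if age < fad_older else d
              for age, d in zip(grid, densities)]
    total = sum(scaled)
    return [s / total for s in scaled]

grid = [9.0, 10.0, 11.0, 12.0, 13.0]      # candidate node ages
dens = [1.0, 1.0, 1.0, 1.0, 1.0]          # flat placeholder density (an assumption)
default = weighted_probs(grid, dens, fad_older=12.0)                  # weight of one
no_anc = weighted_probs(grid, dens, fad_older=12.0, anc_weight=0.0)   # never ancestral
```

With the weight at zero, all probability shifts to node ages at or before the first appearance of the older taxon; intermediate weights make ancestral scenarios proportionally less likely without excluding them.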

The zipper algorithm is also extended to stochastically resolve soft polytomies. This provides an opportunity for the probability distribution applied to node age sampling to influence the topologies produced. The zipper method resolves polytomies by placing each daughter lineage within the polytomy in a stepwise iteration, from earliest-appearing to latest-appearing (Fig. 5). In the first step, a node age is obtained for divergence between the two earliest-appearing lineages in the polytomy, using the same algorithm described above for a dichotomous node (Fig. 5b). With each additional lineage beyond the first pair, it is uncertain from which of the earlier-appearing lineages a given younger lineage diverged, in addition to when this divergence took place. To account for this, node age densities are incrementally calculated along every earlier-appearing (and thus previously considered) lineage in the polytomy (Fig. 5c). This set of ‘zippers’, one for every lineage already placed, is considered simultaneously and reweighted in parallel, so that all calculated density values sum to one. The entire set of divergence times, distributed across multiple lineages, is randomly sampled for a single divergence event, weighted by the associated densities. This procedure allows an ancestral lineage to be topologically placed and its divergence dated while considering the range of possible alternatives simultaneously. By including polytomies in the input cladogram, a worker using the zipper algorithm recognizes that complex patterns of ancestral relationships can contribute to phylogenetic uncertainty via a lack of synapomorphies (Wagner & Erwin 1995; Bapst 2013). The zipper method allows such complex patterns to be potentially reconstructed (Fig. 5c). As all the nodes produced de novo by resolving a polytomy are also given node ages under the zipper algorithm, these nodes are ignored for the remainder of the stochastic time-scaling algorithm.
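
The joint draw across several 'zippers' can be sketched as a single weighted sample from the pooled candidates. The data structure, function name and density values below are hypothetical and serve only to show the pooling and reweighting step.

```python
import random

def place_lineage(zippers, rng):
    """Choose where and when a new lineage attaches within a polytomy.

    `zippers` maps each already placed lineage to a list of
    (candidate divergence age, density) pairs along that lineage.
    """
    pooled = [(name, age, dens)
              for name, candidates in zippers.items()
              for age, dens in candidates]
    total = sum(dens for _, _, dens in pooled)
    probs = [dens / total for _, _, dens in pooled]   # all zippers sum to one jointly
    name, age, _ = rng.choices(pooled, weights=probs, k=1)[0]
    return name, age, probs

# Illustrative candidates along two previously placed lineages.
zippers = {
    "A": [(12.0, 0.5), (13.0, 0.3)],
    "B": [(10.0, 0.4), (11.0, 0.6), (12.0, 0.2)],
}
parent, age, probs = place_lineage(zippers, random.Random(7))
```

Because the candidates are pooled before rescaling, a lineage offering more high-density divergence times is proportionally more likely to be chosen as the attachment point.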

Figure 5.

The zipper algorithm can resolve polytomies via an iterative process of stochastic time-scaling, allowing for potentially complex patterns of ancestor–descendant relationships among morphotaxa. When the zipper algorithm is passed a polytomy (a), it first treats the earliest-appearing taxa in the polytomy as a simple scenario of time-scaling a node with two daughters (b), similar to Figs 3 and 4. (c) Additional daughter lineages are placed iteratively, in order of their time of first appearance, with a set of potential node ages calculated along all previously placed lineages. A single node age is randomly selected from this sample. (d) The resulting time-scaled tree produced by the node ages selected in (b) and (c).

A three-rate-calibrated model of unobserved evolutionary history

Palaeontologists expect more uncertainty reconstructing the temporal relationships in groups of taxa with more poorly sampled fossil records. The stochastic zipper algorithm described above can be combined with any probability distribution relevant to the selection of different node ages, with nonuniform distributions unevenly weighting the selection of node ages. Here, I define a model for weighting node age sampling based on the probability distribution for the amount of total inferable unobserved evolutionary history (hereafter referred to as Δ). Δ is defined as the minimum amount of missing evolutionary time from the stem age to the first observed appearances of taxa along both daughter branches, given a dichotomous node and some age for that node (Fig. 6a). In this definition, ‘dichotomous node’ refers to any two lineages identified as sisters on a cladogram, independent of evolutionary relationships. For example, two morphotaxa from an anagenetic ancestor–descendant pair would sit as sisters on a cladogram. ‘Stem age’ refers to the date of the branching event that was parent to the branching event of present interest.
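
For the simplest case, in which the node predates the first appearances of both daughters, Δ can be computed as the sum of three waiting times. The sketch below reflects one reading of the definition above and deliberately does not cover the scenarios in which one daughter is treated as an ancestor; the function name is hypothetical and dates are time units before present.

```python
def total_unobserved_history(stem_age, node_age, fad_1, fad_2):
    """Delta for a node older than both daughters' first appearances.

    Sum of the stem branch plus the two daughter branches up to their
    first observed appearances (assumed reading of the definition).
    """
    if not (stem_age >= node_age >= max(fad_1, fad_2)):
        raise ValueError("expects stem >= node >= both first appearances")
    stem_gap = stem_age - node_age
    daughter_gaps = (node_age - fad_1) + (node_age - fad_2)
    return stem_gap + daughter_gaps

# Stem at 20, node at 14, daughters first appearing at 12 and 10.
delta = total_unobserved_history(20.0, 14.0, 12.0, 10.0)
```

Under this reading, moving the node one time unit older shortens the stem by one unit but lengthens both daughter branches, so Δ grows by one unit.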

Figure 6.

(a) Measures of the total amount of inferable missing evolutionary history (Δ) at a node for different positions of the node age. In each of the three scenarios illustrated, Δ is the sum of the waiting times indicated by the accompanying arrows. (b) This total inferable amount of missing evolutionary history (Δ) is probably often an underestimate of the true amount of unobserved evolutionary history, due to the unknown presence of unsampled extinct lineages.

Δ is a minimum estimate for the amount of evolutionary history that must be missing for a given node age to be true, as additional evolutionary history may be unobserved due to unsampled extinct lineages (Fig. 6b). Some previous models of the amount of missing evolutionary history on a phylogeny of fossil taxa have considered distributions that are a function only of sampling intensity (Marshall 1995; Wagner 1995). Other studies have included diversification processes in their model to account for the potential impact of unobserved lineages as a source of unaccounted incompleteness in the fossil record (Foote et al. 1999; Tavaré et al. 2002; Friedman & Brazeau 2011). Under a model that accounts for the probability of such unobserved ‘twigs’ in a data set (Fig. 6b), we should predict smaller amounts of missing evolutionary history than that expected by a pure-sampling model (i.e. smaller values of Δ).

I model the probability density of Δ as a function of the instantaneous rates of branching, extinction and sampling. Following the notation of Stadler (2010), I refer to these as λ, μ and ψ, respectively, given in units of per lineage time units (hereafter, per Ltu). Here, the probability density function of Δ is calculated as the density of an Erlang (gamma) distribution, with a shape parameter of two and a rate parameter of (ψ + Ps λ):

$$f(\Delta) = (\psi + P_s\lambda)^{2}\,\Delta\,e^{-(\psi + P_s\lambda)\,\Delta} \qquad \text{(eqn 1)}$$

Ps is the probability of sampling an extinct clade of unknown size, calculated as:

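$$P_s = 1 - \frac{(\lambda + \mu + \psi) - \sqrt{(\lambda + \mu + \psi)^{2} - 4\lambda\mu}}{2\lambda} \qquad \text{(eqn 2)}$$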

While Ps (eqn 2) is derived exactly for a specific set of assumptions, the probability density function for Δ (eqn 1) is obtained based on consideration of first principles and comparison with simulation. See Appendix S1 for details, including simulation tests of both.
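
The two quantities can be evaluated numerically as sketched below. The density follows directly from the stated Erlang form (shape two, rate ψ + Ps λ). The closed form used for Ps is an assumption of this sketch: it takes Ps = 1 − p0, where p0 is the smaller root of λp0² − (λ + μ + ψ)p0 + μ = 0, that is, the probability that a lineage and all of its descendants go extinct without ever being sampled. Function names are hypothetical.

```python
import math

def prob_sampling_clade(lam, mu, psi):
    """Ps under the assumed closed form: one minus the probability that a
    lineage and all its descendants go extinct without being sampled."""
    total = lam + mu + psi
    p_unsampled = (total - math.sqrt(total ** 2 - 4.0 * lam * mu)) / (2.0 * lam)
    return 1.0 - p_unsampled

def delta_density(delta, lam, mu, psi):
    """Erlang density of Delta with shape two and rate psi + Ps * lam."""
    rate = psi + prob_sampling_clade(lam, mu, psi) * lam
    return rate ** 2 * delta * math.exp(-rate * delta)

# All three rates at 0.1 per Ltu, as in the simulations described in this paper.
ps = prob_sampling_clade(0.1, 0.1, 0.1)
# Crude Riemann sum as a check that the density integrates to about one.
area = sum(delta_density(i * 0.01, 0.1, 0.1, 0.1) * 0.01 for i in range(20000))
```

An Erlang density with shape two is zero at Δ = 0 and peaks at Δ = 1/rate, so node ages implying either no missing history or very large amounts of it both receive little weight.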

This density function can be used to weight the selection of node ages in the zipper algorithm given the three rates as calibration (branching, extinction and sampling). This combined approach is implemented as the ‘cal3’ time-scaling method in the free software library paleotree (Bapst 2012) for the statistical computing language R (R Core Team 2013), which manipulates the ‘phylo’ objects used in library ape (Paradis, Claude & Strimmer 2004). The cal3 method requires estimates of these rates as input, in addition to the cladogram and taxon appearance dates required by other time-scaling methods. This R library includes a number of additional functions useful for implementing the cal3 time-scaling method, such as maximum-likelihood methods for estimating sampling and extinction rates from the distribution of taxon ranges in the fossil record (Foote 1997). Users of paleotree can test between models of temporal and taxonomic heterogeneity in sampling rates within their data set using standard model-selection methods. If users find support for heterogeneity, they can include different sampling and diversification rates for each taxon when they apply cal3 time-scaling.

Testing the cal3 time-scaling method

I tested the capabilities of the stochastic cal3 time-scaling method to accurately capture the true node ages via simulation analyses in two data sets of 100 simulations, generated under two different models of differentiation and diversification (Fig. 7). The first set of simulations uses the taxonomic differentiation model of budding cladogenesis (Fig. 7a), with one daughter lineage differentiating at branching events while the other daughter lineage remains morphologically indistinguishable from the ancestor (Foote 1996; Bapst 2013). This is a process-based model of differentiation, where taxonomic identity results from processes occurring concurrently with the branching of lineages.

Figure 7.

Models of morphotaxon differentiation can produce very different taxon identities even if branching and extinction patterns are identical. (a) Taxon identity under a model of budding cladogenesis and (b) observable taxon identity under a ‘terminal-taxon’ model of differentiation where taxa are required to be intrinsically resolvable.

The second model of differentiation used to test the cal3 method gives taxonomic identity only to those portions of lineages that are inherently monophyletic (i.e. the terminal branches of a phylogeny, which extend from a branching node to an extinction event). These lineages are considered to be fully differentiated morphotaxa, with earlier ancestral (i.e. paraphyletic) lineages being impossible to sample in the fossil record or otherwise unobserved (Fig. 7b). As definable morphotaxa sit as solitary terminal units at the ends of unobservable branches, I refer to this model as the ‘terminal-taxon’ model of differentiation. By necessity, taxa can be defined under the terminal-taxon differentiation pattern only after diversification and extinction processes have been fully simulated, as later branching or extinction events can alter the inherent monophyly of lineages (and thus the set of defined, observable morphotaxa). The terminal-taxon model is based on a caricature of the phylogenetic species definition, intentionally defined to produce a scenario where certain assumptions of the full cal3 method are not met: taxa under the terminal-taxon model can never be ancestral and the true amount of unobserved evolutionary history at a node is not distributed according to stochastic sampling processes, as sampling events can never occur along paraphyletic stem lineages.

For the small set of simulations used here, only a single combination of generating rates is considered, with branching, extinction and sampling rates all equal to 0·1 per Ltu. These values resemble estimates from the fossil record of marine invertebrates, if we treat one million years as one time unit (Sepkoski 1998; Foote & Sepkoski 1999). For both the budding cladogenesis and terminal-taxon model simulations, sampling events were stochastically simulated following the modelling of diversification and differentiation. Simulations are conditioned to sample about fifty extinct taxa on average, with no living taxa. First and last appearance dates of taxa were placed in discrete time intervals, each five time units in length. The resulting set of observed taxon ranges and the idealized unscaled cladogram of sampled taxa (Bapst 2013) are used as input for time-scaling analyses, with rate estimates for the cal3 method coming from the maximum-likelihood analyses of observed taxon ranges. The extinction rate estimate is also used as the branching rate estimate, matching palaeobiological evidence for a close relationship between these rates (Stanley 1979).

Trees were time-scaled under three different variants of the cal3 time-scaling method: (i) considering potential ancestral relationships, (ii) without considering potential ancestral relationships and (iii) with polytomies randomly resolved prior to time-scaling and without considering ancestral relationships. Time-scaled trees were also estimated under a variant of the basic method (Smith 1994), with clades always as old as their oldest descendant but no consideration for ancestral taxa (plesiomorphy is unknown as character data are not modelled in these simulations). As ideal cladograms for simulations under the terminal-taxon model have no polytomies (i.e. these data sets never lack intrinsic resolution unlike the budding model), the third version of cal3 was inapplicable to the terminal-taxon model simulations. Polytomies were always randomly resolved when applying the basic method. As simulated data had been transformed to discrete time intervals, continuous-time dates were stochastically drawn for all time-scaling methods. Therefore, all applied time-scaling methods have at least some stochastic element, so samples of 20 time-scaled trees are produced for each time-scaling approach, with independently resolved polytomies and continuous-time dates. Time-scaled trees are always generated with last appearance dates as the time of observation (see Appendix S1). Fidelity of estimated node ages is considered across tree samples, in comparison with node ages from the true time-scaled phylogeny, also generated with the last appearance dates as the times of observation (as in Fig. S2c).
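
Drawing continuous-time dates from discrete intervals can be sketched as below. The uniform draw within each interval is an assumption of this sketch (the distribution is not specified in the text above), and the function name and interval encoding are hypothetical. Dates are time units before present.

```python
import random

def draw_range(fad_interval, lad_interval, rng):
    """Draw continuous first and last appearance dates for one taxon.

    Each interval is (older bound, younger bound). Dates are drawn
    uniformly within their intervals (an assumption).
    """
    fad = rng.uniform(fad_interval[1], fad_interval[0])
    lad = rng.uniform(lad_interval[1], lad_interval[0])
    # If both dates fall in the same interval, keep them in temporal order.
    return max(fad, lad), min(fad, lad)

rng = random.Random(3)
# A taxon known only from a single interval five time units long.
fad, lad = draw_range((15.0, 10.0), (15.0, 10.0), rng)
```

Each time-scaled tree in a sample would use a fresh set of such draws, so uncertainty in the appearance dates propagates into the sample of node ages.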

The same simulation analyses described above were also used to measure how accurately the cal3 method resolves soft polytomies introduced onto secondarily ‘degraded’ cladograms; the methods and results are described in Appendix S1 and figured in Fig. S10. Simulated data values and scripts to reproduce all simulations and analytical figures (i.e. Figs 8 and 9) described in this paper have been submitted to Dryad and are located at doi:10.5061/dryad.vk7t0.

Figure 8.

Median error in estimated node ages shows that all time-scaling methods produce biased node age estimates, although the directionality of the bias depends on the mode of differentiation. The basic method (‘Basic’) is least biased under budding cladogenesis, while the various cal3 methods (‘cal3: A’ represents cal3 with ancestral assignment; ‘cal3: NoA’ is cal3 without ancestral assignment; and ‘cal3: RR, NoA’ is cal3 with polytomies randomly resolved prior to time-scaling and without ancestral assignment) consistently yield older age estimates. See text for simulation details.

Figure 9.

cal3 methods have a higher proportion of true node ages within 95% quantiles estimated from samples of time-scaled trees, with the cal3 method with ancestors performing best under the budding model (a). See text for simulation details.

Results: testing cal3

We can examine the accuracy and precision of approximated branching times by comparing the true time-scaled phylogeny of a simulated data set to the samples of empirical trees estimated under each time-scaling method. As results must be compared across simulations composed of samples of multiple trees, error must be summarized as a median of medians (medians are used as underlying distributions are skewed). For example, the shift in node age on a generated time-scaled tree was calculated with respect to the true node age, followed by the per-node median calculated across the sample of generated time-scaled trees, and finally the median of these per-node median shifts was calculated for each simulated data set. This per-simulation median of median node age shifts is displayed in Fig. 8. Under budding, time-scaling methods always produce negatively biased node ages, with divergence times estimated as being older than truth. Terminal-taxon model simulations produce an opposite positive bias, with estimated node ages often much later than the true ages. No time-scaling method produces unbiased samples of node ages. Under budding differentiation, the basic method produces the least biased estimate of node ages but, conversely, is the most biased method under the terminal-taxon model. Examining the untransformed median error obscures the precision of the estimate, as a method might produce estimates with little bias but high variance. To examine the absolute magnitude of error, I also calculated the per-run median of per-node-median-squared error (Fig. S9). The median-squared error values were similar to the previous error assessment, with the basic method producing the least biased node age estimates under budding and the most biased under the terminal-taxon model.
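
The median-of-medians summary can be sketched as follows. The function name, data layout and values are hypothetical. Dates are time units before present, and the shift is taken as true age minus estimated age so that estimates older than the truth are negative, matching the sign convention used in the text.

```python
from statistics import median

def median_of_median_shifts(true_ages, sampled_trees):
    """Per-simulation summary of node age error.

    `true_ages` maps node -> true age; each element of `sampled_trees`
    maps node -> estimated age in one time-scaled tree.
    """
    per_node = [
        median(true_ages[node] - tree[node] for tree in sampled_trees)
        for node in true_ages
    ]
    return median(per_node)

# Illustrative values only, not simulation output.
true_ages = {"n1": 10.0, "n2": 5.0}
sampled_trees = [
    {"n1": 12.0, "n2": 5.0},
    {"n1": 11.0, "n2": 6.0},
    {"n1": 13.0, "n2": 4.0},
]
summary = median_of_median_shifts(true_ages, sampled_trees)
```

In this toy example the per-node medians are −2 and 0, giving a summary of −1: on balance, the sampled trees place nodes slightly too old.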

The median error and median-squared error of node age estimates reveal that cal3 methods generally produce older estimates of node ages than the basic method, so cal3 provides better estimates only under the terminal-taxon model. It is unsurprising that the basic method provides the best point estimates of the node ages under the budding scenario, as this simply confirms that the first appearance times of taxa provide the best point estimate for their time of divergence (i.e. the maximum likelihood estimate; Strauss & Sadler 1989). The stochastic time-scaling of the zipper algorithm randomly assigns error to node ages in order to construct trees that embody appropriate temporal uncertainty. To assess whether cal3 methods provide better summaries of the uncertainty in node ages, I generated 95% quantiles for the sample of ages for individual nodes in each set of time-scaled trees and counted the proportion of true node ages contained within these quantiles (Fig. 9). Under budding, the cal3 method with ancestors contains a greater proportion of true node ages in the 95% quantile relative to the basic method, indicating that the cal3 method with ancestors is providing a better estimate of the uncertainty in node ages. Under the terminal-taxon model, cal3 with and without ancestor assignment outperformed basic time-scaling. The stochastic cal3 method may not improve our point estimates of node ages but does improve our perception of uncertainty in time-scaling phylogenies of fossil taxa, even under the terminal-taxon simulations, which break many of the implicit assumptions of the cal3 method.
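The quantile-coverage measure can be illustrated with a short Python sketch. The function names and data layout are hypothetical, and the linear-interpolation quantile rule is an assumption, as the text does not state which quantile estimator was used.

```python
def quantile(values, q):
    """Quantile by linear interpolation between order statistics (assumed rule)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


def coverage(true_ages, sampled_ages, level=0.95):
    """Proportion of nodes whose true age lies inside the central quantile interval.

    true_ages: dict mapping node -> true age.
    sampled_ages: list of dicts, one per time-scaled tree, node -> estimated age.
    """
    tail = (1.0 - level) / 2.0
    hits = 0
    for node, true_age in true_ages.items():
        ages = [tree[node] for tree in sampled_ages]
        lower = quantile(ages, tail)
        upper = quantile(ages, 1.0 - tail)
        if lower <= true_age <= upper:
            hits += 1
    return hits / len(true_ages)
```

A method that always returns the same age for a node yields a zero-width interval, so under this measure it covers the true age only when its point estimate is exactly right.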

Discussion

The cal3 method comprises two components: the zipper algorithm, which stochastically assigns consistent branching times to a pre-existing cladogram of fossil taxa, and a probabilistic model of unobserved evolutionary history (Δ). These two components can be changed or replaced independently of each other, for example using a different distribution to weight node age selection within the zipper algorithm. The density function for Δ may be applicable with other time-scaling methods, such as Bayesian MCMC methods, which consider the likelihood of all the node ages simultaneously, although such a Bayesian framework for fossil phylogenetics is not yet fully developed (but see Wagner & Marcot 2010). Current Bayesian methods do not account for sampling intensity in the fossil record, nor can they place observed taxa in ancestor–descendant relationships (Ronquist et al. 2012), although the probabilistic modelling described here may provide a basis for such implementation. Once fully developed, Bayesian frameworks would allow for model-based phylogenetic inference methods that simultaneously estimate relationships and temporal history, allowing nodes to be time-scaled simultaneously and informing the assignment of ancestral relationships with character data (Wagner & Marcot 2010).
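The separation between the node-age sampler and the weighting distribution can be illustrated with a deliberately simplified Python sketch. It is not the zipper algorithm as implemented in paleotree: it draws the age of a single node from a discretized range bounded by two constraints, and the weighting function is a hypothetical placeholder (a simple exponential), not the density of Δ. All names and the single-node treatment of unobserved history are assumptions made for illustration.

```python
import math
import random


def placeholder_weight(delta, rate=1.0):
    """Hypothetical stand-in weight for unobserved history; NOT the density of Δ."""
    return math.exp(-rate * delta)


def sample_node_age(oldest_allowed, youngest_allowed,
                    weight=placeholder_weight, steps=100, rng=random):
    """Draw one node age (time before present) from a weighted, constrained range.

    oldest_allowed: upper age bound, e.g. the age already assigned to the parent node.
    youngest_allowed: lower age bound, e.g. the earliest first appearance among descendants.
    weight: any function of the implied unobserved history; swappable by design.
    """
    if oldest_allowed < youngest_allowed:
        raise ValueError("oldest_allowed must not be younger than youngest_allowed")
    span = oldest_allowed - youngest_allowed
    candidates = [youngest_allowed + span * i / steps for i in range(steps + 1)]
    # Unobserved history implied by each candidate age (simplified to one interval).
    weights = [weight(age - youngest_allowed) for age in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Passing a different function as `weight`, for example `lambda delta: 1.0` for a flat weighting, changes how ages are drawn without touching the sampler, which is the kind of independence between components described above.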

Some previous studies have time-scaled palaeontological phylogenies so as to best fit morphological data, assuming a model of constant trait change (Laurin 2004; Ruta, Wagner & Coates 2006). It may be possible to integrate the stochastic rate-calibrated time-scaling approach with such ‘morphological clock’ methods. However, these methods would not be applicable to trees for which trait information is unavailable, such as supertrees.

Our ability to describe sampling processes in the fossil record may be improved if we expand our repertoire of process-based models of sampling beyond the time-constant Poisson model used here. Previous work has revealed that sampling is impacted by many different factors, such as biogeography, taphonomy, relative abundance and sampling effort (Adrain & Westrop 2001; Alroy et al. 2001; Kidwell & Holland 2002). Yet, there have been few studies evaluating the relative magnitude of these factors. Studies using model comparison approaches are needed to quantify the relative role of these variables for predicting sampling events (but see Connolly & Miller 2001; Wagner & Marcot 2013). Given the current state of our knowledge, a homogeneous Poisson model may be an adequate general description of sampling in the fossil record. Simple models often are useful approximations for systems where the probability of events is a complex function of many independent random variables. In addition, sampling rates cannot be calculated with current methods for groups with poorly sampled fossil records. A parameterized model of exceptional preservation potential in rare stratigraphic horizons is needed to model the distribution of unobserved evolutionary history in these clades.
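For readers less familiar with the time-constant Poisson model, the following Python sketch simulates sampling events along a single lineage. The function names are hypothetical and the rate and duration values in any usage are arbitrary. Under this model, waiting times between sampling events are exponentially distributed, and a lineage of duration d with sampling rate r is never sampled with probability exp(-r d).

```python
import math
import random


def sampling_times(duration, rate, rng=random):
    """Times of sampling events within a lineage's duration (homogeneous Poisson process)."""
    times = []
    t = rng.expovariate(rate)  # waiting time to the first sampling event
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)  # waiting time to the next event
    return times


def prob_unsampled(duration, rate):
    """Probability that a lineage of the given duration leaves no sampled occurrence."""
    return math.exp(-rate * duration)
```

Simulating many lineages and counting those with no sampling events should approach `prob_unsampled`, which gives a quick check that the simulation and the analytical expression agree.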

While the cal3 method can allow for ancestral relationships, this study did not examine the reliability of these assignments. In several cases, allowing for potentially ancestral taxa did meaningfully change the fidelity of the resulting time-scaled phylogenies (Fig. 9). As the cal3 method cannot consider explicit models of differentiation (Bapst 2013), I am agnostic as to whether this approach provides a strong basis for determining ancestor–descendant pairs in the fossil record, but this capability could be tested by measuring the accuracy of these ancestral assignments. Although simulation provides one way to measure the fidelity of these estimates, another approach would be to use a real data set with prior predictions of ancestor–descendant relationships based on morphology. The accuracy of ancestor–descendant assignment could be quantified by testing whether these pairs are more often assigned as ancestor–descendants than other combinations across a sample of trees time-scaled with the cal3 method.
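The comparison proposed above could be set up along the following lines. This Python sketch assumes, hypothetically, that each time-scaled tree has already been reduced to a set of (ancestor, descendant) pairs of taxon names; extracting those pairs from actual cal3 output is not shown, and the function names are illustrative only.

```python
from itertools import permutations


def assignment_frequency(pair, trees):
    """Fraction of time-scaled trees in which the (ancestor, descendant) pair was assigned."""
    return sum(pair in tree for tree in trees) / len(trees)


def compare_predicted_pairs(predicted, taxa, trees):
    """Mean assignment frequency of predicted pairs versus all other ordered taxon pairs.

    predicted: iterable of (ancestor, descendant) pairs hypothesized from morphology.
    taxa: list of all taxon names in the cladogram.
    trees: list of sets of (ancestor, descendant) pairs, one set per time-scaled tree.
    """
    predicted = set(predicted)
    others = [p for p in permutations(taxa, 2) if p not in predicted]

    def mean_frequency(pairs):
        pairs = list(pairs)
        if not pairs:
            return 0.0
        return sum(assignment_frequency(p, trees) for p in pairs) / len(pairs)

    return mean_frequency(predicted), mean_frequency(others)
```

A markedly higher mean frequency for the predicted pairs than for the remaining pairs would be consistent with the ancestral assignments carrying some signal, although a formal test would need an appropriate null expectation.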

Time-scaling methods are compared in this paper based on the fidelity of node age estimates. However, the accuracy and precision of node age estimates may not predict the fidelity of reconstructing evolutionary patterns with phylogeny-based approaches, as the latter can be analytically complex. In addition, time-scaling fidelity needs to be considered over a wider range of scenarios than the two evolutionary patterns considered here (Fig. 7). Given the potential importance of time-scaling for many tree-based analyses of evolution, future work is necessary to test, using simulated palaeotrees, how well evolutionary patterns can be reconstructed from time-scaled phylogenies.

This study presents a novel stochastic rate-calibrated time-scaling approach for phylogenies of fossil taxa, named cal3, which can also accommodate potential ancestral taxa and resolve polytomies. In simulation, none of the time-scaling methods considered produces unbiased estimates of node ages, but the cal3 method provides better estimates of uncertainty in the true node ages under both budding and terminal-taxon differentiation models. In a sense, this means a sample of time-scaled trees produced by the cal3 method is a superior reflection of the uncertainty in the temporal and evolutionary relationships, and thus more likely to include the true evolutionary relationships in that set. Based on these results, the cal3 method should be preferred over alternative time-scaling approaches.

Acknowledgements

I am indebted to M. Foote, who helped sculpt clarity. This work was first conceived while stranded in a Pennsylvanian gas station, following a blizzard and subsequent interruption in interstate bus service. G. Hunt and P. Wagner both provided valuable suggestions, most notably arguments for resolving polytomies simultaneously with the stochastic time-scaling. J. Felsenstein stressed the importance of both diversification and sampling processes in a model of unobserved evolutionary history. The three-rate model depends on functions derived in collaboration with E. King and M. Pennell, with additional assistance from S. Nalayanan. Additional suggestions and valuable commentary were provided by G. Slater, L. Harmon, G. Lloyd, A. Haber, M. Gorman, P. Smits, M. Hopkins, J. Mitchell, R. Fitzjohn, R. Maia, M. Friedman, M. Webster, M. LaBarbera, D. Jablonski, C.K. Boyce, C. Mitchell and three anonymous reviewers.
