Molecular dating of phylogenetic trees: A brief review of current methods that estimate divergence times


Frank Rutschmann, Institute of Systematic Botany, University of Zürich, Zollikerstrasse 107, CH-8008 Zürich, Switzerland. E-mail:


This article reviews the most common methods used today for estimating divergence times and rates of molecular evolution. The methods are grouped into three main classes: (1) methods that use a molecular clock and one global rate of substitution, (2) methods that correct for rate heterogeneity, and (3) methods that incorporate rate heterogeneity into the dating process. Additionally, links to the most important literature on molecular dating are given, including articles comparing the performance of different methods, papers that investigate problems related to taxon, gene and partition sampling, and literature discussing highly debated issues such as calibration strategies and uncertainties, dating precision and the calculation of error estimates.


The use of DNA sequences to estimate the timing of evolutionary events is increasingly popular. The idea of dating evolutionary divergences using calibrated sequence differences was first proposed by Zuckerkandl and Pauling (1965). The authors postulated that the amount of difference between the DNA molecules of two species is a function of the time since their evolutionary separation. They showed this by comparing protein sequences (hemoglobins) from different species and comparing amino acid substitution rates with ages estimated from fossils. Based on this central idea, molecular dating has been used in countless studies as a method to investigate mechanisms and processes of evolution. For example, the timing of eucaryotic evolution (Douzery et al., 2004), the Early Cambrian origin of the main phyla of animals (Cambrian explosion; Wray et al., 1996; Smith & Peterson, 2002; Aris-Brosou & Yang, 2003), the replacement of dinosaurs by modern birds and mammals in the late Tertiary (Madsen et al., 2001), and the age of the last common ancestor of the main pandemic strain of human immunodeficiency virus (HIV; Korber et al., 2000) have all been investigated using molecular dating. In plants, too, numerous studies have used molecular dating methods to investigate the timeframe of evolutionary events, e.g. for testing biogeographical hypotheses or investigating the causes of recent radiations (for a more complete review see Sanderson et al., 2004). Dating techniques have been applied to taxa at very different taxonomic levels, e.g. 
to infer the age of plastid-containing eucaryotes (Yoon et al., 2004), land plants (Sanderson, 2003a), tracheophytes (Soltis et al., 2002), angiosperms (Magallón & Sanderson, 2001; Sanderson & Doyle, 2001; Wikström et al., 2001, 2003; Bell et al., 2005), monocot–dicot divergence (Chaw et al., 2004), Asterids (Bremer et al., 2004), Myrtales (Sytsma et al., 2004), Crypteroniaceae (Conti et al., 2002) and Fuchsia (Berry et al., 2004).

The goal of this article is to give a short overview of the most commonly used molecular dating methods. To allow for easier comparisons, the different methods for estimating divergence times are also summarized in Tables 1–3.

Table 1.  Comparison table of different molecular dating methods. Part 1: Methods that use a molecular clock and one global rate of substitution
The specifications about ease of use and popularity represent only the author's personal view and are therefore highly subjective
  • * Except model parameters, priors or calibration constraints.

  • SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval.

  • Unix software also runs under Mac OS X, as this operating system is based on Darwin, an open-source Unix environment.

  • § paup* (Swofford, 2001), dnamlk (part of the phylip package; Felsenstein, 1993), baseml (part of the paml package; Yang, 1997), mrbayes (Huelsenbeck & Ronquist, 2001), beast (Drummond & Rambaut, 2003), etc.

  • If the user provides a tree topology in addition to the sequences, the optimization runs much faster.

Method | Linear regression | Mean path length | Langley–Fitch | ML with clock
Author(s) | Nei (1987); Li & Graur (1991) | Bremer & Gustafsson (1997) | Langley & Fitch (1974) | Felsenstein (1981)
Software where it is implemented | – | path (Britton et al., 2002) | r8s (Sanderson, 2003b) | Many phylogenetic packages§
Current version | – | – | 1.7 | –
Runs on operating system(s) | – | Unix/Linux with Gpc 1999 | Unix/Linux | Depends on program
Optimization strategy | – | – | Maximum likelihood | Maximum likelihood
Input data* | Distance matrix | Phylogram (with bl) | Phylogram (with bl) | Sequence data (+ tree topology)
Allows multiple calibration points | Yes | No | Yes | Depends on program
Accounts for multiple data sets/partitions | No | No | No | No
Provides error estimates (SE/CI/CrI) | Yes, CI | Yes, mean path length CI | Yes, internal and bootstrap CIs | Depends on program
Ease of use | Easy | Easy | Medium | Depends on program
Popularity according to literature | Very popular until about 10 years ago | Popular | Less popular | Very popular
Table 2.  Comparison table of different molecular dating methods. Part 2: Methods that correct for rate heterogeneity
The specifications about ease of use and popularity represent only the author's personal view and are therefore highly subjective
  • * Except model parameters, priors, or calibration constraints.

  • SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval.

  • Unix software also runs under Mac OS X, as this operating system is based on Darwin, an open-source Unix environment.

Method | Linearized trees
Author(s) | Li & Tanimura (1987)
Software where it is implemented | –
Current version | –
Runs on operating system(s) | –
Optimization strategy | Depends on clock method
Input data* | Phylogram (with bl)
Allows multiple calibration points | Depends on clock method
Accounts for multiple data sets/partitions | No
Provides error estimates (SE/CI/CrI) | Depends on clock method
Ease of use | Easy
Popularity according to literature | Less popular
Method | Local molecular clock
Author(s) | Hasegawa et al. (1989); Uyenoyama (1995); Rambaut & Bromham (1998); Yoder & Yang (2000)
Software where it is implemented | baseml (paml; Yang, 1997) | qdate (Rambaut & Bromham, 1998) | rhino (Rambaut, 2001)
Current version | 3.14 | 1.1 | 1.2
Runs on operating system(s) | Unix/Linux/Windows | Mac OS 9.x/Unix/Linux/Windows | Mac OS 9.x/Unix/Linux/Windows
Optimization strategy | Maximum likelihood | Maximum likelihood | Maximum likelihood
Input data* | Phylogram (with bl) | Sequence data + quartet definition | Sequence data + tree topology
Allows multiple calibration points | Yes | Yes | Yes
Accounts for multiple data sets/partitions | Yes | No | Only codon position partitions
Provides error estimates (SE/CI/CrI) | Yes, SE | Yes, CI | Yes, CI
Ease of use | Medium | Easy | Medium
Popularity according to literature | Popular | Less popular | Popular, but more for rate comparisons
Software where it is implemented | r8s (Sanderson, 2003b) | beast (Drummond & Rambaut, 2003)
Current version | 1.7 | 1.3
Runs on operating system(s) | Unix/Linux | Unix/Linux/Windows, requires Java 1.4
Optimization strategy | Smoothing/minimizing optimality function | Bayesian MCMC
Input data* | Phylogram (with bl) | Sequence data
Allows multiple calibration points | Yes | Yes
Accounts for multiple data sets/partitions | No | Yes
Provides error estimates (SE/CI/CrI) | Yes, CI, but separate bootstrapping is required | Yes, CrI
Ease of use | Medium | Medium
Popularity according to literature | Popular, but more for NPRS and PL methods | Becoming increasingly popular
Table 3.  Comparison table of different molecular dating methods. Part 3: Methods that incorporate rate heterogeneity
The specifications about ease of use and popularity represent only the author's personal view and are therefore highly subjective
  • * Except model parameters, priors, or calibration constraints.

  • SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval.

  • Unix software also runs under Mac OS X, as this operating system is based on Darwin, an open-source Unix environment.

  • § Plus one hyperparameter (autocorrelation value ν).

  • Plus several hyperparameters (describing speciation and extinction rate).

  • ** Implements a range of relaxed clock models by Drummond et al. (submitted), but also the models by Thorne & Kishino (2002) and Aris-Brosou & Yang (2002).

  • †† Additionally, two graphical user interfaces called beauti and tracer facilitate data setup and output analysis.

  • ‡‡ The method is implemented in a c program, but it is not meant for public access (as there is no documentation).

  • For the age of calibration points, different prior distributions are available, such as normal, lognormal, exponential or gamma.

No. | 7 | – | 8
Author(s) | Sanderson (1997) | – | Sanderson (2002)
Software where it is implemented | r8s (Sanderson, 2003b) | treeedit (Rambaut & Charleston, 2002) | r8s (Sanderson, 2003b)
Current version | 1.7 | 1.0a10 | 1.7
Runs on operating system(s) | Unix/Linux | Mac OS 8.6 or later, including Mac OS X | Unix/Linux
Optimization strategy | Smoothing/minimizing optimality function | Smoothing/minimizing optimality function | Smoothing/minimizing optimality function
Input data* | Phylogram (with bl) | Phylogram (with bl) | Phylogram (with bl)
Model of rate evolution | Rate autocorrelation | Rate autocorrelation | Rate autocorrelation
Allows multiple calibration points | Yes | No | Yes
Accounts for multiple data sets/partitions | No | No | No
Provides error estimates (SE/CI/CrI) | Yes, internal and bootstrap CIs | No | Yes, CI, but separate bootstrapping is required
Accounts for phylogenetic uncertainty | No | No | No
Ease of use | Medium | Easy (graphical user interface) | Medium
Popularity according to literature | Popular | Popular | Becoming increasingly popular
Author(s) | Yang (2004) | Thorne et al. (1998); Kishino et al. (2001) | Aris-Brosou & Yang (2002)
Software where it is implemented | baseml (paml; Yang, 1997) | multidivtime (Thorne & Kishino, 2002) | phybayes (Aris-Brosou & Yang, 2001)
Current version | 3.14 (since 3.14 beta 5) | 9/25/03 | 0.2e
Runs on operating system(s) | Unix/Linux/Windows | Unix/Linux/Windows | Unix/Linux/Windows
Optimization strategy | Maximum likelihood | Bayesian MCMC | Bayesian MCMC
Input data* | Sequence data + tree topology | Sequence data + tree topology | Sequence data + tree topology
Model of rate evolution | Rate autocorrelation | Rate autocorrelation | Rate autocorrelation
Rates are drawn from | – | Lognormal distribution | Six different distributions
Prior distribution of divergence time | – | Described as Dirichlet distribution§ | Described as generalized birth–death process
Allows multiple calibration points | Yes | Yes, as user-specified intervals | No
Accounts for multiple data sets/partitions | Yes | Yes | No
Provides error estimates (SE/CI/CrI) | Yes, CI | Yes, CrI | Yes, CrI, but must be calculated by the user
Accounts for phylogenetic uncertainty | No | No, but accepts polytomies in input tree | No
Ease of use | Medium | Medium (use step-by-step manual; Rutschmann, 2005) | Medium
Popularity according to literature | Not yet very popular | Becoming increasingly popular | Not yet very popular
Method | Variable rate models in beast | Overdispersed clock | Compound Poisson
Author(s) | Drummond et al. (submitted) | Cutler (2000) | Huelsenbeck et al. (2000)
Software where it is implemented | beast (Drummond et al., submitted) | c program (Cutler, 2000) | c program (Huelsenbeck et al., 2000)
Current version | 1.3 | Dating5.c | ‡‡
Runs on operating system(s) | Unix/Linux/Windows, requires Java 1.4 | Unix/Linux/Windows | ‡‡
Optimization strategy | Bayesian MCMC | Maximum likelihood | Bayesian MCMC
Input data* | Sequence data | Phylogram (with bl) | ‡‡
Model of rate evolution | Various models implemented** | Doubly stochastic Poisson process | Compound Poisson process
Rates are drawn from | Different distributions such as exp or lognormal | – | ‡‡
Prior distribution of divergence time | No specific description, priors are uniform | – | ‡‡
Allows multiple calibration points | Yes | Yes | ‡‡
Accounts for multiple data sets/partitions | Yes | No | ‡‡
Provides error estimates (SE/CI/CrI) | Yes, CrI | Yes, CI | ‡‡
Accounts for phylogenetic uncertainty | Yes | No | ‡‡
Ease of use | Medium, a range of tutorials is available†† | Medium | ‡‡
Popularity according to literature | Becoming increasingly popular | Not yet very popular | ‡‡

A. The ideal case scenario: a molecular clock and one global rate of substitution

In the special case of a molecular clock, all branches of a phylogenetic tree evolve at the same global substitution rate. The clock-like tree is ultrametric, which means that the total distance between the root and every tip is constant.

Method 1: Linear regression (Nei, 1987; Li & Graur, 1991; Hillis et al., 1996; Sanderson, 1998)

In an ultrametric tree, nodal depths can be converted easily into divergence times, because the molecular distance between each member of a sister pair and their most recent common ancestor is one half of the distance between the two sequences. If the divergence time for at least one node is known (calibration point), the global rate of substitution can be estimated and, based on that, divergence times for all nodes can be calculated by linear regression of the molecular distances (Li & Graur, 1991; Sanderson, 1998).

In other words: the observed number of differences D between two given sequences is a function of the constant rate of substitution r (substitutions site−1 Myr−1) and the time t (Myr) elapsed since the two lineages diverged. If we have one calibration point (e.g. if we can assign a fossil or geological event to one specific node in the tree), we can calculate the global substitution rate as follows: r = D/2t. In a second step, we can use the global rate r to calculate the divergence time between any other two sequences: t = D/2r.
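The two formulas above can be sketched in a few lines of Python; the distance and fossil age used here are hypothetical:

```python
# Step 1: a calibration node of known age gives the global rate r = D/2t.
# Step 2: the rate dates any other sequence pair via t = D/2r.

def global_rate(distance, age):
    """r = D / (2t): substitutions per site per Myr from a calibrated pair."""
    return distance / (2.0 * age)

def divergence_time(distance, rate):
    """t = D / (2r): divergence time (Myr) of an uncalibrated pair."""
    return distance / (2.0 * rate)

# Hypothetical calibration pair: 0.12 substitutions/site, fossil age 30 Myr.
r = global_rate(0.12, 30.0)     # 0.002 substitutions site^-1 Myr^-1
t = divergence_time(0.09, r)    # 22.5 Myr for an undated pair
```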

In those cases where we have more than one calibration point, we can plot all calibration nodes in an age-genetic distance diagram, build a (weighted) regression line, whose slope is a function of the global substitution rate, and then interpolate (or extrapolate) the divergence times for the unknown nodes. The scatter of data points around the regression line then provides a confidence interval around the estimated ages.
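A minimal sketch of the multiple-calibration case, with hypothetical calibration data: the regression line is constrained through the origin (distance = 2r × age), unweighted for simplicity, and ages of undated nodes are read off its slope:

```python
def fit_slope(ages, distances):
    """Least-squares slope of distance vs. age, constrained through the origin."""
    return sum(t * d for t, d in zip(ages, distances)) / sum(t * t for t in ages)

cal_ages      = [10.0, 25.0, 40.0]   # Myr, from fossil calibrations
cal_distances = [0.04, 0.10, 0.16]   # substitutions per site

slope = fit_slope(cal_ages, cal_distances)   # slope = 2r
rate = slope / 2.0                           # global substitution rate
unknown_age = 0.08 / slope                   # interpolated age of an undated node
```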

Although it is possible to estimate the global substitution rate r over an entire phylogeny (see methods 3 and 4), pairwise sequence comparisons were the most widely used approach to molecular dating until approximately 5 years ago, probably because the calculations can be done easily with any statistical or spreadsheet software.

Method 2: Tree-based mean path length method (Bremer & Gustafsson, 1997; Britton et al., 2002)

The mean path length method estimates rates and divergence times based on the mean path length (MPL) between a node and each of its terminals. By calculating the MPL between a calibration node and all its terminals, and dividing it by the known age of the node, the global substitution rate is obtained. To calculate the divergence time of a node, its MPL is divided by the global rate. Although the calculation is simple enough to be done by hand, Britton et al. (2002) provide a Pascal program, named path, for the analysis of larger trees.
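The MPL calculation can be sketched as follows (a toy recursion, not the path program itself) on a hypothetical four-taxon tree; internal nodes are pairs of (child, branch length) tuples and terminals are plain strings:

```python
def paths_to_tips(node):
    """All root-to-tip path lengths (in substitutions per site) below `node`."""
    if not isinstance(node, tuple):          # a terminal taxon
        return [0.0]
    (left, bl_left), (right, bl_right) = node
    return ([bl_left + p for p in paths_to_tips(left)]
            + [bl_right + p for p in paths_to_tips(right)])

def mpl(node):
    """Mean path length between `node` and each of its terminals."""
    paths = paths_to_tips(node)
    return sum(paths) / len(paths)

inner1 = (('A', 0.02), ('B', 0.04))
inner2 = (('C', 0.03), ('D', 0.03))
root = ((inner1, 0.01), (inner2, 0.02))      # the calibration node

rate = mpl(root) / 15.0          # root fixed at 15 Myr -> global rate
age_inner1 = mpl(inner1) / rate  # divergence time of the A+B clade (10 Myr)
```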

Method 3: Tree-based maximum likelihood clock optimization (Langley & Fitch, 1974; Sanderson, 2003b)

The Langley–Fitch method uses maximum likelihood (ML) to optimize the global rate of substitution, starting with a phylogeny for which branch lengths are known. Using the optimized, constant rate, branch lengths and divergence times are then estimated. Finally, the results, plus the outcome of a chi-square test of rate constancy, are reported. The method is implemented in r8s (Sanderson, 2003b).

Method 4: Character-based maximum likelihood clock optimization (Felsenstein, 1981; Swofford et al., 1996)

Under the assumption of rate constancy (and therefore under the constraints of ultrametricity), the global rate of substitution can also be optimized by ML directly from sequence data during the phylogenetic reconstruction. The likelihood is then a much more complex function of the data matrix, and the computing time is much higher than for the phylogenetic reconstruction without ultrametric constraints. Once the global rate of substitution is known, branch lengths and divergence times can be calculated.

The ML clock optimization method is implemented, at least partially, in paup* (Swofford, 2001), dnamlk (part of the phylip package; Felsenstein, 1993), baseml (part of the paml package; Yang, 1997), and other phylogenetic packages. It is perhaps the most widespread strategy, commonly known as ‘enforcing the molecular clock’. Depending on the software, the additional constraint of a fixed tree topology can be provided by the user, which reduces the complexity of the likelihood function and the computing time significantly.

B. The reality (in most cases): rate heterogeneity or the relaxed clock

As sequences from multiple species began to accumulate during the 1970s, it became apparent that a clock is not always a good model for the process of molecular evolution (Langley & Fitch, 1974). Variation in rates of nucleotide substitution, both along a lineage and between different lineages, is known to be pervasive (Britten, 1986; Gillespie, 1986; Li, 1997). Several reasons are given for these deviations from the clock-like model of sequence evolution (sometimes called a relaxed or ‘sloppy’ clock): (1) generation time: a lineage with a shorter generation time accelerates the clock because it shortens the time to accumulate and fix new mutations during genetic recombination (Ohta, 2002; but disputed by Whittle & Johnston, 2003); (2) metabolic rate: organisms with higher metabolic rates have increased rates of DNA synthesis and higher rates of nucleotide mutation than species with lower metabolic rates (Martin & Palumbi, 1993; Gillooly et al., 2005); (3) mutation rate: species-characteristic differences in the fidelity of the DNA replication or DNA repair machinery (Ota & Penny, 2003); and (4) effect of effective population size on the rate of fixation of mutations: the fixation of nearly neutral alleles is expected to be the greatest in small populations (according to the nearly neutral theory of DNA evolution; Ohta, 2002).

Because it is much easier to calculate divergence times under the clock model (with one global substitution rate), it is worth testing the data for clock-like behavior. This can be done by comparing how closely the ultrametric and additive trees fit the data. For example, the likelihood score of the best ultrametric tree can be compared with the (usually higher) likelihood score of the best additive tree and the difference between the two values (multiplied by two) can be checked for significance on a chi-square table with n − 2 degrees of freedom (where n is the number of terminals in the tree; likelihood ratio test; Felsenstein, 1988, 1993, 2004; Muse & Weir, 1992). Other tests that can be applied to identify parts of the tree that show significant rate deviations are the relative rates test (Wu & Li, 1985) and the Tajima test (Tajima, 1993). However, all these clock tests lack power for shorter sequences and will detect only a relatively low proportion of cases of rate variation for the types of sequence that are typically used in molecular clock studies (Bromham et al., 2000; Bromham & Penny, 2003). Failure to detect clock variation can cause systematic error in age estimates, because undetected rate variation can lead to significantly over- or underestimated divergence times (Bromham et al., 2000). If the null hypothesis of a constant rate is rejected, or if we have evidence suggesting that test results should be treated with caution, we might conclude that rates vary across the tree; in such cases the use of methods that try to model rate changes over the tree is necessary. This procedure is also supported by the fact that an increasing number of divergence time analyses show significant deviations from a molecular clock, especially if sequences of distantly related species are analysed (e.g. different orders or families; Yoder & Yang, 2000; Hasegawa et al., 2003; Springer et al., 2003). 
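The likelihood ratio test described above amounts to a few lines of arithmetic; the likelihood scores here are hypothetical, and 28.87 is the chi-square 5% critical value for n − 2 = 18 degrees of freedom with 20 taxa:

```python
lnL_clock = -5432.1     # best ultrametric (clock) tree, hypothetical score
lnL_noclock = -5417.8   # best additive (no-clock) tree, hypothetical score
n_taxa = 20

statistic = 2.0 * (lnL_noclock - lnL_clock)   # twice the score difference
critical_value_5pct = 28.87                   # chi-square, df = n_taxa - 2

reject_clock = statistic > critical_value_5pct
# here the statistic (28.6) falls just short: the clock is not rejected
```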
At least two groups of methods try to handle a relaxed clock: (1) methods that correct for rate heterogeneity before the dating and (2) methods that incorporate rate heterogeneity in the dating process, on the basis of specific rate change models.

(1) Methods that correct for rate heterogeneity

The first set of methods described below correct for the observed rate heterogeneity by pruning branches or dividing the global rate into several rate classes (local rates). After this first step, which makes the trees (at least partially) ultrametric, the methods estimate rates and divergence times using the molecular clock as described above.

Method 5: Linearized trees (Li & Tanimura, 1987; Takezaki et al., 1995; Hedges et al., 1996)

The linearized trees method involves three steps: first, identify all branches in a phylogeny that depart significantly from rate constancy by using a statistical test (e.g. relative rate tests, Li & Tanimura, 1987; or two-cluster and branch-length tests, Takezaki et al., 1995). Then, selectively eliminate (prune) those branches. Finally, construct a tree with the remaining branches (the linearized tree) under the assumption of rate constancy. This procedure relies on eliminating data that do not fit the expected global rate behavior, and in many cases, this approach would lead to a massive elimination of data. Cutler (2000), describing the procedure as ‘taxon shopping’, stated: ‘If one believes that rate overdispersion is intrinsic to the process of evolution (Gillespie, 1991) ( … ), then restricting one's analysis to taxa which happen to pass relative-rate tests is inappropriate’.
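The pruning step can be caricatured as follows, with hypothetical distances: with an outgroup O, two ingroup taxa have evolved clock-like only if their distances to O are about equal. Real relative-rates tests compare the difference with its standard error; here a fixed threshold stands in for that:

```python
def passes_relative_rate(d_AO, d_BO, threshold=0.02):
    """True if taxa A and B show (roughly) equal amounts of change since O."""
    return abs(d_AO - d_BO) <= threshold

pairs = {('A', 'B'): (0.21, 0.15),   # A looks accelerated -> pruned
         ('C', 'D'): (0.11, 0.10)}   # roughly clock-like -> kept
kept = [pair for pair, (a, b) in pairs.items() if passes_relative_rate(a, b)]
# only ('C', 'D') survives, illustrating how much data the method can discard
```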

Method 6: Local rates methods (Hasegawa et al., 1989; Uyenoyama, 1995; Rambaut & Bromham, 1998; Yoder & Yang, 2000)

Apply two or more local molecular clocks on the tree by using a common model that characterizes the rate constancy on each part of the tree. One substantial difficulty is to identify correctly the branches or regions of a tree in which substitution rates significantly differ from the others; this difficulty explains why several methods of the ‘local rates type’ exist. Usually, biological (e.g. similar life form, generation time, metabolic rate) or functional (e.g. gene function) information is used for their recognition. Also conspicuous patterns in the transition/transversion rate of some branches (Hasegawa et al., 1989), the known differential function of alleles (Uyenoyama, 1995), the branch lengths obtained by ML under the absence of a molecular clock (Yoder & Yang, 2000; Yang & Yoder, 2003), or the rate constancy within a quartet of two pairs of sister groups (Rambaut & Bromham, 1998) can be used for the definition of local clock regions.

Probably the best known example of a local rates method is the ML-based local molecular clock approach (Hasegawa et al., 1989; Yoder & Yang, 2000; Yang & Yoder, 2003). This method pre-assigns evolutionary rates to some lineages while all the other branches evolve at the same rate. The local molecular clock model therefore lies between the two extremes of a global clock (assuming the same rate for all lineages) and the models that assume one independent rate for each branch (described below). The method allows for the definition of rate categories before the dating, which is a crucial and sensitive step for this method. Two different strategies can be used to pre-assign independent rates: (1) definition of rate categories: the user pre-assigns rate categories to specific branches based on the branch length estimates obtained without the clock assumption. For example, three different rate categories are defined, one for the outgroup lineage with long branch lengths, another for a crown group with short branch lengths, and a third for all other branches; (2) definition of rank categories: divide the taxa into several rate groups according to taxonomic ranks, e.g. order, suborder, family and genus, based on the assumption that closely related evolutionary lineages tend to evolve at similar rates (Kishino et al., 2001; Thorne & Kishino, 2002). After the definition of rate categories, the divergence times and rates for the different branch groups are estimated by ML optimization. The local molecular clock model is implemented in baseml (part of the paml package; Yang, 1997) and the program provides standard errors for estimated divergence times. baseml does not (yet) allow for the specification of fossil calibrations as lower or upper limits on node ages, as in r8s or multidivtime (see below). So far, nodal constraints based on fossils have to be specified as fixed ages. 
The local molecular clock method implemented in baseml is now able to analyse multiple genes or data partitions with different evolutionary characteristics simultaneously and allows the branch group rates to vary freely among data partitions (since version 3.14). For example, the models allow some branches to evolve faster at codon position 1 while they evolve slower at codon position 2 (Yang & Yoder, 2003).
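The first pre-assignment strategy above can be sketched as a toy binning of branches into rate categories from their no-clock branch-length rates; all rates and thresholds here are hypothetical:

```python
def rate_category(branch_rate, slow=0.001, fast=0.005):
    """Assign a branch to one of three pre-defined rate categories."""
    if branch_rate < slow:
        return 'slow'
    if branch_rate > fast:
        return 'fast'
    return 'medium'

branch_rates = {'outgroup': 0.008, 'crown_1': 0.0004,
                'crown_2': 0.0006, 'backbone': 0.002}
categories = {name: rate_category(r) for name, r in branch_rates.items()}
# e.g. the long-branched outgroup lineage gets its own 'fast' category,
# the short-branched crown group a 'slow' one; ML dating then follows
```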

The quartet method (Rambaut & Bromham, 1998) implemented in qdate is one of the simplest local clock methods. The method works with species quartets built by combining two pairs of species, each of which has a known date of divergence. For each pair, a rate can be estimated, and this makes it possible to estimate the date of the divergence between the two pairs (the age of the quartet). Because groups with undisputed relationships can be chosen, the method avoids problems of topological uncertainty. On the other hand, it is difficult to combine estimates from multiple quartets in a meaningful way (Bromham et al., 1998).
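A distance-based caricature of the quartet logic, with hypothetical numbers (qdate itself estimates rates and ages by maximum likelihood, not by this simple averaging):

```python
def quartet_age(d_pair1, t_pair1, d_pair2, t_pair2, d_between):
    r1 = d_pair1 / (2.0 * t_pair1)     # local rate within calibrated pair 1
    r2 = d_pair2 / (2.0 * t_pair2)     # local rate within calibrated pair 2
    r_mean = (r1 + r2) / 2.0           # crude rate for the whole quartet
    return d_between / (2.0 * r_mean)  # age of the node joining the two pairs

# pair 1: d = 0.02, diverged 10 Myr ago; pair 2: d = 0.08, diverged 20 Myr
# ago; mean between-pair distance 0.15 -> quartet age of 50 Myr
age = quartet_age(0.02, 10.0, 0.08, 20.0, 0.15)
```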

Another program that allows the user to assign different rates and substitution models to different parts of a tree is rhino by Rambaut (2001). This ML local clock implementation has been used so far mainly for comparing substitution rates of different lineages by using likelihood ratio tests (e.g. Bromham & Woolfit, 2004).

A fourth implementation of the local molecular clock approach has been realized in the software r8s (Sanderson, 2003b). It follows the Langley–Fitch method described above (method 3; Langley & Fitch, 1974), but instead of only using one constant rate of substitution, the method permits the user to specify multiple rate parameters and assign them to the appropriate branches or branch groups. After such a definition of rate categories, the divergence times and rates for the different branch groups are estimated by ML optimization.

A fifth program, beast (Drummond & Rambaut, 2003), uses Bayesian inference and the Markov chain Monte Carlo (MCMC) procedure to derive the posterior distribution of local rates and times. As the software does not require a starting tree topology, it is able to account for phylogenetic uncertainty. Additionally, it permits the definition of calibration distributions (such as normal, lognormal, exponential or gamma) instead of simple point estimates or age intervals.

(2) Methods that estimate divergence times by incorporating rate heterogeneity

Methods that relax rate constancy must necessarily be guided by specifications about how rates are expected to change among lineages. Because rates and divergence times are confounded, it is not possible to estimate one without making assumptions regarding the other (Aris-Brosou & Yang, 2002). Recently, it has even been questioned whether divergence times can be estimated consistently in the absence of a molecular clock simply by increasing sequence length (Britton, 2005). The available methods introduce rate heterogeneity on the basis of three different approaches: one is the concept of temporal autocorrelation in rates (Methods 7–11; Gillespie, 1991), another is the stationary process of rate change (Method 13; Cutler, 2000), and a third is the compound Poisson process of rate change (Method 14; Huelsenbeck et al., 2000). All methods estimate branch lengths without assuming rate constancy, and then model the distribution of divergence times and rates by minimizing the discrepancies between branch lengths and the rate changes over the branches. The methods differ not only in their models, but also in their strategy for incorporating age constraints (calibration points) into the analysis.

(2a) Methods that model rate change according to the standard Poisson process and the concept of rate autocorrelation

An autocorrelation limits the speed with which a rate can change from an ancestral lineage to a descendant lineage (Sanderson, 1997). As the rate of substitution evolves along lineages, daughter lineages might inherit their initial rate from their parental lineage and evolve new rates independently (Gillespie, 1991). Temporal autocorrelation is an explicit a priori criterion to guide inference of among-lineage rate change and is implemented in several methods (Magallón, 2004). Readers who want to learn more about the theory of temporal autocorrelation are referred to publications by Takahata (1987), Gillespie (1991), Sanderson (1997) and Thorne et al. (1998).

Method 7: Nonparametric rate smoothing (Sanderson, 1997, 2003b)

By analogy to smoothing methods in regression analysis, the nonparametric rate smoothing (NPRS) method attempts to simultaneously estimate unknown divergence times and smooth the rapidity of rate change along lineages (Sanderson, 1997, 2003b). To smooth rate changes, the method contains a nonparametric function that penalizes rates that change too quickly from branch to neighboring branch, which reflects the idea of autocorrelation of rates. In other words: the local transformations in rate are smoothed as the rate itself changes over the tree by minimizing the ancestral-descendant changes of rate. Since the penalty function includes unknown times, an optimality criterion based on this penalty (the sum of squared differences in local rate estimates compared from branch to neighboring branch; least-squares method) permits an estimation of the divergence times. NPRS is implemented in r8s (Sanderson, 2003b) and treeedit (Rambaut & Charleston, 2002). With r8s, but not with treeedit, the user is able to add one or more calibration constraints to permit scaling of rates and times to absolute temporal units. A serious limitation of NPRS is its tendency to overfit the data, leading to rapid fluctuations in rate in regions of a tree that have short branches (Sanderson, 2003b). In r8s, but not in treeedit, two strategies for providing confidence intervals on the estimated parameters are available: (1) a built-in procedure that uses the curvature of the likelihood surface around the parameter estimate (after Cutler, 2000); and (2) the calculation of an age distribution based on chronograms generated from a large number of bootstrapped data sets. The central 95% of the age distribution provides the confidence interval (Efron & Tibshirani, 1993; Baldwin & Sanderson, 1998; Sanderson, 2003b). This robust but time-consuming procedure can be facilitated by using a collection of Perl scripts written by Eriksson (2002) called the r8s-bootstrap-kit.
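The least-squares criterion can be sketched as a toy computation (not r8s itself): candidate node ages imply branch durations, local rates are branch length divided by duration, and the score sums squared ancestor-descendant rate changes. All numbers here are hypothetical:

```python
def local_rates(branch_lengths, durations):
    """Per-branch rate = substitutions per site / branch duration (Myr)."""
    return [b / d for b, d in zip(branch_lengths, durations)]

def nprs_penalty(rates, parent_index):
    """Sum of squared rate differences between each branch and its parent."""
    return sum((rates[i] - rates[p]) ** 2
               for i, p in enumerate(parent_index) if p is not None)

lengths = [0.06, 0.02, 0.03]     # substitutions per site, from the phylogram
durations = [30.0, 10.0, 10.0]   # Myr, implied by candidate node ages
parents = [None, 0, 0]           # branches 1 and 2 descend from branch 0

rates = local_rates(lengths, durations)   # [0.002, 0.002, 0.003]
penalty = nprs_penalty(rates, parents)    # (0.0)**2 + (0.001)**2
# NPRS searches over node ages for the configuration minimizing this penalty
```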

Method 8: Penalized likelihood (Sanderson, 2002, 2003b)

Penalized likelihood (PL) combines likelihood and the nonparametric penalty function used in NPRS. This semi-parametric technique attempts to combine the statistical power of parametric methods with the robustness of nonparametric methods. In effect, it permits the specification of the relative contribution of the rate smoothing and the data-fitting parts of the estimation procedure: a roughness penalty can be assigned as smoothing parameter in the input file. The smoothing parameter can be estimated by running a cross-validation procedure, which is a data-driven method for finding the optimal level of smoothing. If the smoothing parameter is large, the function is dominated by the roughness penalty, and this leads to a clock-like model. If it is low, the smoothing will be effectively unconstrained (similar to NPRS). So far, PL is only implemented in r8s (Sanderson, 2003b). As with NPRS, the user is able to add one or more calibration constraints to permit scaling of rates and times to real units. The same two strategies for providing confidence intervals on the estimated parameters as for the NPRS method are also available for PL.
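The PL objective can be sketched as a toy function (not r8s itself): a Poisson log-likelihood of the observed branch lengths minus the smoothing parameter times an NPRS-style roughness penalty. All numbers are hypothetical; r8s chooses the smoothing level by cross-validation rather than by hand:

```python
import math

def poisson_loglik(observed, expected):
    """Sum of per-branch Poisson log-probabilities."""
    return sum(x * math.log(mu) - mu - math.lgamma(x + 1)
               for x, mu in zip(observed, expected))

def penalized_likelihood(observed, rates, durations, penalty, smoothing):
    expected = [r * t for r, t in zip(rates, durations)]
    return poisson_loglik(observed, expected) - smoothing * penalty

# large smoothing -> roughness penalty dominates (clock-like model);
# small smoothing -> smoothing effectively unconstrained (NPRS-like)
score = penalized_likelihood([6.0, 2.0, 3.0],     # substitution counts
                             [0.2, 0.2, 0.3],     # candidate local rates
                             [30.0, 10.0, 10.0],  # candidate durations (Myr)
                             penalty=0.01, smoothing=100.0)
```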

Method 9: Heuristic rate smoothing (AHRS) for ML estimation of divergence times (Yang, 2004)

The heuristic rate-smoothing (AHRS) algorithm for ML estimation of divergence times (Yang, 2004) has a number of similarities with PL and with the two Bayesian dating methods described below. It involves three steps: (1) estimation of branch lengths in the absence of a molecular clock; (2) heuristic rate smoothing to estimate substitution rates for branches together with divergence times, and classification of branches into several rate classes; and (3) ML estimation of divergence times and of the rates of the different branch groups. The AHRS algorithm differs slightly from PL: where Sanderson (2002) uses a Poisson approximation to fit the branch lengths, the AHRS algorithm uses a normal approximation of the ML estimates of branch lengths. Furthermore, the rate-smoothing algorithm in AHRS is used only to partition the branches of each gene tree into different rate groups, and therefore plays a less significant role in this method than in PL. In contrast to the Bayesian approaches described below, AHRS optimizes rates together with divergence times, rather than averaging over them in an MCMC procedure. Another difference from the Bayesian dating methods is that the AHRS algorithm does not require a prior for divergence times, which can be an advantage. On the other hand, it is not possible to specify fossil calibrations as lower or upper bounds on node ages, as in r8s or multidivtime; so far, nodal constraints based on fossils have to be specified as fixed ages. The AHRS algorithm is implemented in the baseml and codeml programs, which are part of the paml package (since version 3.14 final; Yang, 1997). These programs provide standard errors for the estimated divergence times. Like multidivtime, the AHRS algorithm implemented in baseml is able to analyse multiple genes/loci with different evolutionary characteristics simultaneously.
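The contrast between the two branch-length approximations can be illustrated with a minimal Python sketch (invented numbers, purely illustrative; both curves peak at the ML estimate but weight departures from it differently):

```python
import math

# Sketch of the two approximations contrasted above (illustrative only):
# fitting a branch length b via the Poisson likelihood of the observed
# substitution count, vs. a normal approximation around the ML estimate.

def poisson_loglik(b, count, sites):
    mean = b * sites                       # expected number of substitutions
    return count * math.log(mean) - mean   # up to an additive constant

def normal_loglik(b, b_hat, se):
    return -0.5 * ((b - b_hat) / se) ** 2  # up to an additive constant
```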

Method 10: Bayesian implementation of rate autocorrelation in multidivtime (Thorne et al., 1998; Kishino et al., 2001; Thorne & Kishino, 2002)

The Bayesian dating method implemented in multidivtime (Thorne et al., 1998; Kishino et al., 2001; Thorne & Kishino, 2002) uses a fully probabilistic, highly parametric model to describe the change in evolutionary rate over time, and uses an MCMC procedure to derive the posterior distribution of rates and times. In effect, rate variation is addressed by letting the MCMC algorithm assign rates to different parts of the tree and then sampling from the possible patterns. In this way, the MCMC technique averages over various patterns of rates along the tree. The result is a posterior distribution of rates and times derived from a prior distribution. For the assignment of rates to different branches of the tree, rates are drawn from a lognormal distribution, and a hyperparameter ν (also called the Brownian motion constant) describes the amount of autocorrelation. The internal node age proportions are described by a Dirichlet distribution, which reflects the idea of modelling the branching of evolutionary lineages through speciation, but is not intended as a detailed model of speciation and extinction processes. In practice, the most commonly used procedure is divided into three steps and programs, and is described in more detail in a step-by-step manual (Rutschmann, 2005). (1) In the first step, model parameters for the F84 + G model (Kishino & Hasegawa, 1989; Felsenstein, 1993) are estimated using the program baseml, which is part of the paml package (Yang, 1997). (2) Using these parameters, the ML estimates of the branch lengths are computed, together with a variance–covariance matrix of the branch length estimates, using the program estbranches (Thorne et al., 1998). (3) The third program, multidivtime (Kishino et al., 2001; Thorne & Kishino, 2002), then approximates the posterior distributions of substitution rates and divergence times by using a multivariate normal approximation of the estimated branch lengths and running an MCMC procedure.
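The rate-autocorrelation prior can be sketched in a few lines of Python (an illustration of the idea, not the multidivtime code; the mean correction follows the convention that the expected rate stays constant along a lineage):

```python
import math
import random

# Illustrative sketch of autocorrelated lognormal rates: the log-rate of
# a descendant is drawn from a normal distribution centred on the
# ancestral log-rate, with variance nu * t (t = elapsed time, nu = the
# Brownian motion constant); the -nu*t/2 term keeps the expected rate
# equal to the ancestral rate.

def descendant_rate(ancestral_rate, t, nu, rng=random):
    mean = math.log(ancestral_rate) - nu * t / 2.0
    return math.exp(rng.gauss(mean, math.sqrt(nu * t)))
```

Small values of ν keep descendant rates close to the ancestral rate (strong autocorrelation), while large values allow them to drift apart.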

multidivtime asks the user to specify several priors, such as the mean and variance of the distributions for the initial substitution rate at the root node or for the prospective age of the root node. Additionally, constraints on nodal ages can be specified as age intervals. The program provides Bayesian credibility intervals for the estimated divergence times and substitution rates. In contrast to NPRS and PL (implemented in r8s), multidivtime is able to account for multiple genes/loci with different evolutionary characteristics. Such a simultaneous analysis of multiple genes may improve the estimates of divergence times, which are shared across genes.

Method 11: phybayes (Aris-Brosou & Yang, 2001, 2002)

The phybayes program (Aris-Brosou & Yang, 2001, 2002) is similar to the multidivtime Bayesian approach described above. It also uses a fully probabilistic, highly parametric model to describe the change in evolutionary rate over time, and uses an MCMC procedure to derive the posterior distribution of rates and times. The method is more versatile, however, in how the prior distributions can be defined, as it allows the use of models that explicitly describe the processes of speciation and extinction. For the rates of evolution, it offers a choice of six rate distributions to model the autocorrelated rate change from an ancestral to a descendant branch (lognormal, ‘stationarized’ lognormal, truncated normal, Ornstein–Uhlenbeck process, gamma and exponential, plus the definition of two model-related hyperparameters), whereas in multidivtime, rates are always drawn from a lognormal distribution. The prior distribution of divergence times is generated by a process of cladogenesis, the generalized birth and death process with species sampling (Yang & Rannala, 1997), a model that assumes constant speciation and extinction rates per lineage (multidivtime instead uses a Dirichlet distribution of all internal node age proportions to generalize the rooted tree structure; Kishino et al., 2001). In contrast to multidivtime, it is not possible to analyse multiple genes simultaneously with phybayes, and the program does not allow the a priori integration of nodal constraints (calibration points).

(2b) Methods that model rate change with concepts other than rate autocorrelation

Method 12: Bayesian implementation of rate variation in beast (Drummond et al., submitted)

Similar to phybayes, the variable-rate methods implemented in beast use Bayesian inference and MCMC procedures to derive the posterior distribution of rates and times. In addition to autocorrelated models like the one implemented in multidivtime, however, a range of novel models have been implemented in which the rates are drawn from a distribution (with various distributions on offer; Drummond et al., submitted). These models have two interesting features: (1) the parameters of the distributions can be estimated (instead of being specified), and (2) the correlation of rates between adjacent branches can be tested (if it is greater than zero, this would indicate some inheritance of rates). Another unique feature of the software is that it does not require a starting tree topology, which allows it to account for phylogenetic uncertainty. Additionally, beast permits the definition of calibration distributions (such as normal, lognormal, exponential or gamma) to model calibration uncertainty, instead of simple point estimates or age intervals. For the other, non-calibrated nodes, there is no specific process that describes the prior distribution of divergence times (it is uniform over a range from 0 to very large values). beast allows the user to analyse multiple data sets/partitions with different substitution models simultaneously, and provides Bayesian credibility intervals.
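The adjacent-branch test mentioned above can be sketched as follows (a toy Python illustration with invented rate pairs, not beast's implementation): a positive covariance between the rates of parent and child branches suggests that rates are inherited rather than drawn independently.

```python
# Toy sketch: sample covariance of rates on parent/child branch pairs.
# A value greater than zero indicates some inheritance (autocorrelation)
# of rates between adjacent branches.

def rate_covariance(pairs):
    parents = [p for p, _ in pairs]
    children = [c for _, c in pairs]
    mp = sum(parents) / len(parents)
    mc = sum(children) / len(children)
    return sum((p - mp) * (c - mc) for p, c in pairs) / (len(pairs) - 1)
```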

Method 13: Overdispersed clock method (Cutler, 2000)

While all methods described so far fundamentally assume that the number of substitutions in a lineage is Poisson-distributed, the overdispersed clock model (Cutler, 2000) assumes only that the process generating substitutions in a lineage is stationary. Unlike under a Poisson process, the variance in the number of substitutions need not equal the mean. Under this model, which treats rate changes according to a stationary process, ML estimates of divergence times can be calculated. The method is implemented in a C program, which is available directly from the author (Cutler, 2000). It is possible to incorporate multiple calibration points.
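The departure from the Poisson assumption is conveniently summarized by the index of dispersion R = Var(N)/E(N), which equals 1 under a Poisson process and exceeds 1 for an overdispersed clock. A minimal Python sketch, with invented substitution counts:

```python
# Index of dispersion R = variance / mean of substitution counts across
# comparable lineages; R = 1 under a Poisson clock, R > 1 when the
# clock is overdispersed (illustrative sketch, not Cutler's program).

def dispersion_index(counts):
    mean = sum(counts) / len(counts)
    var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
    return var / mean
```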

Method 14: Compound Poisson process method (Huelsenbeck et al., 2000)

Like all methods described above (with the exception of the overdispersed clock model; Cutler, 2000), the compound Poisson process method uses a model that assumes that nucleotide substitutions occur along the branches of the tree according to a Poisson process. In addition, however, it assumes that a second, independent Poisson process generates events of substitution rate change. This second Poisson process is superimposed on the primary Poisson process of molecular substitution (hence the name compound Poisson process) and introduces changes (in the form of discrete jumps) in the rate of substitution on different branches of the phylogeny. The rates on the tree are then determined by the number of rate-change events, the points in the tree where they occur, and the magnitude of change at each event (Huelsenbeck et al., 2000; Magallón, 2004). These parameters are estimated using Bayesian inference (MCMC integration). One of the main advantages of treating rate variation as a compound Poisson process is that the model can introduce rate variation at any point of the phylogenetic tree; all other methods assume that substitution rates change only at speciation events (nodes; Huelsenbeck et al., 2000). The method is implemented in a C program, but this has not yet been made available to the community, as there is no documentation so far.
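The idea of superimposing a Poisson process of rate change on the substitution process can be illustrated with a short simulation (a sketch under invented assumptions, e.g. the gamma-distributed jump magnitude; not the implementation of Huelsenbeck et al.):

```python
import random

# Illustrative sketch of a compound Poisson process along one branch:
# rate-change events arrive as a Poisson process with intensity lam, and
# at each event the current substitution rate is multiplied by a random
# factor (gamma-distributed here, purely for illustration).

def expected_substitutions(rate, duration, lam, rng=random):
    """Integral of the (piecewise constant) rate over the branch."""
    t, total = 0.0, 0.0
    while True:
        wait = rng.expovariate(lam)   # waiting time to the next event
        if t + wait >= duration:
            return total + rate * (duration - t)
        total += rate * wait
        t += wait
        rate *= rng.gammavariate(2.0, 0.5)  # multiplicative jump, mean 1
```

With lam close to zero, no events occur and the branch behaves clock-like (expected substitutions = rate × duration); larger lam values yield branch-specific effective rates.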


Molecular dating is a rapidly developing field, and the methods developed so far are still far from perfect (Sanderson et al., 2004). Molecular dating estimates derived from different inference methods can be in conflict, as can the results obtained with different taxon sampling, gene sampling and calibration strategies (see below).

It should be clear that there is no single ‘best’ molecular dating method; rather, all approaches have advantages and disadvantages. For example, the methods reviewed here differ in the type of input data they use and process. The first group of methods (NPRS and PL) base their analyses on input phylograms with branch lengths. They are therefore not able to incorporate branch length errors or parameters of the substitution model into the dating analysis (this has to be done prior to the dating). On the other hand, these methods are fast and versatile, because they can process phylogenies generated by parsimony, likelihood or Bayesian analyses. The second group of methods (multidivtime, phybayes, AHRS) use one ‘true’ tree topology to assess rates and divergence times, and estimate the branch lengths themselves. They are therefore able to account for the branch length errors described above, but still base their analyses on a fixed, user-supplied tree topology. The third group of methods (ML with a clock, and beast) directly calculate ultrametric phylogenies based only on the sequence data and model parameters, a procedure that also allows them to incorporate topological uncertainties. Computationally, this can be very expensive, especially in the case of a variable-rate model and with a high number of taxa.

As I have not tested all the software packages and methods described here myself, this review is not a comparison based on practical experiments. However, the papers that first described these approaches always report on their performance on simulated and/or real data. Three papers that compare some of the described methods on original data sets or simulated data have been published recently: Yang and Yoder (2003) compared methods 3, 5 and 8 (see Tables 1–3) by analysing a mouse lemur data set using multiple gene loci and calibration points; Pérez-Losada et al. (2004) compared methods 5, 7, 8 and 9 in their analysis of a nuclear ribosomal 18S data set to test the evolutionary radiation of the thoracican barnacles, comparing different calibration points independently; and Ho et al. (2005b) compared the performance of methods 8, 10 and 11 using simulated data. Currently, software developers are starting to integrate different methods into the same software (e.g. baseml, Yang, 1997; beast, Drummond et al., 2005; future versions of mrbayes > version 3.1, Huelsenbeck & Ronquist, 2001). This recent trend allows users to try out different methods on their own data sets and then compare the results.

Although this review focuses on the dating methods themselves, at least a few links to key articles about more general or very specific issues related to molecular dating are given here. (1) General reviews: Magallón (2004) wrote a comprehensive review of the theory of molecular dating, which also discusses paleontological dating methods and the uncertainties of the paleontological record; the classification of the molecular methods described here is based on her publication. Another recent review, by Sanderson et al. (2004), discusses the advantages and disadvantages of Bayesian vs. smoothing molecular dating methods, summarizes the inferred ages of the major clades of plants, and lists many published dating applications that investigated recent plant radiations and/or tested biogeographical hypotheses. Finally, Welch and Bromham (2005), Bromham and Penny (2003), Arbogast et al. (2002) and Wray (2001) wrote more general reviews on the issue of estimating divergence times. (2) For specific discussions of the crucial and controversial role of calibration, refer to the following papers: Where on the tree, and how, should we assign ages from fossils or geologic events? —Near & Sanderson (2004), Conti et al. (2004) and Rutschmann et al. (2004). How can we deal with the incompleteness of the fossil record? —Tavaré et al. (2002), Foote et al. (1999) and Foote and Sepkoski (1999). How should we constrain the age of the root and deal with the methodological handicap of asymmetric random variables in molecular dating? —Rodríguez-Trelles et al. (2002) and Sanderson and Doyle (2001). (3) For the recent debate about the precision of divergence time estimates, refer to the following papers: Should we extrapolate substitution rates across different evolutionary timescales? —Ho et al. (2005a). How can we account for the various uncertainties related to the calibration and the dating procedure, how should we report and interpret error estimates, and should we use secondary calibration points? —Hedges and Kumar (2003), Graur and Martin (2004), Reisz and Müller (2004) and Hedges and Kumar (2004). (4) For questions related to the influence of taxon sampling on estimating divergence times under various dating methods, see Linder et al. (2005) and Sanderson and Doyle (2001). (5) For the influence of gene sampling, read Heckman et al. (2001). (6) Theoretical problems and strategies connected with the molecular dating of supertrees are discussed in Vos and Mooers (2004) and Bryant et al. (2004). (7) For ‘special’ dating problems, like estimating the substitution rate when the ages of different terminals are known (e.g. from virus sequences that were isolated at different dates; implemented in the software tipdate and also in beast), refer to Rambaut (2000).

Finally, I would like to add a suggestion to those among us who write software: although the development of graphical user interfaces (GUIs) is certainly not a first priority, GUIs would significantly simplify molecular dating analyses for the average biologist. Modern tools (like the open-source Qt 4 C++ class library by Trolltech) make the development of fast, native and multiplatform GUI applications easier than ever before. I do not share the widespread concerns about stupid ‘analyse-by-click’ users. On the contrary: comprehensive user interfaces allow the user to explore and detect all the important features a method offers.


I thank Peter Linder, Chloé Galley, Cyril Guibert, Chris Hardy, Timo van der Niet and Philip Moline for organizing the meeting, ‘Recent Floristic Radiations in the Cape Flora’, and for inviting me to participate in it as a discussion facilitator. All authors and co-authors of the methods described in this article who contributed by sending software or manuals, or answering questions, are greatly acknowledged, especially Jeff Thorne, Ziheng Yang, Andrew Rambaut, Michael Sanderson, David Cutler and Bret Larget. Finally, I'm very grateful to Elena Conti, Susana Magallón, Susanne Renner and Tim Barraclough for critically reading the manuscript.