## INTRODUCTION

The use of DNA sequences to estimate the timing of evolutionary events is increasingly popular. The idea of dating evolutionary divergences using calibrated sequence differences was first proposed in 1965 by Zuckerkandl and Pauling (1965). The authors postulated that the amount of difference between the DNA molecules of two species is a function of the time since their evolutionary separation. This was shown by comparing protein sequences (hemoglobins) from different species and further comparing amino acid substitution rates with ages estimated from fossils. Based on this central idea, molecular dating has been used in countless studies as a method to investigate mechanisms and processes of evolution. For example, the timing of the eucaryotic evolution (Douzery *et al*., 2004), the Early Cambrian origin of the main phyla of animals (Cambrian explosion; Wray *et al*., 1996; Smith & Peterson, 2002; Aris-Brosou & Yang, 2003), the replacement of dinosaurs by modern birds and mammals in the late Tertiary (Madsen *et al*., 2001), and the age of the last common ancestor of the main pandemic strain of human immunodeficiency virus (HIV; Korber *et al.*, 2000) have all been investigated using molecular dating. Also in plants, there are numerous studies where molecular dating methods have been used to investigate the timeframe of evolutionary events, e.g. for testing biogeographical hypotheses or to investigate the causes of recent radiations (for a more complete review see Sanderson *et al*., 2004). For example, dating techniques have been applied on taxa from very different taxonomic levels, e.g. to infer the age of plastid-containing eucaryotes (Yoon *et al*., 2004), land plants (Sanderson, 2003a), tracheophytes (Soltis *et al*., 2002), angiosperms (Magallón & Sanderson, 2001; Sanderson & Doyle, 2001; Wikström *et al*., 2001, 2003; Bell *et al*., 2005), monocot–dicot divergence (Chaw *et al*., 2004), Asterids (Bremer *et al*., 2004), Myrtales (Sytsma *et al*., 2004), Crypteroniaceae (Conti *et al*., 2002) and *Fuchsia* (Berry *et al*., 2004).

The goal of this article is to give a short overview on the most commonly used molecular dating methods. To allow for easier comparisons, the different methods for estimating divergence times are also summarized in Tables 1–3.

- *
Except model parameters, priors or calibration constraints. - †
SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval. - ‡
Unix software also runs under Mac OS X, as this operating system bases on Darwin, an open source Unix environment. - §
paup* (Swofford, 2001), dnamlk (part of the phylip package; Felsenstein, 1993), baseml (part of the paml package; Yang, 1997), mrbayes (Huelsenbeck and Ronquist, 2001), beast (Drummond & Rambaut, 2003), etc. - ¶
If the user provides a tree topology in addition to the sequences, the optimization runs much faster.
| ||||

No. | 1 | 2 | 3 | 4 |

Method | Linear regression | Mean path length | Langley–Fitch | ML with clock |

Author(s) | Nei (1987); Li & Graur (1991) | Bremer & Gustafsson (1997) | Langley & Fitch (1974) | Felsenstein (1981) |

Software where it is implemented | — | path (Britton et al., 2002) | r8s (Sanderson, 2003b) | Many phylogenetic packages§ |

Current version | — | — | 1.7 | — |

Runs on operating system(s) | — | Unix‡/Linux with Gpc 1999 | Unix‡/Linux | Depends on program |

Optimization strategy | — | — | Maximum likelihood | Maximum likelihood |

Input data* | Distance matrix | Phylogram (with bl) | Phylogram (with bl) | Sequence data (+ tree topology¶) |

Allows multiple calibration points | Yes | No | Yes | Depends on program |

Accounts for multiple data sets/partitions | No | No | No | No |

Provides error estimates (SE/CI/CrI)† | Yes, CI | Yes, mean path length CI | Yes, internal and bootstrap CIs | Depends on program |

Ease of use | Easy | Easy | Medium | Depends on program |

Popularity according to literature | Very popular until about 10 years ago | Popular | Less popular | Very popular |

- *
Except model parameters, priors, or calibration constraints. - †
SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval. - ‡
Unix software also runs under Mac OS X, as this operating system bases on Darwin, an open source Unix environment.
| |||

No. | 5 | ||

Method | Linearized trees | ||

Author(s) | Li & Tanimura (1987) | ||

Software where it is implemented | — | ||

Current version | — | ||

Runs on operating system(s) | — | ||

Optimization strategy | Depends on clock method | ||

Input data* | Phylogram (with bl) | ||

Allows multiple calibration points | Depends on clock method | ||

Accounts for multiple data sets/partitions | No | ||

Provides error estimates (SE/CI/CrI)† | Depends on clock method | ||

Ease of use | Easy | ||

Popularity according to literature | Less popular | ||

No. | 6 | ||

Method | Local molecular clock | ||

Author(s) | Hasegawa et al. (1989); Uyenoyama (1995); Rambaut & Bromham (1998); Yoder & Yang (2000) | ||

Software where it is implemented | baseml (paml; Yang, 1997) | qdate (Rambaut & Bromham, 1998) | rhino (Rambaut, 2001) |

Current version | 3.14 | 1.1 | 1.2 |

Runs on operating system(s) | Unix‡/Linux/Windows | Mac OS 9.x/Unix‡/Linux/Windows | Mac OS 9.x/Unix‡/Linux/Windows |

Optimization strategy | Maximum likelihood | Maximum likelihood | Maximum likelihood |

Input data* | Phylogram (with bl) | Sequence data + quartet definition | Sequence data + tree topology |

Allows multiple calibration points | Yes | Yes | Yes |

Accounts for multiple data sets/partitions | Yes | No | Only codon position partitions |

Provides error estimates (SE/CI/CrI)† | Yes, SE | Yes, CI | Yes, CI |

Ease of use | Medium | Easy | Medium |

Popularity according to literature | Popular | Less popular | Popular, but more for rate comparisons |

Software where it is implemented | r8s (Sanderson, 2003b) | beast (Drummond & Rambaut, 2003) | |

Current version | 1.7 | 1.3 | |

Runs on operating system(s) | Unix‡/Linux | Unix‡/Linux/Windows, requires Java 1.4 | |

Optimization strategy | Smoothing/minimizing Optimality function | Bayesian MCMC | |

Input data* | Phylogram (with bl) | Sequence data | |

Allows multiple calibration points | Yes | Yes | |

Accounts for multiple data sets/partitions | No | Yes | |

Provides error estimates (SE/CI/CrI)† | Yes, CI, but separate bootstrapping is required | Yes, CrI | |

Ease of use | Medium | Medium | |

Popularity according to literature | Popular, but more for NPRS and PL methods | Becoming increasingly popular |

- *
Except model parameters, priors, or calibration constraints. - †
SE, standard error; CI, 95% confidence interval; CrI, 95% Bayesian credibility interval. - ‡
Unix software also runs under Mac OS X, as this operating system bases on Darwin, an open source UNIX environment. - §
Plus one hyperparameter (autocorrelation value ν). - ¶
Plus several hyperparameters (describing speciation and extinction rate). **Implements a range of relaxed clock models by Drummond *et al*. (submitted), but also the models by Thorne and Kishino (2002) and Aris-Brosou and Yang (2002).††Additionally, two graphical user interfaces called beauti and tracer facilitate data setup and output analysis. - ‡‡
The method is implemented in a c program, but it is not meant for public access (as there is no documentation). - ▾
For the age of calibration points, different prior distributions are available, such as normal, lognormal, exponential or gamma.
| |||

No. | 7 | 8 | |

Method | NPRS | PL | |

Author(s) | Sanderson (1997) | Sanderson (2002) | |

Software where it's implemented | r8s (Sanderson, 2003b) | treeedit (Rambaut & Charleston, 2002) | r8s (Sanderson, 2003b) |

Current version | 1.7 | 1.0a10 | 1.7 |

Runs on operating system(s) | Unix‡/Linux | Mac OS 8.6 or later, including Mac OS X | Unix‡/Linux |

Optimization strategy | Smoothing/minimizing optimality function | Smoothing/minimizing optimality function | Smoothing/minimizing optimality function |

Input data* | Phylogram (with bl) | Phylogram (with bl) | Phylogram (with bl) |

Model of rate evolution | Rate autocorrelation | Rate autocorrelation | Rate autocorrelation |

Allows multiple calibration points | Yes | No | Yes |

Accounts for multiple data sets/partitions | No | No | No |

Provides error estimates (SE/CI/CrI)† | Yes, internal and boostrap CIs | No | Yes, CI, but separate bootstrapping is required |

Accounts for phylogenetic uncertainty | No | No | No |

Ease of use | Medium | Easy (graphical user interface) | Medium |

Popularity according to literature | Popular | Popular | Becoming increasingly popular |

No. | 9 | 10 | 11 |

Method | AHRS | multidivtime | phybayes |

Author(s) | Yang (2004) | Thorne et al. (1998), Kishino et al. (2001) | Aris-Brosou & Yang (2002) |

Software where it is implemented | baseml (paml; Yang, 1997) | multidivtime (Thorne & Kishino, 2002) | phybayes (Aris-Brosou & Yang, 2001) |

Current version | 3.14 (since 3.14 beta 5) | 9/25/03 | 0.2e |

Runs on operating system(s) | Unix‡/Linux/Windows | Unix‡/Linux/Windows | Unix‡/Linux/Windows |

Optimization strategy | Maximum likelihood | Bayesian MCMC | Bayesian MCMC |

Input data* | Sequence data + tree topology | Sequence data + tree topology | Sequence data + tree topology |

Model of rate evolution | Rate autocorrelation | Rate autocorrelation | Rate autocorrelation |

Rates are drawn from | — | Lognormal distribution | Six different distributions |

Prior distribution of divergence time | — | Described as dirichlet distribution§ | Described as generalized birth–death process¶ |

Allows multiple calibration points | Yes | Yes, as user-specified intervals | No |

Accounts for multiple data sets/partitions | Yes | Yes | No |

Provides error estimates (SE/CI/CrI)† | Yes, CI | Yes, CrI | Yes, CrI, but must be calculated by the user |

Accounts for phylogenetic uncertainty | No | No, but accepts polytomies in input tree | No |

Ease of use | Medium | Medium (use step-by-step manual; Rutschmann, 2005) | Medium |

Popularity according to literature | Not yet very popular | Becoming increasingly popular | Not yet very popular |

No. | 12 | 13 | 14 |

Method | Variable rate models in beast | Overdispersed clock | Compound Poisson |

Author(s) | Drummond et al. (submitted) | Cutler (2000) | Huelsenbecket al. (2000) |

Software where it is implemented | beast (Drummond et al. submitted) | c program (Cutler, 2000) | c program (Huelsenbecket al., 2000) |

Current version | 1.3 | Dating5.c | —‡‡ |

Runs on operating system(s) | Unix‡/Linux/Windows, requires Java 1.4 | Unix‡/Linux/Windows | —‡‡ |

Optimization strategy | Bayesian MCMC | Maximum likelihood | Bayesian MCMC |

Input data* | Sequence data | Phylogram (with bl) | —‡‡ |

Model of rate evolution | Various models implemented** | Doubly stochastic Poisson process | Compound poisson process |

Rates are drawn from | Different distributions such as exp or lognormal | — | —‡‡ |

Prior distribution of divergence time | No specific description, priors are uniform▾ | — | —‡‡ |

Allows multiple calibration points | Yes | Yes | —‡‡ |

Accounts for multiple data sets/partitions | Yes | No | —‡‡ |

Provides error estimates (SE/CI/CrI)† | Yes, CrI | Yes, CI | —‡‡ |

Accounts for phylogenetic uncertainty | Yes | No | —‡‡ |

Ease of use | Medium, a range of tutorials is available†† | Medium | —‡‡ |

Popularity according to literature | Becoming increasingly popular | Not yet very popular | —‡‡ |

### A. The ideal case scenario: a molecular clock and one global rate of substitution

In the special case of a molecular clock, all branches of a phylogenetic tree evolve at the same, global substitution rate. The clock-like tree is ultrametric, which means that the total distance between the root and every tip is constant.

#### Method 1: Linear regression (Nei, 1987; Li & Graur, 1991; Hillis *et al*., 1996; Sanderson, 1998)

In an ultrametric tree, nodal depths can be converted easily into divergence times, because the molecular distance between each member of a sister pair and their most recent common ancestor is one half of the distance between the two sequences. If the divergence time for at least one node is known (calibration point), the global rate of substitution can be estimated and, based on that, divergence times for all nodes can be calculated by linear regression of the molecular distances (Li & Graur, 1991; Sanderson, 1998).

In other words: the observed number of differences *D* between two given sequences is a function of the constant rate of substitution *r* (subst. * site^{−1} * Myr^{−1}) and the time *t* (Myr) elapsed since the lineage exists. If we have one calibration point (e.g. if we can assign a fossil or geological event to one specific node in the tree), we can calculate the global substitution rate as follows: *r* = *D*/2*t*. In a second step, we can use the global rate *r* to calculate the divergence time between any other two sequences: *t* = *D*/2*r*.

In those cases where we have more than one calibration point, we can plot all calibration nodes in an age-genetic distance diagram, build a (weighted) regression line, whose slope is a function of the global substitution rate, and then interpolate (or extrapolate) the divergence times for the unknown nodes. The scatter of data points around the regression line provides then a confidence interval around estimated ages.

Although it is possible to estimate the global substitution rate *r* over an entire phylogeny (see methods 3 and 4), pairwise sequence comparisons have provided the most widely used approach to molecular dating until approximately 5 years ago, probably because the calculations can be done easily with any statistical or spreadsheet software.

#### Method 2: Tree-based mean path length method (Bremer & Gustafsson, 1997; Britton *et al*., 2002)

The *mean path length method* estimates rates and divergence times based on the mean path length (MPL) between a node and each of its terminals. By calculating the MPL between a calibration node and all its terminals, and dividing it by the known age of the node, the global substitution rate is obtained. To calculate the divergence time of a node, its MPL is divided by the global rate. Although the calculation is simple enough to be done by hand, Britton *et al*. (2002) provide a Pascal program, named path, for the analysis of larger trees.

#### Method 3: Tree-based maximum likelihood clock optimization (Langley & Fitch, 1974; Sanderson, 2003b)

The *Langley–Fitch method* uses maximum likelihood (ML) to optimize the global rate of substitutions, starting with a phylogeny for which branch lengths are known. Using the optimized, constant rate, branch lengths and divergence times are then estimated. Finally, the results plus the outcome of a chi-square test of rate constancy are reported. The method is implemented in r8s (Sanderson, 2003b).

#### Method 4: Character-based maximum likelihood clock optimization (Felsenstein, 1981; Swofford et al., 1996)

Under the assumption of rate constancy (and therefore under the constraints of ultrametricity), the global rate of substitution can also be optimized by ML directly from sequence data during the phylogenetic reconstruction. The likelihood is then a much more complex function of the data matrix, and the computing time is much higher than for the phylogenetic reconstruction without ultrametric constraints. Once the global rate of substitution is known, branch lengths and divergence times can be calculated.

The *ML clock optimization method* is implemented, at least partially, in paup* (Swofford, 2001), dnamlk*(*part of the phylip package; Felsenstein, 1993), baseml (part of the paml package; Yang, 1997), and other phylogenetic packages. It is perhaps the most widespread strategy commonly known as ‘enforcing the molecular clock’. Depending on the software, the additional constrain of a fixed tree topology can be provided by the user, which reduces the complexity of the likelihood function and the computing time significantly.

### B. The reality (in most cases): rate heterogeneity or the relaxed clock

As sequences from multiple species began to accumulate during the 1970s, it became apparent that a clock is not always a good model for the process of molecular evolution (Langley & Fitch, 1974).Variation in rates of nucleotide substitution, both along a lineage and between different lineages, is known to be pervasive (Britten, 1986; Gillespie, 1986; Li, 1997). Several reasons are given for these deviations from the clock-like model of sequence evolution (some people call it relaxed or ‘sloppy’ clock): (1) generation time: a lineage with shorter generation time accelerates the clock because it shortens the time to accumulate and fix new mutations during genetic recombination (Ohta, 2002; but disputed by Whittle & Johnston, 2003); (2) metabolic rate: organisms with higher metabolic rates have increased rates of DNA synthesis and higher rates of nucleotide mutations than species with lower metabolic rates (Martin & Palumbi, 1993; Gillooly *et al*., 2005); (3) mutation rate: species-characteristic differences in the fidelity of the DNA replication or DNA repair machinery (Ota & Penny, 2003); and (4) effect of effective population size on the rate of fixation of mutations: the fixation of nearly neutral alleles is expected to be the greatest in small populations (according to the nearly neutral theory of DNA evolution; Ohta, 2002).

Because it is much easier to calculate divergence times under the clock model (with one global substitution rate), it is worth testing the data for clock-like behavior. This can be done by comparing how closely the ultrametric and additive trees fit the data. For example, the likelihood score of the best ultrametric tree can be compared with the (usually higher) likelihood score of the best additive tree and the difference between the two values (multiplied by two) can be checked for significance on a chi-square table with *n* − 2 degrees of freedom (where *n* is the number of terminals in the tree; likelihood ratio test; Felsenstein, 1988, 1993, 2004; Muse & Weir, 1992). Other tests that can be applied to identify parts of the tree that show significant rate deviations are the relative rates test (Wu & Li, 1985) and the Tajima test (Tajima, 1993). However, all these clock tests lack power for shorter sequences and will detect only a relatively low proportion of cases of rate variation for the types of sequence that are typically used in molecular clock studies (Bromham *et al*., 2000; Bromham & Penny, 2003). Failure to detect clock variation can cause systematic error in age estimates, because undetected rate variation can lead to significantly over- or underestimated divergence times (Bromham *et al*., 2000). If the null hypothesis of a constant rate is rejected, or if we have evidence suggesting that test results should be treated with caution, we might conclude that rates vary across the tree; in such cases the use of methods that try to model rate changes over the tree is necessary. This procedure is also supported by the fact that an increasing number of divergence time analyses show significant deviations from a molecular clock, especially if sequences of distantly related species are analysed (e.g. different orders or families; Yoder & Yang, 2000; Hasegawa *et al*., 2003; Springer *et al*., 2003). At least two groups of methods try to handle a relaxed clock: (1) methods that *correct* for rate heterogeneity before the dating and (2) methods that *incorporate* rate heterogeneity in the dating process, on the basis of specific rate change models.

### (1) Methods that *correct* for rate heterogeneity

The first set of methods described below correct for the observed rate heterogeneity by pruning branches or dividing the global rate into several rate classes (local rates). After this first step, which makes the trees (at least partially) ultrametric, the methods estimate rates and divergence times using the molecular clock as described above.

#### Method 5: Linearized trees (Li & Tanimura, 1987; Takezaki *et al.*, 1995; Hedges *et al.*, 1996)

The *linearized trees method* involves three steps: first, identify all branches in a phylogeny that depart significantly from rate constancy by using a statistical test (e.g. relative rate tests, Li & Tanimura, 1987; or two-cluster and branch-length tests, Takezaki *et al*., 1995). Then, selectively eliminate (prune) those branches. Finally, construct a tree with the remaining branches (the linearized tree) under the assumption of rate constancy. This procedure relies on eliminating data that do not fit the expected global rate behavior, and in many cases, this approach would lead to a massive elimination of data. Cutler (2000), describing the procedure as ‘taxon shopping’, stated: ‘If one believes that rate overdispersion is intrinsic to the process of evolution (Gillespie, 1991) ( … ), then restricting one's analysis to taxa which happen to pass relative-rate tests is inappropriate’.

#### Method 6: Local rates methods (Hasegawa *et al*., 1989; Uyenoyama, 1995; Rambaut & Bromham, 1998; Yoder & Yang, 2000)

Apply two or more local molecular clocks on the tree by using a common model that characterizes the rate constancy on each part of the tree. One substantial difficulty is to identify correctly the branches or regions of a tree in which substitution rates significantly differ from the others; this difficulty explains why several methods of the ‘local rates type’ exist. Usually, biological (e.g. similar life form, generation time, metabolic rate) or functional (e.g. gene function) information is used for their recognition. Also conspicuous patterns in the transition/transversion rate of some branches (Hasegawa *et al*., 1989), the known differential function of alleles (Uyenoyama, 1995), the branch lengths obtained by ML under the absence of a molecular clock (Yoder & Yang, 2000; Yang & Yoder, 2003), or the rate constancy within a quartet of two pairs of sister groups (Rambaut & Bromham, 1998) can be used for the definition of local clock regions.

Probably the best known example of a local rates method is the ML-based local molecular clock approach (Hasegawa *et al*., 1989; Yoder & Yang, 2000; Yang & Yoder, 2003). This method pre-assigns evolutionary rates to some lineages while all the other branches evolve at the same rate. The local molecular clock model therefore lies between the two extremes of a global clock (assuming the same rate for all lineages) and the models that assume one independent rate for each branch (described below). The method allows for the definition of rate categories before the dating, which is a crucial and sensitive step for this method. Two different strategies can be used to pre-assign independent rates: (1) definition of *rate categories*: the user pre-assigns rate categories to specific branches based on the branch length estimates obtained without the clock assumption. For example, three different rate categories are defined, one for the outgroup lineage with long branch lengths, another for a crown group with short branch lengths, and a third for all other branches; (2) definition of *rank categories*: divide the taxa into several rate groups according to taxonomic ranks, e.g. order, suborder, family and genus, based on the assumption that closely related evolutionary lineages tend to evolve at similar rates (Kishino *et al*., 2001; Thorne & Kishino, 2002). After the definition of rate categories, the divergence times and rates for the different branch groups are estimated by ML optimization. The local molecular clock model is implemented in baseml (part of the paml package; Yang, 1997) and the program provides standard errors for estimated divergence times. baseml does not (yet) allow for the specification of fossil calibrations as lower or upper limits on node ages, as in r8s or multidivtime (see below). So far, nodal constraints based on fossils have to be specified as fixed ages. The local molecular clock method implemented in baseml is now able to analyse multiple genes or data partitions with different evolutionary characteristics simultaneously and allows the branch group rates to vary freely among data partitions (since version 3.14). For example, the models allow some branches to evolve faster at codon position 1 while they evolve slower at codon position 2 (Yang & Yoder, 2003).

The quartet method (Rambaut & Bromham, 1998) implemented in qdate is one of the simplest local clock methods. The method works with species quartets built by combining two pairs of species, each of which has a known date of divergence. For each pair, a rate can be estimated, and this allows to estimate the date of the divergence between the pairs (age of the quartet). Because groups with undisputed relationships can be chosen, the method avoids problems of topological uncertainties. On the other hand, it is difficult to combine estimates from multiple quartets in a meaningful way (Bromham *et al*., 1998).

Another program that allows the user to assign different rates and substitution models to different parts of a tree is rhino by Rambaut (2001). This ML local clock implementation has been used so far mainly for comparing substitution rates of different lineages by using likelihood ratio tests (e.g. Bromham & Woolfit, 2004).

A fourth implementation of the local molecular clock approach has been realized in the software r8s (Sanderson, 2003b). It follows the *Langley–Fitch method* described above (method 3; Langley & Fitch, 1974), but instead of only using one constant rate of substitution, the method permits the user to specify multiple rate parameters and assign them to the appropriate branches or branch groups. After such a definition of rate categories, the divergence times and rates for the different branch groups are estimated by ML optimization.

A fifth program, beast (Drummond & Rambaut, 2003), uses Bayesian inference and the Markov chain Monte Carlo (MCMC) procedure to derive the posterior distribution of local rates and times. As the software does not require a starting tree topology, it is able to account for phylogenetic uncertainty. Additionally, it permits the definition of calibration distributions (such as normal, lognormal, exponential or gamma) instead of simple point estimates or age intervals.

### (2) Methods that estimate divergence times by *incorporating* rate heterogeneity

Methods that relax rate constancy must necessarily be guided by specifications about how rates are expected to change among lineages. Because rates and divergence times are confounded, it is not possible to estimate one without making assumptions regarding the other (Aris-Brosou & Yang, 2002). Recently, it has been questioned that divergence times without a molecular clock can be estimated consistently just by increasing the sequence lengths (Britton, 2005). However, available methods try to introduce rate heterogeneity on the basis of three different approaches: one is the concept of *temporal autocorrelation in rates* (Methods 7–11; Gillespie, 1991), another is the *stationary* process of rate change (Method 13; Cutler, 2000), and a third is the *compound Poisson* process of rate change (Method 14; Huelsenbeck *et al*., 2000). All methods estimate branch lengths without assuming rate constancy, and then model the distribution of divergence times and rates by minimizing the discrepancies between branch lengths and the rate changes over the branches. The methods differ not only in their models, but also in their strategy to incorporate age constraints (calibration points) into the analysis.

### (2a) Methods that model rate change according to the standard Poisson process and the concept of *rate autocorrelation*

An autocorrelation limits the speed with which a rate can change from an ancestral lineage to a descendant lineage (Sanderson, 1997). As the rate of substitution evolves along lineages, daughter lineages might inherit their initial rate from their parental lineage and evolve new rates independently (Gillespie, 1991). Temporal autocorrelation is an explicit a priori criterion to guide inference of among-lineage rate change and is implemented in several methods (Magallón, 2004). Readers who want to learn more about the theory of temporal autocorrelation are referred to publications by Takahata (1987), Gillespie (1991), Sanderson (1997) and Thorne *et al*. (1998).

#### Method 7: Nonparametric rate smoothing (Sanderson, 1997, 2003b)

By analogy to smoothing methods in regression analysis, the *nonparametric rate smoothing (NPRS) method* attempts to simultaneously estimate unknown divergence times and smooth the rapidity of rate change along lineages (Sanderson, 1997, 2003b). To smooth rate changes, the method contains a nonparametric function that penalizes rates that change too quickly from branch to neighboring branch, which reflects the idea of autocorrelation of rates. In other words: the local transformations in rate are smoothed as the rate itself changes over the tree by minimizing the ancestral-descendant changes of rate. Since the penalty function includes unknown times, an optimality criterion based on this penalty (the sum of squared differences in local rate estimates compared from branch to neighboring branch; least-squares method) permits an estimation of the divergence times. NPRS is implemented in r8s (Sanderson, 2003b) and treeedit (Rambaut & Charleston, 2002). With r8s, but not with treeedit, the user is able to add one or more calibration constraints to permit scaling of rates and times to absolute temporal units. A serious limitation of NPRS is its tendency to overfit the data, leading to rapid fluctuations in rate in regions of a tree that have short branches (Sanderson, 2003b). In r8s, but not in treeedit, two strategies to provide confidence intervals on the estimated parameters are available: (1) a built-in procedure that uses the curvature of the likelihood surface around the parameter estimate (after Cutler, 2000); and (2) the calculation of an age distribution based on chronograms generated from a large number of bootstrapped data sets. The central 95% of the age distribution provide the confidence interval (Efron & Tibshirani, 1993; Baldwin & Sanderson, 1998; Sanderson, 2003b). This robust, but time-consuming procedure can be facilitated by using a collection of Perl scripts written by Eriksson (2002) called the *r8s-bootstrap-kit*.

#### Method 8: Penalized likelihood (Sanderson, 2002, 2003b)

*Penalized likelihood (PL)* combines likelihood and the nonparametric penalty function used in NPRS. This semi-parametric technique attempts to combine the statistical power of parametric methods with the robustness of nonparametric methods. In effect, it permits the specification of the relative contribution of the rate smoothing and the data-fitting parts of the estimation procedure: a roughness penalty can be assigned as smoothing parameter in the input file. The smoothing parameter can be estimated by running a cross-validation procedure, which is a data-driven method for finding the optimal level of smoothing. If the smoothing parameter is large, the function is dominated by the roughness penalty, and this leads to a clock-like model. If it is low, the smoothing will be effectively unconstrained (similar to NPRS). So far, PL is only implemented in r8s (Sanderson, 2003b). As with NPRS, the user is able to add one or more calibration constraints to permit scaling of rates and times to real units. The same two strategies for providing confidence intervals on the estimated parameters as for the NPRS method are also available for PL.

#### Method 9: Heuristic rate smoothing (AHRS) for ML estimation of divergence times (Yang, 2004)

The *heuristic rate-smoothing (AHRS)* algorithm for ML estimation of divergence times (Yang, 2004) has a number of similarities with *PL* and the two Bayesian dating methods described above. It involves three steps: (1) estimation of branch lengths in the absence of a molecular clock; (2) heuristic rate smoothing to estimate substitution rates for branches together with divergence times, and classification of branches into several rate classes; and (3) ML estimation of divergence times and rates of the different branch groups. The AHRS algorithm differs slightly from PL: where Sanderson (2002) uses a Poisson approximation to fit the branch lengths, the AHRS algorithm uses a normal approximation of the ML estimates of branch lengths. Furthermore, the rate-smoothing algorithm in AHRS is used only to partition branches on each gene tree into different rate groups, and plays therefore a less significant role in this method than in PL. In contrast to the Bayesian approaches described above, AHRS optimizes rates, together with divergence times, rather than averaging over them in an MCMC procedure. Another difference to the Bayesian dating methods is that the AHRS algorithm does not need any prior for divergence times, which can be an advantage. On the other hand, it is not possible to specify fossil calibrations as lower or upper bounds on node ages, as in r8s or multidivtime— so far, nodal constraints based on fossils have to be specified as fixed ages. The AHRS algorithm is implemented in the baseml and codeml programs, which are parts of the paml package (since version 3.14 final; Yang, 1997). Those programs provide standard errors for estimated divergence times. As multidivtime*,* the AHRS algorithm implemented in baseml is able to analyse multiple genes/loci with different evolutionary characteristics simultaneously.

#### Method 10: Bayesian implementation of rate autocorrelation in multidivtime (Thorne *et al*., 1998; Kishino et al., 2001; Thorne & Kishino, 2002)

The *Bayesian dating method* implemented in multidivtime (Thorne *et al*., 1998; Kishino *et al*., 2001; Thorne & Kishino, 2002) uses a fully probabilistic and high parametric model to describe the change in evolutionary rate over time and uses the MCMC procedure to derive the posterior distribution of rates and times. In effect, the variation of rates is addressed by letting the MCMC algorithm assign rates to different parts of the tree, and then sampling from the patterns that are possible. By this way, the MC techniques average over various patterns of rates along the tree. The result is a posterior distribution of rates and times derived from a prior distribution. For the assignments of rates to different branches in the tree, rates are drawn from a lognormal distribution, and a hyperparameter ν (also called Brownian motion constant) describes the amount of autocorrelation. The internal node age proportions are described as a dirichlet distribution, which represents the idea to model evolutionary lineages due to speciation, but is not intended as a detailed model of speciation and extinction processes. In practice, the most commonly used procedure is divided into three different steps and programs, and is described in more detail in a step-by-step manual (Rutschmann, 2005). (1) In the first step, model parameters for the *F*_{84} + G model (Kishino & Hasegawa, 1989; Felsenstein, 1993) are estimated by using the program baseml, which is part of the paml package (Yang, 1997). (2) By using these parameters, the ML of the branch lengths is estimated, together with a variance–covariance matrix of the branch length estimates by using the program estbranches (Thorne *et al*., 1998). (3) The third program, multidivtime (Kishino *et al*., 2001; Thorne & Kishino, 2002), is then able to approximate the posterior distributions of substitution rates and divergence times by using a multivariate normal distribution of estimated branch lengths and running an MCMC procedure.

multidivtime asks the user to specify several priors, such as the mean and the variance of the distributions for the initial substitution rate at the root node or the prospective age of the root node. Additionally, constraints on nodal ages can be specified as age intervals. The program provides Bayesian credibility intervals for estimated divergence times and substitution rates. In contrast to NPRS and PL (implemented in r8s), multidivtime is able to account for multiple genes/loci with different evolutionary characteristics. Such a simultaneous analysis of multiple genes may improve the estimates of divergence times which are shared across genes.

#### Method 11: phybayes (Aris-Brosou & Yang, 2001, 2002)

The phybayes program (Aris-Brosou & Yang, 2001, 2002) is similar to the multidivtime Bayesian approach described above. It also uses a fully probabilistic and high parametric model to describe the change in evolutionary rate over time and uses the MCMC procedure to derive the posterior distribution of rates and times. But the method is more versatile in terms of possibilities of defining the prior distributions, as it allows for the usage of models that explicitly describe the processes of speciation and extinction. For the rates of evolution, it offers a choice of six different rate distributions to model the autocorrelated rate change from an ancestor to a descendent branch (lognormal, ‘stationarized’-lognormal, truncated normal, Ornstein–Uhlenbeck process, gamma and exponential, plus the definition of two model-related hyperparameters), whereas in multidivtime, rates are always drawn from a lognormal distribution. The prior distribution of divergence times is generated by a process of cladogenesis, the generalized birth and death process with species sampling (Yang & Rannala, 1997), a model that assumes a constant speciation and extinction rate per lineage (multidivtime uses a dirichlet distribution of all internal node age proportions to generalize the rooted tree structure; Kishino *et al*., 2001). In contrast to multidivtime, it is not possible to analyse multiple genes simultaneously with phybayes, and the program does not allow for an a priori integration of nodal constraints (calibration points).

### (2b) Methods that model rate change with *other concepts* than rate autocorrelation

#### Method 12: Bayesian implementation of rate variation in beast (Drummond *et al*., submitted)

Similar to phybayes, the variable rate methods implemented in beast use Bayesian inference and the MCMC procedures to derive the posterior distribution of rates and times. In contrast, in addition to the autocorrelated models like the one implemented in multidivtime, a range of different, novel models have been implemented, where the rates are drawn from a distribution (with various distributions on offer; Drummond *et al*. submitted). These models have a couple of interesting features: (1) the parameters of the distributions can be estimated (instead of being specified), and (2) the correlation of rates between adjacent branches can be tested (if > 0, this would indicate some inherence of rates). Another unique feature of the software is that it does not require a starting tree topology, which allows it to account for phylogenetic uncertainty. Additionally, beast permits the definition of calibration distributions (such as normal, lognormal, exponential or gamma) to model calibration uncertainty instead of simple point estimates or age intervals. For the other, non-calibrated nodes, there is no specific process that describes the prior distribution of divergence times (they are uniform over a range from 0 to very large). beast allows the user to simultaneously analyse multiple data sets/partitions with different substitution models, and provides Bayesian credibility intervals.

#### Method 13: Overdispersed clock method (Cutler, 2000)

While all methods described so far assume fundamentally that the number of substitutions in a lineage is Poisson-distributed, the overdispersed clock model (Cutler, 2000) assumes that the number of substitutions in a lineage is stationary. Unlike a Poisson process, the variance in the number of substitutions will not necessarily be equal to the mean. Under this model, which treats rate changes according to a stationary process, ML estimates of divergence times can be calculated. The method is implemented in a c program, which is available directly from the author (Cutler, 2000). It is possible to incorporate multiple calibration points.

#### Method 14: Compound Poisson process method (Huelsenbeck *et al*., 2000)

As all methods described above (with the exception of the *overdispersed clock model*; Cutler, 2000), the *compound Poisson process method* uses a model that assumes that nucleotide substitutions occur along branches of the tree according to a Poisson process. But in addition to the other models, it assumes that another, independent Poisson process generates events of substitution rate change. Therefore, this second Poisson process is superimposed on the primary Poisson process of molecular substitution (hence the name *compound Poisson* process), and introduces changes (in form of discrete jumps) in the rate of substitution in different branches of the phylogeny. Rates on the tree are then determined by the number of rate-change events, the point in the tree where they occur, and the magnitude of change at each event (Huelsenbeck *et al*., 2000; Magallón, 2004). These parameters are estimated by using Bayesian inference (MCMC integration). One of the main advantages of treating rate variation as a *compound Poisson* process is that the model can introduce rate variation at any point of the phylogenetic tree; all other methods assume that substitution rates change only at speciation events (nodes; Huelsenbeck *et al*., 2000). The method is implemented in a c program, but it is not meant for being available for the community so far as there is no documentation (yet).