• Nobuyuki Kutsukake,

    1. Department of Evolutionary Studies of Biosystems and Hayama Center for Advanced Studies, The Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan
    2. PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho Kawaguchi, Saitama, 332-0012, Japan
    3.  E-mail:
    Search for more papers by this author
  • Hideki Innan

    1. Department of Evolutionary Studies of Biosystems and Hayama Center for Advanced Studies, The Graduate University for Advanced Studies, Hayama, Kanagawa 240-0193, Japan
    2. PRESTO, Japan Science and Technology Agency, 4-1-8 Honcho Kawaguchi, Saitama, 332-0012, Japan
    Search for more papers by this author


Phylogenetic comparative methods (PCMs) have been used to test evolutionary hypotheses at phenotypic levels. The evolutionary modes commonly included in PCMs are Brownian motion (genetic drift) and the Ornstein–Uhlenbeck process (stabilizing selection), whose likelihood functions are mathematically tractable. More complicated models of evolutionary modes, such as branch-specific directional selection, have not been used because calculations of likelihood and parameter estimates in the maximum-likelihood framework are not straightforward. To solve this problem, we introduced a population genetics framework into a PCM, and here, we present a flexible and comprehensive framework for estimating evolutionary parameters through simulation-based likelihood computations. The method does not require analytic likelihood computations, and evolutionary models can be used as long as simulation is possible. Our approach has many advantages: it incorporates different evolutionary modes for phenotypes into phylogeny, it takes intraspecific variation into account, it evaluates full likelihood instead of using summary statistics, and it can be used to estimate ancestral traits. We present a successful application of the method to the evolution of brain size in primates. Our method can be easily implemented in more computationally effective frameworks such as approximate Bayesian computation (ABC), which will enhance the use of computationally intensive methods in the study of phenotypic evolution.

Interspecific (or interpopulation) variation in phenotypes provides useful insight for understanding the patterns and processes of adaptive evolution. Because phylogenetically close species commonly exhibit similar phenotypic traits, consideration of the effect of phylogeny is indispensable in the analysis of interspecific phenotypic variation. Because the first formal attempt to consider the effect of the nonindependence of species in phylogeny, phylogenetic comparative methods (PCMs) have been developed and used in a wide range of studies in comparative biology and evolutionary ecology (Felsenstein 1985; Harvey and Pagel 1991; Martins 1996; Freckleton 2009; Nunn 2011).

In PCMs, most of the methods and conceptual frameworks used to analyze continuous phenotypic traits are based on a simple evolutionary mode, Brownian motion (BM) (Felsenstein 1985). BM corresponds to neutral evolution or the process of genetic drift (although selection toward a continuously fluctuating optimum can also be plugged in). The validity of using such a simple model for phenotypic evolution has often been questioned (reviewed in Losos 2011). For example, the BM model assumes that the evolutionary rate is constant across phylogeny, which may be an unrealistic assumption in nature. Therefore, the BM model has been extended to allow greater flexibility. Modifications have included allowing different rates of BM (rate parameter or variance of BM) within a phylogeny (McPeek 1995; O’Meara et al. 2006; Thomas et al. 2006, 2009; Revell 2008; Eastman et al. 2012; Revell et al. 2012; Slater et al. 2012), applying BM with accelerating or decelerating rates (Blomberg et al. 2003; Harmon et al. 2010), and adjusting BM with punctuated evolution (Bokma 2008), although those models do not explicitly incorporate selection. One study modeled BM with a phenotypic mean that had an evolutionary trend corresponding to directional selection (Hunt, 2006). Other extensions incorporated the effect of stabilizing selection for a single adaptive optimum in a particular branch or for multiple adaptive optima in different branches or clades within a phylogeny by using the Ornstein–Uhlenbeck (OU) process (Martins 1994; Hansen 1997; Butler and King 2004). However, actual phenotypic evolution does not always fit the BM or OU models (Estes and Arnold 2007). Based on discrepancies between real processes of phenotypic evolution and the existing evolutionary models, more flexible evolutionary models need to be developed in the framework of PCM.

The primary reason for using BM or the OU process to model phenotypic evolution is their mathematical tractability in the likelihood framework (Hansen 1997; Pagel 1999; O’Meara et al. 2006; Thomas et al. 2006, 2009). By using maximum-likelihood (ML) methods, it is possible to estimate evolutionary parameters of interest. However, difficulties in analytical likelihood computations have precluded the use of more complex and realistic evolutionary modes in PCMs. To overcome those limitations, we developed a new and, to the best of our knowledge, very flexible and comprehensive model to infer different modes and rates of phenotypic evolution within a single phylogeny. In this model, likelihood calculations are not straightforward; therefore, we took advantage of computer simulations. Recent increases in computational power allowed us to simulate a huge number of replications of phenotypic evolution given any mode and rate of evolution, which made it possible to use simulation-based evaluations of likelihood (reviewed in Marjoram and Tavare 2006). Furthermore, we extended the method so that the approximate Bayesian computation (ABC) approach can be used (see Slater et al. 2012 for an application of ABC to PCMs).

In our approach, evolutionary change in a phenotype is simulated within a population genetics framework. By using a simulation-based likelihood approach, the following advantages may be gained. (1) Our method allows a larger number of parameters to be incorporated compared with previous models, such as branch- or clade-specific evolutionary parameters. This is particularly useful for testing for the presence of selection that occurs locally within the phylogeny. (2) By simultaneously estimating evolutionary parameters, the model allows us to infer the ancestral states of internal nodes. Methods to estimate ancestral states have been developed in previous studies of PCM (reviewed in Nunn 2011), but the estimation has been difficult when phenotypic evolution does not follow BM (Oakely and Cunningham 2000; Polly 2001; Finarelli and Flynn 2006; Alberts et al. 2009). Our framework provides a method for inferring ancestral states under a model of phenotypic evolution that does not necessarily follow BM. (3) Intraspecific variation can be incorporated when evaluating likelihood. Recently, several methods have been developed to cope with intraspecific variation (e.g., Ives et al. 2007; Felsenstein 2008; Hadfield and Nakagawa 2010). (4) Our method computes likelihoods of phenotypic data directly rather than through summary statistics. By doing so, it is possible to avoid the controversial issue of which and how many summary statistics should be used in ABC studies (Beaumont et al. 2002; Csillery 2010; Leuenberger and Wegmann 2010). (5) This model can be applied to evolution of both continuous and discrete traits.



We first introduce simulation-based algorithms for estimating evolutionary parameters given a species tree (Ψ). The algorithms essentially follow Marjoram et al. (2003) and Marjoram and Tavare (2006) (also see references therein and Slater et al. 2012). The species tree Ψ consists of information on tree shape and the lengths of all of the branches in the topology, which are assumed to be known. The tree shape represents the topology, that is, the branching pattern from the most recent common ancestor (MRCA) to all descendant species that are being considered. The number of species is denoted by n. Let τi be the length of the ith branch on the tree in a given unit (e.g., time or genetic distance). Let θ0 be the focal phenotype of the MRCA, and we can simulate its evolutionary changes to the n descendant species, Θ= {θ1, θ2, θ3, … θn}. By simulating a number of patterns of phenotypic evolution given a certain parameter set, we can evaluate the likelihood of the parameter set by comparing the simulation results with observed data, Ω= {Ω1, Ω2, Ω3, … Ωn}. Suppose that, given a parameter set (Λ), a large number (say, J replications) of simulations are performed. Then, the likelihood of Λ is approximately given by


where Θj represents a simulated phenotype dataset for the jth replication. Pr(Ω|Θj) is the probability of the observed data given Θj, which can be computed as a joint probability:


where θi is the simulated value of the ith species. This form is rigorous and flexible enough to incorporate information regarding intraspecific variation. However, this computation may not be feasible in many cases. Therefore, we propose a conventional method that uses composite probability. That is, equation 2 is replaced by


where θi,j is the simulated value of the ith species in the jth replication. This equation should work well when the sampled species are reasonably diverged. Then, we might be able to approximate Pr(Ω|θ) by a normal distribution with mean θ. Again, it is difficult to determine the standard deviation (SD) of the normal distribution (σ), but it might be reasonable to base σ on the observed distributions of phenotypic values for each species.

In our framework, Pr(Ω|θ) does not need to be a normal distribution. Any function can be used for Pr(Ω|θ), which makes it possible to apply this method to a trait that does not follow a normal distribution. By using this algorithm with a substantial number of simulation replicates, likelihoods can be computed for a wide range of parameter sets, and ML estimates of the parameters can be obtained.


We also developed an ABC algorithm (Marjoram et al. 2003; Marjoram and Tavare 2006; Slater et al. 2012) that is flexible enough to take prior information into account. The parameter set (Λ) can also be inferred in the ABC framework, and the process can be implemented as follows.

  • 1Determine prior distributions for all parameters in Λ.
  • 2For each parameter, generate a random value from the prior distribution. A random set of the parameters is denoted by Λ′.
  • 3Simulate phenotype data (Θ′) using Λ′.
  • 4Accept Λ′ with a probability that is proportional to Pr(Ω|Θ′).
  • 5Go back to step (2) until a large number of accepted Λ's has accumulated.

The result includes posterior distributions for all of the parameters in Λ as a list of accepted Λ's.


The algorithms introduced above are flexible enough to incorporate any evolutionary model, from the commonly used BM and OU models to complicated models that one wishes to assume as alternatives. Our framework does not specify the genetic background of phenotypic evolution, but it is possible to incorporate both a single-locus system and a multilocus system by setting mutational effects on phenotype when desired. The framework can also be used when the genetic background of a target trait is unknown, which is generally the case in most phenotype studies.

As a first example, we describe a method for simulating phenotypic evolution in a species tree. This method essentially follows the BM model, but it is more flexible because the effect of selection can be incorporated. Changes in phenotypic values on individual branches can be simulated as follows. We consider this process within a population genetics framework. In brief, mutations occur randomly at a certain rate (u), and each mutation changes the phenotypic value. We assume that changes in phenotypic values occur because of single mutations that follow an arbitrary distribution, φ. This can be any distribution, from a discrete distribution (corresponding to a binary phenotypic trait) to a continuous distribution, depending on the nature of the focal phenotype. Mutations occur and either increase or decrease the phenotypic value at the rates u+ and u, respectively, with the assumption that their absolute phenotypic effects follow φ+ and φ, respectively. In many cases, it would be biologically reasonable to assume that u+=u and φ+ because they simply reflect the mutational process. Then, natural selection works to determine whether these mutations fix in the species or go extinct. Only fixed mutations contribute to long-term evolution, which is our main interest. If a phenotypic change is completely neutral, the fixation rate is 1/2N in a diploid species, where N is the population size. The fixation rate is higher than 1/2N for advantageous mutations and lower for deleterious mutations. We are primarily interested in the number of fixed mutations; therefore, we let μ+ and μ be the rates of fixed mutations that increase and decrease the phenotypic value, respectively. Thus, the effect of adaptive (directional) selection is included in μ. If there is no selection on the phenotype, we expect μ+.

Under this framework, we need to set μi+, μi, φi+, and φi to simulate phenotypic evolution on the ith branch. As mentioned above, we assume φ+ to be constant throughout the study. Neutral evolution can be designated by assuming that μi+=μi, which is essentially identical to BM. We refer to the model where μi+=μi is assumed to be true on all branches as the neutral random BM (NRBM) model.

The simulation of neutral phenotypic evolution on the ith branch with length τi is implemented as follows:

  • 1Determine si+ and si, the numbers of fixed mutations that increase and decrease the phenotypic value, respectively. si+ (si) is a random integer from a Poisson distribution with mean μi+τi (μiτi).
  • 2Determine the effect of each mutation on the phenotype, which is a random value from the distribution φi. Then, by adding the effects of all of the mutations, compute the phenotypic value at the bottom node of the branch.

Using this procedure, we can simulate the evolution of the phenotype on the entire tree, from the common ancestor (θ0) to all of the descendant species, Θ= {θ1, θ2, θ3, … θn}. It should be noted that because the effect of the mutation rate, u, is included completely in the evolutionary rate, μ, together with the effect of selection (see below), the system does not directly involve u.

The incorporation of selection is straightforward using this system. In contrast to the NRBM model, models that include selection on at least one branch are referred to as selection random BM (SRBM) models. If selection favors mutations that increase the phenotypic value, then we expect μi+ > μi (assuming φ+); the reverse is also assumed to be true. It is also possible to simulate changes in the phenotypic value using the OU model. In this model, the expected evolutionary change in the phenotypic trait (inline image) because of a single mutation is given by


where inline image is the phenotypic value before the mutational change, and α and ß represent the strength of stabilizing selection and the adaptive optimum state, respectively. In the original description of the OU model, the right-hand side of equation 4 had a second term that added a random factor in the stochastic process. As an alternative, we introduce a modified version of the OU model, called the random OU (ROU) model. Our model is similar to the OU model; the critical difference is that the random process in our NRBM is employed instead of the original random factor. In practice, we set the condition that inline image is equal to the phenotypic value before the mutation, which determines the expected change in the phenotype as calculated by equation 4. Then, a random factor is added such that random changes follow φ+ with probability μ+/(μ+) and φ with probability μ/(μ+).

We incorporated ROU in our framework because the OU model has been commonly used in previous studies using PCMs. However, the estimation of α may be meaningless especially when we set dt in equation 4 equals 1. In this case, it is reasonable to consider that 0 < α≤ 1 is the possible range of this parameter. When α is equal to or close to 1 within this range, the phenotype value is expected to instantaneously reach the adaptive optimum (β) through one or a few phenotypic changes. Therefore, if the model assumes a gradual approach toward the adaptive optima, the condition α<<1 should be met. If α >1, the phenotypic value will oscillate and gradually separate from the adaptive optimum. Finally, as is illustrated in an imaginary example (Fig. 1), in our framework, α does not affect the outcome of OU-like phenotypic evolution toward the adaptive optimum unless α is extremely small. This is because the phenotypic value that is obtained by the OU process is largely determined by ß, and α only determines its trajectory. Consequently, the likelihood values of models do not differ within a wide range of α. Therefore, we did not quantify α intensively, as we did for other parameters, and only used α to test for the presence of stabilizing selection against neutral evolution (see below).

Figure 1.

An imaginary example of an evolutionary trajectory under ROU with different α values. The phenotypic value (y-axis) reached an adaptive optimum (set to 1000) from a phenotypic value at MRCA (set to 0) after a given evolutionary time (x-axis, set to 150 time units) under different values of α (indicated on each line).

In addition to these commonly used models, one can simulate evolutionary changes in phenotypes using any model. In other words, as long as simulations can be conducted, any model can feed into the likelihood and ABC algorithms described herein. This even allows us to use very complicated selection models (e.g., a model that incorporates different modes and intensities of selection that are assigned to different branches), but parameter inference in models with large numbers of parameters is computationally intensive.


We applied our method to examine the evolution of brain size in four great ape species, including humans, whose brain size has enlarged rapidly relative to other species (Fig. 2). We assumed that the phylogenetic relationships among the four species and branch lengths (actual time) were accurately given by the software “10ktrees” (Arnold et al. 2010). We used brain size data reported by Bauchot and Stephan (1969) after removing six extreme outliers (Fig. 2). It is a common practice to log transform this type of data; therefore, we analyzed the data using both untransformed and log-transformed data. Because we obtained essentially identical results, we only show the results obtained using untransformed data to explain the algorithm.

Figure 2.

Inter- and intraspecific variation in brain volume in four species of great apes, including humans, together with their phylogeny. Data are shown in the Nexus format: ((1:8.65,(2:6.18,3:6.18):2.48):6.48,4:15.13); [1] gorilla, Gorilla gorilla (mean = 467.12, SE = 7.28, n= 103); [2] human, Homo sapiens (mean = 1321.27, SE = 13.76, n= 81); [3] chimpanzees, Pan troglodytes (mean = 348.08, SE = 3.46, n= 179); [4] orang-utan, Pongo pygmaeus (mean = 334.76, SE = 8.43, n= 71).

We estimated the parameters in three different evolutionary models: NRBM, SRBM, and ROU. We used NRBM as a null model in which we assumed a constant evolutionary rate on all branches, that is, μi+=μi0, where μ0 is the baseline evolutionary rate. The model also assumed that φi+=φi0 for all branches, where φ0 is given by an exponential distribution with mean δ= 1. The use of an exponential distribution corresponds to a well-known pattern in which large mutational effects that affect the phenotypes of quantitative traits are rare, and most changes are small (e.g., Orr 2005; Yang 2006). Thus, under this simple setting, the NRBM model had two parameters, the baseline evolutionary rate (μ0) and the phenotypic value of the MRCA (θ0). For the two alternative models, SRBM and ROU, because we had predicted a priori that brain size had increased in the human lineage after splitting from the common ancestor with chimpanzees, an accelerated rate of brain size evolution was assumed only in the external branch that led to humans, and a constant rate of neutral evolution was assumed in all of the nonhuman branches (i.e., μi+=μi0). The models also assumed that φi+=φi0 for all nonhuman branches. For the SRBM model, directional selection was incorporated in the human lineage by introducing a new parameter, k, that was defined such that the upper evolutionary rate increased by a factor k (i.e., μi+0k) and the rate in the other direction decreased by a factor k (i.e., μi0/k). When k= 1, the situation is identical to the neutral case. This version of the SRBM model had three parameters, θ0, μ0, and k. The number of free parameters in the ROU model was four because it included two selection-related parameters (α and β) in addition to θ0 and μ0.

We first used the simulation-based likelihood approach. Under the null NRBM model, wide ranges of the two free parameters (θ0 and μ0, ranges shown in Table 1) were considered. Log-likelihood values for all possible pairs of the two parameters were computed with at least 100,000 replicate simulations for each parameter set. In evaluating Pr(Ω|θ) in equation 2, we used a normal distribution for the brain size in each species:


where Ωi and σi are the mean and SD of the observed phenotypic value of the ith species (i= {1,2,3,4}, corresponding to the four species), respectively. Figure 2 demonstrates that the intraspecific variation in the four species can be approximated by a normal distribution.

Table 1.  Summary of the parameters used in this study.
ParameterRange (increment)NRBMSRBMROU
Common parameters in all models
  Most recent common ancestor (MRCA) θ0 330–1330 (5) 490 365 365
 Baseline evolutionary rate per million yearsμ01–10,000 (5)4075155
Parameters specific to selection models
 Directional selection k 1.0–20.00 (0.1)10.3
  Strength of stabilizing selection in OU process   0–1.0 (0.1)    0.8
 Adaptive optimum in OUβ330–1500 (5)1325
Maximum log likelihood −30.150 −22.287 −21.857

All of the programs were written in the C language (available upon request), and the computations were performed using a MacPro (OS 10.6.7, 2 × 2.93 GHz Quad-Core Intel Xeon). Note that the ML estimation took a very long time; it took approximately 4 (for μ0= 1) to 150 min (for μ0= 10,000) to calculate the likelihood of one parameter set with 100,000 replications of the simulation in the BM model.

The obtained log-likelihood surface is presented in Figure 3, which shows that the maximum log-likelihood (MLL =−30.150) was given when θ0= 490 and μ0= 4075 (also see Table 1). This estimate is conditional on our arbitrary assumption that δ, the mean of φ, equals 1. To investigate the effect of this arbitrary choice, we repeated the ML computation with different values of δ (0.5, 5, and 10). We found that δ2 and μ0 were strongly confounded; values of δ2μ0 were nearly identical, approximately 4025, 4750, and 4600 for the three different δ values. The product of δ2 and μ0 should represent the extent of random fluctuations in the phenotypic values. Therefore, unless we are directly interested in μ0, we can treat δ as a nuisance parameter, even when we do not know δ, and an arbitrary choice of δ (unless biologically unreasonable) might work reasonably well because its effect can be cancelled out by μ0.

Figure 3.

Two-dimensional log-likelihood surface in NRBM. The arrow points to the parameter set with the maximum likelihood.

We also performed the ML estimation using the two alternative models, the SRBM and ROU models. In addition to θ0 and μ0, the former model includes one more parameter (k), and the latter includes two more (α and β). In the SRBM model, k varied between very small values (i.e., k∼ 0 so that μi >> μi+) and 20.0 (i.e., μi+= 20; μ0 is 400 times more likely to occur than μi0/ 20). Note that the SRBM model in which k= 1 is the same as the NRBM model. We found the MLL =−22.287 at (θ0, μ0, k) = (365, 15, 10.3) (Table 1). We again examined the effect of our arbitrary choice of δ. Additional simulations with different δ values (0.5, 5, 10) indicated that in this selection model, although δ strongly affected the estimates of μ0 and k, their product, δμ0k, remained almost constant (δμ0k= 153.75, 157.5, and 142.0). This is easy to understand if we notice that the most meaningful quantity is the expected change in the phenotypic value on the lineage under selection, which is denoted by Δsel. In our SRBM, Δsel is given by


With our conventional setting (μ+=μ and δ+=δ),Δsel is given by δμ(k− 1 /k)τ, and if k >> 1, Δsel is approximately given by δμkτ. Therefore, it makes sense that δμk remained constant in our simulations. For our estimates, this quantity, Δsel, is given by 1 × 15 × 10.3 × 6.18 (million years) = 954.81.

In the ROU model, MLL =−21.839 was found at (θ0, μ0) = (365, 5) with (α, β) = (0.8, 1325) (see Table 1), but likelihood was very similar among models with different values of α (see Fig. 1). As was mentioned above, Δsel is generally determined by the difference in β, unless α is very small, and the expected phenotypic value at the beginning of the lineage under selection, which is denoted by θ″. If θ″ is replaced by θ0, Δsel is 1325 – 365 = 960, which is in good agreement with the SRBM model (Δsel= 954.81).

Overall, the MLL values in the two selection models were much larger than in the neutral model, which supports the hypothesis that selection occurred specifically in the human lineage. Model fit was tested by a likelihood-ratio test. When comparing NRBM and SRBM, the difference in the ML (after log transformation) was 7.86, which is about two times larger than the value at the 5% cut-off value by the χ2 distribution (df = 1). The observed difference was highly significant (P < 0.001), indicating that the SRBM explained the data much better than the NRBM did. A closer investigation of the simulation results indicated that the 95% confidence interval for k under the null SRBM was as high as k= 5.2 when μ0= 15 was fixed (i.e., k×μ0= 78), indicating that our ML estimate of k (10.3) was much larger than could be detected in our framework.

A similar result was obtained from the comparison between NRBM and ROU. The ML improved significantly in the ROU model (χ2df= 2= 16.60, P < 0.001). The likelihood of the ROU model was not statistically different from that of the NRBM when α was less than 0.018. This result has two implications: (1) phenotypic evolution toward β was too weak to be distinguished from neutral phenotypic evolution when α was below this value, and (2) values of α larger than 0.018 do not create a difference in likelihood between ROU and NRBM (see Fig. 1).

The estimated evolutionary rate was much higher for the NRBM (μ0= 4075) than for the two selection models (μ0= 15 and 5). This is probably because the human data, which exhibited an extreme deviation from data for the other three species, had to be accounted for by mutation alone in the NRBM. In contrast, in the two alternative models, this exceptional increase in brain size could be explained by selection; therefore, we obtained much lower estimates of the baseline evolutionary rate. The effects of selection were evaluated separately through the selection parameters k in the SRBM and α and β in the ROU model. For the same reason, the estimates of θ0 in the two alternative models were much lower (θ0= 365 for both SRBM and ROU) than those in the NRBM (θ0= 490).

The difference between the two alternative models in the estimated evolutionary rate for the human lineage was simply due to the nature of the selection models. In the ROU model, the phenotypic value quickly approached the optimum, whereas the process involved an almost linear increase in the SRBM. Therefore, for a very drastic increase in the phenotype value, the ROU requires a smaller number of changes than the SRMB does. This is why we proposed that the most important quantity to represent the effect of selection was Δsel, which was consistent between the two selection models.

Next, we applied our ABC algorithm to the same data. We only used the alternative models because the NRBM is simply a special case of the alternatives (the SRBM model with k= 1 or the ROU model with α= 0). We assumed that the prior distributions of all of the parameters were uniform; therefore, we expected to obtain results that were essentially identical to those from the likelihood approach. As we expected, the peaks of the posterior distributions roughly agreed with the ML estimates (Figs. 4, 5), with the exception of k in the SRBM. The inferred value of k in the ML approach was 10.3, but the mode of the posterior distribution of k was 1.2 with the ABC method. This discrepancy occurred because μ0 and k are strongly confounded. As shown in Figure 4C, the posterior distribution of the μ0k pair falls roughly along the curve μ0k= 170 (dashed line in Fig. 3C), and the most accepted parameter set was located at (μ0, k) = (15, 10.8), which is almost identical to the ML estimate (15, 10.3). In the ROU, the posterior distribution of α was almost flat, which was similar to the likelihood curve in the ML approach. As was mentioned earlier, estimates of α are quite meaningless in our new framework (also see Fig. 1).

Figure 4.

Posterior distributions of parameters in the SRBM obtained by the ABC method with 50,000 accepted parameter sets. (A) Results with uniform prior distributions. (B) Results with nonuniform prior distributions. In each panel, the line shows the prior distribution of each parameter. The mode of each estimated parameter value and its 95% credible interval are also shown. (C) Relationship between the evolutionary rate (μ0) and the strength of directional selection (k) on the value of inline image.

Figure 5.

Posterior distributions of parameters in the ROU obtained by the ABC method with 50,000 accepted parameter sets. (A) Results with uniform prior distributions. (B) Results with nonuniform prior distributions.

There are several ways to statistically compare different models and to judge whether the null model (NRBM) should be rejected in favor of alternative models in our ABC framework. One method is to look at the posterior probability of support for the alternative model over the null model (see Slater et al. 2012). In the case of the SRBM, all of the ks in the accepted data were greater than 1, which suggests robust support for the SRBM over the NRBM. To handle confounding effects, Δsel is more appropriate for comparing different models in the ABC approach, rather than looking at each parameter separately. This is particularly effective in the case of the ROU because it is not easy to calculate the posterior probability of support based on α, which should always be 0 or larger. Figure 6 shows the posterior distributions of Δsel for the SRBM and ROU models. All of the accepted data for Δsel in the SRBM and 99.99% of the data in the ROU were greater than 0, which can be considered strong evidence for selection. Even more conservatively, the posterior distribution of Δsel can be compared to its null distribution under the NRBM (hereafter, called Δnull). Δnull was obtained through simulation with 1000 replications using the posterior distribution of μ0. We found that the overlap between Δsel and Δnull was very small, only 0.02% for SRBM and 4.4% for ROU, which again provided strong evidence for selection.

Figure 6.

Posterior distributions of Δsel in the SRBM and ROU models.

Thus, the algorithm worked reasonably well with simple uniform prior distributions for all free parameters. However, the acceptance rate was very low, and therefore the simulation took a very long time (13 and 37 h for the SRBM and ROU models, respectively). The performance of the ABC algorithm could be improved if appropriate prior distributions could be determined. Figures 4 and 5 show that when uniform prior distributions were used, we obtained wide posterior distributions of μ0 and θ0. This is most likely because simulated data that had a high evolutionary rate in combination with larger values of θ0 and weak selection parameters were occasionally accepted (see above). This problem can be solved if we take advantage of our a priori prediction: the effect of selection is negligible except in the human lineage. If this holds true, we may be able to apply the NRBM model to the three nonhuman species. This analysis would provide better inference for θ0 and μ0 because none of the lineages would involve strong selective forces. In this way, we obtained posterior distributions of μ0 (mode: 25; 95% credible interval: 20.44–8820.87) and θ0 (mode: 350; 95% credible interval: 336.52–444.08) for the three nonhuman species. These distributions were much narrower than the ones obtained from the four-species models (Figs. 4 and 5). These posterior distributions were then used as prior distributions when using the ABC algorithm with the SRBM that included all four species. With this approach, the computational time was reduced significantly, and as a result, the effect of selection was more pronounced in the human lineage (Fig. 4). In the ROU model, the computational time was also substantially reduced (4.2 and 2.1 h for SRBM and ROU, respectively), although the posterior distributions did not change dramatically. This is because, as was mentioned above, this model could predict a very quick increase in the phenotypic value with only a small number of substitutions, even under weak stabilizing selection (α). Therefore, the effect of selection was well evaluated regardless of the prior distribution of μ0. Again, 100% of Δsel for SRBM and 99.99% for ROU were positive. Compared with Δnull, the posterior probabilities of support for SRBM and ROU were 99.02% and 95.10%, respectively, which were still sufficient to reject the NRBM model.


Here, we report the power and error rates for our frameworks. As it is possible to calculate error rates and power in various settings, we evaluated the power of our approach in a relatively simple setting (Fig 7). We created symmetrical and balanced phylogenies with different numbers of species (4, 8, 16, and 32). Then, using the SRBM, we considered a case in which positive selection worked on one external branch. We performed 100–1000 replicates of simulations. For each replication, we applied both the SRBM and the NRBM in the ML and ABC frameworks. We explored the power of this approach using various levels of selection (i.e., k). In the ML framework, the power was measured as the proportion of simulation replications that rejected the null model at the 95% level. In the ABC framework, we concluded that the SRBM was supported more than the NRBM if the posterior distribution of k did not overlap to 1. As a result, power increased with k in both the ML and ABC frameworks (Fig. 7). Power was higher with a larger number of species in the ML framework, but it was generally comparable among models with different numbers of species. We did not investigate the power of the ROU because it is meaningless to investigate different α values when using our approach (see above and Fig. 1).

Figure 7.

(A, B) Power analysis for the SRBM with different k values (shown on the x-axis) using the ML (A) and ABC (B) approaches. The lengths of all of the branches were assumed to be constant (5 time units), and the SDs of all of the tip phenotypes were set to 5. μ0 for the simulated data was set to 10. We fixed MRCA to 0 because this is a baseline value for all of the phenotypic values; therefore, it is less likely that different MRCA values will affect the statistical performance. (C–F) Error rates for SRBM (C: ML; D: ABC) and ROU (E: ML; F: ABC) using an imaginary dataset that included different numbers of species (n= 4, 8, 16, 32). The influence of each parameter was investigated by changing the value of the parameter while holding the other parameters fixed. μ0 is shown on the x-axis.

To investigate error rates, we performed power simulations with the assumption that k= 1 (i.e., NRBM) but with different evolutionary rates (μi+=μi= 1, 5, 10, 20, 50). The calculation of the error rate for the ROU in the ABC framework was not straightforward. Ideally, we should test whether a posterior distribution of α overlaps with 0 (i.e., no stabilizing selection). However, a negative value of α as a prior distribution is biologically questionable; therefore, it is impossible to use this approach. Instead, we examined the posterior distribution of Δsel and determined whether this distribution overlapped 0. We found that the type-I error rate was less than 5% in both the ML and ABC frameworks (Fig. 7), and the number of species in the model did not affect this result. These results suggest that our framework is conservative when rejecting the NRBM.

We also investigated the error rates and power assuming the phylogeny used for the primate brain analysis by conducting additional simulations in the same way as reported above. As a result, we found that the estimated parameter sets had low error rates and high power (see Supporting information for details).


This article introduces a new framework that can be used to infer evolutionary parameters under different models of phenotypic evolution. The primary advantage of this framework lies in its flexibility; it can incorporate any evolutionary model with a number of parameters of interest. This flexibility stems from a recent dramatic increase in computational power, which makes it possible to apply simulation-based computations of likelihood. That is, likelihood can be obtained without analytic expressions as long as simulation is possible (reviewed in Marjoram et al. 2003 and Marjoram and Tavare 2006; Slater et al. 2012). This framework will be useful for analyses of complex phenotypic evolution on phylogeny that cannot be analyzed using traditional PCMs. Similar simulation-based likelihood approaches are commonly used in studies of molecular evolution and population genetics (Marjoram and Tavare 2006), but so far they have been less commonly used in PCMs of phenotypic evolution.

The greatest advantage of the simulation-based approach is that we are able to incorporate very realistic models of phenotypic evolution. Evolutionary changes in the phenotype are modeled based on the framework of population genetics. Our approach makes it possible to consider many parameters that are related to evolutionary modes and tempos. Increases in the number of parameters, which were confined to interactions among δ,μ, and k in this study, may result in confounding interactions among parameters. In such cases, it will be important to test the evolutionary hypothesis by considering such confounding interactions among parameters, but not by treating each parameter separately. Our framework can also include branch- or clade-specific BM evolution, which has been examined in previous studies (O’Meara et al. 2006; Thomas et al. 2006, 2009; Venditti et al. 2011; Eastman et al. 2012, Revell et al. 2012; Slater et al. 2012, see Butler and King 2004 for OU). In addition, one important extension of our framework is that is allows us to incorporate branch- or clade-specific parameters for directional selection. Previously, Hunt (2006) modeled an occurrence of directional selection in one lineage. Hunt (2006) used sequential fossil data, whereas this study only used tip data for extant species, and the phenotypic value of the root in which directional selection was assumed to have occurred was unknown. Therefore, our method is similar to Hunt's (2006) but differs in that this framework attempts to model the likelihood of phenotypic evolution in the entire phylogeny.

As an example, our simulation-based algorithms were applied to the evolution of brain size in great apes. In this phylogeny, increases in brain size are thought to have occurred specifically in the human lineage after it split from the common ancestor of humans and chimpanzees. We found that the selection models explained the data much better than the null model that did not include selection (Table 1), which matched our a priori understanding. Our approach used the information of intraspecific variation in the computation of likelihood and provided an estimate of the ancestral state of the phenotypic value in a model that included a heterogeneous evolutionary pattern within a phylogeny. Although estimates of the ancestral state based on the BM have been available in previous PCM methods, the inclusion of branch-specific evolutionary modes (i.e., directional selection) alters the estimate of the ancestral state (Oakely and Cunningham 2000; Polly 2001; Finarelli and Flynn 2006; Alberts et al. 2009).

We also introduced an ABC algorithm that was more flexible and efficient than the likelihood approach (cf. Slater et al. 2012). In our example with the SRBM, we first confirmed that a posterior distribution of the accepted parameter set (θ0, μ0, and k) in the ABC with noninformative (flat) prior distributions agrees with the result of the ML analysis. The effective incorporation of prior distributions not only reduces the computational load but also makes it possible to take the uncertainties of nuisance parameters into account. This would work well, particularly when we have reliable phenotypic values from fossil records or a strong evolutionary hypothesis because researchers can use an informative prior distribution. The computational treatment for such special cases will be published elsewhere.

Thus, by taking advantage of dramatic improvements in computational power, we can use a flexible and powerful approach to infer models and parameters of phenotypic evolution. This simulation-based likelihood approach is particularly useful for complex data with many taxa, although full likelihood of simple data like the brain evolution in four primate species might be obtained analytically. The described framework is very simple and it could benefit from a number of potential improvements and extensions, both biological and statistical. Biologically, the oversimplified assumptions that were used in this article can easily be relaxed so that parameter inference can be performed under more realistic conditions. For example, we can assume that different modes of selection with different strengths act on different branches. It is even possible to model variation in the selection mode and intensity, even within a single branch. This can be applied to situations where drastic environmental or climate changes are known to have occurred during the period represented by a single branch. Additionally, although we assumed that the species tree (both topologies and branch lengths) was known, we can easily include uncertainty in the tree by incorporating it as a prior distribution. To correctly interpret the results of this framework, it is strongly recommended that researchers examine the power and error rates of the model in parallel with the simulation to test the evolutionary hypothesis. Simulations should also be effectively used to assess statistical differences between different models, as was done in our analysis when comparing the null NRBM model with alternative models that included branch-specific selection.

An issue worthy of further consideration is the generation time effect. It is quite common that species at the tip of a phylogeny have different life histories, including generation time, which might cause substantial variation in the evolutionary rate among branches. The need to consider generation time has been discussed in previous PCM studies (e.g., Polly 2001). Given the biological importance of generation time in PCM, it might be useful to adjust μ0 proportionally with the species-specific generation time instead of using a constant baseline evolutionary rate, μ0, throughout the entire phylogeny. One would need to investigate whether such an adjustment would improve a model's fit to the data compared with a model without the adjustment. This idea is analogous to previous models that applied plural BM with different rate parameters to a phylogeny (O’Meara et al. 2006; Thomas et al. 2006, 2009), although these previous models did not incorporate selection.

Another issue that we should consider is the scaling of phenotypes. We used an arbitrary value of δ= 1 as a mean of φ+ and φ to simulate phenotypic data ranging from 330 to 1330 (Table 1). We confirmed that the use of other δ values did not dramatically change the results, but it is important to use a realistic value for δ based on the extent of the observed phenotypic variation. If a realistic value of δ makes the computational time too long, it may be possible to use a larger value. However, values of δ that are too large would cause unnecessarily large random fluctuations, which would make the performance of the likelihood evaluation worse. In any case, it is essential to check the robustness of the results against different values of δ.

It is also possible to statistically extend our model in many ways. In the case of ML, the null and alternative models were compared using a likelihood-ratio test, but it is also possible to perform model fitting with a number of models, perhaps using the Akaike Information Criteria (AIC). Although the computational load was not enormous in our analyses of only four primate species with simple settings, it will become computationally intensive with increases in the amount of data and the number of parameters. In such situations, the ABC inference process should be implemented in the Markov chain Monte Carlo (MCMC) framework (Marjoram et al. 2003; Marjoram and Tavare 2006).

In summary, we believe that this simulation-based computational approach will be very useful, and that it has great potential for future studies of phenotypic evolution. It allows us to understand the mode and intensity of selection in a very realistic situation, and inferred models and parameters can be tested statistically. We also point out that this framework can be used to detect and quantify selection in PCMs. The initial motivation of this work was largely on the pioneering work in molecular evolution by Yang and colleagues (Yang 2006). They established a fundamental statistical framework that could be used to infer selection on amino acid substitutions in a phylogeny. In their framework, the rate of amino acid changing nucleotide substitutions (i.e., nonsynonymous substitutions) is measured by standardizing the rate of synonymous substitutions, which are generally considered to be neutral and constant among generations. This relative ratio, called the dN/dS or Ka/Ks ratio, is a well-accepted summary statistic for representing the direction and intensity of selection in studies of molecular evolution (Li 1997). For studies of phenotypic evolution, it would be beneficial to have such a conventional expression, and we propose the use of Δsel for this purpose. This index represents the total amount of inferred phenotype change in a specific branch. The basic idea of Δsel is similar to the unit of evolutionary change, the darwin (Haldane 1949), which is defined as the difference in (log transformed) phenotypic changes divided by the time span between two samples. Our Δsel extends this concept in two ways. First, Δsel can be calculated within the PCM framework where the ancestral state of each branch is unknown. Second, the darwin is a descriptive value that does not allow inference about evolutionary processes, such as testing whether selection explains observed phenotypic changes better than neutral changes. On the other hand, Δsel can be used for these purposes because Δnull can be obtained by simulation.

Associate Editor: P. Lindenfors


G. Slater provided important comments on the manuscript. This study was financially supported by Precursory Research for Embryonic Science and Technology (PRESTO) at Japan Science and Technology Agency (JST) (to NK and HI, respectively) and the Center for the Promotion of Integrated Sciences (CPIS) at Sokendai.