General quantitative genetic methods for comparative biology: phylogenies, taxonomies and multi-trait models for continuous and categorical characters


Jarrod D. Hadfield, School of Biological Sciences, Institute of Evolutionary Biology, University of Edinburgh, Edinburgh EH9 3JT, UK. Tel.: +44 131 650 7782; fax: +44 131 650 6564; e-mail:


Although many of the statistical techniques used in comparative biology were originally developed in quantitative genetics, subsequent development of comparative techniques has progressed in relative isolation. Consequently, many of the new and planned developments in comparative analysis already have well-tested solutions in quantitative genetics. In this paper, we take three recent publications that develop phylogenetic meta-analysis, either implicitly or explicitly, and show how they can be considered as quantitative genetic models. We highlight some of the difficulties with the proposed solutions, and demonstrate that standard quantitative genetic theory and software offer solutions. We also show how results from Bayesian quantitative genetics can be used to create efficient Markov chain Monte Carlo algorithms for phylogenetic mixed models, thereby extending their generality to non-Gaussian data. Of particular utility is the development of multinomial models for analysing the evolution of discrete traits, and the development of multi-trait models in which traits can follow different distributions. Meta-analyses often include a nonrandom collection of species for which the full phylogenetic tree has only been partly resolved. Using missing data theory, we show how the presented models can be used to correct for nonrandom sampling and show how taxonomies and phylogenies can be combined to give a flexible framework with which to model dependence.


The work of Fisher, Haldane and Wright not only established the field of quantitative genetics but made substantial contributions to the field of statistics (Falconer, 1983). These statistical tools are still routinely used in comparative biology, although with a few notable exceptions (Lynch, 1991; Felsenstein, 2005; Naya et al., 2006) the connection with quantitative genetics seems to have been largely lost. In this paper, we aim to reconnect quantitative genetics with comparative biology via the mixed model, highlighting solutions developed in quantitative genetics for problems that appear not to have been addressed or resolved in comparative biology.

Although used across the sciences, mixed models have their origin in quantitative genetics where a large and sophisticated, but perhaps inaccessible literature exists (Lynch & Walsh, 1998; Sorensen & Gianola, 2002; Thompson, 2008). Given their origin, it is perhaps not surprising that an early application of mixed models was to the analysis of data collected on individuals linked through a pedigree – an analysis now known as the ‘animal model’ (Henderson, 1976). In an important paper, Lynch (1991) showed that this same model can be applied to problems in phylogenetic comparative biology despite the difference in timescales over which shared ancestry is measured. Although Lynch's (1991) paper had received little attention until relatively recently (Housworth et al., 2004; Felsenstein, 2008), an equivalent model (Pagel, 1999) was developed independently in the intervening period (Housworth et al., 2004).

A perceived difficulty of Lynch's (1991) original phylogenetic mixed model was that finding the maximum likelihood (ML) estimate was too computationally intensive to make it a practical tool (e.g. Martins, 1996; Diniz-Filho et al., 1998). However, a great deal of quantitative genetic literature had accumulated for efficiently fitting a range of large complex models (for a review, see Thompson et al., 2005) and by at least 1996 this theory had a general implementation in the program ASReml (Gilmour et al., 2002). For many data sets, Lynch's (1991) model could have been fitted in a matter of seconds using restricted maximum likelihood (REML), which became the method of choice in quantitative genetics relatively early (Patterson & Thompson, 1971). By contrast, the ML and generalized least squares (GLS) procedures advocated by Lynch (1991) and Pagel (1999) have largely been superseded in quantitative genetics due to their inherent bias and inflexibility. This bias arises because the methods fail to take into account the uncertainty in the fixed effects, resulting in downwardly biased variance components. The bias is likely to be severe in the context of phylogenetic comparative analyses because the fixed effects are associated with the ancestral state, and the ancestral state usually has high sampling error.

In this paper we start by showing that the relationship between the animal model and the phylogenetic mixed model is deeper than had been noted. The original phylogenetic mixed model was derived by making the analogy between the matrix of phylogenetic distances and the relatedness matrix defined by a pedigree. However, by expanding the phylogenetic covariance matrix to include ancestral nodes we show that these matrices also share several structural properties. More specifically, we show that a phylogeny is mathematically equivalent to an inbred pedigree, where the inbreeding coefficients are equal to the branch lengths. This relationship can be exploited in order to develop algorithms that are more accurate and orders of magnitude faster for large problems.

We go on to emphasize that general solutions and software are already available for dealing with many aspects of comparative analysis that comparative biologists often flag as future avenues of research. We illustrate this by taking three recently published comparative papers (Ives et al., 2007; Adams, 2008; Felsenstein, 2008) and show that they can all be considered phylogenetic meta-analyses in a mixed model framework. By doing this we highlight that the original phylogenetic meta-analysis (Adams, 2008) is implemented incorrectly, and that REML estimates could have been obtained for all three models over a decade ago without the need to develop new statistical tools or software. As a worked example, we re-analyse data collected by Adams (2008) in order to test Bergmann's (1847) rule – an ecological rule predicting a positive intraspecific correlation between body size and latitude.

We go on to discuss mixed model procedures for dealing with imperfect data in the context of comparative biology. In particular, the problem of missing data has received a great deal of attention in quantitative genetics and general methods that correct for nonrandom sampling are available and well understood (e.g. Im et al., 1989; Hadfield, 2008). These results are particularly important in the context of meta-analysis and comparative analysis because they may be able to correct for the publication bias that arises through nonrandom sampling of taxa, for example when common or ‘fluffy’ species are over-represented (Fisher et al., 2003; Nakagawa & Freckleton, 2008). In a similar vein, a complete phylogeny may not be available for all taxa, and we show how taxonomic models (Clutton-Brock & Harvey, 1977) and phylogenetic models can be combined relatively simply using standard methodology. Although not an ideal solution, the method does provide a flexible work-around for analysing data where phylogenetic information is currently incomplete.

We end by discussing phylogenetic generalized linear mixed models for non-Gaussian traits, as standard REML methods are known to be unreliable due to the intractability of the likelihood. Markov chain Monte Carlo (MCMC) methods have proved to be useful tools for solving this problem both in quantitative genetics (Sorensen & Gianola, 2002) and phylogenetics (Pagel et al., 2004; Felsenstein, 2005) and we show how efficient Gibbs samplers from quantitative genetics can be directly used for a wide range of phylogenetic methods. In particular, we discuss in detail a model where the trait can be one of J > 2 nominal states, as this type of model does not appear to have been used in quantitative genetics or comparative biology. The model allows the analysis of continuous and discrete characters to be brought under the same framework by shifting emphasis from evolutionary jumps between states to continuous evolution of the probability for expressing a state. In the context of phenotypic evolution, the proposed model seems to have an easier biological interpretation than currently available alternatives derived from substitution models of DNA (e.g. Pagel, 1994) because it allows for the fact that a whole host of developmental pathways are often required for the expression of complex categorical phenotypes. For example, a flightless stick insect is inherently more likely to produce a flying descendant than a flightless rodent.

Phylogenetic covariance matrix vs. the additive relationship matrix

We give a brief description of the phylogenetic covariance matrix and the additive genetic relationship matrix because it was the link between these two concepts that allowed Lynch (1991) to develop the phylogenetic mixed model. We then show that if the phylogenetic covariance matrix is expanded to include ancestral species then these matrices are also similar in form. This may seem like a technical aside (the technical details are left to the Appendix), but we include it for two reasons. First, it makes the link between the phylogenetic mixed model and the animal model more explicit. Second, the structural properties of the additive genetic relationship matrix have played a key role in the development of robust and efficient algorithms in quantitative genetics. As phylogenetic comparative analyses become larger in scale, it will be useful and perhaps even necessary to exploit these properties.

The additive genetic relatedness matrix (A) is a square matrix equal in dimension to the number of individuals in the pedigree. Element Aij is twice the probability that two alleles drawn at random, one from individual i and the other from individual j, are identical by descent. In the absence of inbreeding this is equivalent to the expected proportion of genes shared by two individuals (i.e. 1 if i = j, 0.5 if i and j are parent and offspring, 0.5 if full-sibs, 0.25 if half-sibs and so on).
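The values quoted above can be reproduced with the recursive 'tabular method' for building A, sketched below. This is an illustration only: the function name, the six-individual pedigree and the requirement that parents are listed before their offspring are assumptions made for the example.

```python
def additive_relationship_matrix(sires, dams):
    """Tabular method for A.

    Individuals are indexed 0..n-1 with parents listed before offspring;
    None marks an unknown parent, treated as an unrelated founder.
    """
    n = len(sires)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relatedness).
        both_known = s is not None and d is not None
        A[i][i] = 1.0 + (0.5 * A[s][d] if both_known else 0.0)
        for j in range(i):
            # Relatedness to an older individual j is the mean of the
            # relatedness between j and each known parent of i.
            a = 0.0
            if s is not None:
                a += 0.5 * A[j][s]
            if d is not None:
                a += 0.5 * A[j][d]
            A[i][j] = A[j][i] = a
    return A


# Hypothetical pedigree: 0 (sire), 1 and 2 (dams) are founders;
# 3 and 4 are full-sibs (0 x 1); 5 is their paternal half-sib (0 x 2).
sires = [None, None, None, 0, 0, 0]
dams = [None, None, None, 1, 1, 2]
A = additive_relationship_matrix(sires, dams)
```

With this pedigree, `A[3][0]` is the parent-offspring value, `A[4][3]` the full-sib value and `A[5][3]` the half-sib value.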

In a phylogenetic context the equivalent matrix is equal in dimension to the number of species at the tips of the phylogeny. In this case the elements Aij are equal to the length of the path from the most recent common ancestor of species i and j to the root of the phylogeny. Generally, the length of the path from the tips to the root of the phylogeny is scaled to unity so that the matrix is the correlation matrix with all the diagonal elements being 1.
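A minimal sketch of this definition, assuming the tree is stored as a parent array with one branch length per node (a hypothetical representation chosen for the example, as are the function name and the three-species tree):

```python
import numpy as np


def phylo_covariance(parent, length, tips):
    """Shared root-to-MRCA path length for every pair of tips.

    parent[i] is the parent of node i (-1 for the root) and length[i] is the
    length of the branch leading to node i.
    """
    def branches_to_root(i):
        nodes = set()
        while parent[i] != -1:
            nodes.add(i)          # node i stands for the branch above it
            i = parent[i]
        return nodes

    paths = [branches_to_root(t) for t in tips]
    A = np.zeros((len(tips), len(tips)))
    for a, path_a in enumerate(paths):
        for b, path_b in enumerate(paths):
            # Branches shared by both tips lie between the root and their MRCA.
            A[a, b] = sum(length[k] for k in path_a & path_b)
    return A


# Hypothetical ultrametric tree ((sp1:0.4, sp2:0.4):0.6, sp3:1.0).
# Node 0 is the root, node 1 the ancestor of sp1 and sp2, nodes 2-4 the tips.
parent = [-1, 0, 1, 1, 0]
length = [0.0, 0.6, 0.4, 0.4, 1.0]
A = phylo_covariance(parent, length, tips=[2, 3, 4])
```

Because the root-to-tip distance is scaled to one, the diagonal of `A` is one and the off-diagonal element for sp1 and sp2 is the length of their shared branch (0.6).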

However, in most statistical applications it is not A that is required, but its inverse A−1. For pedigrees this matrix can be very large and efficient ways of obtaining the inverse made the fitting of these models practical (Henderson, 1976; Quaas, 1976; Meuwissen & Luo, 1992). These algorithms usually start by inserting ‘phantom parents’ into the pedigree so that all individuals can be traced back to a set of unrelated parents. By analogy, we can extend the concept of the phylogenetic covariance matrix to include all ancestral nodes:


where F is a square matrix of dimension n − 2 (the number of internal nodes, excluding the root, where n is the number of tips). In the Appendix, we show why S has the same form as the complete pedigree matrix A, which allows us to apply Henderson's (1976) results directly to the problem of inverting S (i.e. S−1). This equivalence of form is due to the fact that a phylogeny has the same graph structure as a pedigree without fathers, and the branch lengths between parent and child nodes are equivalent to inbreeding coefficients.

In later sections we will often use models parametrized in terms of A−1 rather than S−1 so that the connection with earlier work on comparative analysis is clearer. However, we emphasize that it is usually better to work with the S−1 parametrization, even though this involves including n − 2 missing records. There are three reasons for this. First, S−1 can be formed without the need to use direct inversion techniques such as Gauss–Jordan elimination. A has to be inverted this way, which will be slow for large phylogenies and may suffer from numerical problems as the matrix becomes ill conditioned; this is more likely with phylogenies than pedigrees as the variation in eigenvalues is generally higher (Housworth et al., 2004). Second, S−1 has reduced storage requirements because the number of nonzero elements is linear in n (i.e. 6(n − 1)) and the matrix is said to be ‘sparse’. By contrast, A−1 is dense with the number of nonzero elements nonlinear in n (i.e. $n^2$). Last and most importantly, the pattern of zeros in S−1 allows GLS/mixed model equations to be re-ordered in such a way that the number of arithmetic operations needed to solve them is drastically reduced (for an introduction to sparse matrix methods, see Davis, 2006).
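The sparsity of S−1 can be illustrated without reproducing Henderson's algorithm. Under Brownian motion each non-root node equals its parent plus an independent increment whose variance is the branch length, so that $\mathbf{S}^{-1} = (\mathbf{I} - \mathbf{P})^\top\mathbf{D}^{-1}(\mathbf{I} - \mathbf{P})$, where P indicates each node's parent and D is the diagonal matrix of branch lengths. The code below is a sketch under that assumption and is not the algorithm given in the Appendix; the names `tree_precision`, `parent` and `length` and the three-tip tree are hypothetical, and a dense array is used only for readability.

```python
import numpy as np


def tree_precision(parent, length):
    """Inverse covariance of all non-root node values, built branch by branch.

    parent[i] is the parent of node i (-1 for the root); length[i] is the
    length of the branch leading to node i. No matrix inversion is needed.
    """
    nodes = [i for i in range(len(parent)) if parent[i] != -1]  # drop the root
    index = {node: k for k, node in enumerate(nodes)}
    Sinv = np.zeros((len(nodes), len(nodes)))
    for node in nodes:
        k, w = index[node], 1.0 / length[node]
        Sinv[k, k] += w
        p = parent[node]
        if parent[p] != -1:          # the parent is itself a non-root node
            q = index[p]
            Sinv[q, q] += w
            Sinv[k, q] -= w
            Sinv[q, k] -= w
    return Sinv, nodes


# Hypothetical tree ((sp1:0.4, sp2:0.4):0.6, sp3:1.0); node 0 is the root.
parent = [-1, 0, 1, 1, 0]
length = [0.0, 0.6, 0.4, 0.4, 1.0]
Sinv, nodes = tree_precision(parent, length)

# Check: inverting S^{-1} and keeping the tip rows/columns recovers A.
S = np.linalg.inv(Sinv)
tip_rows = [nodes.index(t) for t in (2, 3, 4)]
A_from_S = S[np.ix_(tip_rows, tip_rows)]
```

Each branch touches at most four entries of the matrix, which is why the number of nonzero elements grows linearly with the number of tips.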

In Appendix S1, we simulate phenotypic data for the recently published mammal super-tree of 4510 species (Bininda-Emonds et al., 2007) and show that depending on the method used, fitting a model with the A−1 parametrization either fails completely or takes up to a month of computing time. By contrast, the S−1 method takes between 0.2 s and 8 min depending on the method used.

Phylogenetic meta-analysis

Using A to represent the phylogenetic relatedness matrix, we show that several recent comparative papers are variations of a common theme – phylogenetic meta-analysis. However, our main point is that this model is also a relatively minor variation of the basic mixed model, for which software has been available for some time.

We consider the simplest, but sufficiently general, phylogenetic meta-analysis, where the test statistic ($y_i$) for study $i$ has the form:

$$y_i = \mu + a_i + e_i + m_i \tag{2}$$


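where $\mu$ is the intercept (the ancestral state of the root node under a Brownian model), $a_i$ are phylogenetic effects, $e_i$ are residuals and $m_i$ are study-specific measurement errors. The random effects are assumed to follow normal distributions:

$$\mathbf{a} \sim N(\mathbf{0}, \sigma^2_a\mathbf{A}), \qquad \mathbf{e} \sim N(\mathbf{0}, \sigma^2_e\mathbf{I}) \tag{3}$$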


where $\sigma^2_e$ is the residual variance and $\sigma^2_a$ is the phylogenetic variance. The identity matrix I represents the assumption that the residuals are independent and identically distributed. $\sigma^2_e$ and $\sigma^2_a$ are generally unknown and must be estimated; however, the distribution of m is often assumed to be known, and can usually be represented by a diagonal matrix M, where each diagonal element is the sampling variance of the published test statistic (often assumed to be the square of the standard error):

$$\mathbf{m} \sim N(\mathbf{0}, \sigma^2_m\mathbf{M}) \tag{4}$$


where for generality we include the measurement error variance $\sigma^2_m$, although this is set to one when the sampling variances are known. In meta-analysis, it is common to talk about weighting, and the weight matrix (W) is equal to the inverse of M.

The (co)variance structure of the data, V, is therefore of the form:

$$\mathbf{V} = \sigma^2_a\mathbf{A} + \sigma^2_e\mathbf{I} + \sigma^2_m\mathbf{M} \tag{5}$$


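In GLS, an estimate of the fixed effects ($\hat{\mu}$) is obtained using:

$$\hat{\mu} = (\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{V}^{-1}\mathbf{y} \tag{6}$$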


where X is the fixed effects design matrix (in this case an n × 1 vector of ones). $\mathbf{V}^{-1}$ is assumed to be known up to proportionality, and when this is satisfied $\hat{\mu}$ is equivalent to the best linear unbiased estimate (BLUE) of μ in a REML analysis.
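As a numerical sketch of the GLS estimator for an intercept-only model, the code below assembles V from a phylogenetic matrix, a residual term and a diagonal matrix of known sampling variances. All numbers (the matrix A, the variances and the effect sizes) are invented for illustration, and `gls_intercept` is a hypothetical helper name.

```python
import numpy as np


def gls_intercept(y, V):
    """GLS estimate of a common intercept and its sampling variance."""
    x = np.ones(len(y))                  # design "matrix": a vector of ones
    Vinv_x = np.linalg.solve(V, x)       # V^{-1} x without forming V^{-1}
    var_mu = 1.0 / (x @ Vinv_x)          # (x' V^{-1} x)^{-1}
    mu_hat = var_mu * (Vinv_x @ y)       # (x' V^{-1} x)^{-1} x' V^{-1} y
    return mu_hat, var_mu


# Made-up inputs: three species, known sampling variances on the diagonal of M.
A = np.array([[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]])
M = np.diag([0.04, 0.09, 0.01])
y = np.array([0.3, 0.5, 0.1])
sigma2_a, sigma2_e, sigma2_m = 0.5, 0.2, 1.0   # assumed known for this sketch

V = sigma2_a * A + sigma2_e * np.eye(3) + sigma2_m * M
mu_hat, var_mu = gls_intercept(y, V)
```

When the phylogenetic and residual variances are both zero, V reduces to M and the estimate is the familiar inverse-variance weighted mean of conventional fixed effect meta-analysis.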

The method of Adams (2008)

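Garland & Ives (2000), in their non-meta-analytic approach, naturally assume that M = 0, but also assume no residual error (i.e. $\sigma^2_e = 0$). Under these assumptions, $\mathbf{A}^{-1}$ and $\mathbf{V}^{-1}$ are proportional because $\mathbf{V} = \sigma^2_a\mathbf{A}$. Because of these assumptions Garland & Ives (2000) are able to project X and y using the matrix Ψ, which allows an ordinary least squares parametrization of the model. This is possible because Ψ is defined as the non-normalized eigenvectors of $\mathbf{A}^{-1}$, and has the property $\boldsymbol{\Psi}^\top\mathbf{A}\boldsymbol{\Psi} = \mathbf{I}$ and so $\mathbf{A} = (\boldsymbol{\Psi}^\top)^{-1}\boldsymbol{\Psi}^{-1} = (\boldsymbol{\Psi}\boldsymbol{\Psi}^\top)^{-1}$, giving:

$$\hat{\mu} = (\mathbf{X}^\top\boldsymbol{\Psi}\boldsymbol{\Psi}^\top\mathbf{X})^{-1}\mathbf{X}^\top\boldsymbol{\Psi}\boldsymbol{\Psi}^\top\mathbf{y} = (\mathbf{X}^\top\mathbf{A}^{-1}\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{A}^{-1}\mathbf{y} \tag{7}$$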


which is equivalent to eqn 6 only because $\mathbf{A}^{-1} \propto \mathbf{V}^{-1}$. This relationship can only be satisfied when there are no additional sources of random variation. This may be unrealistic, and particularly so in the context of meta-analysis where the aim is to take into account the variation in the precision of study-specific estimates, using weighted statistical models.
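The point can be checked numerically. In the sketch below Ψ is built from the eigendecomposition of $\mathbf{A}^{-1}$ (eigenvectors scaled by the square roots of their eigenvalues, which is one way to satisfy $\boldsymbol{\Psi}^\top\mathbf{A}\boldsymbol{\Psi} = \mathbf{I}$). Ordinary least squares on the projected data matches GLS when V is proportional to A, but not once measurement error is added. The matrices and effect sizes are invented for illustration.

```python
import numpy as np

# Made-up inputs: three species with known sampling variances in M.
A = np.array([[1.0, 0.6, 0.0], [0.6, 1.0, 0.0], [0.0, 0.0, 1.0]])
M = np.diag([0.04, 0.09, 0.01])
y = np.array([0.3, 0.5, 0.1])
X = np.ones((3, 1))


def gls(X, y, V):
    """Intercept estimate (X' V^{-1} X)^{-1} X' V^{-1} y for a single column X."""
    Vinv = np.linalg.inv(V)
    return (X.T @ Vinv @ y).item() / (X.T @ Vinv @ X).item()


# Psi: eigenvectors of A^{-1}, each column scaled by the root of its eigenvalue,
# so that Psi Psi' = A^{-1} and Psi' A Psi = I.
evals, evecs = np.linalg.eigh(np.linalg.inv(A))
Psi = evecs * np.sqrt(evals)

# Ordinary least squares on the projected design matrix and data.
mu_projected = np.linalg.lstsq(Psi.T @ X, Psi.T @ y, rcond=None)[0][0]

mu_gls_A = gls(X, y, A)          # V proportional to A: no other variation
mu_gls_meta = gls(X, y, A + M)   # V = A + M: measurement error included
```

Here `mu_projected` and `mu_gls_A` agree, whereas `mu_gls_meta` differs because the projection ignores the measurement error variances.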

Even if $\sigma^2_e = 0$, the expected (co)variances under a phylogenetic meta-analytic model would have the form $\mathbf{V} = \sigma^2_a\mathbf{A} + \sigma^2_m\mathbf{M}$, and V cannot be decomposed unless $\sigma^2_a$ and $\sigma^2_m$ are known. Adams (2008, equation 6) proposes the GLS estimator:


which implies that inline image. However, if this is the case then:


where c is some constant (for the full derivation, see Appendix). As W−1 is a diagonal matrix, eqn 9 can only hold if W−1 can be diagonalized by Ψ, implying (amongst other things) that the taxa are unrelated. Clearly, such a technique is not useful in a phylogenetic context.

The method of Ives et al. (2007)

Ives et al. (2007) derive an analogous model which is correct when $\sigma^2_e = 0$ and so the data have the expected variance structure:

$$\mathbf{V} = \sigma^2_a\mathbf{A} + \mathbf{M} \tag{10}$$


where M is assumed known. Ives et al. (2007) did not explicitly mention the connection with meta-analysis, as M for them represents the precision of species mean estimates rather than the precision of some arbitrary effect size statistic, but eqn 10 shows the models to be analogous to meta-analysis. Ives et al. (2007) give ML, REML and estimated GLS (EGLS) procedures that estimate $\sigma^2_a$ to give estimates of μ. Ives et al. (2007) also consider the problem of the measurement error variance $\sigma^2_m$ for those cases where it is unknown but did not provide a strategy for estimating $\sigma^2_m$.

The method of Felsenstein (2008)

Felsenstein (2008), rather than working with species means, considers data points collected on individuals where multiple individuals may have been measured per species. If Zs is the random effect design matrix relating records (rows) to species (columns), then the expected (co)variance between data points due to measurement error (species) is equal to inline image where inline image is a square block diagonal matrix which is equivalent to M, and inline image is the within-species variance which can be interpreted as inline image. The main differences between this model and Ives et al. (2007) is that the expected within-species variance is assumed to be constant across species, and that the variance (inline image) is unknown and must be estimated. In Ives et al. (2007) the within-species variances are estimated on a species-by-species level, and are incorporated into the model as known quantities.

Again, it is implicitly assumed that $\sigma^2_e = 0$, and we disagree with Felsenstein (2008) that this model is equivalent to the original phylogenetic mixed model of Lynch (1991), because the two models only coincide when the phylogenetic heritability [$H^2 = \sigma^2_a/(\sigma^2_a + \sigma^2_e)$, which is also known as lambda (Pagel, 1999)] is equal to 1. Lynch (1991) did not assume that species means could be completely explained by Brownian motion down a phylogeny, and this is why a residual term was included to model deviations of species means from those expected. Although measurement error may be an important source of these deviations, there are a range of other processes that could cause them, and the inclusion of a residual term could be seen as a robust alternative (Lynch, 1991; Freckleton et al., 2002; Housworth et al., 2004; Revell et al., 2008).

The method of Gilmour (c. 1996)

All of the above methods are standard in statistical quantitative genetics and a great deal of effort has gone into developing efficient computational strategies and understanding the properties of the REML estimators. Since 1996 at least, all of the above models could have been fitted using ASReml (Gilmour et al., 2002). In the Supporting Information, we give the ASReml-R syntax for fitting the models presented above, and in each case we also include a model with a residual term. We do not present the theory, nor the algorithms involved, as this information is widely available (Gilmour et al., 2002, and references therein), but note that all analyses can be fitted using a single line of code.

Worked example: a re-analysis of Bergmann's rule

Adams' (2008) analysis of Bergmann's rule found a mean effect size of 0.2883 (SE 0.0301) using conventional fixed effects meta-analysis, and a mean effect size of 0.672 (SE 0.4745) using phylogenetic meta-analysis.

Using ASReml we fitted the standard meta-analysis and obtained exactly the same result as Adams (2008). However, we prefer to use random effect meta-analysis, which relaxes the assumption that the correlation between latitude and body size would tend to the same value in every species if replication within each species was very large. This type of assumption has been criticized even in the context of controlled clinical trials (Higgins et al., 2009), and would seem untenable when the data are associated with different species (West & Sheldon, 2002). Using this technique the log-likelihood increased by more than 100, indicating a much better fit. Although the mean effect size was broadly similar (0.2271), the standard error increased substantially (SE 0.1156) because the original analysis underestimated the variability considerably. The correlation was not significant at the nominal 0.05 threshold (P = 0.057).
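The contrast between fixed and random effect meta-analysis can be sketched numerically. The effect sizes and sampling variances below are invented and are not the data of Adams (2008); the function name and the grid search are also assumptions, chosen for transparency rather than efficiency, and dedicated REML software would not proceed this way.

```python
import numpy as np


def reml_random_effects(y, s2, grid=np.linspace(0.0, 1.0, 10001)):
    """Grid-search REML for the between-study variance tau2 in
    y_i ~ N(mu, tau2 + s2_i), where s2_i are known sampling variances."""
    best = None
    for tau2 in grid:
        w = 1.0 / (tau2 + s2)
        mu = np.sum(w * y) / np.sum(w)               # GLS intercept given tau2
        # Restricted log-likelihood (up to a constant):
        # -0.5 * [log|V| + log|X'V^{-1}X| + r'V^{-1}r]
        ll = -0.5 * (np.sum(np.log(tau2 + s2))
                     + np.log(np.sum(w))
                     + np.sum(w * (y - mu) ** 2))
        if best is None or ll > best[0]:
            best = (ll, tau2, mu, np.sqrt(1.0 / np.sum(w)))
    return best[1], best[2], best[3]


# Invented effect sizes and sampling variances for six studies.
y = np.array([0.10, 0.45, -0.05, 0.60, 0.25, 0.35])
s2 = np.array([0.010, 0.020, 0.015, 0.030, 0.010, 0.020])

# Fixed effect analysis: between-study variance constrained to zero.
w_fixed = 1.0 / s2
mu_fixed = np.sum(w_fixed * y) / np.sum(w_fixed)
se_fixed = np.sqrt(1.0 / np.sum(w_fixed))

# Random effect analysis: between-study variance estimated by REML.
tau2_hat, mu_random, se_random = reml_random_effects(y, s2)
```

Whenever the estimated between-study variance is positive, the standard error of the mean effect size is larger than under the fixed effect model, which is the pattern reported above.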

Using a phylogenetic meta-analysis under the assumption that $\sigma^2_e = 0$ we obtained a different answer from Adams (2008) due to problems with the original calculations (mean effect size 0.1729, SE 0.0995). Relaxing the assumption that $\sigma^2_e = 0$ we obtained a mean effect size of 0.2271 (SE 0.1156), which coincides with the nonphylogenetic random meta-analysis because the REML estimate of the phylogenetic variance was zero. It is worth noting that the standard non-meta-analytic phylogenetic model gives a different mean effect size (0.4454, SE 0.2364) and indicates a reasonable phylogenetic signal ($H^2 = 0.444$).

In the context of this study we suggest a more direct and powerful approach would be to fit a model of body size with latitude as either an additional response variable or as a fixed effect. Each study would be a separate data point with species fitted as an additional random effect. This analysis also allows species with only a single data point to be included, which are excluded from the former analysis because the correlation coefficient cannot be estimated. If these species are localized, and Bergmann's rule is the result of local adaptation, then these species may well be a nonrandom sample due to reduced gene flow, which in widespread species may weaken the relationship between body size and latitude. In later sections, we discuss multi-response models and biases resulting from missing data, both of which are relevant to meta-analysis.

Taxonomic mixed model

For many comparative analyses, the taxonomic scope is often focused enough that a complete phylogeny is available. However, meta-analyses often include a heterogeneous collection of species for which the full phylogenetic tree has only been partly resolved (for example, Kingsolver et al., 2001). To accommodate this, we show how the classic nested taxonomic model (Clutton-Brock & Harvey, 1977) can be combined with the phylogenetic mixed model (Lynch, 1991; Housworth et al., 2004) to give a flexible framework with which to model phylogenetic dependence. Combining these models is a direct extension of Felsenstein (2008).

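In the nested taxonomic model (Clutton-Brock & Harvey, 1977) we can model the data vector (y) as:

$$\mathbf{y} = \mathbf{1}\mu + \mathbf{Z}_c\mathbf{c} + \mathbf{Z}_o\mathbf{o} + \mathbf{Z}_f\mathbf{f} + \mathbf{Z}_g\mathbf{g} + \mathbf{Z}_s\mathbf{s} + \mathbf{e} \tag{11}$$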


where c, o, f, g, s are vectors of random effects for class, order, family, genus and species respectively. These can be thought of as the expected mean statistics for the different taxa, and these are related to the data by the random effects design matrices Z. These design matrices have dimensions equal to the number of data points (rows) and the number of taxa at the subscripted taxonomic level (columns). The design matrix for the vector of residuals (e) is assumed to be an identity matrix (I) and is therefore omitted. In addition, we assume a simple fixed effects structure where only an intercept is estimated. The taxonomic random effects are assumed to be identically and independently distributed (i.i.d.) and also normally distributed with a mean of zero and a taxonomic level-specific variance. For example, the distribution of genus effects follows:

$$\mathbf{g} \sim N(\mathbf{0}, \sigma^2_g\mathbf{I}) \tag{12}$$
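To make the role of the design matrices concrete, the sketch below builds indicator matrices from taxon labels and combines them into the covariance implied by a two-level taxonomic model. The labels, the variance values and the helper name `design_matrix` are hypothetical.

```python
import numpy as np


def design_matrix(labels):
    """Indicator matrix: one row per record, one column per distinct taxon."""
    taxa = sorted(set(labels))
    Z = np.zeros((len(labels), len(taxa)))
    for row, label in enumerate(labels):
        Z[row, taxa.index(label)] = 1.0
    return Z


# Hypothetical taxonomy for four species (one record each).
family = ["Fa", "Fa", "Fa", "Fb"]
genus = ["Ga", "Ga", "Gb", "Gc"]
Z_f, Z_g = design_matrix(family), design_matrix(genus)

# Made-up variances for family, genus and residual effects.
sigma2_f, sigma2_g, sigma2_e = 0.3, 0.2, 0.5

# Records share a variance component whenever they share a taxon at that level.
V = (sigma2_f * Z_f @ Z_f.T
     + sigma2_g * Z_g @ Z_g.T
     + sigma2_e * np.eye(len(family)))
```

Congeneric species covary through both the family and genus variances, species in the same family but different genera through the family variance only, and species in different families not at all.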


Taxonomic and phylogenetic mixed model

The taxonomic mixed model can be represented as a phylogenetic mixed model allowing taxonomy and phylogeny to be incorporated into a single analysis. Although the two types of model have different biological interpretations, they can be combined to give a description of the data that is consistent with an evolutionary process under certain, perhaps restrictive assumptions.

To illustrate the interpretational differences between the two types of model, we will start with a hypothetical taxonomy with two special properties. First, the taxonomy is an accurate description of the true phylogenetic topology which is polytomous when more than two representatives of a taxon exist. Second, the different taxonomic levels are assumed to be equally spaced in evolutionary time, and thus the taxonomy also captures phylogenetic branch lengths accurately. This second assumption will be relaxed later.

An example of such a taxonomy/phylogeny is depicted in Fig. 1 and can also be represented by the correlation (because the tree is ultrametric) matrix A:

Figure 1. A pictorial representation of a phylogeny where branching events occur at regular intervals and each set of branching events represents a different taxonomic grouping (order, family or genus). Under these conditions the phylogeny and taxonomy can be represented by scaled versions of the same picture/correlation matrix (see eqn 13).

With the inclusion of a residual term the expected (co)variance between data points is


As inline image is an identity matrix, species effects are confounded with the residual component (each species is measured only once) and so an identifiable taxonomic model would be:


where we use the superscript s in inline image to indicate that the term includes variation at the species level (s) through to the residual level (e).

Alternatively, the model can be recast as the phylogenetic model, under the assumption that inline image:


In reality, a taxonomy is unlikely to coincide with a phylogeny exactly, except perhaps in topology, and so it may make more sense to estimate the taxon-specific variances. If it is found that the taxon-specific variances are not equal, then there are two equally valid interpretations. First, it could be that the taxonomy and the phylogeny do coincide and that the different variances represent temporal variation in phylogenetic inertia (Fig. 2). Or alternatively, the phylogenetic signal may be constant over time and the different variances indicate that the taxonomic branch lengths must be rescaled in order to coincide with the real phylogeny (Fig. 3).

Figure 2. A representation of a taxonomic model fitted to a phylogeny (Fig. 1) where the variance explained by each taxonomic level is found to vary.

Figure 3. A reparametrization of the taxonomic model in Fig. 2 so that the variance explained by each taxonomic level is constant, but the branch lengths connecting taxonomic levels are rescaled to give a different phylogeny.

Either interpretation is equally valid without prior information (Paradis, 2006). However, there may be parts of a taxonomy for which the phylogeny is available, and the assumption of a common variance across those taxonomic levels may be valid. For example, it may be that a phylogeny exists at the family level, but the classification of species into genera is by taxonomy. As in the standard taxonomic model, we can superscript phylogenetic effects with the region that they span, from the lowest taxonomic level to the highest, where $a^{o:f}$ indicates that the phylogenetic effects are associated with the complete phylogeny up to the family level (assuming all taxa belong to the same order).

We could then fit the model:


assuming that there are multiple species within genera, and multiple individuals within species, so that the generic and specific variances are estimable. g and s both follow the assumption of identical and independent distribution as in the taxonomic model. The phylogenetic effects ($a^{o:f}$), by contrast, have expected (co)variances proportional to the phylogenetic covariance matrix (A) from the standard phylogenetic mixed model, except that the tips of the phylogeny represent families rather than species (for the ASReml and MCMC syntax for fitting this model, see Supporting Information).

Publication bias and missing data

An important part of meta-analysis is to assess the sensitivity of the parameter estimates to possible biases in the way the data were collected. This bias is often referred to as publication bias (Rothstein et al., 2005) and it can occur at multiple stages of publication (e.g. submission, review or editorial decision). The main cause of publication bias is that statistically significant results are more likely to be published than nonsignificant results which are often relegated to the ‘file drawer’ (Rosenthal, 1979). This essentially results in missing data, where the probability of missingness depends on both effect size and sample size. In a comparative context, however, it is also possible that nonrandom sampling of species may happen due to species’ biology and status (e.g. accessibility, abundance or conservation status), which can be referred to as taxonomic bias (Nakagawa & Freckleton, 2008). In quantitative genetics this type of bias is known as selection bias (Lush & Shrode, 1950), and can occur when individuals in the pedigree have missing phenotypes, usually because they died before they could be measured or before they expressed the trait (Im et al., 1989; Hadfield, 2008). If the missing phenotypes are a nonrandom sample, then biased estimates are possible. However, the problem can be alleviated if other data have been collected that determine the relationship between phenotype and the probability of missingness (Rubin, 1976). If this is the case the data are said to be missing at random (MAR) rather than missing completely at random (MCAR), which covers the intuitive concept of randomness (Little & Rubin, 2002; Nakagawa & Freckleton, 2008). In the context of comparative analysis the condition of MAR would be satisfied if a phylogeny is available that covers those species for which trait data are unavailable, and complete measurements are available for those traits that determine the probability of missingness. 
For example, information on life-history traits may be more likely to be unavailable for rare species than common species. Then, any association between abundance and life history can cause biases if the missing data are not taken into account. However, if abundance is available for all species then by including abundance in the analysis, either as a covariate or an additional response variable, unbiased estimates are possible. This is achieved by updating the missing life-history data conditional on the information provided by abundance using data augmentation, imputation or EM techniques (Fisher et al., 2003; Nakagawa & Freckleton, 2008).
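The abundance example can be illustrated with a small simulation. This is a deliberately simplified sketch: it uses single regression imputation on simulated, phylogenetically independent species rather than the data augmentation or EM procedures cited above, and every number in it (sample size, slope, missingness mechanism) is invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000

# Simulated species: life history is correlated with abundance; its true mean is 0.
abundance = rng.normal(size=n)
life_history = 0.8 * abundance + rng.normal(scale=0.6, size=n)

# Missing at random: rare species are less likely to have life-history data,
# and missingness depends only on abundance, which is known for every species.
p_observed = 1.0 / (1.0 + np.exp(-2.0 * abundance))
observed = rng.random(n) < p_observed

# Ignoring the missing species gives a biased mean.
naive_mean = life_history[observed].mean()

# Conditioning on abundance: fit the regression on the observed species and
# predict life history for the unobserved ones.
slope, intercept = np.polyfit(abundance[observed], life_history[observed], 1)
completed = np.where(observed, life_history, intercept + slope * abundance)
adjusted_mean = completed.mean()
```

In this simulation the naive mean is pulled towards the values typical of common species, whereas the mean that conditions on abundance lies close to the true value of zero.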

It is important to note that in phylogenetic meta-analysis, effect size statistics are often considered to be the species’ trait, for example when the relationship between two variables or difference between two groups are to be summarized (e.g. correlation coefficient or Cohen's d). Therefore, phylogenetic meta-analysis of summary statistics will often suffer from both taxonomic bias and publication bias, and this can be much harder to correct for. The difficulty arises because of uncertainty in the number of missing studies, and the complicated decisions that govern the process of publication. Numerous methods have been developed to detect and to correct for publication bias, but there appears to be no consensus on a general method (Smith et al., 2000; Congdon, 2003; Rothstein et al., 2005). A full review is outside the scope of this paper, but we briefly discuss some simple heuristic techniques for dealing with publication bias.

Although various correlation- or regression-based methods for detection of publication bias have been suggested, they all suffer from low statistical power (Macaskill et al., 2001; Sterne et al., 2005). This is a particular concern because publication bias is more likely to occur, and to cause incorrect estimates, as the number of studies used in a meta-analysis becomes smaller (Moller & Jennions, 2001). Therefore, visual inspection for publication bias using funnel plots (Sterne & Egger, 2005) is generally preferable (but see Kulinskaya et al., 2008). Once publication bias is (visually) detected, correction for it may be necessary, and there are several easy-to-use sensitivity tests available. One of these is the ‘trim and fill’ method (Duval & Tweedie, 2000a,b; Duval, 2005), which relies on assessing funnel plot asymmetry and then adding data points, derived from the existing data points, that make the plot more symmetrical. The trim and fill method has been successfully applied to meta-analysis in ecology and evolution (Jennions & Moller, 2002). In addition, the fail-safe N (file-drawer number; Rosenthal, 1979) and related statistics (e.g. Orwin, 1983; Rosenberg, 2005) have often been reported as a means of assessing the validity of mean effect size estimates in meta-analysis in evolution and ecology (Moller & Jennions, 2001), although more recently these statistics have come under heavy criticism (Becker, 2005).
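As an illustration of the fail-safe N, the sketch below computes Rosenthal's (1979) statistic from a set of study Z-scores. It assumes a one-tailed critical value of 1.645 and that the unpublished studies have a mean Z of zero; the function names are hypothetical.

```python
import math

def fail_safe_n(z_scores, z_alpha=1.645):
    """Number of unpublished null studies (Z = 0) needed to bring the
    combined Stouffer Z down to the critical value z_alpha."""
    k = len(z_scores)
    return max(0.0, (sum(z_scores) / z_alpha) ** 2 - k)

def combined_z(z_scores, n_extra=0.0):
    """Stouffer's combined Z after adding n_extra studies with Z = 0."""
    return sum(z_scores) / math.sqrt(len(z_scores) + n_extra)
```

For example, with `z = [2.0, 1.5, 2.5, 1.0, 3.0]`, adding `fail_safe_n(z)` null studies makes `combined_z(z, fail_safe_n(z))` equal to the critical value. The criticisms cited above still apply to the statistic itself.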

Markov Chain Monte Carlo techniques for non-Gaussian traits

Phylogenetic mixed models have mainly been applied to traits which are assumed to be normally distributed (for exceptions, see Felsenstein, 2005; Naya et al., 2006). Generalized linear mixed models extend the linear mixed model to non-Gaussian responses, although model fitting has proved more difficult because the likelihood cannot be obtained in closed form. MCMC techniques solve this problem by breaking the high-dimensional joint distribution into a series of lower dimensional conditional distributions which are easier to sample from. By repeatedly sampling from these conditional distributions it is possible to approximate the complete joint distribution accurately, and thereby extract quantities of interest (often marginal distributions).
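The idea of replacing a joint distribution by a sequence of conditional draws can be sketched with a toy example unrelated to any particular phylogenetic model: a Gibbs sampler for a standard bivariate normal with correlation rho, in which each coordinate is drawn from its univariate conditional in turn. The function names and default settings are assumptions made for illustration.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_draws=20000, burn_in=1000, seed=42):
    """Sample (x, y) from a standard bivariate normal by alternating
    draws from x | y and y | x, each N(rho * other, 1 - rho^2)."""
    rng = random.Random(seed)
    conditional_sd = math.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    draws = []
    for iteration in range(burn_in + n_draws):
        x = rng.gauss(rho * y, conditional_sd)
        y = rng.gauss(rho * x, conditional_sd)
        if iteration >= burn_in:
            draws.append((x, y))
    return draws

def sample_correlation(draws):
    """Pearson correlation of the sampled pairs."""
    n = len(draws)
    mean_x = sum(x for x, _ in draws) / n
    mean_y = sum(y for _, y in draws) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in draws)
    sxx = sum((x - mean_x) ** 2 for x, _ in draws)
    syy = sum((y - mean_y) ** 2 for _, y in draws)
    return sxy / math.sqrt(sxx * syy)
```

The sample correlation of the draws should approach rho as the chain lengthens, even though the joint distribution is never sampled directly.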

For a thorough description of MCMC methods in the context of quantitative genetics we refer the interested reader to Sorensen & Gianola (2002). Here, we describe an MCMC algorithm for fitting a basic phylogenetic model and highlight those aspects which differ from already published results. Due to lack of space we are not able to cover the complete range of models that can be fitted using this technique, and so we restrict ourselves by highlighting a model with a nominal multinomial response, as this model does not seem to have been applied in quantitative genetics.

The general MCMC algorithm is described in detail elsewhere (Hadfield, 2010), and we also provide an R library (MCMCglmm) which is accompanied by a user manual in which the full range of supported models is discussed in greater detail. In brief, Gaussian, Poisson, Zero-inflated Poisson, Binomial, Multinomial, Ordinal and Exponential distributions are supported. More than one response variable is allowed, and the multiple responses can follow different distributions. Missing and censored data for the responses are tolerated under the assumption of a MAR process (Rubin, 1976). Any number of fixed and/or random effects can be fitted, and the random effects can be i.i.d. (as in species effects) or correlated (as in phylogenetic effects). The routines are fast, as all posterior simulations are done in compiled C++ using direct methods for sparse linear systems (Davis, 2006).

The basic model

With a Gaussian response the linear model is applied to y:

$$\mathbf{y} = \mathbf{W}\boldsymbol{\theta} + \mathbf{e}$$

where W is a design matrix which relates the predictors to the data, θ is a vector of location effects (fixed and random effects), and e is a vector of residuals.

With non-Gaussian data a latent variable (l) is introduced which is the canonical parameter of some distribution on the link scale. For example, if datum $y_i$ is Poisson distributed and a log link specified then the assumption is:

$$y_i \sim \mathrm{Poisson}\left(\lambda_i = \exp(l_i)\right)$$

where λ is the canonical parameter of the Poisson distribution (often called the rate parameter or mean parameter) and exp is the inverse link function.

In this case the linear model is applied to l:

$$\mathbf{l} = \mathbf{W}\boldsymbol{\theta} + \mathbf{e}$$

There is little distinction between fixed and random effects in a Bayesian analysis (hence we represent both with θ). The ‘fixed’ effects (β) are usually assumed a priori to be independently distributed about zero with specified variance ($\sigma^2_{\beta}$):

$$\boldsymbol{\beta} \sim N(\mathbf{0}, \sigma^2_{\beta}\mathbf{I})$$

Usually $\sigma^2_{\beta}$ is set to something large in order to represent diffuse prior knowledge. The ‘random’ effects (a) are also assumed to come from a normal prior distribution with zero mean, but the variance ($\sigma^2_a$) is usually inferred a posteriori:

$$\mathbf{a} \sim N(\mathbf{0}, \sigma^2_a\mathbf{I})$$

and in the case of phylogenetic effects the identity matrix (I) is replaced by a relationship matrix (A). In the simplest case, the residuals are assumed to be i.i.d.:

$$\mathbf{e} \sim N(\mathbf{0}, \sigma^2_e\mathbf{I})$$

where the residual variance $\sigma^2_e$ is also estimated.

Parameter estimation

There are three types of parameter to estimate: the latent variables (l), the location effects (θ) and the variance components, namely the variance of the random effects ($\sigma^2_a$) and the residual variance ($\sigma^2_e$). We describe the basic sampling schemes using the example of a single trait following a Poisson distribution.

The latent variables (l) do not have a recognizable conditional distribution and so we sample them one at a time using Metropolis–Hastings updates (Metropolis et al., 1953; Hastings, 1970). The conditional probability of the latent variable is proportional to the product of two terms: the Poisson likelihood of the data given l and the normal likelihood of e given a mean of zero and the residual variance $\sigma^2_e$.
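A single-site update of this kind might be sketched as follows. This is a minimal illustration, not the MCMCglmm implementation: it assumes a symmetric random-walk proposal (so the Hastings correction cancels), writes `mu` for the linear predictor of the observation, and uses hypothetical function names and a fixed, untuned step size.

```python
import math
import random

def log_conditional(l, y, mu, var_e):
    """Unnormalised log conditional density of one latent variable."""
    poisson_part = y * l - math.exp(l)              # log Poisson(y | exp(l)), dropping -log(y!)
    normal_part = -((l - mu) ** 2) / (2.0 * var_e)  # log density of e = l - mu, dropping constants
    return poisson_part + normal_part

def metropolis_update(l, y, mu, var_e, rng, step=0.5):
    """Propose l + N(0, step^2) and accept with the Metropolis probability."""
    proposal = l + rng.gauss(0.0, step)
    log_ratio = log_conditional(proposal, y, mu, var_e) - log_conditional(l, y, mu, var_e)
    if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
        return proposal
    return l

def sample_latent(y, mu, var_e, n_draws=5000, burn_in=500, seed=7):
    """Run the update repeatedly for one observation, holding mu and var_e fixed."""
    rng = random.Random(seed)
    l = mu
    draws = []
    for iteration in range(burn_in + n_draws):
        l = metropolis_update(l, y, mu, var_e, rng)
        if iteration >= burn_in:
            draws.append(l)
    return draws
```

In a full sampler `mu` and `var_e` would change between iterations as the location effects and variance components are updated; here they are held fixed so that the update can be examined in isolation. The draws should concentrate between the linear predictor and log(y), reflecting the compromise between the two terms.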

The location effects are multivariate normal, conditional on the latent variables and variance components, and can be Gibbs sampled (Geman & Geman, 1984). We use the Gibbs sampling method of Garcia-Cortes & Sorensen (2001) which updates all the location effects in a single pass and avoids inverting the mixed model coefficient matrix.

The variance components follow scaled inverted chi-squared distributions with scale equal to the cross-product of the ‘random’ location effects, or the cross-product of the residuals in the case of the residual variance $\sigma^2_e$. As the variance components come from a known distribution they can also be Gibbs sampled. If the random effects are correlated, as they will be for the phylogenetic effects, the scale is obtained using $\mathbf{a}'\mathbf{A}^{-1}\mathbf{a}$ rather than the direct cross-product $\mathbf{a}'\mathbf{I}^{-1}\mathbf{a} = \mathbf{a}'\mathbf{a}$.
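The Gibbs update for a variance component can be sketched as below. The scaled inverse chi-squared prior with parameters `prior_df` and `prior_scale` is an assumption introduced for illustration, as are the function names; the chi-squared variate is generated as a gamma draw with shape df/2 and scale 2.

```python
import random

def quadratic_form(a, A_inv):
    """Compute a' A^{-1} a for a list a and a nested-list matrix A_inv."""
    n = len(a)
    return sum(a[i] * A_inv[i][j] * a[j] for i in range(n) for j in range(n))

def sample_variance(a, A_inv, rng, prior_df=1.0, prior_scale=1.0):
    """Draw a variance from its scaled inverse chi-squared full conditional."""
    df = prior_df + len(a)
    sum_of_squares = prior_df * prior_scale + quadratic_form(a, A_inv)
    chi_squared = rng.gammavariate(df / 2.0, 2.0)  # chi-squared(df) as a gamma draw
    return sum_of_squares / chi_squared
```

Passing the identity matrix as `A_inv` gives the update for i.i.d. effects or residuals, and passing the inverse relationship matrix gives the update for phylogenetic effects. A practical implementation would exploit the sparsity of the inverse rather than looping over all pairs.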

Extensions to multiple responses

Multiple response models are not widely used in comparative biology but can be useful in many situations. Extending the model to the multivariate case is straightforward: the data vectors and latent variables for each trait are concatenated and the mixed model equations structured accordingly. For simplicity, we consider a bivariate Poisson analysis with response vector

$$\mathbf{y} = \begin{bmatrix}\mathbf{y}^{(1)} \\ \mathbf{y}^{(2)}\end{bmatrix}$$

and latent variable vector

$$\mathbf{l} = \begin{bmatrix}\mathbf{l}^{(1)} \\ \mathbf{l}^{(2)}\end{bmatrix}$$

In multi-response models it is usual to replace the variance components with their multivariate analogues, (co)variance matrices, which denote the variance within each trait and between each trait for the designated source of variance. For example, if Va is a 2 × 2 matrix with the phylogenetic variance for the two traits along the diagonal, and the covariance between the phylogenetic effects for the two traits in the off-diagonals, then the complete set of phylogenetic effects

$$\mathbf{a} = \begin{bmatrix}\mathbf{a}^{(1)} \\ \mathbf{a}^{(2)}\end{bmatrix}$$

have the expected distribution $N(\mathbf{0}, \mathbf{V}_a \otimes \mathbf{A})$ where ⊗ is the Kronecker product.
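The Kronecker structure of the phylogenetic (co)variance can be made concrete with a small numerical sketch. The helper `kronecker` and the example matrices are hypothetical; effects are assumed to be ordered trait by trait, matching the concatenated vectors above.

```python
def kronecker(V, A):
    """Kronecker product of two nested-list matrices: each element of V
    multiplies a full copy of A."""
    rows_a, cols_a = len(A), len(A[0])
    return [[V[i // rows_a][j // cols_a] * A[i % rows_a][j % cols_a]
             for j in range(len(V[0]) * cols_a)]
            for i in range(len(V) * rows_a)]

# Hypothetical 2 x 2 phylogenetic (co)variance matrix for two traits ...
V_a = [[1.0, 0.5],
       [0.5, 2.0]]
# ... and a relationship matrix for two species sharing half their history.
A = [[1.0, 0.5],
     [0.5, 1.0]]

covariance = kronecker(V_a, A)  # 4 x 4 covariance of [a(1), a(2)]
```

The upper-left block of `covariance` is the variance of trait 1 times A, the lower-right block is the variance of trait 2 times A, and the off-diagonal blocks are the between-trait covariance times A.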

The latent variables in the multi-response model are sampled using Metropolis–Hastings updates as before, although they are updated in blocks corresponding to observational units (i). In the bivariate case, this involves updating the latent variables $l_i^{(1)}$ and $l_i^{(2)}$ together, where the likelihood is proportional to the product of three densities: the Poisson density of $y_i^{(1)}$ given $l_i^{(1)}$, the Poisson density of $y_i^{(2)}$ given $l_i^{(2)}$, and the density of the residuals

$$\mathbf{e}_i = \begin{bmatrix} e_i^{(1)} \\ e_i^{(2)} \end{bmatrix}$$

from a bivariate normal distribution with null mean vector and (co)variance matrix Ve (the multivariate analogue of the residual variance $\sigma^2_e$).

The location effects remain multivariate normal as before and so can be Gibbs sampled in a single pass following the multivariate extension (Korsgaard et al., 2003) of the Gibbs sampling method of Garcia-Cortes & Sorensen (2001). The covariance matrices are Gibbs sampled using the multivariate analogue of the scaled inverse chi-squared distribution: the inverse Wishart distribution. The scale matrix in the multivariate case has a similar form: $[\mathbf{a}^{(1)}\ \mathbf{a}^{(2)}]'\,\mathbf{A}^{-1}\,[\mathbf{a}^{(1)}\ \mathbf{a}^{(2)}]$.

Although similar in many respects to Felsenstein's (2005) application of MCMC to the comparative analysis of binary traits, the method differs in three respects. First, the model is constructed explicitly as a generalized linear model so that the concept of the latent variable can be extended beyond binary traits to other types of distribution. Second, the model can be more easily generalized to multi-trait models where the different traits can follow different distributions. Third, all fixed and random effects are sampled simultaneously from their multivariate conditional distribution rather than one at a time from univariate conditional distributions, which is expected to improve the efficiency of the algorithm substantially (Roberts & Sahu, 1997).

Multinomial phylogenetic mixed model

We briefly describe some results additional to those described above that are required to fit a multinomial logit model for more than two nominal categories. The model does not appear to have been used in quantitative genetics but is quite widely used in econometrics and political science (Congdon, 2005). However, fitting such a model is a direct extension of a multivariate binary mixed model, with additional constraints on parameter space and a slight modification to the latent variable likelihood. If the categories are ordered, then it is possible to work with a different parametrization of the model presented below (Hedeker, 2003), or alternatively the ordered multinomial probit model can be used, which has been used in quantitative genetics (Gianola & Foulley, 1983).

In a binary model, a single data point can be one of two categories (J = 2), and this can be expressed as a univariate binary variable. Likewise, if J > 2 then it is usual to use a transformation that reduces the problem to J − 1 dimensions (Daganzo, 1979). In the binary model, the motivation for the dimension reduction is obvious; if a variable increased the probability of expressing the first category by 10%, it must by necessity reduce the probability of expressing the second category by 10% because an individual cannot express both categories simultaneously. The dimension reduction essentially constrains the probabilities of expressing the first and the second category to sum to unity [Pr(Yi = j1) + Pr(Yi = j2) = 1]. For the three-category case, we will think of the three colours: red, black and white. Denoting αij as the probability that species i is colour j, the unit sum constraint has the form $\sum_{j=1}^{J}\alpha_{ij} = 1$. To reduce the problem to J − 1 dimensions, it is usual to work with the parametrization in terms of a log odds ratio with respect to an arbitrary baseline category (we will use the first category, red) so that

$$l_{ij} = \log\left(\frac{\alpha_{ij}}{\alpha_{i1}}\right) \quad \text{for } j = 2, \ldots, J$$

where lij is the latent variable for individual i and colour j. The problem can be represented using the contrast matrix Δ (Bunch, 1991), which for three categories is

$$\boldsymbol{\Delta} = \begin{bmatrix} -1 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}$$

so that $\mathbf{l}_i = \boldsymbol{\Delta}'\log(\boldsymbol{\alpha}_i)$.

For a simple fixed effects model:

$$\mathbf{l}_i = \mathbf{X}_i\boldsymbol{\beta} + \mathbf{e}_i$$

where β is the vector of fixed effects and Xi is the design matrix for species i, which has J − 1 rows. Likewise, the residual (Ve) and any random effect (e.g. Va) covariance matrices are, for estimability purposes, estimated on the J − 1 space: $\mathbf{V}_e = \boldsymbol{\Delta}'\tilde{\mathbf{V}}_e\boldsymbol{\Delta}$ and $\mathbf{V}_a = \boldsymbol{\Delta}'\tilde{\mathbf{V}}_a\boldsymbol{\Delta}$, where the tilde indicates a covariance matrix on the scale of the three log(α)’s. Moreover, as there is only a single realization from the multinomial, no element of Ve is estimable and Ve must be fixed. The choice of Ve is essentially arbitrary, and we choose to fix $\mathbf{V}_e = \frac{1}{J}(\mathbf{I} + \mathbf{J})$, where I is the identity matrix and the bold J is the unit matrix (a matrix of ones). We can visualize the unit sum constraint for three categories as a model parametrized on the simplex (Fig. 4).
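The baseline-category parametrization can be sketched numerically. The code below maps between probabilities and log odds relative to the first category, builds the corresponding contrast matrix, and transforms a covariance matrix from the log(α) scale to the J − 1 dimensional contrast scale. All function names are hypothetical, and the check that a covariance of (1/J)I on the log(α) scale maps to (1/J)(I + J) on the contrast scale (with the second J the unit matrix) is included purely as an illustration for J = 3.

```python
import math

def latent_to_probabilities(latent):
    """Map J-1 log odds (relative to the baseline category) to J probabilities."""
    weights = [1.0] + [math.exp(value) for value in latent]
    total = sum(weights)
    return [weight / total for weight in weights]

def probabilities_to_latent(alpha):
    """Map J probabilities to J-1 log odds relative to the first category."""
    return [math.log(a / alpha[0]) for a in alpha[1:]]

def contrast_matrix(n_categories):
    """J x (J-1) contrast matrix with the first category as baseline."""
    delta = [[-1.0] * (n_categories - 1)]
    for j in range(n_categories - 1):
        delta.append([1.0 if k == j else 0.0 for k in range(n_categories - 1)])
    return delta

def to_contrast_scale(delta, V_tilde):
    """Compute Delta' V_tilde Delta using nested lists."""
    n_rows, n_cols = len(delta), len(delta[0])
    return [[sum(delta[r][a] * V_tilde[r][c] * delta[c][b]
                 for r in range(n_rows) for c in range(n_rows))
             for b in range(n_cols)]
            for a in range(n_cols)]
```

For example, `latent_to_probabilities(probabilities_to_latent([0.5, 0.3, 0.2]))` recovers the original probabilities, which sum to one by construction.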

Figure 4.

The three axes in the left panel represent the probability of belonging to one of the three colours: red, black and white. In this example, we assume that there are only three possible colours and so the two-simplex (triangle) represents the parameter space where the probability of being one of the three colours is equal to one. With more than three categories, the model is difficult to represent, but the parameter space of a four-category model would be a three-simplex (tetrahedron) in a four-dimensional subspace. Three points have been plotted in parameter space (A, B and C) and for ease of interpretation, let us assume that these are the probabilities associated with an ancestral state. If the phylogenetic heritability is high (or the phylogenetic distance between ancestor and descendant small) then these probabilities should predict well the probabilities associated with their descendants. For example, descendants of species A should be red with high probability, and descendants of species B should be red or white with equal probability but are unlikely to be black. Finally, descendants of species C are equally likely to express any one of the three colours. It should be understood that the uncertainty regarding the expressed colours of species B, and particularly species C, does not reflect uncertainty in the underlying probability; species C really does have an equal propensity to express any of the colours and the uncertainty is associated with exactly which colour is manifest. Although predictions of individual random effects (BLUPs) could be plotted as above, it is more usual to interpret the distribution of the random effects. When these distributions are multivariate normal they can be represented by ellipses circumscribing regions of trait space that have equal density. 
The right panel shows a hypothetical multivariate normal distribution on the scale of the log contrasts, the scale on which the assumption of normality was made (which includes the scale of the log probabilities log(α) also). The distribution is nonstandard on the inverse link (probability) scale, and so cannot be represented by ellipses on the simplex. However, the probability distribution can be approximated using posterior simulation as shown in the central panel. The distribution is seen to have triangular axes of symmetry and this is the motivation behind the choice of $\frac{1}{J}(\mathbf{I} + \mathbf{J})$ for Ve.


In this paper, we develop both classic and new results from quantitative genetics in the context of phylogenetic comparative analysis. We show how the relationship between pedigree structure and phylogeny structure can be used to exploit efficient algorithms already developed for ‘animal model’ analyses. We also suggest that many of the techniques currently being developed in comparative biology have already existed as standard REML tools in quantitative genetics for at least a decade. More recently, Bayesian models have gained popularity in quantitative genetics for certain types of problem for which standard techniques were known to behave poorly (Sorensen & Gianola, 2002; O'Hara et al., 2008). Many of these problems also exist for the comparative method, and we show how quantitative genetic MCMC algorithms can be implemented for phylogenies. Taken together, these results not only demonstrate that the theory and software for fitting the original phylogenetic mixed model is well developed, but also that extensions dealing with meta-analysis, measurement error, missing data, multi-trait models, non-Gaussian data, small-sample inference and uncertainty in derived quantities such as phylogenetic heritability are readily available. In addition, we extend the quantitative genetic models to give a multinomial phylogenetic mixed model for analysing categorical traits: a type of model that does not appear to have been used in either field, quantitative genetics or comparative biology.

The multinomial model developed in this paper follows a logic similar to the threshold model of Wright (1934a,b) for binary characters, in that a continuous polygenic probability is postulated to underlie which state is manifest. The threshold model received some of the earliest Bayesian treatments in quantitative genetics (Foulley et al., 1983; Gianola & Foulley, 1983), and this was followed by the development of MCMC procedures (Sorensen et al., 1995), which are discussed at length in Sorensen & Gianola (2002). Felsenstein (2005) discusses this threshold model in the context of phylogenies and provides alternative computational strategies, also using MCMC, for estimating the relevant parameters. Alternative methods exist, mainly based around models of substitutions in DNA sequences (Pagel, 1994; Huelsenbeck et al., 2003; Pagel & Meade, 2006), but Felsenstein (2005) gives a persuasive argument as to why the quantitative genetic approach is to be preferred for the comparative analysis of categorical phenotypes. From a statistical perspective, a multi-trait threshold model can often be parametrized with far fewer parameters, which, given the small amount of information in phylogenies, reduces the chances of trying to fit over-identified models (Felsenstein, 2005). From a biological perspective, we believe the substitution model is harder to interpret in terms of phenotypic evolution because it assumes that the probability of jumping from the zero state to the one state is constant over the phylogeny. When focus shifts to the underlying probability, species that have zero states, but that are found in clades with many species in the one state, have an increased chance of flipping to the one state relative to species found in clades dominated by zero states. This reasoning seems natural because the expression of phenotypes, even categorical ones, is often dependent on a whole range of developmental and biochemical processes being in place. 
For example, the re-appearance of wings in a wingless lineage of stick insects should be much less surprising than the appearance de novo of wings on guinea pigs. This being said, both the threshold model and the substitution model may be equivalent in the univariate case, and the differences may be merely interpretational; the substitution model would, no doubt, infer a wingless ancestral state to the rodents with very high confidence.

As Housworth et al. (2004) noted, although phylogenies tend to be more informative than equally sized pedigrees, typically phylogenies are orders of magnitude smaller than pedigrees in evolutionary biology. As measures of uncertainty and hypothesis testing are often conducted using Fisher information or likelihood ratio tests, both of which rely on large-sample asymptotic behaviour, statistical inference from small sample sizes should be treated with some caution. Although resampling techniques are often used to overcome this problem, there seems to be little formal justification for the resampling strategies employed (but see Lapointe & Garland, 2001), which is surprising given that resampling methods for dependent data are usually nontrivial to develop (Shao & Tu, 1996). Furthermore, naive application of basic resampling methods often gives incorrect results (Rao & Wu, 1988). For example, the validity of permuting species data over the phylogeny in a phylogenetic meta-analysis is unclear given that permutation tests require the data to be exchangeable under the null hypothesis. In Adams' (2008) model, the stated null hypothesis was a zero effect size, not a lack of phylogenetic inertia, so it seems unlikely that the data are expected to be independent under the null hypothesis. It would seem that the permutation test will give the distribution of the test statistic in the absence of phylogenetic inertia, irrespective of whether Bergmann's rule exists or not. Bayesian inference makes no large-sample approximations and the resulting posterior distributions are an accurate description of uncertainty given the probability model (Gelman et al., 2004). However, with small phylogenies the relative importance of prior information may increase, and effort should be made in obtaining accurate prior information and checking the sensitivity of the results to alternative prior specifications.

Although the generalized phylogenetic mixed model is a flexible tool for comparative analysis, having a range of other models as special cases (e.g. independent contrasts, Felsenstein, 1985; nested taxonomic model, Clutton-Brock & Harvey, 1977), there are alternative comparative methods that are based on different variance structures (Hansen & Martins, 1996; Martins & Hansen, 1997). These alternative models are often identical to well-known models in time-series analysis and geostatistics (Ives & Zhu, 2006); for example, the simple Ornstein–Uhlenbeck model (Hansen & Martins, 1996) is equivalent to the isotropic exponential model, the continuous-time analogue of the first-order autoregressive model. We are not familiar with these fields, but given that they are both large and have a long history we suspect that many of the problems involved with model fitting, over-identification, missing data and prediction already have good and well-tested solutions for these types of model. Following Ives & Zhu (2006), we stress that although phylogenies are unique and inherently interesting to biologists, statistically they are little different from space, time or pedigrees, all of which have been the focus of much statistical research.


Much of this work originated through discussion with Albert Phillimore, who made extensive comments on previous drafts. We also thank Rob Freckleton, Gregor Gorjanc, Loeske Kruuk, Losia Lagisz, Darren Obbard, Ian Owens and Alastair Wilson. JDH was supported by a Leverhulme prize awarded to Loeske Kruuk, and a Natural Environment Research Council (UK) fellowship.


To obtain (E)GLS, REML or Bayesian estimates, it is necessary to form $\mathbf{A}^{-1}$ or $\mathbf{S}^{-1}$ where


F is a square matrix of dimension n − 2 (the number of internal nodes, excluding the root, where n is the number of tips). Following Henderson (1976), the lower triangular matrix (L) of the Cholesky decomposition S = LL′ can be formed recursively, with nodes ordered so that each node follows its parent:

$$L_{tj} = L_{p_t j} \quad (j < t), \qquad L_{tt} = \sqrt{f_t}$$

where pt indexes the parental node of t and ft is the length of the branch connecting node t to pt; the off-diagonal elements of row t are zero when pt is the root. It is interesting to note that ft is equivalent to the inbreeding coefficient in the context of a pedigree.

Note that L = TD where D is a diagonal matrix with diagonal elements equal to those of L ($\sqrt{f_t}$), and T has the same nonzero pattern as L, but with all nonzero elements equal to one. Henderson (1976) shows that T is easy to invert, with $\mathbf{T}^{-1}$ being a lower triangular matrix with all diagonal elements equal to 1, and the nonzero element to the left of the diagonal in the tth row being −1 in the column corresponding to the node pt. This is useful because:

$$\mathbf{S}^{-1} = (\mathbf{T}^{-1})'\,\mathbf{D}^{-2}\,\mathbf{T}^{-1}$$

and because D is diagonal, $\mathbf{D}^{-2}$ is also diagonal, each diagonal element being that of D taken to the power −2. This leads to a simple recursive algorithm for the inverse:

$$(\mathbf{S}^{-1})_{tt} = \frac{1}{f_t} + \sum_{c \in \mathcal{C}_t}\frac{1}{f_c}, \qquad (\mathbf{S}^{-1})_{t p_t} = (\mathbf{S}^{-1})_{p_t t} = -\frac{1}{f_t}$$

where $\mathcal{C}_t$ is the set of t’s child nodes.
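The construction can be checked numerically on a small tree. The sketch below is an illustration under stated assumptions rather than the published algorithm verbatim: nodes (excluding the root) are assumed to be numbered so that each parent precedes its children, `parent[t]` is `None` when the parent is the root, and all function names are hypothetical. Each node t contributes 1/ft to its own diagonal element and to that of its parent, and −1/ft to the two off-diagonal elements linking them.

```python
import math

def cholesky_factor(parent, branch_length):
    """Lower triangular L with S = L L': row t copies its parent's row and
    places sqrt(f_t) on the diagonal."""
    m = len(parent)
    L = [[0.0] * m for _ in range(m)]
    for t in range(m):
        p = parent[t]
        if p is not None:
            for j in range(p + 1):
                L[t][j] = L[p][j]
        L[t][t] = math.sqrt(branch_length[t])
    return L

def inverse_relationship(parent, branch_length):
    """Build S^{-1} directly from the tree, without forming or inverting S."""
    m = len(parent)
    S_inv = [[0.0] * m for _ in range(m)]
    for t in range(m):
        weight = 1.0 / branch_length[t]
        S_inv[t][t] += weight
        p = parent[t]
        if p is not None:
            S_inv[p][p] += weight
            S_inv[t][p] -= weight
            S_inv[p][t] -= weight
    return S_inv

def transpose(X):
    return [list(column) for column in zip(*X)]

def multiply(X, Y):
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*Y)]
            for row in X]

# Hypothetical three-tip tree: node 0 is internal, nodes 1 and 2 are its
# tip descendants, and node 3 is a tip attached directly to the root.
parent = [None, 0, 0, None]
branch_length = [0.4, 0.6, 0.6, 1.0]
```

Forming `S = multiply(L, transpose(L))` from the Cholesky factor and multiplying by the output of `inverse_relationship` should return the identity matrix, which provides a simple check on both recursions. The direct construction touches each node once, so the inverse remains sparse however large the tree.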

Using expectation–maximization (EM) (Dempster et al., 1977) techniques for REML or data augmentation (Tanner & Wong, 1987) techniques for a Bayesian analysis, we can treat the ancestral states as missing data and work with the $\mathbf{S}^{-1}$ parametrization rather than the usual $\mathbf{A}^{-1}$ parametrization.