### Abstract

- Top of page
- Abstract
- Introduction
- Phylogenetic covariance matrix vs. the additive relationship matrix
- Phylogenetic meta-analysis
- Worked example: a re-analysis of Bergmann's rule
- Taxonomic mixed model
- Publication bias and missing data
- Markov Chain Monte Carlo techniques for non-Gaussian traits
- Discussion
- Acknowledgments
- References
- Appendix
- Supporting Information

Although many of the statistical techniques used in comparative biology were originally developed in quantitative genetics, subsequent development of comparative techniques has progressed in relative isolation. Consequently, many of the new and planned developments in comparative analysis already have well-tested solutions in quantitative genetics. In this paper, we take three recent publications that develop phylogenetic meta-analysis, either implicitly or explicitly, and show how they can be considered as quantitative genetic models. We highlight some of the difficulties with the proposed solutions, and demonstrate that standard quantitative genetic theory and software offer solutions. We also show how results from Bayesian quantitative genetics can be used to create efficient Markov chain Monte Carlo algorithms for phylogenetic mixed models, thereby extending their generality to non-Gaussian data. Of particular utility is the development of multinomial models for analysing the evolution of discrete traits, and the development of multi-trait models in which traits can follow different distributions. Meta-analyses often include a nonrandom collection of species for which the full phylogenetic tree has only been partly resolved. Using missing data theory, we show how the presented models can be used to correct for nonrandom sampling and show how taxonomies and phylogenies can be combined to give a flexible framework with which to model dependence.

### Introduction

The work of Fisher, Haldane and Wright not only established the field of quantitative genetics but made substantial contributions to the field of statistics (Falconer, 1983). These statistical tools are still routinely used in comparative biology, although with a few notable exceptions (Lynch, 1991; Felsenstein, 2005; Naya *et al.*, 2006) the connection with quantitative genetics seems to have been largely lost. In this paper, we aim to reconnect quantitative genetics with comparative biology via the mixed model, highlighting solutions developed in quantitative genetics for problems that appear not to have been addressed or resolved in comparative biology.

Although used across the sciences, mixed models have their origin in quantitative genetics, where a large and sophisticated, but perhaps inaccessible, literature exists (Lynch & Walsh, 1998; Sorensen & Gianola, 2002; Thompson, 2008). Given their origin, it is perhaps not surprising that an early application of mixed models was to the analysis of data collected on individuals linked through a pedigree – an analysis now known as the ‘animal model’ (Henderson, 1976). In an important paper, Lynch (1991) showed that this same model can be applied to problems in phylogenetic comparative biology despite the difference in timescales over which shared ancestry is measured. Although Lynch's (1991) paper received little attention until relatively recently (Housworth *et al.*, 2004; Felsenstein, 2008), an equivalent model (Pagel, 1999) was developed independently in the intervening period (Housworth *et al.*, 2004).

A perceived difficulty of Lynch's (1991) original phylogenetic mixed model was that finding the maximum likelihood (ML) estimate was too computer intensive to make it a practical tool (e.g. Martins, 1996; Diniz-Filho *et al.*, 1998). However, a great deal of quantitative genetic literature had accumulated on efficiently fitting a range of large, complex models (for a review, see Thompson *et al.*, 2005), and by 1996 at least this theory had a general implementation in the program ASReml (Gilmour *et al.*, 2002). For many data sets, Lynch's (1991) model could have been fitted in a matter of seconds using restricted maximum likelihood (REML), which became the method of choice in quantitative genetics relatively early (Patterson & Thompson, 1971). By contrast, the ML and generalized least squares (GLS) procedures advocated by Lynch (1991) and Pagel (1999) have largely been superseded in quantitative genetics because of their inherent bias and inflexibility. This bias arises because the methods fail to take into account the uncertainty in the fixed effects, resulting in downwardly biased variance components. The bias is likely to be severe in phylogenetic comparative analyses because the fixed effects are associated with the ancestral state, and the ancestral state usually has high sampling error.
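The direction of this bias can be seen in the simplest possible case. The sketch below is a hypothetical illustration that assumes independent Gaussian observations with only an intercept as a fixed effect (no phylogeny): the ML estimate of the variance divides the residual sum of squares by *n*, whereas the REML estimate divides by *n* − *p* (here *p* = 1), which accounts for the degree of freedom used to estimate the mean. The function names and simulation settings are illustrative.

```python
import random
import statistics


def ml_and_reml_variance(sample):
    """Return (ML, REML) variance estimates for an intercept-only Gaussian model."""
    n = len(sample)
    mean = statistics.fmean(sample)  # estimated fixed effect (the intercept)
    rss = sum((x - mean) ** 2 for x in sample)
    return rss / n, rss / (n - 1)  # REML divides by n - p with p = 1


def average_estimates(n=5, true_var=1.0, replicates=20000, seed=1):
    """Average both estimators over many simulated data sets of size n."""
    rng = random.Random(seed)
    ml_total = reml_total = 0.0
    for _ in range(replicates):
        sample = [rng.gauss(0.0, true_var ** 0.5) for _ in range(n)]
        ml, reml = ml_and_reml_variance(sample)
        ml_total += ml
        reml_total += reml
    return ml_total / replicates, reml_total / replicates


ml_mean, reml_mean = average_estimates()
print(f"ML: {ml_mean:.3f}  REML: {reml_mean:.3f}  (true variance 1.0)")
```

In this independent case the ML average should sit near (*n* − 1)/*n* = 0.8 of the truth. The argument in the text is that phylogenetic correlation makes the intercept (the ancestral state) far less precisely estimated than in this toy setting, which aggravates the same problem.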

In this paper we start by showing that the relationship between the animal model and the phylogenetic mixed model is deeper than has previously been noted. The original phylogenetic mixed model was derived by making the analogy between the matrix of phylogenetic distances and the relatedness matrix defined by a pedigree. However, by expanding the phylogenetic covariance matrix to include ancestral nodes, we show that these matrices also share several structural properties. More specifically, we show that a phylogeny is mathematically equivalent to an inbred pedigree, where the inbreeding coefficients are equal to the branch lengths. This relationship can be exploited to develop algorithms that are more accurate and orders of magnitude faster for large problems.

We go on to emphasize that general solutions and software are already available for many aspects of comparative analysis that comparative biologists often flag as future avenues of research. We illustrate this by taking three recently published comparative papers (Ives *et al.*, 2007; Adams, 2008; Felsenstein, 2008) and showing that they can all be considered phylogenetic meta-analyses in a mixed model framework. In doing so, we highlight that the original phylogenetic meta-analysis (Adams, 2008) is implemented incorrectly, and that REML estimates could have been obtained for all three models over a decade ago without the need to develop new statistical tools or software. As a worked example, we re-analyse data collected by Adams (2008) to test Bergmann's (1847) rule – an ecological rule predicting a positive intraspecific correlation between body size and latitude.

We then discuss mixed model procedures for dealing with imperfect data in the context of comparative biology. In particular, the problem of missing data has received a great deal of attention in quantitative genetics, and general methods that correct for nonrandom sampling are available and well understood (e.g. Im *et al.*, 1989; Hadfield, 2008). These results are particularly important in the context of meta-analysis and comparative analysis because they may be able to correct for the publication bias that arises through nonrandom sampling of taxa, for example when common or ‘fluffy’ species are over-represented (Fisher *et al.*, 2003; Nakagawa & Freckleton, 2008). In a similar vein, a complete phylogeny may not be available for all taxa, and we show how taxonomic models (Clutton-Brock & Harvey, 1977) and phylogenetic models can be combined relatively simply using standard methodology. Although not an ideal solution, the method does provide a flexible work-around for analysing data where phylogenetic information is currently incomplete.

We end by discussing phylogenetic generalized linear mixed models for non-Gaussian traits, as standard REML methods are known to be unreliable due to the intractability of the likelihood. Markov chain Monte Carlo (MCMC) methods have proved to be useful tools for solving this problem both in quantitative genetics (Sorensen & Gianola, 2002) and phylogenetics (Pagel *et al.*, 2004; Felsenstein, 2005) and we show how efficient Gibbs samplers from quantitative genetics can be directly used for a wide range of phylogenetic methods. In particular, we discuss in detail a model where the trait can be one of *J* > 2 nominal states, as this type of model does not appear to have been used in quantitative genetics or comparative biology. The model allows the analysis of continuous and discrete characters to be brought under the same framework by shifting emphasis from evolutionary jumps between states to continuous evolution of the probability for expressing a state. In the context of phenotypic evolution, the proposed model seems to have an easier biological interpretation than currently available alternatives derived from substitution models of DNA (e.g. Pagel, 1994) because it allows for the fact that a whole host of developmental pathways are often required for the expression of complex categorical phenotypes. For example, a flightless stick insect is inherently more likely to produce a flying descendant than a flightless rodent.
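The idea of continuous evolution of the probability of expressing a state can be sketched numerically. The example below assumes a multinomial-logit link in which *J* − 1 latent liabilities are compared against a baseline state (state 0); this parametrization, the function names and the Brownian-motion rate are illustrative assumptions rather than the exact model specified later in the paper. Each liability evolves by Brownian motion along a branch, so the state probabilities change gradually even though the observed state is discrete.

```python
import math
import random


def state_probabilities(liabilities):
    """Map J - 1 latent liabilities to J state probabilities; state 0 is the baseline."""
    exps = [math.exp(value) for value in liabilities]
    denom = 1.0 + sum(exps)
    return [1.0 / denom] + [e / denom for e in exps]


def evolve_along_branch(liabilities, branch_length, steps, rng, rate=1.0):
    """Brownian motion on every liability; returns the state probabilities at each step."""
    dt = branch_length / steps
    current = list(liabilities)
    path = [state_probabilities(current)]
    for _ in range(steps):
        current = [value + rng.gauss(0.0, math.sqrt(rate * dt)) for value in current]
        path.append(state_probabilities(current))
    return path


rng = random.Random(7)
path = evolve_along_branch([0.0, 0.0], branch_length=1.0, steps=100, rng=rng)
print("start:", [round(p, 3) for p in path[0]])
print("end:  ", [round(p, 3) for p in path[-1]])
```

Under this kind of model, two lineages that currently express the same state can sit at very different liabilities, and hence have very different probabilities of later expressing another state, which is the intuition behind the stick insect example.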

### Phylogenetic covariance matrix vs. the additive relationship matrix

We give a brief description of the phylogenetic covariance matrix and the additive genetic relationship matrix because it was the link between these two concepts that allowed Lynch (1991) to develop the phylogenetic mixed model. We then show that if the phylogenetic covariance matrix is expanded to include ancestral species then these matrices are also similar in form. This may seem like a technical aside (the technical details are left to the Appendix), but we include it for two reasons. First, it makes the link between the phylogenetic mixed model and the animal model more explicit. Second, the structural properties of the additive genetic relationship matrix have played a key role in the development of robust and efficient algorithms in quantitative genetics. As phylogenetic comparative analyses become larger in scale, it will be useful and perhaps even necessary to exploit these properties.

The additive genetic relatedness matrix (**A**) is a square matrix equal in dimension to the number of individuals in the pedigree. Element *A*_{ij} is twice the probability that two alleles drawn at random, one from individual *i* and the other from individual *j*, are identical by descent. In the absence of inbreeding this is equivalent to the expected proportion of genes shared by two individuals (i.e. 1 if *i* = *j*, 0.5 if *i* and *j* are parent and offspring, 0.5 if full-sibs, 0.25 if half-sibs and so on).
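Henderson's tabular method is one standard way to build **A** from a pedigree. The minimal sketch below assumes the pedigree is ordered so that parents precede their offspring and codes unknown parents as `None`; the function name and the toy pedigree are hypothetical. Exact fractions make the expected values (0.5 for parent and offspring and for full-sibs, 0.25 for half-sibs) easy to check.

```python
from fractions import Fraction


def additive_relationship(pedigree):
    """pedigree: list of (id, sire, dam) with parents listed before offspring."""
    pos = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    A = [[Fraction(0)] * n for _ in range(n)]
    for k, (_, sire, dam) in enumerate(pedigree):
        s, d = pos.get(sire), pos.get(dam)  # None when the parent is unknown
        for j in range(k):
            # relationship with an earlier individual: average over the two parents
            total = Fraction(0)
            if s is not None:
                total += A[j][s]
            if d is not None:
                total += A[j][d]
            A[j][k] = A[k][j] = total / 2
        # diagonal: 1 + inbreeding coefficient (half the relationship between parents)
        A[k][k] = 1 + (A[s][d] / 2 if s is not None and d is not None else 0)
    return A, pos


pedigree = [
    ("m1", None, None), ("f1", None, None), ("f2", None, None),
    ("o1", "m1", "f1"), ("o2", "m1", "f1"),  # full-sibs
    ("o3", "m1", "f2"),                      # half-sib of o1 and o2
    ("o4", "o1", "o2"),                      # offspring of a full-sib mating
]
A, pos = additive_relationship(pedigree)
print(A[pos["o1"]][pos["o2"]], A[pos["o1"]][pos["o3"]], A[pos["o4"]][pos["o4"]])
```

The last individual shows how inbreeding enters: its diagonal element exceeds 1 by its inbreeding coefficient of 1/4.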

In a phylogenetic context the equivalent matrix is equal in dimension to the number of species at the tips of the phylogeny. In this case the elements *A*_{ij} are equal to the length of the path from the most recent common ancestor of species *i* and *j* to the root of the phylogeny. For ultrametric trees, the root-to-tip path length is generally scaled to unity, so that **A** is a correlation matrix with all diagonal elements equal to 1.
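The matrix can be read off the node depths: each off-diagonal element is the depth (distance from the root) of the two tips' most recent common ancestor. The sketch below assumes the tree is stored as a hypothetical `parent` dictionary, with `length` giving the branch above each node, and that the tree is ultrametric, so that dividing by the root-to-tip height yields a correlation matrix.

```python
def node_depth(node, parent, length):
    """Distance from the root to `node` (the root has no entry in `parent`)."""
    depth = 0.0
    while node in parent:
        depth += length[node]
        node = parent[node]
    return depth


def lineage(node, parent):
    """The node itself followed by all of its ancestors up to the root."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def phylo_correlation(tips, parent, length):
    height = max(node_depth(t, parent, length) for t in tips)
    A = []
    for i in tips:
        ancestors_i = set(lineage(i, parent))
        row = []
        for j in tips:
            # first node on j's lineage that is also on i's lineage = MRCA(i, j)
            mrca = next(a for a in lineage(j, parent) if a in ancestors_i)
            row.append(node_depth(mrca, parent, length) / height)
        A.append(row)
    return A


# toy tree ((t1:1, t2:1):1, t3:2)
parent = {"n1": "root", "t1": "n1", "t2": "n1", "t3": "root"}
length = {"n1": 1.0, "t1": 1.0, "t2": 1.0, "t3": 2.0}
print(phylo_correlation(["t1", "t2", "t3"], parent, length))
```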

However, in most statistical applications it is not **A** that is required, but its inverse **A**^{−1}. For pedigrees this matrix can be very large and efficient ways of obtaining the inverse made the fitting of these models practical (Henderson, 1976; Quaas, 1976; Meuwissen & Luo, 1992). These algorithms usually start by inserting ‘phantom parents’ into the pedigree so that all individuals can be traced back to a set of unrelated parents. By analogy, we can extend the concept of the phylogenetic covariance matrix to include all ancestral nodes:

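$$\mathbf{S} = \begin{bmatrix} \mathbf{A} & \mathbf{C} \\ \mathbf{C}^{\prime} & \mathbf{F} \end{bmatrix} \qquad (1)$$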

where **F** is the square block of covariances among internal nodes, of dimension *n* − 2 (the number of internal nodes of a fully bifurcating tree excluding the root, where *n* is the number of tips), and the off-diagonal blocks hold the covariances between tips and internal nodes. In the Appendix, we show why **S** has the same form as the complete pedigree matrix **A**, which allows us to apply Henderson's (1976) results directly to the problem of obtaining **S**^{−1}. This equivalence of form arises because a phylogeny has the same graph structure as a pedigree without fathers, and the branch lengths between parent and child nodes are equivalent to inbreeding coefficients.
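The practical consequence is that **S**^{−1} can be written down branch by branch, in the same way that Henderson's rules build **A**^{−1} for a pedigree in which each individual has a single known parent. The minimal sketch below assumes Brownian motion starting from a fixed value at the root (so the root is excluded), stores only nonzero entries in a dictionary, and checks the result against **S** computed from node depths on a toy tree; the function names and tree encoding are hypothetical. A branch of length *b* above node *i* with parent *p* adds 1/*b* to entry (*i*, *i*) and, if *p* is not the root, −1/*b* to entries (*i*, *p*) and (*p*, *i*) and 1/*b* to entry (*p*, *p*).

```python
from fractions import Fraction


def s_inverse(parent, length, root="root"):
    """Nonzero entries of S^{-1}, built one branch at a time (no matrix inversion)."""
    inv = {}

    def add(i, j, value):
        inv[(i, j)] = inv.get((i, j), 0) + value

    for node, par in parent.items():
        w = 1 / Fraction(length[node])  # precision of the change along this branch
        add(node, node, w)
        if par != root:
            add(node, par, -w)
            add(par, node, -w)
            add(par, par, w)
    return inv


def lineage(node, parent):
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def s_matrix(nodes, parent, length):
    """S: covariance between two nodes = depth of their most recent common ancestor."""
    def depth(node):
        return sum((Fraction(length[a]) for a in lineage(node, parent)[:-1]), Fraction(0))

    S = []
    for i in nodes:
        anc = set(lineage(i, parent))
        S.append([depth(next(a for a in lineage(j, parent) if a in anc)) for j in nodes])
    return S


parent = {"n1": "root", "t1": "n1", "t2": "n1", "t3": "root"}
length = {"n1": 1, "t1": 1, "t2": 1, "t3": 2}
nodes = ["t1", "t2", "t3", "n1"]  # tips first, then internal nodes
S = s_matrix(nodes, parent, length)
inv = s_inverse(parent, length)
product = [[sum(S[i][k] * inv.get((nodes[k], nodes[j]), 0) for k in range(4))
            for j in range(4)] for i in range(4)]
print(product == [[int(i == j) for j in range(4)] for i in range(4)])  # S S^{-1} = I
```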

In later sections we will often use models parametrized in terms of **A**^{−1} rather than **S**^{−1} so that the connection with earlier work on comparative analysis is clearer. However, we emphasize that it is usually better to work with the **S**^{−1} parametrization, even though this involves including *n* − 2 missing records (one for each internal node). There are three reasons for this. First, **S**^{−1} can be formed without direct inversion techniques such as Gauss–Jordan elimination. **A**, by contrast, has to be inverted in this way, which is slow for large phylogenies and may suffer from numerical problems as the matrix becomes ill conditioned; this is more likely with phylogenies than with pedigrees because the spread of eigenvalues is generally greater (Housworth *et al.*, 2004). Second, **S**^{−1} has reduced storage requirements because the number of nonzero elements is linear in *n* (i.e. 6(*n* − 1)), and the matrix is said to be ‘sparse’. By contrast, **A**^{−1} is dense, with the number of nonzero elements quadratic in *n* (i.e. *n*^{2}). Last, and most importantly, the pattern of zeros in **S**^{−1} allows the GLS/mixed model equations to be re-ordered so that the number of arithmetic operations needed to solve them is drastically reduced (for an introduction to sparse matrix methods, see Davis, 2006).
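The storage argument can be checked by counting. The sketch below assumes a perfectly balanced bifurcating tree with *n* tips (a power of two), excludes the root, applies the branch-by-branch rule for **S**^{−1} and records only which entries become nonzero; the tree builder and function names are hypothetical. Under these conventions the count is 6*n* − 10, linear in *n*; the exact constant depends on bookkeeping choices such as whether the root is retained, whereas a dense *n* × *n* matrix has *n*² entries.

```python
def balanced_tree(n_tips):
    """Parent map of a balanced bifurcating tree with n_tips tips (a power of two)."""
    parent = {}
    counter = 0

    def build(node, n):
        nonlocal counter
        if n == 1:
            return
        for _ in range(2):
            counter += 1
            child = f"v{counter}"
            parent[child] = node
            build(child, n // 2)

    build("root", n_tips)
    return parent


def s_inverse_nonzeros(parent, root="root"):
    """Number of positions filled by the branch-by-branch rule for S^{-1}."""
    filled = set()
    for node, par in parent.items():
        filled.add((node, node))
        if par != root:
            filled.update({(node, par), (par, node), (par, par)})
    return len(filled)


for n in (4, 16, 64, 256):
    nz = s_inverse_nonzeros(balanced_tree(n))
    print(f"n={n:4d}  nonzeros in S^-1: {nz:6d}  dense n^2: {n * n:6d}")
```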

In Appendix S1, we simulate phenotypic data for the recently published mammal super-tree of 4510 species (Bininda-Emonds *et al.*, 2007) and show that, depending on the fitting method, a model with the **A**^{−1} parametrization either fails completely or takes up to a month of computing time. By contrast, the **S**^{−1} parametrization takes between 0.2 s and 8 min, again depending on the method.

### Phylogenetic meta-analysis

Using **A** to represent the phylogenetic relatedness matrix, we show that several recent comparative papers are variations of a common theme – phylogenetic meta-analysis. However, our main point is that this model is also a relatively minor variation of the basic mixed model, for which software has been available for some time.

The (co)variance structure of the data **V** is therefore of the form:

$$\mathbf{V} = \sigma^{2}_{a}\mathbf{A} + \sigma^{2}_{e}\mathbf{I} + \mathbf{M} \qquad (5)$$

In GLS, an estimate of the fixed effects ($\hat{\mu}$) is obtained using:

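$$\hat{\mu} = \left(\mathbf{X}^{\prime}\mathbf{V}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\prime}\mathbf{V}^{-1}\mathbf{y} \qquad (6)$$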

where **X** is the fixed effects design matrix (in this case an *n* × 1 vector of ones). **V**^{−1} is assumed to be known up to proportionality, and when this holds the estimate is equivalent to the best linear unbiased estimate (BLUE) of *μ* in a REML analysis.
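For the intercept-only case, eqn 6 reduces to a weighted mean, $\hat{\mu} = \mathbf{1}^{\prime}\mathbf{V}^{-1}\mathbf{y} / \mathbf{1}^{\prime}\mathbf{V}^{-1}\mathbf{1}$. The sketch below assumes **V** is the sum of a phylogenetic term $\sigma^{2}_{a}\mathbf{A}$, a residual term $\sigma^{2}_{e}\mathbf{I}$ and a diagonal matrix **M** of known sampling variances (notation assumed here), and solves $\mathbf{V}\mathbf{w} = \mathbf{1}$ by Gaussian elimination rather than forming **V**^{−1}. The data and function names are hypothetical. With no phylogenetic or residual variance the estimate collapses to the inverse-variance weighted mean of fixed-effects meta-analysis, and multiplying **V** by a constant leaves it unchanged.

```python
from fractions import Fraction


def solve(mat, rhs):
    """Solve mat @ x = rhs exactly by Gaussian elimination with partial pivoting."""
    n = len(mat)
    a = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def build_v(A, var_a, var_e, sampling_vars):
    """V = var_a * A + var_e * I + M, with M = diag(sampling_vars)."""
    n = len(A)
    V = [[Fraction(var_a) * Fraction(A[i][j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        V[i][i] += Fraction(var_e) + Fraction(sampling_vars[i])
    return V


def gls_intercept(V, y):
    """GLS estimate of mu using w = V^{-1} 1 (V is symmetric)."""
    w = solve(V, [1] * len(y))
    return sum(wi * Fraction(yi) for wi, yi in zip(w, y)) / sum(w)


A = [[1, Fraction(1, 2), 0], [Fraction(1, 2), 1, 0], [0, 0, 1]]
y = [Fraction(3, 10), Fraction(1, 2), Fraction(1, 5)]  # hypothetical effect sizes
m = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]  # hypothetical sampling variances

fixed = gls_intercept(build_v(A, 0, 0, m), y)
weighted = sum(yi / mi for yi, mi in zip(y, m)) / sum(1 / mi for mi in m)
print(fixed == weighted)  # inverse-variance weighted mean

V = build_v(A, 1, Fraction(1, 4), m)
print(gls_intercept(V, y) == gls_intercept([[5 * v for v in row] for row in V], y))
```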

Garland & Ives (2000), in their non-meta-analytic approach, assume that **M** = **0**, naturally, but also assume no residual error (i.e. $\sigma^{2}_{e} = 0$). Under these assumptions, **A**^{−1} and **V**^{−1} are proportional because $\mathbf{V}^{-1} = \mathbf{A}^{-1}/\sigma^{2}_{a}$. Because of these assumptions, Garland & Ives (2000) are able to project **X** and **y** using the matrix **Ψ**, which allows an ordinary least squares parametrization of the model. This is possible because **Ψ** is defined as the non-normalized eigenvectors of **A**^{−1} and has the property **Ψ**^{′}**A****Ψ** = **I**, so that **A** = (**Ψ**^{′})^{−1}**Ψ**^{−1} = (**Ψ****Ψ**^{′})^{−1}, giving:

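$$\hat{\mu} = \left[(\boldsymbol{\Psi}^{\prime}\mathbf{X})^{\prime}(\boldsymbol{\Psi}^{\prime}\mathbf{X})\right]^{-1}(\boldsymbol{\Psi}^{\prime}\mathbf{X})^{\prime}\boldsymbol{\Psi}^{\prime}\mathbf{y} = \left(\mathbf{X}^{\prime}\mathbf{A}^{-1}\mathbf{X}\right)^{-1}\mathbf{X}^{\prime}\mathbf{A}^{-1}\mathbf{y} \qquad (7)$$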

which is equivalent to eqn 6 only because **A**^{−1}∝**V**^{−1}. This relationship can only be satisfied when there are no additional sources of random variation. This may be unrealistic, and particularly so in the context of meta-analysis where the aim is to take into account the variation in the precision of study-specific estimates, using weighted statistical models.
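The point can be checked numerically with an intercept-only GLS estimator. In the hypothetical example below, weighting by **A**^{−1} alone gives the same estimate as weighting by **V**^{−1} when **V** is proportional to **A**, but not once a diagonal matrix of sampling variances is added. The data values and function names are illustrative assumptions.

```python
from fractions import Fraction


def solve(mat, rhs):
    """Solve mat @ x = rhs exactly by Gaussian elimination with partial pivoting."""
    n = len(mat)
    a = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(mat, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [Fraction(0)] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def gls_intercept(V, y):
    """GLS estimate of an intercept: 1'V^{-1}y / 1'V^{-1}1."""
    w = solve(V, [1] * len(y))
    return sum(wi * Fraction(yi) for wi, yi in zip(w, y)) / sum(w)


A = [[1, Fraction(1, 2), 0], [Fraction(1, 2), 1, 0], [0, 0, 1]]
y = [1, 2, 4]
M = [Fraction(1, 10), Fraction(1, 10), 2]  # hypothetical sampling variances

mu_a = gls_intercept(A, y)                                       # A^{-1} weighting
mu_prop = gls_intercept([[3 * a for a in row] for row in A], y)  # V = 3A
V = [[A[i][j] + (M[i] if i == j else 0) for j in range(3)] for i in range(3)]
mu_v = gls_intercept(V, y)                                       # V = A + M
print(mu_a, mu_prop, mu_v)
```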

Again, it is implicitly assumed that $\sigma^{2}_{e} = 0$, and we disagree with Felsenstein (2008) that this model is equivalent to the original phylogenetic mixed model of Lynch (1991), because the two models only coincide when the phylogenetic heritability $H^{2} = \sigma^{2}_{a}/(\sigma^{2}_{a} + \sigma^{2}_{e})$ (also known as lambda; Pagel, 1999) is equal to 1. Lynch (1991) did not assume that species means could be completely explained by Brownian motion down a phylogeny, and this is why a residual term was included to model deviations of species means from those expected. Although measurement error may be an important source of these deviations, a range of other processes could cause them, and the inclusion of a residual term could be seen as a robust alternative (Lynch, 1991; Freckleton *et al.*, 2002; Housworth *et al.*, 2004; Revell *et al.*, 2008).
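The identity behind this statement is short. Assuming no sampling variance (**M** = **0**) and an ultrametric **A** scaled to unit height (so that its diagonal elements are 1), the covariance of Lynch's model can be rearranged as below. The bracketed matrix has unit diagonal and off-diagonal elements $H^{2}A_{ij}$, which is the form of Pagel's lambda transformation with $\lambda = H^{2}$; Felsenstein's model corresponds to the special case $H^{2} = 1$.

$$
\begin{aligned}
\sigma^{2}_{a}\mathbf{A} + \sigma^{2}_{e}\mathbf{I}
  &= (\sigma^{2}_{a} + \sigma^{2}_{e})\left[H^{2}\mathbf{A} + (1 - H^{2})\mathbf{I}\right],
  \qquad H^{2} = \frac{\sigma^{2}_{a}}{\sigma^{2}_{a} + \sigma^{2}_{e}},\\
\left[H^{2}\mathbf{A} + (1 - H^{2})\mathbf{I}\right]_{ii} &= H^{2} + 1 - H^{2} = 1,
  \qquad
\left[H^{2}\mathbf{A} + (1 - H^{2})\mathbf{I}\right]_{ij} = H^{2}A_{ij} \quad (i \neq j).
\end{aligned}
$$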

#### The method of Gilmour (*c*. 1996)

All of the above methods are standard in statistical quantitative genetics and a great deal of effort has gone into developing efficient computational strategies and understanding the properties of the REML estimators. Since 1996 at least, all of the above models could have been fitted using ASReml (Gilmour *et al.*, 2002). In the Supporting Information, we give the ASReml-R syntax for fitting the models presented above, and in each case we also include a model with a residual term. We do not present the theory, nor the algorithms involved, as this information is widely available (Gilmour *et al.*, 2002, and references therein), but note that all analyses can be fitted using a single line of code.

### Worked example: a re-analysis of Bergmann's rule

Adams's (2008) analysis of Bergmann's rule found a mean effect size of 0.2883 (SE 0.0301) using conventional fixed-effects meta-analysis, and a mean effect size of 0.672 (SE 0.4745) using phylogenetic meta-analysis.

Using ASReml we fitted the standard meta-analysis and obtained exactly the same result as Adams (2008). However, we prefer random-effects meta-analysis, which relaxes the assumption that the correlation between latitude and body size would tend to the same value if replication within each species were very large. This type of assumption has been criticized even in the context of controlled clinical trials (Higgins *et al.*, 2009), and would seem untenable when the data are associated with different species (West & Sheldon, 2002). Using this technique, the log-likelihood increased by more than 100, indicating a much better fit. Although the mean effect size was broadly similar (0.2271), the standard error increased substantially (SE 0.1156) because the original analysis considerably underestimated the variability. The correlation was not significant at the nominal 0.05 threshold (*P* = 0.057).
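Why the standard error grows can be seen from the weights. In a fixed-effects meta-analysis each estimate is weighted by the inverse of its sampling variance *v*_i; a random-effects meta-analysis adds a between-study variance τ² to every *v*_i, which flattens the weights and inflates the standard error of the mean. The sketch below uses made-up effect sizes and treats τ² as known, whereas the analysis above estimated it by REML; the function name is hypothetical.

```python
import math


def weighted_mean(effects, sampling_vars, tau2=0.0):
    """Inverse-variance weighted mean and its SE (tau2 = 0 gives fixed effects)."""
    weights = [1.0 / (v + tau2) for v in sampling_vars]
    total = sum(weights)
    mean = sum(w * e for w, e in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total)


effects = [0.10, 0.45, 0.30, -0.05, 0.60]  # hypothetical correlations
sampling_vars = [0.01, 0.02, 0.005, 0.03, 0.015]

fixed_mean, fixed_se = weighted_mean(effects, sampling_vars)
random_mean, random_se = weighted_mean(effects, sampling_vars, tau2=0.05)
print(f"fixed:  {fixed_mean:.3f} (SE {fixed_se:.3f})")
print(f"random: {random_mean:.3f} (SE {random_se:.3f})")
```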

Using a phylogenetic meta-analysis under the assumption that $\sigma^{2}_{e} = 0$, we obtained a different answer from Adams (2008) because of problems with the original calculations (mean effect size 0.1729, SE 0.0995). Relaxing the assumption that $\sigma^{2}_{e} = 0$, we obtained a mean effect size of 0.2271 (SE 0.1156), which coincides with the nonphylogenetic random-effects meta-analysis because the REML estimate of the phylogenetic variance was zero. It is worth noting that the standard non-meta-analytic phylogenetic model gives a different mean effect size (0.4454, SE 0.2364) and indicates a reasonable phylogenetic signal (*H*^{2} = 0.444).

In the context of this study we suggest a more direct and powerful approach would be to fit a model of body size with latitude as either an additional response variable or as a fixed effect. Each study would be a separate data point with species fitted as an additional random effect. This analysis also allows species with only a single data point to be included, which are excluded from the former analysis because the correlation coefficient cannot be estimated. If these species are localized, and Bergmann's rule is the result of local adaptation, then these species may well be a nonrandom sample due to reduced gene flow, which in widespread species may weaken the relationship between body size and latitude. In later sections, we discuss multi-response models and biases resulting from missing data, both of which are relevant to meta-analysis.

### Taxonomic mixed model

For many comparative analyses, the taxonomic scope is often focused enough that a complete phylogeny is available. However, meta-analyses often include a heterogeneous collection of species for which the full phylogenetic tree has only been partly resolved (for example, Kingsolver *et al.*, 2001). To accommodate this, we show how the classic nested taxonomic model (Clutton-Brock & Harvey, 1977) can be combined with the phylogenetic mixed model (Lynch, 1991; Housworth *et al.*, 2004) to give a flexible framework with which to model phylogenetic dependence. Combining these models is a direct extension of Felsenstein (2008).

In the nested taxonomic model (Clutton-Brock & Harvey, 1977) we can model the data vector (**y**) as:

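$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_{c}\mathbf{c} + \mathbf{Z}_{o}\mathbf{o} + \mathbf{Z}_{f}\mathbf{f} + \mathbf{Z}_{g}\mathbf{g} + \mathbf{Z}_{s}\mathbf{s} + \mathbf{e} \qquad (11)$$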

where **c**, **o**, **f**, **g**, **s** are vectors of random effects for class, order, family, genus and species respectively. These can be thought of as the expected mean statistics for the different taxa, and these are related to the data by the random effects design matrices **Z**. These design matrices have dimensions equal to the number of data points (rows) and the number of taxa at the subscripted taxonomic level (columns). The design matrix for the vector of residuals (**e**) is assumed to be an identity matrix (**I**) and is therefore omitted. In addition, we assume a simple fixed effects structure where only an intercept is estimated. The taxonomic random effects are assumed to be identically and independently distributed (i.i.d.) and also normally distributed with a mean of zero and a taxonomic level-specific variance. For example, the distribution of genus effects follows:

$$\mathbf{g} \sim N\left(\mathbf{0}, \sigma^{2}_{g}\mathbf{I}\right) \qquad (12)$$
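The covariance implied by the nested model follows from the design matrices: two observations share the variance of every taxon they have in common, and each observation additionally carries the residual variance. The sketch below encodes each record as a tuple of taxon names from class to species, with illustrative variances; comparing full prefixes of the tuple (rather than single names) guards against the same genus name appearing in different families. All names and values are hypothetical.

```python
def taxonomic_covariance(records, level_vars, resid_var):
    """V[i][j] = sum of variances of shared taxa (+ resid_var on the diagonal)."""
    n = len(records)
    V = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            shared = sum(var for k, var in enumerate(level_vars)
                         if records[i][:k + 1] == records[j][:k + 1])
            V[i][j] = shared + (resid_var if i == j else 0.0)
    return V


records = [
    ("C1", "O1", "F1", "G1", "s1"),  # (class, order, family, genus, species)
    ("C1", "O1", "F1", "G1", "s2"),
    ("C1", "O1", "F2", "G2", "s3"),
]
level_vars = [0.0, 0.5, 1.0, 2.0, 4.0]  # class, order, family, genus, species
V = taxonomic_covariance(records, level_vars, resid_var=8.0)
for row in V:
    print(row)
```

Because each species appears only once here, the species and residual variances enter only through the diagonal, and only their sum is identifiable.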

#### Taxonomic and phylogenetic mixed model

The taxonomic mixed model can be represented as a phylogenetic mixed model, allowing taxonomy and phylogeny to be incorporated into a single analysis. Although the two types of model have different biological interpretations, they can be combined to give a description of the data that is consistent with an evolutionary process under certain, perhaps restrictive, assumptions.

To illustrate the interpretational differences between the two types of model, we will start with a hypothetical taxonomy with two special properties. First, the taxonomy is an accurate description of the true phylogenetic topology which is polytomous when more than two representatives of a taxon exist. Second, the different taxonomic levels are assumed to be equally spaced in evolutionary time, and thus the taxonomy also captures phylogenetic branch lengths accurately. This second assumption will be relaxed later.

An example of such a taxonomy/phylogeny is depicted in Fig. 1 and can also be represented by the matrix **A**, which is a correlation matrix because the tree is ultrametric:

- (13)

With the inclusion of a residual term the expected (co)variance between data points is

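$$\mathbf{V} = \sigma^{2}_{c}\mathbf{Z}_{c}\mathbf{Z}_{c}^{\prime} + \sigma^{2}_{o}\mathbf{Z}_{o}\mathbf{Z}_{o}^{\prime} + \sigma^{2}_{f}\mathbf{Z}_{f}\mathbf{Z}_{f}^{\prime} + \sigma^{2}_{g}\mathbf{Z}_{g}\mathbf{Z}_{g}^{\prime} + \sigma^{2}_{s}\mathbf{Z}_{s}\mathbf{Z}_{s}^{\prime} + \sigma^{2}_{e}\mathbf{I} \qquad (14)$$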

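As $\mathbf{Z}_{s}$ is an identity matrix, species effects are confounded with the residual component (each species is measured only once), and so an identifiable taxonomic model would be:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_{c}\mathbf{c} + \mathbf{Z}_{o}\mathbf{o} + \mathbf{Z}_{f}\mathbf{f} + \mathbf{Z}_{g}\mathbf{g} + \mathbf{e}^{s} \qquad (15)$$

where we use the superscript *s* in $\mathbf{e}^{s}$ to indicate that the term includes variation from the species level (*s*) through to the residual level (*e*).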

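Alternatively, the model can be recast as the phylogenetic model, under the assumption that the taxon-specific variances are all equal:

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{a} + \mathbf{e}, \qquad \mathbf{a} \sim N\left(\mathbf{0}, \sigma^{2}_{a}\mathbf{A}\right) \qquad (16)$$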

In reality, a taxonomy is unlikely to coincide with a phylogeny exactly, except perhaps in topology, and so it may make more sense to estimate the taxon-specific variances. If it is found that the taxon-specific variances are not equal, then there are two equally valid interpretations. First, it could be that the taxonomy and the phylogeny do coincide and that the different variances represent temporal variation in phylogenetic inertia (Fig. 2). Or alternatively, the phylogenetic signal may be constant over time and the different variances indicate that the taxonomic branch lengths must be rescaled in order to coincide with the real phylogeny (Fig. 3).

Either interpretation is equally valid without prior information (Paradis, 2006). However, there may be parts of a taxonomy for which the phylogeny is available, and the assumption of a common variance across those taxonomic levels may be valid. For example, it may be that a phylogeny exists at the family level, but the classification of species into genera is by taxonomy. As in the standard taxonomic model, we can superscript phylogenetic effects with the region that they span, from the lowest taxonomic level to the highest, where **a**^{o:f} indicates that the phylogenetic effects are associated with the complete phylogeny up to the family level (assuming all taxa belong to the same order).

We could then fit the model:

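$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}_{f}\mathbf{a}^{o:f} + \mathbf{Z}_{g}\mathbf{g} + \mathbf{Z}_{s}\mathbf{s} + \mathbf{e} \qquad (17)$$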

assuming that there are multiple species within genera, and multiple individuals within species, so that the generic and specific variances are estimable. **g** and **s** both follow the assumption of identical and independent distribution, as in the taxonomic model. However, the phylogenetic effects (**a**^{o:f}) have expected (co)variances proportional to the phylogenetic covariance matrix (**A**) of the standard phylogenetic mixed model, except that the tips of the phylogeny represent families rather than species (for the ASReml and MCMC syntax for fitting this model, see Supporting Information).
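The covariance implied by this combined model can be assembled in the same spirit: the phylogenetic term contributes $\sigma^{2}_{a}$ times the element of the family-level phylogenetic matrix for the two observations' families, while the genus, species and residual terms contribute only when observations share a genus, share a species, or are the same record. The family-level matrix, the records, the variances and the variance names (`var_a`, `var_g`, `var_s`, `var_e`) below are hypothetical.

```python
def combined_covariance(records, A_family, var_a, var_g, var_s, var_e):
    """records: (family, genus, species) per observation; A_family: dict of dicts."""
    n = len(records)
    V = [[0.0] * n for _ in range(n)]
    for i, (fam_i, gen_i, sp_i) in enumerate(records):
        for j, (fam_j, gen_j, sp_j) in enumerate(records):
            v = var_a * A_family[fam_i][fam_j]  # phylogeny among families
            if (fam_i, gen_i) == (fam_j, gen_j):
                v += var_g                      # shared genus (taxonomy)
                if sp_i == sp_j:
                    v += var_s                  # shared species
            if i == j:
                v += var_e                      # residual
            V[i][j] = v
    return V


A_family = {
    "F1": {"F1": 1.0, "F2": 0.5, "F3": 0.0},
    "F2": {"F1": 0.5, "F2": 1.0, "F3": 0.0},
    "F3": {"F1": 0.0, "F2": 0.0, "F3": 1.0},
}
# two individuals of species s1, so the species variance is estimable
records = [("F1", "G1", "s1"), ("F1", "G1", "s1"), ("F1", "G1", "s2"), ("F2", "G2", "s3")]
V = combined_covariance(records, A_family, var_a=2.0, var_g=1.0, var_s=0.5, var_e=0.25)
for row in V:
    print(row)
```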

### Publication bias and missing data

An important part of meta-analysis is to assess the sensitivity of the parameter estimates to possible biases in the way the data were collected. This bias is often referred to as publication bias (Rothstein *et al.*, 2005) and it can occur at multiple stages of publication (e.g. submission, review or editorial decision). The main cause of publication bias is that statistically significant results are more likely to be published than nonsignificant results which are often relegated to the ‘file drawer’ (Rosenthal, 1979). This essentially results in missing data, where the probability of missingness depends on both effect size and sample size. In a comparative context, however, it is also possible that nonrandom sampling of species may happen due to species’ biology and status (e.g. accessibility, abundance or conservation status), which can be referred to as taxonomic bias (Nakagawa & Freckleton, 2008). In quantitative genetics this type of bias is known as selection bias (Lush & Shrode, 1950), and can occur when individuals in the pedigree have missing phenotypes, usually because they died before they could be measured or before they expressed the trait (Im *et al.*, 1989; Hadfield, 2008). If the missing phenotypes are a nonrandom sample, then biased estimates are possible. However, the problem can be alleviated if other data have been collected that determine the relationship between phenotype and the probability of missingness (Rubin, 1976). If this is the case the data are said to be missing at random (MAR) rather than missing completely at random (MCAR), which covers the intuitive concept of randomness (Little & Rubin, 2002; Nakagawa & Freckleton, 2008). In the context of comparative analysis the condition of MAR would be satisfied if a phylogeny is available that covers those species for which trait data are unavailable, and complete measurements are available for those traits that determine the probability of missingness. 
For example, information on life-history traits may be more likely to be unavailable for rare species than common species. Then, any association between abundance and life history can cause biases if the missing data are not taken into account. However, if abundance is available for all species then by including abundance in the analysis, either as a covariate or an additional response variable, unbiased estimates are possible. This is achieved by updating the missing life-history data conditional on the information provided by abundance using data augmentation, imputation or EM techniques (Fisher *et al.*, 2003; Nakagawa & Freckleton, 2008).
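A small simulation makes the abundance example concrete. In the sketch below (all values hypothetical), a life-history trait is correlated with abundance and rare species are more likely to lack a trait record, so the data are MAR given abundance. The complete-case mean is biased, whereas filling each missing value with its prediction from a regression of the trait on abundance, fitted to the complete cases, approximately removes the bias. This single regression imputation is a simplified stand-in for the data augmentation, imputation and EM approaches cited above: it recovers the mean but would understate uncertainty.

```python
import math
import random
import statistics


def simulate(n=20000, seed=1):
    """Abundance for every species; the trait is missing more often in rare species."""
    rng = random.Random(seed)
    abundance = [rng.gauss(0.0, 1.0) for _ in range(n)]
    trait = [0.8 * a + rng.gauss(0.0, 1.0) for a in abundance]
    # probability of being recorded rises with abundance (logistic in abundance)
    observed = [rng.random() < 1.0 / (1.0 + math.exp(-2.0 * a)) for a in abundance]
    return abundance, trait, observed


def regression_imputed_mean(abundance, trait, observed):
    """Fit trait ~ abundance on complete cases, then impute the missing traits."""
    xs = [a for a, o in zip(abundance, observed) if o]
    ys = [t for t, o in zip(trait, observed) if o]
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    completed = [t if o else intercept + slope * a
                 for a, t, o in zip(abundance, trait, observed)]
    return statistics.fmean(completed)


abundance, trait, observed = simulate()
full_mean = statistics.fmean(trait)  # known only because the data are simulated
cc_mean = statistics.fmean(t for t, o in zip(trait, observed) if o)
imp_mean = regression_imputed_mean(abundance, trait, observed)
print(f"all species: {full_mean:.3f}  complete cases: {cc_mean:.3f}  imputed: {imp_mean:.3f}")
```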

It is important to note that in phylogenetic meta-analysis, effect size statistics are often considered to be the species’ trait, for example when the relationship between two variables or difference between two groups are to be summarized (e.g. correlation coefficient or Cohen's *d*). Therefore, phylogenetic meta-analysis of summary statistics will often suffer from both taxonomic bias and publication bias, and this can be much harder to correct for. The difficulty arises because of uncertainty in the number of missing studies, and the complicated decisions that govern the process of publication. Numerous methods have been developed to detect and to correct for publication bias, but there appears to be no consensus on a general method (Smith *et al.*, 2000; Congdon, 2003; Rothstein *et al.*, 2005). A full review is outside the scope of this paper, but we briefly discuss some simple heuristic techniques for dealing with publication bias.

Although various correlation- or regression-based methods for detecting publication bias have been suggested, they all suffer from low statistical power (Macaskill *et al.*, 2001; Sterne *et al.*, 2005). This is because publication bias is more likely to occur, and to cause incorrect estimates, when the number of studies in a meta-analysis is small (Moller & Jennions, 2001), and small meta-analyses are also where such tests have the least power. Therefore, visual inspection of publication bias using funnel plots (Sterne & Egger, 2005) is generally preferable (but see Kulinskaya *et al.*, 2008). Once publication bias has been (visually) detected, correcting for it may be necessary, and several easy-to-use sensitivity tests are available. One of these is the ‘trim and fill’ method (Duval & Tweedie, 2000a,b; Duval, 2005), which relies on funnel plot asymmetry: it estimates the number of studies missing from one side of the plot and adds mirror-image counterparts of existing data points to make the plot more symmetrical. The trim and fill method has been successfully applied to meta-analysis in ecology and evolution (Jennions & Moller, 2002). In addition, the fail-safe *N* (file-drawer number; Rosenthal, 1979) and related statistics (e.g. Orwin, 1983; Rosenberg, 2005) have often been reported as a means of assessing the validity of mean effect size estimates in meta-analyses in evolution and ecology (Moller & Jennions, 2001), although more recently these statistics have come under heavy criticism (Becker, 2005).
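For reference, Rosenthal's (1979) fail-safe *N* has a simple closed form. With *k* studies whose one-tailed *Z* scores sum to Σ*Z*, Stouffer's combined *Z* after adding *N* null studies is Σ*Z*/√(*k* + *N*); setting this equal to the one-tailed 5% critical value of 1.645 gives *N* = (Σ*Z*)²/1.645² − *k*. The sketch below computes it for hypothetical *Z* scores; given the criticism cited above, it is shown to clarify what the statistic measures, not to endorse it. The function names are illustrative.

```python
def failsafe_n(z_scores, z_crit=1.645):
    """Rosenthal's fail-safe N: null (Z = 0) studies needed for Stouffer's Z to reach z_crit."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_crit ** 2 - k


def stouffer_z(z_scores, n_null=0.0):
    """Combined Z after adding n_null studies with Z = 0."""
    return sum(z_scores) / (len(z_scores) + n_null) ** 0.5


z = [2.0, 2.0, 2.0, 2.0]  # hypothetical one-tailed Z scores
n_fs = failsafe_n(z)
print(round(n_fs, 2), round(stouffer_z(z, n_fs), 3))
```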

### Markov Chain Monte Carlo techniques for non-Gaussian traits

Phylogenetic mixed models have mainly been applied to traits that are assumed to be normally distributed (for exceptions, see Felsenstein, 2005; Naya *et al.*, 2006). Generalized linear mixed models extend the linear mixed model to non-Gaussian responses, although model fitting has proved more difficult because the likelihood, which must be integrated over the random effects, cannot in general be obtained in closed form. MCMC techniques solve this problem by breaking the high-dimensional joint distribution into a series of lower-dimensional conditional distributions that are easier to sample from. By repeatedly sampling from these conditional distributions it is possible to approximate the complete joint distribution very accurately, and thereby to extract quantities of interest (often marginal distributions).
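
The idea of replacing one joint distribution by a sequence of simpler conditional distributions can be seen in a toy case with nothing phylogenetic about it: a Gibbs sampler for a standard bivariate normal with correlation ρ, where each full conditional is univariate normal. A minimal NumPy sketch; the function name, ρ and the chain length are illustrative assumptions:

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_iter=20000, seed=1):
    """Gibbs sampler for (x, y) ~ N(0, [[1, rho], [rho, 1]]).

    Each step draws from a full conditional:
    x | y ~ N(rho * y, 1 - rho**2) and y | x ~ N(rho * x, 1 - rho**2).
    """
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        x = rng.normal(rho * y, sd)  # update x given the current y
        y = rng.normal(rho * x, sd)  # update y given the new x
        draws[t] = x, y
    return draws


draws = gibbs_bivariate_normal(rho=0.8)
# Marginal summaries extracted from the joint sample
print(draws.mean(axis=0), np.corrcoef(draws.T)[0, 1])
```

The samples are autocorrelated (more so as ρ approaches 1), which is why updating highly correlated quantities jointly, rather than one at a time, tends to improve efficiency.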

For a thorough description of MCMC methods in the context of quantitative genetics we refer the interested reader to Sorensen & Gianola (2002). Here, we describe an MCMC algorithm for fitting a basic phylogenetic model and highlight those aspects that differ from previously published results. Due to lack of space we cannot cover the complete range of models that can be fitted using this technique, so we restrict ourselves to highlighting a model with a nominal multinomial response, as this model does not seem to have been applied in quantitative genetics.

The general MCMC algorithm is described in detail elsewhere (Hadfield, 2010), and we also provide an R library (MCMCglmm) accompanied by a user manual in which the full range of supported models is discussed in greater detail. In brief, Gaussian, Poisson, zero-inflated Poisson, binomial, multinomial, ordinal and exponential distributions are supported. More than one response variable is allowed, and the multiple responses can follow different distributions. Missing and censored responses are tolerated under the assumption that the missingness process is missing at random (MAR; Rubin, 1976). Any number of fixed and/or random effects can be fitted, and the random effects can be i.i.d. (as for species effects) or correlated (as for phylogenetic effects). The routines are fast, as all posterior simulations are done in compiled C++ using direct methods for sparse linear systems (Davis, 2006).

#### The basic model

With a Gaussian response the linear model is applied to *y*:

$$\mathbf{y} = \mathbf{W}\boldsymbol{\theta} + \mathbf{e} \tag{18}$$

where **W** is a design matrix that relates the predictors to the data, *θ* is a vector of location effects (fixed and random effects), and **e** is a vector of residuals.

With non-Gaussian data, a latent variable (*l*) is introduced, which is the canonical parameter of some distribution on the link scale. For example, if datum $y_i$ is Poisson distributed and a log link is specified, then the assumption is:

$$y_i \sim \mathrm{Pois}(\lambda_i), \qquad \lambda_i = \exp(l_i) \tag{19}$$

where *λ* is the parameter of the Poisson distribution (usually called the rate or mean parameter) and exp is the inverse link function.

In this case the linear model is applied to *l*:

$$\mathbf{l} = \mathbf{W}\boldsymbol{\theta} + \mathbf{e} \tag{20}$$
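
To make the data-generating model concrete, the following minimal NumPy sketch simulates counts from a single-trait Poisson phylogenetic mixed model with a log link. The tree (two hypothetical clades of five species), the covariate, the fixed effects and the variance components are illustrative assumptions, not values from this paper; **W** is partitioned here into fixed-effect columns `X` and an incidence matrix `Z` for the species (phylogenetic) effects.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical phylogenetic covariance matrix A: two clades of five species,
# each species sharing half of its root-to-tip history with its clade-mates.
clade = 0.5 * np.eye(5) + 0.5 * np.ones((5, 5))
A = np.kron(np.eye(2), clade)
n = A.shape[0]

# Design matrix W = [X | Z]: an intercept and one covariate as fixed effects,
# plus an identity incidence matrix linking each datum to its species effect.
covariate = rng.normal(size=n)
X = np.column_stack([np.ones(n), covariate])
Z = np.eye(n)
W = np.hstack([X, Z])

sigma2_a, sigma2_e = 0.5, 0.2          # illustrative variance components
beta = np.array([1.0, 0.3])            # illustrative fixed effects
a = rng.multivariate_normal(np.zeros(n), sigma2_a * A)  # phylogenetic effects
theta = np.concatenate([beta, a])      # location effects, fixed and random

e = rng.normal(0.0, np.sqrt(sigma2_e), size=n)
l = W @ theta + e                      # latent variable on the log (link) scale
y = rng.poisson(np.exp(l))             # observed counts with lambda_i = exp(l_i)
print(y)
```

The residual **e** on the latent scale absorbs overdispersion relative to the Poisson, which is why it appears in the latent model even though the data are counts.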

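There is little distinction between fixed and random effects in a Bayesian analysis (hence we represent both with *θ*). The ‘fixed’ effects, denoted here by $\boldsymbol{\beta}$, are usually assumed *a priori* to be independently distributed about zero with a *specified* variance ($\sigma^{2}_{\beta}$):

$$\boldsymbol{\beta} \sim N(\mathbf{0}, \mathbf{I}\sigma^{2}_{\beta}) \tag{21}$$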
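
Written out in full for the single-trait Poisson case, the model hierarchy can be summarized as follows. This summary assumes, as in the sampling scheme described next, that the ‘random’ part of *θ* consists of phylogenetic effects **a** with phylogenetic covariance matrix **A**, and that **W** is partitioned into a fixed-effect design matrix **X** and a random-effect design matrix **Z**; the labels **X**, **Z**, $\boldsymbol{\beta}$ and $\sigma^{2}_{\beta}$ are ours:

$$
\begin{aligned}
y_i \mid l_i &\sim \mathrm{Pois}\left(\exp(l_i)\right)\\
\mathbf{l} &= \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{e}, \qquad \boldsymbol{\theta} = \begin{bmatrix}\boldsymbol{\beta}\\ \mathbf{a}\end{bmatrix}, \qquad \mathbf{W} = \begin{bmatrix}\mathbf{X} & \mathbf{Z}\end{bmatrix}\\
\boldsymbol{\beta} &\sim N(\mathbf{0}, \mathbf{I}\sigma^{2}_{\beta}), \qquad \sigma^{2}_{\beta}\ \text{specified}\\
\mathbf{a} &\sim N(\mathbf{0}, \mathbf{A}\sigma^{2}_{a})\\
\mathbf{e} &\sim N(\mathbf{0}, \mathbf{I}\sigma^{2}_{e})
\end{aligned}
$$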

#### Parameter estimation

There are three types of parameter to estimate: the latent variables (*l*), the location effects (*θ*) and the variance components ($\sigma^{2}_{a}$ and $\sigma^{2}_{e}$, the phylogenetic and residual variances). We describe the basic sampling schemes using the example of a single trait following a Poisson distribution.

The latent variables (*l*) do not have a recognizable conditional distribution, so we sample them one at a time using Metropolis–Hastings updates (Metropolis *et al.*, 1953; Hastings, 1970). The conditional density of each latent variable is proportional to the product of two terms: the Poisson likelihood of the datum given $l_i$, and the normal density of the residual $e_i$ (the difference between $l_i$ and its linear predictor) given a mean of zero and variance $\sigma^{2}_{e}$.
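
The update just described can be sketched as a random-walk Metropolis–Hastings step for one latent variable. This is a minimal illustration, not the MCMCglmm implementation: the Gaussian random-walk proposal, its step size, the function names and the example values (datum, linear predictor, residual variance) are all assumptions.

```python
import numpy as np


def log_target(l, y, eta, sigma2_e):
    """Log conditional density of one latent variable, up to a constant.

    Poisson log-likelihood of y given lambda = exp(l), plus the normal
    log-density of the residual e = l - eta with mean 0 and variance sigma2_e.
    """
    return y * l - np.exp(l) - (l - eta) ** 2 / (2.0 * sigma2_e)


def mh_update(l, y, eta, sigma2_e, rng, step=1.0):
    """One random-walk Metropolis-Hastings update of a single latent variable."""
    proposal = l + rng.normal(0.0, step)  # symmetric proposal: no Hastings correction needed
    log_ratio = log_target(proposal, y, eta, sigma2_e) - log_target(l, y, eta, sigma2_e)
    if np.log(rng.uniform()) < log_ratio:
        return proposal  # accept
    return l             # reject: keep the current value


rng = np.random.default_rng(7)
y, eta, sigma2_e = 3, 0.5, 1.0  # hypothetical datum, linear predictor w_i'theta, residual variance
l, chain = eta, []
for _ in range(50000):
    l = mh_update(l, y, eta, sigma2_e, rng)
    chain.append(l)
print(np.mean(chain[5000:]))  # posterior mean of l_i after discarding burn-in
```

In a full sampler, `eta` would be recomputed from the current location effects at each iteration and every latent variable would be updated in turn.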

The location effects are multivariate normal, conditional on the latent variables and variance components, and can be Gibbs sampled (Geman & Geman, 1984). We use the Gibbs sampling method of Garcia-Cortes & Sorensen (2001) which updates all the location effects in a single pass and avoids inverting the mixed model coefficient matrix.
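
A minimal dense NumPy sketch of the single-pass update, following the Garcia-Cortes & Sorensen (2001) idea as we understand it: draw *θ*\* from its prior and pseudo-data **l**\* = **W***θ*\* + **e**\*, then correct *θ*\* by one solve of the mixed model equations applied to **l** − **l**\*. The result is an exact draw from the full conditional $N(\mathbf{C}^{-1}\mathbf{W}^{\prime}\mathbf{l}/\sigma^{2}_{e}, \mathbf{C}^{-1})$ with $\mathbf{C} = \mathbf{W}^{\prime}\mathbf{W}/\sigma^{2}_{e} + \mathbf{G}^{-1}$, and **C** itself is never inverted. The prior covariance `G`, the tree and all numbers are hypothetical, and for brevity the small `G` is inverted directly (in practice **A**<sup>−1</sup> is built directly and is sparse).

```python
import numpy as np


def sample_location_effects(W, l, G, sigma2_e, rng):
    """Draw theta from N(C^-1 W'l / sigma2_e, C^-1), C = W'W / sigma2_e + G^-1,
    by perturbing a prior draw and solving one linear system.

    G is the prior covariance of theta: specified variances for the fixed
    effects and A * sigma2_a for the phylogenetic effects.
    """
    n, p = W.shape
    theta_star = rng.multivariate_normal(np.zeros(p), G)               # theta* from the prior
    l_star = W @ theta_star + rng.normal(0.0, np.sqrt(sigma2_e), n)  # pseudo-data l*
    G_inv = np.linalg.inv(G)  # small dense example only
    C = W.T @ W / sigma2_e + G_inv                                      # MME coefficient matrix
    rhs = W.T @ (l - l_star) / sigma2_e
    return theta_star + np.linalg.solve(C, rhs)                        # solve, do not invert C


rng = np.random.default_rng(3)
A = 0.5 * np.eye(6) + 0.5 * np.ones((6, 6))  # hypothetical phylogenetic covariance
sigma2_a, sigma2_e = 1.0, 0.5
W = np.hstack([np.ones((6, 1)), np.eye(6)])  # intercept + one effect per species
G = np.zeros((7, 7))
G[0, 0] = 100.0                              # specified (diffuse) prior variance of the intercept
G[1:, 1:] = sigma2_a * A                     # phylogenetic effects a ~ N(0, A sigma2_a)
l = rng.normal(size=6)                       # current latent variables (illustrative)
theta = sample_location_effects(W, l, G, sigma2_e, rng)
print(theta)
```

The key property is that the correction term maps prior-plus-noise variation onto exactly the posterior covariance $\mathbf{C}^{-1}$, so a single linear solve per iteration suffices.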

Conditional on the location effects, the variance components follow scaled inverse chi-squared distributions whose scale is determined by the cross-product of the ‘random’ location effects, or by the cross-product of the residuals in the case of $\sigma^{2}_{e}$ (together with any prior information). As the variance components come from a known distribution, they too can be Gibbs sampled. If the random effects are correlated, as they will be for the phylogenetic effects, the scale is obtained using $\mathbf{a}^{\prime}\mathbf{A}^{-1}\mathbf{a}$ rather than the direct cross-product $\mathbf{a}^{\prime}\mathbf{I}^{-1}\mathbf{a} = \mathbf{a}^{\prime}\mathbf{a}$.
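
A minimal sketch of this Gibbs step, assuming a scaled inverse chi-squared prior with `nu` degrees of freedom and scale `s2` (an illustrative, weakly informative choice that need not match MCMCglmm's prior parametrization). The same function serves the residual variance if `ss` is taken to be $\mathbf{e}^{\prime}\mathbf{e}$; the tree and the true variance used in the usage example are hypothetical.

```python
import numpy as np


def sample_variance(ss, q, rng, nu=0.002, s2=1.0):
    """Gibbs draw of a variance component from its scaled inverse chi-squared
    full conditional: (ss + nu * s2) / chi2(nu + q).

    ss: quadratic form of the effects, a' A^-1 a for phylogenetic effects
        (or e'e for residuals, where the covariance structure is the identity).
    q:  number of effects (or residuals) entering ss.
    """
    return (ss + nu * s2) / rng.chisquare(nu + q)


rng = np.random.default_rng(11)
q = 2000
A = 0.5 * np.eye(q) + 0.5 * np.ones((q, q))  # hypothetical phylogenetic covariance
true_sigma2_a = 2.0
a = np.sqrt(true_sigma2_a) * (np.linalg.cholesky(A) @ rng.normal(size=q))

ss = a @ np.linalg.solve(A, a)  # a'A^-1 a, computed by solving rather than inverting A
draws = [sample_variance(ss, q, rng) for _ in range(5000)]
print(np.mean(draws))
```

Using $\mathbf{a}^{\prime}\mathbf{a}$ here instead would treat correlated relatives as independent replicates and bias the variance estimate.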

#### Extensions to multiple responses

Multiple-response models are not widely used in comparative biology but can be useful in many situations. Extending the model to the multivariate case is straightforward: the data vectors and latent variables for each trait are concatenated, and the mixed model equations are structured accordingly. For simplicity, we consider a bivariate Poisson analysis with response vector

$$\mathbf{y} = \begin{bmatrix}\mathbf{y}^{(1)}\\ \mathbf{y}^{(2)}\end{bmatrix}$$

and latent variable vector

$$\mathbf{l} = \begin{bmatrix}\mathbf{l}^{(1)}\\ \mathbf{l}^{(2)}\end{bmatrix}$$

In multi-response models it is usual to replace the variance components with their multivariate analogues, (co)variance matrices, which give the variance within each trait and the covariance between traits for the designated source of variance. For example, if $\mathbf{V}_{a}$ is a 2 × 2 matrix with the phylogenetic variances of the two traits on the diagonal and the covariance between the two traits' phylogenetic effects in the off-diagonals, then the complete set of phylogenetic effects

$$\mathbf{a} = \begin{bmatrix}\mathbf{a}^{(1)}\\ \mathbf{a}^{(2)}\end{bmatrix}$$

has the expected distribution $N(\mathbf{0}, \mathbf{V}_{a} \otimes \mathbf{A})$, where $\otimes$ is the Kronecker product.

The location effects remain multivariate normal as before, and so can be Gibbs sampled in a single pass using the multivariate extension (Korsgaard *et al.*, 2003) of the Gibbs sampling method of Garcia-Cortes & Sorensen (2001). The covariance matrices are Gibbs sampled using the multivariate analogue of the scaled inverse chi-squared distribution, the inverse Wishart distribution. The scale matrix in the multivariate case has a similar form, $[\mathbf{a}^{(1)}\ \mathbf{a}^{(2)}]^{\prime}\mathbf{A}^{-1}[\mathbf{a}^{(1)}\ \mathbf{a}^{(2)}]$, where $[\mathbf{a}^{(1)}\ \mathbf{a}^{(2)}]$ is the matrix whose columns are the two traits' phylogenetic effects.
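
For the bivariate case, a minimal NumPy sketch of three steps: simulating phylogenetic effects with covariance $\mathbf{V}_{a} \otimes \mathbf{A}$ by stacking the two traits, forming the scale matrix $[\mathbf{a}^{(1)}\ \mathbf{a}^{(2)}]^{\prime}\mathbf{A}^{-1}[\mathbf{a}^{(1)}\ \mathbf{a}^{(2)}]$, and drawing $\mathbf{V}_{a}$ from an inverse Wishart full conditional. The tree, $\mathbf{V}_{a}$, the prior (`nu`, `V_prior`) and the function name are illustrative assumptions; the inverse Wishart is parametrized so that IW(ν, Ψ) has mean Ψ/(ν − p − 1), which need not match the parametrization used by MCMCglmm.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
A = 0.5 * np.eye(n) + 0.5 * np.ones((n, n))  # hypothetical phylogenetic covariance
V_a = np.array([[1.0, 0.5],
                [0.5, 2.0]])                 # illustrative phylogenetic (co)variance matrix

# Stacking the columns of L_A Z L_V' gives a vector with covariance V_a (x) A.
L_A = np.linalg.cholesky(A)
L_V = np.linalg.cholesky(V_a)
a_mat = L_A @ rng.normal(size=(n, 2)) @ L_V.T  # column k holds a^(k)
a = a_mat.T.reshape(-1)                        # stacked vector [a^(1); a^(2)]

# Scale matrix [a^(1) a^(2)]' A^-1 [a^(1) a^(2)]
S = a_mat.T @ np.linalg.solve(A, a_mat)


def sample_inverse_wishart(df, scale, rng):
    """Draw from inverse Wishart(df, scale) with integer df by inverting a
    Wishart(df, scale^-1) draw built from df outer products."""
    p = scale.shape[0]
    z = rng.multivariate_normal(np.zeros(p), np.linalg.inv(scale), size=df)
    return np.linalg.inv(z.T @ z)


nu, V_prior = 3, np.eye(2)  # illustrative inverse Wishart prior
V_a_draw = sample_inverse_wishart(nu + n, S + V_prior, rng)  # full conditional draw
print(S / n)
print(V_a_draw)
```

Because $\mathbf{L}_{A}^{\prime}\mathbf{A}^{-1}\mathbf{L}_{A} = \mathbf{I}$, the scale matrix divided by the number of species is a direct moment estimate of $\mathbf{V}_{a}$, which is what the inverse Wishart draw concentrates around when the phylogeny is large.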

Although similar in many respects to Felsenstein's (2005) application of MCMC to the comparative analysis of binary traits, the method differs in three respects. First, the model is constructed explicitly as a generalized linear model, so that the concept of the latent variable can be extended beyond binary traits to other types of distribution. Second, the model can be more easily generalized to multi-trait models in which the different traits follow different distributions. Third, all fixed and random effects are sampled simultaneously from their multivariate conditional distribution rather than one at a time from univariate conditional distributions, which is expected to improve the efficiency of the algorithm substantially (Roberts & Sahu, 1997).

#### Multinomial phylogenetic mixed model

We briefly describe the results, additional to those above, that are required to fit a multinomial logit model for more than two nominal categories. The model does not appear to have been used in quantitative genetics but is quite widely used in econometrics and political science (Congdon, 2005). Fitting such a model is a direct extension of a multivariate binary mixed model, with additional constraints on parameter space and a slight modification to the latent variable likelihood. If the categories are ordered, it is possible to work with a different parametrization of the model presented below (Hedeker, 2003), or alternatively to use the ordered multinomial probit model, which has been used in quantitative genetics (Gianola & Foulley, 1983).

In a binary model, a single data point can be in one of two categories (*J* = 2), and this can be expressed as a univariate binary variable. Likewise, if *J* > 2 it is usual to use a transformation that reduces the problem to *J* − 1 dimensions (Daganzo, 1979). In the binary model, the motivation for the dimension reduction is obvious: if a variable increased the probability of expressing the first category by 10 percentage points, it must necessarily reduce the probability of expressing the second category by 10 percentage points, because an individual cannot express both categories simultaneously. The dimension reduction essentially constrains the probabilities of expressing the first and the second category to sum to unity [$\Pr(Y_i = j_1) + \Pr(Y_i = j_2) = 1$]. For the three-category case, we will consider three colours: red, black and white. Denoting by $\alpha_{ij}$ the probability that species *i* is colour *j*, the unit-sum constraint has the form $\alpha_{i,\text{red}} + \alpha_{i,\text{black}} + \alpha_{i,\text{white}} = 1$. To reduce the problem to *J* − 1 dimensions, it is usual to work with a parametrization in terms of log odds relative to an arbitrary baseline category (we will use the first category, red), so that

$$l_{ij} = \log\left(\frac{\alpha_{ij}}{\alpha_{i,\text{red}}}\right), \qquad j \in \{\text{black}, \text{white}\}$$

where $l_{ij}$ is the latent variable for species *i* and colour *j*. The problem can be represented using the contrast matrix **Δ** (Bunch, 1991):

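$$\boldsymbol{\Delta} = \begin{bmatrix} -1 & 1 & 0\\ -1 & 0 & 1 \end{bmatrix} \tag{24}$$

whose rows contrast black and white, respectively, with the baseline category red.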
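
The baseline-category parametrization can be checked numerically. A minimal sketch, assuming a baseline contrast matrix whose rows contrast black and white with red, and hypothetical category-specific linear predictors `eta` (our notation): applying the contrasts gives the *J* − 1 log odds, and mapping them back gives the same probabilities as a softmax over all *J* categories, so the dimension reduction loses no information.

```python
import numpy as np

COLOURS = ["red", "black", "white"]  # red is the baseline category

# Baseline-category contrast matrix for J = 3 (assumed form): each row
# contrasts one non-baseline colour with red.
Delta = np.array([[-1.0, 1.0, 0.0],
                  [-1.0, 0.0, 1.0]])


def probabilities_from_latent(l):
    """Map the J - 1 latent log odds (relative to red) to J probabilities:
    alpha_red = 1 / (1 + sum_k exp(l_k)), alpha_j = exp(l_j) / (1 + sum_k exp(l_k))."""
    expl = np.exp(np.concatenate([[0.0], l]))  # the baseline has log odds 0
    return expl / expl.sum()


eta = np.array([0.2, 1.0, -0.5])  # hypothetical linear predictors for one species
l = Delta @ eta                   # log odds of black and white relative to red
alpha = probabilities_from_latent(l)
print(dict(zip(COLOURS, alpha.round(3))))
```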

### Discussion

In this paper, we develop both classic and new results from quantitative genetics in the context of phylogenetic comparative analysis. We show how the relationship between pedigree structure and phylogeny structure can be used to exploit efficient algorithms already developed for ‘animal model’ analyses. We also suggest that many of the techniques currently being developed in comparative biology have existed as standard REML tools in quantitative genetics for at least a decade. More recently, Bayesian models have gained popularity in quantitative genetics for certain types of problem for which standard techniques were known to behave poorly (Sorensen & Gianola, 2002; O'Hara *et al.*, 2008). Many of these problems also exist for the comparative method, and we show how quantitative genetic MCMC algorithms can be implemented for phylogenies. Taken together, these results demonstrate not only that the theory and software for fitting the original phylogenetic mixed model are well developed, but also that extensions dealing with meta-analysis, measurement error, missing data, multi-trait models, non-Gaussian data, small-sample inference and uncertainty in derived quantities such as phylogenetic heritability are readily available. In addition, we extend the quantitative genetic models to give a multinomial phylogenetic mixed model for analysing categorical traits, a type of model that does not appear to have been used in either quantitative genetics or comparative biology.

The multinomial model developed in this paper follows a logic similar to the threshold model of Wright (1934a,b) for binary characters, in that a continuous polygenic liability is postulated to underlie which state is manifest. The threshold model received some of the earliest Bayesian treatments in quantitative genetics (Foulley *et al.*, 1983; Gianola & Foulley, 1983), followed by the development of MCMC procedures (Sorensen *et al.*, 1995), which are discussed at length in Sorensen & Gianola (2002). Felsenstein (2005) discusses the threshold model in the context of phylogenies and provides alternative computational strategies, also using MCMC, for estimating the relevant parameters. Alternative methods exist, mainly based on models of the kind used for substitutions in DNA sequences (Pagel, 1994; Huelsenbeck *et al.*, 2003; Pagel & Meade, 2006), but Felsenstein (2005) gives a persuasive argument as to why the quantitative genetic approach is to be preferred for the comparative analysis of categorical phenotypes. From a statistical perspective, a multi-trait threshold model can often be parametrized with far fewer parameters, which, given the small amount of information in phylogenies, reduces the chance of trying to fit over-identified models (Felsenstein, 2005). From a biological perspective, we believe the substitution model is harder to interpret in terms of phenotypic evolution because it assumes that the probability of jumping from the zero to the one state is constant over the phylogeny. When the focus shifts to the underlying liability, species that are in the zero state, but that are found in clades with many species in the one state, have an increased chance of flipping to the one state relative to species found in clades dominated by zero states. This reasoning seems natural because the expression of phenotypes, even categorical ones, is often dependent on a whole range of developmental and biochemical processes being in place.
For example, the re-appearance of wings in a wingless lineage of stick insects should be much less surprising than the appearance *de novo* of wings on guinea pigs. This being said, the threshold model and the substitution model may be equivalent in the univariate case, and the differences may be merely interpretational; the substitution model would, no doubt, infer a wingless ancestral state for the rodents with very high confidence.
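
The threshold logic can be illustrated with two sister species whose liabilities are correlated through shared ancestry: knowing that one sister shows state 1 raises the probability that the other does too, above the unconditional 0.5. A minimal sketch, assuming unit-variance liabilities, a correlation `rho` equal to the (hypothetical) proportion of shared branch length, and a threshold at zero:

```python
import numpy as np

rng = np.random.default_rng(2024)
rho = 0.8  # hypothetical proportion of shared history between two sister species
cov = np.array([[1.0, rho],
                [rho, 1.0]])

# Threshold model: each species has a continuous liability, and the discrete
# state is 1 when the liability exceeds the threshold (here 0).
liability = rng.multivariate_normal(np.zeros(2), cov, size=200000)
state = (liability > 0.0).astype(int)

# Probability that species 2 is in state 1 given that its sister is in state 1
p_mc = state[state[:, 0] == 1, 1].mean()
# Exact value from the bivariate normal orthant probability 1/4 + arcsin(rho) / (2 pi)
p_exact = 0.5 + np.arcsin(rho) / np.pi
print(p_mc, p_exact)
```

As `rho` falls to zero (no shared history) the conditional probability returns to 0.5, so the information carried by relatives' states comes entirely from the shared liability.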

As Housworth *et al.* (2004) noted, although phylogenies tend to be more informative than equally sized pedigrees, phylogenies are typically orders of magnitude smaller than the pedigrees used in evolutionary biology. As measures of uncertainty and hypothesis tests are often based on Fisher information or likelihood ratio tests, both of which rely on large-sample asymptotic behaviour, statistical inference from small samples should be treated with some caution. Although resampling techniques are often used to overcome this problem, there seems to be little formal justification for the resampling strategies employed (but see Lapointe & Garland, 2001), which is surprising given that resampling methods for dependent data are usually nontrivial to develop (Shao & Tu, 1996). Furthermore, naive application of basic resampling methods often gives incorrect results (Rao & Wu, 1988). For example, the validity of permuting species data over the phylogeny in a phylogenetic meta-analysis is unclear, given that permutation tests require the data to be exchangeable under the null hypothesis. In Adams's (2008) model, the stated null hypothesis was a zero effect size, not a lack of phylogenetic inertia, so it seems unlikely that the data are expected to be independent under the null hypothesis. It would seem that the permutation test gives the distribution of the test statistic in the absence of phylogenetic inertia, irrespective of whether Bergmann's rule exists or not. Bayesian inference makes no large-sample approximations, and the resulting posterior distributions are an accurate description of uncertainty given the probability model (Gelman *et al.*, 2004). However, with small phylogenies the relative importance of prior information may increase, and effort should be made to obtain accurate prior information and to check the sensitivity of the results to alternative prior specifications.

Although the generalized phylogenetic mixed model is a flexible tool for comparative analysis, having a range of other models as special cases (e.g. independent contrasts, Felsenstein, 1985; nested taxonomic model, Clutton-Brock & Harvey, 1977), there are alternative comparative methods that are based on different variance structures (Hansen & Martins, 1996; Martins & Hansen, 1997). These alternative models are often identical to well-known models in time-series analysis and geostatistics (Ives & Zhu, 2006); for example, the simple Ornstein–Uhlenbeck model (Hansen & Martins, 1996) is equivalent to the isotropic exponential covariance model, the continuous-time analogue of the first-order autoregressive model. We are not familiar with these fields, but given that they are both large and have a long history, we suspect that many of the problems involved with model fitting, over-identification, missing data and prediction already have good, well-tested solutions for these types of model. Following Ives & Zhu (2006), we stress that although phylogenies are unique and inherently interesting to biologists, statistically they are little different from space, time or pedigrees, all of which have been the focus of much statistical research.
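
To make the stated equivalence concrete: assuming the Ornstein–Uhlenbeck process is at its stationary distribution at the root, the covariance between two tips depends only on their patristic distance $d_{ij}$, as $\frac{\sigma^{2}}{2\alpha}\exp(-\alpha d_{ij})$, which is an isotropic exponential covariance function of distance. A minimal sketch with a hypothetical four-species tree; `sigma2`, `alpha` and the function name are illustrative assumptions:

```python
import numpy as np


def ou_covariance(d, sigma2, alpha):
    """Stationary Ornstein-Uhlenbeck covariance between tips separated by
    patristic distance d: sigma2 / (2 alpha) * exp(-alpha * d).

    As a function of distance alone, this is the isotropic exponential
    covariance function of geostatistics, with range parameter 1 / alpha.
    """
    return sigma2 / (2.0 * alpha) * np.exp(-alpha * d)


# Hypothetical ultrametric tree of depth 1, ((A,B),(C,D)), with sister pairs
# splitting at depth 0.5: patristic distances are 1 within and 2 between pairs.
D = np.array([[0.0, 1.0, 2.0, 2.0],
              [1.0, 0.0, 2.0, 2.0],
              [2.0, 2.0, 0.0, 1.0],
              [2.0, 2.0, 1.0, 0.0]])
V = ou_covariance(D, sigma2=1.0, alpha=1.5)
print(np.round(V, 3))
```

Replacing `D` by distances in space or time gives the corresponding geostatistical or continuous-time autoregressive covariance, which is the sense in which these models share their fitting problems.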