## Introduction

The work of Fisher, Haldane and Wright not only established the field of quantitative genetics but also made substantial contributions to the field of statistics (Falconer, 1983). These statistical tools are still routinely used in comparative biology although, with a few notable exceptions (Lynch, 1991; Felsenstein, 2005; Naya *et al.*, 2006), the connection with quantitative genetics seems to have been largely lost. In this paper, we aim to reconnect quantitative genetics with comparative biology via the mixed model, highlighting solutions developed in quantitative genetics for problems that appear not to have been addressed or resolved in comparative biology.

Although used across the sciences, mixed models have their origin in quantitative genetics, where a large and sophisticated, but perhaps inaccessible, literature exists (Lynch & Walsh, 1998; Sorensen & Gianola, 2002; Thompson, 2008). Given their origin, it is perhaps not surprising that an early application of mixed models was to the analysis of data collected on individuals linked through a pedigree – an analysis now known as the ‘animal model’ (Henderson, 1976). In an important paper, Lynch (1991) showed that this same model can be applied to problems in phylogenetic comparative biology despite the difference in timescales over which shared ancestry is measured. Although Lynch's (1991) paper received little attention until relatively recently (Housworth *et al.*, 2004; Felsenstein, 2008), an equivalent model (Pagel, 1999) was developed independently in the intervening period (Housworth *et al.*, 2004).

A perceived difficulty of Lynch's (1991) original phylogenetic mixed model was that finding the maximum likelihood (ML) estimate was too computationally intensive to make it a practical tool (e.g. Martins, 1996; Diniz-Filho *et al.*, 1998). However, a great deal of quantitative genetic literature had accumulated on efficiently fitting a range of large, complex models (for a review, see Thompson *et al.*, 2005), and by at least 1996 this theory had a general implementation in the program ASReml (Gilmour *et al.*, 2002). For many data sets, Lynch's (1991) model could have been fitted in a matter of seconds using restricted maximum likelihood (REML), which became the method of choice in quantitative genetics relatively early (Patterson & Thompson, 1971). By contrast, the ML and generalized least squares (GLS) procedures advocated by Lynch (1991) and Pagel (1999) have largely been superseded in quantitative genetics because of their inherent bias and inflexibility. This bias arises because the methods fail to take into account the uncertainty in the fixed effects, resulting in downwardly biased variance components. The bias is likely to be severe in the context of phylogenetic comparative analyses because the fixed effects are associated with the ancestral state, and the ancestral state usually has high sampling error.
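The source of this bias is easiest to see in the simplest possible case, which is not a phylogenetic model: a single fixed effect (the mean) and independent residuals. The following Python sketch uses made-up data and hypothetical function names. It contrasts the ML estimate of the variance, which divides the residual sum of squares by *n*, with the REML estimate for this intercept-only case, which divides by *n* − 1 to allow for the mean having been estimated.

```python
# Illustrative sketch only: intercept-only model with independent residuals.
# Function names and data are assumptions made for this example.

def residual_sum_of_squares(y):
    """Sum of squared deviations from the estimated mean (the single fixed effect)."""
    mean = sum(y) / len(y)
    return sum((value - mean) ** 2 for value in y)

def ml_variance(y):
    """ML estimate: treats the estimated mean as if it were known exactly."""
    return residual_sum_of_squares(y) / len(y)

def reml_variance(y):
    """REML estimate for this model: one degree of freedom is spent on the mean."""
    return residual_sum_of_squares(y) / (len(y) - 1)

y = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up observations
print(ml_variance(y), reml_variance(y))  # prints: 2.0 2.5
```

The ML estimate is smaller by the factor (*n* − 1)/*n*, so the discrepancy is largest when there are few observations per estimated fixed effect. A phylogenetic analysis would replace the independent residuals with a phylogenetic covariance structure, which this sketch does not attempt.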

In this paper, we start by showing that the relationship between the animal model and the phylogenetic mixed model is deeper than has previously been noted. The original phylogenetic mixed model was derived by making the analogy between the matrix of phylogenetic distances and the relatedness matrix defined by a pedigree. However, by expanding the phylogenetic covariance matrix to include ancestral nodes, we show that these matrices also share several structural properties. More specifically, we show that a phylogeny is mathematically equivalent to an inbred pedigree, where the inbreeding coefficients are equal to the branch lengths. This relationship can be exploited in order to develop algorithms that are more accurate and orders of magnitude faster for large problems.
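One structural property of the expanded matrix can be illustrated with a small example. The Python sketch below assumes a Brownian motion model on a made-up four-node tree, with the root treated as a fixed ancestral state; the tree, node names and function names are hypothetical, and the code is not intended to reproduce the algorithms developed in this paper. Once ancestral nodes are included, each node depends only on its parent, as an individual in a pedigree depends only on its parents, so the inverse of the covariance matrix can be assembled directly, one branch at a time, and is sparse.

```python
# Illustrative sketch only, assuming Brownian motion on a hypothetical tree.
# Each entry maps a node to (parent, length of the branch leading to the node).
tree = {"n1": ("root", 1.0), "A": ("n1", 0.5), "B": ("n1", 0.5), "C": ("root", 1.5)}
nodes = ["n1", "A", "B", "C"]  # internal node n1 is included alongside tips A, B, C

def branches_to_root(node):
    """Branches (labelled by their child node) on the path from node to the root."""
    path = set()
    while node != "root":
        path.add(node)
        node = tree[node][0]
    return path

def covariance_matrix():
    """Dense route: covariance of two nodes is their shared branch length from the root."""
    return [[sum(tree[k][1] for k in branches_to_root(i) & branches_to_root(j))
             for j in nodes] for i in nodes]

def inverse_from_branches():
    """Direct route: each branch of length b adds 1/b terms for the child and its parent."""
    index = {name: k for k, name in enumerate(nodes)}
    inverse = [[0.0] * len(nodes) for _ in nodes]
    for child in nodes:
        parent, length = tree[child]
        c = index[child]
        inverse[c][c] += 1.0 / length
        if parent != "root":  # the root is fixed, so it has no row or column
            p = index[parent]
            inverse[p][p] += 1.0 / length
            inverse[c][p] -= 1.0 / length
            inverse[p][c] -= 1.0 / length
    return inverse

def matmul(x, y):
    """Plain matrix product for small lists of lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

product = matmul(covariance_matrix(), inverse_from_branches())
is_identity = all(abs(product[i][j] - (1.0 if i == j else 0.0)) < 1e-9
                  for i in range(len(nodes)) for j in range(len(nodes)))
print(is_identity)  # prints: True
```

In this toy example the directly assembled matrix is the exact inverse of the dense covariance matrix, yet it has non-zero entries only on the diagonal and between parent and offspring nodes. No matrix inversion is carried out, which suggests why working with the expanded matrix can be much faster for large trees.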

We go on to emphasize that general solutions and software are already available for dealing with many aspects of comparative analysis that comparative biologists often flag as future avenues of research. We illustrate this by taking three recently published comparative papers (Ives *et al.*, 2007; Adams, 2008; Felsenstein, 2008) and showing that they can all be considered phylogenetic meta-analyses in a mixed model framework. By doing this, we highlight that the original phylogenetic meta-analysis (Adams, 2008) is implemented incorrectly, and that REML estimates could have been obtained for all three models over a decade ago without the need to develop new statistical tools or software. As a worked example, we re-analyse data collected by Adams (2008) in order to test Bergmann's (1847) rule – an ecological rule predicting a positive intraspecific correlation between body size and latitude.

We then discuss mixed model procedures for dealing with imperfect data in the context of comparative biology. In particular, the problem of missing data has received a great deal of attention in quantitative genetics, and general methods that correct for nonrandom sampling are available and well understood (e.g. Im *et al.*, 1989; Hadfield, 2008). These results are particularly important in the context of meta-analysis and comparative analysis because they may be able to correct for the publication bias that arises through nonrandom sampling of taxa, for example when common or ‘fluffy’ species are over-represented (Fisher *et al.*, 2003; Nakagawa & Freckleton, 2008). In a similar vein, a complete phylogeny may not be available for all taxa, and we show how taxonomic models (Clutton-Brock & Harvey, 1977) and phylogenetic models can be combined relatively simply using standard methodology. Although not an ideal solution, the method does provide a flexible work-around for analysing data where phylogenetic information is currently incomplete.

We end by discussing phylogenetic generalized linear mixed models for non-Gaussian traits, as standard REML methods are known to be unreliable due to the intractability of the likelihood. Markov chain Monte Carlo (MCMC) methods have proved to be useful tools for solving this problem both in quantitative genetics (Sorensen & Gianola, 2002) and in phylogenetics (Pagel *et al.*, 2004; Felsenstein, 2005), and we show how efficient Gibbs samplers from quantitative genetics can be used directly for a wide range of phylogenetic methods. In particular, we discuss in detail a model where the trait can be one of *J* > 2 nominal states, as this type of model does not appear to have been used in quantitative genetics or comparative biology. The model allows the analysis of continuous and discrete characters to be brought under the same framework by shifting emphasis from evolutionary jumps between states to continuous evolution of the probability of expressing a state. In the context of phenotypic evolution, the proposed model seems to have an easier biological interpretation than currently available alternatives derived from substitution models of DNA (e.g. Pagel, 1994) because it allows for the fact that a whole host of developmental pathways are often required for the expression of complex categorical phenotypes. For example, a flightless stick insect is inherently more likely to produce a flying descendant than is a flightless rodent.