Phylogenetic signal and linear regression on species data

Authors


Correspondence author. E-mail: lrevell@nescent.org

Summary

1. A common procedure in the regression analysis of interspecies data is to first test the independent and dependent variables X and Y for phylogenetic signal, and then use the presence of signal in one or both traits to justify regression analysis using phylogenetic methods such as independent contrasts or phylogenetic generalized least squares.

2. This is incorrect, because phylogenetic regression assumes that the residual error in the regression model (not in the original traits) is distributed according to a multivariate normal distribution with variances and covariances proportional to the historical relations of the species in the sample.

3. Here, I examine the consequences of justifying and applying the phylogenetic regression incorrectly. I find that when used improperly the phylogenetic regression can have poor statistical performance, even under some circumstances in which the type I error rate of the method is not inflated over its nominal level.

4. I also find, however, that when tests of phylogenetic signal in phylogenetic regression are applied properly, and in particular when phylogenetic signal in the residual error is simultaneously estimated with the regression parameters, the phylogenetic regression outperforms equivalent non-phylogenetic procedures.

Introduction

Phylogenetic methods have become de rigueur in the analysis of interspecies data (Harvey & Pagel 1991). This is because species are non-independent for the purposes of statistical analysis due to their common history (Felsenstein 1985; Harvey & Pagel 1991). This problem of the statistical dependence of related species has been solved in various different ways for different types of data and scientific questions (e.g. Ridley 1983; Felsenstein 1985, 2005, 2008; Cheverud, Dow & Leutenegger 1985; Grafen 1989; Pagel & Harvey 1989; Maddison 1990; Garland et al. 1993; Hansen 1997; Pagel 1999; Garland & Ives 2000; Rohlf 2001; Butler & King 2004; Ives, Midford & Garland 2007; Revell et al. 2007; Hansen, Pienaar & Orzack 2008; Lajeunesse 2009; Revell & Collar 2009; reviewed in Harvey & Pagel 1991; Felsenstein 2004). Arguably, the most widely used statistical method for the analysis of interspecific data that accounts for the historical relationships of species is the phylogenetic regression (Felsenstein 1985; Grafen 1989).

Typical linear regression analysis is of the form $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$, with the ordinary least squares (OLS) solution $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y}$, in which y is an n × 1 vector (for n species) containing values for the dependent variable, Y; X is an n × (m + 1) matrix containing 1·0s in the first column and the m independent (explanatory) variables of the model in columns two through m + 1; and $\hat{\boldsymbol{\beta}}$ is a vector containing the parameter estimates (including intercept) of the fitted univariate or multivariate linear regression model (Rencher & Schaalje 2008). ɛ is an n × 1 vector containing the residual error in the model, and under OLS it is assumed that ɛ is multivariate normally distributed with a variance–covariance matrix given by $\sigma_\varepsilon^2\mathbf{I}$. Here, I is the identity matrix (an n × n matrix containing 1·0s on the diagonal and zeroes elsewhere), and $\sigma_\varepsilon^2$ is the residual variance of the model (i.e. the variability in Y not explained by the regressors).
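As a minimal sketch of the OLS estimating equation above, the following Python fragment solves the normal equations for a single regressor using only the standard library. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the four data points are illustrative assumptions and are not taken from the study or its Appendix.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def ols(x, y):
    """Return (intercept, slope) from beta_hat = (X'X)^-1 X'y with X = [1 x]."""
    X = [[1.0, xi] for xi in x]
    Xt = transpose(X)
    beta = solve(matmul(Xt, X), matmul(Xt, [[yi] for yi in y]))
    return beta[0][0], beta[1][0]

# Hypothetical data for four species (illustration only).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.1, 2.9, 5.2, 6.8]
b0, b1 = ols(x, y)
print(b0, b1)
```

Solving the normal equations directly, rather than forming an explicit inverse, keeps the sketch short; for real data a numerical library would normally be used instead.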

If the residuals in ɛ are not distributed according to $\sigma_\varepsilon^2\mathbf{I}$, but instead according to $\sigma_\varepsilon^2\mathbf{C}$ in which C is known and is not proportional to I (i.e. C ≠ kI for k ∈ ℝ and k > 0·0), then fitting the regression model becomes a generalized (instead of ordinary) least squares problem (Rohlf 2001; Kariya & Karuta 2004; Rencher & Schaalje 2008). For non-phylogenetic data, C ≠ kI might be true, for example, if the sampling variance of Y is uneven across data points (i.e. if our data for Y have been collected with varying amounts of error). In this situation, C would be a diagonal matrix containing the n sampling variances of each of the observations for Y. Here, the generalized least squares regression would be the same as a weighted regression in which the weights are proportional to the inverse of the sampling variances for each observation of Y. In the phylogenetic case, the problem is not usually that the diagonal of C is uneven – all extant taxa in a phylogeny are temporally equidistant from the root of the tree (by definition) so they are frequently assumed to have equivalent variances (given that they are all extant and have been measured with comparable accuracy; but see Ives, Midford & Garland 2007). Rather, in the phylogenetic case, it is that the off-diagonals of C are non-zero due to the correlated histories of related species (Butler, Schoener & Losos 2000; Garland & Ives 2000).

To solve this problem, we can find the minimum variance regression slope and intercept using the generalized least squares estimating equation (or Gauss–Markov estimator; Kariya & Karuta 2004):

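$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{C}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{C}^{-1}\mathbf{y}$$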

This approach to the regression of interspecies data was first suggested by Grafen (1989), and has since been shown to be exactly equivalent to regression estimated using the contrasts method of Felsenstein (1985; Garland & Ives 2000; Rohlf 2001). The generalized least squares estimating equation is similar to the OLS estimator (given above), except that now we have down-weighted each observation for Y (and corresponding row of X) depending on the correlation of its residual error with the other observations in our set.
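The generalized least squares estimator can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the code of the Appendix: the function name `gls`, the helper functions, the 4 × 4 matrix `C` (two pairs of sister species sharing half of their history) and the data are all hypothetical.

```python
def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def gls(X, y, C):
    """Return beta_hat = (X' C^-1 X)^-1 X' C^-1 y as a flat list."""
    Xt = transpose(X)
    Ci_X = solve(C, X)                       # C^-1 X without forming C^-1
    Ci_y = solve(C, [[yi] for yi in y])      # C^-1 y
    beta = solve(matmul(Xt, Ci_X), matmul(Xt, Ci_y))
    return [b[0] for b in beta]

# Hypothetical covariance structure: species 1-2 and 3-4 are sister pairs.
C = [[1.0, 0.5, 0.0, 0.0],
     [0.5, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.5],
     [0.0, 0.0, 0.5, 1.0]]
x = [0.0, 0.2, 1.0, 1.3]
y = [0.1, 0.5, 2.1, 2.4]
X = [[1.0, xi] for xi in x]
print(gls(X, y, C))
```

Passing the identity matrix as `C` reproduces the OLS estimate, which is a convenient check that the two estimators differ only in how observations are weighted.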

Under a simple Brownian motion model for evolutionary change in Y and the Xs (Cavalli-Sforza & Edwards 1967; Felsenstein 1985, 2004), y (or any column of X, barring the first) is expected to be distributed as a multivariate normal with variance–covariance matrix given by $\sigma_y^2\mathbf{C}$ (or $\sigma_x^2\mathbf{C}$) in which C contains the height of each of the n tips of the tree on its diagonal, as well as the heights of the most recent common ancestor of each species pair i and j in each i,jth off-diagonal position (Felsenstein 1973; O’Meara et al. 2006). $\sigma_y^2$ (or $\sigma_x^2$) gives the phylogenetic variance or ‘evolutionary rate’ for Y (or X; O’Meara et al. 2006; Revell 2008). More importantly, however, ɛ = y − Xβ will also be distributed according to a multivariate normal with variance–covariance matrix given by $\sigma_\varepsilon^2\mathbf{C}$ under this evolutionary scenario. Figure 1(b) shows the computation of C from a simplified five taxon tree given in Fig. 1(a).

Figure 1.

 (a) A simple, five taxon tree with branch lengths. (b) The matrix C, which is proportional to the expected variance–covariance matrix of the residual error in a PGLS phylogenetic regression model. (c) A more flexible residual error matrix incorporating the parameter λ, which can be estimated using maximum likelihood.
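To make the construction of C and of its λ-transformed counterpart concrete, here is a small Python sketch. The nested-tuple tree encoding, the function names and the five-taxon tree itself are assumptions made for illustration; the branch lengths are not those of Fig. 1.

```python
def vcv_from_tree(tree):
    """Build C from a tree given as (content, branch_length).

    A tip is ("name", length); an internal node is ([child, ...], length).
    C[i][i] is the height of tip i above the root and C[i][j] is the height
    of the most recent common ancestor of tips i and j.
    """
    tips, shared = [], {}

    def walk(node, depth):
        content, length = node
        height = depth + length
        if isinstance(content, str):
            tips.append(content)
            shared[(content, content)] = height
            return [content]
        groups = [walk(child, height) for child in content]
        # Tips in different child subtrees have this node as their MRCA.
        for i, left in enumerate(groups):
            for right in groups[i + 1:]:
                for a in left:
                    for b in right:
                        shared[(a, b)] = shared[(b, a)] = height
        return [t for g in groups for t in g]

    walk(tree, 0.0)
    return tips, [[shared[(a, b)] for b in tips] for a in tips]

def lambda_transform(C, lam):
    """Multiply the off-diagonal elements of C by lambda; keep the diagonal."""
    n = len(C)
    return [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]

# Hypothetical ultrametric tree ((A:0.4,B:0.4):0.6,(C:0.7,(D:0.2,E:0.2):0.5):0.3);
tree = ([([("A", 0.4), ("B", 0.4)], 0.6),
         ([("C", 0.7), ([("D", 0.2), ("E", 0.2)], 0.5)], 0.3)], 0.0)

tips, C = vcv_from_tree(tree)
C0 = lambda_transform(C, 0.0)   # lambda = 0 removes all shared history
for name, row in zip(tips, C):
    print(name, row)
```

With λ = 1 the matrix is unchanged, and with λ = 0 only the diagonal remains, so the residual structure becomes that assumed by OLS.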

When data for our dependent and independent variables come from species it is a common procedure to estimate the degree to which each variable is distributed according to the variance–covariance matrices $\sigma_y^2\mathbf{C}$ and $\sigma_x^2\mathbf{C}$. This measurement, which can be taken in a variety of ways, is usually described as a measure of ‘phylogenetic signal’ for the characters in question (e.g. Blomberg & Garland 2002; Freckleton, Harvey & Pagel 2002; Blomberg, Garland & Ives 2003; Revell, Harmon & Collar 2008). If X and Y have evolved by Brownian motion, then their phylogenetic signal will be high (i.e. close to 1·0; Revell, Harmon & Collar 2008). Furthermore, if X and Y have evolved by Brownian motion then ɛ = y − Xβ will be distributed according to $\sigma_\varepsilon^2\mathbf{C}$ and the phylogenetic regression is an appropriate method to analyze the relationship between the independent variables contained in X and the response variable of our model, Y. Thus, it is tempting to use high phylogenetic signal in the dependent and/or independent variables as a justification for the phylogenetic regression. This is, in fact, commonly done (e.g. Ashton 2002; Gustafsson & Lindenfors 2004; Rezende, Bozinovic & Garland 2004; Muñoz-Garcia & Williams 2005; Collen et al. 2006; Ebensperger & Blumstein 2006; Ezenwa et al. 2006; Duminil et al. 2007; Hendrixson, Sterner & Kay 2007; Johnson, Isaac & Fisher 2007; Rönn, Katvala & Arnqvist 2007; Beaulieu et al. 2008; Capellini et al. 2008; Møller, Neilsen & Garamzegi 2008; Lovegrove 2009; Lindenfors, Revell & Nunn 2010).

However, it does not follow that if phylogenetic signal for X and/or Y is relatively high then ɛ = y − Xβ will necessarily be distributed according to $\sigma_\varepsilon^2\mathbf{C}$. Furthermore, it is also possible that even if phylogenetic signal is very low, ɛ = y − Xβ may still be distributed with variance–covariance matrix $\sigma_\varepsilon^2\mathbf{C}$. Thus, the appropriate test for phylogenetic signal is actually on the residual variability in Y given our regression model – a test which is relatively infrequently applied. In this study, I simulate scenarios in which X and/or Y have relatively high phylogenetic signal, but in which ɛ = y − Xβ is non-phylogenetic and thus the phylogenetic regression is inappropriate. I show that using a phylogenetic regression here will inflate the variance of the regression estimator. I also examine the possibility that X and/or Y are non-phylogenetic, but that ɛ = y − Xβ is distributed according to $\sigma_\varepsilon^2\mathbf{C}$. In this case, the phylogenetic regression is appropriate; however, standard diagnostic tests on X and Y might be taken to imply that ‘phylogenetic correction’ of the regression is unnecessary. I show that ignoring phylogeny in this case can lead to poor statistical performance of the regression. Finally, I repeat a maximum likelihood procedure using the λ statistic of Pagel (1999) in which we simultaneously estimate phylogenetic signal and the regression parameters (e.g. Revell 2009), thus obviating the need for a priori estimation of phylogenetic signal in the regression variables.

Materials and methods

To illustrate the case in point, I conducted four sets of numerical simulations, each under various conditions. First, I simulated X with phylogenetic signal, but in which the residual error in ɛ = y − Xβ was distributed according to $\sigma_\varepsilon^2\mathbf{I}$; i.e. it was non-phylogenetic. Depending on the relationship between X and Y (i.e. β), as well as the size of the residual variance $\sigma_\varepsilon^2$, Y may or may not also show phylogenetic signal in this model. Second, I simulated X without phylogenetic signal, but in which the residual variation in Y (ɛ = y − Xβ) was distributed according to $\sigma_\varepsilon^2\mathbf{C}$; i.e. it was phylogenetic. Again, depending on β and $\sigma_\varepsilon^2$, Y may or may not have signal in this model. Third, I simulated X and the residual variability in Y both with phylogenetic signal. This is the traditional Brownian motion model (Felsenstein 1985, 2004). Fourth, I simulated neither X nor Y with phylogenetic signal. I conducted all four of these simulation scenarios across several conditions for the magnitude of residual variability, $\sigma_\varepsilon^2$, and for the relationship between X and Y, β. In particular, for fixed variance of X ($\sigma_x^2$), I simulated low, medium and high residual variation in Y ($\sigma_\varepsilon^2 = 0·01$, 0·1 and 1·0). For each value of residual variability in Y, I simulated a low, moderate and high regression relationship between X and Y (in which ‘low’ means close to zero in this case, not negative). I used the regression slopes of β1 = 0·00, 0·75 and 0·90 for these conditions.
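A short worked derivation may help to show why the signal in Y depends on β1 and on the residual variance. It assumes, in addition to what is stated above, that x and ɛ are generated independently and that the intercept is zero, so that $\mathbf{y} = \beta_1\mathbf{x} + \boldsymbol{\varepsilon}$. For the first scenario (signal in X, none in the residuals):

$$\mathrm{Cov}(\mathbf{y}) = \beta_1^2\,\mathrm{Cov}(\mathbf{x}) + \mathrm{Cov}(\boldsymbol{\varepsilon}) = \beta_1^2\sigma_x^2\mathbf{C} + \sigma_\varepsilon^2\mathbf{I}$$

and for the second scenario (no signal in X, signal in the residuals):

$$\mathrm{Cov}(\mathbf{y}) = \beta_1^2\sigma_x^2\mathbf{I} + \sigma_\varepsilon^2\mathbf{C}$$

In the first case Y looks increasingly phylogenetic as $\beta_1^2\sigma_x^2$ grows relative to $\sigma_\varepsilon^2$, and in the second case the reverse holds, even though the residual error structure, which is what the regression model assumes, does not change with β1.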

Generating data according to these models is quite easy. First, I simulated 1000 stochastic pure-birth phylogenetic trees, each containing a fixed number of species (n = 100). I arbitrarily rescaled these trees to have a total length from the root to any tip of 1·0. To simulate data on the trees in which X has signal, but in which the residual variation in Y given X is uncorrelated with the tree, I simply generated my data vector for X, as $\mathbf{x} = \mathrm{chol}(\sigma_x^2\mathbf{C})^{\top}\mathbf{u}$, in which $\mathrm{chol}(\sigma_x^2\mathbf{C})$ denotes the upper triangular Cholesky factor of C times the rate of evolution in X ($\sigma_x^2$), and u is a vector of values sampled randomly from the standard normal distribution. I then generated a similar vector for the residual error of my regression model, but in this case I simulated the error to be uncorrelated, i.e. $\boldsymbol{\varepsilon} = \mathrm{chol}(\sigma_\varepsilon^2\mathbf{I})^{\top}\mathbf{v}$. v is a vector of uncorrelated random standard normal deviates, as before. I then simply computed y = xβ1 + ɛ, where β1 is the desired slope of the regression relationship between X and Y and the intercept of the model is (arbitrarily) set to zero.

Generating data for X that is uncorrelated with the tree, but in which residual variability in the model y = xβ1 + ɛ is phylogenetic, was equally straightforward. Here, I just simulated X as $\mathbf{x} = \mathrm{chol}(\sigma_x^2\mathbf{I})^{\top}\mathbf{u}$, residual error ɛ as $\boldsymbol{\varepsilon} = \mathrm{chol}(\sigma_\varepsilon^2\mathbf{C})^{\top}\mathbf{v}$, and computed y = xβ1 + ɛ, as before.

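I also simulated data for X and residual error in Y that were both correlated with the tree. To accomplish this, I just calculated $\mathbf{x} = \mathrm{chol}(\sigma_x^2\mathbf{C})^{\top}\mathbf{u}$, $\boldsymbol{\varepsilon} = \mathrm{chol}(\sigma_\varepsilon^2\mathbf{C})^{\top}\mathbf{v}$ and y = xβ1 + ɛ, for vectors of random standard normal variates u and v.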

Finally, I simulated data for X and residual error in Y that were uncorrelated with the phylogeny. In this case, I just generated the random vector for x and residual error as $\mathbf{x} = \mathrm{chol}(\sigma_x^2\mathbf{I})^{\top}\mathbf{u}$ and $\boldsymbol{\varepsilon} = \mathrm{chol}(\sigma_\varepsilon^2\mathbf{I})^{\top}\mathbf{v}$, respectively, for standard normal vectors u and v, as before. Then, I computed y = xβ1 + ɛ.
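The four generating models differ only in which covariance matrix (C or I) is used for x and for ɛ, so they can be sketched with one function. The Python below is a minimal illustration using the standard library; the function names, the 5 × 5 matrix `C`, the choice of 1·0 for the variance of X and the random seed are assumptions for the example rather than details of the study. The lower-triangular Cholesky factor computed here is the transpose of the upper-triangular factor described in the text.

```python
import math
import random

def cholesky_lower(S):
    """Return lower-triangular L with L L' = S for positive definite S."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L

def draw_mvn(S, rng):
    """Draw one zero-mean multivariate normal vector with covariance S."""
    L = cholesky_lower(S)
    u = [rng.gauss(0.0, 1.0) for _ in S]
    return [sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(len(S))]

def simulate_dataset(C, signal_in_x, signal_in_resid, beta1, var_x, var_eps, rng):
    """Simulate x, eps and y = beta1 * x + eps under one of the four models."""
    n = len(C)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    Sx = [[var_x * v for v in row] for row in (C if signal_in_x else I)]
    Se = [[var_eps * v for v in row] for row in (C if signal_in_resid else I)]
    x = draw_mvn(Sx, rng)
    eps = draw_mvn(Se, rng)
    y = [beta1 * xi + ei for xi, ei in zip(x, eps)]
    return x, eps, y

# Hypothetical five-species covariance matrix (not the tree of Fig. 1).
C = [[1.0, 0.6, 0.0, 0.0, 0.0],
     [0.6, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.3, 0.3],
     [0.0, 0.0, 0.3, 1.0, 0.8],
     [0.0, 0.0, 0.3, 0.8, 1.0]]

rng = random.Random(1)
# First generating model: signal in X, none in the residual error.
x, eps, y = simulate_dataset(C, True, False, beta1=0.75, var_x=1.0, var_eps=0.1, rng=rng)
print(x, y)
```

Switching the two Boolean arguments gives the other three generating models without any other change to the code.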

The last two generating models are commonly assumed or discussed in the phylogenetic literature. For instance, the model in which both data for X and residual error in Y have phylogenetic signal is exactly equivalent to the common Brownian motion model (e.g. Felsenstein 1973, 1985; O’Meara et al. 2006). The model in which neither X nor Y have phylogenetic signal might be expected if Y evolves as a strong adaptive response to X, but X is uncorrelated with the phylogeny (e.g. Blomberg, Garland & Ives 2003; Butler & King 2004; Hansen, Pienaar & Orzack 2008; Lavin et al. 2008).

The first two generating conditions are much more rarely considered. I can suggest a couple of scenarios to which they might well apply; however, I am sure that biologically savvy readers of this article will come up with others. For example, phylogenetic signal in X but no phylogenetic signal in Y given X would be expected if X evolved by Brownian motion, and Y was determined completely by X, but was measured or phenotypically expressed with error. As long as the measurement or expression error was not phylogenetically correlated, this would represent an example of the first generating model. In the second generating model, there is no phylogenetic signal in X, but residual values for Y given X have signal. We might expect this pattern of interspecific variation if Y represented a phenotypically plastic response to a random (non-phylogenetic) environment X, but the magnitude of the plasticity was phylogenetically autocorrelated. Other biological processes that could result in generating conditions one or two are certainly possible.

For each simulation model and set of parameter conditions, I fit three different linear regression models for the relationship between X and Y. First, I fit an OLS regression, for which the estimating equation is given above, using $\mathbf{X} = [\mathbf{1}\ \mathbf{x}]$, where 1 is a column vector of 1·0s. To test the null hypothesis that the regression slope is β1 = 0·0, we need to calculate the variance–covariance matrix of our estimator, $\hat{\boldsymbol{\beta}}$, which is given by:

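$$\mathbf{V} = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^{\top}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n - 2}\,(\mathbf{X}^{\top}\mathbf{X})^{-1}$$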

We can then compute $t = \hat{\beta}_1/\sqrt{V_{11}}$, which should be distributed as a t-statistic with n − 2 = 98 d.f. (Rencher & Schaalje 2008). With the term V11, I am referring to the estimation variance component corresponding to β1 (not β0, the intercept of the model), which is actually in the second row and column of V.

Second, I fit the phylogenetic generalized least squares model (PGLS), in which the error structure of the residual vector, ɛ = y − Xβ, is assumed to be given by $\sigma_\varepsilon^2\mathbf{C}$. As noted earlier, this model will yield exactly the same regression slope estimate as the procedure of independent contrasts followed by regression through the origin (Felsenstein 1985; Garland, Harvey & Ives 1992; Garland & Ives 2000; Rohlf 2001). Here, C is a matrix with the tree height in every diagonal position, and the heights of the most recent common ancestor of species i and j in each i,jth off-diagonal position (Fig. 1b). The typical generalized least squares estimating equation, $\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{C}^{-1}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{C}^{-1}\mathbf{y}$, is also given above. To conduct a test of the hypothesis that β1 = 0·0, we again compute $t = \hat{\beta}_1/\sqrt{V_{11}}$, but this time using

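$$\mathbf{V} = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^{\top}\mathbf{C}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n - 2}\,(\mathbf{X}^{\top}\mathbf{C}^{-1}\mathbf{X})^{-1}$$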
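The slope test can be sketched as follows, treating OLS as the special case in which C is the identity matrix. This is an illustrative Python fragment under stated assumptions: the function names and data are hypothetical, the residual variance is estimated with n − 2 in the denominator to match the stated degrees of freedom, and the comparison of t with a t-distribution is left to a statistics library.

```python
import math

def transpose(A):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def gls_slope_test(x, y, C):
    """Return (slope, V, t) for the regression of y on x with error structure C."""
    n = len(y)
    X = [[1.0, xi] for xi in x]
    Xt = transpose(X)
    XtCiX = matmul(Xt, solve(C, X))
    beta = solve(XtCiX, matmul(Xt, solve(C, [[yi] for yi in y])))
    resid = [[yi - beta[0][0] - beta[1][0] * xi] for xi, yi in zip(x, y)]
    # Residual variance: r' C^-1 r / (n - 2).
    sigma2 = matmul(transpose(resid), solve(C, resid))[0][0] / (n - 2)
    inv = solve(XtCiX, [[1.0, 0.0], [0.0, 1.0]])
    V = [[sigma2 * v for v in row] for row in inv]
    # V[1][1] is the sampling variance of the slope (second row and column).
    t = beta[1][0] / math.sqrt(V[1][1])
    return beta[1][0], V, t

# Hypothetical data for five species.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 0.9, 2.2, 2.8, 4.1]
C_star = [[1.0 if i == j else 0.0 for j in range(5)] for i in range(5)]  # OLS case
slope, V, t = gls_slope_test(x, y, C_star)
print(slope, t)
```

Replacing `C_star` with a phylogenetic covariance matrix gives the PGLS version of the same test, with t still referred to n − 2 d.f.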

Third, I simultaneously optimized the phylogenetic signal of the residuals of Y along with our statistical model (PGLSλ). To do this, I used the parameter λ, which is a multiplier of the off-diagonal elements in C (e.g. Pagel 1999; Freckleton, Harvey & Pagel 2002; Revell & Harrison 2008). Figure 1(c) gives the hypothetical computation of C (herein, Cλ) for a given λ on the five taxon tree in Fig. 1(a). In addition to use of the λ parameter of Pagel (1999), there are a variety of different ways in which C can be transformed (Grafen 1989; Pagel 1999; Blomberg, Garland & Ives 2003; Hansen, Pienaar & Orzack 2008); and more or less the same effect that is accomplished by λ and related parameters can also be achieved by transforming the branch lengths of the tree prior to analysis (e.g. Garland, Harvey & Ives 1992; Clobert, Garland & Barbault 1998; Díaz-Uriarte & Garland 1998; Blomberg, Garland & Ives 2003; Lavin et al. 2008; Gartner et al. 2010). Here, I focus on λ due to its relative simplicity and because it can easily be simultaneously optimized with the regression slope and intercept of our model. To accomplish this, we optimize the following equation for the log-likelihood (L), based on the multivariate normal equation:

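$$L = -\tfrac{1}{2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})^{\top}(\sigma_\varepsilon^2\mathbf{C}_\lambda)^{-1}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) - \tfrac{1}{2}\log\left|\sigma_\varepsilon^2\mathbf{C}_\lambda\right| - \tfrac{n}{2}\log(2\pi)$$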

Here, Cλ is the variance–covariance matrix, C, to which the λ transformation has been applied (Fig. 1c; Pagel 1999; Freckleton, Harvey & Pagel 2002). We do not have an analytic solution for this equation, so it must be optimized numerically; however, the difficulty of this optimization is alleviated considerably by the fact that for any given value of λ (and thus Cλ), our conditional maximum likelihood estimates for β and $\sigma_\varepsilon^2$ can be obtained as follows (Rencher & Schaalje 2008):

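$$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{C}_\lambda^{-1}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{C}_\lambda^{-1}\mathbf{y}, \qquad \hat{\sigma}_\varepsilon^2 = \frac{(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})^{\top}\mathbf{C}_\lambda^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}})}{n}$$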

By substituting $\mathbf{C}_\lambda$ for C, we can conduct our hypothesis test of β1 = 0·0 using the same calculations as for PGLS, above; however, we should test our t-statistic against a t-distribution with n − 3 d.f. due to the one additional parameter (λ) estimated in the PGLSλ model. In this study, I limited estimation of λ to the interval $0 \le \lambda \le 1$, because most values of λ outside this interval result in a likelihood equation that is not defined; however, in theory $\lambda < 0$ or $\lambda > 1$ are possible (Freckleton, Harvey & Pagel 2002).
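The joint estimation of λ and the regression parameters can be sketched as a profile likelihood: for each candidate λ the conditional estimates of β and the residual variance are plugged into the multivariate normal log-likelihood, and the λ with the highest value is kept. The Python below is a simplified illustration, not the Appendix code. It assumes the maximum likelihood (divide by n) variance estimate, replaces a numerical optimizer with a plain grid search on [0, 1], and uses hypothetical data and a hypothetical matrix `C`.

```python
import math

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def log_det(S):
    """Log determinant of a positive definite matrix via Cholesky factorization."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
                total += 2.0 * math.log(L[i][j])
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return total

def profile_loglik(lam, x, y, C):
    """Return (log-likelihood, slope, sigma2) at the conditional MLEs for this lambda."""
    n = len(y)
    Cl = [[C[i][j] if i == j else lam * C[i][j] for j in range(n)] for i in range(n)]
    X = [[1.0, xi] for xi in x]
    Xt = transpose(X)
    beta = solve(matmul(Xt, solve(Cl, X)), matmul(Xt, solve(Cl, [[yi] for yi in y])))
    r = [[yi - beta[0][0] - beta[1][0] * xi] for xi, yi in zip(x, y)]
    sigma2 = matmul(transpose(r), solve(Cl, r))[0][0] / n
    # At sigma2_hat the quadratic form equals n and log|sigma2 * Cl| = n log sigma2 + log|Cl|.
    ll = -0.5 * (n * math.log(2.0 * math.pi) + n * math.log(sigma2) + log_det(Cl) + n)
    return ll, beta[1][0], sigma2

def fit_lambda(x, y, C, steps=100):
    """Grid search for lambda on [0, 1] (a stand-in for a numerical optimizer)."""
    grid = [k / steps for k in range(steps + 1)]
    return max(grid, key=lambda lam: profile_loglik(lam, x, y, C)[0])

# Hypothetical five-species example.
C = [[1.0, 0.6, 0.0, 0.0, 0.0],
     [0.6, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.3, 0.3],
     [0.0, 0.0, 0.3, 1.0, 0.8],
     [0.0, 0.0, 0.3, 0.8, 1.0]]
x = [0.1, 0.4, 1.2, 1.9, 2.3]
y = [0.3, 0.2, 1.5, 2.4, 2.0]
lam_hat = fit_lambda(x, y, C)
ll_hat, slope_hat, sigma2_hat = profile_loglik(lam_hat, x, y, C)
print(lam_hat, slope_hat)
```

Because only one parameter is searched numerically, the optimization stays one-dimensional regardless of how many regressors the model contains.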

For each simulated data set, I compared the performance of OLS and PGLS by determining which estimation procedure produced an estimated regression slope, β1, that was closest to its generating value. I also counted the number of significant regressions of each type to estimate the type I error (when the generating regression slope was β1 = 0·00) or power (when β1 > 0·00) of each procedure. Finally, I determined the bias of each estimating procedure by computing the mean parameter estimate across all the simulated data sets for each of the three estimators. Several studies have shown that OLS and GLS are unbiased even if the structure of the error term is specified incorrectly (e.g. Pagel 1993; Rohlf 2006; Rencher & Schaalje 2008; Revell 2009), so I did not expect bias to be significant for any of my estimation methods.

Finally, for each simulated data set I also computed a slew of phylogenetic diagnostics on the variables X and Y. I estimated λ (Pagel 1999; Freckleton, Harvey & Pagel 2002), now for each character separately; I computed K, a measure of phylogenetic signal developed by Blomberg, Garland & Ives (2003); and, finally, I also calculated independent contrasts (Felsenstein 1985), and computed both the Pearson correlation (r) and Spearman rank correlation (ρ) between the absolute values of the standardized contrasts and their expected standard deviations prior to standardization (Garland, Harvey & Ives 1992).

For a single character, contained in (say) character vector y, λ is optimized using likelihood by maximizing the following equation:

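$$L = -\tfrac{1}{2}(\mathbf{y} - \hat{a}\mathbf{1})^{\top}(\hat{\sigma}^2\mathbf{C}_\lambda)^{-1}(\mathbf{y} - \hat{a}\mathbf{1}) - \tfrac{1}{2}\log\left|\hat{\sigma}^2\mathbf{C}_\lambda\right| - \tfrac{n}{2}\log(2\pi)$$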

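in which 1 is a vector of 1·0s, as before; and the conditional maximum likelihood estimates of $a$ and $\sigma^2$ are given by $\hat{a} = (\mathbf{1}^{\top}\mathbf{C}_\lambda^{-1}\mathbf{1})^{-1}\mathbf{1}^{\top}\mathbf{C}_\lambda^{-1}\mathbf{y}$ and $\hat{\sigma}^2 = (\mathbf{y} - \hat{a}\mathbf{1})^{\top}\mathbf{C}_\lambda^{-1}(\mathbf{y} - \hat{a}\mathbf{1})/n$, respectively (Freckleton, Harvey & Pagel 2002). This equation is maximized using numerical methods. As before, I limited estimation of λ to the interval $0 \le \lambda \le 1$.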

Blomberg, Garland & Ives (2003) proposed an alternative measure of phylogenetic signal which is now widely used. Their measure, K, can be computed as follows:

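$$K = \frac{(\mathbf{y} - \hat{a}\mathbf{1})^{\top}(\mathbf{y} - \hat{a}\mathbf{1})}{(\mathbf{y} - \hat{a}\mathbf{1})^{\top}\mathbf{C}^{-1}(\mathbf{y} - \hat{a}\mathbf{1})} \Bigg/ \frac{\mathrm{tr}(\mathbf{C}) - n(\mathbf{1}^{\top}\mathbf{C}^{-1}\mathbf{1})^{-1}}{n - 1}$$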

Here, $\hat{a} = (\mathbf{1}^{\top}\mathbf{C}^{-1}\mathbf{1})^{-1}\mathbf{1}^{\top}\mathbf{C}^{-1}\mathbf{y}$; tr(C) indicates that the trace of C has been calculated; and all other terms have been previously defined (Revell, Harmon & Collar 2008).
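A minimal Python sketch of K, written as the ratio of an observed to an expected mean squared error ratio in the form given by Revell, Harmon & Collar (2008), is shown below. The function names, the example matrix `C` and the trait values are hypothetical, and the code is an illustration rather than the implementation provided in the Appendix.

```python
def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def blomberg_k(y, C):
    """Compute K for trait vector y and phylogenetic covariance matrix C."""
    n = len(y)
    Ci_1 = solve(C, [[1.0] for _ in range(n)])            # C^-1 1
    sum_Ci = sum(v[0] for v in Ci_1)                      # 1' C^-1 1
    # Phylogenetic mean: (1' C^-1 1)^-1 1' C^-1 y (C is symmetric).
    a_hat = sum(c[0] * yi for c, yi in zip(Ci_1, y)) / sum_Ci
    d = [[yi - a_hat] for yi in y]
    mse0 = sum(v[0] ** 2 for v in d)                      # (y - a)'(y - a)
    mse = sum(a[0] * b[0] for a, b in zip(d, solve(C, d)))  # (y - a)' C^-1 (y - a)
    expected = (sum(C[i][i] for i in range(n)) - n / sum_Ci) / (n - 1)
    return (mse0 / mse) / expected

# Hypothetical five-species example.
C = [[1.0, 0.6, 0.0, 0.0, 0.0],
     [0.6, 1.0, 0.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.3, 0.3],
     [0.0, 0.0, 0.3, 1.0, 0.8],
     [0.0, 0.0, 0.3, 0.8, 1.0]]
y = [0.3, 0.2, 1.5, 2.4, 2.0]
k_value = blomberg_k(y, C)
print(k_value)
```

As a sanity check on the formula, when C is the identity matrix both the observed and the expected ratios equal 1, so K is exactly 1 for any trait vector.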

As noted above, I computed the Pearson and Spearman correlations between the absolute values of the standardized independent contrasts of Felsenstein (1985) and the square roots of their expected variances (Garland, Harvey & Ives 1992). This method is intended to measure whether contrasts have been standardized appropriately – which is not expected to have been the case if Brownian motion is an inappropriate model for character evolution in our phylogeny. Thus, a non-significant relationship between the contrasts and their standard deviations is often taken as evidence that phylogenetic methods for regression are appropriate (Garland, Harvey & Ives 1992; Nunn & Barton 2000; Fisher, Blomberg & Owens 2003).

In a detailed Appendix, I have provided computer code written in the R programming language (R Development Core Team 2009) that implements the phylogenetic regression methods described above. I have also provided code to fit both Pagel’s (1999) λ (Freckleton, Harvey & Pagel 2002) and Blomberg, Garland & Ives’s (2003) K. Both are often used as measures of phylogenetic signal on individual traits.

Results

Table 1 shows the mean parameter estimates, type I error rates, and power for each estimating procedure (OLS, PGLS and PGLSλ), for data generated with phylogenetic signal in the independent variable, X, but no phylogenetic signal in the model residuals. As expected, both OLS and PGLS were unbiased. The most obvious features of Table 1 are fourfold. First, estimation accuracy is substantially higher for OLS than PGLS. Averaged across simulation conditions, in 84·2% of simulations OLS produced a better (i.e. closer to its generating value) estimate of the regression slope than PGLS (Table 1; Fig. 2). Second, variance among estimated values of β1 was much higher (on average 32·9 times higher) for PGLS than OLS. Third (and in spite of points one and two), type I error was not elevated relative to its nominal level for PGLS. Finally, in simulations of β1 ≠ 0·00, power of the PGLS estimator was deflated for conditions of high residual error, $\sigma_\varepsilon^2$.

Table 1. Parameter estimation, type I error and power for OLS, PGLS and PGLSλ for data generated with phylogenetic signal in X, but no phylogenetic signal for the model residuals

| β1 | $\sigma_\varepsilon^2$ | OLS: mean $\hat{\beta}_1$ | OLS: var($\hat{\beta}_1$) | OLS: error/power | PGLS: mean $\hat{\beta}_1$ | PGLS: var($\hat{\beta}_1$) | PGLS: error/power | OLS : PGLS | PGLSλ: mean $\hat{\beta}_1$ | PGLSλ: var($\hat{\beta}_1$) | PGLSλ: error/power |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0·00 | 0·01 | −9·62 × 10^−4 | 1·49 × 10^−4 | 0·055 | −3·37 × 10^−3 | 3·49 × 10^−3 | 0·051 | 0·836 : 0·164 | −9·26 × 10^−4 | 1·50 × 10^−4 | 0·050 |
| 0·00 | 0·10 | −8·86 × 10^−4 | 1·65 × 10^−3 | 0·060 | 3·17 × 10^−3 | 5·68 × 10^−2 | 0·053 | 0·825 : 0·175 | −8·22 × 10^−4 | 1·70 × 10^−3 | 0·057 |
| 0·00 | 1·00 | −4·67 × 10^−3 | 1·46 × 10^−2 | 0·047 | −1·06 × 10^−2 | 3·97 × 10^−1 | 0·045 | 0·829 : 0·171 | −4·61 × 10^−3 | 1·51 × 10^−2 | 0·045 |
| 0·75 | 0·01 | 0·749 | 1·49 × 10^−4 | 1·000 | 0·749 | 4·41 × 10^−3 | 0·997 | 0·836 : 0·164 | 0·749 | 1·47 × 10^−4 | 1·000 |
| 0·75 | 0·10 | 0·751 | 1·50 × 10^−3 | 1·000 | 0·764 | 7·67 × 10^−2 | 0·952 | 0·867 : 0·133 | 0·751 | 1·55 × 10^−3 | 1·000 |
| 0·75 | 1·00 | 0·753 | 1·40 × 10^−2 | 1·000 | 0·765 | 4·79 × 10^−1 | 0·394 | 0·850 : 0·150 | 0·752 | 1·43 × 10^−2 | 1·000 |
| 0·90 | 0·01 | 0·901 | 1·63 × 10^−4 | 1·000 | 0·901 | 5·69 × 10^−3 | 0·998 | 0·831 : 0·169 | 0·901 | 1·67 × 10^−4 | 1·000 |
| 0·90 | 0·10 | 0·901 | 1·62 × 10^−3 | 1·000 | 0·897 | 3·54 × 10^−2 | 0·966 | 0·848 : 0·152 | 0·901 | 1·64 × 10^−3 | 1·000 |
| 0·90 | 1·00 | 0·906 | 1·45 × 10^−2 | 1·000 | 0·902 | 5·64 × 10^−1 | 0·511 | 0·853 : 0·147 | 0·907 | 1·48 × 10^−2 | 1·000 |

  1. β1 and $\sigma_\varepsilon^2$ indicate the generating regression slope and residual error, respectively. In the OLS columns, mean $\hat{\beta}_1$ indicates the mean parameter estimate by OLS and var($\hat{\beta}_1$), the variation among simulations in the estimated slope. The PGLS columns are likewise interpreted. OLS : PGLS indicates the fraction of simulations for which OLS provided a better estimate of the regression slope (i.e. one closer to its generating value) compared with the fraction for which PGLS provided a better estimate. In the PGLSλ columns, mean $\hat{\beta}_1$ indicates the mean parameter estimate for the regression slope when the regression model was estimated simultaneously with λ and var($\hat{\beta}_1$), the variation among simulations. Finally, error/power indicates the type I error rate (if β1 = 0·00) or power of each estimation procedure.

Figure 2.

 A stacked bar graph showing the fraction of analyses in which OLS or PGLS yielded a better (i.e. closer to its generating value) parameter estimate of the regression slope, β1, for each of the four simulation models of this study. Results are averaged across simulation conditions for each model.

Table 2 shows phylogenetic diagnostics of X, Y, and the bivariate regression model including λ. Estimated phylogenetic signal was invariably high for X; however, it was also quite high for Y when the generating value of β1 was high and $\sigma_\varepsilon^2$ was relatively low. I also computed two different independent contrasts diagnostics: the Pearson and Spearman rank correlation coefficients for the correlation between the absolute values of the standardized contrasts and the square root of their expected variances prior to standardization. In general, these correlations were near zero for X (as expected), and negative for Y, although for increasing β1 and sufficiently low $\sigma_\varepsilon^2$ the strength of the correlations decreased along with my power to detect a significant relationship (Table 2).

Table 2. Phylogenetic diagnostics for X, Y and the bivariate regression model, where the data have been simulated with phylogenetic signal in X, but no signal in the residual variability of Y given X

| β1 | $\sigma_\varepsilon^2$ | $\hat{\lambda}(X)$ | $\hat{\lambda}(Y)$ | $\hat{\lambda}$ (regression) | K(X) | K(Y) | r(X) | Error/power | r(Y) | Error/power | ρ(X) | Error/power | ρ(Y) | Error/power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0·00 | 0·01 | 0·998 | 0·014 | 0·012 | 0·991 | 0·066 | −0·003 | 0·050 | −0·521 | 1·000 | −0·004 | 0·047 | −0·538 | 1·000 |
| 0·00 | 0·10 | 0·998 | 0·014 | 0·012 | 0·993 | 0·068 | −0·001 | 0·055 | −0·523 | 1·000 | −0·002 | 0·052 | −0·549 | 1·000 |
| 0·00 | 1·00 | 0·998 | 0·016 | 0·013 | 1·010 | 0·068 | 0·005 | 0·038 | −0·519 | 1·000 | 0·004 | 0·045 | −0·545 | 1·000 |
| 0·75 | 0·01 | 0·998 | 0·979 | 0·010 | 0·995 | 0·714 | 0·001 | 0·061 | −0·187 | 0·496 | 0·001 | 0·047 | −0·121 | 0·245 |
| 0·75 | 0·10 | 0·998 | 0·827 | 0·013 | 1·004 | 0·272 | 0·002 | 0·047 | −0·413 | 0·993 | 6·43 × 10^−5 | 0·041 | −0·340 | 0·919 |
| 0·75 | 1·00 | 0·998 | 0·300 | 0·012 | 1·020 | 0·094 | 0·007 | 0·052 | −0·503 | 1·000 | 0·006 | 0·060 | −0·501 | 0·997 |
| 0·90 | 0·01 | 0·998 | 0·984 | 0·013 | 1·001 | 0·777 | 1·30 × 10^−4 | 0·057 | −0·157 | 0·393 | 8·26 × 10^−4 | 0·052 | −0·099 | 0·175 |
| 0·90 | 0·10 | 0·998 | 0·872 | 0·010 | 0·996 | 0·337 | 0·003 | 0·052 | −0·384 | 0·976 | 0·002 | 0·045 | −0·299 | 0·838 |
| 0·90 | 1·00 | 0·998 | 0·386 | 0·012 | 0·980 | 0·105 | −0·001 | 0·052 | −0·497 | 1·000 | 6·95 × 10^−4 | 0·054 | −0·486 | 0·998 |

  1. β1 and $\sigma_\varepsilon^2$ are as in Table 1. $\hat{\lambda}(X)$ and $\hat{\lambda}(Y)$ indicate the mean value of phylogenetic signal estimated using the λ method for each character, X and Y, separately. $\hat{\lambda}$ (regression) indicates the mean fitted λ, where λ was estimated simultaneously with the regression model. K(X) and K(Y) indicate the mean value of phylogenetic signal in X and Y, respectively, estimated using the K method. r(X) and r(Y) indicate the mean Pearson product–moment correlation between the absolute values of the independent contrasts and the square roots of their expected variances prior to standardization. ρ(X) and ρ(Y) are the mean values of the corresponding Spearman rank correlations. Error/power indicates the type I error or power of each contrasts-based diagnostic.

My results for data generated with no phylogenetic signal in the independent variable, X, but phylogenetically correlated residual variation in Y are given in Tables 3 and 4. Here, the PGLS estimator was better on average 84·5% of the time when compared with the estimated regression slope obtained using OLS (Fig. 2). However, OLS did not suffer from increased type I error when the generating regression slope was β1 = 0·00 (Table 3). Phylogenetic signal was invariably low for X (Table 4). Phylogenetic signal was also generally quite low for Y, except when β1 = 0·00. Diagnostic statistics on the contrasts for both X and Y overwhelmingly indicate inadequate standardization, again except when β1 = 0·00, in which case they indicated that Y, but not X, has been adequately standardized in the computation of independent contrasts (Table 4).

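Table 3. Parameter estimation, type I error and power for OLS, PGLS and PGLSλ for data generated with no phylogenetic signal in X, but phylogenetic signal for the model residuals

| β1 | $\sigma_\varepsilon^2$ | OLS: mean $\hat{\beta}_1$ | OLS: var($\hat{\beta}_1$) | OLS: error/power | PGLS: mean $\hat{\beta}_1$ | PGLS: var($\hat{\beta}_1$) | PGLS: error/power | OLS : PGLS | PGLSλ: mean $\hat{\beta}_1$ | PGLSλ: var($\hat{\beta}_1$) | PGLSλ: error/power |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0·00 | 0·01 | −2·47 × 10^−4 | 8·04 × 10^−5 | 0·049 | −1·27 × 10^−6 | 5·21 × 10^−6 | 0·055 | 0·151 : 0·849 | −1·50 × 10^−4 | 5·47 × 10^−6 | 0·049 |
| 0·00 | 0·10 | −1·01 × 10^−3 | 7·90 × 10^−4 | 0·049 | 2·53 × 10^−4 | 5·84 × 10^−5 | 0·053 | 0·160 : 0·840 | 2·45 × 10^−4 | 6·29 × 10^−5 | 0·047 |
| 0·00 | 1·00 | −2·39 × 10^−4 | 8·17 × 10^−3 | 0·049 | 2·40 × 10^−4 | 5·76 × 10^−4 | 0·049 | 0·176 : 0·824 | 5·94 × 10^−4 | 6·04 × 10^−4 | 0·044 |
| 0·75 | 0·01 | 0·750 | 7·38 × 10^−5 | 1·000 | 0·750 | 5·14 × 10^−6 | 1·000 | 0·157 : 0·843 | 0·750 | 5·70 × 10^−6 | 1·000 |
| 0·75 | 0·10 | 0·751 | 7·92 × 10^−4 | 1·000 | 0·750 | 5·44 × 10^−5 | 1·000 | 0·153 : 0·847 | 0·750 | 6·02 × 10^−5 | 1·000 |
| 0·75 | 1·00 | 0·749 | 8·54 × 10^−3 | 1·000 | 0·750 | 5·43 × 10^−4 | 1·000 | 0·134 : 0·866 | 0·750 | 5·65 × 10^−4 | 1·000 |
| 0·90 | 0·01 | 0·900 | 8·02 × 10^−5 | 1·000 | 0·900 | 5·69 × 10^−6 | 1·000 | 0·152 : 0·848 | 0·900 | 6·08 × 10^−6 | 1·000 |
| 0·90 | 0·10 | 0·900 | 7·53 × 10^−4 | 1·000 | 0·900 | 6·24 × 10^−5 | 1·000 | 0·153 : 0·847 | 0·900 | 6·55 × 10^−5 | 1·000 |
| 0·90 | 1·00 | 0·902 | 8·00 × 10^−3 | 1·000 | 0·900 | 5·80 × 10^−4 | 1·000 | 0·163 : 0·837 | 0·900 | 6·18 × 10^−4 | 1·000 |

  1. Column headers are as in Table 1.

Table 4. Phylogenetic diagnostics for X, Y and the bivariate regression model, where the data have been simulated with no phylogenetic signal in the independent variable, X, but signal in the residual error

| β1 | $\sigma_\varepsilon^2$ | $\hat{\lambda}(X)$ | $\hat{\lambda}(Y)$ | $\hat{\lambda}$ (regression) | K(X) | K(Y) | r(X) | Error/power | r(Y) | Error/power | ρ(X) | Error/power | ρ(Y) | Error/power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0·00 | 0·01 | 0·016 | 0·998 | 0·998 | 0·066 | 1·001 | −0·520 | 1·000 | −0·003 | 0·053 | −0·549 | 1·000 | −0·003 | 0·055 |
| 0·00 | 0·10 | 0·016 | 0·998 | 0·998 | 0·067 | 1·002 | −0·520 | 1·000 | −2·02 × 10^−4 | 0·048 | −0·550 | 1·000 | −9·87 × 10^−4 | 0·042 |
| 0·00 | 1·00 | 0·015 | 0·998 | 0·998 | 0·068 | 1·011 | −0·521 | 1·000 | −6·09 × 10^−4 | 0·044 | −0·546 | 1·000 | −0·002 | 0·058 |
| 0·75 | 0·01 | 0·014 | 0·022 | 0·998 | 0·067 | 0·068 | −0·519 | 1·000 | −0·518 | 1·000 | −0·546 | 1·000 | −0·544 | 1·000 |
| 0·75 | 0·10 | 0·015 | 0·113 | 0·998 | 0·067 | 0·076 | −0·520 | 1·000 | −0·514 | 1·000 | −0·546 | 1·000 | −0·529 | 1·000 |
| 0·75 | 1·00 | 0·015 | 0·593 | 0·998 | 0·066 | 0·146 | −0·518 | 1·000 | −0·473 | 1·000 | −0·544 | 1·000 | −0·436 | 0·990 |
| 0·90 | 0·01 | 0·014 | 0·017 | 0·998 | 0·066 | 0·067 | −0·522 | 1·000 | −0·521 | 1·000 | −0·548 | 1·000 | −0·546 | 1·000 |
| 0·90 | 0·10 | 0·013 | 0·074 | 0·998 | 0·068 | 0·074 | −0·521 | 1·000 | −0·516 | 1·000 | −0·547 | 0·999 | −0·534 | 0·999 |
| 0·90 | 1·00 | 0·013 | 0·497 | 0·998 | 0·066 | 0·125 | −0·520 | 1·000 | −0·485 | 1·000 | −0·549 | 1·000 | −0·458 | 0·994 |

  1. Column headers are as in Table 2.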

When I generated data with phylogenetic signal in both the independent variable and in the model residuals, I found that PGLS vastly outperformed OLS. In this case, the PGLS estimator was better on average 77·8% of the time (Table 5; Fig. 2). I also found that type I error was substantially increased for OLS, unlike in Table 3 when only the residual error was simulated with phylogenetic signal and where OLS had type I error near the nominal level. All measures of phylogenetic signal indicated high phylogenetic signal for both X and Y; and no independent contrasts diagnostic suggested that the contrasts for X or Y had been inadequately standardized (Table 6).

Table 5.   Parameter estimation, type I error and power for OLS, PGLS and PGLS for data generated with phylogenetic signal in X and the model residuals

| β1 | inline image | inline image | inline image | Error/power | inline image | inline image | Error/power | OLS : PGLS | inline image | inline image | Error/power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0·00 | 0·01 | −5·15 × 10⁻⁴ | 6·72 × 10⁻⁴ | 0·426 | −8·18 × 10⁻⁴ | 1·02 × 10⁻⁴ | 0·044 | 0·224 : 0·776 | −7·96 × 10⁻⁴ | 1·02 × 10⁻⁴ | 0·045 |
| 0·00 | 0·10 | −2·54 × 10⁻³ | 7·27 × 10⁻³ | 0·446 | −8·25 × 10⁻⁴ | 1·05 × 10⁻³ | 0·055 | 0·213 : 0·787 | −9·90 × 10⁻⁴ | 1·07 × 10⁻³ | 0·055 |
| 0·00 | 1·00 | −6·30 × 10⁻⁴ | 7·61 × 10⁻² | 0·463 | −1·71 × 10⁻³ | 1·02 × 10⁻² | 0·045 | 0·222 : 0·778 | −1·36 × 10⁻³ | 1·02 × 10⁻² | 0·045 |
| 0·75 | 0·01 | 0·751 | 7·70 × 10⁻⁴ | 1·000 | 0·750 | 9·89 × 10⁻⁵ | 1·000 | 0·201 : 0·799 | 0·750 | 1·00 × 10⁻⁴ | 1·000 |
| 0·75 | 0·10 | 0·749 | 7·02 × 10⁻³ | 1·000 | 0·751 | 1·06 × 10⁻³ | 1·000 | 0·223 : 0·777 | 0·751 | 1·07 × 10⁻³ | 1·000 |
| 0·75 | 1·00 | 0·744 | 6·61 × 10⁻² | 0·981 | 0·752 | 1·07 × 10⁻² | 1·000 | 0·237 : 0·763 | 0·752 | 1·08 × 10⁻² | 1·000 |
| 0·90 | 0·01 | 0·899 | 7·73 × 10⁻⁴ | 1·000 | 0·900 | 1·03 × 10⁻⁴ | 1·000 | 0·239 : 0·761 | 0·900 | 1·03 × 10⁻⁴ | 1·000 |
| 0·90 | 0·10 | 0·901 | 6·74 × 10⁻³ | 1·000 | 0·901 | 1·06 × 10⁻³ | 1·000 | 0·217 : 0·783 | 0·901 | 1·06 × 10⁻³ | 1·000 |
| 0·90 | 1·00 | 0·895 | 7·32 × 10⁻² | 0·991 | 0·902 | 1·05 × 10⁻² | 1·000 | 0·223 : 0·777 | 0·902 | 1·04 × 10⁻² | 1·000 |

  1. Column headers are as in Tables 1 and 3.

Table 6.   Phylogenetic diagnostics for X, Y and the bivariate regression model, where the data have been simulated with phylogenetic signal in both X and the model residuals
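
| β1 | inline image | inline image | inline image | inline image | K(X) | K(Y) | r(X) | Error/power | r(Y) | Error/power | ρ(X) | Error/power | ρ(Y) | Error/power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0·00 | 0·01 | 0·998 | 0·998 | 0·998 | 0·991 | 1·001 | −0·003 | 0·050 | −0·003 | 0·053 | −0·004 | 0·047 | −0·003 | 0·055 |
| 0·00 | 0·10 | 0·998 | 0·998 | 0·998 | 0·993 | 1·002 | −0·001 | 0·055 | −2·02 × 10⁻⁴ | 0·048 | −0·002 | 0·052 | −9·87 × 10⁻⁴ | 0·042 |
| 0·00 | 1·00 | 0·998 | 0·998 | 0·998 | 1·006 | 1·013 | −2·89 × 10⁻⁴ | 0·060 | −3·17 × 10⁻⁴ | 0·052 | −6·59 × 10⁻⁵ | 0·062 | −1·67 × 10⁻⁴ | 0·050 |
| 0·75 | 0·01 | 0·998 | 0·998 | 0·998 | 1·001 | 1·000 | 8·53 × 10⁻⁴ | 0·061 | 5·36 × 10⁻⁴ | 0·060 | 9·11 × 10⁻⁴ | 0·047 | 2·05 × 10⁻⁴ | 0·042 |
| 0·75 | 0·10 | 0·998 | 0·998 | 0·998 | 1·004 | 0·997 | 0·002 | 0·047 | 1·44 × 10⁻⁴ | 0·051 | 6·43 × 10⁻⁵ | 0·041 | −0·002 | 0·045 |
| 0·75 | 1·00 | 0·998 | 0·998 | 0·998 | 1·011 | 0·990 | 6·91 × 10⁻⁴ | 0·038 | −0·001 | 0·038 | 3·85 × 10⁻⁴ | 0·050 | −0·002 | 0·044 |
| 0·90 | 0·01 | 0·998 | 0·998 | 0·998 | 1·001 | 1·000 | 1·30 × 10⁻⁴ | 0·057 | 2·69 × 10⁻⁴ | 0·056 | 8·26 × 10⁻⁴ | 0·052 | 0·001 | 0·054 |
| 0·90 | 0·10 | 0·998 | 0·998 | 0·998 | 0·996 | 0·998 | 0·003 | 0·052 | 2·75 × 10⁻⁴ | 0·046 | 0·002 | 0·045 | 7·02 × 10⁻⁴ | 0·048 |
| 0·90 | 1·00 | 0·998 | 0·998 | 0·998 | 0·980 | 0·984 | 0·002 | 0·057 | 0·002 | 0·055 | 6·03 × 10⁻⁴ | 0·054 | 2·00 × 10⁻⁴ | 0·048 |

  1. Column headers are as in Tables 2 and 4.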

Finally, when I generated data with no phylogenetic signal for X and no phylogenetic signal for the residual variability in Y, OLS outperformed PGLS, as expected. Here, the OLS estimator was better on average 79·9% of the time (Table 7; Fig. 2). I also found that the type I error rate when β1 = 0·00 was elevated for PGLS relative to its nominal level (Table 7), unlike the situation in Table 1. All measures of phylogenetic signal indicated low signal, and furthermore all contrasts-based diagnostics indicated inadequate standardization of X and Y (Table 8).

Table 7.   Parameter estimation, type I error and power for OLS, PGLS and PGLSλ for data generated with phylogenetic signal in neither X nor the model residuals

| β1 | inline image | inline image | inline image | Error/power | inline image | inline image | Error/power | OLS : PGLS | inline image | inline image | Error/power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0·00 | 0·01 | −8·37 × 10⁻⁴ | 1·03 × 10⁻⁴ | 0·043 | −1·11 × 10⁻³ | 1·71 × 10⁻³ | 0·503 | 0·792 : 0·208 | −8·18 × 10⁻⁴ | 1·03 × 10⁻⁴ | 0·041 |
| 0·00 | 0·10 | −8·54 × 10⁻⁴ | 1·05 × 10⁻³ | 0·055 | 4·54 × 10⁻³ | 1·56 × 10⁻² | 0·519 | 0·799 : 0·201 | −7·69 × 10⁻⁴ | 1·06 × 10⁻³ | 0·054 |
| 0·00 | 1·00 | −1·26 × 10⁻³ | 1·02 × 10⁻² | 0·046 | −9·34 × 10⁻³ | 1·65 × 10⁻¹ | 0·520 | 0·797 : 0·203 | −1·25 × 10⁻³ | 1·02 × 10⁻² | 0·045 |
| 0·75 | 0·01 | 0·750 | 9·90 × 10⁻⁵ | 1·000 | 0·749 | 1·71 × 10⁻³ | 1·000 | 0·812 : 0·188 | 0·750 | 9·95 × 10⁻⁵ | 1·000 |
| 0·75 | 0·10 | 0·751 | 1·07 × 10⁻³ | 1·000 | 0·758 | 1·94 × 10⁻² | 0·999 | 0·799 : 0·201 | 0·751 | 1·07 × 10⁻³ | 1·000 |
| 0·75 | 1·00 | 0·753 | 1·08 × 10⁻² | 1·000 | 0·744 | 2·23 × 10⁻¹ | 0·952 | 0·793 : 0·207 | 0·753 | 1·08 × 10⁻² | 1·000 |
| 0·90 | 0·01 | 0·900 | 1·03 × 10⁻⁴ | 1·000 | 0·901 | 2·47 × 10⁻³ | 1·000 | 0·782 : 0·218 | 0·900 | 1·03 × 10⁻⁴ | 1·000 |
| 0·90 | 0·10 | 0·901 | 1·06 × 10⁻³ | 1·000 | 0·896 | 1·83 × 10⁻² | 0·999 | 0·809 : 0·191 | 0·901 | 1·07 × 10⁻³ | 1·000 |
| 0·90 | 1·00 | 0·902 | 1·06 × 10⁻² | 1·000 | 0·914 | 2·37 × 10⁻¹ | 0·966 | 0·807 : 0·193 | 0·902 | 1·07 × 10⁻² | 1·000 |

  1. Column headers are as in Tables 1, 3 and 5.

Table 8.   Phylogenetic diagnostics for X, Y and the bivariate regression model. Here, neither X nor the model residuals have been simulated with phylogenetic signal.
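
| β1 | inline image | inline image | inline image | inline image | K(X) | K(Y) | r(X) | Error/power | r(Y) | Error/power | ρ(X) | Error/power | ρ(Y) | Error/power |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0·00 | 0·01 | 0·016 | 0·014 | 0·015 | 0·066 | 0·066 | −0·520 | 1·000 | −0·521 | 1·000 | −0·549 | 1·000 | −0·548 | 1·000 |
| 0·00 | 0·10 | 0·016 | 0·014 | 0·015 | 0·067 | 0·068 | −0·520 | 1·000 | −0·523 | 1·000 | −0·550 | 1·000 | −0·549 | 1·000 |
| 0·00 | 1·00 | 0·015 | 0·016 | 0·016 | 0·068 | 0·068 | −0·521 | 1·000 | −0·519 | 1·000 | −0·546 | 1·000 | −0·545 | 1·000 |
| 0·75 | 0·01 | 0·014 | 0·015 | 0·013 | 0·067 | 0·067 | −0·519 | 1·000 | −0·519 | 1·000 | −0·546 | 1·000 | −0·546 | 1·000 |
| 0·75 | 0·10 | 0·015 | 0·014 | 0·016 | 0·067 | 0·067 | −0·520 | 1·000 | −0·519 | 1·000 | −0·546 | 1·000 | −0·547 | 1·000 |
| 0·75 | 1·00 | 0·015 | 0·013 | 0·015 | 0·066 | 0·066 | −0·518 | 1·000 | −0·519 | 1·000 | −0·544 | 1·000 | −0·545 | 0·999 |
| 0·90 | 0·01 | 0·014 | 0·014 | 0·016 | 0·066 | 0·067 | −0·522 | 1·000 | −0·522 | 1·000 | −0·548 | 1·000 | −0·548 | 1·000 |
| 0·90 | 0·10 | 0·013 | 0·015 | 0·013 | 0·068 | 0·068 | −0·521 | 1·000 | −0·521 | 1·000 | −0·547 | 0·999 | −0·548 | 1·000 |
| 0·90 | 1·00 | 0·013 | 0·016 | 0·015 | 0·066 | 0·066 | −0·520 | 1·000 | −0·517 | 1·000 | −0·549 | 1·000 | −0·546 | 1·000 |

  1. Column headers are as in Tables 2, 4 and 6.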

Notably, PGLSλ, in which phylogenetic signal is estimated simultaneously with our regression model, effectively recovers the performance of the best model (either OLS or PGLS) under all of the simulation conditions of the study. This is evidenced by the very low estimation variance of β1 in the PGLSλ model, regardless of the generating conditions (Tables 1, 3, 5, 7). Because fitting the PGLSλ model requires the estimation of one additional parameter relative to OLS or PGLS, the accuracy of PGLSλ is slightly decreased relative to OLS, when the assumptions of OLS hold, or PGLS, when the assumptions of PGLS hold. However, the advantage of PGLSλ is that it very nearly recovers the performance of the best model when no particular level of phylogenetic signal in the residual error can be safely assumed a priori (as will most often be the case for empirical studies).

Discussion

Over the past 20 years the phylogenetic regression has become among the most commonly applied methods in comparative biology (Felsenstein 1985; Grafen 1989). However, its assumptions are still not widely understood (Rohlf 2006). To examine how and when it should be applied I will review, in turn, each of the simulated scenarios for this study.

Phylogenetic signal in the independent variable

When the generating model was one in which I simulated phylogenetic signal in the independent variable, but uncorrelated residual error in Y, a phylogenetic regression is not necessary. In this case, the assumption of independent errors holds and OLS is a perfectly appropriate method for fitting the regression model.

Consistent with this assertion, I found that OLS overwhelmingly provided a better regression slope estimate (84·2% of the time, averaged across simulation conditions; Fig. 2) than PGLS. Furthermore, the variance among simulations in the regression estimator was much higher for PGLS than OLS (on average 32·9 times higher; Table 1). However, I also found that type I error was not elevated for PGLS when the generating regression slope was β1 = 0·00 (Table 1). This interesting result will be discussed further below. Performance of the phylogenetic regression was fully recovered by simultaneously estimating the λ parameter of Pagel (1999; Revell 2009), as discussed in the ‘Materials and methods’. In general, the fitted value of λ for the regression model was very low (close to zero; Table 2) in this case, which makes PGLSλ nearly equivalent to OLS, thus explaining its good statistical performance here.
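
The qualitative pattern described above can be illustrated with a small simulation. The following is a minimal sketch and not the code used for this study: it assumes a balanced 32-tip tree of unit height (built by the hypothetical helper `balanced_tree_cov`), a generating slope of 0·75, a residual standard deviation of 0·3 and 300 replicates, all chosen only for illustration. It simulates X by Brownian motion on the tree, adds independent residual error, and fits the slope by OLS and by PGLS. The numerical results are not expected to match Table 1, which was based on different trees and settings.

```python
import numpy as np

def balanced_tree_cov(levels):
    """Brownian-motion covariance for a balanced tree with 2**levels tips and unit height."""
    n = 2 ** levels
    idx = np.arange(n)
    C = np.zeros((n, n))
    for j in range(1, levels + 1):
        clade = idx >> (levels - j)  # clade label of each tip after j splits
        C += (clade[:, None] == clade[None, :]) / levels  # each shared branch adds 1/levels
    return C

def gls_slope(x, y, C):
    """Slope of y on x (with intercept) by GLS with error covariance proportional to C."""
    X = np.column_stack([np.ones_like(x), x])
    Cinv = np.linalg.inv(C)
    beta = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)
    return beta[1]

def simulate(n_sims=300, levels=5, beta1=0.75, sigma=0.3, seed=1):
    """Phylogenetic signal in X only: the residual error is independent among species."""
    rng = np.random.default_rng(seed)
    C = balanced_tree_cov(levels)
    L = np.linalg.cholesky(C)
    n = C.shape[0]
    identity = np.eye(n)
    ols, pgls = [], []
    for _ in range(n_sims):
        x = L @ rng.standard_normal(n)                  # X evolves by Brownian motion
        y = beta1 * x + sigma * rng.standard_normal(n)  # uncorrelated residual error
        ols.append(gls_slope(x, y, identity))           # GLS with C = I is OLS
        pgls.append(gls_slope(x, y, C))
    return np.array(ols), np.array(pgls)

ols, pgls = simulate()
var_ratio = pgls.var() / ols.var()  # > 1 when PGLS is the less precise estimator
frac_ols_closer = np.mean(np.abs(ols - 0.75) < np.abs(pgls - 0.75))
print(f"variance ratio (PGLS / OLS): {var_ratio:.2f}")
print(f"fraction of replicates in which OLS is closer: {frac_ols_closer:.3f}")
```

Because the residual error here is independent, OLS is the minimum-variance linear unbiased estimator, so the variance ratio should exceed one for any tree; only its magnitude depends on the tree shape assumed.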

No phylogenetic signal in the independent variable

When I used a generating model for simulation with no phylogenetic signal in the independent variable, but phylogenetic signal in the residual error for the dependent variable, a phylogenetic regression is appropriate. Here, the standard OLS assumption of independent errors is not true, and thus OLS is not an appropriate method to fit our bivariate regression model and PGLS should be used. As such, I was not surprised to find that PGLS regression yielded a better estimate of the generating regression slope for the vast majority (84·5%; Fig. 2) of simulated data sets across all simulation conditions.

In these simulations, however, I also found that oftentimes phylogenetic signal for X and Y was low by both of our chosen metrics (the λ statistic of Pagel 1999; and the K statistic of Blomberg, Garland & Ives 2003; Table 4). Furthermore, for all generating conditions except β1 = 0·00, Pearson and Spearman correlation-based diagnostic analysis of the independent contrasts suggested that contrasts had been inadequately standardized. Thus, for data generated under these conditions, standard diagnostics computed for the dependent and independent variables separately might be interpreted to suggest that standard PGLS or contrasts are inappropriate, even though they are called for in this case. As before, type I error of OLS was not inflated over its nominal level, a result which I will discuss at greater length below.

Phylogenetic signal in the independent variable, and the residuals

This is the traditional Brownian motion model for character evolution in X and Y. In this case, it should be no surprise that PGLS outperforms OLS, producing a parameter estimate for the regression slope that is closer to its generating value in 77·8% of simulations, averaged across conditions (Table 5; Fig. 2). I also found that type I error was considerably inflated if OLS was used, which is also consistent with earlier studies (e.g. Rohlf 2006; Revell 2009). Under these simulation conditions, all phylogenetic diagnostics (phylogenetic signal, Pearson and Spearman correlation diagnostics on the contrasts) suggest that Brownian motion is an appropriate model for evolution and that the phylogenetic regression is indicated.

Non-phylogenetic independent variable and error

When I simulated data for X and residual error in Y that were entirely non-phylogenetic, I found that OLS was vastly superior to PGLS in both estimation and hypothesis testing, with PGLS showing severely inflated type I error for β1 = 0·00 (Table 7). Unsurprisingly, estimated phylogenetic signal was invariably low and significant negative Pearson and Spearman rank correlations between the absolute values of the independent contrasts and the square roots of their expected variances suggested inadequate standardization in the contrasts procedure (Table 8).

Diagnostics and the phylogenetic regression

I found mixed results with regard to diagnostics and the phylogenetic regression. Certainly, as in Table 6, when all diagnostics indicated high phylogenetic signal for the independent and dependent variables considered separately, as well as no relation between the absolute value of independent contrasts and their expected variances, I found that the phylogenetic regression as traditionally applied performed extremely well. However, I also identified conditions in which phylogenetic signal for X and/or Y was high, and in which correlation-based diagnostics generally indicated that the phylogenetic regression was appropriate – and yet residual error was uncorrelated among species, and thus OLS (not PGLS) was called for (e.g. Table 2). In addition, in some cases, I found conditions in which phylogenetic signal was low, correlation-based contrasts analysis indicated inadequate standardization, and yet the phylogenetic regression was fully appropriate (e.g. Table 4).

These results suggest that the tenet of this article, that assessing phylogenetic signal in the original variables is generally insufficient to diagnose whether the phylogenetic regression is appropriate, is correct. Instead, researchers should consider simultaneously fitting a model for phylogenetic signal in the residual error along with their regression model, as I have done using the parameter λ of Pagel (1999), above.

Simultaneous estimation of λ

It is possible to optimize the error structure of the residuals simultaneously with fitting the regression model via least squares (e.g. Grafen 1989). As noted in the preceding paragraphs, this approach seems preferable in general, and in this study, I found that the performance of the best model (OLS or PGLS, depending on the simulation conditions) could be fully or nearly fully recovered through simultaneous optimization of the λ parameter of Pagel (1999) (Tables 1, 3, 5, 7). However, this method has its own limitations. In particular, there are many ways in which the true error structure of our residuals could differ from the hypothesized error structure given by the tree (e.g. Fig. 1a) other than that described by λ (e.g. Blomberg, Garland & Ives 2003; Hansen, Pienaar & Orzack 2008; Lavin et al. 2008). To the end of obtaining an even better fit, Pagel (1999) proposed several other parameters which can be simultaneously optimized. Furthermore, Garland, Harvey & Ives (1992) and others have shown that many analogous transformations can be achieved by manipulating the branch lengths of the tree in various ways.
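
The simultaneous optimization described above can be sketched as follows. This is an illustrative sketch only, not the implementation used in this study: it assumes that λ multiplies the off-diagonal elements of the tree covariance matrix C, profiles the Gaussian log-likelihood over a grid of λ values between 0 and 1 (a simple stand-in for a numerical optimizer), and uses a hypothetical toy covariance matrix of eight sister-species pairs. The function names are invented for the example.

```python
import numpy as np

def lambda_transform(C, lam):
    """Multiply the off-diagonal covariances by lam, leaving the diagonal unchanged."""
    C_lam = lam * C
    np.fill_diagonal(C_lam, np.diag(C))
    return C_lam

def gls_fit(X, y, C):
    """GLS coefficients, ML error variance and log-likelihood for errors ~ N(0, sigma2 * C)."""
    n = len(y)
    Cinv = np.linalg.inv(C)
    beta = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)
    resid = y - X @ beta
    sigma2 = (resid @ Cinv @ resid) / n
    _, logdet = np.linalg.slogdet(C)
    loglik = -0.5 * (n * np.log(2 * np.pi * sigma2) + logdet + n)
    return beta, sigma2, loglik

def pgls_lambda(X, y, C, grid=None):
    """Return (lambda, coefficients, log-likelihood) at the best grid value of lambda."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    best = None
    for lam in grid:
        beta, _, loglik = gls_fit(X, y, lambda_transform(C, lam))
        if best is None or loglik > best[2]:
            best = (float(lam), beta, loglik)
    return best

# Toy data (assumed for illustration): 16 species in 8 sister pairs, unit tip heights.
rng = np.random.default_rng(0)
C = np.kron(np.eye(8), np.array([[1.0, 0.5], [0.5, 1.0]]))
n = C.shape[0]
x = rng.standard_normal(n)                                   # X without phylogenetic signal
y = 0.75 * x + np.linalg.cholesky(C) @ rng.standard_normal(n)  # residuals with signal
X = np.column_stack([np.ones(n), x])
lam_hat, beta_hat, loglik = pgls_lambda(X, y, C)
print(lam_hat, beta_hat)
```

With λ fixed at 0 the fit reduces to OLS (for a tree whose tips are equidistant from the root), and with λ fixed at 1 it reduces to standard PGLS, so both models are special cases of the fitted one.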

Type I error of the ‘wrong’ regression estimator

For some of the simulation conditions of this study, I found that the incorrect regression estimator still yielded appropriate type I error rates for the generating condition of β1 = 0·00. This result is somewhat perplexing. In general, I found that the incorrect estimation procedures overwhelmingly led to estimated regression coefficients that were farther from the generating values than estimates obtained using the correct procedure (e.g. Tables 1, 3, 5, 7; Fig. 2). Thus, one might naively suspect that for a generating slope of β1 = 0·00, these coefficients might also more often be found to be significantly different from 0·00. This is not true because the estimated standard error for the incorrect model in these cases (in particular, phylogenetic signal in X, but not the model residuals; and no phylogenetic signal in X, but signal in the residuals) increases in direct proportion to the square root of the variance in the estimated regression slopes among simulations (Table 9). In fact, this is generally a property of regression tests that yielded appropriate type I error; but not for tests that produced error inflated over its nominal level (Table 9). This result is also somewhat encouraging, because it suggests that although type I error for the phylogenetic regression can be elevated under some simulation conditions (e.g. Table 7), these are not conditions in which diagnostic tests on the original variables or independent contrasts would generally suggest that the phylogenetic regression was appropriate.

Table 9.   The ratio of the square roots of the variances of the estimates of β1 obtained by OLS and PGLS across simulation conditions; and the mean ratio of the estimated standard errors for β1 by OLS and PGLS. If our estimate of the standard error captured the uncertainty in the estimate of β1, then the two ratios should scale proportionally (as they do for the first two simulation conditions). When the data were generated with phylogenetic signal in X and the model residuals, our estimated standard errors for β1 by OLS were too low; conversely, when the data were generated without signal, our estimated standard errors for β1 by PGLS were too low. This is why type I errors are inflated when the incorrect estimation procedure is used in each case. Standard deviations of the mean ratios among simulation conditions are given in parentheses after each entry.

| Simulation model | inline image | inline image |
| --- | --- | --- |
| Phylogenetic X; non-phylogenetic ε | 0·178 (0·023) | 0·224 (0·005) |
| Non-phylogenetic X; phylogenetic ε | 3·768 (0·144) | 3·882 (0·033) |
| Phylogenetic X and ε | 2·631 (0·105) | 1·000 (0·005) |
| Non-phylogenetic X and ε | 0·234 (0·018) | 1·001 (0·011) |
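
The two ratios compared in Table 9 can be computed from any set of simulations. The sketch below is illustrative only and is not the code used for this study: it assumes a balanced 32-tip tree of unit height (hypothetical helper `balanced_tree_cov`), a generating slope of zero and 300 replicates, and it simulates the third condition of the table (phylogenetic signal in both X and the residual error). Under these assumptions the spread of the OLS estimates should exceed that of the PGLS estimates, while the estimated standard errors from the two methods remain similar, which is the mismatch that inflates the type I error of OLS.

```python
import numpy as np

def balanced_tree_cov(levels):
    """Brownian-motion covariance for a balanced tree with 2**levels tips and unit height."""
    n = 2 ** levels
    idx = np.arange(n)
    C = np.zeros((n, n))
    for j in range(1, levels + 1):
        clade = idx >> (levels - j)  # clade label of each tip after j splits
        C += (clade[:, None] == clade[None, :]) / levels
    return C

def fit_with_se(x, y, C):
    """GLS slope and its estimated standard error for error covariance sigma2 * C."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    Cinv = np.linalg.inv(C)
    XtCiX_inv = np.linalg.inv(X.T @ Cinv @ X)
    beta = XtCiX_inv @ X.T @ Cinv @ y
    resid = y - X @ beta
    sigma2 = (resid @ Cinv @ resid) / (n - 2)  # error variance with n - 2 degrees of freedom
    return beta[1], np.sqrt(sigma2 * XtCiX_inv[1, 1])

rng = np.random.default_rng(2)
C = balanced_tree_cov(5)
L = np.linalg.cholesky(C)
n = C.shape[0]
identity = np.eye(n)

est_ols, est_pgls, se_ratios = [], [], []
for _ in range(300):
    x = L @ rng.standard_normal(n)  # phylogenetic signal in X
    y = L @ rng.standard_normal(n)  # beta1 = 0, phylogenetic signal in the residual error
    b_ols, se_ols = fit_with_se(x, y, identity)  # OLS
    b_pgls, se_pgls = fit_with_se(x, y, C)       # PGLS
    est_ols.append(b_ols)
    est_pgls.append(b_pgls)
    se_ratios.append(se_ols / se_pgls)

sd_ratio = np.std(est_ols) / np.std(est_pgls)  # ratio of square roots of variances
se_ratio = np.mean(se_ratios)                  # mean ratio of estimated standard errors
print(f"SD ratio (OLS / PGLS): {sd_ratio:.3f}; mean SE ratio: {se_ratio:.3f}")
```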

General recommendations

The general recommendations that can be derived from this study are straightforward. Firstly, we cannot diagnose whether a phylogenetic regression is appropriate based on univariate measures of phylogenetic signal calculated on the individual variables in our analysis. Although such measures might be interesting for other reasons (e.g. Freckleton, Harvey & Pagel 2002; Blomberg, Garland & Ives 2003; but see Revell, Harmon & Collar 2008), they are not useful in assessing whether or not a phylogenetic regression is appropriate. Secondly, the suitability of a phylogenetic regression should actually be diagnosed by estimating phylogenetic signal in the residual deviations of Y given our predictors (X1, X2, etc.). However, thirdly, an alternative approach, which I recommend, is the simultaneous estimation of phylogenetic signal and the regression model. One example (Pagel’s λ) is provided herein, although many other potentially suitable transformations are also available (e.g. Garland, Harvey & Ives 1992; Pagel 1999; Blomberg, Garland & Ives 2003; Hansen, Pienaar & Orzack 2008; Lavin et al. 2008).

Conclusions

Ordinary least squares regression assumes that the residual error in our regression model is independent among observations. Commonly, this will be an incorrect assumption for various types of data, particularly for data from species related by a phylogenetic tree (Felsenstein 1985; Grafen 1989; Harvey & Pagel 1991). The phylogenetic regression (here, PGLS, but – equivalently – regression through the origin of independent contrasts; Felsenstein 1985; Garland & Ives 2000; Rohlf 2001) can be used for data from species in which the residual error is distributed with covariances between samples that are proportional to the amount of shared branch length from the root node of the tree to the common ancestor of each pair of species in the sample (Fig. 1b; Rohlf 2001). However, as a means of diagnosing a priori whether or not a phylogenetic regression is appropriate, it has become common practice to either: compute independent contrasts-based diagnostics, such as the correlation between the absolute values of standardized contrasts and the square roots of their expected variances; or to estimate phylogenetic signal in the independent and dependent variables of our model. In this study, I have shown that these measures can sometimes be inadequate, and even misleading, regarding whether a phylogenetic regression is called for. However, I have also shown that under conditions in which phylogenetic signal in the independent variable is high, but the phylogenetic regression is inappropriate (or vice versa), type I error is not inflated over its nominal level, even though the accuracy of parameter estimation is substantially decreased. Finally, I showed that simultaneously optimizing the error structure of our generalized least squares model along with the parameters of the model can be a useful approach when the suitability of our data for phylogenetic regression is not known.
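
The covariance structure assumed by the phylogenetic regression, in which each pair of species covaries in proportion to the branch length they share from the root to their common ancestor, can be made concrete with a short sketch. The nested-tuple tree format and the function name `vcv` below are assumptions made for this illustration; they do not correspond to any particular software package.

```python
import numpy as np

def vcv(node):
    """Return (tip_names, C) for a rooted tree written as nested tuples.

    A tip is (name, branch_length); an internal node is
    ([child, child, ...], branch_length), where branch_length is the
    length of the branch leading to that node (0.0 for the root).
    C[i, j] is the branch length shared by tips i and j, measured from
    the top of this node's branch down to their common ancestor.
    """
    content, length = node
    if isinstance(content, str):
        return [content], np.array([[float(length)]])
    names, blocks = [], []
    for child in content:
        child_names, child_C = vcv(child)
        names += child_names
        blocks.append(child_C)
    n = len(names)
    C = np.zeros((n, n))  # tips in different child clades share nothing below this node
    start = 0
    for block in blocks:
        k = block.shape[0]
        C[start:start + k, start:start + k] = block
        start += k
    return names, C + length  # the branch above this node is shared by all of its tips

# Hypothetical four-species tree: ((A:1,B:1):1,(C:1.5,D:1.5):0.5)
tree = ([([("A", 1.0), ("B", 1.0)], 1.0),
         ([("C", 1.5), ("D", 1.5)], 0.5)], 0.0)
names, C = vcv(tree)
print(names)
print(C)
```

In this example A and B share one unit of branch length, C and D share half a unit, and species from different clades share none, so the corresponding off-diagonal elements are 1·0, 0·5 and 0·0, while each diagonal element is the total height of the tip (2·0).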

Acknowledgements

L. Harmon, M. Lajeunesse, J. Losos and two anonymous reviewers kindly provided comments on this article. The National Evolutionary Synthesis Center (NSF EF-0423641) supports the author in his current position.

Ancillary