## Introduction

Many important problems in ecology, evolution and behaviour cannot readily be addressed using experimental approaches. Thus, whilst the fully randomized experiment is, for many, the ideal approach for hypothesis testing, alternative methods frequently have to be adopted (Maynard Smith, 1978; Harvey *et al.*, 1983). To address questions about long-term processes, observational and comparative approaches have been developed and are frequently employed (Maynard Smith, 1978; Felsenstein, 1988; Harvey & Pagel, 1991).

The comparative approach is used to test evolutionary hypotheses on datasets collected across multiple species. Trait or ecological data are collected for a group of species, and then statistical analysis is used to seek patterns consistent with alternative hypotheses (Clutton-Brock & Harvey, 1984; Harvey & Pagel, 1991; Harvey & Purvis, 1991). This is potentially an extremely powerful approach as the data collected usually span groups that encompass long periods of evolutionary change in a wide range of environmental conditions. The patterns examined in comparative analysis thus encompass very broad evolutionary processes. Comparative analyses therefore allow macroevolutionary patterns to be explored, examining the broad outcomes of evolutionary processes across species (e.g. Harvey *et al.*, 1996). This contrasts with experimental approaches that typically focus on within-species microevolutionary processes.

As has been well discussed in the literature, comparative analyses have to deal with issues of phylogenetic nonindependence (Clutton-Brock & Harvey, 1984; Felsenstein, 1985; Harvey & Pagel, 1991; Garland *et al.*, 1992). That is, within a multi-species dataset species are related to each other to differing degrees and the degree of relatedness between species is often reflected in the amount of trait similarity. This happens because closely related species share more evolutionary history and have had less time to diverge than more distantly related ones.

The basis for many, if not most, comparative analyses is the analysis of associations between traits using correlation or regression. In this type of analysis, if phylogenetic nonindependence is not accounted for then statistical analyses may be compromised (e.g. Harvey & Pagel, 1991; Martins & Garland, 1991) and results could be misleading. The consequences of ignoring nonindependence are numerous. For example, in simple bivariate analyses the type I error rate of significance tests will be inflated (Martins & Garland, 1991) as the variances of the traits will be incorrectly estimated. Similarly, in analyses of trait differences between groups that differ in discrete characters, the effective sample sizes will be incorrect (the ‘radiation principle’ of Grafen, 1989). Furthermore, without accounting for evolutionary history, differences in evolutionary trajectories between groups will confound analyses: for example, Garland *et al.* (1999) and McKechnie *et al.* (2006) show that the slope of the relationship between basal metabolic rate and body size in birds is incorrect unless the split between passerines and nonpasserines is controlled for. The bottom line is that it is dangerous to ignore phylogenetic structure in data, just as it is risky to ignore autocorrelation in time-series or spatial data. As noted below, diagnosing the extent to which phylogeny is important is actually relatively straightforward.
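The inflation of type I error rates is easy to demonstrate by simulation. The following is a minimal sketch (in Python, not from the paper; the clade structure and parameter values are illustrative assumptions): two clades receive independent clade-level shifts in two traits, so the traits are uncorrelated within clades, yet a naive cross-species correlation test rejects the null far more often than the nominal 5%.

```python
import numpy as np

rng = np.random.default_rng(1)
n_per_clade, n_sims = 20, 1000
n = 2 * n_per_clade
t_crit = 2.024  # two-sided 5% critical value for t with n - 2 = 38 df
false_positives = 0

for _ in range(n_sims):
    xs, ys = [], []
    for _ in range(2):  # two clades, each sharing an ancestral shift
        # Clade-level shifts are independent between traits: no true association
        shift_x, shift_y = rng.normal(0.0, 3.0, size=2)
        xs.append(shift_x + rng.normal(0.0, 1.0, n_per_clade))
        ys.append(shift_y + rng.normal(0.0, 1.0, n_per_clade))
    x, y = np.concatenate(xs), np.concatenate(ys)
    r = np.corrcoef(x, y)[0, 1]
    t = r * np.sqrt((n - 2) / (1.0 - r**2))  # t-statistic for Pearson's r
    false_positives += abs(t) > t_crit

rate = false_positives / n_sims
print(f"type I error rate: {rate:.2f} (nominal 0.05)")
```

The shared ancestry effectively reduces the analysis to a comparison of two clade means, so the realized error rate is many times the nominal level.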

A suite of comparative tests have been developed to deal with issues of nonindependence, and of these the most commonly employed are the method of independent contrasts (Felsenstein, 1985) and the method of generalized least squares (GLS) (Martins & Hansen, 1997; Pagel, 1997, 1999; Garland *et al.*, 1999). Although formulated in different ways, these two approaches are essentially the same (see below) and use an underlying Brownian model of trait evolution to model the expected variance and covariance of traits amongst species (e.g. Felsenstein, 1985; Pagel, 1997, 1999).
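As a concrete illustration of the contrasts calculation, here is a minimal Python sketch (the nested-tuple tree representation and the function name are hypothetical; the formulas are Felsenstein's, 1985): each pair of daughter lineages is combined into a contrast standardized by the summed branch lengths, the ancestral value is a branch-length-weighted average, and the ancestral branch is lengthened to reflect the uncertainty in that estimate.

```python
import math

def contrasts(node):
    """Compute Felsenstein's (1985) standardized independent contrasts.

    A tip is (name, trait_value, branch_length); an internal node is
    (left_child, right_child, branch_length). Returns (node_value,
    adjusted_branch_length, list_of_standardized_contrasts).
    """
    if isinstance(node[0], str):                      # tip
        return node[1], node[2], []
    left, right, branch = node
    vl, bl, cl = contrasts(left)
    vr, br, cr = contrasts(right)
    c = (vl - vr) / math.sqrt(bl + br)                # standardized contrast
    v = (vl / bl + vr / br) / (1 / bl + 1 / br)       # weighted ancestral value
    b = branch + bl * br / (bl + br)                  # lengthen branch for uncertainty
    return v, b, cl + cr + [c]

# Tree ((A:1, B:1):1, C:2) with illustrative trait values
tree = ((("A", 4.0, 1.0), ("B", 6.0, 1.0), 1.0), ("C", 10.0, 2.0), 0.0)
_, _, cs = contrasts(tree)
print(cs)  # two contrasts: (4 - 6)/sqrt(2) and (5 - 10)/sqrt(1.5 + 2)
```

Under the Brownian model the resulting contrasts are independent and identically distributed, which is what licenses ordinary regression through the origin on contrasts, or the equivalent GLS formulation.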

Because these statistical methods make assumptions about the underlying model of trait evolution, they yield predictions about the way that data should be distributed across species. If these do not hold then the tests may be compromised in some way. Accordingly, a series of diagnostics designed for interpreting the results of phylogenetic analysis have been developed (Garland *et al.*, 1992; Purvis & Rambaut, 1995; Freckleton, 2000). In addition to the assumptions about the evolutionary process, more familiar assumptions such as homoscedasticity and the distribution of residuals also apply.

Recently, there has been an increasing realization that the way that statistics is practiced in ecology and evolutionary biology may need to be rethought. Some key issues to have emerged include looking at effect sizes rather than relying on *P*-values (Hilborn & Mangel, 1997; Burnham & Anderson, 2002; Paradis, 2005); allowing for model uncertainty and not simply relying on parameter uncertainty for testing models (e.g. Burnham & Anderson, 2002); and including multiple forms of uncertainty into models (e.g. Clark, 2007).
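The distinction between significance and effect size can be made concrete with a minimal Python sketch (simulated data, not from the paper): with a large sample, a very weak predictor can be highly 'significant' while explaining almost none of the variance, and AIC puts the model comparison on an explicit footing.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = rng.normal(size=n)
y = 0.05 * x + rng.normal(size=n)       # very weak true effect, lots of noise

slope, intercept = np.polyfit(x, y, 1)
resid = y - (slope * x + intercept)
rss = np.sum(resid**2)
tss = np.sum((y - y.mean())**2)         # residual SS of the intercept-only model
r2 = 1 - rss / tss

se = np.sqrt(rss / (n - 2) / np.sum((x - x.mean())**2))
t = slope / se                          # typically around 5 here: 'significant'

# Gaussian AIC up to a constant: n*ln(RSS/n) + 2k
aic_slope = n * np.log(rss / n) + 2 * 3  # slope, intercept, sigma
aic_null = n * np.log(tss / n) + 2 * 2   # intercept, sigma

print(f"slope = {slope:.3f}, R^2 = {r2:.4f}, t = {t:.1f}")
print(f"delta AIC (null - slope) = {aic_null - aic_slope:.1f}")
```

The slope is trivially small and *R*² is well under 1%, yet the *t*-statistic would be reported as a strong result; the effect size and the AIC comparison tell a more honest story about how much the predictor matters.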

Unfortunately, in some respects the application of comparative methods has failed to keep up with these developments. This is particularly true of analyses that use comparative approaches to measure simple correlations and associations between traits using phylogenetic counterparts to conventional nonphylogenetic statistical methods. In this paper, I review seven areas in which current practices often lag behind statistical developments in other areas of ecology and evolution (summarized in Table 1). I highlight that one problem is a barrier between the users of phylogenetic methods and the techniques themselves. I believe that one reason for this is that previously users of phylogenetic methods have had to rely on relatively inflexible proprietary software packages. However, it is increasingly possible to conduct phylogenetic analysis in flexible computing environments such as R (R Development Core Team, 2008; e.g. reviewed in Paradis, 2006), which is beginning to break such barriers down and has been the medium for implementing new tools for comparative analyses (see Table 1).

**Table 1** Seven problems in comparative analyses, with their consequences, solutions and relevant software.

| No. | Problem | Explanation | Consequence | Solution | Software |
|---|---|---|---|---|---|
| 1 | Putting undue faith in models with low R² | Models with low explanatory power may be statistically significant. This is often a consequence of large sample sizes, and in practice the effects of variables included in models are weak | The importance of weak predictors may be over-emphasized; R² is not a reliable measure of fit or relative fit | Use effect sizes as well as significance tests. AIC is a better measure for comparing model fit. Low R² is a diagnostic of lack of model fit | MasterBayes; gee function in ape; pglm function in caicr |
| 2 | Reporting both PI and PC analysis | PI and PC make very different assumptions about the distribution of data, and are best regarded as alternative models for the same data. As such they should not be treated equally | Models with alternative assumptions are treated equally; potentially conflicting results may be reported | Check residuals and data for phylogenetic dependence; use a correction if appropriate | BayesTraits; geiger; pdap; caicr |
| 3 | Not testing distributional assumptions | Phylogenetically corrected models make assumptions about the distribution of residuals that are the same as those made in nonphylogenetic analysis and are well known | Parameter estimates may be incorrect or biased. Reported P-values may be incorrect | Use conventional regression diagnostics – check for linearity, normality of residuals and homogeneity of variance (all adjusted for phylogeny) | gee function in ape; caic/caicr; MasterBayes |
| 4 | Data dredging | In analyses comparing a large number of predictors, best-fit models are selected by comparing a large number of alternative models, or by using significance tests on parameters to distinguish models | High probability of type I errors; degenerate sampling distributions for parameters. Selected model is often no better than many possible alternatives. Outcome is highly sensitive to collinearity | Clearly identify hypotheses to be tested and test those. Report all stages in the model selection process. Use the full model when appropriate; when selection is necessary use model averaging or a multi-model approach | gee function in ape; caicr; MasterBayes |
| 5 | Treating residuals as data | Residuals from regressions of the response on confounding variables are used to control for unwanted effects in multi-variable regressions | Results in biases, particularly when the predictors are collinear | Use multipredictor analyses rather than univariate methods; do not use residuals in model fitting | gee function in ape; caicr; MasterBayes |
| 6 | Ignoring alternative models | Methods such as contrasts and GLS assume that residuals are distributed according to the predictions of a Brownian model of trait evolution. This may not be the case and other processes may be operating | The phylogenetic correction may not be fully effective. The effects of important processes such as stabilizing selection, varying rates of evolution or other factors shaping trait variation may be missed | Consider alternative models, such as the OU model, the δ, λ or κ transformations of Pagel (1997, 1999), or models incorporating rate variation | BayesTraits; geiger; laser; ape; ouch |
| 7 | Ignoring quality control of data | Data from disparate sources vary in quality and may be erroneous. Data may be missing for significant numbers of species | Low-quality data will compromise statistical power. Missing data can lead to biases in the outcome of analyses | Employ quality criteria for data inclusion. Analyse data to determine whether data are missing randomly with respect to other variables. Consider imputation methods | MasterBayes |

Current URLs for the software mentioned are given below.
- MasterBayes: http://cran.r-project.org/web/packages/MasterBayes/index.html
- caic: http://www.bio.ic.ac.uk/Evolve/software/caic/index.html
- laser: http://cran.r-project.org/web/packages/laser/index.html
- geiger: http://cran.r-project.org/web/packages/geiger/index.html