Phylogenetic tests of ecological and evolutionary hypotheses: checking for phylogenetic independence



1. Phylogenetic methods that account for the degree of relationship between species are increasingly used for cross-species comparisons of ecological data. In particular, ‘phylogenetic contrasts’ are commonly used to generate data for analysis that are phylogenetically independent. However, the efficacy of this technique for removing phylogenetic correlations is rarely tested.

2. For a number of reasons, including non-Brownian modes of evolution, phylogenetic contrasts may not always be phylogenetically independent. This lack of independence defeats the object of phylogenetic analysis and effectively invalidates its results. Such problems can typically be overcome using simple data transformations, but the correct transformation must be identified for each analysis variable.

3. Examples are presented in which contrasts have failed to control for phylogenetic correlation because the correct transformation of the data was not identified prior to analysis. It is also highlighted that the very act of transforming data increases our understanding of the ecological variables being studied, because the choice of transformation depends on how the character in question evolved.


The advent of phylogenetically based comparative analyses has revolutionized the study of ecology and evolution (Harvey & Pagel 1991; Harvey 1996; Silvertown & Dodd 1997). Such analyses explicitly recognize that species share many characteristics as a consequence of their common ancestry. By using phylogenetic information, such similarity can be accounted for and corrected to produce data that are phylogenetically independent. These techniques have allowed tests, and in some cases refutations, of important ecological hypotheses, such as those concerning the relationship between seed size and establishment (Kelly & Purvis 1993), trade-offs amongst dispersal and other ecological traits (Rees 1993, 1997), and the comparative ecology of native and alien plants (Crawley et al. 1997). Whilst there has been some debate concerning the relevance of the phylogenetic approach (Westoby, Leishman & Lord 1995a,b; Harvey, Read & Nee 1995a,b), it is now widely employed for cross-species testing of ecological hypotheses.

The phylogenetic comparative technique most commonly applied in ecology employs species contrasts (Felsenstein 1985a; Harvey & Pagel 1991). That is, when a number of species or clades are compared, the differences between pairs of sister groups, rather than the character states of the groups themselves, are entered into the analysis. These differences are termed contrasts or, because the differencing should remove the component of the character state that results from common ancestry, Phylogenetically Independent Contrasts (PICs). The principle of the analysis has been likened to the role of blocking in experimental design (e.g. Rees 1995). The technique is most similar, though, to differencing as a form of control for autocorrelation in spatial or time-series analyses (e.g. Haining 1990; Chatfield 1996). Indeed, phylogenetic correlation could be termed phylogenetic autocorrelation. To account for autocorrelation in spatial or time-series analysis, a difference between neighbouring (in time or space) points is calculated; this removes the component of each value that is ‘inherited’ from previous points, so that neighbouring data points tend to be uncorrelated, i.e. the autocorrelation is removed. The ultimate aim of the analysis, therefore, is to generate a new variable that is effectively independent of the original variable. That is, groups with large (small) means should be no more or less likely to produce large (small) contrasts than groups with small (large) means.
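The analogy with time-series differencing can be made concrete with a small simulation. The sketch below is illustrative only and is not part of the original analysis: it assumes a hypothetical series in which each value is the previous value plus an independent Gaussian step (a random walk, the time-series counterpart of Brownian evolution), and shows that taking first differences removes the strong lag-1 autocorrelation of the raw series.

```python
import random


def lag1_autocorrelation(series):
    """Sample lag-1 autocorrelation of a list of floats."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    num = sum(dev[i] * dev[i + 1] for i in range(n - 1))
    den = sum(d * d for d in dev)
    return num / den


def first_differences(series):
    """Difference each point from its predecessor: the time-series analogue of a contrast."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical 'inherited' series: each value is the previous value plus a random step.
rng = random.Random(1)
walk = [0.0]
for _ in range(2000):
    walk.append(walk[-1] + rng.gauss(0.0, 1.0))

raw_r = lag1_autocorrelation(walk)
diff_r = lag1_autocorrelation(first_differences(walk))
print(f"raw series: {raw_r:.2f}, differenced series: {diff_r:.2f}")
```

The raw series should show an autocorrelation close to 1 and the differenced series one close to 0; the exact values depend on the seed.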

A number of assumptions underlie this technique (e.g. Garland et al. 1992; Martins & Hansen 1996), and these need to hold for contrasts to be phylogenetically independent. In particular, the method of contrasts could fail if the mode of character evolution is not Brownian (e.g. Felsenstein 1985a,b, 1988; Garland et al. 1992; Purvis & Rambaut 1995). Multiplicative modes of character evolution, or correlations of several traits through life-history invariants (sensu Charnov 1993) or allometric relationships, for example, would tend to push patterns of cross-taxon character variation away from the Brownian model. The proximate practical consequence is that the data may need to be transformed, bringing the pattern of character variation closer to the Brownian model and hence ensuring phylogenetic independence. Whilst transformation of data prior to analysis is common and recommended (Purvis & Rambaut 1995), this stage of the analysis is rarely explicitly described. In this paper it is stressed that phylogenetic correction cannot be assumed but should be tested for, and it is demonstrated how failing to do so can lead to problems in analysing cross-species data. In particular, it is argued that appropriate transformations of data should be identified and applied.

Phylogenetically non-independent contrasts

The primary aim of phylogenetic correction, as outlined above, is to generate a variable for analysis that is, on average, independent of the original character values (i.e. the inherited or phylogenetic component is removed). There is, however, no a priori reason to assume that contrasts are phylogenetically independent; for some forms of character variation across a taxon they will not be. To illustrate this, consider Fig. 1, which shows a phylogeny of eight species for which two characters, A and B, have been measured. To perform a phylogenetically corrected analysis of the relationship between the two characters, we could simply calculate the difference between the members of each species pair at the terminal nodes. This would generate four contrasts for each of the two characters. As shown in Fig. 2a,c, however, simply calculating contrasts from the raw data in Fig. 1 would not phylogenetically correct the analysis. For both characters, the value of the contrast would correlate strongly with the mean character state of the two species at each node. The analysis would have failed to remove the phylogenetic component of character variation, i.e. the biggest contrast values would come from the nodes with the largest character values.
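The terminal-node contrasts described above can be sketched as follows. This is a minimal illustration under stated assumptions: the eight character values are hypothetical (a geometric series, doubling from species to species), not the values shown in Fig. 1; species are assumed to be listed so that positions (0, 1), (2, 3), ... are sister pairs; and contrasts are left unstandardized (Felsenstein's standardized contrasts divide each difference by the square root of the summed branch lengths), which is reasonable here only if all terminal branches have equal length.

```python
def terminal_contrasts(values):
    """Treat species (0, 1), (2, 3), ... as sister pairs at terminal nodes and
    return (node_means, contrasts), each contrast being second minus first."""
    if len(values) % 2:
        raise ValueError("expects an even number of species")
    pairs = [(values[i], values[i + 1]) for i in range(0, len(values), 2)]
    means = [(a + b) / 2 for a, b in pairs]
    contrasts = [b - a for a, b in pairs]
    return means, contrasts


def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical character: a geometric series 2, 4, ..., 256 (NOT the Fig. 1 values).
char_a = [2.0 ** k for k in range(1, 9)]
means, contrasts = terminal_contrasts(char_a)
print(means)      # [3.0, 12.0, 48.0, 192.0]
print(contrasts)  # [2.0, 8.0, 32.0, 128.0]
print(covariance(means, contrasts) > 0)  # True: big contrasts come from big node means
```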

Figure 1.

Hypothetical phylogeny of eight species and the states of two characters A and B.

Figure 2.

Checking for phylogenetic independence in the hypothetical data presented in Fig. 1: (a) relationship between the terminal node mean of character A and phylogenetic contrast value, based on untransformed data, compared with (b) the relationship based on the logarithmically transformed data; (c) relationship between the terminal node mean of character B and phylogenetic contrast value, based on untransformed data, compared with (d) the relationship based on the logarithmically transformed data.

The resolution to this problem is simple: the data must be transformed to ensure that the contrasts are phylogenetically independent. For character A, a simple logarithmic transformation of the original data ensures that the contrast value is independent of the mean at the node (Fig. 2b). The pattern of variation in character A (a simple geometric series) is typical of many types of ecological variables, for example body size. Whilst this need for transformation of variables has been recognized previously (e.g. Garland et al. 1992; Purvis & Rambaut 1995), explicit tests of the phylogenetic independence of contrasts following transformation are rare. To demonstrate the problems that this may cause, consider character B in Fig. 1. Character B does not vary as a geometric series, but rather according to a power law (2/3 power). Consequently, the logarithmic transformation fails to ensure that contrasts are phylogenetically independent (Fig. 2d): the rank correlation of contrast values with node means is perfect because an adequate transformation of the data has not been identified. Without a careful check of the assumption that contrasts are independent, therefore, the analysis would not, strictly speaking, have been phylogenetically corrected.
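The contrast between characters A and B can be reproduced on hypothetical data. In the sketch below, character A is a geometric series and character B is the 2/3 power of an arithmetic series; both are chosen only to mimic the kinds of variation described in the text, not the published Fig. 1 values. Sister pairs are again assumed to be at positions (0, 1), (2, 3), ..., and Spearman's rho is computed with the no-ties formula, which is valid here because all values are distinct.

```python
import math


def terminal_contrasts(values):
    """Sister pairs (0, 1), (2, 3), ...: return node means and contrasts (second minus first)."""
    pairs = list(zip(values[0::2], values[1::2]))
    return [(a + b) / 2 for a, b in pairs], [b - a for a, b in pairs]


def ranks(xs):
    """Ranks 1..n; ties are not handled (all values here are distinct)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_no_ties(xs, ys):
    """Spearman's rho for untied data: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Hypothetical characters (not the Fig. 1 values).
char_a = [2.0 ** k for k in range(1, 9)]        # geometric series
char_b = [y ** (2 / 3) for y in range(1, 9)]    # 2/3 power of an arithmetic series

_, log_contrasts_a = terminal_contrasts([math.log(v) for v in char_a])
print(log_contrasts_a)  # each value is log(2), about 0.693, up to rounding

means_b, _ = terminal_contrasts(char_b)
_, log_contrasts_b = terminal_contrasts([math.log(v) for v in char_b])
print(spearman_no_ties(means_b, log_contrasts_b))  # -1.0: a perfect rank correlation
```

For A the log contrasts are all equal, so they cannot depend on the node mean; for B they shrink steadily as the node mean grows, which is exactly the failure the text describes.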

Although the need to test contrasts for phylogenetic independence has been identified previously, such tests are rarely reported, so it is uncertain how widespread the problem of non-independent contrasts is.

Testing for phylogenetic independence

This section demonstrates how the conclusions drawn from comparative analyses can be influenced by checking whether data have been phylogenetically corrected. The first data set analysed is small, typical of many analysed in ecological studies, and its original data values were published. The data are on seed mass and variance in seed dimensions for 101 species of native Australian plants (Leishman & Westoby 1998). The aim of the original analysis was to determine whether species with highly dormant seeds tended to be smaller and more compact than those with less persistent seeds. To test this hypothesis, contrasts in seed size and size variability were calculated between pairs of taxa (generally genera or families) differing in the degree to which their seeds persisted in the soil. To demonstrate our point we therefore use the same pairs of taxa as the original authors. Note that the analysis below does not test the same hypothesis as the original analysis, whose conclusions are unaffected by the choice of transformation.

The seed-size variability data did not require transformation, as there was no significant correlation between the mean and the contrast of seed-size variability for species pairs (Spearman’s rho = 0·05; P = 0·88). However, an analysis based on the untransformed seed-mass contrasts is not phylogenetically independent (Fig. 3a): there is a strong and highly significant relationship between the absolute values of the seed-mass contrasts and the mean seed masses of the contrasted taxa (Spearman’s rho = 0·90; P < 0·01). Phylogenetic independence is achieved only when the data are logarithmically transformed (Fig. 3b; Spearman’s rho = 0·48; P = NS). In this case, the logarithmic transformation is adequate to ensure phylogenetic independence.
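The independence check used above (rank correlation of absolute contrasts against node means) can be sketched without any statistics library. The node means and contrasts below are hypothetical values for 12 pairs, not the Leishman & Westoby (1998) data, and the P-value comes from a Monte Carlo permutation test, which is an assumption of this sketch: the text does not say how its P-values were obtained. Where SciPy is available, `scipy.stats.spearmanr` returns the same coefficient.

```python
import random


def average_ranks(xs):
    """Ranks from 1 to n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho as the Pearson correlation of (average) ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided Monte Carlo P: how often shuffling ys gives |rho| at least as large."""
    rng = random.Random(seed)
    observed = abs(spearman(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(spearman(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical node means and absolute contrasts for 12 sister pairs (illustrative only).
node_means = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
abs_contrasts = [0.5, 1.1, 1.4, 2.2, 2.3, 3.1, 3.0, 4.2, 4.4, 5.1, 5.9, 6.3]
rho = spearman(node_means, abs_contrasts)
p = permutation_p_value(node_means, abs_contrasts)
print(f"rho = {rho:.3f}, permutation P = {p:.4f}")
```

A significant positive rho, as here, would indicate that the contrasts still carry the phylogenetic component and that a transformation is needed before analysis.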

Figure 3.

Analysis of data presented by Leishman & Westoby (1998) on 12 pairs of taxa: (a) relationship between the terminal node mean of seed mass and phylogenetic contrast value, based on untransformed data, compared with (b) the relationship based on the logarithmically transformed data; (c) relationship between seed variability contrast and seed-mass contrast, compared with (d) the relationship when the seed-mass data are logarithmically transformed.

How does the transformation affect the interpretation of the data? Figure 3c shows the contrasts of seed-size variability plotted against contrasts of seed mass for the untransformed data. Following convention, we have set the signs of the contrasts so that those on the x-axis are positive, i.e. we analyse how the absolute degree of seed-size difference affects seed-size variability. There is a striking trend in the data, but its statistical significance is equivocal: the trend is significant according to one test (Kendall’s tau = −0·485, P = 0·028) but not according to another (Spearman’s rho = −0·517, P = 0·086). Following removal of the extreme outlier (regression analysis would in any case be invalid were this point included), linear regression indicates a highly significant relationship (r² = 0·66, n = 11, P < 0·01). (It is worth noting that this point represents a small seed-mass contrast value: had the seed-mass values for either group been only slightly different, the seed-mass contrast could have had the opposite sign, and this point would then have been reflected about the x-axis.)
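The sign convention, and why a near-zero x-contrast makes a point unstable, can be sketched as follows. The contrast values are hypothetical. Because the order in which sister taxa are subtracted is arbitrary, both contrasts in a pair are negated together whenever the x-contrast is negative; this is one common way of implementing the convention mentioned above, assumed here rather than taken from the original analysis.

```python
def positivize(x_contrasts, y_contrasts):
    """Reverse the direction of subtraction for any pair whose x-contrast is negative.

    Both contrasts in the pair are negated together, so every x-contrast becomes
    non-negative while the sign of x * y (the direction of association) is kept.
    """
    out_x, out_y = [], []
    for x, y in zip(x_contrasts, y_contrasts):
        if x < 0:
            x, y = -x, -y
        out_x.append(x)
        out_y.append(y)
    return out_x, out_y


# Hypothetical contrasts for three sister pairs.
x = [0.8, -1.5, 0.01]
y = [-0.2, 0.4, 0.3]
print(positivize(x, y))  # ([0.8, 1.5, 0.01], [-0.2, -0.4, 0.3])

# A tiny change that flips the sign of the small x-contrast reflects that
# point about the x-axis once the contrasts are positivized.
x_perturbed = [0.8, -1.5, -0.01]
print(positivize(x_perturbed, y))  # ([0.8, 1.5, 0.01], [-0.2, -0.4, -0.3])
```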

Figure 3d shows the analysis based on the transformed data. In this case the result is unequivocal and indicates no association (linear regression, r² = 0·021, n = 12, P = NS). Given that the assumption of phylogenetic independence has been verified (Fig. 3b), we can be confident of this result. The pattern in Fig. 3c arose because of a weak but highly significant association between seed mass and seed-size variability in the original data (linear regression, r² = 0·071, P < 0·05, n = 101).

Identifying the best transformation

The ultimate aim of the phylogenetic correction is to generate an analysis variable that is independent of the original data. The simplest way to check this, therefore, is to examine the correlation or covariance of the contrasts with the original node means, as above. In the above case, the criterion for judging the adequacy of the transformation was simply the statistical significance of the rank correlation coefficient. The rationale for using this measure rather than the parametric product-moment correlation coefficient is that the latter measures only linear association and so might fail to identify perfect but nonlinear rank correlations such as that in Fig. 2d.

Whilst this approach can be viewed as adequate for generating phylogenetically independent contrasts, it does not necessarily identify the best transformation of the data. This point is illustrated in Fig. 4a,b, where power transformations have been applied to the hypothetical data from Fig. 1. These graphs show the covariance between the untransformed mean and the contrast for each species pair, plotted as a function of the exponent used to transform the original data. In the case of character A, which varies as a geometric series, the covariance increases with the exponent. There is no minimum on the plot (other than the trivial case of a zero-valued exponent), hence there is no best power transformation; instead, the logarithmic transformation is the best transformation. In the case of character B, the covariance is clearly minimized, reaching zero, at an exponent of 1·5, i.e. the reciprocal of the 2/3 power used to generate the data. Whereas a range of transformations might appear acceptable if the statistical significance of the correlation coefficient were used to judge them, in this case a ‘best’ transformation can be identified and applied.
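The exponent scan behind Fig. 4a,b can be sketched on hypothetical characters of the same two kinds (a geometric series for A, the 2/3 power of an arithmetic series for B; not the published values). Following the text, the covariance is taken between the original, untransformed node means and the contrasts of the power-transformed values; sister pairs are assumed to sit at positions (0, 1), (2, 3), ....

```python
def covariance(xs, ys):
    """Sample covariance (n - 1 denominator)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def cov_mean_contrast(values, exponent):
    """Covariance between the original node means and the contrasts of the
    power-transformed values, for sister pairs (0, 1), (2, 3), ..."""
    pairs = list(zip(values[0::2], values[1::2]))
    means = [(a + b) / 2 for a, b in pairs]
    contrasts = [b ** exponent - a ** exponent for a, b in pairs]
    return covariance(means, contrasts)


# Hypothetical characters of the two kinds described in the text (not the Fig. 1 values).
char_a = [2.0 ** k for k in range(1, 9)]        # geometric series
char_b = [y ** (2 / 3) for y in range(1, 9)]    # 2/3 power of an arithmetic series

exponents = [0.25 * k for k in range(1, 13)]    # 0.25, 0.50, ..., 3.00
covs_a = [cov_mean_contrast(char_a, p) for p in exponents]
covs_b = [cov_mean_contrast(char_b, p) for p in exponents]
for p, ca, cb in zip(exponents, covs_a, covs_b):
    print(f"exponent {p:4.2f}   A: {ca:14.3f}   B: {cb:9.4f}")
```

For A the covariance rises steadily with the exponent, so no positive power is optimal; for B it is negative below 1·5, positive above, and essentially zero at 1·5.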

Figure 4.

Identifying the best transformation to ensure phylogenetic independence. The graphs show the covariance between contrast values and original node means as the exponent of a power transformation is varied: (a) hypothetical character A from Fig. 1, for which only the logarithmic transformation ensures phylogenetic independence; (b) character B from Fig. 1, for which a 3/2 power transformation is required; (c) seedling size, for which a logarithmic rather than a power transformation is optimal; (d) RGR, for which a 0·583 power transformation gives complete phylogenetic independence. Data for (c) and (d) are taken from Armstrong & Westoby (1993).

How does this relate to observed patterns of character variation? Figure 4c,d shows how transformation affects the covariance between node mean and contrast for two characters, seedling size (Fig. 4c) and relative growth rate (RGR; Fig. 4d), measured across 22 pairs of species (Armstrong & Westoby 1993). For seedling size, either the logarithmic transformation or a power transformation with a very small exponent is best. At very low exponents the two transformations are effectively equivalent: at an exponent of 10⁻⁶, the rank correlation between contrasts calculated using the two transformations is perfect; for a square-root transformation, in contrast, the rank correlation coefficient is 0·427 and just non-significant.
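A standard first-order expansion, not given in the original text, suggests why very small exponents behave like the logarithmic transformation:

```latex
x^{p} = e^{p \ln x} = 1 + p \ln x + O(p^{2})
\quad\Longrightarrow\quad
x_i^{p} - x_j^{p} = p\,(\ln x_i - \ln x_j) + O(p^{2}), \qquad p \to 0^{+}.
```

For small positive p, every power-transformed contrast is, to first order, the corresponding log contrast multiplied by the same positive constant p, so the two sets of contrasts share the same ranks (barring near-ties among the log contrasts), consistent with the perfect rank correlation reported at an exponent of 10⁻⁶.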

The data on RGR, however, require a power transformation (Fig. 4d). A logarithmic transformation produces a covariance of −0·04, whereas a transformation to the power c. 0·583 produces a covariance of zero. Although neither the rank nor the parametric correlation between the logarithmically transformed means and contrasts was statistically significant, using this transformation would nevertheless carry weak correlations into the analysis, and these could be removed by identifying the best transformation.
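Locating the exponent that gives exactly zero covariance is a one-dimensional root-finding problem. The sketch below uses bisection, which is an assumption of this illustration (the text does not say how 0·583 was found), on a hypothetical character constructed so that its 0·583 power is an arithmetic series; it is not the Armstrong & Westoby (1993) RGR data. It also reports the covariance under the logarithmic transformation for comparison.

```python
import math


def cov_mean_contrast(values, transform):
    """Covariance between original node means and contrasts of transform(values),
    for sister pairs (0, 1), (2, 3), ..."""
    pairs = list(zip(values[0::2], values[1::2]))
    means = [(a + b) / 2 for a, b in pairs]
    contrasts = [transform(b) - transform(a) for a, b in pairs]
    n = len(pairs)
    mm, mc = sum(means) / n, sum(contrasts) / n
    return sum((m - mm) * (c - mc) for m, c in zip(means, contrasts)) / (n - 1)


def zero_covariance_exponent(values, lo, hi, tol=1e-10):
    """Bisection for the power-transform exponent at which the covariance crosses zero.
    Assumes the covariance changes sign exactly once on [lo, hi]."""
    def f(p):
        return cov_mean_contrast(values, lambda v: v ** p)

    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("covariance does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid               # sign change lies in [lo, mid]
        else:
            lo, f_lo = mid, f_mid  # sign change lies in [mid, hi]
    return (lo + hi) / 2


# Hypothetical character built so that its 0.583 power is an arithmetic series.
char_c = [y ** (1 / 0.583) for y in range(1, 9)]
best = zero_covariance_exponent(char_c, 0.1, 2.0)
log_cov = cov_mean_contrast(char_c, math.log)
print(f"best exponent = {best:.3f}; covariance under log transform = {log_cov:.4f}")
```

As with RGR in the text, the logarithmic transformation leaves a small negative covariance here, whereas the best power transformation removes it.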


The examples presented above illustrate that it is essential to check the assumptions of phylogenetic analyses. Without such checks we cannot safely assume that analyses based on phylogenetic contrasts are phylogenetically independent. A significant correlation of contrast values with mean character states clearly defeats the object of phylogenetic analysis and could lead to misleading results. This point has been made before but appears to have been largely ignored. Furthermore, even when an adequate transformation has been identified, it need not be the best transformation, in the sense that other transformations may generate contrast values from which the phylogenetic component has been completely removed. This distinction between adequate and optimum transformations will be important, for example, when analysing large numbers of characters that are strongly correlated prior to transformation. In this case, the apparently weak correlations between node means and contrast values that remain after transformation could generate spurious patterns of correlation in the analysis of the contrasts.

If variables are not transformed to ensure phylogenetic independence of contrasts, then the results or conclusions of analyses based on such contrasts will tend simply to mirror those based on uncorrected data. In the example given in Fig. 3c, the apparent trend in the data simply mirrored a pattern of variation in the original data. Only when the correct transformation had been applied was the analysis phylogenetically correct, in the sense that autocorrelations owing to phylogeny had been removed.

A criticism of transforming data prior to analysis is that it modifies the interpretation or meaning of the data (e.g. Martins & Hansen 1996). In a sense this is true: if we logarithmically transform our data, for example, differences become relative rather than absolute. There is, however, a more positive view of data transformation. The simple act of identifying the correct transformation should indicate the nature of cross-taxon character variation and improve our understanding of the evolution of these characters. In the case of character A, for example, the need for a logarithmic transformation could lead us to conclude either that evolution has proceeded in a multiplicative manner (i.e. is Brownian in the logarithmic domain) or that the character is an exponential function of a character that evolved according to a Brownian model. For character B, we could reasonably hypothesize that evolution has proceeded through covariation of the character state with another character, for example through a life-history invariant (sensu Charnov 1993). Identifying the transformation that ensures phylogenetic independence of contrasts could therefore suggest further hypotheses or interpretations of patterns of character variation and evolution. Checking the assumptions of phylogenetic analyses can therefore not only ensure validity but also improve our understanding of the system under study.
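A small simulation can illustrate the first interpretation offered for character A. The sketch assumes, purely for illustration, 500 independent sister pairs whose ancestral log trait values are spread widely and whose two daughters each diverge by a Gaussian step on the log scale (Brownian evolution in the logarithmic domain); it is not a reanalysis of any data in the text. Under this multiplicative model the raw contrasts should correlate with the node means, whereas contrasts of log-transformed values should not.

```python
import math
import random


def ranks(xs):
    """Ranks 1..n (continuous simulated data, so ties are ignored)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman(xs, ys):
    """Spearman's rho for untied data: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * d2 / (n * (n * n - 1))


rng = random.Random(42)
means, raw_contrasts, log_contrasts = [], [], []
for _ in range(500):
    ancestor = rng.gauss(0.0, 2.0)        # hypothetical ancestral value on the log scale
    a = ancestor + rng.gauss(0.0, 0.3)    # Gaussian change along each daughter branch,
    b = ancestor + rng.gauss(0.0, 0.3)    # applied on the log scale (i.e. multiplicatively)
    x, y = math.exp(a), math.exp(b)       # trait values on the original scale
    means.append((x + y) / 2)
    raw_contrasts.append(abs(x - y))
    log_contrasts.append(abs(a - b))      # |ln x - ln y|

rho_raw = spearman(means, raw_contrasts)
rho_log = spearman(means, log_contrasts)
print(f"raw contrasts: rho = {rho_raw:.2f}; log contrasts: rho = {rho_log:.2f}")
```

The exact coefficients depend on the seed, but the raw contrasts should show a strong positive rank correlation with the node means and the log contrasts one near zero.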


I should like to thank Simon Jennings, Nick Dulvy and John Reynolds for discussion of this work and comments on the manuscript.

Received 16 March 1999; revised 23 September 1999; accepted 16 October 1999