Tests for phylogenetic signal are used to evaluate rates of evolution, to establish tests of correlation between two traits, or to investigate to what degree the phylogeny of a species assemblage is predictive of a trait value. Being based on different approaches, these tests are generally thought to possess quite different statistical performances. In this article, we show that the Blomberg et al. K and K*, the Abouheif index, Moran's I, and the Mantel correlation are all based on a cross-product statistic, and are thus all related to each other when they are associated with a permutation test of phylogenetic signal. What changes is only the way phylogenetic and trait similarities (or dissimilarities) among the tips of a phylogeny are computed. The definitions of the phylogenetic and trait-based (dis)similarities among tips thus determine the performance of the tests. We briefly discuss the biological and statistical consequences (in terms of power and type I error of the tests) of the observed relatedness among the statistics that allow tests for phylogenetic signal. Blomberg et al.'s K* statistic appears to be one of the most efficient approaches to test for phylogenetic signal. When branch lengths are not available or not accurate, Abouheif's Cmean statistic is a powerful alternative to K*.
Phylogenetic signal is obtained when phylogenetically related species tend to have more similar trait values than more distantly related species. It is tested with different aims: (1) to find models and rates of evolution for explaining extant species’ traits (e.g., Blomberg et al. 2003); (2) to find which test approach should be used to compare two traits in cross-species analyses and whether phylogenetic information should be included in these tests (e.g., Abouheif 1999; but see Rohlf 2006 and Revell 2010); (3) to elucidate the processes that underpin patterns in phylogenetic diversity in ecological studies of communities, species interaction networks, and ecosystem services (e.g., Mouquet et al. 2012).
Variations in trait states may have various levels of association with the species phylogenetic history (Hansen and Martins 1996) and a number of different statistics are widely used to test for phylogenetic signal (see Revell et al. 2008). Some of these tests are flexible and might be model-dependent or model-free depending on how phylogenetic proximities/distances among species are defined. These include the Mantel (1967) test developed to compare, via a correlation, any kind of dissimilarity matrices (see also Mantel and Valand 1970), and the Moran (1948) test originally developed to detect spatial signal in environmental variables and introduced in a phylogenetic context by Gittleman and Kot (1990).
Abouheif (1999) proposed a model-free test of phylogenetic signal for a continuous character adapting a diagnostic test for serial independence originally developed by von Neumann et al. (1941) in a nonphylogenetic context. Recently, Pavoine et al. (2008) provided an exact analytic formulation of the Abouheif test showing that it turns out to be an application of the Moran test (1948), with a particular definition of the pairwise phylogenetic proximities between species. In contrast, Blomberg et al. (2003) proposed two statistics K and K* to compare the evolution of a trait to that expected under a Brownian motion model of trait evolution.
Being based on different approaches, all these methods are thought to possess different statistical performances in terms of power and type I error (see, e.g., Harmon and Glor 2010; Hardy and Pavoine 2012). In this article, we will show that Blomberg et al.'s K and K*, the Abouheif index, Moran's I and its generalizations, and the Mantel correlation are all based on a cross-product statistic, such that whenever the significance of the tests is evaluated via permutation procedures, the test procedures are identical to each other. What changes is the way the phylogenetic and trait similarities (or dissimilarities) are computed. Accordingly, the observed differences in the statistical performances among the tests are related to differences in how the similarity/dissimilarity matrices are constructed, rather than to the mathematical formulation of the tests. We thus compared the ways phylogenetic (dis)similarity and trait-based (dis)similarity are defined in these statistics of phylogenetic signal and evaluated the consequences on the performance of the associated tests of phylogenetic signal. We end with a discussion and recommendations on which statistic is preferable in which circumstances.
Matrices of Phylogenetic (Dis)similarity
Many tests of phylogenetic signal require the definition of a matrix of phylogenetic similarity or dissimilarity. Hereafter, we will consider two complementary matrices of phylogenetic similarities among tips: A (Pavoine et al. 2008) and the Brownian covariance matrix C. In the next section, we will demonstrate that the values of C−1, the inverse of C used for instance in Blomberg et al. (2003) statistics of phylogenetic signal, can be considered as measures of phylogenetic differences among species.
The matrix A= (aij) was introduced by Pavoine et al. (2008) when providing an analytical solution to the test of phylogenetic signal developed by Abouheif (1999). For a tip i of a phylogenetic tree, aii is the inverse of the product of the number of branches descending from each ancestral node of the tip and, for a pair of tips (i,j), aij is the inverse of the product of the number of branches descending from the ancestral nodes unshared by tips i and j and from their most recent common ancestor only (nodes located in the shortest path that connects the two tips). The values aii have been interpreted as measures of how isolated a tip is in the phylogenetic tree. A tip is isolated if it descends from lineages that embed few tips. Extreme isolation is obtained when the tip is the sole descendant of a branch directly connected to the root. One of the main characteristics of matrix A is that it avoids the cost associated with assuming that branch lengths and a model of evolutionary change are known and accurate (Abouheif 1999).
The matrix C= (cij) is connected with Brownian evolution. The diagonal value cii is defined as the sum of branch lengths between tip i and the root of the phylogenetic tree. The off-diagonal values cij are defined as the sum of branch lengths between the most recent common ancestor of tips i and j and the root of the tree (i.e., the height above the root of the most recent common ancestor of the pair of tips). C is the basis of the variance–covariance matrix σ2C (where σ2 is the rate of evolution) among tips’ traits according to a Brownian motion model. Each diagonal value of this matrix is the variance of the trait value at a tip according to a Brownian evolution from the root of the tree, and each off-diagonal value is a covariance between the trait values at two tips according to the same Brownian model. Examples of calculation of the matrix C of the Brownian motion model and of the phylogenetic proximity matrix A are given for a simple theoretical tree in Figure 1.
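The two definitions above can be sketched in a few lines of Python. This is only an illustration on a hypothetical three-tip tree, ((a:1,b:1)N:1,c:2), which is not the tree of Figure 1; the tree encoding and all function names are assumptions made for this example.

```python
import numpy as np

# Hypothetical toy tree ((a:1,b:1)N:1,c:2), stored as child -> (parent, branch length).
TREE = {"a": ("N", 1.0), "b": ("N", 1.0), "N": ("root", 1.0), "c": ("root", 2.0)}
TIPS = ["a", "b", "c"]

def ancestors(node):
    """Interior nodes from the parent of `node` up to the root."""
    path = []
    while node in TREE:
        node = TREE[node][0]
        path.append(node)
    return path

def depth(node):
    """Sum of branch lengths between `node` and the root."""
    total = 0.0
    while node in TREE:
        total += TREE[node][1]
        node = TREE[node][0]
    return total

def n_children(node):
    """Number of branches descending from `node`."""
    return sum(1 for parent, _ in TREE.values() if parent == node)

def mrca(i, j):
    """Most recent common ancestor of two distinct tips."""
    anc_j = set(ancestors(j))
    return next(a for a in ancestors(i) if a in anc_j)

def matrix_C():
    """Brownian matrix: root-to-tip length on the diagonal, root-to-MRCA length elsewhere."""
    n = len(TIPS)
    C = np.zeros((n, n))
    for p, i in enumerate(TIPS):
        for q, j in enumerate(TIPS):
            C[p, q] = depth(i) if i == j else depth(mrca(i, j))
    return C

def matrix_A():
    """Proximity matrix A: inverse product of the numbers of descending branches."""
    n = len(TIPS)
    A = np.zeros((n, n))
    for p, i in enumerate(TIPS):
        for q, j in enumerate(TIPS):
            if i == j:
                nodes = ancestors(i)  # every ancestral node of the tip
            else:
                m = mrca(i, j)
                ai, aj = ancestors(i), ancestors(j)
                # nodes on the shortest path between the tips, MRCA counted once
                nodes = ai[: ai.index(m) + 1] + aj[: aj.index(m)]
            A[p, q] = 1.0 / np.prod([n_children(v) for v in nodes])
    return A

print(matrix_C())
print(matrix_A())
```

On this toy tree, tip c is the sole descendant of a branch attached to the root, so its diagonal value in A is larger than those of a and b, in line with the interpretation of aii as a measure of isolation.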
Special Measures of Phylogenetic Differences: Matrix C−1
Matrices A and C of phylogenetic similarities have been defined in the previous section. We show below that C−1 is a particular measure of phylogenetic differences among tips (we use the word “difference” here instead of “dissimilarity” because the term “dissimilarity” has usually been associated with non-negative matrices whereas these values of difference are allowed to be negative).
DEFINITION OF THE VOLUME OF A TREE
Matrix C can be associated with two graphical representations: a tree and a parallelepiped. Hereafter, we will use the expression “the volume of a tree” to designate the volume in n dimensions of the parallelepiped associated with a tree of n tips with the following definition. Let (e1, … , en) be the standard basis of the n-dimensional space, where e1 is associated with the first tip, e2 is associated with the second tip, … and en is associated with the last tip. Let us denote c1, … , cn the columns of C. The coordinates of the vector ck in the standard basis are the variance of the kth tip on axis ek and the covariance between the kth tip and each of the other tips on the other axes. The n vectors c1, … , cn define a parallelepiped as illustrated in Figure 2 (the set of points t1c1+ … +tncn with each coefficient tk in [0, 1]). In the extreme situation where the matrix C has zero off-diagonal values, the associated tree is a star phylogeny with all branches descending from the root node and the associated parallelepiped has right angles and a volume equal to the product of its edge lengths (i.e., the variances or diagonal values of C; Fig. 2a).
If some of the covariances (off-diagonal values of C) are positive, then the phylogenetic tree has bifurcating interior nodes between the tips and the root node and some of the angles in the parallelepiped are acute. In that case, if three tips are considered, the three-dimensional parallelepiped looks like a flattened cardboard box (examples are given in Fig. 2). The volume of a tree thus increases with the variances and decreases with the covariances among tips. For a given number of tips (n), the volume of a tree is thus a measure of phylogenetic diversity in n dimensions, and it can be measured by the absolute value of the determinant of matrix C (see File S1 for details).
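As a small numerical illustration of the volume of a tree, the sketch below computes the absolute determinant of C for two hypothetical three-tip trees with the same root-to-tip distances: a star phylogeny and a tree in which two tips share half of their history. The matrices and the function name are assumptions chosen for the example.

```python
import numpy as np

def tree_volume(C):
    """Volume of the parallelepiped spanned by the columns of C."""
    return abs(np.linalg.det(np.asarray(C, dtype=float)))

# Star phylogeny: zero covariances, three tips at distance 2 from the root.
star = np.diag([2.0, 2.0, 2.0])

# Same variances, but the first two tips share a branch of length 1.
nested = np.array([[2.0, 1.0, 0.0],
                   [1.0, 2.0, 0.0],
                   [0.0, 0.0, 2.0]])

print(tree_volume(star))    # product of the variances
print(tree_volume(nested))  # smaller: the covariance flattens the box
```

With equal variances, introducing a positive covariance reduces the volume, which matches the flattened cardboard box picture.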
C−1: MATRIX OF NEGATIVE PHYLOGENETIC DIFFERENCES
Consider a reference tree from which matrix C is calculated and C−1= (δij) is the inverse matrix of C. It can be shown (File S1) that the diagonal values of C−1 (δii) correspond to the ratio of the volume in n− 1 dimensions of the new, degraded trees obtained by removing one tip at a time in the reference tree to the volume of the reference tree. It can also be shown (File S1) that the off-diagonal value of C−1, δij, depends on a reduced tree obtained by dismantling the structure of the tree as follows: (1) first remove the path that connects the ith to the jth tip; this leads to several disconnected subtrees; (2) if none of the subtrees contain the root node of the main tree, δij= 0; otherwise, reconnect the subtrees by their root, and δij is equivalent to minus the ratio of the volume of this new reduced tree to the volume of the reference tree (Fig. 3).
The diagonal value of C−1 is thus positive and it is high if removing the ith tip hardly changes the volume associated with the reference phylogenetic tree. As the influence of a tip depends on its variance and covariance with the other tips, this means that δii is high if the ith tip is confined in the phylogenetic tree, with many relatives relative to its distance to the root node. Values on the diagonal of C−1 are thus related to the concept of phylogenetic originality or distinctiveness of a tip (May 1990; Vane-Wright et al. 1991; Pavoine et al. 2005, 2008): they increase with decreasing phylogenetic distinctiveness.
The value of the off-diagonal entry of C−1 is high if removing the path that connects the ith to the jth tip breaks the tree into large subtrees. It thus measures some kind of phylogenetic difference between tips i and j, which is high if the two tips compared are far from each other and far from other tips. These values of difference have the particularity of being negative or zero. A value close to zero thus means high phylogenetic difference and a strongly negative value means very low phylogenetic difference. A value of zero in C is conserved as a zero in C−1. Tips separated by the root node are thus considered unrelated by this approach both in C (where zero is the lowest possible phylogenetic similarity) and in C−1 (where zero is the highest possible phylogenetic difference).
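The properties of C−1 described in this section can be checked numerically on a small example. The sketch below uses a hypothetical three-tip matrix C (two sister tips plus one tip attached to the root) and assumes that removing a tip from the tree amounts to deleting the corresponding row and column of C; function names are illustrative.

```python
import numpy as np

# Hypothetical C for the tree ((a:1,b:1):1,c:2): a and b are sisters, c hangs from the root.
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
C_inv = np.linalg.inv(C)

def volume(M):
    """Volume of a tree, measured as |det| of its Brownian matrix."""
    return abs(np.linalg.det(M))

def drop_tip(M, i):
    """Brownian matrix of the degraded tree obtained by removing tip i."""
    return np.delete(np.delete(M, i, axis=0), i, axis=1)

# Ratio of the (n-1)-dimensional volume without tip i to the volume of the full tree.
ratios = np.array([volume(drop_tip(C, i)) / volume(C) for i in range(len(C))])

print(np.diag(C_inv))  # matches `ratios`
print(ratios)
print(C_inv)           # negative entry for the sister tips, zeros across the root
```

The diagonal values are larger for the two sister tips than for the isolated tip c, the sister pair has a negative off-diagonal value, and the zeros of C for tips separated by the root remain zeros in the inverse.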
A Comparison between Matrices A, C, and C−1
The main difference between matrices A, C, and C−1 is that matrices A and C measure phylogenetic similarity whereas matrix C−1 measures phylogenetic difference. However, a more subtle property is shared by matrices A and C−1: they contain relative (instead of absolute) measures of phylogenetic (dis)similarity. Usual matrices of pairwise phylogenetic proximities/differences between tips (e.g., patristic distances) have off-diagonal values that depend only on the two tips concerned: evolution on each branch is independent of evolution on all other branches. They are absolute phylogenetic proximities/differences (e.g., Gittleman and Kot 1990). In contrast, matrices A and C−1 contain relative measures of phylogenetic proximities and differences, respectively, that depend on the pool of taxa considered.
In the previous section, we have shown that C−1 measures pairwise phylogenetic differences among tips influenced by how confined the pairs of tips are (off-diagonal) and also by pure phylogenetic confinement of individual tips (diagonal). Confinement means presence in nested species-rich clades (in opposition to originality that is associated with species-poor clades). This should be connected to the matrix A. The diagonal elements of A measure the phylogenetic originality of the tips and the off-diagonal elements of A measure pairwise phylogenetic proximities influenced by how confined the pairs of tips are (Fig. 1). Contrary to classical matrices of phylogenetic distance or proximities (such as patristic distances), values in C−1 and in A thus depend on the shape of the phylogenetic tree, and not only on the path that connects the pair of tips.
Unification: Widely Used Tests of Phylogenetic Signal are Based on the Cross-Product Statistic
The general form of a cross-product statistic is given by (e.g., Getis 1991):

$$\Gamma = c \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, y_{ij}$$

where wij and yij are the elements of two pairwise dissimilarity or similarity matrices W= (wij), Y= (yij) for objects i and j (i, j= 1, 2, … , n), and c is a constant that is invariant by permutation. We will consider hereafter that wij represents some measure of phylogenetic (dis)similarity between tips i and j of a phylogenetic tree, whereas yij is a measure of (dis)similarity between trait values at tips i and j. The cross-product statistic is flexible as wij and yij can be freely defined. We have suggested above two potential matrices of phylogenetic similarities (A and C) and a matrix of phylogenetic differences (C−1).
Because the elements of a (dis)similarity matrix are not independent, the significance of a cross-product statistic (i.e., the association between W and Y) is usually tested by randomly permuting the order of the elements within one matrix (rows and columns are permuted in tandem) keeping the other matrix unchanged (Rosenberg and Anderson 2011). P-values are then computed as the proportion of permutation-derived values that are as extreme or more extreme than the actual Γ value.
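The permutation procedure just described can be sketched as follows. This is a minimal illustration, not the code used in the study: it assumes an upper-tailed test (large values of the statistic are "extreme"), counts the observed value among the permutations, and uses a hypothetical four-tip example in which Y is the outer product of the trait values.

```python
import numpy as np

def cross_product(W, Y, c=1.0):
    """Cross-product statistic: c times the sum over i, j of w_ij * y_ij."""
    return c * float(np.sum(np.asarray(W) * np.asarray(Y)))

def permutation_test(W, Y, n_perm=999, seed=0):
    """Permute rows and columns of Y in tandem, keeping W fixed (upper-tailed)."""
    rng = np.random.default_rng(seed)
    Y = np.asarray(Y, dtype=float)
    observed = cross_product(W, Y)
    n = Y.shape[0]
    count = 0
    for _ in range(n_perm):
        p = rng.permutation(n)
        # Y[np.ix_(p, p)] applies the same permutation to rows and columns.
        if cross_product(W, Y[np.ix_(p, p)]) >= observed:
            count += 1
    # The observed arrangement is counted as one of the permutations (an assumption).
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical example: two pairs of sister tips, and similar trait values within pairs.
W = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)
x = np.array([1.0, 1.0, -1.0, -1.0])
Y = np.outer(x, x)

gamma, p_value = permutation_test(W, Y)
print(gamma, p_value)
```

For statistics that decrease with phylogenetic signal, the comparison would be reversed; only the direction of the tail changes, not the permutation scheme.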
We demonstrate in Table 1 that several widely used statistics of phylogenetic signal are applications of the cross-product statistic. As a consequence, the differences between the permutation tests using these statistics are due to the choice of the matrices W and Y. We will show in the next section that this choice is critical to the performance of the test in terms of power and type I error. We illustrate below that the different cross-product statistics presented in Table 1 have had very different justifications when they were first developed and have different levels of flexibility in the definitions of W and Y.
Table 1. A comparison of the generalized Moran statistics, Blomberg et al.'s K, K*, and the new KW. xi is the trait value for tip i, ri is the phylogenetic weight for tip i (see main text), R is the diagonal matrix with values ri for all i on the diagonal, 1n is the unit vector of length n, σ2 is the rate of evolution in the Brownian model.
The first statistic related to the cross-product is the Mantel cross-product (1967, p. 213), where, in a phylogenetic context, Y is a matrix of pairwise dissimilarity between trait values at tips and W is a corresponding matrix of phylogenetic dissimilarity. The Mantel test was developed to analyze the correlation between any two matrices of dissimilarities (first in a context of spatial and temporal aggregation in disease expansions, Mantel 1967). The main advantage of this index is its flexibility because the definition of how to compute the trait- and phylogeny-based dissimilarities among tips is left entirely to the user of the test. Note that using dissimilarities among species in the Mantel test implies that a tip will never be compared to itself (the dissimilarity between a species and itself is zero: wii= 0 and yii= 0 for all i).
The statistics IR, IN, and IW in Table 1 are all rooted in the analysis of autocorrelation in time series and spatial data (Cliff and Ord 1973; Rohlf 2001). They are generalized versions of Moran's (1948) I autocorrelation index. They take the classic form of any autocorrelation coefficient: the numerator term is a measure of covariance among the trait values at tips and the denominator term is a measure of variance. The general formula is detailed in File S3.
Originally, the diagonal values of W= (wij) were set to zero. Here, we use a more general formula where they are allowed to be positive. The value xk is the trait value at tip k. In IR and IN, rk= 1/n for all k. The difference between IR and IN is that in IN, W is row-normalized: $\sum_j w_{ij} = 1$ for all i (see Gittleman and Kot 1990 for an application of IN in a phylogenetic context). With this constraint, the numerator of IN can be seen as a covariance between the observed value at tip i (xi) and the average value of the other tips where each tip is weighted by how closely related (as measured by wij) it is to tip i (Cliff and Ord, 1981; see also File S3 where the equations of all statistics are detailed). The covariance is expected to be high if the value at tip i is close to the values at its most related tips. The index IW uses different ri values: $r_i = \sum_j w_{ij} / \sum_i \sum_j w_{ij}$. This weighting grants a higher importance to tips having many closely related tips. It was suggested (Thioulouse et al. 1995) as a way to unify several points of view on how autocorrelation (spatial autocorrelation in Thioulouse et al.'s paper, for us phylogenetic correlation) should be measured and analyzed, including Moran's (1948) index, Geary's (1954) index, the local variance (Lebart 1969), and the local principal component analysis (Le Foll 1982; see Thioulouse et al. 1995 for details). According to Pavoine et al. (2008, p. 83), Abouheif's (1999) Cmean test of phylogenetic signal turns out to be equal to IR(A). Given that for A= (aij), $\sum_j a_{ij} = 1$ for all i, the weights defined in IW are simply $r_i = 1/n$, so that Cmean=IR(A) =IW (A) =IN (A).
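The equality of the three generalized Moran statistics on matrix A can be illustrated numerically. The sketch below is an assumption-laden illustration rather than the authors' formula: it assumes the usual Moran normalization (cross-product of deviations divided by the sum of the entries of W and by a weighted variance), with deviations taken from the r-weighted mean, and it assumes weights proportional to row sums for IW. The matrix A is that of a hypothetical three-tip tree ((a,b),c).

```python
import numpy as np

def generalized_moran(x, W, r):
    """Assumed form: sum_ij w_ij z_i z_j / (sum_ij w_ij * sum_k r_k z_k^2)."""
    x, W, r = (np.asarray(v, dtype=float) for v in (x, W, r))
    z = x - np.sum(r * x)  # deviations from the r-weighted mean
    return float(z @ W @ z) / (W.sum() * float(np.sum(r * z ** 2)))

def moran_IR(x, W):
    """Uniform weights r_k = 1/n, W used as given."""
    n = len(x)
    return generalized_moran(x, W, np.full(n, 1.0 / n))

def moran_IN(x, W):
    """Uniform weights, W row-normalized so that each row sums to 1."""
    W = np.asarray(W, dtype=float)
    n = len(x)
    return generalized_moran(x, W / W.sum(axis=1, keepdims=True), np.full(n, 1.0 / n))

def moran_IW(x, W):
    """Weights proportional to row sums (assumed neighboring weights)."""
    W = np.asarray(W, dtype=float)
    return generalized_moran(x, W, W.sum(axis=1) / W.sum())

# Matrix A for the hypothetical tree ((a,b),c); every row sums to 1.
A = np.array([[0.25, 0.50, 0.25],
              [0.50, 0.25, 0.25],
              [0.25, 0.25, 0.50]])
x = np.array([1.0, 1.2, 3.0])

print(moran_IR(x, A), moran_IN(x, A), moran_IW(x, A))
```

Because the rows of A already sum to 1, row normalization leaves A unchanged and the row-sum weights reduce to 1/n, so the three functions return the same value. Row normalization is not meaningful for rows that sum to zero, which this sketch does not guard against.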
K and K* were defined by Blomberg et al. (2003). Consider the model x=μ1n+ɛ, where x is the vector of observed trait values at the tips of a phylogeny, μ is a mean (scalar), 1n is the unit vector of length n, and ɛ the vector of residuals. In ordinary least squares (OLS) the elements of ɛ are assumed to be independent and identically distributed according to a normal distribution of mean zero and variance σ2. The mean square error of the model in OLS is $\mathrm{MSE}^{*} = (x - \bar{x}1_n)^{\top}(x - \bar{x}1_n)/(n-1)$, where $\bar{x}$ is the arithmetic mean of the trait values. In generalized least squares (GLS), ɛ is assumed to follow a multivariate normal distribution of mean 0n (null vector of length n) and of covariance matrix σ2C (of size n × n). The mean square error of the model in GLS is $\mathrm{MSE} = (x - \hat{\mu}1_n)^{\top} C^{-1} (x - \hat{\mu}1_n)/(n-1)$, where $\hat{\mu} = (1_n^{\top} C^{-1} 1_n)^{-1}\, 1_n^{\top} C^{-1} x$. In K* (see Table 1 for the whole formula), MSE*/MSE is thus the ratio of the mean square error of the model in OLS to the mean square error of the model in GLS. K* divides this observed ratio by a denominator that is a scaling factor: it does not depend on the traits and is thus unchanged by permutation. It is the expected value the numerator would have if the traits were distributed according to a Brownian model of evolution.
K is similar to K* except that, in the OLS mean square error, the arithmetic mean $\bar{x}$ is replaced with $\hat{\mu}$, the GLS estimate of the mean defined above. The justification for this replacement was that $\hat{\mu}$ is the “phylogenetically correct mean” (Blomberg et al. 2003): the estimated value at the root node of the phylogenetic tree. Rohlf (2006) considered K* but not K and wrote of this component of K (Appendix in Rohlf 2006) that “it is somewhat inconsistent to use deviations from a GLS mean when computing a MSE representing the results of using OLS.” Even if K is more widely used in practice and readily available in statistical software (e.g., functions “Kcalc” and “phylosignal” in picante, Kembel et al. 2010, in R Development Core Team 2012), K* seems to be preferred in theoretical, statistical studies (e.g., Ives et al. 2007). On the contrary, Blomberg et al. (2003) wrote: “we only present results for K, which we feel has greater heuristic value.” Given that K does not have a strong theoretical justification, the evaluation of its performance in the permutation test, in comparison with K*, is critical to determine any recommendation about its future use.
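The difference between K and K* can be made concrete with a short sketch. It assumes the divisor n − 1 in both mean square errors and, for K, the scaling factor published by Blomberg et al. (2003), (tr(C) − n / sum of the entries of C−1) / (n − 1). For K*, only the observed ratio MSE*/MSE is computed here, since the scaling factor is invariant by permutation. Function names and the example data are hypothetical.

```python
import numpy as np

def gls_mean(x, C_inv):
    """Phylogenetically correct mean: (1' C^-1 1)^-1 1' C^-1 x."""
    one = np.ones(len(x))
    return float(one @ C_inv @ x) / float(one @ C_inv @ one)

def blomberg_K(x, C):
    """K: observed MSE0/MSE divided by its expectation under Brownian motion."""
    x = np.asarray(x, dtype=float)
    C = np.asarray(C, dtype=float)
    n = len(x)
    C_inv = np.linalg.inv(C)
    d = x - gls_mean(x, C_inv)          # deviations from the GLS mean
    mse0 = float(d @ d) / (n - 1)       # OLS-type error around the GLS mean
    mse = float(d @ C_inv @ d) / (n - 1)
    expected = (np.trace(C) - n / C_inv.sum()) / (n - 1)
    return (mse0 / mse) / expected

def kstar_observed_ratio(x, C):
    """Observed MSE*/MSE of K*, with the OLS error around the arithmetic mean."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    C_inv = np.linalg.inv(np.asarray(C, dtype=float))
    d_ols = x - x.mean()
    d_gls = x - gls_mean(x, C_inv)
    return (float(d_ols @ d_ols) / (n - 1)) / (float(d_gls @ C_inv @ d_gls) / (n - 1))

# Sanity check on a star phylogeny with unit branch lengths (C is the identity).
x = np.array([1.0, 2.0, 4.0, 7.0])
print(blomberg_K(x, np.eye(4)), kstar_observed_ratio(x, np.eye(4)))
```

Because the sum of squared deviations from the arithmetic mean does not change when trait values are shuffled among tips, a permutation test based on K* depends on the data only through the GLS mean square error.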
Here, we would like to introduce the new statistic KW, which could reconcile Blomberg et al.'s (2003) advice of using only the phylogenetically correct mean in a statistic for phylogenetic signal with the general idea of having a strong theoretical basis for any statistic. KW (see Table 1 for the formula) is similar to K except that the squared deviations from the phylogenetically correct mean are weighted by the phylogenetic weights ri used in IW; with the notation C−1= (θij), $r_i = \sum_j \theta_{ij} / \sum_i \sum_j \theta_{ij}$. With this definition, it can be shown that KW=λ/IW(C−1) (File S3), where λ is a scalar (i.e., real value) invariant by permutation. A permutation test based on KW is thus equivalent to a permutation test based on IW, which is rooted in a strong statistical framework that unifies several points of view on the measures and analyses of autocorrelation as indicated above.
Mantel's statistic and Moran's generalized statistics thus belong to the same statistical framework, the cross-product statistic, as Blomberg et al.'s statistics. Specifically, the observation that K*, K, and KW can be considered in the context of cross-product statistics and are closely related to Moran's generalized statistics is based on our demonstration that C−1 can be interpreted as a matrix of phylogenetic differences. The characterization of matrix C−1 was thus critical to unify all these indices. The Mantel test compares phylogenetic dissimilarities with trait-based dissimilarities and Moran's tests, when applied to phylogenetic similarities, compare phylogenetic similarities with trait-based similarities. In contrast, Blomberg et al.'s K, K* and the new KW statistics all mix phylogenetic differences in C−1 and trait-based similarities. However, as functions of the inverse of a cross-product, K, K*, and KW also increase with phylogenetic signal. If applied to C−1, any generalized Moran statistic (IR, IN, IW) decreases with phylogenetic signal but can still be used to test for phylogenetic signal provided one considers that low values of IR, IN, IW in that case correspond to high phylogenetic signal. To evaluate the performance of these statistics in testing for phylogenetic signal, we provide in the next sections a comparison between the generalized Moran statistics, K, K*, and KW statistics and between the definitions of phylogenetic (dis)similarity in terms of power and type I error of tests for phylogenetic signal.
Type I Error and Power
Recommendations that increase the power of the Mantel test, and a comparison between the power of the Mantel test and that of Blomberg et al.'s (2003) test, can be found in Hardy and Pavoine (2012). We focus here only on the different generalizations of the Moran test (IR, IN, IW), and the statistics K and K*. The performances of KW are equivalent to those of IW(C−1) and the performances of Abouheif's test are equivalent to those of IW(A) (knowing that IW(A) =IR(A) =IN(A); see section Unification: Widely Used Tests of Phylogenetic Signal are Based on the Cross-Product Statistic). We have simulated data to evaluate type I error and power (1 − type II error) associated with each coefficient of phylogenetic signal (IR, IN, IW, K, and K*) when permutation tests are used. The indices IR, IN, IW were applied with matrices A, C, C−1, first with the diagonal values given by their definitions and then with the diagonal values artificially set to zero, to evaluate the role of diagonal values in power and type I error (see also File S4). We have shown in the previous sections that the indices IR, IN, IW, K, and K* are all part of the same statistical framework, the cross-product statistic. The differences between these indices concern mostly the way the trait-based similarities among tips are computed (column Y in Table 1), but also some restrictions on the way phylogenetic (dis)similarities are computed (column W in Table 1) such as the row normalization in IN. The restriction on W in K and K* is stronger as they were developed only to be used with matrix C−1 of phylogenetic differences among tips. We summarize in Table 2 the statistics of phylogenetic signal compared.
Table 2. Statistics used in the simulations. Note that when the diagonal values of A are not set to zero, Abouheif's (1999) Cmean=IR(A) =IN(A) =IW(A), but when the diagonal values of A are set to zero, IR(A) ≠IN(A) ≠IW(A). In addition, when the diagonal values of C−1 are not set to zero, the statistical performances of KW and IW(C−1) are equal.
Statistics compared, which have an intrinsic definition of Y (see Table 1)
IR(A), IN(A), IW(A)
IR(C), IN(C), IW(C)
IR(C−1), IN(C−1), IW(C−1), K, K*
The coefficients and the randomizations were computed with functions “gearymoran” of package ade4 (Dray and Dufour 2007) of R and “Kcalc” of picante (Kembel et al. 2010) and with personal R scripts. Matrix A was computed with function “proxTips” of package adephylo of R (Jombart and Dray 2010), matrix C with function “vcv” of the package ape (Paradis et al. 2004), and the inverse of C was obtained with function “ginv” of package MASS (Venables and Ripley 2002) and checked for congruence with function “solve” of base R.
We simulated a series of trees as follows. We first simulated pure birth trees (with birth rate of 0.1 as in Harmon and Glor 2010) leading to relatively well-balanced trees (function “sim.bd.taxa” in the package TreeSim, Stadler 2011, of R, R Development Core Team 2012). Next, we analyzed the effect of the strength of covariance among tips by transforming the previous trees, first moving most speciation events back toward the root (low expected covariance among tips) and then, inversely, moving most speciation events forward toward the tips (high expected covariance among tips) using package geiger of R (function “deltaTree” with δ= 10 and 0.1, respectively, Harmon et al. 2009; Hardy and Pavoine 2012). We then obtained asymmetric trees by applying UPGMA on values randomly drawn from a log-normal distribution (Euclidean distance; mean and standard deviation of the distribution on the log scale equal 0 and 1, respectively). Finally, we also analyzed the power of the tests on nonultrametric trees (where the distance from tips to root is not a constant) by simulating trees where the topology is generated by splitting randomly the edges (function “rtree” of package ape of R, Paradis et al. 2004). We generated the branch lengths with a uniform distribution (bounded between 0 and 1), next with a log-normal distribution (mean and standard deviation of the distribution on the log scale equal 0 and 1, respectively). Nonultrametric trees can represent unequal evolutionary rates in different parts of the phylogeny. Power analyses were based on 1000 trees per model and type I error on 10,000 trees to have a better precision on the deviation from the nominal α= 5% level. We simulated trees with 2^3 = 8, 2^5 = 32, and 2^7 = 128 tips.
For power analyses, trait values were simulated per tree based on Brownian (BM) and Ornstein–Uhlenbeck (OU) models with σ2= 1, θ= 0, and α= 2, 4, 6, 8, or 10 (scaling the maximum distance from a tip to the root of the tree to unity, see Pavoine et al. 2008 for details). We analyzed type I error with four models: (1) trait values drawn from a normal distribution with mean θ= 0 and variance σ2= 1; (2) trait values drawn from a log-normal distribution with mean θ= 0 and variance σ2= 1 on the log scale; (3) for n values simulated, n−1 were drawn from the normal distribution and an extreme value was added as max(n−1 values) + range(n−1 values) × 10; (4) for n values simulated, n−1 were drawn from the normal distribution and an extreme value was added as max(n−1 values) + range(n−1 values) × 100.
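Two of the simulation ingredients above can be sketched in Python. This is an illustration under stated assumptions, not the scripts used in the study: Brownian trait values are drawn as a multivariate normal vector with covariance σ2C and root value 0 via a Cholesky factor, and "range" in the extreme-value models is read as maximum minus minimum. Function names and the example matrix are hypothetical; the OU model is not covered.

```python
import numpy as np

def simulate_bm(C, sigma2=1.0, seed=0):
    """Draw tip values under Brownian motion: x ~ N(0, sigma2 * C)."""
    rng = np.random.default_rng(seed)
    C = np.asarray(C, dtype=float)
    L = np.linalg.cholesky(sigma2 * C)  # L @ L.T equals sigma2 * C
    return L @ rng.standard_normal(C.shape[0])

def add_extreme(values, factor):
    """Append max + (max - min) * factor to the n-1 simulated values."""
    values = np.asarray(values, dtype=float)
    extreme = values.max() + (values.max() - values.min()) * factor
    return np.append(values, extreme)

# Hypothetical three-tip Brownian matrix.
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
print(simulate_bm(C))

# Type I error models (3) and (4): n-1 normal values plus one extreme value.
rng = np.random.default_rng(1)
base = rng.standard_normal(7)
print(add_extreme(base, 10))
print(add_extreme(base, 100))
```

The extreme value always lies above the other n − 1 values by 10 or 100 times their spread, which is what makes these two models a stress test of the permutation procedure.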
With the random normal and the log-normal distributions of trait values, all type I errors were correctly close to 5% (results of the type I and power analyses are detailed in File S5). The distribution of the 10,000 simulated P-values per model was always uniform (even) from 0 to 1. When the trees were ultrametric and relatively well balanced (i.e., here with the pure birth model), the distribution of P-values was still close to even from 0 to 1 when an extreme value was added to the trait dataset. When the trees were ultrametric but unbalanced and the number of tips was high (32 or 128) the type I error was inflated by an extreme trait value in which case the distribution of P-values was right-skewed with many low P-values and a few large P-values (typically when the diagonal values of the phylogenetic proximity/differences matrices were included) or of U-shape with high number of P-values close to 0 and 1 and low number of intermediate values (typically when the diagonal values of the phylogenetic proximity/differences matrices were not included). In contrast, when the number of tips was low (eight tips only), the type I error decreased below 5% leading to too conservative tests.
When the trees were nonultrametric and whatever the branch-length model, the type I error was correct for IR(C−1) with diagonal values and always near to correct (except with eight tips) for K*. It was inflated in all other cases except two: (1) with IR(C) with diagonal values the type I error was lower than 5% and the distribution of P-values was skewed to the left with 32 or 128 tips and correctly close to 5% with eight tips; (2) with IN(C−1) and the most extreme value we considered the type I error was lower than 5% and the distribution of P-values was skewed to the left.
Powers of tests increased with the number of tips in the phylogenetic trees as previously shown for instance by Pavoine et al. (2008) and Hardy and Pavoine (2012). We present the results of the power analyses for 128 tips in Figure 4. Results for eight and 32 tips can be found in File S5. Whatever the number of tips, we obtained the following main results (Fig. 4):
Result 1: Power is impacted by the shape of the phylogenetic tree, the matrix of phylogenetic proximity/differences and the way trait-based proximities are computed in the different statistics based on Moran (1948) and Blomberg et al. (2003) and summarized in Table 1.
Result 2: When A is used to describe phylogenetic proximities, considering the positive diagonal values of matrix A as defined in Pavoine et al. (2008) instead of arbitrarily setting them equal to zeros can increase the power of the generalized Moran tests.
Result 3: When C is used to describe phylogenetic proximities, the row normalization used in index IN decreases the power of the test for phylogenetic signal.
Result 4: When C−1 is used to describe phylogenetic differences, the differences in power among the generalized Moran tests are strongly dependent on whether the phylogenetic tree is ultrametric. With ultrametric trees, indices IR and IN used with zero values on the diagonal of C−1 decrease power; the index IW slightly decreases power in comparison with K, K*, and IR with positive diagonal values for C−1. With nonultrametric trees, K* and IR with positive diagonal values for C−1 have the highest powers.
Result 5: The highest powers are associated with coefficients that use C−1.
Result 6: The use of C to describe phylogenetic proximities generally decreases power in comparison with A and C−1.
Because Pavoine et al. (2008) suggested that contrasting values in phylogenetic (dis)similarity matrices could increase the power of the tests, we calculated the coefficient of variation (CV) of matrices A, C, and C−1. Detailed results are given in File S6. On average over all simulated trees, the CV increased from C (1.645 for the off-diagonal values and 0.145 for the diagonal values), through A (3.152 for the off-diagonal values and 1.637 for the diagonal values), to C−1 (12.304 for the off-diagonal values, in absolute value, and 1.342 for the diagonal values). The CV increased with the number of tips for A, C−1, and the off-diagonal values of C but not for the diagonal values of C.
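The coefficient of variation of a (dis)similarity matrix can be computed separately for diagonal and off-diagonal values, as sketched below. The sketch assumes the sample standard deviation (divisor m − 1) divided by the mean, and reports the absolute value so that matrices with negative off-diagonal values, such as C−1, can be handled; these choices and the function name are assumptions, and the example matrix is hypothetical.

```python
import numpy as np

def matrix_cv(M):
    """Return (CV of off-diagonal values, CV of diagonal values), in absolute value."""
    M = np.asarray(M, dtype=float)
    off = M[~np.eye(M.shape[0], dtype=bool)]  # all off-diagonal entries
    diag = np.diag(M)
    cv = lambda v: abs(np.std(v, ddof=1) / np.mean(v))
    return cv(off), cv(diag)

# Hypothetical three-tip Brownian matrix and its inverse.
C = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 2.0]])
print(matrix_cv(C))
print(matrix_cv(np.linalg.inv(C)))
```

On an ultrametric tree such as this one, the diagonal of C is constant and its CV is zero, whereas the diagonal of the inverse varies among tips.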
In this article, we showed that all the tests for phylogenetic signal we considered (Mantel, Moran, Abouheif, and Blomberg et al.) are connected to each other because they are all related to a cross-product statistic. This means that the observed differences in their statistical performances cannot depend on the tests themselves; the only relevant change among the different approaches is the way the pairwise phylogenetic and trait (dis)similarities are computed. A posteriori, this seemingly counterintuitive result is not completely surprising. Indeed, all tests for phylogenetic signal are aimed at comparing two datasets that are usually produced in two different formats: a phylogenetic tree and a vector (or a matrix) of traits. Accordingly, all these methods explicitly or implicitly convert trait values and phylogeny into pairwise (dis)similarities to calculate the test statistic, which shifts the attention from the mathematical formulation of the tests to how the (dis)similarity matrices are obtained. This observation has a number of valuable consequences.
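The shared structure of these tests can be illustrated with a short Python sketch of a permutation test built on a cross-product statistic, that is, the sum over all pairs of tips of wij × yij. Several choices here are assumptions made for illustration only: the function names, the trait similarity yij = (xi − mean)(xj − mean), the one-sided upper-tail test (which presumes that both W and Y measure similarities), and the toy proximity matrix. None of this is meant to reproduce any particular index exactly.

```python
import random

def trait_cross_terms(x):
    """Pairwise trait similarities y_ij = (x_i - mean)(x_j - mean) (assumed form)."""
    m = sum(x) / len(x)
    return [[(xi - m) * (xj - m) for xj in x] for xi in x]

def cross_product(w, y):
    """Cross-product statistic: sum over all i, j of w_ij * y_ij."""
    n = len(w)
    return sum(w[i][j] * y[i][j] for i in range(n) for j in range(n))

def permutation_test(w, x, n_perm=999, seed=0):
    """One-sided permutation test in which trait values are shuffled among tips."""
    rng = random.Random(seed)
    observed = cross_product(w, trait_cross_terms(x))
    count = 0
    for _ in range(n_perm):
        perm = x[:]
        rng.shuffle(perm)
        if cross_product(w, trait_cross_terms(perm)) >= observed:
            count += 1
    # The observed arrangement is counted among the permutations.
    return observed, (count + 1) / (n_perm + 1)

# Hypothetical proximities for four tips forming two sister pairs.
W = [[0, 1, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 1],
     [0, 0, 1, 0]]
traits = [1.0, 1.2, 5.0, 5.3]
observed, p_value = permutation_test(W, traits)
```

Replacing `W` or `trait_cross_terms` by another definition of phylogenetic or trait (dis)similarity changes the index being tested, while the permutation scheme itself stays the same.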
Here we analyzed three matrices of phylogenetic (dis)similarities. We identified C−1 as a matrix of phylogenetic differences among tips, where the phylogenetic difference depends on volumes of subtrees degraded by the loss of one or two tips in a phylogenetic tree. This definition of C−1 allowed us to unify Mantel's and Moran's statistics with the Blomberg et al. statistics and thus to compare their performance within the single statistical framework of the cross-product. Power analyses for phylogenetic signal tests were performed in Pavoine et al. (2008) and Hardy and Pavoine (2012). In Pavoine et al. (2008), only IW was analyzed, and it was shown that the choice of the phylogenetic similarity matrix may have a strong effect in Moran's permutation test. In Hardy and Pavoine (2012), the focus was put on the Mantel test and K, and it was shown that the power of the Mantel test might in certain circumstances exceed the power associated with K. Compared to these previous studies, here we analyzed several generalizations of Moran's index, K, K*, and KW, which altogether encompassed a range of different ways of measuring phylogenetic (dis)similarities and trait-based (dis)similarities among the tips of a phylogenetic tree. We also analyzed the impact of the positive diagonal of matrices A, C, and C−1 of phylogenetic (dis)similarities among tips on the performance of the tests.
Harmon and Glor (2010) stated that converting raw data into matrices of pairwise differences among species is an inefficient process that reduces the power of tests. They also stated that, unlike the Mantel test, data are not converted into pairwise distances to calculate K. To the contrary, although this is not immediately evident, we have seen that data are actually converted into (negative) measures of phylogenetic differences between tips. In fact, Blomberg et al.'s K uses a particular phylogenetic matrix of differences among tips that are positive on the diagonal and negative elsewhere. In our simulations, this matrix C−1 of phylogenetic differences was often associated with the highest power of tests for phylogenetic signal. When the trees were ultrametric, K and IW(C−1) gave close but slightly lower powers than K* and IR(C−1). Pavoine et al. (2008) found that IW(A) had a higher power than IW(C) whatever the tree shape, except for comb-like trees, with which IW(C) had approximately the same but slightly higher power than IW(A). A similar result was obtained here when speciation events were driven toward the tips of the tree (see also Hardy and Pavoine 2012 for a similar result with the Mantel test). However, even in that case, our simulations showed that K* and IR(C−1) had higher power than all statistics applied to matrices A and C. When the trees were not ultrametric, K* and IR(C−1) clearly reached much higher powers and correct type I errors even in the presence of outliers in trait values. IR(C−1) is very closely related to Mantel tests, which confirms that the power of the Mantel test might be high with appropriate definitions of phylogenetic and trait differences among tips (Hardy and Pavoine 2012). K* differs from the scheme of the Mantel test by the use of both a phylogenetically weighted mean for the tips’ trait values and an unweighted mean. Despite these differences, we obtained similar results in both type I error and power analyses for K* and IR(C−1).
More generally, in our simulations, the diagonal values of phylogenetic proximity/difference matrices tended to increase the power of the tests. Contrary to what is done in spatial analysis, we thus recommend their use in phylogenetic analysis. Finally, despite not being based on branch lengths, matrix A provided surprisingly high powers, as already observed by Pavoine et al. (2008). Matrix A with positive diagonal values satisfies IW(A) = IR(A) = IN(A). Using A with any generalized I index is thus particularly recommended in all situations where branch lengths are unavailable.
In addition to power analyses, we have provided a thorough analysis of type I error. The type I error of tests for phylogenetic signal has generally been studied with normal traits, and the sensitivity to asymmetric distributions and to outliers has been neglected. Asymmetric distributions of trait values might result from independent evolution along a phylogenetic tree where a tip is greatly separated from the other tips. However, real data might also contain outliers or might have intrinsic asymmetric distributions even if this asymmetry is not the result of evolution along a long branch in a phylogenetic tree. Our simulations show that most generalized Moran tests are impacted by outliers but not by asymmetric distributions (here log-normal distributions). Although K* and IR(C−1) were robust to outliers with nonultrametric phylogenetic trees, they were associated with inflated type I error in the presence of outliers, as were all other statistics, with ultrametric phylogenetic trees. We thus advocate that transformations be used on data to limit the impact of outliers in all tests for phylogenetic signal. For instance, body weight, one of the most studied traits, can be transformed by cube root or logarithm to reduce the effect of potential outliers.
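The effect of such transformations can be illustrated with a small Python sketch. The body mass values are hypothetical, and the ratio of the largest value to the median is only a crude indicator, chosen here for illustration, of how far an extreme value lies from the bulk of the data.

```python
import math
from statistics import median

def relative_extreme(values):
    """Ratio of the largest value to the median (crude indicator of outlier leverage)."""
    return max(values) / median(values)

# Hypothetical body masses in grams, with one extreme value.
body_mass_g = [12.0, 15.0, 18.0, 22.0, 30.0, 4000.0]
log_mass = [math.log(v) for v in body_mass_g]
cbrt_mass = [v ** (1.0 / 3.0) for v in body_mass_g]

raw_ratio = relative_extreme(body_mass_g)
log_ratio = relative_extreme(log_mass)
cbrt_ratio = relative_extreme(cbrt_mass)
```

With these values, the extreme observation is pulled much closer to the rest of the data by both transformations, and more strongly by the logarithm than by the cube root.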
Given the lack of theoretical justification for K and its lower power and less stable type I error compared to K*, we suggest abandoning the K statistic. Compared to KW, K* was found to be more powerful with correct type I error in our simulations and should thus be preferred when used in permutation tests, at least in the situations covered by our simulations. Overall, based on our results, we recommend the use of K* or IR(C−1) (with positive diagonal values for C−1) to test for phylogenetic signal whatever the tree shape when branch lengths are available and accurate. Nevertheless, although branch lengths were assumed to be known in our simulations, in many real datasets they are expected to be estimated. Incorrect branch length estimates could decrease the power of K* and IR(C−1) to detect phylogenetic signal. To avoid the unknown cost associated with assuming that the branch lengths are known, Abouheif's Cmean statistic can be recommended, as it provided nearly as high power as K* and IR(C−1) whatever the tree shape. It should be recalled that all statistics assume that the topology is known and accurate.
To explain differences in power of the statistics of phylogenetic signal when different matrices of phylogenetic (dis)similarities were used, Pavoine et al. (2008) suggested that the high power associated with A could be due to more contrasting values in A than in C. To demonstrate that, they calculated the average CV of the off-diagonal values of A and C, and obtained a higher CV with A. With our data, the CV of the three matrices of phylogenetic (dis)similarity increased from C, through A, to C−1. We also found that in the only situation where higher power is associated with C than with A (when most speciation events are moved back near the root of the phylogenetic tree), the CV of C is higher than that of A. The suggestion of Pavoine et al. (2008) is thus supported by our article. Pavoine et al. (2008) also suggested that the higher power associated with A could be due to the fact that A never considers that tips connected only at the root node of the tree are not related. However, this hypothesis is refuted by our present article because, although C−1 considers that tips connected only at the root node of the tree are not related, it is associated with the highest power.
Future research now needs to take account, in all these coefficients of phylogenetic signal (Mantel r, generalized Moran IR, IN, and IW, Blomberg et al. K, K*, and the new KW), of (1) within-species variations or measurement errors; (2) groups of traits; (3) traits of different statistical types. Within-species variations and measurement errors have already been considered for K* (Ives et al. 2007). This consideration increased the power of the permutation test based on K* (Hardy and Pavoine 2012). In addition, when approaches that incorporate measurement errors were used, phylogenetic signal could be detected even with small phylogenies with a low number of tips (Zheng et al. 2009). Adapting these developments for the other indices is needed to complete the evaluation of their performance. Regarding the use of groups of traits, tests of phylogenetic signal based on combined traits have been developed by Zheng et al. (2009). Jombart et al. (2010) used IN with several traits at the same time to describe lineage-dependent phylogenetic signals in combinations of traits. These developments are particularly needed in ecology, as many ecological processes involve a combination of traits rather than a single trait. From a biological viewpoint, the ecological behavior of a species, for example, is expected to be driven by complex interactions among functional traits that are not fully independent from each other (see for instance Milla et al. 2009). Therefore, in some cases the researcher may be more interested in testing for phylogenetic signal in a combination of traits rather than in a single trait only, and this is easily done if single-trait differences between species are combined into a multivariate pairwise dissimilarity matrix (Pavoine et al. 2009; Ricotta and Moretti 2010).
Regarding the statistical type of traits, while tests of phylogenetic signal are easily performed on quantitative variables or on ordinal variables transformed to ranks, it is unclear how to deal with nominal traits. The usual approach consists in coding the information in nominal variables by as many independent binary variables as the number of categories. Then, each binary variable is tested separately for phylogenetic signal. However, the categories of a nominal variable may often be nonexclusive. That is, a species may be characterized by the simultaneous presence of two or more character states. These nonexclusive states may be binary coded as is done for the exclusive states, or they may be fuzzy coded (Chevenet et al. 1994). In this case, each character state receives a positive score in the range of 0–1 that describes the affinity of a species for that state. For example, diet habits in animals or Grime's (1979) CSR (Competitor-Stress tolerator-Ruderal) strategies in plants are typically coded as fuzzy variables. Nonetheless, irrespective of how nonexclusive nominal variables are coded, a multivariate measure of dissimilarity or disagreement between pairs of species seems the most straightforward solution for summarizing interspecies functional differences (Podani and Schmera 2007). The cross-product can thus be adapted easily to deal with any type and number of traits.
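As an illustration of how a fuzzy-coded nominal trait can be turned into a pairwise dissimilarity matrix, consider the following Python sketch. The species names, the diet categories, and the affinity scores are hypothetical. Rescaling the scores so that they sum to 1 for each species and using half the Manhattan distance between profiles are choices made here for illustration only; they are assumptions rather than a recommendation of a specific coefficient.

```python
def fuzzy_code(affinities):
    """Rescale nonnegative affinity scores so that they sum to 1 across states."""
    total = sum(affinities)
    return [a / total for a in affinities]

def fuzzy_dissimilarity(p, q):
    """Half the Manhattan distance between two profiles; ranges from 0 to 1."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical diet affinities (seeds, insects, fruit) for three species.
raw = {"sp1": [3, 1, 0], "sp2": [2, 2, 0], "sp3": [0, 0, 4]}
coded = {name: fuzzy_code(scores) for name, scores in raw.items()}
names = sorted(coded)

# Matrix of pairwise trait dissimilarities y_ij among species.
Y = [[fuzzy_dissimilarity(coded[a], coded[b]) for b in names] for a in names]
```

The resulting matrix `Y` plays the role of the trait-based dissimilarities yij and could be combined with any matrix of phylogenetic (dis)similarities in a cross-product statistic.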
We analyzed here a limited number of indices that can be derived from the cross-product statistic. Overall, the power of the test is influenced by the choice of the phylogenetic and functional interspecies (dis)similarities wij and yij. Although the choice of an appropriate functional (dis)similarity depends on the number and type (i.e., quantitative, ordinal, nominal, etc.) of the selected traits (Legendre and Legendre 1998; Pavoine et al. 2009), the choice of an appropriate phylogenetic (dis)similarity matrix primarily depends on whether branch lengths are known and whether a particular model of evolution is assumed (e.g., Brownian model). In this view, the general formulation of equation (1), which is at the core of the tests for phylogenetic signal analyzed in this article, offers very high flexibility in the calculation of the phylogenetic distance matrix W=[wij], depending on the available phylogeny and on the problem under scrutiny, that is to say, the reasons why the test of phylogenetic signal is performed. In Abouheif's test, the matrix of phylogenetic proximity is obtained solely from the tree topology, assuming all branch lengths to be equal. However, as stressed by Crozier (1997, p. 243): “[Phylogenetic] measures using branch-lengths are better than procedures relying solely on topology.” Despite that, we showed that matrix A maintains high power of tests for phylogenetic signal, especially when it is used with its positive diagonal, and matrix A was often found to be highly associated with matrix C−1. These two matrices have the particularity of being relative measures of phylogenetic proximities/differences, as the value of proximity/difference between two tips depends not only on the shared history of the taxa at these tips but also on the level of shared history with the other tips of the tree. Such relative, integrative measures of phylogenetic proximities/differences appear to enhance the power of tests.
To conclude, methods for testing for phylogenetic signal in functional traits should, where possible, be designed to deal with the particular question asked. The cross-product statistic offers a common statistical framework for widely used tests for phylogenetic signal. It also brings large flexibility to well-known tests for phylogenetic signal, which provides the opportunity to adapt tests to the taxa at hand, to the quality of the phylogeny and trait measurements, and to the particular question asked. It thereby opens new directions for analyzing multivariate phylogenetic signals in biological traits while ensuring correct type I error and high power.
Associate Editor: E. Abouheif
The authors would like to thank the editor and anonymous referees for their useful comments on our article.