1. Tests of significance of the individual canonical axes in redundancy analysis allow researchers to determine which of the axes represent variation that can be distinguished from random. Variation along the significant axes can be mapped, used to draw biplots or interpreted through subsequent analyses, whilst the nonsignificant axes may be dropped from further consideration.
2. Three methods have been implemented in computer programs to test the significance of the canonical axes; they are compared in this paper. The simultaneous test of all individual canonical axes, which is appealing because of its simplicity, produced incorrect (highly inflated) levels of type I error for the axes following those corresponding to true relationships in the data, so it is invalid. The ‘marginal’ testing method implemented in the ‘vegan’ R package and the ‘forward’ testing method implemented in the program CANOCO were found to have correct levels of type I error and comparable power. Permutation of the residuals achieved greater power than permutation of the raw data.
3. R functions found in a Supplement to this paper provide the first formal description of the ‘marginal’ and ‘forward’ testing methods.
Redundancy analysis (RDA, Rao 1964) and canonical correspondence analysis (CCA, ter Braak 1986, 1987) are two forms of asymmetric canonical analysis widely used by ecologists and palaeoecologists. ‘Asymmetric’ means that the two data matrices used in the analysis do not play the same role: there is a matrix of response variables, denoted Y, which often contains community composition data, and a matrix of explanatory variables (e.g. environmental), denoted X, which is used to explain the variation in Y, as in regression analysis. Contrast this with canonical correlation and co-inertia analyses where the two matrices play the same role in the analysis and can be interchanged; see, however, Tso (1981) for an asymmetric interpretation of canonical correlation analysis. RDA and CCA produce ordinations of Y constrained by X.
This paper deals with methods to test the significance of the canonical axes that emerge from this type of analysis. The canonical axes are those that are formed by linear combinations of the predictor variables; they are sometimes referred to as ‘constrained axes’. The section ‘Background: the algebra of redundancy analysis’ will show how they are computed. Individual canonical axes may be tested when the overall relationship (R2) between Y and X has been shown to be significant. We will concentrate on RDA; our conjecture is that the conclusions derived from our simulations should apply to CCA as well.
As we deal with complex, multivariate data influenced by many factors, it is to be expected that several independent structures coexist in the response data. If these structures are linearly independent, they should appear on different canonical axes. Each one should be identifiable by a test of significance. Canonical axes that explain no more variation than random should also be detected; they do not need to be further considered in the interpretation of the results.
It is not always necessary to test the significance of the canonical axes when there are only a few. However, researchers often face situations where there is a large number of canonical axes; they may want to know how many canonical axes should be examined, plotted, and interpreted. Spatial modelling is a good example of these situations: when analysing the spatial variation of species-rich communities (hundreds of species, hundreds of sites) by spatial eigenfunctions, we may end up with hundreds of canonical axes. Different types of spatial eigenfunctions have been described in recent years to model the spatial structure of multivariate response data: Griffith’s (2000) spatial eigenfunctions from a connection matrix of neighbouring regions or sites; Moran’s eigenvector maps (MEM, Borcard & Legendre 2002; Borcard et al. 2004; Dray, Legendre, & Peres-Neto 2006); asymmetric eigenvector maps (AEM, Blanchet, Legendre, & Borcard 2008). It is then of interest to determine which of the canonical axes derived from these eigenfunctions represent variation that is more structured than random. This is the objective of the tests of significance of the canonical axes. Variation along the (hopefully few) significant axes can be mapped, used to draw biplots or interpreted through subsequent analyses. The nonsignificant axes can be dropped because they do not represent variation more structured than random. The eigenvalue associated with each axis expresses the variance accounted for by the canonical axis; the fraction of the response variation that each axis represents is a useful supplementary criterion: axes that account for less than, say, 1% or 5% of the variation may not need to be further analysed even if they are statistically significant.
Three methods have been proposed to test the significance of canonical axes in RDA. We will compare them using simulated data to determine which ones, if any, have correct levels of type I error (defined in the section ‘Simulation method’). We will also compare these methods and two permutation procedures in terms of power. This paper provides the first formal description of these methods.
Background: the algebra of redundancy analysis
Redundancy analysis (RDA, Rao 1964) of a response matrix Y (with n objects and p variables) by an explanatory matrix X (with n objects and m variables) consists of two steps (Legendre & Legendre 1998, section 11·1). In the algebraic description that follows, the columns of matrices Y and X have been centred to have means of 0.
• Step 1 is a multivariate regression of Y on X, which produces a matrix of fitted values $\hat{\mathbf{Y}}$ through the linear equation:
$$\hat{\mathbf{Y}} = \mathbf{X}\,[\mathbf{X}'\mathbf{X}]^{-1}\,\mathbf{X}'\mathbf{Y} \qquad \text{(eqn 1)}$$
This is equivalent to a series of multiple linear regressions of the individual variables of Y on X, calculation of the vectors of fitted values and binding these column vectors to form matrix $\hat{\mathbf{Y}}$.
• Step 2 is a principal component analysis of $\hat{\mathbf{Y}}$. This PCA produces the canonical eigenvalues and eigenvectors as well as the canonical axes (object ordination scores). This step is performed to obtain reduced-space ordination diagrams displaying the objects, response variables, and explanatory variables for the most important axes of the canonical relationship.
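The two steps above can be sketched in a few lines of Python with NumPy. This is an illustration only, not the code of any of the programs discussed in this paper; the function name `rda_sketch`, its return values and the tolerance `tol` are hypothetical. The eigenvalues are expressed here in sum-of-squares units, so that they add up to the sum-of-squares of the fitted values; dividing them by (n − 1) would give variances.

```python
import numpy as np

def rda_sketch(Y, X, tol=1e-10):
    """Simple RDA: regression of Y on X, then PCA of the fitted values."""
    Y = np.asarray(Y, dtype=float)
    X = np.asarray(X, dtype=float)
    Y = Y - Y.mean(axis=0)                 # centre the columns of Y
    X = X - X.mean(axis=0)                 # centre the columns of X
    # Step 1: fitted values of the multivariate regression (least squares)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B
    # Step 2: PCA of the fitted values (eigen-decomposition of Y_hat'Y_hat)
    eigval, eigvec = np.linalg.eigh(Y_hat.T @ Y_hat)
    order = np.argsort(eigval)[::-1]       # decreasing eigenvalues
    eigval, eigvec = eigval[order], eigvec[:, order]
    keep = eigval > tol * eigval[0]        # drop numerically null axes
    eigval, eigvec = eigval[keep], eigvec[:, keep]
    axes = Y_hat @ eigvec                  # object scores, linear combinations of X
    return eigval, eigvec, axes, Y_hat
```

With full-rank random matrices of 20 objects, five response variables and three explanatory variables, this sketch returns three canonical axes, i.e. min(p, m).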
Like the fitted values of a multiple linear regression, the canonical axes (object ordination scores) are also linear combinations of the explanatory variables in X. These linear combinations are the defining properties of canonical axes in the presentation of RDA by ter Braak & Prentice (1988) and ter Braak (1995). The present paper focuses on the problem of determining which of the canonical axes are important enough to warrant consideration, plotting and detailed analysis.
Partial RDA
The statistical test for identifying the significant axes requires the notion of a partial RDA, that is, RDA with additional explanatory variables, called covariables, assembled in matrix W. In partial RDA, the linear effects of the explanatory variables in X on the response variables in Y are adjusted for the effects of the covariables in W.
In multiple regression, we know that the partial regression of y by X in the presence of covariables W can be computed in two different ways (Legendre & Legendre 1998, section 10·3·5). One first computes the residuals of y on W (noted yres|W) and the residuals of X on W (Xres|W). Then, one can either regress yres|W on Xres|W, or regress y on Xres|W. The same partial regression coefficients and vector of fitted values are obtained in both cases. The R2 of the first analysis is the partial R2 whereas that of the second analysis is the semipartial R2.
The same two approaches can be used for partial RDA. First, one computes the residuals of Y on W (noted Yres|W) and the residuals of X on W (Xres|W). Then, one can compute either the RDA of Yres|W by Xres|W or the RDA of Y by Xres|W. The two approaches produce the same canonical eigenvalues, eigenvectors and axes and can be used to test the significance of the canonical axes. In partial RDA, the canonical axes are linear combinations of the adjusted X variables, Xres|W, and are orthogonal to the covariables in W. The R2 obtained in the first approach is the partial canonical R2 whereas that of the second analysis is the semipartial canonical R2; these two statistics are described in eqns 9 and 10 below.
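The equivalence of the two approaches can be checked numerically. The following Python/NumPy sketch uses randomly generated matrices; the helper names (`centre`, `fitted`, `canonical_eigenvalues`) and the simulated data are assumptions made for the illustration, and the eigenvalues are in sum-of-squares units.

```python
import numpy as np

def centre(A):
    """Centre the columns of A on their means."""
    return A - A.mean(axis=0)

def fitted(Y, X):
    """Fitted values of the multivariate regression of Y on X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return X @ B

def canonical_eigenvalues(Y, X, k):
    """First k canonical eigenvalues of the RDA of Y by X."""
    Y_hat = fitted(Y, X)
    return np.linalg.eigvalsh(Y_hat.T @ Y_hat)[::-1][:k]

rng = np.random.default_rng(42)
n = 30
W = centre(rng.normal(size=(n, 2)))                    # covariables
X = centre(rng.normal(size=(n, 3)) + W[:, [0]])        # X correlated with W
Y = centre(X @ rng.normal(size=(3, 5))
           + W @ rng.normal(size=(2, 5))
           + rng.normal(size=(n, 5)))

X_res = X - fitted(X, W)                               # Xres|W
Y_res = Y - fitted(Y, W)                               # Yres|W

eig_first = canonical_eigenvalues(Y_res, X_res, k=3)   # RDA of Yres|W by Xres|W
eig_second = canonical_eigenvalues(Y, X_res, k=3)      # RDA of Y by Xres|W
print(np.allclose(eig_first, eig_second))
```

In both approaches the fitted values, and hence the canonical axes, are orthogonal to the columns of W.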
Statistics, simple RDA
The first step of RDA already leads to informative statistics. With matrices Y and $\hat{\mathbf{Y}}$, one can compute the following statistics:
• The canonical R2, which Miller & Farr (1971) called the bimultivariate redundancy statistic, measures the strength of the linear relationship between Y and X:
$$R^2_{Y|X} = \frac{SS(\hat{\mathbf{Y}})}{SS(\mathbf{Y})} \qquad \text{(eqn 2)}$$
where $SS(\hat{\mathbf{Y}})$ is the total sum-of-squares (or sum of squared deviations from the means) of $\hat{\mathbf{Y}}$ and SS(Y) is the total sum-of-squares of Y. Miller & Farr (1971) derived this equation as follows. They considered the case where the p variables of Y are standardized, forming matrix Ystand. They computed a principal component analysis (PCA) of Ystand, then a multiple regression of each principal component j on the explanatory matrix X. This multiple regression is noted PCj|X and its R2 is $R^2_{PC_j|X}$. Miller and Farr established that
$$R^2_{Y_{\text{stand}}|X} = \sum_{j=1}^{p} \frac{\lambda_j}{p}\, R^2_{PC_j|X} \qquad \text{(eqn 3)}$$
The PCj are the principal components of Ystand and the λj are the corresponding eigenvalues. The sum of the eigenvalues is p, the number of variables in Ystand, because the variables in Ystand have been standardized. So $R^2_{Y_{\text{stand}}|X}$, where Ystand|X denotes the multivariate regression of Ystand on X, is a weighted mean of the coefficients of determination of the PCj regressed on matrix X, the weights being given by the proportion of the variance of Ystand occupied by each principal component (i.e. the eigenvalues divided by p). The same value of $R^2_{Y_{\text{stand}}|X}$ would be obtained by calculating the mean of the coefficients of determination of the standardized Ystand variables regressed one by one on X:
$$R^2_{Y_{\text{stand}}|X} = \frac{1}{p} \sum_{j=1}^{p} R^2_{y_j|X} \qquad \text{(eqn 4)}$$
For the general case, where Y is not standardized, $R^2_{Y|X}$ is computed using eqn 2.
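The three ways of obtaining the canonical R2 of a standardized response matrix can be compared numerically. The Python/NumPy sketch below uses simulated data; the variable names and the data-generating model are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 30, 4, 3
X = rng.normal(size=(n, m))
Y = X @ rng.normal(size=(m, p)) + rng.normal(size=(n, p))

def r2(Y, X):
    """Canonical R2: SS of the fitted values divided by SS(Y), after centring."""
    Y = Y - Y.mean(axis=0)
    X = X - X.mean(axis=0)
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return np.sum((X @ B) ** 2) / np.sum(Y ** 2)

# Standardize the response variables (unit variance, n - 1 denominator)
Ystand = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

# PCA of Ystand: the eigenvalues sum to p
lam, U = np.linalg.eigh(np.cov(Ystand, rowvar=False))
PC = Ystand @ U

# (a) direct ratio of sums-of-squares
r2_direct = r2(Ystand, X)
# (b) weighted mean of the R2 of the principal components, weights lambda_j / p
r2_pc = sum(lam[j] / p * r2(PC[:, [j]], X) for j in range(p))
# (c) plain mean of the R2 of the standardized variables taken one by one
r2_mean = np.mean([r2(Ystand[:, [j]], X) for j in range(p)])

print(r2_direct, r2_pc, r2_mean)
```

The three printed values agree to within rounding error.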
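• The adjusted R2 is computed using the classical Ezekiel (1930) formula:
$$R^2_{\text{adj}} = 1 - \frac{n - 1}{n - m - 1}\,\left(1 - R^2_{Y|X}\right) \qquad \text{(eqn 5)}$$
where m is the number of explanatory variables in X or, more precisely, the rank of the variance-covariance matrix of X.
• The F-statistic for the overall test of significance is constructed as follows (Miller 1975):
$$F = \frac{R^2_{Y|X}\,/\,(mp)}{\left(1 - R^2_{Y|X}\right)/\left(p\,(n - m - 1)\right)} \qquad \text{(eqn 6)}$$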
This statistic is used to perform the overall test of significance of the canonical relationship. The null hypothesis of the test is H0: the strength of the linear relationship, measured by the canonical R2, is not larger than that which would be obtained for unrelated Y and X matrices of the same sizes [Note: in the absence of relationship, the expected value of R2 is not 0 but m/(n–1)].
When the variables of Y are standardized, the F-statistic (eqn 6) can be tested for significance using the Fisher–Snedecor F-distribution with d.f.1 = mp and d.f.2 = p(n − m − 1); p is the number of response variables in Y; m parameters were estimated for each of the p multiple regressions used to compute the vectors of fitted values forming the p columns of $\hat{\mathbf{Y}}$; hence, a total of mp parameters were estimated. This is why there are mp degrees of freedom attached to the numerator of F (d.f.1). Each multiple regression equation has degrees of freedom equal to (n − m − 1), so the number of degrees of freedom of the denominator, d.f.2, is p times (n − m − 1). Miller (1975) conducted numerical simulations in the multivariate normal case, with combinations of m and p from 2 to 15 and sample sizes of 30 to 160. He showed that eqn 6 produced distributions of F values that were very close to theoretical F-distributions with the same numbers of degrees of freedom. Additional simulations that we conducted (reported in Appendix S1) confirmed that this parametric test of significance had correct levels of type I error when Y was standardized. This was not the case, however, for nonstandardized response variables. In our simulations, the columns of Y were random numbers drawn from linearly independent statistical populations. Further simulations should be performed to check the validity of Miller’s parametric test when there are correlations among the columns of Y.
In many analyses, the response variables should not be standardized prior to RDA. With community composition data in ecology (species abundance data), for instance, the variances of the species should be preserved in most analyses because abundant and rare species do not play the same roles in ecosystems. Our simulation results (Appendix S1) show that parametric tests should not be used when the Y variables have unequal variances, especially when the error is not normal. Permutation tests always had correct levels of type I error in our results. For permutation tests, one can simplify the equation of the F-statistic and eliminate the constant p:
$$F = \frac{R^2_{Y|X}\,/\,m}{\left(1 - R^2_{Y|X}\right)/(n - m - 1)} \qquad \text{(eqn 7)}$$
This simplification does not change the value of F. Equation 7 is the one used in programs of canonical analysis, such as canoco and vegan’s rda(), designed to analyse users’ empirical data. The F-statistic is tested by permutation in these programs.
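As a small numerical illustration, the adjusted R2 and the two forms of the overall F-statistic can be computed as follows in Python. The function names and the numerical values are hypothetical; they only serve to show that the constant p cancels out and that the adjusted R2 is zero when R2 equals its expected value under the null hypothesis, m/(n − 1).

```python
def adjusted_r2(r2, n, m):
    """Ezekiel (1930) adjustment; n objects, m = rank of X."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - m - 1)

def f_overall(r2, n, m, p=None):
    """Overall F-statistic of the canonical relationship.

    With p given, the form with d.f. (mp, p(n - m - 1)) is used;
    without p, the simplified form with d.f. (m, n - m - 1).
    """
    if p is None:
        return (r2 / m) / ((1.0 - r2) / (n - m - 1))
    return (r2 / (m * p)) / ((1.0 - r2) / (p * (n - m - 1)))

# Hypothetical values chosen for the illustration
r2, n, m, p = 0.40, 50, 4, 6
print(adjusted_r2(r2, n, m))
print(f_overall(r2, n, m, p), f_overall(r2, n, m))   # identical values
print(adjusted_r2(m / (n - 1), n, m))                # null expectation of R2
```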
Statistics, partial RDA
• For analysis in the presence of W containing q covariables (partial RDA), the F-statistic is constructed as follows (ter Braak & Šmilauer 2002):
$$F = \frac{SS(\mathbf{Y}_{\text{fit}})\,/\,m}{SS(\mathbf{Y}_{\text{res}})\,/\,(n - m - q - 1)} \qquad \text{(eqn 8)}$$
There are several ways of computing the sum-of-squares of the fitted values SS(Yfit) and residuals SS(Yres) in the partial RDA case. The most convenient is the following: SS(Yfit) = SS(Yfit|(X + W)) − SS(Yfit|W) and SS(Yres) = SS(Y) − SS(Yfit|(X + W)). (X + W) designates the concatenation of X and W in a single matrix. Yfit was noted $\hat{\mathbf{Y}}$ in eqn 1, which did not involve covariables W.
• The semipartial R2, $R^2_{\text{semipartial}}$, is the proportion of explained variation with respect to the total variation in Y. This is the most widely used R2 statistic in partial RDA. It is the R2 of the simple RDA of Y by Xres|W:
$$R^2_{\text{semipartial}} = \frac{SS(\mathbf{Y}_{\text{fit}})}{SS(\mathbf{Y})} \qquad \text{(eqn 9)}$$
• The partial R2, $R^2_{\text{partial}}$, is the proportion of explained variation with respect to the total variation in Y residualized on the matrix of covariables W. Although more rarely used, it can be computed as the R2 of the simple RDA of Yres|W by Xres|W:
$$R^2_{\text{partial}} = \frac{SS(\mathbf{Y}_{\text{fit}})}{SS(\mathbf{Y}_{\text{fit}}) + SS(\mathbf{Y}_{\text{res}})} \qquad \text{(eqn 10)}$$
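The statistics of partial RDA can be obtained from three regressions, as in the following Python/NumPy sketch. The function names are hypothetical, X and W are assumed to be of full column rank, and the residual degrees of freedom of the F-statistic are taken to be n − m − q − 1, which is an assumption of this sketch.

```python
import numpy as np

def centre(A):
    """Centre the columns of A on their means."""
    return A - A.mean(axis=0)

def fitted(Y, X):
    """Fitted values of the multivariate regression of Y on X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return X @ B

def ss(A):
    """Sum-of-squares of a column-centred matrix."""
    return float(np.sum(A ** 2))

def partial_rda_stats(Y, X, W):
    """F, semipartial R2 and partial R2 of the RDA of Y by X given W."""
    Y, X, W = centre(Y), centre(X), centre(W)
    n, m, q = Y.shape[0], X.shape[1], W.shape[1]
    XW = np.hstack([X, W])                            # (X + W), bound by columns
    ss_fit = ss(fitted(Y, XW)) - ss(fitted(Y, W))     # SS(Yfit)
    ss_res = ss(Y) - ss(fitted(Y, XW))                # SS(Yres)
    return {
        "F": (ss_fit / m) / (ss_res / (n - m - q - 1)),
        "semipartial_R2": ss_fit / ss(Y),
        "partial_R2": ss_fit / (ss_fit + ss_res),
    }
```

The semipartial and partial R2 returned by this sketch coincide with the R2 of the simple RDA of Y by Xres|W and of Yres|W by Xres|W, respectively.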
Methods to test the significance of individual canonical axes in RDA
The null hypothesis for the test of significance of the jth axis is H0: the linear dependence of the response variables Y on the explanatory variables X is less than j-dimensional. More informally, the null hypothesis is that the jth axis under test explains no more variation than a random axis of the same order (j), for matrices Y and X (in the presence of covariables W, if applicable), given the variation explained by the previously tested axes.
Test all individual canonical axes simultaneously
The first and simplest method can be described as follows (covariables W are not considered in this first form of test):
1. Compute the RDA of Y by X. Extract all canonical eigenvalues.
2. A rough test could be based on the eigenvalues themselves. For a better test design, for each canonical eigenvalue λj, we will compute F-statistics using the following formulas:
$$F_{1j} = \frac{\lambda_j}{\left(SS(\mathbf{Y}) - \sum_{i=1}^{j} \lambda_i\right)/(n - 1 - m)} \qquad \text{(eqn 11)}$$
$$F_{2j} = \frac{\lambda_j}{\left(SS(\mathbf{Y}) - \sum_{i=1}^{k} \lambda_i\right)/(n - 1 - m)} \qquad \text{(eqn 12)}$$
where n is the number of objects, m is the rank of the variance-covariance matrix of X, k is the number of canonical eigenvalues and SS(Y) is the total sum-of-squares in Y. The first formula for the F-statistic is related to the formula found in the canoco manual (ter Braak & Šmilauer 2002, p. 51) to test the first canonical eigenvalue; it uses the sum of canonical eigenvalues up to and including the λj being tested. The second formula uses the sum of all k canonical eigenvalues in the denominator of F in the same way as the marginal method below. In both formulas, the eigenvalue λj in the numerator could be divided by m. That constant, as well as the number of degrees of freedom (n − 1 − m) in the denominator, has no influence on the results of permutation tests.
3. Permute the rows of Y at random, obtaining matrix Y*.
4. Compute the canonical analysis of Y* by X. Obtain all canonical eigenvalues. For each canonical eigenvalue of the permuted analysis, compute the F1j* and F2j* statistics under permutation, using the formulas above.
5. Repeat steps 3 and 4 a large number of times, say 999 times, to obtain estimates of the distributions of F1j* and F2j* under permutation. Add the reference values of F1j and F2j to their respective distributions (Hope 1968).
6. Calculate the associated probabilities as the number of cases in the distributions that are larger than or equal to the reference values, divided by the number of permutations plus 1.
That method is appealing because of its simplicity. Computation is faster than with the two following methods. Because it is the simplest one, it may be appealing to the developers of new programs; this is why it is described, tested and discussed here. Our simulation results will show that this method should not be used.
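For readers who wish to reproduce its behaviour, the simultaneous test can be sketched as follows in Python with NumPy. This is an illustration written from the six steps above, not code from any existing program; the function names are hypothetical, eigenvalues are expressed in sum-of-squares units, and the number of canonical eigenvalues is assumed to be min(m, p).

```python
import numpy as np

def canonical_eigenvalues(Y, X):
    """Canonical eigenvalues (SS units) of the RDA of centred Y by centred X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    Y_hat = X @ B
    return np.linalg.eigvalsh(Y_hat.T @ Y_hat)[::-1]

def simultaneous_test(Y, X, n_perm=999, seed=None):
    """Permutation p-values of all canonical axes, tested simultaneously."""
    rng = np.random.default_rng(seed)
    Y = Y - Y.mean(axis=0)
    X = X - X.mean(axis=0)
    n = Y.shape[0]
    m = np.linalg.matrix_rank(X)
    k = min(m, Y.shape[1])              # assumed number of canonical eigenvalues
    ss_y = np.sum(Y ** 2)               # unchanged by row permutation

    def f_stats(Yp):
        lam = canonical_eigenvalues(Yp, X)[:k]
        f1 = lam / ((ss_y - np.cumsum(lam)) / (n - 1 - m))   # eigenvalues 1..j
        f2 = lam / ((ss_y - lam.sum()) / (n - 1 - m))        # all k eigenvalues
        return f1, f2

    f1_ref, f2_ref = f_stats(Y)
    count1 = np.ones(k)                 # the reference value is counted (Hope 1968)
    count2 = np.ones(k)
    for _ in range(n_perm):
        f1_perm, f2_perm = f_stats(Y[rng.permutation(n)])    # permute rows of Y
        count1 += f1_perm >= f1_ref
        count2 += f2_perm >= f2_ref
    return count1 / (n_perm + 1), count2 / (n_perm + 1)
```

The two returned vectors contain one probability per canonical axis, for the first and the second form of the F-statistic.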
Marginal and forward test of the canonical axes
The marginal and forward tests of canonical axes use F-statistics similar to those described in eqns 12 and 11, respectively. They differ from the simultaneous test described in section ‘Test all individual canonical axes simultaneously’ except for the first canonical axis. For the second and later canonical axes, the forward and marginal tests are based on a partial RDA, even in the case where there were initially no covariables. For testing the jth axis (j > 1), the lower-numbered canonical axes (1, 2 … (j–1)) are added to the matrix of covariables W. Calculation of the residuals, which are permuted and added to the unpermuted fitted values during the permutation test for each axis, is performed on the covariables W incremented with the previously tested axes. The test cannot be performed simultaneously for all axes because the residual sum-of-squares, which is computed using the covariables W incremented with the previously tested axes, is one of the elements that form the denominator of the F-statistic and it varies for each axis. Furthermore, for partial regression and partial canonical analysis, Anderson & Legendre (1999) have shown that permutation of the residuals of either the reduced or the full model (see ‘Permutation methods’ below) is preferable to permutation of the raw data.
It is very difficult to describe the methods for testing the canonical axes in any detail using sentences and paragraphs. Considering the popularity of the R statistical language which is used by many ecologists and other application scientists, we append to the paper (Data S1) two R functions, called marginal.test() and forward.test(), which serve as complete descriptions of the marginal and forward methods; to keep these functions simple, the analysis is performed without true covariables and the functions are written without shortcuts or optimizations. Detailed comments have been added to the code to make it understandable even by those who are not fluent in R. The purpose of these functions is to unambiguously describe the two methods. More complex functions involving covariables are presented in Data S2; they are called test.axes.canoco() and test.axes.cov(). These four functions do not call compiled code that would increase computation efficiency during permutation testing. They are not intended for routine testing of canonical eigenvalues, although they do produce correct, publishable results.
The marginal method, which is based on the approach used in marginal tests of significance in partial regression, is computed as follows. A matrix of covariables may be present in the analysis; it is included in all steps of the description.
1. Compute the RDA of Y by X in the presence of covariables W, if any; see ‘Partial RDA’ in ‘Background: the algebra of redundancy analysis’. Extract the eigenvalues as well as the canonical axes that are linear combinations of the explanatory variables X (i.e. the object score ordination axes).
2. Test the significance of the successive canonical eigenvalues, λj, using the following F-statistic:
$$F_j = \frac{\lambda_j}{SS(\mathbf{Y}_{\text{res}|(X+W)})\,/\,\nu} \qquad \text{(eqn 13)}$$
where ν is the number of residual degrees of freedom of the full model, a constant that has no influence on the results of the permutation tests. In the marginal method, the denominator of the F-statistic, SS(Yres|(X + W)), is always (i.e. for the tests of all axes) the residual sum-of-squares of the model including all explanatory variables X and all covariables W, if any. Permutation of the raw data can be used to test the first canonical eigenvalue if there is no matrix W in the analysis. Permutation of the raw data and permutation of the residuals of the reduced model are identical in tests without covariables (Legendre & Legendre 1998, Table 11·7). If there are covariables in the analysis, and for the test of all successive canonical axes, use permutation of the residuals of the reduced or full model (see ‘Permutation methods’ below). In the marginal.test() function in Data S1, permutation of the residuals of the reduced model is used to test the canonical axes.
A forward approach to the test of the canonical axes has been implemented in the canoco program since version 3.10 (ter Braak 1990). It can be described as follows. The following description includes a matrix of covariables that may be present in the analysis.
1. Compute the RDA of Y by X in the presence of covariables W, if any; see ‘Partial RDA’ above. Set aside the vector of canonical eigenvalues and the matrix containing the canonical axes that are linear combinations of the explanatory variables X.
2. Test the significance of the successive canonical eigenvalues, λj, using the following F-statistic:
$$F_j = \frac{\lambda_j}{\left(SS(\mathbf{Y}_{\text{res}|[W + \text{Axis}(1)\ \text{to}\ \text{Axis}(j-1)]}) - \lambda_j\right)/\,\nu_j} \qquad \text{(eqn 14)}$$
where λj is the eigenvalue under test and νj is the number of residual degrees of freedom, a constant for a given axis that has no influence on the results of the permutation test. λj is the first eigenvalue from the analysis of Y [or permuted Y during the permutations] by X in the presence of covariables W and the previously tested axes 1 to (j–1), which are added as columns to W. SS(Yres|[W + Axis (1) to Axis (j–1)]) is the residual sum-of-squares of that partial RDA model. From that quantity, we subtract eigenvalue λj, which is recomputed during each permutation. Permutation of the raw data can be used to test the first canonical eigenvalue if there is no matrix W in the analysis. If there is, and for the test of all successive canonical axes, use permutation of the residuals of the reduced or full model (see ‘Permutation methods’ below). In the forward.test() function in Data S1, permutation of the residuals of the reduced model is used to test the canonical axes; see the section ‘Permutation methods’ below.
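The marginal and forward procedures can be sketched together in Python with NumPy, for the case without true covariables. This sketch follows our reading of the descriptions above and is not a translation of the R functions of Data S1; the function names are hypothetical, the residuals of the reduced model are permuted, and the constant degrees of freedom are omitted from the F-statistics because they do not affect the permutation probabilities.

```python
import numpy as np

def _fit(Y, X):
    """Fitted values of the multivariate regression of Y on X."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return X @ B

def _first_axis(Y, X, W):
    """First canonical eigenvalue (SS units) and axis of the RDA of Y by X given W."""
    X_res = X - _fit(X, W) if W.shape[1] else X
    U, s, _ = np.linalg.svd(_fit(Y, X_res), full_matrices=False)
    return s[0] ** 2, U[:, 0] * s[0]

def axis_tests(Y, X, method="marginal", n_perm=999, seed=None):
    """Permutation p-values of the successive canonical axes."""
    rng = np.random.default_rng(seed)
    Y = Y - Y.mean(axis=0)
    X = X - X.mean(axis=0)
    n = Y.shape[0]
    k = min(np.linalg.matrix_rank(X), Y.shape[1])   # assumed number of axes
    W = np.empty((n, 0))                            # previously tested axes

    def f_stat(Yp, W):
        lam, axis = _first_axis(Yp, X, W)
        if method == "marginal":      # residual SS of the model with all of X
            denom = np.sum((Yp - _fit(Yp, X)) ** 2)
        else:                         # forward: SS(Yres | previous axes) - lambda
            Yp_res = Yp - _fit(Yp, W) if W.shape[1] else Yp
            denom = np.sum(Yp_res ** 2) - lam
        return lam / denom, axis

    pvals = []
    for _ in range(k):
        f_ref, axis = f_stat(Y, W)
        Y_fit_W = _fit(Y, W) if W.shape[1] else np.zeros_like(Y)
        Y_res_W = Y - Y_fit_W         # residuals of the reduced model
        hits = 1                      # the reference value is counted
        for _ in range(n_perm):
            Y_star = Y_fit_W + Y_res_W[rng.permutation(n)]
            hits += f_stat(Y_star, W)[0] >= f_ref
        pvals.append(hits / (n_perm + 1))
        W = np.hstack([W, axis[:, None]])   # the tested axis becomes a covariable
    return np.array(pvals)
```

Calling `axis_tests(Y, X, method="forward")` switches the denominator of the F-statistic to the forward form; everything else is shared by the two procedures in this sketch.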
Besides the three methods described above, Lazraq & Cléroux (2002) proposed a parametric method to test the significance of the successive components in RDA under the assumption of multinormality. Takane & Hwang (2005) showed, however, that the test of Lazraq and Cléroux leads to strongly biased results. Takane & Hwang (2005) also suggested that the permutation test used by Takane & Hwang (2002) in generalized canonical correlation analysis can be adapted to RDA when the multivariate normality assumption does not hold; it leads to the forward permutation procedure described in the present paper. In support of their suggestion, Takane & Hwang (2005) cited some of the simulation results reported in the present paper, which they had been shown during a seminar by PL in 2005.
Permutation methods
In the marginal and forward methods, significance tests of the canonical axes involve either unrestricted permutation of the residuals of the reduced model, a method proposed by Freedman & Lane (1983), or permutation of the residuals of the full model, a method proposed by ter Braak (1990, 1992). These methods are described in Anderson & Legendre (1999) for multiple regression and in Legendre & Legendre (1998, section 11·3) for RDA and CCA. Permutation of the raw data is another possible option; it will be compared to the permutation of the residuals of the reduced model in the simulations reported in section ‘Results and Discussion’.
• In permutation of the raw data (method = ‘direct’ in vegan and in our simulation software), the rows of Y are permuted at random to produce the matrix of permuted response data Y*.
• In permutation of the residuals of the reduced model, one computes the matrix of fitted values Yfit|W and the matrix of residuals Yres|W of the multivariate regression of Y on the matrix of covariables W. The rows of Yres|W are permuted, producing matrix Yres|W*. The matrix of permuted response data, Y*, is obtained by adding Yfit|W (unpermuted) to Yres|W*.
• In permutation of the residuals of the full model, one computes the matrix of fitted values Yfit|(X + W) and the matrix of residuals Yres|(X + W) of the multivariate regression of Y on the matrix obtained by concatenation of X and W by columns into a single matrix. The rows of Yres|(X + W) are permuted, producing matrix Yres|(X + W)*. The matrix of permuted response data, Y*, is obtained by adding Yfit|(X + W) (unpermuted) to Yres|(X + W)*.
In version 3.x of the canoco program, the default method for the test of significance of the first canonical eigenvalue was permutation of the residuals of the reduced model, and permutation of the residuals of the full model for the overall test of significance of the canonical relationship. In version 4.x of canoco, the default became permutation of the residuals of the reduced model in all cases. The default method for the marginal test in vegan was permutation of the raw data (method = ‘direct’) until version 1.14; the default was changed to method = ‘reduced’ in version 1.15.
Permutation of the residuals of the reduced and the full model were found by Anderson & Legendre (1999) to produce equivalent results. Only permutation of the residuals of the reduced model was used in the simulations reported in the present paper.
Besides these methods, one can also permute Y in a way imposed by the logic of a problem. The most important methods of restricted permutation are permutation within the levels of a factor or block that is used as a covariable in the study, loop permutation along a time series, and toroidal permutation of the points on a geographical surface.
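These restricted permutations amount to generating row orders of a particular kind. A minimal Python sketch, using only the standard library, is given below; the function names are hypothetical and the toroidal version assumes that the points form a regular grid stored in row-major order.

```python
import random

def permute_within_blocks(blocks, rng):
    """Row order in which rows are shuffled only within the levels of a factor."""
    order = list(range(len(blocks)))
    for level in set(blocks):
        idx = [i for i, b in enumerate(blocks) if b == level]
        shuffled = idx[:]
        rng.shuffle(shuffled)
        for target, source in zip(idx, shuffled):
            order[target] = source
    return order

def loop_permutation(n, rng):
    """Cyclic shift of a series of n observations by a random lag."""
    lag = rng.randrange(n)
    return [(i + lag) % n for i in range(n)]

def toroidal_permutation(n_rows, n_cols, rng):
    """Random shift of an n_rows x n_cols grid, wrapping around both edges."""
    dr, dc = rng.randrange(n_rows), rng.randrange(n_cols)
    return [((r + dr) % n_rows) * n_cols + (c + dc) % n_cols
            for r in range(n_rows) for c in range(n_cols)]

rng = random.Random(1)
print(permute_within_blocks(["a", "a", "b", "b", "b", "c"], rng))
print(loop_permutation(7, rng))
print(toroidal_permutation(3, 4, rng))
```

Each function returns a list of row indices that can be used to reorder the rows of Y (or of a matrix of residuals).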
As mentioned above, Data S1 of this paper contains functions that describe the marginal and forward methods for simple RDA, using R code. Data S2 describes two R functions designed for testing canonical axes in simple or partial RDA. These two functions are programmed differently, following the two approaches for partial RDA described in the ‘Partial RDA’ subsection of ‘Background: the algebra of redundancy analysis’ of this paper. They produce identical results. For users who are analysing real data, the R package ‘vegan’ (Oksanen et al. 2010) offers canonical analysis by RDA and CCA with covariables W, with tests of significance of the canonical axes through the marginal test. The marginal method is implemented in the permutest.cca() function, which carries out the tests of significance of the canonical axes when users call the anova.cca() function after canonical analysis by the functions rda() and cca(). The program canoco (ter Braak & Šmilauer 2002) offers tests of significance of the canonical axes through the forward method.
Simulation method
Simulations were carried out to compare the statistical properties of three methods developed to test the significance of the canonical axes in RDA described in sections ‘Test all individual canonical axes simultaneously’ and ‘Marginal and forward test of the canonical axes’. The simulations were designed specifically for that purpose, as opposed to simulations that could be performed to illustrate the use of RDA and its tests of significance in a wide range of real data analyses. One of the limitations of any simulation study is that not all possible cases can be covered. So, in this paper, we limited our simulations to key situations where a known number of canonical axes were generated. The three methods will be compared as to their ability to detect the correct number of significant axes in the simulated data. A method that detects significant axes not corresponding to linear relationships built into the data will be declared incorrect. In practice, then, we generated data having a known number of linear relationships (canonical dimensions) to determine whether the testing procedures found the correct number of dimensions. This is the role of the blocks of variables created in the response matrix Y and described below. This data structure is a simplification, but not an oversimplification, of the kind of relationships that would be found in real data, for example when analysing the relationships between species and environmental variables.
Type I error consists in finding ‘false positives’, or false significant results, during statistical tests when there is no effect – here, a linear relationship between the response and explanatory matrices. To estimate the rate of type I error in the tests of the canonical axes, pairs of Y and X matrices were generated using random deviates and tested for significance. A test of significance is valid if the probability of type I error is no greater than the significance level α, for any α value (Edgington 1995, p. 37). We used the following values to generate the data for the simulations to study type I error: n = 20 or 100; p (number of variables in Y) = 3, 5, or 8; m (number of variables in X) = 3, 5, or 8. Error was normally distributed.
Power is the ability of a statistical method (here the test of individual canonical axes in RDA) to detect a relationship when one is present in the data. The difficulty in the present study was to generate data that contained a known number of linear canonical relationships, that is, a known number of dimensions, to check that the testing procedures for the canonical axes found the correct number of dimensions. That is the role of the blocks of variables generated in the response matrix Y. The generation method is described in the following paragraphs.
To give a simple example, we could have generated a single variable in Y related to a single variable in X as follows:
$$y = x + \varepsilon$$
Analysis of that pair would have been expected to produce a single significant canonical eigenvalue.
In our simulations, we generated data that contained 1–4 blocks, each containing 1–3 variables in Y; each block was related to two variables in X. Because of that, when there was more than a single y variable per block, more canonical axes were created than the number of significant axes, which was equal to the number of blocks of Y variables. For example, create four independent random normal variables following N(0,1), called x1…x4, and six more random normal variables following N(0,1), called ε1…ε6. Then, create two blocks in Y, each block containing three y variables:
$$\text{Block 1:}\quad y_1 = x_1 + x_2 + c\,\varepsilon_1,\quad y_2 = x_1 + x_2 + c\,\varepsilon_2,\quad y_3 = x_1 + x_2 + c\,\varepsilon_3$$
$$\text{Block 2:}\quad y_4 = x_3 + x_4 + c\,\varepsilon_4,\quad y_5 = x_3 + x_4 + c\,\varepsilon_5,\quad y_6 = x_3 + x_4 + c\,\varepsilon_6$$
In this example, the coefficient c determines the importance of the random error component; that coefficient will vary in the simulations reported in the next section. The Y variables within each block are all related to a linear model of the same pair of X variables but differ from one another by their random component ε, which differs from variable to variable. So, an analysis of that pair of matrices Y and X would be expected to produce two significant canonical axes.
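A data generator of this kind can be sketched in Python with NumPy as follows. This is not the simulation software used for this paper; the function name is hypothetical, only normal error is shown, and the signal of each block is assumed to be the plain sum of its two X variables (unit coefficients).

```python
import numpy as np

def generate_blocks(n, n_blocks, y_per_block, c, rng):
    """Generate Y made of blocks of variables, each block tied to its own pair of X variables."""
    X = rng.standard_normal((n, 2 * n_blocks))       # two X variables per block
    p = n_blocks * y_per_block
    E = rng.standard_normal((n, p))                  # one error variable per y
    Y = np.empty((n, p))
    for b in range(n_blocks):
        signal = X[:, 2 * b] + X[:, 2 * b + 1]       # assumed unit coefficients
        for v in range(y_per_block):
            col = b * y_per_block + v
            Y[:, col] = signal + c * E[:, col]       # c weights the random error
    return Y, X

rng = np.random.default_rng(2011)
Y, X = generate_blocks(n=20, n_blocks=2, y_per_block=3, c=0.2, rng=rng)
print(Y.shape, X.shape)
```

With two blocks of three variables, Y has six columns and X has four, and two canonical axes are expected to be significant.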
We used the following values for the simulations for power reported in this paper: No. blocks = 1–4, No. y per block = 1–3, n = 20 or 100. As two X variables were generated for each block in Y, there are m = 2–8 variables in X and p = 1–12 variables in Y. Error distributions: normal, exponential, cubed exponential, as in Manly (1997) and Anderson & Legendre (1999). The weight of the error component in the generation of the y variables was c = 0·2, 0·5, or 0·8.
In both the type I error and the power studies, each simulation run consisted of the analysis of 1000 pairs of independently generated data matrices. The permutation tests involved 999 random permutations. The significance level of the tests was α = 0·05.
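The rejection rate of a simulation run is simply the proportion of the generated data sets for which H0 was rejected. The Python sketch below computes it and also gives, as a reading aid that is not part of the simulation design described above, the approximate range within which a rejection rate estimated from 1000 data sets is expected to fall when the true rate equals α (normal approximation to the binomial distribution). The function names are hypothetical.

```python
import math

def rejection_rate(p_values, alpha=0.05):
    """Proportion of data sets for which H0 was rejected at level alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)

def binomial_interval(alpha=0.05, n_runs=1000, z=1.96):
    """Approximate 95% range of the rejection rate when the true rate is alpha."""
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_runs)
    return alpha - half_width, alpha + half_width

print(rejection_rate([0.01, 0.20, 0.05, 0.50]))
print(binomial_interval())
```

Rejection rates far above the upper bound of that range point to inflated type I error, whereas rates below the lower bound indicate a conservative test.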
Results and Discussion
The simulation results lead to the following observations.
Simultaneous test of all canonical eigenvalues
The simulation results reported in the first six rows of Table B1 (Appendix S2), parts a and b, show that in the absence of a relationship between Y and X, the rejection rates of H0 were always close to the nominal significance level, here 5%. The tests had correct rates of type I error in that situation for both ways of computing the F-statistic (F1 and F2, eqns 11 and 12). When relationships were present in the data (Power and type I error sections, Table B1 in Appendix S2), the tests had good power to detect the significant axes displaying the relationships; the expected number of significant axes is the number of blocks of Y variables (‘No. blocks’ column). The tests of the following eigenvalues, which did not display relationships built in the data, should, however, have had rejection rates close to the significance level (α = 0·05) or lower. The results show that the rejection rates for these axes had highly inflated levels of type I error (values in bold in the table). So that form of test is invalid. As a consequence, the simultaneous test of all canonical eigenvalues should not be used.
Marginal and forward methods, normal error
The results presented in Fig. 1 and Tables B2 and B3 in Appendix S2 indicate that these two methods have very similar properties: in the absence of a relationship between Y and X, the rejection rates of H0 were always close to or lower than the significance level α = 0·05 (Tables B2 and B3 in Appendix S2, ‘Type I error’ sections). When relationships were present in the data, the tests had good power to detect the axes displaying the relationships (Tables B2 and B3 in Appendix S2, ‘Power and type I error’ sections); the expected number of significant axes is the number of blocks of Y variables (‘No. blocks’ column). For all axes that were not expected to display a significant relationship, the rejection rates (type I error) were always lower than α = 0·05. These two forms of test are thus valid, even though their conservative behaviour for the nonsignificant axes indicates a slight loss of power for the significant axes after the first one.
Comparison of the ‘direct’ and ‘reduced’ permutation methods
Does the choice of a permutation method, ‘direct’ or ‘reduced’, make a difference for the marginal method? Simulations were conducted with increased weights for the error component in the generation of the Y variables (parameter c = 0·5 and 0·8, instead of 0·2 in Tables B2 and B3, Appendix S2), in the hope of displaying a difference of power between the two permutation methods. The results presented in Table B4 in Appendix S2 for the marginal test show that permutation of the residuals of the reduced model has more power than permutation of the raw data to detect a relationship when there is a large amount of error in the data, especially when n is small. For c = 0·8 for example, compare rows of results with the same values of n and ‘No. blocks’: the rejection rates for the axes that were expected to be significant, following the first one, are always higher for method = ‘reduced’ than for method = ‘direct’. As expected, power is much higher for n = 100 than for n = 20.
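The difference between the two permutation schemes can be sketched on a deliberately simplified, univariate analogue: a single response y, one covariable x1 and one tested variable x2. This is not the RDA procedure compared in this paper, where the covariables are the previously tested canonical axes and the statistic is an F-ratio; here the squared partial correlation, which is a monotonic function of the partial F, is used instead, and all function names are hypothetical.

```python
import random


def residuals(v, x):
    """Residuals of the simple linear regression of v on x (with intercept)."""
    mv = sum(v) / len(v)
    mx = sum(x) / len(x)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [(vi - mv) - slope * (xi - mx) for xi, vi in zip(x, v)]


def partial_r2(y, x1, x2):
    """Squared partial correlation between y and x2, controlling for x1."""
    ry = residuals(y, x1)
    rx = residuals(x2, x1)
    sxy = sum(a * b for a, b in zip(ry, rx))
    return sxy * sxy / (sum(a * a for a in ry) * sum(b * b for b in rx))


def partial_test(y, x1, x2, method="reduced", n_perm=199, seed=0):
    """Permutation test of x2 given x1.

    method="direct":  permute the raw response values.
    method="reduced": permute the residuals of the reduced model (y on x1)
                      and add them back to the fitted values of that model.
    """
    rng = random.Random(seed)
    observed = partial_r2(y, x1, x2)
    res = residuals(y, x1)
    fitted = [yi - ri for yi, ri in zip(y, res)]
    n_ge = 1  # the observed value counts as one of the permutations
    for _ in range(n_perm):
        if method == "direct":
            y_star = rng.sample(y, len(y))
        else:
            permuted = rng.sample(res, len(res))
            y_star = [f + e for f, e in zip(fitted, permuted)]
        if partial_r2(y_star, x1, x2) >= observed:
            n_ge += 1
    return n_ge / (n_perm + 1)
```

Running `partial_test` with both values of `method` on the same data set shows where the two schemes differ: only the reference distribution changes, whereas the observed statistic is the same.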
Influence of increasing weight of the error component
As expected, increasing the weight of the error component in the generation of Y (parameter c) reduced the power of the marginal and forward tests of the canonical axes (Tables B2 and B3, Appendix S2: c = 0·2; Tables B4 and B5: c = 0·5 and 0·8; Fig. 1).
Influence of error type
For reduced model permutations, the results for normal and exponential error are similar (Fig. 1). For cubed exponential error (very highly skewed data), the power to detect significant relationships almost completely vanishes because the linear component of the X–Y relationships is weak. Detailed results are presented in Tables B6 and B7 in Appendix S2. For highly skewed data, such as those generated here using cubed exponential deviates as the error term, one might more easily identify relationships between y and X using De’ath’s (2002) multivariate regression tree, which aims at identifying breaks in the response data values that correspond to thresholds in the explanatory variables, rather than linear relationships with the explanatory variables, as is the case in RDA.
There is no need for an adjustment for multiple testing when testing the canonical axes. An overall test of the canonical relationship (canonical R2) must be carried out, ideally before the canonical axes are computed, and certainly before the axes are tested and examined. So one is not interested here in adjusting the significance level of individual tests to obtain a fixed (e.g. 5%) experimentwise error rate, but only in a test of significance that has a correct rate of type I error for each axis.
Statistical significance is of course not the same as biological importance. If an axis is not statistically significant, it usually does not warrant mapping, biplotting or biological interpretation, with one possible exception: when the number of sites is very small and power is low, one might still want to draw a biplot and examine some of the first nonsignificant axes. On the other hand, even when an axis is statistically significant, it may not be worth interpreting when it explains little variation; this may happen when the number of objects is large. We advise that the results of tests of statistical significance should not be used blindly.
In this paper, we addressed the following questions through numerical simulations:
• Which of the three testing methods produced correct type I error for the test of the successive canonical eigenvalues? The computationally simpler method, which consists of testing all axes with a single set of permutations, produced incorrect type I error for all axes except the first one; it is therefore invalid and should not be used to test the significance of canonical axes in RDA. The marginal and forward procedures are both valid as they displayed type I error rates no greater than the significance level α; we used α = 0·05 in this simulation study. These two testing methods produced comparable results and were equally powerful for the range of conditions studied in our simulations (Tables B1–B3 in Appendix S2).
• How does permutation of the raw data compare to permutation of the residuals in the marginal and forward tests? Permutation of the residuals of the reduced model provided greater power than permutation of the raw data in the tests of the axes following the first one (Tables B4 and B5 in Appendix S2).
• What is the effect of the type of error on the power of the marginal and forward tests? The importance of the loss of power depended on the number of observations n and on the type of error: with normal or exponential error, the loss was very slight when using permutation of the residuals [Tables B4 (method = ‘reduced’) and B5–B7 in Appendix S2].
The R functions found in Data S1 provide the first formal description of the ‘marginal’ and ‘forward’ testing methods.
We are grateful to Daniel Borcard, Stéphane Dray, as well as two anonymous reviewers for discussion of the ideas presented in the Introduction of this paper and for useful suggestions about the manuscript, and to Guillaume Guénard for assistance during the simulation work. This research was supported by NSERC grant no. 7738-07 to P. Legendre.