Dutilleul's modified t-test, which takes the effect of spatial autocorrelation into account, is a major breakthrough for the analysis of survey data. However, it requires good estimates of the spatial autocorrelation present in the variables under study. As sample size increases, the estimates become more accurate. Conversely, with small sample size, the accuracy of the modification to the t-statistic and the degrees of freedom is reduced. This is an inherent limitation of any method of modification based upon estimates of spatial autocorrelation obtained from the data themselves. The take-home message for ecologists is that, if the variables under study are spatially autocorrelated, the sample size (n) should be as large as possible, for instance n=100 in the case of strongly autocorrelated data.
For Dutilleul's modified t-test, the reduction of the rate of type I error due to the presence of a deterministic structure depends on the scale of that structure. The effect on the rate of type I error is stronger for broader-scale deterministic spatial structures such as gradients, and weaker for smaller-scale structures such as a big bump in the centre of the field. With some combinations of SA in the environmental and response variables, designs utilizing transects have poorer power than the simple random, systematic or aggregated designs, probably because the assumption of stationarity, required by the correlograms computed in the Dutilleul procedure, is violated by the presence of these structures.
Taking broad-scale spatial structures and finer-scale SA into account
The simulations have shown that, when a broad-scale spatial structure is present in the environmental variable E, this structure reduces the rate of type I error even for Dutilleul's modified t-test. Although the test remains valid, its power is reduced, which is an undesirable property. Instead of analysing data containing a broad-scale spatial structure, a better way is to look for, and identify, the gradient in the environmental variable using some form of spatial modelling. The method of analysis is the following.
1) Is there a broad-scale spatial component in the environmental variable E? One can use the results of a pilot study or field observations to answer this question. If so, this structure must be identified and “peeled off” the data before studying the relationship between the environmental and response variables. In some cases, the broad-scale spatial component can be modelled using a linear or polynomial trend-surface equation. Trend-surface analysis is a classical form of spatial modelling; it is explained in several textbooks, including Legendre and Legendre (1998, Section 13.2.1). In other instances, the broad-scale component can be hypothesized to have other functional forms. For instance, a patch can be modelled by a Gaussian response function, that is, a normal density function fitted through nonlinear regression (an example is given below); a discontinuity can be modelled by a dummy variable in linear regression. Let k be the number of parameters required to fit the broad-scale spatial model to variable E.
2) Calculate the partial correlation between R and E. According to our hypothesis, if a broad-scale spatial structure is present in the data, it is caused by the broad-scale spatial structuring of the environmental variable.
2.1) Compute the vector of residuals Res(Ei) of the environmental variable E after fitting the broad-scale spatial model.
2.2) Compute the vector of residuals Res(Ri) of the regression of the response variable R on the fitted broad-scale spatial model.
2.3) Compute the correlation r between Res(E) and Res(R). This correlation is actually the partial correlation between E and R after controlling for the broad-scale spatial model.
3) Compute the associated probability:
3.1) To take into account the spatial autocorrelation potentially present in E and R, compute the Dutilleul-corrected number of degrees of freedom, νDut. A program to compute the modified t-test for the Pearson correlation coefficient corrected for spatial autocorrelation, following Dutilleul (1993), is available at URL 〈http://www.fas.umontreal.ca/biol/legendre/〉. This program can be used to estimate the partial correlation r (the same value is obtained as in step 2.3 above) as well as the corrected number of degrees of freedom νDut.
Note that in some cases, after removing the broad-scale spatial structure, spatial autocorrelation analysis may not detect any significant autocorrelation remaining in one, the other, or both residuals Res(E) and Res(R). In that case, one does not have to compute a modified number of degrees of freedom using Dutilleul's method: when autocorrelation is present in only one of the variables under study, or in none of them, the rate of type I error is not modified, as shown by Bivand (1980) and illustrated in Fig. 2A.
3.2) Compute the modified partial t-statistic from the partial correlation coefficient r:
$$ t = \frac{r\sqrt{\nu_c}}{\sqrt{1 - r^2}} $$
In this formula, use a corrected number of degrees of freedom νc=νDut−k, where k is the number of parameters required in step 1 (above) to fit the broad-scale spatial model to variable E. If no spatial autocorrelation is present in the data (or, at least, in one of the residual variables), νc=(n−2)−k; t is then the classical statistic for testing the significance of a partial correlation coefficient. This value is identical to the t-statistic used for testing the significance of a partial regression coefficient in multiple regression.
3.3) Find the probability associated with the t-statistic in a one-tailed or two-tailed test, for νc degrees of freedom.
One should check that there is no other broad-scale spatial component in the response variable R, besides the one modelled for E. If this happened, it would mean that some other environmental variable E′ containing a broad-scale spatial structure is also an important determinant of R; the model should be redesigned to include this variable.
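Steps 2.1 to 2.3 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the program cited in step 3.1: the function names (ols_residuals, pearson_r, residual_correlation) and the toy data are hypothetical, and the fitted values of the broad-scale spatial model are assumed to be available already from step 1.

```python
import math


def mean(values):
    return sum(values) / len(values)


def ols_residuals(y, x):
    """Residuals of the simple linear regression of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def pearson_r(u, v):
    """Pearson correlation between two equal-length sequences."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def residual_correlation(E, R, fit_E):
    """Partial correlation between E and R, controlling for the fitted model."""
    res_E = [e - f for e, f in zip(E, fit_E)]  # step 2.1: E minus fitted values
    res_R = ols_residuals(R, fit_E)            # step 2.2: R regressed on fitted values
    return pearson_r(res_E, res_R), res_E, res_R  # step 2.3


# Hypothetical toy data: a bump-shaped fitted model observed at 10 sites.
fit_E = [0.5, 1.5, 4.0, 7.5, 9.5, 9.5, 7.5, 4.0, 1.5, 0.5]
E = [0.9, 1.1, 4.6, 7.2, 10.1, 9.0, 7.9, 3.5, 1.8, 0.2]
R = [2.1, 1.4, 3.9, 4.0, 6.2, 5.1, 5.5, 2.6, 2.9, 1.0]

r, res_E, res_R = residual_correlation(E, R, fit_E)
print(f"partial correlation r = {r:.4f}")
```

The sketch stops at the correlation itself; the corrected number of degrees of freedom νDut still has to come from Dutilleul's procedure, which is not reproduced here.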
Let us illustrate this procedure using two numerical examples. We used a sampling field of 100×100 points. Using our simulation program, we generated a first pair of variables similar to those of Fig. 1, but without any effect of the environmental variable E (Fig. 7a) on the response variable R (Fig. 7b); to do so, we simply set the simulation transfer parameter (β) to 0. The environmental variable was made to contain a large patch in the centre of the field, as in Fig. 1, plus spatially-autocorrelated error and non-spatially-structured normal error. The response variable only contained spatially-autocorrelated error and non-spatially-structured error. For both variables, the spatially-autocorrelated error component was generated using a spherical variogram model with a range of 25 in both directions. We ran a horizontal transect through the centre of the patch (like the horizontal arm of the cross design in Fig. 1) and measured the two variables at 50 equispaced points along the transect.
Figure 7. Illustration of the data used in Examples 1 and 2. In Example 1, R is independent of E. In Example 2, a dependence between E and R was created by setting the response parameter beta to 0.4. E is the same in the two examples. (a, b, c) Raw data. The Gaussian density function fitted to E, called Fit(E) in the text, is also shown (curve). (d, e, f) Plot of the residuals. Adjacent values are linked by lines to make it easier to appreciate the autocorrelation remaining in the residuals.
We will now assume that we do not know how the variables were generated and analyse them as we would do for field data. Our task is to deconstruct the data, peeling off the broad-scale spatial component, and find whether or not the residuals are significantly related to each other. So, we will proceed to spatial modelling of variable E. Figure 7a indicates the presence of a bump in the E values observed along the transect. A bump would be difficult to model using a polynomial trend-surface equation, and would require many terms to approximate its shape. We chose to model it using a Gaussian function (normal density):
$$ \mathrm{Fit}(E_i) = \frac{c}{\sqrt{2\pi b}} \exp\left[-\frac{(X_i - a)^2}{2b}\right] $$
where Xi represents the position of site i along the transect, a represents the estimate of the mean, b is that of the variance, and c is a vertical scale parameter. The model was fitted to the E data using nonlinear regression (curve in Fig. 7a, R²=0.91); k=3 parameters were estimated to fit the model. The parameter estimates were a=50.44, b=92.54, and c=242.81. The fitted values of this model, Fit(E), were computed and used in the sequel as our estimates of the broad-scale deterministic spatial structure identified in E. The residuals Res(E) of this model were also calculated (Fig. 7d).
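To make the fitted model concrete, the short Python sketch below evaluates a scaled normal density at the reported parameter estimates. It assumes that the model is the normal density with mean a and variance b, multiplied by the vertical scale c. The function name gaussian_model and the sampling positions are hypothetical, since the exact coordinates of the 50 transect points are not given in the text.

```python
import math


def gaussian_model(x, a, b, c):
    """Scaled normal density: centre a, variance b, vertical scale c (assumed form)."""
    return c * math.exp(-((x - a) ** 2) / (2.0 * b)) / math.sqrt(2.0 * math.pi * b)


# Parameter estimates reported in the text.
A_HAT, B_HAT, C_HAT = 50.44, 92.54, 242.81

# Hypothetical positions: 50 equispaced points across a field 100 units wide.
positions = [1 + 2 * i for i in range(50)]
fit_E = [gaussian_model(x, A_HAT, B_HAT, C_HAT) for x in positions]

peak = gaussian_model(A_HAT, A_HAT, B_HAT, C_HAT)  # height of the bump at its centre
spread = math.sqrt(B_HAT)                          # standard deviation of the bump

print(f"peak height = {peak:.2f}, standard deviation = {spread:.2f}")
```

Under this assumed form, the estimates describe a bump centred near the middle of the transect, with a standard deviation of a little under 10 distance units.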
The next step is to test the hypothesis that the broad-scale spatial structure found in E may have been passed on to the response variable R. The regression of R on vector Fit(E) was computed and the residuals Res(R) were calculated (Fig. 7e); as expected, this regression explained very little of the response data (R²=0.08) since the data had been generated with a beta coefficient of 0.
The correlation between the two vectors of residuals was r=0.1654. Dutilleul's modified t-test for the correlation coefficient, corrected for spatial autocorrelation, was computed; among other information, the program provided the number of degrees of freedom corrected for spatial autocorrelation (νDut=28.65). For νc=νDut−k=25.65, the corrected t-statistic was
$$ t = \frac{0.1654\sqrt{25.65}}{\sqrt{1 - 0.1654^2}} \approx 0.849 $$
and the associated probability was p=0.4032. We can now compare this answer to the results one would have obtained from the calculation of a correlation coefficient between the two original variables E and R: r(E,R)=0.3178, p=0.0245. At significance level 0.05, one would have drawn the erroneous conclusion that there was a significant relationship between R and E. This would have been due to the inflated type I error rate of the test in the presence of autocorrelation (Fig. 2A) and of a broad-scale deterministic structure (Fig. 4A) in the data (Table 2, central column). As we observed in Fig. 2A, the test would have been too liberal (p=0.2509 in the central portion of the Table) if we had not applied Dutilleul's modified t-test, which corrects for the spatial autocorrelation present in the data.
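The three probabilities quoted for Example 1 can be checked approximately with the following Python sketch. It assumes the classical statistic t = r√ν/√(1−r²) and a two-tailed test, and it obtains the tail probability by numerical integration of Student's t density so that fractional degrees of freedom pose no difficulty. The function names are hypothetical, and this is not the program used to produce the published values.

```python
import math


def t_from_r(r, df):
    """Classical t-statistic for a correlation r tested with df degrees of freedom."""
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r)


def t_density(x, df):
    """Density of Student's t distribution; df may be fractional."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_tailed_p(t, df, steps=2000):
    """Two-tailed probability via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    if upper == 0.0:
        return 1.0
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    central_area = total * h / 3.0  # probability mass between 0 and |t|
    return 1.0 - 2.0 * central_area


# Values reported in the text for Example 1.
n, k = 50, 3
r_resid, r_raw, nu_dut = 0.1654, 0.3178, 28.65
nu_c = nu_dut - k

t_corr = t_from_r(r_resid, nu_c)
p_corr = two_tailed_p(t_corr, nu_c)                           # Dutilleul-corrected
p_uncorr = two_tailed_p(t_from_r(r_resid, n - 2), n - 2)      # residuals, n-2 d.f.
p_raw = two_tailed_p(t_from_r(r_raw, n - 2), n - 2)           # raw variables

print(f"t = {t_corr:.3f}, p = {p_corr:.4f}")
print(f"uncorrected residual test: p = {p_uncorr:.4f}")
print(f"raw correlation test:      p = {p_raw:.4f}")
```

With these inputs the three probabilities should land close to the reported 0.4032, 0.2509 and 0.0245; small differences are expected because r and νDut are rounded in the text.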
Table 2. Three estimates of the significance of the correlation between a response (R) and an environmental (E) variable. In Example 1, R is independent of E. In Example 2, a dependence between E and R was created by setting the response parameter beta to 0.4. k is the number of parameters required to fit the broad-scale spatial model to variable E; k=3 in these examples.
| | Example 1 (beta=0) | Example 2 (beta=0.4) |
|---|---|---|
| Correlation between R and E | | |
| r(E, R) | 0.3178 | 0.8172 |
| p | 0.0245* | < 0.0001*** |
| Correlation between residuals after controlling for effect of broad-scale spatial structure | | |
| r[Resid(R), Resid(E)] | 0.1654 | 0.4466 |
| p | 0.2509 N.S. | 0.0012** |
| Correlation between residuals using Dutilleul's modified t-test | | |
| r[Resid(R), Resid(E)] | 0.1654 | 0.4466 |
| p | 0.4032 N.S. | 0.0202** |
A second pair of variables was simulated, but this time there was an effect of E (Fig. 7a) on R (Fig. 7c) that was generated by setting the simulation transfer parameter (β) to 0.4. Except for that, the deterministic structure in E, and the SA and normal error components in R and E, were the same as in the first example, so that the estimated broad-scale deterministic structure in E, Fit(E), as well as the residuals Res(E), were the same as in Example 1.
In the second step, R was regressed on Fit(E) and the residuals Res(R) were calculated (Fig. 7f); this time, the regression explained a large portion of the variance of the response data (R²=0.5868) since the data had been generated with a beta coefficient of 0.4.
The correlation between the two vectors of residuals was r=0.4466. Dutilleul's modified t-test for the correlation coefficient, corrected for spatial autocorrelation, was computed; the program provided the number of degrees of freedom corrected for spatial autocorrelation (νDut=25.15). For νc=νDut−k=22.15, the modified t-statistic was
$$ t = \frac{0.4466\sqrt{22.15}}{\sqrt{1 - 0.4466^2}} \approx 2.349 $$
and the associated probability was p=0.0202. We can now compare this answer to the results one would have obtained from the calculation of a correlation coefficient between the two original variables E and R: r(E,R)=0.8172, p<0.0001. The statistical conclusion drawn from this result would have been correct, but the probability is far too small. The incorrect and correct results are summarized in the right-hand column of Table 2. Again, and as in Fig. 4A, the test would have been too liberal (p=0.0012 in the central portion of the Table) if we had not applied Dutilleul's modified t-test, which corrects for the spatial autocorrelation present in the data.