## INTRODUCTION

Ecologists have long been aware of spatial autocorrelation (SAC) in their data (Sokal & Oden, 1978a,b; Legendre, 1993; Koenig, 1999), and statistical methods for handling it have been available for almost as long (summarized in Cressie, 1993). However, these statistically correct methods have only recently been incorporated into distribution analyses, and most (> 80%) publications dealing with the analysis of spatial data in the ecological literature still do not attempt to model SAC explicitly.

The effect of spatial autocorrelation on the interpretation of ordinary statistical methodology *in general* has been assessed several times (e.g. Liebhold & Sharov, 1998; Lennon, 2000; Dale & Fortin, 2002). It has been shown to influence both coefficients (for case studies see, e.g. Jetz & Rahbek, 2002; Lichstein *et al*., 2002) and inference in statistical analyses (e.g. Chou & Soret, 1996; Fortin & Payette, 2002). For example, Tognelli and Kelt (2004) found a considerable change in the importance of explanatory variables after incorporating SAC: the second and third most important variables (vapour pressure and wind speed) in their non-spatial model dropped to become the least important in the spatial model.

The regression models employed for species distribution analyses model the expected value for a given set of environmental variables. The observed data points are the expected value plus an additional, unexplained noise component: the variance. In truly independent data, the variance around the expected value is modelled as $\sigma^2 = \sum_{i=1}^{n}(x_i - \bar{x})^2/(n - 1)$. In the spatially autocorrelated case, however, this variance has an additional component which specifies the covariance between values of *x* at locations *i* and *j*:

$$\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1} + \frac{2}{n(n - 1)}\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\operatorname{Cov}(x_i, x_j)$$

(Haining, 2003; p. 275). This means that the larger the spatial autocorrelation, the larger the covariance and the larger also the true variance around the expected value. Ignoring the second term in the above equation will lead to a downward-biased estimate of $\sigma^2$ and, accordingly, to incorrect tests of the significance of $\bar{x}$ in regression models. The statistical point is thus unambiguous (Griffith & Lagona, 1998): the statistical analysis of spatial data needs to incorporate spatial autocorrelation to avoid the pitfalls of spatial pseudoreplication (Hurlbert, 1984; Legendre, 1993), just as a nested experiment requires a mixed-model analysis to be structurally correct.
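The size of this bias can be illustrated with a minimal numerical sketch. It rests on several assumptions that are not taken from Haining (2003): observations lie on a unit-spaced one-dimensional transect, all share a common mean and a true variance of σ² = 1, and their covariance decays exponentially with distance. The function names and the `corr_range` parameter are hypothetical and chosen only for illustration. Under these assumptions, the expected value of the usual (*n* − 1) variance estimator and the true variance of the sample mean can be computed exactly rather than simulated:

```python
import math


def exp_cov(distance, sigma2=1.0, corr_range=3.0):
    """Assumed covariance model: sigma2 * exp(-distance / corr_range)."""
    return sigma2 * math.exp(-distance / corr_range)


def pairwise_cov_sum(n, sigma2=1.0, corr_range=3.0):
    """Sum of Cov(x_i, x_j) over all pairs i < j on a unit-spaced transect."""
    return sum(
        exp_cov(j - i, sigma2, corr_range)
        for i in range(n)
        for j in range(i + 1, n)
    )


def expected_sample_variance(n, sigma2=1.0, corr_range=3.0):
    """Expected value of the (n - 1) variance estimator under correlation."""
    cov_sum = pairwise_cov_sum(n, sigma2, corr_range)
    return sigma2 - 2.0 * cov_sum / (n * (n - 1))


def variance_of_mean(n, sigma2=1.0, corr_range=3.0):
    """True variance of the sample mean when observations are correlated."""
    cov_sum = pairwise_cov_sum(n, sigma2, corr_range)
    return sigma2 / n + 2.0 * cov_sum / n**2


n = 50
naive_var_mean = 1.0 / n              # what an analysis assuming independence uses
true_var_mean = variance_of_mean(n)   # includes the covariance term
effective_n = 1.0 / true_var_mean     # sigma2 / Var(mean), with sigma2 = 1

print("expected sample variance:", expected_sample_variance(n))
print("naive vs. true variance of the mean:", naive_var_mean, true_var_mean)
print("effective sample size:", effective_n)
```

With positive autocorrelation the expected sample variance falls below the true σ², while the variance of the mean exceeds σ²/*n*, so the data carry less information than the nominal sample size suggests. Shrinking `corr_range` towards zero recovers the independent case.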

However, from the ecological point of view, spatial autocorrelation contains information one might not want to ‘correct for’ in the analysis. The most obvious ecological cause of spatial autocorrelation in species distribution data is dispersal (e.g. Austin, 2002; Epperson, 2005; Karst *et al*., 2005; Lloyd *et al*., 2005; Jones *et al*., 2006). No species has globally dispersing offspring, and the density of propagules and progeny usually decreases with distance. Hence, the observed distribution pattern is the result of environmental factors as well as dispersal, competition and other ecological factors. Guisan and Thuiller (2005, p. 1002) have argued that predictive distribution models therefore need to explicitly model the processes responsible for spatial autocorrelation (see also González-Megías *et al*., 2005). Along a different line of argument, Diniz-Filho *et al*. (2003) see the importance of spatial autocorrelation analysis as depending on spatial scale: large-scale analyses (e.g. at the continental scale) may not need to incorporate SAC, since spatial variation occurs at a much larger scale than the ecological processes of dispersal and biotic interactions. Nevertheless, from an ecological point of view, spatial autocorrelation needs to be incorporated to account for dynamic ecological processes such as dispersal in static, statistical models (Austin, 2002).

Finally, several studies have analysed species distributions in a standard, non-spatial way, and found no evidence for spatial autocorrelation in model residuals (e.g. Higgins *et al*., 1999; Hawkins & Porter, 2003; Bhattarai *et al*., 2004; Flinn *et al*., 2005; Warren *et al*., 2005). The argument here is that ‘A properly parametrized model (i.e. a model with the correct covariates) would reduce the need for the CAR [conditional autoregression] spatial structure’ (B. D. Ripley, comment in Besag *et al*., 1991). Legendre *et al*. (2002) called this ‘spatial dependency’ (i.e. SAC introduced into the response variable due to its dependence on an autocorrelated explanatory variable) as opposed to true spatial autocorrelation arising from ecological processes (e.g. dispersal). Therefore it has been argued that models with a high autocorrelation load in their residuals simply miss important ecological variables (Guisan & Thuiller, 2005). While this may be true, most of the time the ‘correct’ environmental variables will not be available for the analysis at the required spatial resolution (e.g. prey densities, pesticide application intensity, abundances of competitors, parasites and/or hosts) or at the necessary biological accuracy (e.g. temperature dependence of reproduction rate, habitat quality). So, for the time being, we have to manage with surrogate variables for such factors [climate, land use coverages, normalized difference vegetation index (NDVI), distances to specific habitats, etc.]. As a result, model residuals will probably display spatial autocorrelation.
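Whether model residuals ‘display spatial autocorrelation’ is usually judged with a summary statistic. The sketch below uses Moran's *I*, one common choice that is an assumption here and not a method attributed to the studies cited above. It further assumes residuals sampled along a one-dimensional transect with binary weights linking adjacent sites only; the function names and the two residual vectors are hypothetical illustrations, not data.

```python
def transect_weights(n):
    """Binary weights: 1 for adjacent sites on a transect, 0 otherwise."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
            for i in range(n)]


def morans_i(values, weights):
    """Moran's I of `values` for a given spatial weights matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    # Weighted cross-products of deviations between all pairs of sites
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    return (n / total_weight) * cross / sum(d * d for d in dev)


# Hypothetical residuals from a non-spatial model at eight adjacent sites
clustered = [1.0, 1.0, 1.0, 1.0, -1.0, -1.0, -1.0, -1.0]   # similar neighbours
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # dissimilar neighbours

w = transect_weights(len(clustered))
print("clustered residuals:  ", morans_i(clustered, w))
print("alternating residuals:", morans_i(alternating, w))
print("expectation under no autocorrelation:", -1.0 / (len(clustered) - 1))
```

Values well above the null expectation of −1/(*n* − 1) indicate that neighbouring residuals resemble each other, as would be expected when a spatially structured covariate or process is missing from the model. A formal test would additionally require a permutation or analytical null distribution, which this sketch omits.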

In this review, I draw together evidence from the published literature on the effect of incorporating SAC into the analysis of spatial ecological data. I focus on species distribution data, together with spatial patterns in species richness or species performance. Two main questions are addressed: (1) does spatial autocorrelation influence the parameter estimates for species distribution data, or does intrinsic variability overshadow the problem; and (2) does incorporating SAC lead to better-fitting statistical models?