Effects of incorporating spatial autocorrelation into the analysis of species distribution data

Author: Carsten F. Dormann*


*Correspondence: Carsten F. Dormann, Department of Computational Landscape Ecology, UFZ Centre for Environmental Research, Permoserstrasse 15, 04318 Leipzig, Germany. E-mail: carsten.dormann@ufz.de

ABSTRACT

Aim  Spatial autocorrelation (SAC) in data, i.e. the higher similarity of closer samples, is a common phenomenon in ecology. SAC is starting to be considered in the analysis of species distribution data, and over the last 10 years several studies have incorporated SAC into statistical models (here termed ‘spatial models’). Here, I address the question of whether incorporating SAC affects estimates of model coefficients and inference from statistical models.

Methods  I review ecological studies that compare spatial and non-spatial models.

Results  In all cases coefficient estimates for environmental correlates of species distributions were affected by SAC, leading to a mis-estimation of on average c. 25%. Model fit was also improved by incorporating SAC.

Main conclusions  These biased estimates and incorrect model specifications have implications for predicting species occurrences under changing environmental conditions. Spatial models are therefore required to estimate correctly the effects of environmental drivers on species' present distributions, for a statistically unbiased identification of the drivers of distribution, and hence for more accurate forecasts of future distributions.

INTRODUCTION

Ecologists have long been aware of spatial autocorrelation in their ecological data (Sokal & Oden, 1978a,b; Legendre, 1993; Koenig, 1999) and statistical methods for handling such complications have been available for almost as long (summarized in Cressie, 1993). However, these statistically correct methods have only recently been incorporated in distribution analyses. Still, most (i.e. > 80%) publications dealing with the analysis of spatial data in the ecological literature do not attempt to explicitly model spatial autocorrelation (SAC).

The effect of spatial autocorrelation on the interpretation of ordinary statistical methodology in general has been assessed several times (e.g. Liebhold & Sharov, 1998; Lennon, 2000; Dale & Fortin, 2002). It has been shown to influence both coefficients (for case studies see, e.g. Jetz & Rahbek, 2002; Lichstein et al., 2002) and inference in statistical analyses (e.g. Chou & Soret, 1996; Fortin & Payette, 2002). For example, Tognelli and Kelt (2004) found a considerable change in the importance of explanatory variables after incorporating SAC: the second and third most important variables (vapour pressure and wind speed) in their non-spatial model dropped to become the least important in the spatial model.

The regression models employed for species distribution analyses model the expected value for a given set of environmental variables. The observed data points are the expected value plus additional, unexplained noise (the variance). In truly independent data, the variance around the expected value is modelled as $\operatorname{Var}(\bar{x}) = \sigma^2 / n$. In the spatially autocorrelated case, however, this variance has an additional component which specifies the covariance between values of x at locations i and j:

$$\operatorname{Var}(\bar{x}) = \frac{\sigma^2}{n} + \frac{2}{n^2} \sum_{i} \sum_{j > i} \operatorname{Cov}(x_i, x_j)$$

(Haining, 2003; p. 275). This means that the larger the spatial autocorrelation the larger the covariance and the larger also the true variance around the expected value. Ignoring the second term in the above equation will lead to a downward-biased estimate of σ² and accordingly incorrect tests of the significance of x̄ in regression models. The statistical point thus is unambiguous (Griffith & Lagona, 1998): the statistical analysis of spatial data needs to incorporate spatial autocorrelation to avoid the pitfalls of spatial pseudoreplication (Hurlbert, 1984; Legendre, 1993), similar to a nested experiment requiring a mixed-model analysis to be structurally correct.
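The size of the covariance term can be illustrated numerically. The following is a minimal sketch, not taken from Haining (2003): it uses the standard identity that the variance of a sample mean equals σ²/n plus 2/n² times the sum of all pairwise covariances, and it assumes, purely for illustration, equally spaced sites on a transect with an exponential covariance function. The function name `variance_of_mean` and the parameter `corr_range` are hypothetical.

```python
import math


def variance_of_mean(n, sigma2, corr_range):
    """Return (naive, full) variance of the sample mean for n sites.

    Illustrative assumption: sites are equally spaced along a transect and
    Cov(x_i, x_j) = sigma2 * exp(-|i - j| / corr_range), with corr_range > 0.
    """
    naive = sigma2 / n  # variance of the mean under independence
    cov_sum = 0.0
    for i in range(n):
        for j in range(i + 1, n):  # each pair (i, j) with j > i counted once
            cov_sum += sigma2 * math.exp(-abs(i - j) / corr_range)
    full = naive + 2.0 * cov_sum / n**2  # add the pairwise covariance term
    return naive, full


naive, full = variance_of_mean(n=50, sigma2=1.0, corr_range=3.0)
print(f"independent: {naive:.4f}  autocorrelated: {full:.4f}  ratio: {full / naive:.2f}")
```

With positive autocorrelation the full variance exceeds the naive one, so tests that assume independence are too optimistic. As `corr_range` approaches zero the two values coincide.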

However, from the ecological point of view, spatial autocorrelation contains information one might not want to ‘correct for’ in the analysis. The most obvious ecological cause of spatial autocorrelation in species distribution data is dispersal (e.g. Austin, 2002; Epperson, 2005; Karst et al., 2005; Lloyd et al., 2005; Jones et al., 2006). No species has globally dispersing offspring, and the density of propagules and progeny is usually decreasing with distance. Hence, the observed distribution pattern is the result of environmental factors as well as dispersal, competition and other ecological factors. Guisan and Thuiller (2005, p. 1002) have argued that predictive distribution models therefore need to explicitly model processes responsible for spatial autocorrelation (see also González-Megías et al., 2005). Along a different line of argument, Diniz-Filho et al. (2003) see the use of spatial autocorrelation analysis as important at a different spatial scale, insofar as large-scale analyses (e.g. continental scale) may not need to incorporate SAC, since spatial variation occurs at a much larger scale than the ecological processes of dispersal and biotic interactions. However, from an ecological point of view spatial autocorrelation needs to be incorporated to account for dynamic ecological processes such as dispersal in static, statistical models (Austin, 2002).

Finally, several studies have analysed species distributions in a standard, non-spatial way, and found no evidence for spatial autocorrelation in model residuals (e.g. Higgins et al., 1999; Hawkins & Porter, 2003; Bhattarai et al., 2004; Flinn et al., 2005; Warren et al., 2005). The argument here is that ‘A properly parametrized model (i.e. a model with the correct covariates) would reduce the need for the CAR [conditional autoregression] spatial structure’ (B. D. Ripley, comment in Besag et al., 1991). Legendre et al. (2002) called this ‘spatial dependency’ (i.e. SAC introduced into the response variable due to its dependence on an autocorrelated explanatory variable) as opposed to true spatial autocorrelation arising from ecological processes (e.g. dispersal). Therefore it has been argued that models with a high autocorrelation load in their residuals simply miss important ecological variables (Guisan & Thuiller, 2005). While this may be true, most of the time the ‘correct’ environmental variables will not be available for the analysis at the required spatial resolution (e.g. prey densities, pesticide application intensity, abundances of competitors, parasites and/or hosts) or at the necessary biological accuracy (e.g. temperature dependence of reproduction rate, habitat quality). So, for the time being, we have to manage with surrogate variables for such factors [climate, land use coverages, normalized difference vegetation index (NDVI), distances to specific habitats, etc.]. As a result, model residuals will probably display spatial autocorrelation.
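Whether model residuals are spatially autocorrelated can be checked with a global statistic. The sketch below uses Moran's I, a common diagnostic that is a choice made here rather than one prescribed by the studies cited above. It assumes binary neighbour weights (1 for any pair of sites no further apart than `max_dist`, 0 otherwise). The function name and the toy residual patterns are hypothetical.

```python
import math


def morans_i(values, coords, max_dist):
    """Moran's I with binary weights: w_ij = 1 if distance(i, j) <= max_dist, i != j.

    Raises ZeroDivisionError if no site has a neighbour or all values are equal.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j
    w_sum = 0.0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= max_dist:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    ss = sum(x * x for x in dev)
    return (n / w_sum) * (cross / ss)


# Toy residuals on a 4 x 4 lattice (illustrative values only)
coords = [(x, y) for y in range(4) for x in range(4)]
clustered = [1.0 if x < 2 else -1.0 for (x, y) in coords]       # similar neighbours
alternating = [1.0 if (x + y) % 2 == 0 else -1.0 for (x, y) in coords]  # dissimilar neighbours

i_clustered = morans_i(clustered, coords, max_dist=1.0)
i_alternating = morans_i(alternating, coords, max_dist=1.0)
print(i_clustered, i_alternating)
```

Values well above zero indicate that neighbouring residuals resemble each other; values below zero indicate the opposite. A significance test (for example by permutation) would be needed before drawing conclusions from real residuals.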

In this review, I draw together evidence from the published literature on the effect of incorporating SAC into the analysis of spatial ecological data. I focus on species distribution data, together with spatial pattern in species richness or species performance. Two main questions shall be addressed: (1) does spatial autocorrelation influence the parameter estimates for species distribution data, or does intrinsic variability overshadow the problem; (2) does incorporating SAC lead to better-fitting statistical models?

METHODS

I searched for studies comparing a traditional distribution analysis with one incorporating a correction for spatial autocorrelation. As appropriate methods to deal with SAC I considered the following: autologistic regression (Augustin et al., 1996; Gumpertz et al., 1997; Wu & Huffer, 1997), generalized least squares (GLS) regression (including simultaneous and conditional autoregressive models: Cliff & Ord, 1981; Anselin, 1988; Cressie, 1993; Anselin & Bera, 1998; Haining, 2003) and correction of significance levels (Clifford et al., 1989; Dutilleul, 1993). Inclusion criteria for this review were: (1) the distribution of organisms or ecological attributes (such as occurrence, abundance, biomass, species richness, etc.) was analysed; (2) both a traditional analysis [usually a generalized linear model (GLM) or generalized additive model (GAM)] and a spatial model were employed; and (3) results of this comparison were presented (although in some cases only in a qualitative way). Further restrictions (e.g. coefficients for the two model types need to be given; the analysis of residuals needs to be quantified) would have led to an even smaller set of studies.

Web of Science hits on the search phrase ‘“spatial autocorrelation” AND (ecology OR distribution)’ were scanned for relevant studies. Additionally, I searched for papers citing the benchmark study by Lichstein et al. (2002) and specifically for SAC methods (‘generalized least squares’, ‘autoregressive model’ and ‘autologistic regression’). Finally, I examined studies cited in each of the studies I used for this review. My search yielded 21 studies satisfying the inclusion criteria (Table 1). From each study I extracted the following information: (1) arrangement of samples (lattice/points), (2) spatial extent/grain, (3) species/group, (4) response variable (presence/absence, richness, yield), (5) statistical methods, (6) size of neighbourhood, (7) type of autoregressive function, (8) quality of removal of SAC judged by semivariance, etc., (9) coefficients, SE and P-values of covariates, and (10) importance of spatial correction [R², deviance, Akaike information criterion (AIC)]. These data form the basis of the further analysis and review. In publications dealing with more than one species or group (e.g. Sanderson et al., 2005) all modelled independent species were included as separate cases.

Table 1.  Details of studies comparing spatial and non-spatial models in the analysis of distribution data
Study | Sampling* | Region | Area (km²) | Data points | Resolution (km) | Organism group | Response variable | SAC method§ | SAC range [cells] | Weighting function | Coefficient effect | r² (non-spatial) | r² (spatial) | Comments$
1lEurope1.1 × 107441950pPAal[1]????Means across 174 species models; neighbourhood set to 1, not tested or derived
2pGrampian130012771.0mPAal7????SE of coefficients much higher in spatial model
3pCalifornia2.7 × 10534?prSAR?????Tables are wrong (S. Dark, pers. comm.)
4lGermany40001255.5prDut????? 
5lEurope1.2 × 107250220brGLS?exp0.699858821AIC
6lSouth America1.8 × 107374220brGLS7sph0.67719741809AIC
7pReserve, NY0.0052000.025pPADut?????Effective sample size decreased to 4.5–85% (of 200)
8lIberian peninsula6.0 × 10524050prGLS?sph0.28616551588AIC
9p?8.8 × 10−4640.0037pyGLS3sph0.083?? 
10lAfrica2.6 × 1072605100brCAR2?0.2500.760.66 
11pNorway0.0457115mcCAR2.25exp0.266269.4268.1AIC, difference n.s.
pNorway0.0457115mcSAR2.25exp0.301269.4267.7AIC, difference n.s.
pQuebec2.6 × 10−5702.0 × 10−4acGLS0exp0.000??LR test for spatial dependence: P = 0.71
pQuebec2.6 × 10−5702.0 × 10−4acGLS11Gau0.192??LR test for spatial dependence: P = 0.02
pQuebec2.6 × 10−5702.0 × 10−4acGLS30exp0.785??LR test for spatial dependence: P < 0.01
pQuebec2.6 × 10−5702.0 × 10−4acGLS4Gau0.691??LR test for spatial dependence: P < 0.01
12pPennsylvania2500900.64bPAal3inv0.308??Commission and omission errors for spatial and non-spatial models given
13pAppalachians60011770.2bcCAR3inv0.0830.530.55 
pAppalachians60011770.2bcCAR3inv0.2000.220.25 
pAppalachians60011770.2bcCAR3inv0.1550.390.46 
14pSweden37503132500bPAal?10.316144.8141.2AIC
pSweden37503132500bPAal?10.163169.8171.7AIC
15pEngland3760.005iPAal2inv?0.120.14Deviance-based pseudo-R2
pEngland3760.005iPAal2inv?0.090.12Deviance-based pseudo-R2
pEngland3760.005iPAal2inv?0.080.13Deviance-based pseudo-R2
pEngland3760.005iPAal32inv?0.230.32Deviance-based pseudo-R2
pEngland3760.005iPAal4inv?0.140.34Deviance-based pseudo-R2
pEngland3760.005iPAal5inv?0.200.26Deviance-based pseudo-R2
pEngland3760.005moPAal7inv?0.210.34Deviance-based pseudo-R2
pEngland3760.005moPAal20inv?0.130.15Deviance-based pseudo-R2
pEngland3760.005cPAal10inv?0.160.25Deviance-based pseudo-R2
pEngland3760.005cPAal1inv?0.470.54Deviance-based pseudo-R2
16lPortugal99,00099310rPAal2inv???Data are prediction quality across 44 species
17pTunisia45,000530.4bPAal?inv0.32653.426.9AIC
pTunisia45,000530.4bcal?inv0.27439.137.2AIC
18lColorado0.0012560.002pcCRH7????Only change in P-value is given, which is significant in 4/9 cases
19lSouth America1.8 × 1071828100mrCAR3inv0.0960.790.84 
lSouth America1.8 × 1071828100mrSAR3inv0.4380.790.93 
20pOntario0.0868110.001psGLS4exp?0.760.96 
21lSpain10,00015210bPAal[1]10.2520.870.87nb. size set to 1, not tested or derived
lSpain10,00015210bPAal[1]10.3480.800.83nb. size set to 1, not tested or derived
lSpain10,00015210bPAal[1]10.6630.880.90nb. size set to 1, not tested or derived

To quantify the effect of correcting for SAC on model coefficients, I used the following formula to transform spatial and non-spatial model coefficients ($\beta_{s}$ and $\beta_{ns}$, respectively) into a ‘relative SAC effect’ [rSACe, comparable to the relative neighbour effect (RNE) in competition studies; Markham & Chanway, 1996]:

$$\mathrm{rSACe} = \frac{|\beta_{ns} - \beta_{s}|}{\max(|\beta_{ns}|, |\beta_{s}|)}$$

This formula allows for a direct comparison of coefficients from different studies: the larger rSACe is, the greater is the difference between coefficient estimates from the spatial and non-spatial model. Calculation of effect sizes (Gurevitch & Hedges, 1999, 2001) as in a meta-analysis would have been preferable, but in only six studies were estimates of variance, standard deviation or standard error given. The rSACe should hence be interpreted cautiously, since the error on the coefficient estimates may be so large as to make differences insignificant.
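A relative effect of this kind can be sketched in a few lines. The form used below, the absolute difference between the two coefficients divided by the larger of their absolute values, is an assumption made by analogy with the relative neighbour effect of Markham & Chanway (1996); the function name and the example coefficients are hypothetical.

```python
def relative_sac_effect(beta_ns, beta_s):
    """Assumed form: |beta_ns - beta_s| / max(|beta_ns|, |beta_s|).

    Returns 0.0 when both coefficients are zero. The result lies in [0, 1]
    when the coefficients share a sign and can reach 2 when the signs differ.
    """
    largest = max(abs(beta_ns), abs(beta_s))
    if largest == 0.0:
        return 0.0
    return abs(beta_ns - beta_s) / largest


# Hypothetical coefficients of one covariate in a non-spatial and a spatial model
example = relative_sac_effect(beta_ns=0.8, beta_s=0.6)
print(round(example, 3))
```

Because the measure is scaled by the larger coefficient, it is unit-free and can be compared across covariates and studies, but it carries no information about the uncertainty of either estimate.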

Several parameters were log-normally distributed (spatial resolution of study, area of study, rSACe). To summarize these values parametrically, I log10-transformed them before analysis and back-transformed them afterwards. Error bars were calculated on log10-transformed data as well, and back-transformed from log mean ± log SE.
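The back-transformation described here can be sketched as follows, assuming strictly positive input values (zeros would need an offset before taking logarithms). The function name and the example values are hypothetical.

```python
import math
import statistics


def backtransformed_summary(values):
    """Return (centre, lower, upper) from log10 mean and log10 mean -/+ 1 SE.

    All values must be > 0; at least two values are needed for the SE.
    """
    logs = [math.log10(v) for v in values]
    mean_log = statistics.mean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(len(logs))
    centre = 10 ** mean_log             # geometric mean on the original scale
    lower = 10 ** (mean_log - se_log)   # back-transformed mean - 1 SE
    upper = 10 ** (mean_log + se_log)   # back-transformed mean + 1 SE
    return centre, lower, upper


# Illustrative values spanning two orders of magnitude
centre, lower, upper = backtransformed_summary([0.1, 1.0, 10.0])
print(centre, lower, upper)
```

The upper error bar is longer than the lower one on the original scale, which is why back-transformed error bars are asymmetric.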

The effect of correcting for SAC on overall model quality can be quantified by AIC, R² or deviance-based pseudo-R² values given in the publications. However, AIC and R² values are not directly comparable without the deviance of the intercept-only (null) model. Hence these values should be interpreted mainly qualitatively. All analyses were carried out using R version 2.1.1 (R Development Core Team, 2004).

RESULTS

Because 13 studies report on single species (e.g. their occurrence, abundance or some measure of their performance such as yield) while eight recorded species richness (see Table 1), I first analysed whether there was a systematic difference between these response types in terms of relative spatial autocorrelation effect. ANOVA results (F(1,22) = 0.91, P = 0.351) indicate that there was no difference in rSACe. Additionally, in all analyses presented below, response type was included both as a main and an interacting factor, and was non-significant in every case. I therefore present the studies together, irrespective of the type of response assessed.

Study regions ranged in size from 1000 m² (a field experiment: Gotway & Stroup, 1997) to 26 million km² (sub-Saharan Africa: Jetz & Rahbek, 2002). Accordingly, resolution (or grain) ranged from approx. 25 m² (Sanderson et al., 2005) to 220 km² (Diniz-Filho et al., 2003; Diniz-Filho & Bini, 2005).

There was a highly significant correlation between the range of spatial autocorrelation (also called ‘neighbourhood size’, i.e. the distance over which spatial autocorrelation was taking effect), and spatial resolution of a study (Fig. 1). Phrased differently, when correcting for the differences in spatial resolution of the studies, neighbourhood size in terms of multiples of resolution was relatively constant across the entire range of study areas (back-transformed mean of log10-transformed data = 4.29, mean ± 1 SE = 5.00 and 3.68, respectively) and significantly different from 1 (t(1,29) = 9.48, P < 0.001). This clearly shows that spatial autocorrelation was apparent in all cases. Among the vertebrates, mammals (three studies) and birds (seven studies) showed no difference in slope (ANCOVA of log-transformed data; interaction between resolution and group: F(1,6) = 0.004, P = 0.951).

Figure 1.

Neighbourhood size is related to the spatial resolution of a study: y = 0.88 (± 0.055) x − 0.70 (± 0.119); R² = 0.90, P < 0.001, n = 30. Note that values on both axes were log10-transformed. Grey points represent freshwater organisms, all belonging to the data set of Sanderson et al. (2005). Note that the three rightmost points represent analyses of species richness, not species occurrence, abundance or performance.

Effects of spatial autocorrelation on coefficient estimation

Across the 24 cases analysed the mean effect of spatial autocorrelation on regression coefficients (rSACe) was 0.25 (back-transformed mean of log-transformed data; mean + 1 SE = 0.31 and mean − 1 SE = 0.21). In all cases where comparisons between spatial and non-spatial models had been made, coefficients were biased.

There was no evidence for a change of rSACe with region size or resolution of the study (Kendall's correlation τ = 0.210, P = 0.162 and τ = 0.252, P = 0.094, respectively). Nor was there a significant difference in rSACe with respect to the separation into plants and animals or further into plants, vertebrates and invertebrates (in all cases Kruskal–Wallis and ANOVA P > 0.4; see Fig. 2).

Figure 2.

Relative effect of incorporating spatial autocorrelation into models on the model coefficients for different species groups. Bars (± 1 SE) represent back-transformed means of log-transformed relative SAC effects; error bars are hence asymmetric. The difference between plants and vertebrates is not significant (P = 0.223, ANOVA on log(x + 0.01)-transformed data).

Effects of spatial autocorrelation on model quality

Across the 20 cases which provide model R² values (or pseudo-R² in the case of non-normal errors) incorporating SAC increased the mean from 0.43 for non-spatial models to 0.49 for spatial models (Fig. 3). The average increase in adjusted model R² was 0.060 (min. = −0.10, max. = 0.20), with only one of the 20 spatial models performing worse than its non-spatial counterpart. Studies only giving AICs for spatial and non-spatial models report a significant decrease in AIC by incorporation of a SAC correction in six of nine cases.
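These summary figures can be recomputed from the two r² columns of Table 1. The sketch below assumes that the 20 cases are the rows reporting values between 0 and 1 in those columns (studies 10, 13, 15, 19, 20 and 21), read from the table as printed; rows reporting AIC are left out. Variable names are hypothetical.

```python
# (non-spatial, spatial) R2 or pseudo-R2 pairs as read from Table 1
r2_pairs = [
    (0.76, 0.66),                                                # study 10
    (0.53, 0.55), (0.22, 0.25), (0.39, 0.46),                    # study 13
    (0.12, 0.14), (0.09, 0.12), (0.08, 0.13), (0.23, 0.32),      # study 15
    (0.14, 0.34), (0.20, 0.26), (0.21, 0.34), (0.13, 0.15),
    (0.16, 0.25), (0.47, 0.54),
    (0.79, 0.84), (0.79, 0.93),                                  # study 19
    (0.76, 0.96),                                                # study 20
    (0.87, 0.87), (0.80, 0.83), (0.88, 0.90),                    # study 21
]

n_cases = len(r2_pairs)
mean_non_spatial = sum(ns for ns, _ in r2_pairs) / n_cases
mean_spatial = sum(s for _, s in r2_pairs) / n_cases
diffs = [s - ns for ns, s in r2_pairs]           # positive = spatial model fits better
n_worse = sum(1 for d in diffs if d < 0)         # spatial model worse than non-spatial

print(f"cases: {n_cases}")
print(f"mean non-spatial: {mean_non_spatial:.2f}, mean spatial: {mean_spatial:.2f}")
print(f"mean change: {sum(diffs) / n_cases:.3f}, min: {min(diffs):.2f}, max: {max(diffs):.2f}")
print(f"spatial model worse in {n_worse} case(s)")
```

Under this reading the tabulated values give means of about 0.43 and 0.49, a range of differences from −0.10 to 0.20, and a single case in which the spatial model fits worse, in line with the figures reported above. Small deviations in the mean change are to be expected because the table entries are rounded to two decimals.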

Figure 3.

Difference in model fit between non-spatial and spatial models. This difference is significant in a paired Wilcoxon signed-rank test (V = 15, P < 0.01).

In addition to those studies that have been quantitatively summarized above some studies provide strong evidence for the importance of SAC for modelling species distributions, but do not give the investigated parameters in the publications. Augustin et al. (1996) report R² values for four different spatial models, but not for the non-spatial. Araújo and Williams (2000) present a summary of 174 models for European tree species. Their analysis shows, without further quantification, a strong effect of incorporating a correction term for spatial autocorrelation. Kaboli et al. (2006) showed that the spatial autocovariate remained significant in all three of their models on richness, abundance and composition of bird communities in Iran. The Dutilleul correction employed by Deutschewitz et al. (2003), Fang (2005) and Rodríguez et al. (2006) only corrects the significance value of a correlation (in both cases the effect was pronounced), but cannot be applied to model R². The same holds true for Thomson et al. (1996), who use a correction proposed by Clifford et al. (1989; see also Haining, 2003). These four studies together find a pronounced change (loss) in correlation significance in 18 of 35 correlations. Keitt et al. (2002) give likelihood-ratio tests for spatial dependence (significant in two of three cases), but not the deviances of the models. Klute et al. (2002) assess their model quality in terms of commission and omission errors (which were reduced by 50% and 30%, respectively, in spatial models), but not the fit as such. Segurado and Araújo (2004) give model quality across 44 different species being modelled. The spatial versions of their GLMs were significantly better (in terms of Kappa index). This did not hold true for GAMs, which were similar for both spatial and non-spatial types.

For the eight studies that presented both model coefficients and adjusted R² values no correlation between these two parameters was detectable (Kendall's τ = −0.028, P = 0.917). This indicates no systematic relationship between model fit and bias in coefficients with respect to the comparison of spatial and non-spatial models.

DISCUSSION

Ecological causes of spatial autocorrelation

In this review I have considered both single-species distribution and performance data as well as species richness pattern. Obviously the causes of spatial autocorrelation are not necessarily the same for these two types of responses. Since composite measures of assemblages are also affected by the spatial autocorrelation in each of the contributing species, I will first discuss the mechanisms, both endogenous and exogenous (see review by Liebhold et al., 2004), that introduce spatial autocorrelation into the distribution data for a single species.

Exogenous factors, such as climate, soil type, stochastic disturbances or even solar activity (Ranta et al., 1997), may lead to a similar occurrence probability in neighbouring sites, simply because the external factors show a specific autocorrelation pattern. These exogenous factors can ideally be included into the statistical model as environmental covariates, reducing and even removing the residual spatial autocorrelation (e.g. Besag et al., 1991; Higgins et al., 1999; Fisher et al., 2002; Hawkins & Porter, 2003; Bhattarai et al., 2004; Flinn et al., 2005; Warren et al., 2005). If omitted, residuals are likely to display spatial autocorrelation.

Endogenous factors are due to the biology of the species under consideration: dispersal, colonial breeding, home-range size, competition, host availability, predation or parasitization risk, and so forth. Van Horne (2002) showed that mapping eagle occurrences at a spatial scale of 1 km caused a high level of spatial autocorrelation because these birds roam distances of tens of kilometres (see Scott et al., 2002, for further examples). These causes of SAC are usually much more difficult to quantify, and data are often scarce (e.g. Cain et al., 2000). Most of these (most notably the interaction with other species) occur at small spatial scales (e.g. less than 1 km), and dispersal events for plants and insects become extremely rare at larger scales.

For species assemblage data (such as species richness, percentage of endemics, proportion of Red Data Book species, etc.) additional processes may introduce spatial autocorrelation. Most prominent among them is the omission of a variable relevant at the community scale (such as disturbance or management). Also possible are artefacts due to species-specific bias or different recorder density. Taxonomic specialists may, for example, subdivide plant species into more ‘species’ than a common botanist, or a recording team may sample one region more intensively than another, producing a bias unrelated to the environment. Because it is not always possible to correct for such artefacts, they may still show up as residual spatial autocorrelation. Spatial autocorrelation may also be introduced as a consequence of the sampling scheme (Fortin & Dale, 2005), when the regions of a known occurrence are sampled with higher intensity than regions of unclear occurrence (e.g. for Red List species). Finally, ecological interactions between species (competitive replacement) or founder effects in isolated habitat patches (fragmented landscapes, lakes) will add to SAC in assemblage data that is absent from the individual species distribution data.

Endogenous causes of SAC (dispersal, interspecific interactions, disturbance) can be expected to operate at smaller spatial scales (Guisan & Thuiller, 2005), and should hence be relevant for studies with a higher resolution. The present study shows that the range of spatial autocorrelation was very constant across seven orders of magnitude and various groups of organisms. This result suggests that various processes (biological and other) may contribute to spatial autocorrelation and that we cannot assume SAC to be a small-scale problem. As another consequence, resampling data at a coarser spatial scale, as a recommended treatment against SAC (Qi & Wu, 1996; Aubry & Debouzie, 2000; Guisan & Theurillat, 2000; Rossi & Nuutinen, 2004), will not necessarily solve the problem (see also Fortin & Dale, 2005; p. 248 et seq., for other reasons why sampling at coarser scales may not be adequate).

As can be concluded from the limited set of studies available so far, there is no evidence that models for plants and different animal groups differ in their susceptibility to SAC. The effect of SAC on regression coefficients was similar for all groups investigated. However, several studies found no spatial autocorrelation in the model residuals, and hence would not find an effect of including a SAC correction. These were mainly plant distribution analyses (e.g. Higgins et al., 1999; but see Fisher et al., 2002; Hawkins & Porter, 2003; Williams et al., 2005), suggesting that plant distributions are less spatially autocorrelated than those of animals. The data analysed here were too sparse to assess differences between organism groups with respect to the range of SAC (Fig. 1). The large interval of study sizes and resolution is founded mainly on mammal and bird studies, while plants and invertebrates were only available for a much smaller spatial resolution interval. Only one study reported the effect of SAC for insects at the country scale (Luoto et al., 2005).

Effects of spatial autocorrelation on model parameters

Ignoring SAC leads to two kinds of possible errors: biased parameter estimates and overly optimistic standard errors (for examples see Albert & McShane, 1995; Keitt et al., 2002). The latter is important when model results are used for predicting species distributions for environmental change scenarios since it is another intrinsic problem adding to prediction uncertainty. The studies reviewed here are unambiguous with respect to the bias introduced by neglecting spatial autocorrelation: in all cases coefficients were affected. The large difference of close to 25% between the spatial and non-spatial models indicates that this effect is not only common, but also strong. To use spatial autocorrelation in the prediction is extremely difficult: Augustin et al. (1996) used a Gibbs sampler to iteratively recalculate occurrence probabilities using the autocovariate signal they detected in their data. Such an approach is very sensitive to starting conditions and may in many cases not converge (J. McPherson, pers. comm.).

However, not all previous studies which did not incorporate spatial autocorrelation are fundamentally flawed. Models may be wrongly specified because they contain the ‘wrong’ explanatory variables (e.g. they ignore environmental factors that are important), which may lead to far worse models than ignoring spatial autocorrelation (Haining, 2003, p. 273). Hence spatial autocorrelation is only one more potential problem of which ecologists should be aware. This review hopes to have shown that, based on currently available evidence, spatial autocorrelation is relevant across all groups of organisms and all spatial scales. The state of the art simply demands spatial models if residuals are spatially autocorrelated.

Effects of spatial autocorrelation on model building and fit

Why bother with better models? Usually the model fit of the non-spatial version was already good, and the improvement with the SAC correction term was only moderate (Fig. 3). In a so-far unique study, Chou and Soret (1996) demonstrated that models may still change qualitatively after accounting for SAC. They first simplified a non-spatial model, then added an autologistic term, and finally eliminated insignificant model terms again. These final models differed qualitatively from the final non-spatial model, and therefore hint that non-spatial models for autocorrelated data may be inadequately ‘well fitting’. To date, no study has attempted to evaluate whether the more sophisticated ‘true’ spatial models are actually less biased than ‘wrong’ but simple non-spatial models (e.g. on artificial data with known causal relationships and controlled spatial autocorrelation). This is clearly a question deserving future research, since model predictions depend strongly on the bias introduced by the model structure (Reineking & Schröder, 2006).

At present, no standard approach to model construction and simplification under spatial autocorrelation has been established. Most commonly, a non-spatial model is constructed, simplified to contain only significant variables (usually using information-theoretical methods: Burnham & Anderson, 2002) and eventually the spatial version is employed. This is the case both for ordinary least squares (OLS) models that in the final step are reformulated as GLS or conditional/simultaneous autoregressive models, and for GLMs employing an autocovariate (e.g. autologistic regression). Non-spatial OLS and GLM are much faster to run and hence to simplify than their spatial counterparts. Dale and Fortin (2002) argue that autologistic regression corrects the coefficient estimates for the presence of SAC, but does not correct the degrees of freedom employed. This result suggests that models may be wrongly specified, as model simplification will be based on incorrect likelihood ratios. The results of Chou and Soret (1996) illustrate how the autocovariate alters the model structure, but whether this is due to false degrees of freedom or due to the ‘true’ effect of SAC remains obscure.
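The autocovariate used in autologistic regression can be sketched as a weighted average of the response at neighbouring sites. The version below assumes inverse-distance weights within a fixed neighbourhood radius, one of the weighting functions listed in Table 1; it is an illustration, not the exact procedure of any study reviewed here. The function name, the `radius` parameter and the toy data are hypothetical.

```python
import math


def autocovariate(presence, coords, radius):
    """Inverse-distance-weighted mean of neighbours' presence (0/1) within radius.

    Sites without any neighbour inside the radius get an autocovariate of 0.0.
    """
    result = []
    for i, site in enumerate(coords):
        weight_sum = 0.0
        weighted_presence = 0.0
        for j, other in enumerate(coords):
            if i == j:
                continue
            d = math.dist(site, other)
            if 0.0 < d <= radius:
                w = 1.0 / d  # closer neighbours count more
                weight_sum += w
                weighted_presence += w * presence[j]
        result.append(weighted_presence / weight_sum if weight_sum > 0 else 0.0)
    return result


# Four sites along a line, occupied at one end (illustrative data)
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
presence = [1, 1, 0, 0]
autocov = autocovariate(presence, coords, radius=1.0)
print(autocov)
```

The resulting vector would then enter the GLM as one additional covariate. Because it is computed from the response itself, the choice of `radius` and the order in which model terms are simplified both affect the fitted model, which is the issue raised in the paragraph above.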

Outlook

Will species distribution models need to go spatial? Guisan and Thuiller (2005) claim that SAC models are barely transferable in space, because the spatial configuration of the landscape will usually be different in a new site, hence making the SAC correction (which is based on a neighbourhood size) inappropriate. This may be so. However, non-spatial models suffer the same deficiency, since their estimates are biased due to the same landscape configuration issue, only it is not captured by one spatial coefficient but rather influences all coefficient estimates. Landscape configuration is a variable worth considering as a covariate in species distribution models, especially if the landscape metric employed is based on ecological reasoning. This should alleviate some of the problems Guisan and Thuiller foresee. If, however, SAC is due to biological processes such as dispersal or species interactions, spatial models will be transferable (reusing their covariance structure or with Gibbs sampling) and more accurate because the spatial component represents some relevant biological process that cannot be accounted for by environmental covariates.

Some of the studies reviewed here were intended to investigate the effect of correcting for spatial autocorrelation as part of their study design. However, most of them needed to present their findings for both spatial and non-spatial models because the need to incorporate spatial autocorrelation is not yet generally perceived to be an important issue (e.g. it is not even mentioned in the excellent methodological review by Elith et al., 2006). Only very recently have studies used corrections for spatial autocorrelation without having to justify their effect on model quality (e.g. Kühn et al., 2003; Luoto et al., 2005; Newbury & Simon, 2005; Orme et al., 2005; Warren et al., 2005; Worm et al., 2005; Kaboli et al., 2006; Stephenson et al., 2006). The significance of the correction terms in their analyses is evidence that the residuals of their models were indeed spatially autocorrelated. These authors certainly represent a minority of spatial ecologists who have taken SAC matter-of-factly into consideration and included it in their models. For the time being, both spatial and non-spatial models should be presented to aid future comparisons on the effect of incorporating spatial autocorrelation on model estimates.

ACKNOWLEDGEMENTS

This research was instigated by a workshop funded by the German Science Foundation (GZ 4850/191/05). I would like to thank all workshop participants for discussion and Alexandre Diniz-Filho, Justin Calabrese, W. Daniel Kissling, Ingolf Kühn, Björn Reineking, Boris Schröder and two anonymous referees for comments on an earlier version of this manuscript.

BIOSKETCH
Carsten F. Dormann is an ecologist interested in the identification of drivers of species distribution at the landscape level. His focus is on methodological issues of statistical analyses and the investigation of causal mechanisms using field experiments and ecological modelling.

Editor: José Alexandre F. Diniz-Filho
