Distance weighting avoids erroneous scale effects in species-habitat models

Authors


Correspondence author. E-mail: birgit.aue@bio.uni-giessen.de

Summary

1. In species-habitat models, curves describing the changes in correlation strength between habitat cover and species occurrence over a continuum of spatial scales tend to be hump-shaped, and the correlation maximum is generally assumed to indicate the ecologically most meaningful scale of habitat influence on the species. This approach does not take account of the fact that distant habitat is overrated by increasing area with larger buffer sizes whilst habitat influence decays with distance. We devised four levels of distance weighting, down-weighting more distant habitat with increasing realism.

2. We analysed correlation strength across scale (200 m − 50 km) in simulations assuming a Gaussian distance kernel of habitat influence and in empirical data for Eurasian lynx (Lynx lynx) covering Central Europe. Regressions were run with the four levels of distance weighting.

3. In both the simulated and the empirical data, distance weighting transformed humped correlation curves into saturation curves with high correlations at large scales, thereby eliminating a well-defined correlation maximum.

4. We argue that saturation curves naturally reflect the integration of habitat influence over increasing buffer areas. We conclude that without distance weighting, the correlation strength between habitat cover and species occurrence is prone to misinterpretation. We present two approaches to implementing distance weighting in species-habitat regressions.

Introduction

There is consensus among ecologists that species-habitat relationships integrate over several scales (Martinez, Serrano, & Zuberogoitia 2003; Bowyer & Kie 2006; Ma 2008). Different species may perceive the same habitat on different spatial scales in accordance, for example, with the differing activity or locomotory abilities of those species (Wiens 1989). Also, one species may perceive different habitats on different spatial scales, because the ‘ecological neighbourhood’ (Addicott et al. 1987) of this animal may differ between the requirements of daily foraging, of the annual life cycle or of the lifetime movements. Definitions of scale apply to space, time and levels of organization and these dimensions are commonly interrelated (Wu 2007). For example, the organizational hierarchy, i.e. species, populations and individuals, has been shown to interact with grain and extent (Turner 1989; Vaughan & Ormerod 2003) when modelling resource selection functions (Meyer & Thuiller 2006). Generally, empirical evidence for the dependence of species-habitat relationships on spatial scale is drawn from the observation that the correlation between habitat cover and species occurrence is strongest on a particular extent (Steffan-Dewenter et al. 2002; Gabriel, Thies, & Tscharntke 2005; Holland, Fahrig, & Cappuccino 2005; Kleijn & van Langevelde 2006). Typically, a humped correlation curve is obtained when regression results are plotted over a continuum of scales (Graf et al. 2005; Condeso & Meentemeyer 2007; Pinto & Keitt 2008; Schmidt et al. 2008). This has led to two fundamental beliefs in current species-habitat modelling: (a) that the scale of maximum correlation indicates the functional scale of habitat influence (Hirao et al. 2008; Holzschuh, Steffan-Dewenter, & Tscharntke 2008; Dallimer et al. 2010) and (b) that multi-scale analysis is required to understand how species respond to the multiple facets of their environment (Cushman & McGarigal 2002; Boyce 2006; Laliberte et al. 2009). Here, we follow the general concept of scale-dependent species-habitat relationships, and we illustrate that habitat influence across scale should indeed be hump-shaped, but we argue that equating the shape of correlation strength across scale with the shape of habitat influence across scale is misleading.

In the spatial dimension, scale-dependence of habitat influence arises from two mechanisms as depicted in Fig. 1. The first mechanism describes how the amount of habitat, which is available at a given distance, increases linearly (i.e. on the perimeter of a circle of radius r) with distance r from the focal point – assuming a constant proportion of the habitat in the landscape and a homogeneous quality and effectiveness of the habitat with respect to a particular species. Based on this first mechanism, the influence of the habitat on the occurrence of a species will therefore grow with increasing distance (Fig. 1a). Conversely, the second mechanism describes how habitat influence decreases with increasing distance from the focal point because of the degrading forces of dilution and dispersion, or owing to the increasing energy demand of organisms for movement over longer distances. This decay follows a certain distance kernel, for example a Gaussian function (Fig. 1b). The two mechanisms combined result in a distance function showing a humped shape of habitat influence over distance (Fig. 1c). These curves represent density functions of the influence of habitat precisely located at a given distance from the focal point, i.e. the influence of an infinitesimally thin circle around the focal point. The next panels illustrate how habitat influence sums up across the surface of a buffer area around the focal point, i.e. how the distance function integrates over the interval from radius zero to the radius of the buffer margin. Again, both mechanisms are effective: habitat area increases quadratically (proportional to r²) with increasing buffer radius (Fig. 1d), and distance decay implies that the accumulation of habitat influence saturates at large buffer radii (Fig. 1e). The two mechanisms combine into a cumulative distance function with a sigmoid saturation curve (Fig. 1f). All equations are listed in Table 1a.
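The relation between the density form and the cumulative form of the distance function can be checked numerically. The following Python sketch assumes the Gaussian kernel exp(−(r/d)²) implied by the cumulative equations of Table 1a; the values of d and c3 are illustrative assumptions (c3 is chosen so that the cumulative curve approaches 1), and the function names are hypothetical rather than taken from the original analysis.

```python
import math

def distance_function(r, d, c3):
    """Density of habitat influence at distance r: perimeter effect times Gaussian decay."""
    return c3 * r * math.exp(-(r / d) ** 2)

def cumulative_distance_function(r, d, c3):
    """Closed-form integral of distance_function from 0 to r."""
    return c3 * d ** 2 / 2.0 * (1.0 - math.exp(-(r / d) ** 2))

def integrate_midpoint(f, upper, steps=20000):
    """Numerical integral of f over [0, upper] using the midpoint rule."""
    h = upper / steps
    return sum(f((i + 0.5) * h) for i in range(steps)) * h

d = math.sqrt(2.0)   # assumed scaling factor; places the hump at r = d / sqrt(2) = 1
c3 = 2.0 / d ** 2    # assumed constant so that the cumulative curve saturates at 1

r_peak = d / math.sqrt(2.0)  # radius at which the density function is maximal
for radius in (0.5, 1.0, 2.0, 4.0):
    numeric = integrate_midpoint(lambda s: distance_function(s, d, c3), radius)
    closed = cumulative_distance_function(radius, d, c3)
    print(f"r = {radius:3.1f}  numeric = {numeric:.6f}  closed form = {closed:.6f}")
```

Setting the derivative of r·exp(−(r/d)²) to zero gives the hump at r = d/√2, which is why d = √2 corresponds to a distance of maximum influence of 1 in these arbitrary units.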

Figure 1.

 Mechanisms of habitat influence. The strength of habitat influence on a focal point results from the increase of available habitat with distance combined with the simultaneous decay of habitat effect with distance. Panels a, b and c depict the density functions of available habitat and distance decay, and the resulting distance function with a hump-shaped curve. Panels d, e and f depict the corresponding cumulative functions representing the integrals of the density functions over a buffer area as dependent on the radius of the buffer area. Integration results in a sigmoid shape of the cumulative distance function. The cumulative distance function levels off at large buffer radii and approaches a value of 1, i.e. the buffer then extends far enough to capture the habitat’s total influence on the focal point.

Table 1.   Equations of (a) density functions of available habitat, distance decay and the resulting distance function, and equations of cumulative functions representing the integrals of the density functions over a buffer area. (b) Equations of distance kernels and distance functions of distance weightings used in the regression. Each distance function Y(r) was derived from its generating distance kernel y(r) through multiplication by the perimeter effect c1·2πr and subsequent re-parameterization of the constants

(a) Distance effect assumed in the simulations

Available habitat | Distance decay | Distance function
c1·2πr | y(r) = c2·exp(−(r/d)²) | Y(r) = c3·r·exp(−(r/d)²)

Cumulative habitat | Cumulative decay | Cumulative distance function
c1·πr² | y(r) = (c2·d·√π)/2·Erf(r/d) | Y(r) = (c3·d²)/2·(1 − exp(−(r/d)²))

(b) Distance weightings used in the regressions

Name | Distance kernel | Distance function
Gaussian decay | y(r) = c2·exp(−(r/d)²) | Y(r) = b·r·exp(−(r/d)²)
Habitat area | y(r) = c2 | Y(r) = b·r
Proximity index | y(r) = c2·r^(−1) | Y(r) = b
Root index | y(r) = c2·r^(−1) + c3·r^(−1/2) | Y(r) = b1 + b2·r^(1/2)
Gaussian index | y(r) = c2·exp(−(r/d)²) | Y(r) = b·r·exp(−(r/d)²)

Erf(x) is the integral of the Gaussian distribution, given by Erf(x) = (2/√π)·∫ exp(−t²) dt, integrated from t = 0 to x; b = regression parameter, c1–c3 = constants, d = scaling factor of distance, r = radius (distance).

This cumulative distance function has an important interpretation. It shows the influence of increasingly larger areas on the focal point. The influence originating from a small buffer will be small, because the area is small. With increasing buffer size, habitat influence will grow, until it levels off at buffers large enough to capture the relevant part of the habitat. This concept applies to the spatial scale-dependence of any ecological mechanism, be it effective at a resolution of a few millimetres or several kilometres. Clearly, the extent, i.e. the buffer size where saturation of habitat influence occurs, varies depending on the organism and hierarchy of the selected habitat under observation.

Taking the concept of a hump-shaped distance function of habitat influence as a basis, it is obvious why a conventional regression cannot establish a good match between habitat and habitat influence at all scales. By ‘conventional’ regression, we imply a regression model where habitat is not explicitly weighted according to its distance from the focal point. In such a regression, habitat area within a buffer of given radius is specified as predictor variable, and habitat influence given as the numerical response, i.e. occurrence probability or abundance of an animal species, is specified as response variable. This applies irrespective of whether habitat area is expressed in absolute (m², ha) or relative (density, per cent cover) terms because linear transformation does not affect explained variance in a regression. Therefore, if a conventional regression is performed over increasing buffer sizes, the predictor variable of habitat influence is modified by the first mechanism of increasing habitat area, but is not corrected for the second mechanism, namely the decay of habitat influence by distance. Thus, such a modification of the predictor variable implies a very unrealistic distance function of habitat influence (identical to Fig. 1a). Using a small buffer size, this unrealistic distance function of the predictor variable might match the true distance function (from Fig. 1c) quite well. However, the larger part of the habitat’s influence on the focal point is not represented in the predictor variable and cannot be explained in a regression using this predictor. It therefore seems obvious that regressions conducted on too small extents will generally exhibit low explained variance even if the resolution of the data is appropriate to the scale at which the ecological mechanism operates.

If the predictor variable is measured within larger buffers, the proportion of habitat influence inside the buffer increases gradually, and the correlation between habitat area and total habitat influence possibly improves. However, with increasing buffer radius, a mismatch between the curve of the predictor variable and the curve of the dependent variable within the buffer arises. Because all habitat is equally weighted within the buffer, central parts of the habitat are underrated and peripheral parts of the habitat are overrated by the predictor variable, thus compromising the correlation with the dependent variable.

Thus, there is a conflict inherent to the conventional regression technique, between either choosing a small buffer radius and thereby neglecting the influence of distant habitat, or choosing a large buffer radius and thereby overestimating the influence of distant habitat. Our main hypothesis is that introducing a realistic distance weighting into the regression will resolve this conflict and will lead to sigmoid correlation curves across scale. We suspect that the humped shape of a conventional correlation curve across scale (Steffan-Dewenter et al. 2002; Holland, Fahrig, & Cappuccino 2005) results erroneously from a lack of information at small extents and model mismatch at large extents and that humped correlation curves are improper means to elucidate the characteristic scale.

To examine our hypothesis, we devised four levels of distance weighting with increasing capability to match habitat influence at large radii. Using a landscape simulation model in which we had control over the assumed distance function of habitat influence, we compared regressions with the different distance weightings for the scale-dependence of correlation strength. We then applied regression models with the same distance weightings to a data set of Eurasian lynx (Lynx lynx) covering Central Europe to examine whether theoretical findings are of relevance in empirical species-habitat relationships. We conclude by discussing a linear and a nonlinear approach for integrating distance weighting into regression models.

Material and methods

Distance weighting

Four levels of distance weighting were devised to be used in the regressions: (i) no distance weighting: Habitat area, (ii) simple distance weighting: Proximity index, (iii) improved distance weighting: Root index and (iv) realistic distance weighting: Gaussian index. Equations are listed in Table 1b.

All indices were specified in a pixel-based form. ‘Habitat area’ denoted the cover of a particular habitat type h, i.e. b·nh, where nh is the count of pixels classified as habitat h within the buffer area, each pixel is weighted equally irrespective of its distance to the focal point, and b is the regression parameter. The ‘Proximity index’ b·Σ(1/rh) we used is a pixel-based version of the index proposed by Gustafson & Parker (1994) and is proportional to the sum of 1/r over all pixels of habitat h, where r is the distance of each pixel to the focal point, and b is the regression parameter. We complemented these two indices commonly found in the literature by two indices specifically tailored to match theoretical distance functions of habitat influence. We assigned the term ‘Root index’ to an index with two regression parameters, b1·Σ(1/rh) + b2·Σ(1/rh^(1/2)), made up of the reciprocal distance and the reciprocal square root of distance, summed over all pixels of habitat h. We derived this index heuristically through parameter optimization in our simulations, whilst restricting the exponents of the distance rh of both terms to steps of 0·5 (because of computational constraints). The fourth level of distance weighting was named ‘Gaussian index’, b·Σ[exp(−(rh/d)²)], because it implements a Gaussian distance kernel. A Gaussian distance kernel is often observed in natural processes of dispersal and diffusion (Klein et al. 2006; Jongejans, Skarpaas, & Shea 2008). We therefore consider the Gaussian index to be the most realistic of our set of indices. Similar to the Root index, the Gaussian index also had two parameters (b and d). However, whilst the Root index was amenable to building linear regression models, the Gaussian index required nonlinear regression.
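The pixel-based definitions above can be illustrated with a short Python sketch. It assumes a binary habitat raster held as a nested list, measures distances between pixel centres, and skips the focal pixel itself because 1/r is undefined there; the function name, argument names and the toy grid are hypothetical and are not part of the original ArcView extension.

```python
import math

def habitat_indices(habitat, focal, cell_size, radius, d):
    """Return the pixel sums behind the four indices for one buffer.

    habitat   -- 2-D list of 0/1 values (1 = habitat pixel)
    focal     -- (row, col) of the focal pixel
    cell_size -- pixel edge length in map units
    radius    -- buffer radius in map units
    d         -- assumed scaling factor of the Gaussian kernel
    """
    sums = {"area": 0.0, "proximity": 0.0, "root": 0.0, "gaussian": 0.0}
    for i, row in enumerate(habitat):
        for j, value in enumerate(row):
            if not value:
                continue
            r = cell_size * math.hypot(i - focal[0], j - focal[1])
            if r == 0.0 or r > radius:
                continue  # skip the focal pixel (1/r undefined) and pixels outside the buffer
            sums["area"] += 1.0                          # n_h: unweighted pixel count
            sums["proximity"] += 1.0 / r                 # sum of 1/r_h
            sums["root"] += 1.0 / math.sqrt(r)           # sum of 1/sqrt(r_h)
            sums["gaussian"] += math.exp(-(r / d) ** 2)  # sum of exp(-(r_h/d)^2)
    return sums

# Toy example: four habitat pixels, each 100 m from the focal pixel.
grid = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
result = habitat_indices(grid, focal=(1, 1), cell_size=100.0, radius=150.0, d=100.0)
print(result)
```

In a regression, the Root index would enter as two predictors (the "proximity" and "root" sums, with coefficients b1 and b2), whereas the "gaussian" sum has to be recomputed whenever d changes.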

Simulations

To provide a known ‘ground-truth’ against which to compare inference obtained from regression, 1000 artificial landscape sections were generated, each containing a habitat patch at some distance from a focal point. Sizes and distances of the habitat patches were randomly varied across landscape sections, based on a uniform probability distribution over all patch areas. The influence of habitat on the focal points was then modelled through integration over each of the patch areas using a Gaussian distance kernel. The distance of maximum influence was set to rmax = 1 (in arbitrary units) in all simulations.

Regressions were run on the artificial data to investigate how much of the modelled habitat influence could be retrieved by regression, and at which scale. The four levels of distance weighting described earlier were used in the regressions. Habitat area (with the respective distance weighting) was specified as predictor variable and habitat influence (as obtained from the model) was specified as response variable. In the model, habitat influence was completely determined by habitat with no other drivers or stochastic effects obscuring the relationship. Ideally, the regressions should therefore indicate a strong correlation between habitat area and habitat influence.
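A strongly simplified version of this simulation experiment can be sketched in Python. This is not the original simulation code: the square patch shape, the patch size and position ranges, the pixel size and the set of radii are all assumptions made for illustration, and only the unweighted Habitat area and the matching Gaussian index are compared.

```python
import math
import random

def pearson_r2(x, y):
    """Squared Pearson correlation; 0.0 if either variable has no variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy ** 2 / (sxx * syy)

def make_patch(rng, cell=0.1):
    """Distances of the pixels of one square patch from the focal point at the origin."""
    side = rng.randint(2, 12)        # patch edge in pixels (assumed range)
    x0 = rng.uniform(-4.0, 4.0)      # lower-left corner of the patch (assumed range)
    y0 = rng.uniform(-4.0, 4.0)
    return [math.hypot(x0 + (i + 0.5) * cell, y0 + (j + 0.5) * cell)
            for i in range(side) for j in range(side)]

rng = random.Random(1)
d = math.sqrt(2.0)                   # Gaussian scale giving r_max = d / sqrt(2) = 1
patches = [make_patch(rng) for _ in range(1000)]

# "True" habitat influence: Gaussian kernel summed over all pixels of each patch.
influence = [sum(math.exp(-(r / d) ** 2) for r in p) for p in patches]

radii = [0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0, 8.0]
r2_area, r2_gauss = [], []
for radius in radii:
    # Predictors restricted to the pixels inside the buffer of the given radius.
    area = [sum(1.0 for r in p if r <= radius) for p in patches]
    gauss = [sum(math.exp(-(r / d) ** 2) for r in p if r <= radius) for p in patches]
    r2_area.append(pearson_r2(area, influence))
    r2_gauss.append(pearson_r2(gauss, influence))

for radius, a, g in zip(radii, r2_area, r2_gauss):
    print(f"radius {radius:4.1f}   R2 habitat area {a:.3f}   R2 Gaussian index {g:.3f}")
```

At the largest radius every patch lies entirely inside the buffer, so the Gaussian index equals the simulated influence, whereas the unweighted area ignores how far each patch is from the focal point.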

Field data

We chose field data of the Eurasian lynx Lynx lynx (L.) to demonstrate the relevance of distance weighting in empirical species-habitat modelling. The Eurasian lynx is a highly mobile species with a large home-range, and the home-range size is thought to be related to the relevant scale of species-habitat relationships (Lawler & Edwards 2002; Graf et al. 2005). The empirical model is intended to address typical issues commonly encountered in species-habitat regression that were not represented in the simulations: multiple independent variables, several local variables in addition to the matrix habitat variable and binary dependent variables requiring the use of multiple logistic regression rather than simple linear regression.

We used the presence/absence records of Lynx lynx from the European Natura 2000 network of protected areas that are available from the European Nature Information System (EUNIS) of the European Environmental Agency (EEA 2008). Records were restricted to the region of Central Europe to achieve an area with fairly homogeneous climatic conditions for modelling. We assumed plausible absence of the species if any other carnivore species, but not the target species, was reported at the same site. To account for the species’ geographical range within the modelling area, absence points were restricted to lie within a convex hull polygon of the outermost presence points. By this procedure, we obtained 136 presence points and 140 absence points of Lynx lynx within the 581 896 km² of the model region. These figures should provide a realistic representation of the species’ prevalence in the data (Jimenez-Valverde & Lobo 2006; Jimenez-Valverde, Lobo, & Hortal 2009) because the sites of the Natura 2000 network are covered by intensive monitoring schemes.

To obtain local variables for each sampling point, climate variables and altitude were extracted from 1 × 1 km resolution grids provided by WorldClim (Hijmans et al. 2005), and we derived terrain slope from the 1 × 1 km raster version of the European Soil Database (van Liederkerke, Jones, & Panagos 2006) where the variable was provided on a categorical scale (Data S1). All GIS work was conducted using ArcView GIS 9·2 (ESRI Inc., Redlands, CA, USA).

Habitat variables of land-use data were obtained from the CORINE land cover inventory (EEA 2007, 2009) of the years 1990 and 2000 at a resolution of 100 × 100 m. Because Eurasian lynx inhabits forests with complex structure providing cover irrespective of the dominant tree species (Podgorski et al. 2008), we merged broad-leaved, coniferous and mixed forests into a single forest class. We implemented an extension for ArcView GIS 9·2 (ESRI Inc.) to calculate the indices Habitat area, Proximity index and Root index for forest habitat within a given buffer radius around each sampling point. The buffer radius varied in 11 steps in an exponential series ranging from 200 m to 50 km. For each sampling point, the indices were calculated based on the land cover map with the date nearest to the date of the species record. All independent variables were standardized prior to analysis.

From the set of land cover, climate and topographic variables, we selected the most parsimonious model that was kept constant across different radii and indices, thereby allowing a comparison between the different levels of distance weighting. Spatial coordinates with all third degree polynomial terms were also included to account for spatial gradients (Legendre 1993; Lichstein et al. 2002). Because the explanatory power of the habitat variable was expected to vary considerably across different radii and distance weightings, model selection was carried out at the largest radius using the Root index because theory predicted that the Root index would reveal a strong habitat effect at any large radius. To avoid biased estimation because of multicollinearity between independent variables (Graham 2003), we chose a very conservative approach: we checked the correlation matrix of the variables against a threshold of Spearman |rs| < 0·5, and of correlated pairs of variables we retained only those that performed best in the univariate situation (Muller, Schroder, & Muller 2009). The final model was then chosen by a stepwise procedure using both forward and backward strategy based on the Akaike information criterion corrected for small sample sizes (AICc) (Burnham & Anderson 2002). This procedure is less susceptible to bias from remaining multicollinearity than the use of marginal statistics (Graham 2003). The resulting model contained forest habitat as predictor variable of primary interest, which was subjected to different levels of distance weighting. Maximum temperature and altitude and their quadratic terms as well as geographical coordinates including their polynomial interactions were specified as additional local variables.

Natura 2000 sites are unevenly distributed across Central Europe, and therefore, sampling points were weighted according to the logarithm of the Euclidean distance to the nearest neighbour, to avoid over-weighting of clumps of close-by sampling locations. We checked model residuals for spatial autocorrelation to ensure that spatial autocorrelation was properly removed and did not confound effects of distance weighting. For this purpose, we calculated a Global Moran’s I (Fortin & Dale 2005; Dormann et al. 2007) where neighbours were considered up to a distance bound of 4 on the standardized data and were weighted according to a row standardized weighting scheme using the R-package spdep (Bivand 2010). The lag distance was derived by visually exploring the correlograms with the R-package ncf (Bjoernstad 2009).
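The residual check described above can be sketched as follows. This Python function is a minimal illustration of a Global Moran's I with row-standardized binary weights within a distance bound; it is not the spdep implementation, it omits the significance test, and the function name and toy data are hypothetical.

```python
import math

def morans_i(coords, values, bound):
    """Global Moran's I with row-standardised binary weights for neighbours within `bound`."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    num = 0.0
    total_weight = 0.0
    for i in range(n):
        neighbours = [j for j in range(n)
                      if j != i and math.dist(coords[i], coords[j]) <= bound]
        if not neighbours:
            continue  # points without neighbours contribute no weight
        w = 1.0 / len(neighbours)  # row standardisation: the weights of each row sum to 1
        for j in neighbours:
            num += w * dev[i] * dev[j]
        total_weight += 1.0
    denom = sum(x * x for x in dev)
    return (n / total_weight) * (num / denom)

# Toy residuals on a transect of four points spaced one unit apart.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
alternating = morans_i(points, [1.0, -1.0, 1.0, -1.0], bound=1.0)
clustered = morans_i(points, [1.0, 1.0, -1.0, -1.0], bound=1.0)
print(alternating, clustered)
```

Values near the expectation of −1/(n − 1) indicate no spatial autocorrelation in the residuals; strongly positive values indicate clustering of similar residuals.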

Species-habitat models for Lynx lynx were implemented as logistic regressions with a logit link function and a binomial error distribution with presence/absence data specified as dependent variables. The generalized linear models (GLM) using Habitat area, Proximity index and Root index were fitted by the function glm of the statistics package R (R Development Core Team 2009), whereas models using the Gaussian index were calculated with a nonlinear optimization procedure that is described below in detail.

In contrast to the other levels of distance weighting, the Gaussian index Σ[exp(−(rh/d)²)] cannot be pre-calculated, because the parameter d (scaling factor of the radius r) needs to be estimated jointly with the regression coefficients to maximize the log-likelihood. As a consequence, the Gaussian index transformed the linear combination of the logistic regression into a nonlinear equation system. From the resulting value of d obtained at the largest radius, we then derived the Gaussian distance function.

To enable the estimation of d, we programmed a function to which an estimated value of d was passed and which calculated the sums of the habitat pixels with each pixel weighted by exp(−(rh/d)²). This Gaussian index calculating function was then invoked by the optimization process at each iteration prior to determining the log-likelihood of the data. To ensure stable programme execution, the parameter d of the Gaussian index was restricted not to fall below the limiting resolution of the land-use map of 100 m. The optimization was carried out by the general-purpose optimization function optim in the R-package stats with the quasi-Newton optimization algorithm L-BFGS-B, which is capable of handling bounds and is well suited for dense or unstructured problems (Byrd et al. 1995). Prior to optimization, we checked the objective function to be smooth with values of d ranging between 0·01 and 20000 and tested at least four sets of parameter values per radius to start the optimization to avoid local minima. The Gaussian index calculating function was implemented in the Visual Basic.NET programming language (Microsoft Corp., Redmond, WA, USA) to allow for working on raster maps in ArcGIS and was invoked by the optimizing algorithm through the COM-Interface using rcom (Baier & Neuwirth 2007). To enhance the performance of the index calculating function, we embedded many improvements such as avoiding multiple ArcGIS accesses by caching the habitat grid cutouts of interest, calculating a distance grid only once and pre-calculating the formula exp(−(r/d)²) for a given d usable with all sample buffers, but the code was not applicable to parallel processor use. Still, calculating the Gaussian index resulted in a high computational burden, i.e. at the largest radius each iteration required the calculation of 992844 pixels in each of the 276 buffers. The code of both the index calculating function and the optimization procedure is provided in the supplement (Data S2 and S3).
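The joint estimation of d and the regression coefficients can be illustrated with a deliberately simplified Python sketch. It replaces the L-BFGS-B optimization by a coarse grid search over d (a profile likelihood), refits a two-parameter logistic regression at each grid value, and uses simulated data; the data-generating values, the grid and all function names are assumptions for illustration and do not reproduce the supplementary code (Data S2 and S3).

```python
import math
import random

def gaussian_index(distances, d):
    """Sum of Gaussian weights exp(-(r/d)^2) over the habitat pixels of one buffer."""
    return sum(math.exp(-(r / d) ** 2) for r in distances)

def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of a logistic model with linear predictor b0 + b1*x."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = b0 + b1 * xi
        # log(1 + exp(z)), written so that exp() never overflows
        log1pexp = z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))
        total += yi * z - log1pexp
    return total

def fit_logistic(x, y, iterations=50):
    """Maximise the log-likelihood over (b0, b1) by Newton steps with step halving."""
    b0 = b1 = 0.0
    best = log_likelihood(b0, b1, x, y)
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            z = max(-30.0, min(30.0, b0 + b1 * xi))
            p = 1.0 / (1.0 + math.exp(-z))
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:
            break  # information matrix is (near) singular: stop with current values
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        while step > 1e-6 and log_likelihood(b0 + step * s0, b1 + step * s1, x, y) < best:
            step /= 2.0
        new = log_likelihood(b0 + step * s0, b1 + step * s1, x, y)
        if new - best < 1e-10:
            break  # no further improvement
        b0, b1, best = b0 + step * s0, b1 + step * s1, new
    return b0, b1, best

# Simulated sites: one habitat patch per site, pixel distances in metres (all assumed).
rng = random.Random(42)
true_d = 5000.0
sites, presence = [], []
for _ in range(200):
    centre = rng.uniform(1000.0, 15000.0)
    pixels = [centre + rng.uniform(-500.0, 500.0) for _ in range(rng.randint(20, 200))]
    sites.append(pixels)
    eta = -2.0 + 0.1 * gaussian_index(pixels, true_d)
    presence.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-eta)) else 0)

lower_bound = 100.0  # d must not fall below the map resolution, as in the text
grid = [max(lower_bound, 1000.0 * k) for k in range(1, 21)]
profile = []
for d in grid:
    index = [gaussian_index(p, d) for p in sites]  # index must be recomputed for every d
    scale = max(index) or 1.0                      # rescale to [0, 1] for numerical stability
    b0, b1, ll = fit_logistic([v / scale for v in index], presence)
    profile.append((ll, d, b0, b1 / scale))        # slope converted back to index units
best_ll, best_d, best_b0, best_b1 = max(profile)
print(f"estimated d = {best_d:.0f} m, log-likelihood = {best_ll:.2f}")
```

The sketch makes the computational burden visible: the index of every buffer is recalculated for each candidate d, which is the step that a bounded quasi-Newton optimizer has to repeat at each iteration.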

The effects of the different distance weightings were determined by measuring explained deviances (D²) of the habitat. Significance of explained deviances of habitat was tested in a likelihood ratio test (LRT) (Whittaker 1984) by comparing differences of the deviances between the total model and a model without the habitat variable against the χ² statistic, and significance of the total models was tested by comparing deviances of the total model and the null model, i.e. a model containing solely the intercept. Standard errors of regression coefficients were calculated by finite difference approximation of the Hessian matrix, and the Wald statistic was employed to determine significance of the coefficients (Hosmer & Lemeshow 2000). Goodness-of-fit of the total model was measured by the log odds ratio and by the AICc for small sample size (Burnham & Anderson 2002). After applying the model to the known data set and after discriminating predicted presences and absences at a threshold of 0·5, we assessed prediction accuracy with the area under the receiver operating characteristic (ROC) curve (AUC) (Fielding & Bell 1997) and the percentages of correctly predicted presences and absences. The high computational burden of calculating the Gaussian index prohibited extensive cross-validation procedures; nevertheless, these measures are considered a useful method to give a rough impression of the predictive power of the model.
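The model-comparison statistics named above follow standard formulas, sketched here in Python. The chi-square tail probability is implemented only for 1 and 2 degrees of freedom (the two cases that arise for the habitat term), the deviance values in the example are invented for illustration, and the function names are hypothetical.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution for df = 1 or 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch covers df = 1 and df = 2 only")

def likelihood_ratio_test(dev_reduced, dev_full, df):
    """Deviance difference between nested models and its chi-square P value."""
    stat = dev_reduced - dev_full
    return stat, chi2_sf(stat, df)

def explained_deviance(dev_null, dev_model):
    """D2: proportion of the null deviance explained by the model."""
    return 1.0 - dev_model / dev_null

def aicc(log_lik, k, n):
    """Akaike information criterion with the small-sample correction."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

# Hypothetical deviances, not taken from the lynx model.
stat, p_value = likelihood_ratio_test(dev_reduced=160.0, dev_full=150.0, df=1)
print(f"LRT statistic = {stat:.2f}, P = {p_value:.4f}")
print(f"D2 = {explained_deviance(200.0, 150.0):.2f}")
print(f"AICc = {aicc(-100.0, 3, 50):.2f}")
```

The critical values follow directly from chi2_sf: a deviance difference of 3·84 with one degree of freedom and of 5·99 with two degrees of freedom both correspond to P ≈ 0·05.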

Results

Regressions on simulated data

The simulations provided known quantities of habitat influence, e.g. representing the response of an animal species to the habitat. We analysed how much of this known habitat influence could be reconstructed from regressions of response on habitat using the four levels of distance weighting. Fig. 2 summarizes the regression results. Regressions using realistic distance weighting, i.e. a Gaussian weighting set that matches the distance kernel used to create the data, did not exhibit a humped curve of correlation strength (Pearson R²). Instead, they formed a monotonically rising curve of correlation strength, which saturated and remained constantly high at large buffer radii. High correlations of R² ≥ 0·99 were reached at a buffer radius of r0·99 = 1·7.

Figure 2.

 Simulation results. The figure depicts correlation curves across scale obtained from simulated data using four levels of distance weighting: no distance weighting = Habitat area, simple distance weighting = Proximity index, improved distance weighting = Root index and realistic distance weighting = Gaussian index. Habitat area, or one of the weighted indices, was specified as predictor variable and simulated habitat influence, e.g. representing the response of an animal species to the habitat, was specified as response variable. The Gaussian index used for regression matched exactly the distance weighting of the Gaussian distance function used to generate the data.

In contrast to the saturation curve observed with realistic distance weighting, regressions with lower levels of distance weighting exhibited hump-shaped curves of correlation strength, and humps became more prominent at lower weighting levels (Fig. 2). The location of the hump, i.e. the radius of maximum correlation, varied depending on the distance weighting. Regressions using Habitat area, Proximity index and Root index to predict habitat influence had maximum correlations at radii of 1·8, 2·3 and 2·3, respectively.

Habitat area showed by far the highest loss of correlation at large radii and therefore also formed the strongest hump in the curve of correlation strength. The Proximity index followed next, with less but still substantial loss of correlation strength at large radii and with a broader hump. The Root index performed remarkably well and revealed a weak but still noticeable tendency to form a hump. None of these correlation curves matched the shape of the distance function used to generate habitat influence in the simulation model.

Empirical species-habitat model

The effects of distance weighting in the empirical species-habitat model (multiple logistic regressions) for Lynx lynx are depicted in Fig. 3 where correlation curves of the four levels of distance weighting formed a clear series with decreasing tendency to form a hump. However, the Root index performed well, and its correlation curve hinted at a decrease at the second-largest radius only.

Figure 3.

 Empirical model of Lynx lynx (n = 278): proportional explained deviances of forest as primary habitat (a), total model fit (b) and the standardized regression coefficients of the covariates maximum temperature, altitude and the quadratic terms (c). Each panel depicts the correlation curves across scale of four levels of distance weighting: no distance weighting = Habitat area, simple distance weighting = Proximity index, improved distance weighting = Root index and realistic distance weighting = Gaussian index. The Gaussian distance function is the function resulting from the estimation of d (scaling factor of the radius r) of the Gaussian index.

Cross-scale analysis of the partial contribution of the species’ primary habitat to explained deviance revealed that a very different impression of habitat importance ranging from 5 to 17% contribution would be obtained depending on buffer radius and level of distance weighting (Fig. 3a).

The two inferior levels of distance weighting (Habitat area and Proximity index) produced correlation curves with strong humps, suggesting low explanatory contribution of the primary habitat at large buffer radii. The higher levels of distance weighting (Root index and Gaussian index) also exhibited a decline of explained deviance towards larger radii, but to a much lesser degree. Maximum explained deviance of the primary habitat was obtained at relatively large buffer radii (forest at 10–17 km radius) using either the Proximity index or the Root index. The Gaussian distance function (Fig. 3a) obtained from the estimate of d showed a maximum at c. 6500 m, therefore clearly differing from the peaks produced by the correlation curves.

The AICc and the scaled deviance D² of the total model also varied greatly depending on buffer radius and the level of distance weighting (AICc: 871·3–747·4; scaled deviance: 0·60–0·65) and reflected the behaviour of the habitat variable across scale when subjected to different levels of distance weighting (Fig. 3b). In contrast, the AICc of the model leaving out the habitat predictor attained a value of 882·9 constantly across all buffer radii, because these locally retrieved variables were independent of the buffer size.

Fig. 3c illustrates the characteristics of the regression coefficients of the covariates. The standardized regression coefficients of altitude, maximum temperature and the respective quadratic terms vary across scale without forming a series as clear as in Fig. 3a. Still, models using the Gaussian index produced the most stable effect sizes at large radii with all four covariates.

Explained deviance of the habitat variable was significant above a value of 3·84 (LRT, P < 0·05, d.f. = 1) for the models using Habitat area and Proximity index and above a value of 5·99 (LRT, P < 0·05, d.f. = 2) for models using Root index and Gaussian index compared against the reduced model. Thus, explained deviance of the total model that was compared against the null model, as well as the contribution of primary habitat, was significant for all buffer radii and distance weightings.

Checking for spatial autocorrelation resulted in values of Moran’s I between −0·0037 and −0·0032, varying with buffer radius and level of distance weighting, but none of these values was significant (2-sided Global Moran’s I test, expectation = −0·0037, P < 0·05). A comprehensive table of all model results is provided as a supplement (Table S1).
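As an illustration of the statistic behind this check, the sketch below computes global Moran's I for a vector of residuals and a spatial weights matrix, together with its expectation under no autocorrelation, −1/(n − 1). It is a generic textbook formulation, not the software or the weighting scheme used in the study; the function names and the list-of-lists weights format are assumptions.

```python
def morans_i(values, weights):
    """Global Moran's I for values[i] with spatial weights weights[i][j]."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = 0.0
    cross = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # a site is not its own neighbour
            w_sum += weights[i][j]
            cross += weights[i][j] * dev[i] * dev[j]
    squared = sum(d * d for d in dev)
    return (n / w_sum) * (cross / squared)


def expected_morans_i(n):
    """Expectation of Moran's I under the null hypothesis of no autocorrelation."""
    return -1.0 / (n - 1)
```

Observed values close to the expectation, as reported above, indicate that the residuals carry no detectable spatial structure.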

Discussion

Our results suggest that an adequate distance weighting can eliminate the correlation humps that are commonly observed in species-habitat regressions performed over a continuum of scales. Introducing a realistic distance weighting into our regressions consistently transformed humped correlation curves into saturation curves with stable high correlations at large scales. We obtained perfect saturation curves from the regressions on simulated data whilst the correlation curves of the regressions on empirical data closely approached a saturation curve with the most realistic level of distance weighting.

Our simulations were deliberately kept simple, with a primitive patch structure and univariate habitat influence, and simulated data were analysed through simple linear regression. We are therefore confident that our simulation results do not depend on hidden structures of the simulation or regression, but point to general properties of species-habitat correlations. In fact, we analysed several variations of the simulations applying non-uniform distributions of habitat in the landscape and using different distance kernels to create habitat effects, and we always obtained very similar results (data not shown).

One observation from the simulations was that in the regressions with realistic distance weighting, correlation strength increased monotonically with buffer radius. As the buffer radius increased, more habitat was included in the predictor variable of the regression, whilst also realistically down-weighting the influence of peripheral habitat within the buffer. Thereby, distance weighting resolved the scaling conflict inherent to the conventional regression technique between either neglecting effects from habitat outside a small buffer or overestimating the influence of distant habitat within a large buffer.
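The contrast between unweighted and distance-weighted habitat can be made concrete with a small sketch. It assumes a Gaussian weight of the form exp(−r²/(2d²)) and represents the landscape as (distance, habitat area) pairs around a focal point; both the exact kernel form and the function names are assumptions for illustration and are not taken from the Methods.

```python
import math


def habitat_area(cells, radius):
    """Unweighted predictor: total habitat area inside the buffer."""
    return sum(area for r, area in cells if r <= radius)


def gaussian_index(cells, radius, d):
    """Distance-weighted predictor: habitat area down-weighted by a Gaussian kernel."""
    return sum(
        area * math.exp(-(r * r) / (2.0 * d * d))
        for r, area in cells
        if r <= radius
    )


def uniform_rings(max_radius, width=100.0):
    """Hypothetical landscape: concentric rings fully covered by habitat."""
    n_rings = int(max_radius / width)
    return [
        ((k + 0.5) * width, 2.0 * math.pi * (k + 0.5) * width * width)
        for k in range(n_rings)
    ]
```

In such a uniformly covered landscape, `habitat_area` keeps growing with the square of the buffer radius, whereas `gaussian_index` levels off once the radius exceeds a few multiples of `d`. Under the assumed kernel, the contribution of a single ring is proportional to r·exp(−r²/(2d²)) and is largest at r = d.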

Another observation from the simulations was that regressions with no or inadequate distance weighting produced strong correlations between habitat cover and simulated habitat influence at intermediate buffer radii. The radius of maximum correlation differed, however, between weighting indices and always diverged from the radius of maximum habitat influence. These results suggest that interpreting the maximum of the correlation as the maximum of habitat influence (Steffan-Dewenter et al. 2002; Holland, Fahrig, & Cappuccino 2005) is incorrect.

In the empirical model, the strong explanatory power of the land-use type forest well reflects the biology of the Eurasian lynx, which prefers forests of very large extent in mountainous areas (Schadt et al. 2002; Basille et al. 2008, 2009). Basille et al. (2009) showed that abundances of the lynx are closely related to prey density, which was not available as a variable in our data set, but the nonlinear effect of the temperature variable might well indicate an optimum which beneficially affects productivity (Melis et al. 2010) and therefore continuous resource availability. Species distribution models of Lynx lynx have to take into consideration that the species was reintroduced in most parts of Central Europe after being hunted to extinction and has not yet recolonized all regions providing suitable habitat (Niedzialkowska et al. 2006; Molinari-Jobin et al. 2010). Despite the biased distribution, none of the model residuals was significantly correlated to a spatial structure, so we are confident that the observed effects of distance weighting are not confounded by effects of spatial autocorrelation across scale (de Knegt et al. 2010). Still, multicollinearity between environmental predictors and geographical coordinates might have influenced the model selection process. Because the empirical model was built to demonstrate the effects of distance weightings on real data, the outcome of this study does not rely on finding the most meaningful ecological model. Overall, AUC values of the model across different radii and distance weighting were higher than 0·8, indicating that the models predicted species presence and absence reasonably well (Fielding & Bell 1997). We therefore assume that observations on the effects of distance weighting made in these models reliably reflect general patterns of empirical species-habitat regressions.

In the empirical model, distance weighting had the same effects as in the simulations. Increasing levels of distance weighting successively lessened the humps from the correlation curves. The curves of total model fit and of the partial contributions of the habitat were converted into curves that approach saturation very closely when the most realistic distance weighting was applied. We suspect that distance weighting would also enable a more consistent interpretation of different habitats across scales. If more than one landscape matrix variable were included, a model subjected to distance weighting might reveal the influence of different habitats at a single buffer radius.

The covariates were retrieved locally at the focal point and were not amended by distance weighting. Therefore, the model fit of the reduced model without the habitat predictor did not vary across scale. However, in the total model, the regression coefficients of the covariates were affected by the distance weighting applied to the habitat predictor because of the variation shared by the two variables (Whittaker 1984; Borcard, Legendre, & Drapeau 1992). Because the variable elimination prior to the model selection process was very restrictive, we surmise that joint effects rather than remaining multicollinearity caused shared variation, but any combination of both is possible (Lawler & Edwards 2006). Similar to the correlation curves, stable estimates were obtained at large buffer radii when using the most realistic distance weighting.

By building an empirical model, we considered issues that are frequently encountered when modelling empirical data, e.g. multiple logistic regressions on presence/absence data. Spatial autocorrelation and multicollinearity of the covariates are typically present in empirical data sets, at least to some extent. We are confident that the effect of distance weighting does not interfere with any of these issues because the simulations also clearly demonstrate the scale effect. In the simulation, neither autocorrelation nor multicollinearity was present because habitat influence was completely determined by habitat with no other drivers or stochastic effects obscuring the relationship.

We implemented the Root index as an example of a linear weighting index and the Gaussian index as an example of a nonlinear weighting index. Owing to the linear structure of the Root index, its two partial terms can be pre-calculated outside the regression, and it performed systematically better than Habitat area and Proximity index. In the empirical model, the correlation curve attained an almost sigmoid shape when the Root index was employed. Unfortunately, a straightforward interpretation of the Root index is hampered by its two partial regression coefficients taking opposite signs. Thus, to evaluate the explanatory power of the habitat variable across scale, we had to use a likelihood ratio test in which the deviance of the model was compared against the deviance of a reduced model where both partial terms of the Root index were removed.

Owing to its nonlinear nature, the Gaussian index could not be pre-calculated outside the regression. Therefore, all landscape buffers had to be recalculated on each iteration of the logistic regression with a modified parameter d. Calculating the empirical model at the largest buffer radius with the Gaussian index took c. 12 h on a single processor of a 2·66 GHz Core2 Quad CPU. The high computational burden prohibited the use of state-of-the-art techniques in ecological modelling such as bootstrapping (Boos 2003) or hierarchical partitioning (Mac Nally 2002), thereby impeding the general use of the Gaussian index. Nevertheless, in the theoretical framework of this study, the Gaussian index performed better than all other distance weightings used in the empirical model, because it most closely approximated the theoretical saturation curve and attained fairly stable high correlations at large radii, contrary to the Root index.
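The reason for the computational cost can be seen in a simplified sketch: every candidate value of d changes the weights, so the predictor must be rebuilt for all sites before the model can be refitted. The example below swaps in a plain grid search over d with a simple linear fit (scored by r²) in place of the iterative logistic regression described above; the kernel form, the data layout and all function names are assumptions for illustration.

```python
import math


def gaussian_index(cells, d):
    """Gaussian-weighted habitat sum for one site; cells are (distance, area) pairs."""
    return sum(area * math.exp(-(r * r) / (2.0 * d * d)) for r, area in cells)


def r_squared(x, y):
    """Squared Pearson correlation, i.e. r^2 of a simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def profile_d(sites, response, candidates):
    """Return (best_d, best_fit) over a grid of candidate kernel widths."""
    best = None
    for d in candidates:
        # The predictor has to be recomputed for every site at every d;
        # this is the step that makes nonlinear indices expensive.
        x = [gaussian_index(cells, d) for cells in sites]
        fit = r_squared(x, response)
        if best is None or fit > best[1]:
            best = (d, fit)
    return best
```

The cost grows with the number of sites, the number of habitat cells per buffer and the number of values of d that are tried, which is consistent with the long run times reported for the largest buffer radius.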

Still, the empirical model did not exhibit saturation curves as perfect as in the simulations where the Gaussian index used for regression matched exactly the distance weighting of the Gaussian distance function used to generate the data. In empirical data, the Gaussian index might not always satisfactorily map onto the unknown distance function underlying the data as it did with our data. Indeed, we tested different distance kernels in a preliminary study (data not shown) and the Gaussian function appeared to be the most adequate distance kernel for our data and for the purpose of illustrating the potential of distance weighting. However, distance kernels other than those derived from the exponential family, such as power-law functions, might provide a better match in some cases (Klein et al. 2006; Petrovskii & Morozov 2009). To account for mechanisms that heterogeneously influence species occurrence and may cause shifts from positive to negative effects with distance (Wiens, Rotenberry, & Vanhorne 1987; Bailey & Thompson 2007; Deppe & Rotenberry 2008), we might even have to consider non-monotonic functions of distance kernels to disentangle the model mismatch at large scales caused by overrating distant parts of habitat and the intrinsic change of habitat influence.

We conclude that correlation curves across scale form saturation curves which naturally reflect the integration of habitat influence over increasing buffer areas. In contrast, humped correlation curves arise erroneously from a lack of information at small scales and model mismatch at large scales. Specifically, the scales at which species perceive the various characteristics of their habitat may not necessarily be reflected by the shape of a humped correlation curve. Our results suggest that using distance weighting avoids these erroneous effects in species-habitat models. The development of more advanced techniques for distance weighting is highly desirable. This might include the development of more accurate linear indices than our Root index as well as the development of faster nonlinear indices than our Gaussian index, e.g. by implementing search algorithms specifically tailored to distance weighting.

Acknowledgements

This study was part of the BIOPLEX project funded by the German Ministry for Research and Education BMBF, grant number 01LC0620A2. We thank four anonymous reviewers for their helpful comments that led to major clarification of the theory and concepts, and Mick Locke (Göttingen) for relentlessly tracking the Germanisms that we involuntarily introduced into the text.
