## Introduction

Understanding the determinants of species distributions has been a primary interest of ecology since its inception (Darwin 1859; Rapoport 1975). Strong correlations between distributions and physical factors such as climate have long been documented (Andrewartha & Birch 1958; Gaston 2003). They have received renewed interest as ecologists ask how species distributions might respond to climate change (e.g. Thomas *et al.* 2004). This interest has driven the applied field of species distribution modelling (SDM), which builds explicit quantitative models of the relationship between species distributions and the environment in order to predict potential future distributions.

However, current SDMs may not always retrieve the expected relationship between species distributions and climate (Beale, Lennon, & Gimona 2008; Chapman 2010), and the ability of current SDM methods to make plausible predictions is frequently regarded as limited (see Austin 2002; Gaston 2003; Hampe 2004; Guisan & Thuiller 2005; Araújo & Rahbek 2006; Araújo & Guisan 2006; Heikkinen *et al.* 2006; Austin 2007; Soberón 2007; Elith & Leathwick 2009). For example, current methods do not account for the differences between observed species distributions (realised niches) and potential distributions (fundamental niches) (Soberón 2007; Soberón & Nakamura 2009) that are generated by *process errors* such as population dynamics, spatial dynamics and biotic interactions (Guisan & Thuiller 2005; Elith & Leathwick 2009). In addition, there are differences between the reality we wish to model and the available data. These *observation errors* may arise from sampling biases in data collection (e.g. Cabral & Schurr 2010) and from biases introduced by imperfect measurement (Huston 2002; Clark 2005; Araújo & Rahbek 2006; Austin 2007). Because there are multiple sources of both observation and process error, a solution developed for any one source of error should not preclude solutions to the others.

In this paper, we address a problem inherent to all SDM analyses, namely uncertainty in predictor variables attributable to fine-scale environmental variation. Consider a typical application of SDM in which the probability of occurrence of a species is modelled as a function of the predictor variables associated with the survey sites. The predictor variables (such as climate, soil and ‘habitat’ information) are not measured continuously through time and space, but are instead taken from interpolations of weather station data or from gridded climate re-analysis products. Thus, the *true* value of the predictor variable (the value actually affecting the species’ biology) has a noisy relationship with the *measured* (or apparent) value used in the SDM analysis. Uncertainty in predictor variables can be introduced by both instrument error and spatial scaling (Huston 2002).
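To make the distinction concrete, one simple way to write it down is shown below. This is an illustrative assumption (Gaussian measurement error and a logistic response), and the symbols are introduced here rather than taken from the models fitted later:

$$
\begin{aligned}
y_i &\sim \mathrm{Bernoulli}(p_i), & \operatorname{logit}(p_i) &= \beta_0 + \beta_1 x_i,\\
\tilde{x}_i &= x_i + \varepsilon_i, & \varepsilon_i &\sim \mathcal{N}(0, \sigma_\varepsilon^2),
\end{aligned}
$$

where $y_i$ is presence (1) or absence (0) at site $i$, $x_i$ is the true value of the predictor and $\tilde{x}_i$ is its measured value. A conventional SDM instead fits $\operatorname{logit}(p_i) = \beta_0 + \beta_1 \tilde{x}_i$, treating $\tilde{x}_i$ as if it were $x_i$.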

Within a single coarse cartographic grid cell, there are multiple possible local environments: for example, cooler north-facing slopes within grid cells that are warm on average (e.g. Grime 1997), or patches of deep soil in regions of mostly shallow soil. Without accounting for this fine-scale variation, the breadth of a species’ tolerance to a predictor may be poorly estimated (Palmer & Dixon 1990). For example, a species with a true requirement for cool temperatures will appear tolerant of warm temperatures wherever it occurs on cool slopes within a grid cell whose average temperature is warm. The situation is of course more complex than species simply occupying cooler sites in warm grid cells (and vice versa), because other climate-independent factors also vary at fine scales (e.g. soils, habitat loss, fire frequency; see Thomas *et al.* 1999).

These and other consequences of ignoring fine-scale variation are related to a well-explored statistical issue called ‘*regression dilution*’ (or ‘*attenuation bias*’) (e.g. Frost & Thompson 2000; Bartlett, De Stavola, & Frost 2009). Without a correction for errors in a predictor variable, regression analysis attributes the errors in measuring that predictor to variation in the response variable given the predictor. This misattribution of variation tends to flatten the apparent functional responses relative to their true values (Palmer & Dixon 1990; Frost & Thompson 2000). For example, in simple linear regression, random errors in the predictor bias the estimated slope towards zero; because the fitted line still passes through the means of the data, the intercept shifts to compensate (increasing when both the slope and the mean of the predictor are positive). Within SDM, this kind of regression dilution would flatten the estimated functional responses of species to environmental variables compared with their true functional responses.
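The linear case can be checked with a short simulation. This is a sketch under assumed settings (Gaussian true predictor, independent Gaussian measurement error, and a hypothetical helper `attenuated_slope`), not part of the analyses reported later. It compares the least-squares slope on the measured predictor with the classical expectation that the slope shrinks by the reliability ratio $\lambda = \sigma_x^2 / (\sigma_x^2 + \sigma_\varepsilon^2)$.

```python
import numpy as np


def attenuated_slope(n=100_000, true_slope=2.0, true_intercept=1.0,
                     sd_true=1.0, sd_error=1.0, mean_true=3.0, seed=1):
    """Regress y on a noisy copy of x to illustrate regression dilution.

    All settings are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    x_true = rng.normal(mean_true, sd_true, n)        # value the species responds to
    x_obs = x_true + rng.normal(0.0, sd_error, n)     # value available to the analyst
    y = true_intercept + true_slope * x_true + rng.normal(0.0, 0.5, n)
    slope, intercept = np.polyfit(x_obs, y, 1)        # ordinary least squares on x_obs
    reliability = sd_true**2 / (sd_true**2 + sd_error**2)  # expected attenuation factor
    return slope, intercept, reliability


slope, intercept, lam = attenuated_slope()
print(f"fitted slope {slope:.2f}; true slope x reliability = {2.0 * lam:.2f}")
print(f"fitted intercept {intercept:.2f}; true intercept = 1.00")
```

With these settings the fitted slope should sit close to half the true slope, and the intercept should be pulled well above its true value, because the fitted line must still pass through the mean of the data.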

Crucially, regression dilution can have scale-dependent effects for SDMs. We would expect the errors in predictor variables to differ between studies carried out at different scales [e.g. a fine-scale model with locally measured environmental variables vs. coarse-grained models with gridded climate data (Trivedi *et al.* 2008)]. This implies that the strength of regression dilution will depend on the scale of an SDM analysis. For this reason alone, models estimated at one scale cannot be expected to apply at other scales (e.g. Frost & Thompson 2000).
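A small ‘virtual’ SDM sketches this scale dependence. It rests on several assumptions made here for illustration only: a hypothetical species whose logit occurrence probability falls linearly with the temperature of its local site, survey sites whose local temperature scatters around the grid-cell mean, and grid grain represented by the within-cell standard deviation `sd_within`. The analyst sees only the cell mean. Because local values scatter around the measured value, this is closer to what statisticians call Berkson error than to the classical error in the previous sketch. For a nonlinear response such as the logistic, it still flattens the fitted curve. The helper names `fit_logistic` and `virtual_sdm` are hypothetical, and the logistic fit is a plain Newton-Raphson routine.

```python
import numpy as np


def fit_logistic(x, y, iters=25):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by Newton-Raphson."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)                  # score of the log-likelihood
        hess = X.T @ (X * w[:, None])         # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def virtual_sdm(sd_within, n_cells=20_000, b0=0.0, b1=-2.0, seed=2):
    """Simulate a cool-preferring species and fit it to grid-cell means."""
    rng = np.random.default_rng(seed)
    t_cell = rng.normal(0.0, 1.0, n_cells)                   # gridded (measured) temperature
    t_local = t_cell + rng.normal(0.0, sd_within, n_cells)   # temperature at the surveyed site
    p_true = 1.0 / (1.0 + np.exp(-(b0 + b1 * t_local)))
    y = (rng.random(n_cells) < p_true).astype(float)         # presence/absence
    return fit_logistic(t_cell, y)


fine = virtual_sdm(sd_within=0.25)   # fine grain: little unresolved variation
coarse = virtual_sdm(sd_within=1.5)  # coarse grain: much unresolved variation
print(f"true slope -2.00; fine-grain fit {fine[1]:.2f}; coarse-grain fit {coarse[1]:.2f}")
```

Both fits are made to the same species and the same true response, yet the coarse-grain slope should come out noticeably flatter. This is the sense in which a response estimated at one grain need not transfer to another.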

In this paper, we first illustrate the problem of regression dilution by carrying out ‘virtual’ SDM on artificially generated data. This enables us to compare the estimated functional responses with known true responses. We then illustrate a practical and general solution to this problem, inspired by the study of growth-light relationships in saplings by Lichstein *et al.* (2010), which relies on a Bayesian approach to SDM using latent variables (see also Clark 2005). Other methods are available for correcting regression dilution in linear regression and when estimating nonlinear response functions (e.g. Phillips & Davey Smith 1991; Frost & Thompson 2000; Bartlett, De Stavola, & Frost 2009). However, many of these rely on estimating a correction factor from repeated measures or supplementary data, and they are frequently restricted to simple error structures in the predictor variable (Frost & Thompson 2000).
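The core of the latent variable idea can be sketched with a small Gibbs sampler. The simplifying assumptions are ours, not the paper’s. The response is linear-Gaussian rather than an occurrence model, true predictor values are Gaussian across sites, and the measurement-error standard deviation `sd_error` is treated as known. In this all-Gaussian model the slope is not identifiable without such information (or repeated measures). The sampler treats each site’s true predictor value as an unknown and alternates conjugate updates of the latent values, the regression coefficients and the variances. The function `gibbs_latent_regression` is hypothetical. An SDM with presence/absence data and nonlinear responses would typically need Metropolis steps or a general-purpose sampler.

```python
import numpy as np


def gibbs_latent_regression(x_obs, y, sd_error, n_iter=3000, burn=1000, seed=3):
    """Gibbs sampler for the model (sd_error assumed known):
        x_i     ~ N(mu, tau2)             true predictor values (latent)
        x_obs_i ~ N(x_i, sd_error^2)      measured predictor values
        y_i     ~ N(a + b * x_i, sig2)    response
    Flat priors on a, b, mu; priors proportional to 1/sig2 and 1/tau2.
    Returns the post-burn-in draws of the slope b.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    a, b, sig2 = 0.0, 0.0, np.var(y)
    mu, tau2 = x_obs.mean(), np.var(x_obs)
    slope_draws = []
    for it in range(n_iter):
        # 1. latent true predictor values: combine measurement, site distribution and response
        prec = 1.0 / sd_error**2 + 1.0 / tau2 + b**2 / sig2
        mean = (x_obs / sd_error**2 + mu / tau2 + b * (y - a) / sig2) / prec
        x = rng.normal(mean, np.sqrt(1.0 / prec))
        # 2. regression coefficients given the latent values
        X = np.column_stack([np.ones(n), x])
        XtX_inv = np.linalg.inv(X.T @ X)
        beta_hat = XtX_inv @ X.T @ y
        a, b = rng.multivariate_normal(beta_hat, sig2 * XtX_inv)
        # 3. residual variance: inverse-gamma(n/2, SSR/2)
        resid = y - a - b * x
        sig2 = 1.0 / rng.gamma(n / 2.0, 2.0 / (resid @ resid))
        # 4. mean and variance of true predictor values across sites
        mu = rng.normal(x.mean(), np.sqrt(tau2 / n))
        tau2 = 1.0 / rng.gamma(n / 2.0, 2.0 / np.sum((x - mu) ** 2))
        if it >= burn:
            slope_draws.append(b)
    return np.array(slope_draws)


# Hypothetical data: true slope 1, measurement error as large as the true spread.
rng = np.random.default_rng(0)
n = 2000
x_true = rng.normal(0.0, 1.0, n)
x_obs = x_true + rng.normal(0.0, 1.0, n)
y = 0.5 + 1.0 * x_true + rng.normal(0.0, 1.0, n)

ols_slope = np.polyfit(x_obs, y, 1)[0]
b_draws = gibbs_latent_regression(x_obs, y, sd_error=1.0)
print(f"OLS slope {ols_slope:.2f}; posterior mean slope {b_draws.mean():.2f} "
      f"(95% interval {np.percentile(b_draws, 2.5):.2f} to {np.percentile(b_draws, 97.5):.2f})")
```

Here the least-squares fit on the measured predictor should recover roughly half the true slope (the reliability ratio is 0.5). The posterior for the slope should instead be centred near the true value, and its width carries the additional uncertainty about the latent predictor values into the parameter estimate.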

We explore the latent variable approach for four reasons. It is simple to describe and implement (despite often being viewed as an advanced topic). It can potentially be integrated with other solutions to process and observation errors within a Bayesian framework (see Discussion). It is adaptable, applying to both simple and complex forms of response functions and of error structures in predictor variables. Finally, it explicitly accounts for parameter uncertainty, which can then readily be propagated into model predictions. As we demonstrate, the method can be further enhanced for SDM analyses by fitting multiple co-occurring species simultaneously rather than one by one (see Discussion). With this ‘neighbourly advice’, the SDM approaches the ideal situation in which predictor variables are measured perfectly and continuously through space. In the Discussion, we explore how switching to a Bayesian approach could offer solutions to problems of both process and observation error in SDM, which could be integrated into a unified, next-generation set of SDM methods.
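The intuition behind this ‘neighbourly advice’ can be sketched in the simplest case. Suppose, for illustration only, that each species responds linearly to a site’s true predictor value with Gaussian noise, that species are conditionally independent given that value, and that all parameters are known. Under these assumptions the posterior precision of the latent predictor is $1/\sigma_\varepsilon^2 + 1/\tau^2 + \sum_k b_k^2/\sigma_k^2$: each species adds its own term to the contributions of the measurement and of the spread of true values across sites. The function `latent_posterior_sd` and its parameters are hypothetical. Presence/absence data carry less information per species than this Gaussian case. However, the Fisher information contributed by conditionally independent species is still additive.

```python
import numpy as np


def latent_posterior_sd(sd_error, sd_site, slopes, sd_resid):
    """Posterior SD of one site's true predictor value, given its measured
    value and the responses of several species (all parameters known).

    sd_error: measurement-error SD of the predictor
    sd_site:  SD of true predictor values across sites
    slopes:   per-species response slopes b_k
    sd_resid: per-species residual SDs sigma_k (scalar or one per species)
    """
    slopes = np.asarray(slopes, dtype=float)
    sd_resid = np.broadcast_to(np.asarray(sd_resid, dtype=float), slopes.shape)
    precision = (1.0 / sd_error**2 + 1.0 / sd_site**2
                 + np.sum(slopes**2 / sd_resid**2))   # each species adds b_k^2 / sigma_k^2
    return 1.0 / np.sqrt(precision)


for n_species in [0, 1, 5, 20]:
    sd = latent_posterior_sd(sd_error=1.0, sd_site=1.0,
                             slopes=[1.0] * n_species, sd_resid=1.0)
    print(n_species, round(float(sd), 3))
```

As species are added, the posterior uncertainty about each site’s true predictor value shrinks steadily. In this idealised setting, that is the mechanism by which a multispecies fit moves towards the case of a perfectly measured predictor.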