Improving species distribution models for climate change studies: variable selection and scale
Mike P. Austin, CSIRO Sustainable Ecosystems, GPO Box 284, Canberra, ACT 2601, Australia.
Statistical species distribution models (SDMs) are widely used to predict the potential changes in species distributions under climate change scenarios. We suggest that we need to revisit the conceptual framework and ecological assumptions on which the relationship between species distributions and environment is based. We present a simple conceptual framework to examine the selection of environmental predictors and data resolution scales. These vary widely in recent papers, with light inconsistently included in the models. Focusing on light as a necessary component of plant SDMs, we briefly review its dependence on aspect and slope and existing knowledge of its influence on plant distribution. Differences in light regimes between north- and south-facing aspects in temperate latitudes can produce differences in temperature equivalent to moves 200 km polewards. Local topography may create refugia that are not recognized in many climate change SDMs using coarse-scale data. We argue that current assumptions about the selection of predictors and data resolution need further testing. Application of these ideas can clarify many issues of scale, extent and choice of predictors, and potentially improve the use of SDMs for climate change modelling of biodiversity.
The use of statistical species distribution models (SDMs) to predict the potential changes in species distributions under climate change scenarios is now commonplace. There is, however, rising concern at how SDMs are being used to predict the impact of climate change on biodiversity (Heikkinen et al., 2006; Beale et al., 2008; Jeschke & Strayer, 2008; Luoto & Heikkinen, 2008; Morin & Lechowicz, 2008; Pöyry et al., 2008; Randin et al., 2009; Willis & Bhagwat, 2009). Assumptions criticized include: the expectation of equilibrium conditions (Schröder & Seppelt, 2006), ignoring the effects of evolutionary adaptation and limitations on dispersal (Dormann, 2007; Jeschke & Strayer, 2008), and ignoring the acclimatization and persistence ability of species (Willis & Bhagwat, 2009). Other concerns are the disregard of appropriate scales for plant–environment and biotic interactions (Luoto & Heikkinen, 2008; Randin et al., 2009), the lack of modern analogues of future climates (Heikkinen et al., 2006), and the absence of ecophysiological and experimental confirmation of models (Dormann, 2007; although see Austin et al., 2009 for an exception). While many of these issues require substantial research, we suggest that, by re-examining a simple conceptual framework and considering common ecological assumptions, improvements in the use of SDMs for climate change predictions can be made.
Early influential studies of species distributions used only climatic variables (for example, the mean temperature of the coldest month, growing degree-days and an index of annual moisture availability) together with species presence–absence data (Huntley et al., 1995; Sykes et al., 1996). Subsequent authors have often adopted this approach, using only climate predictors in SDMs at a pixel resolution of 10 × 10 km or 50 × 50 km, to explore a variety of methodological comparisons focused on SDMs (e.g. Araújo et al., 2006; Thuiller et al., 2006; Pöyry et al., 2008) and ecological issues (e.g. Luoto et al., 2006; Svenning et al., 2008). Araújo & Guisan (2006, pp. 1681–1682) state that the ‘use of automated solutions to predictor selection ... should not be seen as a substitution for preselecting sound ecophysiological predictors based on deep knowledge of the biogeographical and ecological theory’, and that ‘when predicting the likely impact of climate change on species distributions, across large regions, one can reasonably assume that using climate predictors alone should prove sufficient to assess the main changes in distribution’. However, Araújo & Luoto (2007) call for more stringent evidence in support of the idea that purely climate-based modelling is sufficient to quantify the impact of climate change on species distributions.
We focus on two aspects of assuming that climate variables are sufficient in climate change studies, as follows. (1) How important are non-climatic variables? (2) What is the appropriate resolution for incorporating biophysical and biotic interactions for successful modelling? It is necessary to test these assumptions because they are often inconsistent with available ecological information (Dormann, 2007).
We use a conceptual framework to explore the choice and scale of environmental predictors currently being used in SDMs. We use light as an example of a predictor that is highly dependent on local topography, requiring the use of high-resolution data. We suggest that application of these ideas clarifies problems of the choice of environmental predictors, as well as of the data scale, resolution and extent, which can potentially improve how SDMs can be used for climate change modelling.
Conceptual framework for species distribution modelling
Textbooks conventionally present plant growth and distribution as being controlled by both environmental and biological variables:
Each of these variables can be defined as either a direct or a resource variable sensu Austin (1980). That is, they either have a direct effect on plants (for example, temperature) or are consumed by plants in order to grow (for example, nitrogen). (Note that here we focus on plants, but similar arguments can be advanced for animals.)
Improvement of SDMs requires an explicit statement of why predictors have been selected, what ecophysiological process they are intended to represent, and what assumptions have been made about non-included variables. Many predictors used in SDMs have no direct effect on growth (Austin, 1980), being surrogates for more biologically relevant variables, for example elevation for temperature. An advantage of equation (1) is that it focuses simply on the direct variables affecting plant growth.
A biologically relevant variable must also have a data resolution that is consistent with the scale at which the ecophysiological processes show greatest variation. Some hierarchical frameworks recommend critical scales for different environmental characteristics but assume the use of coarse-scale data for large areas and fine-scale data for small areas (e.g. Pearson & Dawson, 2003). This is not necessarily appropriate, as local topographic factors may modify the climatic impact, particularly when studies are applied to very large areas (see Austin & Van Niel, 2010). Suitable conceptual models of the use of predictors have been presented (Franklin, 1995; Guisan & Zimmermann, 2000) but are often ignored in climate change studies. We suggest that equation (1) or a similar expression should be used to explicitly structure the selection of environmental predictors based on known ecophysiological processes, choice of data resolution, and arguments for the inclusion or exclusion of variables.
An early approach
Perring, in a series of three papers (Perring, 1958, 1959, 1960), provided a very clear example of the explicit reasoning that could be used for the selection of predictors, based on Major’s (1951) suggestion of applying Jenny’s functional factorial approach in soil development (Jenny, 1941) to vegetation:
Vegetation = f(cl, p, r, o, t)
where cl is the regional climate, p is the parent material, r is the topography, o is the biotic factor and t is time.
The topographic and climatic gradients were analysed for individual plant species in chalk grasslands in England and northern France using graphical techniques (Perring, 1958). The study examined variation in regional climates (cl) by stratification, sampling four regions with different rainfall and temperature regimes, and a range of topographic conditions (r) within each region. Parent material (p) was controlled for by sampling only chalk substrates. Species displayed a range of distinctive distributions in relation to slope and aspect (Perring, 1959), and the patterns changed in a consistent manner when compared across climatic regions (Perring, 1960). A strong interaction between regional climate and local topography was critical in influencing the distribution of chalk grassland species. The influence of rainfall and temperature was shown to be conditional on local topography, owing to modification of the local climate by differences in radiation resulting from differences in topographic exposure and aspect. This outcome is significant for SDMs assessing the effects of climate change, as it underlines the importance of local topography in understanding species distributions.
While the conceptual framework presented has serious weaknesses of definition, the absence of a consistent framework for choosing environmental predictors in SDMs makes it difficult to compare models used in climate change predictions (Elith & Leathwick, 2009; Franklin, 2009).
Araújo & Guisan (2006) recognize the challenge of understanding how geographical extent and resolution affect the selection of predictor variables and model performance. Implicit in this recognition is the need for functionally relevant predictors (Elith & Leathwick, 2009). Rather than provide a review, we have selected papers published over the last 20 years in order to display the range of ways in which variables are selected and expressed.
Choice of predictors
Franklin (2009) provides an illustrative table (table 10.3) of the number and types of environmental predictors used in SDMs based on 28 studies. The range in data resolution for the 19 plant studies is from 0.000625 to 2500 km². The number of predictors used ranges from 3 to 36. Climate variables are used in all 19 studies, while substrate predictors are used in only 11 (numbers of predictors range from 1 to 22).
We use a similar illustrative table (Table 1) to explore the use of predictors, structured using the conceptual framework presented in equation (1). It is clear that each study has an implicit conceptual model, but there is little consistency between them. No study includes predictors for all six conceptual variables, although the category ‘other predictors’ may provide surrogates for them. The total number of predictors ranges from 5 to 38. The number of predictors used for each conceptual variable varies greatly; for example, for water Pearman et al. (2008) used one variable while Coudon et al. (2006) used ten. All studies in the table include temperature- and water-related predictors but only six include light. No two studies have identical predictors for temperature or water. This raises the question of how to assess the comparative value of models when no two studies measure the conceptual variables in the same way.
Table 1. Illustrative examples of environmental predictors and data resolution used in species distribution models (SDMs).
|Light||Mean annual radiation (used slope, aspect, location and rainfall)||–||–||1. Clear-day potential solar radiation, spring; 2. Clear-day potential solar radiation, winter||1. Mean annual solar radiation; 2. June radiation (adjusted for correlation)|| |
|Temperature||Mean annual temperature||1. Mean annual temperature; 2. Minimum temperature of coldest month; 3. Growing degree-days >5 °C||1. Mean annual temperature; 2. Minimum temperature of coldest month||1. Mean minimum temperature of coldest month; 2. Mean maximum temperature of warmest month||1. Mean annual temperature; 2. July minimum temperature (adjusted for correlation)||1. Mean minimum temperature of coldest month; 2. Heat units (annual ∑ daily temperatures >18 °C)|
|Water||1. Mean annual rainfall; 2. Rainfall seasonality||1. Mean annual precipitation; 2. Mean summer precipitation; 3. Mean winter precipitation; 4. Potential evapotranspiration||1. Mean annual precipitation; 2. Precipitation December–March; 3. Precipitation June–August||Mean annual precipitation||1. Soil water deficit (used water balance model including precipitation, temperature, radiation, soil texture and rooting depth); 2. Lowest mean monthly humidity||1. Annual potential evaporation; 2. Winter soil moisture days; 3. Summer soil moisture days|
|Nutrients||‘Nutrient index’||–||–||–||–||Soil fertility|
|Other predictors||1. Lithology; 2. Topographic position|| || || ||2. Site drainage||1. Soil sand content; 2. Soil clay content|
|Number of predictors||7||7||5||8||8||8|
|Purpose||Use of SDMs for climate change modelling||Influence of data resolution on predicted extinction under climate change||Testing the ability of SDMs to predict past climate-change impacts||SDMs using climate and terrain||Are species in equilibrium with present environment?||Predicting extinction risk under climate change|
|Resolution||≤0.4 ha plot||Various 64 m²–2500 km²||2500 km²||10 × 10 m plot||0.04, 0.4 ha plots||1-minute grid|
|Region||SE New South Wales Australia||Europe & 2 Swiss regions||Europe||South-western California||New Zealand||Western Cape Floristic region South Africa|
|Author(s)||Austin (1992)||Randin et al. (2009)||Pearman et al. (2008)||Franklin (1998)||Leathwick (1998)||Keith et al. (2008)|
|Light||–||–||Potential clear-sky annual radiation||Summer solar radiation|| ||Annual radiation|
|Temperature||1. Mean summer temperature; 2. Temperature seasonality; 3. Absolute minimum temperature; 4. Growing degree-days >5 °C||1. Mean annual temperature; 2. Mean winter temperature; 3. Growing degree-days until April; 4. Growing degree-days until August||1. Growing degree-days >0 °C; 2. Mean temperature of coldest month; 3. Summer frost frequency||Growing degree-days >0 °C||1. Mean annual temperature; 2. Mean January temperature; 3. Mean July temperature; 4. Mean May–September temperature; 5. Mean difference July and January temperatures||7 predictors including annual temperature, frost days, growing degree-days >6 °C|
|Water||8 predictors including annual precipitation, 4 other measures of precipitation and 3 measures of water balance||1. Annual precipitation; 2. Winter precipitation; 3. Moisture index (equilibrium evapotranspiration-precipitation); Model B (LPJ-GUESS)||1. Summer precipitation days; 2. Mean annual precipitation; 3. Site water balance (used soil properties)||Summer moisture index (precipitation – potential evapotranspiration)||1. Annual precipitation; 2. Mean May–September precipitation||10 predictors including annual rainfall, autumn rainfall, length of drought period, 3 measures of evapotranspiration, 4 measures of water balance|
|Nutrients||–||–||Nutrient index||–||Potential soil productivity||1. Calcium content; 2. Magnesium content; 3. Potassium content|
|Biota||–||–||1. Percentage broad-leaf cover; 2. Percentage conifer cover||–||4 land cover classes||–|
|Other predictors||–||–||1. Slope; 2. Snow cover (used climate variables adjusted for slope, aspect and shade)|| ||2. 5 measures of elevation; 3. 9 soil classes; 4. 10 soil properties||2. Base saturation rate; 3. C/N ratio|
|Number of predictors||12||A, 7; B, >100||13||5||38||24?|
|Purpose||Possible existence of glacial refugia||Testing value of vegetation growth model LPJ-GUESS in SDM predicting future distribution under climate change||Is SDM predictive success a function of species traits?||To test transferability of SDM between regions||Estimating potential habitat under six climate scenarios||Importance of soil nutritional factors for SDM|
|Resolution||2500 km²||2500 km²||0.01 km²||16 m², 5–30 m² plots||400 km²||c. 400 m²|
|Region||Europe||Europe||Switzerland||Parts of Switzerland and Austria||Eastern United States||France|
|Author(s)||Svenning et al. (2008)||Rickebusch et al. (2008)||Guisan et al. (2007)||Randin et al. (2006)||Iverson et al. (2008)||Coudon et al. (2006)|
The illustrative table (Table 1) is in our opinion representative of the predictors currently used in SDMs and of the way in which they are expressed. We suggest that as much attention needs to be given to the choice of environmental predictors and their estimation from biophysical process models as is given to comparing statistical methods for SDMs. While the choice of predictors depends on available data, failure to incorporate an influential predictor reduces both model performance and the relevance of model outcomes.
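One way to make the implicit conceptual model of a study explicit is to tally its predictors against the conceptual variables. The following is a minimal sketch for two of the studies in Table 1. The predictor counts are read off the table. The list of conceptual variables is taken from the row labels of Table 1, which is an assumption about the terms of equation (1), and the names `CONCEPTUAL_VARIABLES`, `studies` and `summarize` are purely illustrative.

```python
# Conceptual variables, assumed here to match the row labels of Table 1.
CONCEPTUAL_VARIABLES = ("light", "temperature", "water", "nutrients", "biota")

# Number of predictors per conceptual variable, read off Table 1.
studies = {
    "Austin (1992)": {
        "light": 1, "temperature": 1, "water": 2, "nutrients": 1, "other": 2,
    },
    "Randin et al. (2009)": {"temperature": 3, "water": 4},
}


def summarize(counts):
    """Return (total predictors, conceptual variables with no predictor)."""
    total = sum(counts.values())
    missing = [v for v in CONCEPTUAL_VARIABLES if counts.get(v, 0) == 0]
    return total, missing


for name, counts in studies.items():
    total, missing = summarize(counts)
    print(f"{name}: {total} predictors; not represented: {', '.join(missing)}")
```

Both studies use seven predictors, yet they cover different conceptual variables, which is the kind of inconsistency a tally of this sort would expose before models are compared.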
Plot size in the papers varied from 16 m² to 2500 km² (Table 1). The larger grid cell sizes reflect the interest in climate change, and, importantly, the availability of distribution data at a grid cell size of 50 × 50 km. The assumption that only climate variables are important when the extent of a study is very large leads to the corollary that local environmental heterogeneity can be ignored in large-area studies. Since these assumptions were recognized (Huntley et al., 1995), they do not appear to have been explicitly tested, although see Coudon et al. (2006). Local heterogeneity is important for light (see below) and for soil properties such as nutrients.
Soil properties vary with lithology and along topographic gradients from ridge to gully. The magnitude of these local differences in soils will equal or exceed that between 50-km grid cells. Coudon et al. (2006) explicitly tested whether including soil nutrient variables with climate variables improved a model predicting the distribution of the tree Acer campestre across the whole of France. It did. Such soil heterogeneity may define local refugia for species, confounding predictions of distribution under climate change.
Environmental predictors: light as a critical example
We use light to illustrate how we suggest each variable should be considered.
Careful logical consideration of the biophysical processes associated with light should determine the level of resolution required to model the impact of light on species (Grace, 1987; Franklin, 2009). For all the variables in equation (1), knowledge exists concerning the distributions and biological responses of species to the variables, and there is an understanding of the biophysical processes linking plants to direct and indirect predictors. A careful collation of this knowledge is needed when selecting predictors. Light expressed as solar radiation has long been known to influence plant distribution (e.g. Boyko, 1947) based on known biophysical processes (e.g. Austin, 1972). There is now an extensive literature on the calculation of radiation models (see Wilson & Gallant, 2000) and their use in SDMs (Franklin, 2009, table 5.1), and ecophysiological studies have demonstrated the thermal, photosynthetic, photomorphogenetic and mutagenic effects of radiation on plants (Jones, 1992).
Grace (1987) examined the climate tolerance of plants, drawing attention to species distributions that may be limited by low summer temperatures. Cirsium acaule, near its northern limits in the UK, occurs mainly on south-facing slopes. Failure to set seed appears to be the main limitation to northern expansion, and reproductive success can be improved experimentally. Grace quotes long-term results from Rorison et al. (1986) that the summer mean temperature (at 20 mm above soil surface) was 3 °C higher on a south-facing slope than on a north-facing slope for a similar community in the Derbyshire Dales, a difference equivalent to a latitudinal shift of several hundred kilometres. Rorison et al. (1986) recorded differences of 12 °C in maximum air temperature between north- and south-facing slopes for April. The magnitude of these differences implies that local topographic variability results in extreme differences in growing conditions for plants.
Authors treat radiation very differently, namely as a direct variable or as a component in a water-balance model, or they ignore it (Table 1). This is despite the ready availability of biophysical process models used to create variables representing light, providing proximal direct predictors for SDMs (Franklin, 1998; Leathwick, 1998; Coudon et al., 2006; Randin et al., 2006; Guisan et al., 2007). Only two papers (Franklin, 1998; Randin et al., 2006) use global solar radiation locally adjusted for slope, aspect and shade (i.e. light cut-off owing to horizon effects). Rickebusch et al. (2008) examined the incorporation of various components of vegetation growth models in SDMs for predicting response to climate change. However, their standard SDM does not explicitly use a radiation variable for comparison, while their vegetation growth model uses latitude to estimate radiation from sunshine hours to calculate photosynthesis (Hickler et al., 2009).
The importance of these local effects is supported by recent studies. A solar radiation model predicts that daily June radiation for a south-facing slope of 40° at latitude 36.5° S is 5% of that received by a similar north-facing slope (Kumar et al., 1997, figure 9). Lassueur et al. (2006) showed that the indirect predictors slope and aspect improved species modelling using a high-resolution digital elevation model (DEM). They concluded that slope at a 100-m resolution and aspect at 20-m resolution maximized predictive power, and recommended their use for predicting potential refugia in climate change scenarios.
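The geometry behind such contrasts can be illustrated with a short calculation. The sketch below is not the radiation model of Kumar et al. (1997). It computes only the cosine of the angle between the direct solar beam and the slope normal at solar noon, and ignores diffuse radiation, atmospheric attenuation, horizon shading and integration over the day. The function name `noon_beam_cosine` and the solstice declination of 23.44° are assumptions introduced for illustration.

```python
import math


def noon_beam_cosine(lat_deg, decl_deg, slope_deg, aspect_deg):
    """Cosine of the angle between the noon solar beam and the slope normal.

    lat_deg is positive north of the equator; aspect_deg is the compass
    direction the slope faces (0 = north, 180 = south). A value <= 0 means
    the surface receives no direct beam at solar noon.
    """
    lat, decl, slope = (math.radians(v) for v in (lat_deg, decl_deg, slope_deg))
    # Surface azimuth measured from due south, the usual solar-geometry convention.
    gamma = math.radians(aspect_deg - 180.0)
    diff = lat - decl
    return (math.cos(slope) * math.cos(diff)
            + math.sin(slope) * math.cos(gamma) * math.sin(diff))


JUNE_SOLSTICE_DECL = 23.44  # approximate solar declination in degrees

north_facing = noon_beam_cosine(-36.5, JUNE_SOLSTICE_DECL, 40.0, 0.0)
south_facing = noon_beam_cosine(-36.5, JUNE_SOLSTICE_DECL, 40.0, 180.0)
print(f"north-facing: {north_facing:.2f}, south-facing: {south_facing:.2f}")
```

Under these simplifications the north-facing slope is close to perpendicular to the noon beam, while the south-facing slope returns a negative value, meaning that it receives no direct beam at noon and depends on diffuse radiation. This is qualitatively consistent with the very small fraction reported above.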
This review of light as a predictor indicates that it constitutes a critical influence on plant distribution. These effects will be greatest at a local scale but vary along climatic gradients (Boyko, 1947; Perring, 1960). It also highlights the need to consider an appropriate scale for predictions, as topographic complexity is degraded by the use of coarse-resolution DEMs. The value of including radiation predictors in SDMs at an appropriate scale (i.e. including both local and regional effects) needs to be tested.
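The loss of topographic detail with coarser data can be shown with a deliberately simple example. The sketch below uses a hypothetical one-dimensional transect (50 m amplitude ridges and gullies repeating every 200 m, sampled in 10-m cells) rather than a real DEM, and the helper names `max_slope_deg` and `coarsen` are assumptions. It compares the steepest slope derived from the fine cells with that derived after averaging to 200-m cells.

```python
import math


def max_slope_deg(elev, cell_size):
    """Steepest slope (degrees) along a transect, from adjacent-cell differences."""
    return max(
        math.degrees(math.atan(abs(b - a) / cell_size))
        for a, b in zip(elev, elev[1:])
    )


def coarsen(elev, factor):
    """Average consecutive blocks of `factor` cells into one coarse cell."""
    return [
        sum(elev[i:i + factor]) / factor
        for i in range(0, len(elev) - factor + 1, factor)
    ]


# Hypothetical terrain: 2 km transect, 10 m cells, ridge-to-ridge spacing 200 m.
fine = [50.0 * math.sin(2 * math.pi * i / 20) for i in range(200)]
coarse = coarsen(fine, 20)  # aggregate to 200 m cells

fine_slope = max_slope_deg(fine, 10.0)
coarse_slope = max_slope_deg(coarse, 200.0)
print(f"10 m cells: {fine_slope:.1f} deg; 200 m cells: {coarse_slope:.1f} deg")
```

In this constructed case the coarse cells each span a full ridge and gully, so the steep slopes present in the fine data vanish almost entirely. Real terrain would lose detail less completely, but slope- and aspect-dependent predictors such as radiation are affected in the same direction.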
Elith & Leathwick (2009) in their review state that ‘we believe that a more wide-ranging approach to linking theory, data and models would bring substantial benefits’. A comparison of recent papers using the simple conceptual framework proposed in equation (1) demonstrates how inconsistent the links between known ecological processes, environmental data and SDMs are (Table 1). A consideration of biophysical models of solar radiation indicates that local variation in radiation will be greater than regional variation, leading to local climate refugia. Similar conclusions can be drawn from the known local variations in soil properties. This will have major implications for those climate change SDMs that use only regional climate predictors at a resolution of 100 km² or greater.
Knowledge of biophysical processes also suggests there is a natural scale of resolution for an SDM, namely that which maximizes the relevant environmental differences between plots. Tests of whether local or regional differences are of greater magnitude need to be made. We suggest that local topography may create critical climatic refugia for species that are important even in studies of very large areas. We conclude that the relative importance of climatic and non-climatic predictors is best tested at high resolution and large extent before it is assumed that climate predictors alone are ‘sufficient to assess main changes in distribution’ (Araújo & Guisan, 2006).
Numerous problems need to be addressed by species distribution modellers in order to improve predictions under future climates. Some progress can be made if there is consistency in the choice of variables and careful consideration of appropriate scales relative to the organism being studied.
Mike P. Austin is a plant ecologist who has a long-term interest in the analysis of vegetation–environment relationships, with an emphasis on the role of ecological theory focusing on eucalypt forests. His other research interests have been in experimental plant multi-species competition along environmental gradients and the application of biodiversity studies to regional natural resource management.
Kimberly P. Van Niel is a biogeographer and spatial ecologist. Her research focuses on the physical and biotic interactions that influence species and assemblage distribution patterns, encompassing both terrestrial and marine ecosystems. She studies mainly terrestrial and marine plants, sessile organisms and demersal fishes.