Study area and predictor variables
The study was carried out in the Yorkshire Dales National Park (see Fig. S1a), an upland area in northern England, UK. The predictor variables used in the simulation were the same as those chosen for the case study on curlew (Numenius arquata).
Variables that directly influence a species' distribution, such as food availability or predation risk, are often difficult to record, particularly in landscape-scale studies. Instead, we used habitat variables that were more readily available for large areas and were likely to be associated with these more direct, but unmeasured, predictors. We sought to describe food availability (through possible associations with soil type, elevation, slope, aspect and rainfall), microclimatic conditions (through elevation, aspect and rainfall), habitat structure (through livestock numbers), disturbance (using settlements and paths/roads) and perceived predation risk (using settlements, field walls and viewshed). A more detailed description of the possible associations between these factors and the predictors used is given in Appendix S2.
Elevation was obtained as a Digital Terrain Model (DTM) at 50 m resolution (50 × 50 m) (Land-Form PANORAMA downloaded from the EDINA Digimap OS service; http://edina.ac.uk/digimap. © Crown Copyright/database right 1993. An Ordnance Survey/EDINA supplied service). Aspect and slope were calculated from the DTM. Soil data were extracted from the simplified version of the National Soil Map data set [NATMAP soilscapes; 1:125000 vector map (NSRI 2011)] after conversion into a 50 m resolution raster (NATMAP soilscapes © Cranfield University (NSRI) and for the Controller of HMSO 2009). The area of settlement (houses, gardens, barns etc.) was identified from the Ordnance Survey MasterMap (http://www.ordnancesurvey.co.uk/oswebsite/products/os-mastermap/index.html). Paths/roads (including railways) outside settlements were collated from the MasterMap and data on public rights of way held by the Yorkshire Dales National Park Authority (YDNPA). ‘Obstructing features’ in the MasterMap (line features which were mainly walls) were used as field walls.
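The text does not say which algorithm was used to derive slope and aspect from the DTM. A minimal sketch, assuming a north-up 50 m grid (rows running north to south) and simple central differences via `numpy.gradient` (GIS packages often use Horn's method instead), could look like this; the function name `slope_aspect` is hypothetical.

```python
import numpy as np

def slope_aspect(dtm, cell_m=50.0):
    """Slope (degrees) and aspect (degrees clockwise from north) of a north-up DTM.

    Assumes row index increases southwards and column index increases eastwards.
    Aspect is the compass direction the slope faces (the downslope direction);
    it is NaN where the surface is perfectly flat.
    """
    dz_drow, dz_dcol = np.gradient(np.asarray(dtm, dtype=float), cell_m)
    dz_deast = dz_dcol
    dz_dnorth = -dz_drow  # rows run north to south, so flip the sign
    slope_deg = np.degrees(np.arctan(np.hypot(dz_deast, dz_dnorth)))
    # Downslope vector is (-dz_deast, -dz_dnorth); azimuth = atan2(east, north).
    aspect_deg = np.degrees(np.arctan2(-dz_deast, -dz_dnorth)) % 360.0
    aspect_deg = np.where(slope_deg == 0, np.nan, aspect_deg)
    return slope_deg, aspect_deg
```

For a surface rising 50 m per 50 m cell towards the east, this returns a slope of 45° and an aspect of 270° (west-facing) everywhere.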
Average annual rainfall for the period 1961–90 was obtained at 5 km resolution (Met Office, www.metoffice.gov.uk). Livestock numbers (sheep and cattle) in 2004 (AGCensus data downloaded from the EDINA Digimap OS service; http://edina.ac.uk/digimap. © Crown Copyright/database right 2009. An Ordnance Survey/EDINA supplied service) were obtained at 2 km resolution.
Curlew were repeatedly surveyed from sections of public rights of way (henceforth called transects) in 244 observation units (see ‘Survey data’ below), and two transect-specific predictors were included. First, because of the hilly terrain, parts of the observation units were not visible from the transect line, potentially decreasing the probability of recording curlew. The visible area of each observation unit was mapped approximately on a field map (scale ca. 1:8000), later digitized, and its area calculated. Second, the number of walking groups (a potential source of disturbance) was recorded during each transect survey, and the mean across all repeat surveys was computed.
Curlew are highly mobile birds, so we expected the quantity of a predictor (e.g. the area of low elevation) within a scale to be more important in explaining the distribution of the species than the size of patches of that predictor or the connectivity between patches; we therefore restricted our analysis to the quantity of predictors within scales. For elevation, aspect, slope and soil type, this required creating categories. As we had no a priori knowledge of how categories should be grouped (we hypothesised, for example, that the more wind-exposed west-facing areas (Met Office 2010) may be less favourable for thermoregulation than south-facing areas, but did not know in which category to place south-west-facing areas), we initially created a fine division of categories for elevation (0–200, 200–300, 300–400, 400–500, 500–600 and 600–850 m), aspect (flat, north, north-east, east, south-east, south, south-west, west and north-west) and slope (0–2, 2–5, 5–10, 10–15, 15–25 and 25–60°). In the curlew model, for adjacent categories (e.g. south and south-east) with good model fit in repeated regressions (see below) at the same spatial scale, we plotted the component smooth function of a generalized additive model on the scale of the linear predictor, that is, the smoothed relationship between the response and the predictor. If the plots were similar, the adjacent categories were grouped. Soil data were grouped according to the attributes of the data set (NSRI 2011): by texture (peat or loam); by fertility and lime status (lime-rich, or very low, low, moderate or high fertility, with very low and low further grouped as unfertile soils and moderate and high as fertile soils); and by drainage (well drained, impeded drainage or wet, with the latter two further grouped as moist soils).
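Assigning each 50 m pixel to the fine categories listed above can be sketched with NumPy. The class boundaries follow the text; treating each lower bound as inclusive, placing the eight aspect classes in 45° sectors centred on the compass points, and calling a pixel ‘flat’ when its aspect is undefined or its slope is below a hypothetical `flat_slope_deg` threshold are all assumptions, since the paper does not specify them.

```python
import numpy as np

# Class boundaries as listed in the text (elevation in m, slope in degrees).
ELEVATION_EDGES = [0, 200, 300, 400, 500, 600, 850]
SLOPE_EDGES = [0, 2, 5, 10, 15, 25, 60]
ASPECT_LABELS = np.array(["north", "north-east", "east", "south-east",
                          "south", "south-west", "west", "north-west"])

def bin_class(values, edges):
    """Class index k for edges[k] <= v < edges[k+1] (top class also includes edges[-1]).

    Values outside [edges[0], edges[-1]] or NaN get -1.
    """
    v = np.asarray(values, dtype=float)
    idx = np.digitize(v, edges[1:-1])  # inner edges only -> 0..len(edges)-2
    outside = (v < edges[0]) | (v > edges[-1]) | np.isnan(v)
    return np.where(outside, -1, idx)

def aspect_class(aspect_deg, slope_deg, flat_slope_deg=1.0):
    """Label each pixel with one of eight 45-degree aspect sectors, or 'flat'."""
    a = np.asarray(aspect_deg, dtype=float)
    s = np.asarray(slope_deg, dtype=float)
    # Shift by 22.5 degrees so that 'north' covers 337.5-22.5 degrees.
    sector = (((np.nan_to_num(a, nan=0.0) + 22.5) // 45) % 8).astype(int)
    flat = np.isnan(a) | (s < flat_slope_deg)  # hypothetical flatness rule
    return np.where(flat, "flat", ASPECT_LABELS[sector])
```

Keeping the fine classes as integer indices makes it easy to merge adjacent classes later (for example, south and south-east) by summing their buffer areas.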
For each pixel in a raster map (50 m resolution) of the study area, we calculated the area of each elevation, aspect, slope and soil category, as well as the area of settlements, within circular buffers of radius (i.e. scale) 0·25, 0·5, 0·75, 1, 1·5, 2, 2·5, 3, 4, 5, 6, 7, 8, 9 and 10 km; the length of paths/roads within radii of 250, 500, 750, 1000, 1500 and 2000 m; and the length of walls within radii of 250, 500, 750, 1000 and 1500 m. For each observation unit, the values of each of these predictors at each scale were extracted for the centroid of the visible area. Viewshed (the area visible from a location) was calculated for four random points per observation unit (which had to be at least 120 m apart) at scales of 250, 500, 750 and 1000 m, and averaged per observation unit and scale. Henceforth, a predictor considered at a specific scale is called a scale-specific predictor. Considering the eight factors (aspect, slope, elevation, soil, settlements, paths/roads, walls and viewshed) at multiple spatial scales resulted in 540 scale-specific predictors. Together with two single-scale predictors (rainfall and livestock numbers, extracted for the centroid of the visible area of each observation unit) and the two transect-specific predictors, the total number of predictors was 544.
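The per-pixel buffer areas described above amount to a focal (moving-window) sum with a circular kernel. A minimal NumPy sketch is shown below, assuming a boolean raster that marks one category at 50 m resolution and treating cells beyond the map edge as absent (the paper does not state how edges were handled). The names `circular_kernel` and `area_within_radius` are hypothetical. For radii of several kilometres, an FFT-based or `scipy.ndimage` convolution would be much faster than this explicit loop over kernel offsets.

```python
import numpy as np

def circular_kernel(radius_m, cell_m=50.0):
    """Boolean disc of cells whose centres lie within radius_m of the centre cell."""
    r = int(radius_m // cell_m)
    dy, dx = np.mgrid[-r:r + 1, -r:r + 1]
    return (dy ** 2 + dx ** 2) * cell_m ** 2 <= radius_m ** 2

def area_within_radius(mask, radius_m, cell_m=50.0):
    """Area (m^2) of True cells within radius_m of every pixel centre."""
    m = np.asarray(mask, dtype=float)
    kernel = circular_kernel(radius_m, cell_m)
    r = kernel.shape[0] // 2
    h, w = m.shape
    # Assumption: cells outside the map are padded with zeros (category absent).
    padded = np.pad(m, r, mode="constant")
    total = np.zeros((h, w))
    # Add the shifted raster once for every cell offset inside the disc.
    for oy, ox in zip(*np.nonzero(kernel)):
        total += padded[oy:oy + h, ox:ox + w]
    return total * cell_m ** 2
```

Running this once per category and radius gives one raster per scale-specific predictor, from which values can then be read at the centroid of each observation unit's visible area.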
We ran two sets of simulations that differed in the maximum scale of the predictors (5 or 10 km). For both sets, we created simulated responses based on two, four or six scale-specific predictors (henceforth called true predictors), using arbitrary coefficient values for each predictor. For the six-predictor scenarios (at both maximum scales) and for the four-predictor scenario with a maximum scale of 10 km, we created two simulated responses each, differing in the magnitude of the coefficients (see Table 1 and Table S1 for their values).
Table 1. Akaike information criterion (AIC), AUC and Nagelkerke's R² for models of a simulated response against the true scale-specific variables, the selected scale-specific variables and the true variables at an arbitrary scale (500 m). The simulated responses were based on two, four or six true scale-specific variables (Var), and the scales of the variables were restricted to a maximum of either 5 or 10 km. Where spatial eigenvectors were added, results are presented as before/after the addition of spatial eigenvectors.

| Scenario | Metric | Lower coefficients: True | Lower coefficients: Selected | Lower coefficients: Arbitrary scale | Higher coefficients: True | Higher coefficients: Selected | Higher coefficients: Arbitrary scale |
|---|---|---|---|---|---|---|---|
| Max scale 10 km | | | | | | | |
| 2 Var | AIC | 309·72 | 309·28 | 328·34/320·5 | | | |
| | AUC | 0·71 | 0·71 | 0·65/0·7 | | | |
| | Nagelkerke's R² | 0·17 | 0·17 | 0·08/0·14 | | | |
| Max scale 5 km | | | | | | | |
| 2 Var | AIC | 276·42 | 282·28 | 309·81/294·58 | | | |
| | AUC | 0·79 | 0·8 | 0·71/0·76 | | | |
| | Nagelkerke's R² | 0·32 | 0·33 | 0·18/0·25 | | | |
| 4 Var | AIC | 228·42 | 231·12 | 270·8/257·9 | | | |
| | AUC | 0·87 | 0·86 | 0·8/0·84 | | | |
| | Nagelkerke's R² | 0·52 | 0·5 | 0·36/0·43 | | | |
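The section does not say how the AIC, AUC and Nagelkerke's R² in Table 1 were computed. A minimal sketch of the standard definitions for a fitted binary model is given below, assuming the fitted probabilities and the number of estimated parameters `k` are available. AIC = 2k − 2 log L. Nagelkerke's R² rescales the Cox–Snell R² by its maximum attainable value. AUC is the probability that a randomly chosen presence has a higher fitted probability than a randomly chosen absence, with ties counted as one half. The function names are hypothetical.

```python
import numpy as np

def log_likelihood(y, p):
    """Bernoulli log-likelihood of 0/1 outcomes y under probabilities p."""
    y = np.asarray(y, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1 - 1e-12)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def aic(ll, k):
    """Akaike information criterion from a log-likelihood and k parameters."""
    return 2 * k - 2 * ll

def nagelkerke_r2(y, p):
    """Cox-Snell R^2 divided by its maximum, relative to an intercept-only model."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ll_model = log_likelihood(y, p)
    ll_null = log_likelihood(y, np.full(n, y.mean()))
    cox_snell = 1 - np.exp(2 * (ll_null - ll_model) / n)
    max_cox_snell = 1 - np.exp(2 * ll_null / n)
    return float(cox_snell / max_cox_snell)

def auc(y, p):
    """Area under the ROC curve via all presence/absence pairs (ties count 0.5)."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (pos.size * neg.size))
```

The pairwise AUC is O(n₁n₀) in memory, which is trivial for 244 observation units.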
To create the simulated responses, we preselected the spatial scales for each scenario. For the simulations with a maximum scale of 10 km, we used 1 and 9 km for two predictors; 0·5, 3, 7 and 9 km for four predictors; and 0·25, 2, 4, 6, 8 and 10 km for six predictors. For the simulations with a maximum scale of 5 km, we used 0·25 and 5 km for two predictors; 0·25, 1·5, 3 and 5 km for four predictors; and 0·25, 1, 2, 3, 4 and 5 km for six predictors. We numbered all ‘types’ of predictors in order (e.g. flat: 1, north-facing: 2, etc.) and then permuted this order.
We assigned the first ‘type’ in the permuted order to the smallest preselected scale of each scenario. For each subsequent scale, we permuted the order of the remaining ‘types’ again and chose the first ‘type’ whose predictor at that scale had an absolute Spearman correlation |ρ| ≤ 0·3 with each previously selected predictor.
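The type-selection procedure can be sketched as follows. The sketch assumes the scale-specific predictors are held in a dictionary keyed by (type, scale). The function names are hypothetical. Spearman's ρ is computed as the Pearson correlation of average ranks. The sketch also raises an error if no remaining type satisfies the |ρ| ≤ 0·3 constraint at some scale, a case the text does not address.

```python
import numpy as np

def _rank(a):
    """Average ranks (1-based), so tied values share the mean of their positions."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    sorted_a = a[order]
    ranks = np.empty(a.size)
    i = 0
    while i < a.size:
        j = i
        while j + 1 < a.size and sorted_a[j + 1] == sorted_a[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho; NaN if either variable is constant."""
    rx, ry = _rank(x), _rank(y)
    if rx.std() == 0 or ry.std() == 0:
        return float("nan")
    return float(np.corrcoef(rx, ry)[0, 1])

def select_true_predictors(predictors, types, scales, rng, max_abs_rho=0.3):
    """Pick one distinct type per scale, smallest scale first.

    predictors: dict mapping (type, scale) -> 1-D array over observation units.
    """
    remaining = list(types)
    selected = []
    for scale in sorted(scales):
        chosen = None
        # Re-permute the remaining types for every scale.
        for idx in rng.permutation(len(remaining)):
            candidate = (remaining[idx], scale)
            x = predictors[candidate]
            # all() over an empty list is True, so the first scale takes the first type.
            if all(abs(spearman(x, predictors[s])) <= max_abs_rho for s in selected):
                chosen = candidate
                break
        if chosen is None:
            raise RuntimeError(f"no admissible type at scale {scale}")
        selected.append(chosen)
        remaining.remove(chosen[0])
    return selected
```

A NaN correlation (a constant predictor) fails the `<=` test, so such candidates are skipped rather than selected.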
We standardized each true scale-specific predictor to a mean of zero and a standard deviation of one and calculated the linear predictor for each of the 244 observation units i as

$$g(\mathbf{x}_i) = \alpha + \beta_1 x_{1i} + \dots + \beta_n x_{ni},$$

where the x are the true scale-specific predictors, the β are their coefficients, n is the number of true predictors and the intercept α = 0. We calculated the probability P_i of the response being 1 from the logistic model

$$P_i = \frac{\exp\{g(\mathbf{x}_i)\}}{1 + \exp\{g(\mathbf{x}_i)\}}$$

and drew a value from a binomial distribution with one trial and probability P_i for each of the 244 observation units.
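Simulating one response can be sketched in a few lines of NumPy. The sketch assumes the true scale-specific predictors are the columns of an array with one row per observation unit. Standardizing with the sample standard deviation (`ddof=1`) and the name `simulate_response` are assumptions. The expression 1/(1 + e^(−g)) is algebraically identical to e^g/(1 + e^g). A binomial draw with one trial is a Bernoulli draw, which `Generator.binomial(1, p)` performs element-wise.

```python
import numpy as np

def simulate_response(X, beta, alpha=0.0, rng=None):
    """Simulate a 0/1 response from standardized predictors via a logistic model.

    X: array of shape (n_units, n_true_predictors); beta: one coefficient per column.
    Returns the simulated responses and the underlying probabilities P.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # mean 0, SD 1 per predictor
    g = alpha + Z @ np.asarray(beta, dtype=float)      # linear predictor g(x_i)
    p = 1.0 / (1.0 + np.exp(-g))                        # logistic link
    y = rng.binomial(1, p)                              # one Bernoulli draw per unit
    return y, p
```

With all coefficients set to zero, every P_i equals 0·5, which is a quick sanity check on the link function.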
To demonstrate the proposed approach on a real data set, we used survey data for a bird species, the curlew. Because of the size of the national park (ca. 1770 km²) and the limited survey time available, the survey focused on areas below 500 m elevation. Curlew were surveyed from 61 transects, each 2 km long, during 2008 (Fig. S1b). Three rounds of repeat surveys were carried out: 2–30 April, 3–22 May and 2 June–1 July. Plots of the number of curlew records against distance from the transect suggested reduced detectability beyond 200 m (see Fig. S2), so only observations within this distance were used. The area surveyed from each transect was divided into four observation units of approximately equal length to allow environmental variation to be studied at this and larger scales. The 244 observation units roughly matched the reported size of the core areas used by the species during nesting and chick rearing (see Appendix S1).
Records that were unlikely to be of breeding birds, such as flocks, were omitted (see Appendix S1). Each observation unit was recorded as having the species present when at least one individual was recorded during at least one survey (see Appendix S1, also for further details of the selection of transect locations and survey methodology).
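Turning the survey records into the binary response can be sketched in plain Python. The record fields (`unit_id`, `survey_round`, `distance_m`, `n_individuals`, `likely_breeding`) and function names are hypothetical. Treating the 200 m cut-off as inclusive and assigning positions to one of four equal-length units along a 2 km transect are assumptions. The actual rules for excluding non-breeding records are given in Appendix S1.

```python
from dataclasses import dataclass

@dataclass
class Record:
    unit_id: int                  # observation unit of the record (hypothetical field)
    survey_round: int             # 1, 2 or 3
    distance_m: float             # distance from the transect line
    n_individuals: int
    likely_breeding: bool = True  # False for e.g. flocks

def unit_from_position(position_m, transect_length_m=2000.0, n_units=4):
    """Index (0..n_units-1) of the equal-length unit containing a point along the transect."""
    k = int(position_m // (transect_length_m / n_units))
    return min(max(k, 0), n_units - 1)  # the far end point belongs to the last unit

def presence_by_unit(records, unit_ids, max_distance_m=200.0):
    """1 if at least one retained record exists in any survey round, else 0."""
    present = {u: 0 for u in unit_ids}
    for r in records:
        if (r.unit_id in present and r.likely_breeding
                and r.n_individuals >= 1 and r.distance_m <= max_distance_m):
            present[r.unit_id] = 1
    return present
```

Units with no retained records in any survey round stay at 0, which yields the presence/absence response modelled for the 244 observation units.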