The uncertainty associated with species distribution model (SDM) projections is poorly characterized, despite its potential value to decision makers. Error estimates from most modelling techniques have been shown to be biased due to their failure to account for spatial autocorrelation (SAC) of residual error. Generalized linear mixed models (GLMM) have the ability to account for SAC through the inclusion of a spatially structured random intercept, interpreted to account for the effect of missing predictors. This framework promises a more realistic characterization of parameter and prediction uncertainty. Our aim is to assess the ability of GLMMs and a conventional SDM approach, generalized linear models (GLM), to produce accurate projections and estimates of prediction uncertainty.

Innovation

We employ a unique historical dataset to assess the accuracy of projections and uncertainty estimates from GLMMs and GLMs. Models were trained using historical (1928–1940) observations for 99 woody plant species in California, USA, and assessed using temporally independent validation data (2000–2005).

Main conclusions

GLMMs provided a closer fit to historic data, had fewer significant covariates, were better able to eliminate spatial autocorrelation of residual error, and had larger credible intervals for projections than GLMs. The accuracy of projections was similar between methods but GLMMs better quantified projection uncertainty. Additionally, GLMMs produced more conservative estimates of species range size and range size change than GLMs. We conclude that the GLMM error structure allows for a more realistic characterization of SDM uncertainty. This is critical for conservation applications that rely on honest assessments of projection uncertainty.

Correlative species distribution models (SDMs) are used to predict how changes in climate will affect the spatial configuration of suitable habitat for a given species, allowing projections to be made under a variety of scenarios (Elith & Leathwick, 2009). SDMs are increasingly used for conservation planning and climate adaptation applications such as assisted migration and identifying locations suitable for reserves (Pearce & Lindenmayer, 1998; Araújo et al., 2004; Vitt et al., 2009; Carroll et al., 2010). Sound decisions require careful consideration of the uncertainties inherent in these projections (Burgman et al., 2005; Rocchini et al., 2011), yet the uncertainty associated with SDM projections, although acknowledged to be large, is poorly understood and rarely considered in applications (Elith et al., 2002; Dormann, 2007a). Reasons for this include methodological issues and a lack of temporally independent data for projection validation (Dobrowski et al., 2011). Repeated calls have been made for maps of uncertainty to be presented with results (Elith et al., 2002; Burgman et al., 2005; Rocchini et al., 2011), and their absence has led some to question the utility of SDMs for conservation planning (Heikkinen et al., 2006; Dormann, 2007a). In this study we assess the ability of a spatial regression SDM method to provide a useful characterization of projection uncertainty.

The uncertainty of SDM projections is difficult to quantify given the range of contributing sources (Elith & Leathwick, 2009). Studies have shown that the choice of modelling technique introduces the greatest amount of variability in projections (Araújo et al., 2005; Pearson et al., 2006; Buisson et al., 2010). This has led to the use of ‘ensemble’ methods, in which numerous models are fit using a range of methods and input data (Araújo et al., 2005). Outcomes are averaged and those consistent between fitted models are deemed more reliable than those for which the models do not agree. A lack of consensus within an ensemble qualitatively suggests uncertainty, but the reasons that methods disagree are poorly understood (Burgman et al., 2005).

Issues related to spatial autocorrelation (SAC) may partially explain inconsistency between methods in SDM projections. SAC arises because observations close in geographic space are generally more similar than those further apart. When a model is unable to fully explain the spatial pattern of a species' distribution, residual errors will exhibit this property, violating a key assumption of the statistical methods underlying most SDM approaches. SAC of residual error has been shown to be very common in SDM applications (Dormann, 2007b) and can easily be introduced if important covariates are missing or if a species exhibits spatial aggregation due to biotic factors. Most SDM methods are incapable of accounting for this type of error, since they consider only sampling variability and its resultant effect on the precision of parameter estimates. Although SAC has been shown not to bias parameter estimates, it has been shown to decrease their precision and lead to biased variance estimates, inflating tests of significance and thus biasing model selection procedures (Lennon, 2000; Dormann et al., 2007; Beale et al., 2010). This model misspecification may partially explain the disagreement between SDM methods and has been hypothesized to reduce their transferability through space and time (Randin et al., 2006). Numerous methods have been proposed to correct for the adverse effects of spatial autocorrelation on SDMs (Dormann, 2007). Generally, the focus of this research has been on methods to improve parameter estimates and tests of significance (Dormann, 2007a), and less on assessing the transferability of these models and accurately estimating projection uncertainty.

Several notable attempts have been made to quantify SDM prediction uncertainty. Buckland & Elston (1993) demonstrate a non-parametric bootstrapping approach in which numerous models are fit to permutations of the original data, resulting in maps indicating the proportion of iterations the species was predicted to be present. Hartley et al. (2006) present a Bayesian model averaging approach to estimating uncertainty. They fit a set of plausible models containing different covariates and calculate uncertainty by combining between-model and within-model variability. While these approaches provide a quantitative representation of uncertainty, they do not consider the bias induced by SAC on model selection and are unable to account for uncertainty due to important covariates not considered. Other authors have presented maps of uncertainty using Bayesian spatial regression approaches (Clements et al., 2006; Latimer et al., 2006; Finley et al., 2009a), but we are unaware of previous attempts to validate estimates of projection uncertainty using temporally independent data.

Generalized linear mixed models (GLMMs) extend generalized linear models (GLMs) to include random effects capable of accounting for additional sources of uncertainty. To account for SAC, this random effect can be specified as a spatially structured random intercept, or spatial process term, interpreted as the effects of unobserved processes with spatial structure (Diggle et al., 1998). The spatially-structured random intercept has intuitive appeal in that it is able to represent the greater confidence we feel in finding a species when closer to a known presence location. The variance–covariance parameters of the random intercept control the magnitude, range and smoothness of the dependence in space, and are estimated during the model-fitting process. This avoids subjective modelling choices regarding the zone of spatial influence and allows its effect to be integrated into both parameter estimates and predictions. Spatial process GLMMs can be fit through the use of Bayesian hierarchical methods and Markov chain Monte Carlo (MCMC) techniques (Banerjee et al., 2004). Although computationally intensive, this methodology provides full access to the distributions of the model's parameters given the data, i.e., posterior distributions, and the posterior predictive distributions of the response variable at unobserved locations and/or times. Latimer et al. (2009) and Finley et al. (2009a) have explored some of the utility of spatial process GLMMs (hereafter referred to as GLMMs) to model species distributions, but their projections have yet to be validated against temporally independent data. If validation shows that GLMMs are able to account for the uncertainties in modelling species distributions through time, realistic mapping of uncertainty and statistical inference on predicted range changes should be possible.

In this study we compare the ability of GLMMs with a spatially structured random intercept and non-spatial GLMs to project species distributions. We fit a suite of models for historical observations of 99 woody plant species from California, USA, and use contemporary data to assess the accuracy of projections of these models and their ability to characterize projection uncertainty through time.

Case Study

Vegetation data

To train our models, we used presence and absence data for 99 species from 13,746 vegetation plots collected as part of the USDA Forest Service's Vegetation Type Map Project (VTM) between 1928 and 1940 (Wieslander, 1935; Thorne et al., 2008) within the state of California, USA. Plot size was 800 m^{2} in forests and 400 m^{2} in other vegetation types. VTM plots were sampled in the mountainous regions of California (Fig. 1). For modern validation data, we compiled a collection of 33,596 contemporary (2000–2005) vegetation plots with presence and absence data from a variety of sources (further detail provided in Dobrowski et al., 2011). Plot size in the modern data ranged from 400 m^{2} to 800 m^{2} in size. Vegetation plots were aggregated to 10 km by 10 km grid cells and the count of presence observations within each cell, relative to the total number of observations in that cell, was considered the response. The spatial aggregation was performed to ease computational demands and we consider this resolution adequate for a comparison between methods. Because not all species were sampled at each vegetation plot, the total number of grid cells sampled varied by species. This yielded grid cell counts for species that ranged from 825 to 1302 for the historic data and 1334 to 1929 for the modern data. Historical prevalence values ranged from 2.4% to 39.6% at the grid cell level, while modern prevalence values ranged from 0.45% to 43.7%. The historic and modern samples overlapped in 320–715 grid cells depending on species.

Climate data

Climate covariates were derived from meteorological station data interpolated using the Parameter-elevation Regression on Independent Slopes Model (PRISM) (Daly et al., 2008) dataset. PRISM compares favourably to other methods of climate interpolation (Daly et al., 2008). PRISM data for precipitation and temperature were combined with information on geology and soils in a regional water balance model, the Basin Characterization Model (Flint & Flint, 2007), to estimate soil water availability. Data on solar radiation, topographic shading and average cloud cover were integrated to estimate reference evapotranspiration (ET_{0}), actual evapotranspiration (AET), and climatic water deficit (CWD) (Flint et al., unpublished data). All metrics were averaged over 30-year periods: 1911–1940 for the historic period and 1971–2000 for the modern period. For modelling purposes we selected a subset of commonly used and biologically relevant climate metrics including AET, CWD, minimum annual temperature, maximum annual temperature and annual snowfall. We removed predictors in the historic training data with correlation coefficients greater than 0.85. We chose this threshold because the primary impact of collinearity is to increase variance of coefficient estimates (O'Brien, 2007), an effect that should affect both candidate models equally. The data were originally provided at a resolution of 270 m and were aggregated to 10-km resolution using a simple average.
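The collinearity screen described above can be sketched as follows. This is a minimal illustration in Python, not the code used in the study: the function names (`pearson_r`, `drop_correlated`), the greedy keep-first ordering and the toy values are all assumptions, since the text states only the 0.85 threshold.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def drop_correlated(predictors, threshold=0.85):
    """Greedily keep predictors whose |r| with every kept predictor is <= threshold.

    `predictors` maps a name to a list of values, in order of priority.
    The greedy ordering is an assumption; the text does not say which
    member of a correlated pair was dropped.
    """
    kept = []
    for name, values in predictors.items():
        if all(abs(pearson_r(values, predictors[k])) <= threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy values for three predictors over five grid cells.
toy = {
    "cwd": [1.0, 2.0, 3.0, 4.0, 5.0],
    "tmax": [1.1, 2.0, 3.2, 3.9, 5.1],  # nearly collinear with cwd
    "snow": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = drop_correlated(toy)
```

In this toy case the second predictor is dropped because it is almost perfectly correlated with the first, while the third is retained.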

Over the study period, the study area experienced significant changes in climate. Mean temperatures increased by approximately 1.0 °C across the state while precipitation increased in the northern half of the state resulting in spatially variable trends in climatic water balance (Dobrowski et al., 2011).

Modelling techniques

For each species we fit GLMs and GLMMs to the full historic dataset assuming a binomial distribution for the response variable and a logistic link function. We follow Latimer et al. (2006) in using the count of presence observations per grid cell as our response, weighted by the number of vegetation plots per grid cell. Predictions from these models reflect estimated probability of occurrence for a species within each cell, equivalent to predicted prevalence. We used quadratic functions of all five covariates to allow for nonlinear relationships between the covariates and response variables.
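For reference, the non-spatial model described above can be written out explicitly. The notation below is introduced here for illustration only and is an assumption; Appendix S1 remains the authoritative specification.

```latex
y_i \sim \mathrm{Binomial}(n_i,\, p_i), \qquad
\mathrm{logit}(p_i) = \beta_0 + \sum_{j=1}^{5} \left( \beta_{1j}\, x_{ij} + \beta_{2j}\, x_{ij}^{2} \right)
```

Here y_i is the count of presence observations in grid cell i, n_i is the number of vegetation plots in that cell, p_i is the probability of occurrence, and x_{ij} is the value of the j-th climate covariate. Under this notation, the GLMM would add a spatially structured random intercept, w(s_i), to the right-hand side of the linear predictor.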

For the spatial models, an exponential spatial correlation function was assumed. We used a spatial predictive process model to reduce the costly computations involved in estimating the spatial process (Banerjee et al., 2008; Finley et al., 2009b). Models were fit within a Bayesian framework using MCMC techniques. Computations were performed in R (version 2.10.1; R Development Core Team, 2011) using the spGLM routine in the spBayes package (Finley et al., 2007). Each model required several days to complete the MCMC sampling on a quad-core server (Intel Xeon E5440 2.83 GHz). Details about model specification and example code are included in Appendices S1 and S2 in Supporting Information.
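As a point of reference, an exponential spatial correlation function is commonly written in the form below. The symbols (σ^2, φ, d_{ij}) are assumptions introduced for illustration, and the exact parameterization used in the fitted models should be taken from Appendix S1.

```latex
\mathrm{Cov}\{ w(s_i),\, w(s_j) \} = \sigma^{2} \exp(-\phi\, d_{ij})
```

Here d_{ij} is the distance between locations s_i and s_j, σ^2 is the variance of the spatial random intercept, and φ is a decay parameter. Under this form the correlation falls to 0.05 at a distance of −ln(0.05)/φ, roughly 3/φ, which is often reported as the effective range.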

Model assessment

Candidate models, i.e., GLM and GLMM, were assessed using resubstituted historic training data (internal validation) and temporally independent data from the contemporary period (independent validation). For independent validation, parameter estimates from models fit to the historic data were used to make projections with the spPredict function in the spBayes package and modern climate data. The spatially varying random intercept was included in GLMM projections. For internal validation, comparisons of model fit were made using the Deviance Information Criterion (DIC; Spiegelhalter et al., 2002), which is a measure of prediction accuracy with a penalty, p_{D}, for model complexity interpreted as the effective number of parameters. Although DIC has been criticized for a variety of theoretical and applied shortcomings (see, e.g., the discussion supplement for Spiegelhalter et al., 2002), there are few alternative fit criteria suitable for hierarchical models and we feel its use for broad comparisons is reasonable. As a measure of predictive accuracy for both internal and independent validation, we used AUC (area under the receiver operating characteristic curve), an index representing the ability of a model to discriminate between presence and absence observations (Hosmer & Lemeshow, 2000). Although AUC does not consider the calibration of predictions and required reducing our data to presence or absence within each grid cell, it remains useful for comparisons between candidate models for the same species.
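To make the AUC index concrete, the sketch below computes it through the rank-sum identity: the probability that a randomly chosen presence cell receives a higher predicted value than a randomly chosen absence cell. It is an illustrative Python sketch, not the study's code; the function name `auc` and the toy values are assumptions.

```python
def auc(scores, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    Counts, over all presence/absence pairs, how often the presence cell
    has the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical presence counts per grid cell, reduced to presence (1) / absence (0).
counts = [3, 1, 0, 2, 0]
observed = [1 if k > 0 else 0 for k in counts]
predicted = [0.9, 0.8, 0.35, 0.3, 0.1]  # hypothetical predicted probabilities
example_auc = auc(predicted, observed)
```

Because only the ordering of predictions matters, two models can share the same AUC while differing greatly in calibration, which is why coverage rates are examined separately.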

To directly assess prediction uncertainty we estimated coverage rates of 90% credible intervals for probability of occurrence, derived from posterior predictive distributions for sampled grid cells. Coverage rates were calculated as the proportion of grid cells for which the observed prevalence value fell within their respective 90% credible intervals. Because a logistic link function can never return a value of zero or one, we considered intervals including 0.001 to include zero, and intervals including 0.999 to include one.
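The coverage calculation can be sketched as below. This Python illustration assumes equal-tailed intervals taken from posterior predictive draws and uses hypothetical helper names (`quantile`, `credible_interval`, `coverage_rate`); the text does not state how the interval endpoints were obtained.

```python
def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def credible_interval(draws, level=0.90):
    """Equal-tailed credible interval from posterior predictive draws (assumed form)."""
    s = sorted(draws)
    tail = (1.0 - level) / 2.0
    return quantile(s, tail), quantile(s, 1.0 - tail)

def coverage_rate(draws_by_cell, observed, level=0.90, eps=0.001):
    """Proportion of cells whose observed prevalence lies inside the interval.

    Intervals that include eps are extended to 0 and intervals that include
    1 - eps are extended to 1, mirroring the rule stated in the text.
    """
    hits = 0
    for draws, obs in zip(draws_by_cell, observed):
        lower, upper = credible_interval(draws, level)
        if lower <= eps <= upper:
            lower = 0.0
        if lower <= 1.0 - eps <= upper:
            upper = 1.0
        if lower <= obs <= upper:
            hits += 1
    return hits / len(observed)

# Hypothetical draws for three grid cells and their observed prevalence.
cell_a = [0.0004 + 0.0001 * i for i in range(11)]  # interval includes 0.001
cell_b = [0.2, 0.25, 0.3, 0.35, 0.4]
cell_c = [0.2, 0.25, 0.3, 0.35, 0.4]
rate = coverage_rate([cell_a, cell_b, cell_c], [0.0, 0.8, 0.3])
```

In the toy data the first cell counts as covered only because of the 0.001 rule, the second is a miss, and the third is covered.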

To assess both the range and significance of residual spatial dependence among the observations, we used Moran's I test based on 12 discrete distance classes. Details are given in Appendix S1.
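For readers unfamiliar with the statistic, the sketch below computes Moran's I for a single distance class using binary weights. It is a Python illustration under stated assumptions: the function name `morans_i`, the binary weighting, the class bounds and the toy residuals are hypothetical, and the significance test described in Appendix S1 is not reproduced.

```python
from math import hypot

def morans_i(coords, residuals, d_min, d_max):
    """Moran's I for pairs whose separation d satisfies d_min < d <= d_max."""
    n = len(residuals)
    mean = sum(residuals) / n
    z = [r - mean for r in residuals]
    num = 0.0    # sum of z_i * z_j over ordered pairs in the class
    w_sum = 0.0  # number of ordered pairs in the class (binary weights)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if d_min < d <= d_max:
                num += z[i] * z[j]
                w_sum += 1.0
    if w_sum == 0:
        raise ValueError("no pairs fall in this distance class")
    return (n / w_sum) * num / sum(v * v for v in z)

# Four hypothetical grid-cell centres on a line, 10 km apart (coordinates in km).
cells = [(0.0, 0.0), (10.0, 0.0), (20.0, 0.0), (30.0, 0.0)]
clustered = morans_i(cells, [1.0, 1.0, -1.0, -1.0], 0.0, 10.0)
alternating = morans_i(cells, [1.0, -1.0, 1.0, -1.0], 0.0, 10.0)
```

Repeating the call over successive distance classes gives a correlogram; the largest class with significant positive autocorrelation corresponds to the range reported in Table 1.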

Range size estimates

We estimated range size as the cumulative area of cells for which the posterior predicted probability of occurrence was above a threshold value. The threshold value was chosen to minimize the difference between sensitivity (proportion of presence observations correctly predicted) and specificity (proportion of absence observations correctly predicted) for the historic data used to fit the models. This threshold was calculated individually for each model and species. We tested the statistical significance of range size change by subtracting the posterior distributions of range size estimates for the two time periods to generate a posterior for range size change; if the 90% credible interval for this distribution excluded 0, the change was deemed significant.
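The thresholding and range-change test can be sketched as follows. This Python illustration makes several assumptions that the text does not confirm: candidate thresholds are taken from the observed predicted values, posterior draws from the two periods are paired draw by draw, intervals are equal-tailed, and all function names are hypothetical.

```python
def sens_spec(probs, labels, threshold):
    """Sensitivity and specificity when cells with p > threshold count as present."""
    tp = sum(1 for p, y in zip(probs, labels) if y == 1 and p > threshold)
    tn = sum(1 for p, y in zip(probs, labels) if y == 0 and p <= threshold)
    n_pos = sum(labels)
    return tp / n_pos, tn / (len(labels) - n_pos)

def balanced_threshold(probs, labels):
    """Candidate threshold with the smallest |sensitivity - specificity|."""
    def gap(t):
        sens, spec = sens_spec(probs, labels, t)
        return abs(sens - spec)
    return min(sorted(set(probs)), key=gap)

def range_size(probs, threshold, cell_area_km2=100.0):
    """Total area of cells above the threshold (10 km x 10 km cells = 100 km^2)."""
    return cell_area_km2 * sum(1 for p in probs if p > threshold)

def quantile(sorted_vals, q):
    """Linearly interpolated quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def range_change_interval(hist_draws, modern_draws, threshold, level=0.90):
    """Credible interval for modern minus historic range size.

    Each element of hist_draws / modern_draws is one posterior draw: a list
    of per-cell probabilities. Draws are paired by index (an assumption).
    """
    diffs = sorted(
        range_size(m, threshold) - range_size(h, threshold)
        for h, m in zip(hist_draws, modern_draws)
    )
    tail = (1.0 - level) / 2.0
    return quantile(diffs, tail), quantile(diffs, 1.0 - tail)

# Hypothetical historic predictions and observations for five cells.
threshold = balanced_threshold([0.1, 0.2, 0.6, 0.7, 0.8], [0, 0, 1, 0, 1])

# Hypothetical posterior draws for four cells in each period.
hist = [[0.6, 0.7, 0.2, 0.1]] * 4
modern = [[0.6, 0.7, 0.8, 0.1], [0.6, 0.7, 0.8, 0.9],
          [0.6, 0.7, 0.8, 0.1], [0.6, 0.7, 0.9, 0.6]]
lower, upper = range_change_interval(hist, modern, 0.5)
significant = not (lower <= 0.0 <= upper)
```

Working with the full posterior of the difference, rather than comparing two point estimates, is what allows a change to be labelled significant or not.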

In addition to estimating overall changes in range size, we identified where significant changes to the species ranges were predicted to occur. For each grid cell we compared the posterior predictive distributions in the historic period to those for projections in the modern period (see Fig. 6). From the historic posterior we calculated the probability of observing a value as extreme or more extreme than the median projected value.
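One way to compute the per-cell probability described above is sketched below in Python. The one-sided reading of "as extreme or more extreme" (measured in the direction of the projected change) is an assumption, as are the function name `tail_probability` and the toy draws.

```python
from statistics import median

def tail_probability(historic_draws, modern_draws):
    """Share of historic draws at least as extreme as the modern median.

    If the modern median lies at or above the historic median, count historic
    draws >= it; otherwise count draws <= it. This one-sided definition is an
    assumption made for illustration.
    """
    target = median(modern_draws)
    if target >= median(historic_draws):
        extreme = sum(1 for d in historic_draws if d >= target)
    else:
        extreme = sum(1 for d in historic_draws if d <= target)
    return extreme / len(historic_draws)

# Hypothetical posterior predictive draws for one grid cell.
historic = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
p_increase = tail_probability(historic, [0.50, 0.52, 0.60])
p_decrease = tail_probability(historic, [0.10, 0.12, 0.20])
```

Small values flag cells where the projected probability of occurrence lies in the tail of the historic posterior, that is, where a change is more certain.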

Displaying uncertainty

In order to graphically depict uncertainty in our predictions, we adapted the methods of Hengl et al. (2004). Median predictions for each grid cell were displayed using a colour ramp and degree of uncertainty (width of a 90% CI) was shown by increasing the whiteness of these colours.
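The idea of whitening a colour in proportion to uncertainty can be sketched as below. This Python illustration uses a simple linear blend in RGB space, which is a simplification and not necessarily the colour model of Hengl et al. (2004); the function names, the two-colour ramp and its end colours are assumptions.

```python
def ramp(p, low=(255, 230, 0), high=(0, 60, 200)):
    """Linear colour ramp between two RGB colours for a probability p in [0, 1]."""
    p = min(max(p, 0.0), 1.0)
    return tuple(int(round(a + (b - a) * p)) for a, b in zip(low, high))

def whiten(rgb, ci_width, max_width=1.0):
    """Blend an RGB colour toward white in proportion to the interval width."""
    w = min(max(ci_width / max_width, 0.0), 1.0)
    return tuple(int(round(c + (255 - c) * w)) for c in rgb)

def cell_colour(median_p, ci_width):
    """Colour for one grid cell: hue from the median, whiteness from the 90% CI width."""
    return whiten(ramp(median_p), ci_width)

# A confident prediction keeps its colour; a fully uncertain one fades to white.
confident = cell_colour(1.0, 0.0)
uncertain = cell_colour(1.0, 1.0)
```

A single map can then convey both the median prediction and its reliability, with pale cells marking areas where the projection should be read with caution.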

Results

Internal validation

Internal validation showed significant differences between model fits (Table 1 and Fig. 4). Median DIC scores dropped by 454.6 for GLMMs compared to GLMs, despite a median increase in model complexity of p_{D} = 87.5, suggesting a considerable improvement in fit for GLMMs over GLMs. AUC scores for GLMs had a median value of 0.88, indicating good discrimination between presence and absence observations (Swets, 1988). GLMMs yielded a median AUC score of 0.98, indicating near-perfect discrimination between presence and absence observations. Coverage rates for GLMMs had a median value of 0.91, very close to their nominal value of 0.90, while those for GLMs had a median value of 0.46, implying overconfident predictions from the latter.

Table 1. Summary of median fit statistics on historic data (internal validation) for models fit for 99 plant species. Coverage is the proportion of times a 90% credible interval for probability of occurrence contained the observed prevalence value. Range refers to the range of significant spatial autocorrelation found in binned Moran's I tests. p_{D} is a measure of model complexity, interpreted as the effective number of parameters in each model. DIC is the Deviance Information Criterion; lower values indicate better fit. Different letters indicate a significant difference based on a matched-pairs t-test between models, adjusted for multiple comparisons following the method of Holm (1979).

| Model | AUC | Coverage | Range (km) | Moran's I | p_{D} | DIC |
|---|---|---|---|---|---|---|
| GLM | 0.88 a | 0.46 a | 45 a | 0.28 a | 10.7 | 2012 |
| GLMM | 0.98 b | 0.91 b | 0 b | −0.02 b | 98.2 | 1557 |

GLM, generalized linear model; GLMM, generalized linear mixed model.

The posterior distributions of regression coefficients differed greatly between GLMMs and GLMs. Figure S1a in Appendix S1 shows an example of parameter posterior distributions for Salvia mellifera. Standard errors of GLMM coefficients were, on average, 2.17 times greater than those of GLM coefficients. GLMMs had fewer significant coefficients: of the five covariates examined, the mean number that were significant as either 1^{st} or 2^{nd} order terms (90% credible interval not including 0) was 4.5 for GLMs and 3.0 for GLMMs. GLM estimates generally fell within the 90% GLMM CI (70.4% of all parameter estimates).

The Moran statistics and range of autocorrelation given in Table 1 show that GLMMs nearly eliminated spatial autocorrelation of residual error (although 3 of the 99 species still showed significant dependence with adjacent grid cells), while all GLMs exhibited significant autocorrelation of residual error with a median range of 45 km.

Independent validation

Temporally independent validation with modern data yielded lower mean accuracy statistics than internal validation for both GLMs and GLMMs (Table 2 and Fig. 4). AUC values were slightly higher for GLMMs compared to GLMs. Coverage rates for GLMMs showed only a slight drop (compared to internal validation), remaining very close to their nominal value of 0.90 (Table 2), while those for GLMs improved but remained poor. Restricting our independent validation to those grid cells that were sampled historically had little effect on accuracy statistics but caused a slight drop in coverage rates for both candidate models. Restricting validation to cells not sampled historically also had little effect on AUC, as was demonstrated in Dobrowski et al. (2011), but caused a slight increase in coverage rates for both candidate models (results not shown).

Table 2. Summary of median fit statistics on the modern data (independent validation) for models fit for 99 plant species. Coverage is proportion of times a 90% credible interval for p(occurrence) contained the observed prevalence value. Letters indicate significant differences in matched-pairs t-tests, adjusted for multiple comparisons following the method of Holm (1979)

| Model | AUC | Coverage |
|---|---|---|
| GLM | 0.88 a | 0.61 a |
| GLMM | 0.89 b | 0.87 b |

GLM, generalized linear model; GLMM, generalized linear mixed model.

Range size estimates and predicted changes

Mean range size estimates were correlated between time periods (Pearson correlation coefficient r = 0.94 GLM, r = 0.99 GLMM) and candidate models (r = 0.65 historic, r = 0.68 modern). Range size estimates varied by model with GLM estimates averaging c. 70% larger than GLMM estimates for both time periods (Fig. S1b in Appendix S1). Interval widths for estimated range size averaged 48.4% of range size for GLMMs vs. 25.0% for GLMs. Estimated changes in range size were also highly correlated between candidate models (r = 0.77), but GLMM estimates predicted, on average, 50% smaller changes in range size. Figure 5 shows estimates of percentage range size change by model, highlighting estimated changes that were significant (α = 0.10). It is notable that the two models predicted similar numbers of significant changes, but in many cases failed to agree on which species were facing these changes. Figure 6 shows an example of the spatial distribution of predicted changes in probability of occurrence for Salvia mellifera.

Discussion

Performance under internal vs. independent validation

GLMMs consistently outperformed GLMs under internal evaluation, but performed similarly when confronted with temporally independent data. Under internal validation, the flexibility of the spatially structured random intercept allowed it to capture spatial patterns not accounted for by our climate covariates. These patterns were smooth in space, as evidenced by the spatial autocorrelation of GLM errors and the ability of GLMMs to account for these errors. The similar performance of the candidate models under independent validation was surprising. This is apparently due to a lack of temporal persistence, for most species, of the latent effects accounted for by the spatial random intercept. In effect, many of the species' distributions shifted in ways which could not be explained by our climate covariates. From a Bayesian perspective, the spatial random intercept can be viewed as an informative prior for projections into new temporal domains – drawing the projections back toward the historic ranges when information in the covariates is lacking. If the latent effects represented by the spatial random intercept are expected to change over time, it may be desirable to specify a temporally dynamic residual spatial process that allows the influence of the spatial random intercept to evolve over space and time (see, e.g., Finley et al., 2012). To our knowledge, this methodology has not been applied to SDM projections.

Projection uncertainty

Although the spatial random intercept did not markedly improve the projection accuracy of GLMMs, its ability to account for variability not explained by covariates yielded improved estimates of uncertainty. Including such estimates alongside mean projections gives a ‘map of ignorance’ as called for by Rocchini et al. (2011), highlighting areas where knowledge is lacking and could be improved with additional sampling effort or the inclusion of additional covariates. For instance, for Salvia mellifera, a historically calibrated GLM projection showed high probability of occurrence in the coastal regions of Southern California, the southern reaches of the Central Valley, and eastern portion of the Mojave desert (Fig. 2). These projections are flawed as the species does not currently occur in the latter two regions of the state. In contrast, the influence of the spatial random intercept term in the GLMM projection (Fig. 3) is readily apparent as the latter two regions of the state show lower probability of occurrence and more importantly, higher levels of uncertainty in projections to these regions (Fig. 2). In addition to improving the projections, the spatial random intercept term can provide biogeographical insights into latent covariates that can better explain the species distribution. In this case, the unobserved spatial process may be frequent disturbance from fire in the coastal sage and chaparral communities in which this species is found. Salvia mellifera has facultative fire adapted reproductive traits (Keeley, 1986) and although we cannot definitively prove that the spatial intercept is actually characterizing this latent process, this interpretation is consistent with the disturbance regime of the region and the autecology of the species.

Conservation applications

Conservation applications of SDMs such as reserve design (Pearce & Lindenmayer, 1998; Carroll et al., 2010) and assisted migration of species (Vitt et al., 2009) represent costly management actions involving complex decisions for which the consequences of mistakes are high. The independently validated estimates of uncertainty we have presented have utility in this context, allowing alternatives to be assessed with regard to the confidence of projections. The results we present for Salvia mellifera provide a relevant hypothetical example (Fig. 2). If there were concerns over habitat loss for this species, c. 1935, then GLM results suggest the southern Central Valley and Sierra Nevada ecoregion as plausible translocation sites for assisted migration planning. However, the GLMM projection suggests that the suitability of these regions is far from certain, providing useful information to a hypothetical conservation planner.

SDMs are also used to project loss of habitat and subsequent extinction risk (Thomas et al., 2004; Loarie et al., 2008). Estimates of habitat loss (or gain) are driven by the shape of response curves for individual covariates, making them sensitive to model specification. In this context, spatial regression methods such as GLMMs offer a distinct advantage in that they have been shown to give more precise parameter estimates and are less likely to identify spurious covariates as significant in the presence of spatial autocorrelation (Beale et al., 2010). The latter issue can be especially problematic when automated model selection techniques are used in conjunction with non-spatial SDM methods, a situation common in SDM applications. In our analysis, GLMMs yielded substantially more conservative estimates than GLMs of range size and range size change through time. This was likely due to the ability of the spatial random intercept to correctly identify areas of known absence not predicted by climate alone. Additionally, predicting a contraction or expansion of suitable habitat may be of limited use for conservation planning without regard to spatial context. We demonstrate that the posterior distributions of model projections can be used to distinguish between areas where habitat loss (or gain) is more certain compared to areas where change is less certain (Fig. 6). This type of analysis is valuable because changes occurring in areas where we have very little confidence in our original estimates should be of less concern than changes occurring in areas known to contain the focal species.

Caveats

Numerous criticisms could be made of our methods. Weaknesses include the coarse resolution of our study, missing predictors and misspecification of models. We used GLMs for comparison, yet studies have shown more sophisticated methods such as generalized additive models, Random Forest and Boosted Regression Trees to produce better fitted models (e.g. Elith et al., 2006). Although such methods offer many advantages, little focus has been given to their estimates of projection uncertainty, and their accuracy under spatially (Randin et al., 2006) and temporally (Dobrowski et al., 2011) independent validation has been questioned. The other weaknesses noted above should affect both candidate models equally, although the advantage of GLMMs would disappear under conditions in which a model is correctly specified and all relevant predictors are included, conditions rarely encountered in practice (Heikkinen et al., 2006; Dormann, 2007a). Finally, one might look to other approaches to assess candidate models' predictive ability; see, e.g., Gneiting & Raftery (2007) for a discussion of proper scoring rules.

Conclusions

We found that spatial regression models, although they produced similar levels of projection accuracy under temporally independent validation, gave improved estimates of uncertainty over non-spatial methods fit to the same data. The ability of GLMMs to account for residual SAC and hence provide valid estimates of uncertainty suggests they are more suitable for drawing inference about SDM parameters and subsequent predictions. The degree of uncertainty was high in our fitted models, but their output provides valuable insight into the nature of this uncertainty and suggests ways it might be reduced. GLMM methods produced more conservative estimates of range size and range size change, and although we cannot definitively say these are more accurate than those derived from conventional methods, the statistical validity of GLMMs favours their estimates. Useful projections of species' distributions into the future require an honest assessment of projection uncertainty. GLMMs with a spatially structured random intercept offer a clear improvement over commonly used methods.

Acknowledgements

This research was supported by the National Science Foundation (BCS-0819430 to S.Z.D., BCS-0819493 to J.H.T., EF-1137309 and DMS-1106609 to A.O.F.), the USDA CSREES (2008-38420-19524 to A.K.S.), the California Energy Commission PIER Program (CEC PIR-08-006), and the USDA Forest Service Rocky Mountain Research Station (JV11221635-201). We thank Lorraine and Alan Flint for providing climate data, Jeff Braun and the Rocky Mountain Super Computing facility, and the many agencies and institutions that have collected and stewarded the historical and modern inventory data used in the analysis.

Alan Swanson is a graduate student at the University of Montana, USA. His interests lie in understanding the complex range of factors affecting the distribution of species and in statistical methods to account for uncertainty in natural systems.