Volume 25, Issue 5
BIODIVERSITY RESEARCH
Open Access

Spatially explicit species distribution models: A missed opportunity in conservation planning?

Sami Domisch

Corresponding Author

E-mail address: domisch@igb-berlin.de

Leibniz‐Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany

Correspondence

Sami Domisch, Leibniz‐Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany.

Email: domisch@igb-berlin.de

Search for more papers by this author
Martin Friedrichs

Leibniz‐Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany

Department of Biology, Freie Universität Berlin, Berlin, Germany

Search for more papers by this author
Thomas Hein

Institute of Hydrobiology and Aquatic Ecosystem Management, University of Natural Resources and Life Sciences, Vienna, Austria

Search for more papers by this author
Florian Borgwardt

Institute of Hydrobiology and Aquatic Ecosystem Management, University of Natural Resources and Life Sciences, Vienna, Austria

Search for more papers by this author
Annett Wetzig

Leibniz‐Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany

Search for more papers by this author
Sonja C. Jähnig

Leibniz‐Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany

Search for more papers by this author
Simone D. Langhans

Leibniz‐Institute of Freshwater Ecology and Inland Fisheries, Berlin, Germany

Department of Zoology, University of Otago, Dunedin, New Zealand

BC3 – Basque Centre for Climate Change, Leioa, Spain

Search for more papers by this author
First published: 30 January 2019
Citations: 4

Abstract

Aim

Systematic conservation planning is vital for allocating protected areas given the spatial distribution of conservation features, such as species. Due to incomplete species inventories, species distribution models (SDMs) are often used for predicting species’ habitat suitability and species’ probability of occurrence. Currently, SDMs mostly ignore spatial dependencies in species and predictor data. Here, we provide a comparative evaluation of how accounting for spatial dependencies, that is, autocorrelation, affects the delineation of optimized protected areas.

Location

Southeast Australia, Southeast U.S. Continental Shelf, Danube River Basin.

Methods

We employ Bayesian spatially explicit and non‐spatial SDMs for terrestrial, marine and freshwater species, using realm‐specific planning unit shapes (grid, hexagon and subcatchment, respectively). We then apply the software gurobi to optimize conservation plans based on species targets derived from spatial and non‐spatial SDMs (10%–50% each to analyse sensitivity), and compare the delineation of the plans.

Results

Across realms and irrespective of the planning unit shape, spatially explicit SDMs (a) produce on average more accurate predictions in terms of AUC, TSS, sensitivity and specificity, along with a higher species detection probability. All spatial optimizations meet the species conservation targets. Spatial conservation plans that use predictions from spatially explicit SDMs (b) are spatially substantially different compared to those that use non‐spatial SDM predictions, but (c) encompass a similar amount of planning units. The overlap in the selection of planning units is smallest for conservation plans based on the lowest targets and vice versa.

Main conclusions

Species distribution models are core tools in conservation planning. Not surprisingly, accounting for the spatial characteristics in SDMs has drastic impacts on the delineation of optimized conservation plans. We therefore encourage practitioners to consider spatial dependencies in conservation features to improve the spatial representation of future protected areas.

1 INTRODUCTION

In the light of the ongoing decline in global biodiversity (Pimm et al., 2014), the implementation of protected areas in the terrestrial, marine and freshwater realms is yet the most widely used conservation approach to reduce the loss of biodiversity. Consequently, how to delineate protected areas so that they produce optimal outcomes for the targeted conservation features in a cost‐effective way has been widely covered in the systematic conservation planning literature (Barnes, Glew, Wyborn, & Craigie, 2018; Margules & Pressey, 2000). In the past, conservation goals mainly reflected habitat or species representation. Only recently the focus has shifted towards considering environmental and ecological processes, which are essential for securing species persistence (e.g., Klein et al., 2009). Such processes shape the distribution and abundance of species (Pressey, Cabeza, Watts, Cowling, & Wilson, 2007) with connectivity playing a paramount and distinct role in terrestrial (Lockwood, 2010), marine (Carr et al., 2003), and freshwater ecosystems (Hermoso, Filipe, Segurado, & Beja, 2018). Incorporating spatial connectivity in the planning process has, therefore, important implications for designing protected areas (Daigle et al., 2018; van Teeffelen, Cabeza, & Moilanen, 2006; Weeks, 2017). This fact is also reflected in the software that is used in conservation planning, such as marxan (Ball, Possingham, & Watts, 2009) or zonation (Lehtomäki & Moilanen, 2013). All of them base, among other parameters, the selection of potential planning units on algorithms that account for their spatial connectivity.

When accounting for spatial connectivity, conservation plans inherently build the protected areas based on the spatial dependencies in the planning units as well. For example, it is vital to account for the spatial structure of the environment around a given planning unit for assessing its importance as part of a protected area (Daigle et al., 2018; Weeks, 2017). However, this key characteristic of spatial dependency is rarely applied in the underlying conservation features themselves, which provide the basis for the conservation planning.

The most widely used conservation feature in conservation planning is the geographical distribution of multiple species, as species distributions are often best known compared to other biodiversity facets (such as functional or phylogenetic characteristics of species; McGill, Dornelas, Gotelli, & Magurran, 2015). However, species point occurrence data alone are not useful in systematic conservation planning, which requires range‐wide and seamless data of the targeted conservation features (Tulloch et al., 2016). This is likely one of the major challenges for conservation planners—to overcome the Wallacean shortfall, that is, to know the full geographical distribution of species (Bini, Diniz‐Filho, Rangel, Bastos, & Pinto, 2006; Meyer, Weigelt, & Kreft, 2016).

Currently, species distribution models (SDMs) are the main tools used to produce such range‐wide species distribution data (Guisan & Thuiller, 2005). SDMs assess habitat preferences in an environmental envelope given the species occurrences and the environmental predictors at the respective locations, and project a probabilistic habitat suitability index across the study area (Elith & Leathwick, 2009).

Species distribution models are considered useful in conservation planning when used within a “structured and transparent decision‐making process” (Guisan et al., 2013). In reality, however, most conservation plans are still based on species’ surrogates such as habitat maps, expert‐derived species distributions, or connectivity surfaces (Tulloch et al., 2016). As reviewed by Tulloch et al. (2016), SDMs are not yet widely used in conservation planning because of several constraints: (a) poor availability of species occurrence data and (b) respective predictor data, (c) uncertainties associated with the data and (d) the wish to understand processes rather than patterns, requiring alternative and additional data. Here, (a–c) are clearly methodological issues that need to be assessed with care, while (e) goes beyond the expectations towards SDMs, since per definition, they do not deliver information on processes. Hence, it is key to distinguish between constraints that can be tackled (Dormann, 2007; Fourcade, Besnard, & Secondi, 2018; Pearson & Dawson, 2003), and those that are beyond the actual purpose of SDMs (Araújo & Peterson, 2012).

Species distribution models require rigorous testing for methodological issues and statistical shortcomings (Record, Fitzpatrick, Finley, Veloz, & Ellison, 2013; Tulloch et al., 2016). Given that spatial conservation plans incorporate spatial information among planning units, it is obvious that SDMs that yield the conservation feature data should make use of the same information. However, the few studies that modelled and used species distributions for conservation planning are typically not accounting for such spatial dependencies (e.g., in the terrestrial (Rondinini et al., 2011), freshwater (Esselman & Allan, 2011) and marine realms (McGowan et al., 2013), but see White, Schroeger, Drake, and Edwards (2014)). The disadvantage of such spatially implicit SDMs (hereafter referred to as “non‐spatial SDMs”) is that they do not account for spatial dependence or irregular sampling intensity (e.g., Latimer, Wu, Gelfand, & Silander, 2006), and frequently violate the assumption of independent samples and spatial units (Araujo, Pearson, Thuiller, & Erhard, 2005; Hampe, 2004). Assuming independence of spatial units means that the model is not aware of the predicted habitat suitability of a given species in neighbouring planning units. In contrast, the optimization of the spatial conservation plan, that is, the selection of planning units, does account for the spatial relation among them, favouring neighbouring ones over those that are far apart.

In contrast, spatially explicit SDMs (“spatial SDMs”) account for the proximity and mobility (i.e., connectivity) in species populations. Hence, they provide more powerful inference about species distributions and niche relations (Latimer et al., 2006; De Marco, Diniz‐Filho, & Bini, 2008). Incorporating the assumption of spatial dependencies in the data has, therefore, the potential to provide more robust SDM predictions (Guisan & Thuiller, 2005; Record et al., 2013). Most importantly from a conservation planning perspective, spatial SDMs make use of the spatial information and dependencies of the species and environment in the planning units. Habitat suitability predictions from spatial SDMs are generally more contiguous (Domisch, Wilson, & Jetz, 2016) and less patchy than those derived from non‐spatial SDMs. The question remains, whether such contiguous predictions have the potential to maximize the management efficiency of spatial plans.

Species distribution models can be made spatially explicit in numerous ways (see Dormann, 2007, for a review of various methods). Spatial aspects are included in the SDM given the spatial autocorrelation of species distributions (Besag, 1974; Record et al., 2013). Spatial autocorrelation is a common phenomenon in ecology, as nearby locations (given species occurrences and environmental predictors) tend to be more similar compared to those located farther apart (Legendre, 1993; Record et al., 2013). One possible method to add the spatial dimension in SDMs can be, for instance, through spatial random effects (Dormann, 2007). Spatial random effects can be computationally demanding. However, their use may be warranted—or even required—when the task is to limit the false positive and false negative predictions of probability of occurrence within a spatial unit (Record et al., 2013). Limiting false positive/negative predictions is crucial in conservation planning, since the over‐ or underestimation of suitable habitats (and therefore species occurrences) impacts species protection itself and produces unnecessary costs, that is, for establishing protected areas in erroneous locations where the target species do not actually occur.

Here, we analyse whether accounting for spatial dependencies in both steps, the spatial prioritization process and in the underlying conservation features (here: species), influences the arrangement of potential protected areas. Using species survey data, we build non‐spatial and spatial SDMs and compare the resulting mapped spatial conservation plans across a range of conservation targets, that is, proportions of potential species distribution areas. We hypothesize that (a) spatial SDMs would outperform non‐spatial SDMs in terms of model evaluation scores, since spatial SDMs account for the influence of proximity in species populations; (b) this effect cascades through to the spatial conservation plans, and that those derived from spatial SDMs would differ significantly from the ones based on non‐spatial SDMs. We test these hypotheses in three case studies covering terrestrial, marine and the freshwater realms. We do so, because protected areas in each realm are typically planned using different shapes of spatial units (grids, hexagons and subcatchments, respectively), and the planning unit shape has shown to impact the spatial pattern and effectiveness of protected areas (Nhancale & Smith, 2011).

2 METHODS

2.1 General workflow of the analysis

Our aim is to test whether, and if so, how the addition of spatial information in the SDMs leads to different species habitat suitability predictions, and hence to different spatial conservation plans. The analysis is divided into five stages: (a) preparing the data and aggregating it into spatial units; (b) testing the degree of spatial autocorrelation in the species data to see whether adding spatial random effects in the SDMs is warranted; (c) running two SDMs for each species, one that does not use the spatial information (non‐spatial SDM) and another one that takes advantages of the spatial information via the spatial random effects (spatial SDM); (d) using the outputs from the spatial and the non‐spatial SDMs as separate inputs to calculate spatially optimized conservation plans with varying conservation targets; and, finally, (e) comparing the spatial conservation plans derived from the non‐spatial and the spatial SDMs. As the non‐spatial and spatial SDMs are, except for the spatial random effects, identical, this allows us to single out the effect that the spatial information in SDMs has on the subsequent spatial conservation plans.

2.2 Study areas and species data

The terrestrial case study (Figure 1a) comprised survey data of 33 Eucalyptus species in Eastern Australia derived from Fithian, Elith, Hastie, and Keith (2015). This dataset encompasses species detections/non‐detections sampled annually from 1970 to 2013 as well as environmental data covering climate and topography and derivatives thereof (Fithian et al. 2015). Environmental data layers were re‐sampled from the original 250‐m spatial resolution by factor 20 to obtain a grid with the individual squares sizing 0.05° (approximately 25 km2), using the raster package in R (Hijmans & van Etten, 2018; R‐Core‐Team, 2018). This yielded 7,763 grid cells that were used for all subsequent analyses.

image
Spatial arrangement of potential protected areas derived from non‐spatial and spatial SDMs, as well as their overlap in the terrestrial (a), marine (b) and freshwater realms (c), each using a specific planning unit shape (grids, hexagons, subcatchments). Spatial plans were defined for a 20% conservation target within a 10% gap to optimality. The inset shows the location of the study areas in Australia, US East Coast and the Danube river basin. See Supporting Information Figures S3–S5 for maps showing a wider range of conservation targets

The marine case study (Figure 1b) consisted of monitoring data of 53 reef fish species in the Southeast U.S. Continental Shelf region. Fish detections and non‐detections were derived from the Southeast Reef Fish Survey (SERFS) that was conducted annually from 1990 to 2013 (Bacheler et al., 2014; SEAMAP‐SA, 2017). Environmental data on ocean topography, currents, nutrients and light were obtained from the Bio‐Oracle (approx. 9 km spatial grain at the Equator, Tyberghein et al., 2012; Assis et al., 2018) and MARSPEC datasets (approx. 1 km, Sbrocco & Barber, 2013). All environmental data were harmonized to 7,123 hexagonal grids of approx. 19 km2, and the species data were summarized for each hexagonal grid cell (created using the R‐package sp; Pebesma & Bivand, 2005; Bivand, Pebesma, & Gomez‐Rubio, 2013).

The freshwater case study (Figure 1c) was based on detection/non‐detection data of 85 fish species across the Danube river basin. Fish survey data were derived from the EFI+ and the BioFresh project databases, sampled during 1955–2007 (Schinegger, Pletterbauer, Melcher, & Schmutz, 2016; Zupancic, 2015). We used the HydroBASIN dataset (Lehner & Grill, 2013) and selected all level 12 subcatchments draining into the Black Sea, comprising of 7,376 subcatchments with an average size of 108 km2. We then extracted climatic (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005), topographic (Amatulli et al., 2018) and land cover variables (Tuanmu & Jetz, 2014, all at ~1 km spatial grain), as well as the number of dams (Lehner et al., 2011) for each subcatchment, the latter representing a proxy for human impact besides land use.

Species data, including species detections, non‐detections and sampling dates, were aggregated to the planning units. Within each planning unit, environmental data were aggregated using various techniques (e.g., average, standard deviation; see Supporting Information Table S1). In the freshwater realm, we summed the precipitation across each subcatchment and routed the upstream accumulated precipitation along the hydrography to mimic run‐off (Domisch, Amatulli, & Jetz, 2015).

From a large set of layers per case study, we selected those that are meaningful from an ecological perspective to describe the distribution of the species. The decision regarding the choice of predictors was based on a mixture of expert knowledge, data availability and on preliminary model runs: For each realm, we did multiple preliminary model runs where we tested which set of predictors yields the highest model evaluation scores. Hence, we aimed to get the best‐possible model predictions for each realm. The set of predictors among realms is different as each realm needs to take realm‐specific characteristics into account. Highly intercorrelated environmental predictors were omitted (i.e., where Pearson's correlation tests between different predictors yielded a correlation coefficient >0.7 or <−0.7) (Dormann et al., 2013, see Supporting Information Figure S1 for all predictors used). All (continuous) predictors were centred (so all predictors have a mean of 0) and scaled by dividing by their standard deviations. The final number of predictors was 13, 11 and 8 for SDMs in the terrestrial, marine and freshwater realms, respectively.

2.3 Defining spatial connectivity

Regarding spatial SDMs, the connectivity among planning units for the terrestrial and marine realms was computed from a polygon shapefile using the R‐package spdep (using the queen's move; Bivand, Hauke, & Kossowski, 2013; Bivand & Piras, 2015). This procedure identifies all first‐order neighbours of each focal planning unit, that is, those planning units that are directly adjacent and connected. In the freshwater realm, it is crucial to account for the longitudinal connectivity among planning units (Abell, Allan, & Lehner, 2007; Hermoso, Linke, Prenda, & Possingham, 2011). Hence, the spatial connectivity was defined as all upstream subcatchments connected within a 100 km (as‐the‐fish‐swims) distance of a given subcatchment. This distance was chosen as a trade‐off between hydrological connectivity (which can exceed 100 km) and the computational demand and time for calculating SDMs and spatial plans. Note that with this information, full connectivity matrices (range‐wide and up/downstream) are built within SDMs and the spatial prioritization. In other words, each planning unit “knows” which planning units are connected to it.

2.4 Detecting spatial autocorrelation

We first assessed the degree of spatial autocorrelation in species occurrences among planning units (where occurrences were aggregated within planning units). Here, we used the Monte Carlo simulation method and calculated Moran's I and the associated p‐value for each species (at α = 0.05), using the R‐package spdep (with 999 permutations). Spatial weights were kept identical to the connectivity definition of planning unit above, specifying proportional weights to the connected planning unit (row standardization).

2.5 Species distribution modelling

We ran Bayesian occupancy models for predicting the habitat suitability of species in each realm. Our aim was to minimize the methodological shortcomings in SDMs that could be ported to the spatial prioritization. Hence, we refrained from artificial pseudo‐absences to draw species non‐detections, but used survey data with repeated visits to account for the detection probability of species at given sites. Specifically, we ran zero‐inflated binomial SDMs in a Bayesian framework using the hsdm R‐package (Vieilledent et al., 2014) with “hSDM.ZIB” and “hSDM.ZIB.iCAR” functions for the non‐spatial and spatial SDMs, respectively. These functions integrate two processes, a Bernoulli suitability and a Binomial observability process, into a hierarchical zero‐inflated binomial model. The Bernoulli suitability process uses species point records and environmental predictors as the response and explanatory variables, respectively (Vieilledent et al., 2014):
urn:x-wiley:13669516:media:ddi12891:ddi12891-math-0001(1)
urn:x-wiley:13669516:media:ddi12891:ddi12891-math-0002(2)

where zi is a random variable describing the binary habitat suitability at planning uniti, which follows a Bernoulli distribution of parameter θi; that is, the probability that the habitat is suitable in the planning uniti. θi is expressed as a linear model combining the matrix of environmental predictors Xi and parameters β using a logit link function.

In the spatial models, Equation 2 is extended by an intrinsic conditional autoregressive model (iCAR) in the suitability process:
urn:x-wiley:13669516:media:ddi12891:ddi12891-math-0003(3)
where ρi is the spatial random effect of the planning uniti. ρi accounts for the spatial autocorrelation of the presence probabilities variability in suitability that is not explained by the environmental variables:
urn:x-wiley:13669516:media:ddi12891:ddi12891-math-0004(4)

where μi is the mean of ρi in the neighbourhood of i, Vρ is the variance of the spatial random effects, and ni is the number of neighbours for the planning uniti (see also Latimer et al., 2006, for the formal description of the iCAR). Thus, we assume that in addition to the environmental characteristics within the planning units, the species’ occurrence probabilities also depend on the occurrence probabilities of the neighbouring planning unit (Vieilledent et al., 2014). A planning unit with “poor” environmental conditions next to planning units with “good” environmental conditions will result in higher estimated suitability, than if its neighbours also show “poor” environmental conditions (Domisch et al., 2016).

Regarding the observability process, we counted the number of repeat surveys within each planning unit; that is, we aggregated the surveys over time, yielding the information on how often each planning unit was visited. The model then estimates the probability of observation given the species presence in a planning unit, where we assume that, if the species was observed at least once during multiple visits in a given planning unit, the habitat is suitable and the absence of the species during other visits in the planning unit is due to imperfect detection. For instance, a given species was observed once in planning units A and B. Planning unit A was visited once, yielding a detection probability of 1, whereas B was visited 10 times (yielding a detection probability of 1/10 = 0.1). In combination with the suitability process (i.e., environmental conditions) and the spatial autocorrelation (i.e., the spatial neighbourhood information), the observability information yields more robust estimates in unsampled locations, that is, whether a species actually occurs there but was not detected, or if it is actually not present (and hence not detected). The probability of observing the species (δρ) was specified as:
urn:x-wiley:13669516:media:ddi12891:ddi12891-math-0005(5)
urn:x-wiley:13669516:media:ddi12891:ddi12891-math-0006(6)

where yi is a vector of the total number of observed presences in planning uniti. zi is the binary habitat suitability in planning uniti from the suitability process (Equation 1), and ti denotes the total number of visits within the planning uniti, including the non‐detections as described above. yi follows a binomial distribution of the combination of the habitat suitability zi at planning uniti and the parameters δi and ti. In other words, δi is the probability of observing a species in a location, if it was actually present. We estimate only the intercept (γ0) using a logit link function and assume a spatially constant detection probability (due to the relatively small occurrence datasets and limited spatial replication needed to understand the role of covariates in variable detection probabilities).

We ran three Markov chain Monte Carlo (MCMC) simulations with 200,000 iterations each, a burn‐in phase of 50,000 iterations and a thinning interval of 10. Model convergence was assessed by the multivariate potential scale reduction factor (MPSRF; Brooks & Gelman, 1998). For the suitability process, we used the coefficients from an initial, non‐spatial generalized linear model (GLM) as initial values, and both suitability and observability processes used uninformative priors centred at zero with a fixed large variance of 100 (Domisch et al., 2016). The prior distribution for the variance of the spatial random effects followed a uniform distribution, that is, a flat prior where the upper bound of the variance is set to 10.

Species data were split into training (70%) and validation (30%) sets, and model performance for each species was evaluated using the area under curve (AUC), true skill statistic (TSS), sensitivity and specificity (true positive and negative predictions, respectively) and the deviance information criterion (DIC). Note that the non‐spatial SDMs did not have any pre‐defined spatial configuration during the SDM calibration and prediction; that is, each planning unit was considered independent of each other in the SDMs, while all other data were kept constant (i.e., identical random seed in the SDMs, and identical predictors and species data including the subsets for validation). The spatial prioritization was undertaken using the realm‐specific spatial connectivity as described above, for predictions derived from both non‐spatial and spatial SDMs.

2.6 Spatial prioritization

We transformed the predictive posterior mean probability maps from SDMs into a semi‐binary scheme using TSS as a threshold (Allouche, Tsoar, & Kadmon, 2006), where all values below the threshold were converted to zero and values above the threshold retained their original values. This overcomes the problem of inflating the spatial prioritization with many planning units having low probabilities of occurrence (e.g., 10 planning units with probabilistic values of 0.1 would equal one planning unit having a value of 1). Simultaneously, this procedure retained the information of varying probabilities of occurrence (as recommended by Tulloch et al., 2016) above a certain level of confidence as given by TSS (i.e., within the range of the predicted “presence” of a species). We refrained from locking‐in any current, already established protected areas as this would have decreased the flexibility in selecting protected areas given only the conservation features, which was the aim of this study.

To create the boundary files (describing the spatial connectivity) for the terrestrial and marine case studies, we used the QMarxan toolbox in QGIS (QGIS‐Development‐Team, 2017). For the freshwater realm, we applied an inverse‐distance connectivity penalty, where subcatchments located closer to a given focal planning unit would get a higher penalty if not chosen as part of the protected area (opposed to those planning units located more distant, as proposed by Hermoso et al. 2011).

We first calibrated the boundary length modifier (BLM) for our analyses. The BLM is dimensionless and balances the spatial aggregation and patchiness of spatial plans. For the BLM calibration, we used the software marxan v.2.43v (Ball et al., 2009) within the R‐package marxan (Hanson & Watts, 2015). We created all necessary input files in base R, except the “input.dat” file, where for convenience, we used the marxan R‐package. We then calibrated the BLM following the recommendations by Ardron, Possingham, and Klein (2008). First, we ran marxan with a fixed species penalty factor of 10, and BLM values set to 0, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1, 10, 100, 1,000, 10,000, 100,000 to approximate the range of the optimal BLM value. We plotted the connectivity against the BLM (Supporting Information Figure S6) and identified the elbow, that is, the point from where an increase in the compactness (higher BLM values) has no major effect on the connectivity in spatial plans anymore. Similarly, we also mapped the spatial plans to visually confirm the increasing compactness derived from increasing BLM values. In a second step, we repeated the previous analyses and maps with setting the BLM ranging within the approximate BLM value (e.g., between 10 and 100). Again, we ran sensitivity runs within increments of 10 and identified the BLM to 35, 20, 0.15 for the terrestrial, marine and freshwater realms, respectively (Supporting Information Figure S6).

We then used the gurobi optimizer 7.5 software (Gurobi Optimization, 2017) to find optimal conservation planning solutions based on integer linear programming (ILP) within the R‐package prioritizr (Hanson et al., 2017). ILP has shown to out‐compete traditional simulated annealing tools (e.g., Marxan, Ball et al., 2009) in terms of time and accuracy (Beyer, Dujardin, Watts, & Possingham, 2016), using the same input files as marxan. We varied the conservation target for all species in the spatial plans, ranging from 10% to 50% in 5% increments. For instance, a conservation target of 20% means that for each species 20% of its suitable habitat (as given by the SDM predictions) are required in this specific spatial plan.

For each run, we allowed a 10% gap to optimality in the spatial plans (as trade‐off between optimality, and the time the optimizer takes to converge). To rule out the effect of the planning unit itself (cost, i.e., area of the planning unit), we set a constant value of 1 as the cost of each planning unit. Likewise, we set no weights for single species. This means that only the conservation features, that is, the probabilistic information on species habitat suitability and the connectivity penalty (set by the BLM), were decisive to the objective function that was set to minimize the total number of planning units to be part of the selected spatial plan (Game & Grantham, 2008). Note that opposed to, for example, marxan, gurobi provides one optimal solution and hence requires that all species meet the conservation target.

2.7 Statistical analyses

For each realm and model type, we compared model accuracy given the model evaluation scores, the estimated detection probability, the range size estimates of model outputs considering all planning units that had a probability value above the threshold (equal to binary predictions), and the summed probability of habitat suitability across all species within each planning unit as a proxy for species richness (Mateo, Felicísimo, Pottier, Guisan, & Muñoz, 2012). For each conservation target run (10%–50%) derived from spatial and non‐spatial SDMs, we (a) assessed whether species were covered by the spatial plans, (b) counted the relative number of planning units needed to fulfil the given target of the spatial plan and (c) compared the spatial overlap of spatial plans by assessing the percentage of overlapping spatial units.

3 RESULTS

3.1 Spatial autocorrelation

Across the terrestrial, marine and freshwater realms, all but three of the 171 modelled species showed a significantly positive spatial autocorrelation (α > 0.05 for the freshwater fish species White‐eye bream [Ballerus sapa], sunbleak [Leucaspius delineates] and brown trout [Salmo trutta lacustris], see Supporting Information Table S2). Species occurrences across planning units were hence non‐randomly distributed, warranting the use of spatial random effects within SDMs to account for spatial autocorrelation.

3.2 Model performance

The SDM performance indicators AUC, TSS, sensitivity and specificity were on average consistently higher for the spatial SDMs than for the counter‐parts (Figure 2). DIC was on average lower for the spatial SDMs in the terrestrial and freshwater realm, but higher in the marine realm (marine non‐spatial vs. spatial SDMs; DIC: 515 ± 485 vs. 534 ± 596, mean ± SD). All chains of the SDMs converged successfully yielding a MPSRF value of 0.999. The detection probability across realms was on average consistently higher in the spatial than in the non‐spatial SDMs (Figure 2f,l,r).

image
Model evaluation scores representing AUC, TSS, sensitivity, specificity and DIC, as well as the estimated detection probability, summarized across 33 terrestrial (a–f), 53 marine (g–l) and 85 freshwater species (m–r) derived from non‐spatial (blue) and spatial SDMs (green). Bars represent median values and boxes the 1st and 3rd quartiles, respectively

3.3 Range size estimates

Spatial SDMs yielded generally more compact and less dispersed habitat suitability estimates (see exemplary maps in Supporting Information Figure S1), and this effect is also mirrored in the range size estimates. Range size estimates derived from spatial SDMs were significantly lower in the freshwater and terrestrial realms than in the non‐spatial models. This means that the predictions derived from spatial SDMs were more restrictive and species' suitable habitats were predicted to occur in less planning units than in the non‐spatial SDMs (see exemplary maps in Supporting Information Figure S1; non‐spatial vs. spatial SDMs in the terrestrial realm: 1,278 ± 750 vs. 1,057 ± 690 planning units, paired t‐test: t = 2.95, df = 32, p = 0.006; freshwater: 2,874 ± 1,491 and 2,532 ± 1,409 planning units, mean ± SD, paired t‐test: t = 2.0493, df = 84, p = 0.044). In the marine realm, no significant difference in range size estimates was observed (non‐spatial vs. spatial SDMs: 2,698 ± 1,564 and 2,628 ± 1,500 planning units, paired t‐test: t = 0.783, df = 52, p = 0.436).

3.4 Summed probabilities of habitat suitability

Across realms, spatial SDMs produced on average a lower summed habitat suitability across planning units (i.e., a proxy for species richness), than predicted by non‐spatial SDMs (non‐spatial vs. spatial SDMs in the terrestrial realm: 3.00 ± 1.28 vs. 2.37 ± 1.00 planning units, paired t‐test: t = 61.845, df = 7,762, p < 0.001; marine: 12.27 ± 3.01 vs. 11.79 ± 2.97 planning units, mean ± SD, paired t‐test: t = 28.458, df = 7,122, p < 0.001; freshwater: 21.36 ± 11.91 vs. 16.50 ± 9.48 planning units, paired t‐test: t = 46.057, df = 7,375, p < 0.001). See Supporting Information Figure S2 for a spatial representation of the differences in summed probabilities among non‐spatial and spatial SDMs.

3.5 Spatial similarity of spatial plans

All species targets across SDM types and spatial conservation plans were met in all spatial prioritization runs (data shown in the Pangaea repository). The degree of spatial overlap of potential protected areas derived from spatial and non‐spatial SDMs was target‐dependent. The overlap was lowest for the smallest conservation target (10%) with the per cent overlap ranging from 1% to 2%, and increased up to a maximum of 30% to 39% for a conservation target of 50% (Figure 3a, and Supporting Information Figures S3–S5). The number of required planning units increased linearly with increasing conservation targets: the higher the conservation target, the higher the required amount of planning units to meet the given target (Figure 3a).

image
(a) Spatial overlap in potential protected areas based on spatial and non‐spatial SDMs, along a conservation target of 10%–50% in the terrestrial, marine and freshwater realm. Please see Supporting Information Figures S3–S5 for the mapped spatial plans of each conservation target. (b) Relative number of planning units required for reaching the solution in spatial plans for the different realms with conservation targets ranging from 10%–50% in the terrestrial, marine and freshwater realms. Points and triangles along solid and dotted lines represent spatial plans derived from non‐spatial and spatial SDMs, respectively

The relative difference in the number of planning units needed for a given solution between spatial and non‐spatial SDMs was within a margin of 5% (Figure 3b). No significant differences in the number of planning units between spatial and non‐spatial models across conservation targets could be observed.

4 DISCUSSION

Incorporating connectivity has been successfully adopted in systematic conservation planning for building species migration and movement corridors (Margules & Pressey, 2000). Yet, this concept has been largely neglected in the underlying conservation features (e.g., species) that are used in the spatial planning. Our study provides a non‐exhaustive comparison, shedding light on the effects of non‐spatial versus spatially explicit conservation features in systematic conservation planning. We show that (a) spatially explicit predictions of species’ probabilistic habitat suitability outperform those derived from non‐spatial SDMs across the three realms, each with a specific landscape configuration and consequently distinct species dispersal and connectivity characteristics. Moreover, (b) spatially explicit predictions provide lower range size estimates, indicating less dispersed habitat suitability predictions. This effect cascades into the conservation planning as (c) the spatial arrangement of spatial conservation plans differs fundamentally between those derived from non‐spatial and spatial model predictions. Simultaneously, (d) both spatial and non‐spatial approaches require a similar number of planning units for delineating potential protected areas.

Spatial SDMs yielded habitat suitability predictions that were on average more accurate given their better model performance (Figure 2), than those derived from non‐spatial SDMs. This pattern is in line with previous findings from Record et al. (2013) due to three well‐known factors (reviewed by Dormann, 2007): (a) distance‐related dispersal or species interactions, (b) non‐linear species–environment relationships and (c) spatially structured environmental predictors causing spatial structuring in the response, that is, the modelled species’ habitat suitability (Besag, 1974). In addition, Crase, Liedloff, Vesk, Fukuda, and Wintle (2014) showed that if such induced spatial autocorrelation in species distributions is not accounted for, model predictions are likely to remain inaccurate. Contrary to Thibaud, Petitpierre, Broennimann, Davison, and Guisan (2014), who found that spatial autocorrelation itself has only a minor relative effect on model prediction accuracy (compared to the choice of predictors, sample sizes, sampling designs and modelling technique), our analyses reveal that accounting for spatial autocorrelation leads to a substantial improvement of model outputs (given that all other data were kept constant). This effect cascades into the resulting spatial plans: Those derived from spatial SDM predictions are likely to provide a better representation of areas needed to potentially protect, than spatial conservation plans derived from non‐spatial SDM predictions. In addition, spatial SDMs yielded, on average, a slightly higher detection probability and thus reduced the potential false absences caused by the imperfect detection of species (MacKenzie et al., 2017). Reducing the risk of overlooking species is therefore crucial for an effective conservation planning (Petitot, Manceau, Geniez, & Besnard, 2014).

We found only a minor overlap of potential protected areas based on spatial and non‐spatial SDMs for the commonly applied conservation targets of 10%–20% (e.g., the global target of 17% set by the Convention on Biological Diversity, CBD, 2010; Veach, Minin, Pouzols, & Moilanen, 2017). Interestingly, spatial plans, regardless of the modelling approach, required a similar amount of planning units. On average, spatial SDMs predicted a lower range size per species and, therefore, a lower number of species predicted to have a suitable habitat per planning unit. Hence, spatial plans using predictions from spatial SDMs require—relatively speaking—more planning units (e.g., assuming that non‐spatial and spatial SDMs would predict a suitable habitat of a species into 100 and 90 planning units, respectively, the spatial plans should cover theoretically 20 and 18 planning units given a target of 20%). This is likely to derive due to the, on average, lower summed probability of suitable habitats per planning unit in spatial SDM predictions (i.e., species richness sensu Mateo et al., 2012). Consequently, the optimizer (gurobi, in our case) had to choose from a smaller pool of available planning units. Hence, the optimizer is forced to seek additional planning units that lead to a remarkably different spatial arrangement of spatial conservation plans (Figures 1 and 3a). While we did not explicitly test for differences among realms, this pattern was similar across different landscape configurations and planning unit shapes. Broadening the perspective, non‐spatial SDMs tend to create artificial species “hot spots” by predicting a higher species richness per planning unit, and the derived spatial plans would take advantage of the umbrella effect, that is, where the conservation of a species confers protection to a larger number of co‐occurring species (Roberge & Angelstam, 2004). In our study, however, this pattern does not depict the best‐possible outcome given the lower model evaluation scores. In turn, spatially explicit SDMs have the potential, though not reducing the quantity of protected areas, to increase their quality (Barnes et al., 2018).

We acknowledge that a 10% gap to optimality, as we used it here, can introduce noise in the spatial arrangement of potential protected areas (e.g., Figure 3a). Even with this noise, however, differences in spatial plans were distinct in our study. While a more stringent gap (e.g., 0.5%, Beyer et al., 2016) could yield more accurate spatial plans, we intended to balance uncertainties in the spatial plans with those most probably apparent in the underlying data (note that due to computational reasons, we only extracted the mean posterior probability of SDMs). In other words, species and environmental data, as well as the aggregation of data into planning units, potentially lead to a certain level of uncertainty in the habitat suitability predictions in our study (Beale & Lennon, 2012). Hence, spatial plans should be given flexibility to accommodate such uncertainties in terms of the selection of spatial units during the spatial prioritization.

We singled out the effect of spatial autocorrelation in conservation features to explore the effect of the connectivity among planning units on the subsequent spatial conservation planning. We encourage future studies to further explore these effects using “real‐world” exercises, where the current and already established protected areas could be locked‐in during the spatial prioritization along with various costs for the planning units. This would decrease the flexibility in the selection of protected areas, and given the low overlap between non‐spatial and spatial conservation features, the impact on the outcome remains to be seen. Similarly, the degree of fragmentation (Cabeza, 2003) and the minimum patch size depend on the available area and financial resources (as a large amount of small patches are more expensive to manage (Smith, Minin, Linke, Segan, & Possingham, 2010)). Likewise, the implementation of various cost measures, such as area, land price or human influence, could be added in the analyses—a topic that was however beyond the scope of our study. Hence, testing spatial plans and with these additions and interactions has the potential to further reveal differences stemming from non‐spatial and spatially explicit modelled conservation features. In our study, the species data were contingent on publicly available survey data across the three realms, and hence, the species can be considered generalist species. Rare species are considered particularly important for spatial planning; however, such data were not publicly available for our study. It remains therefore to be seen whether SDMs of rare species would yield similar differences in spatial plans as shown in our study.

We note that by accounting for species’ observability by employing a hierarchical SDM, we aimed to limit the shortcomings of SDMs that else do not account for the fact that species might be overlooked during the sampling. Regarding the addition of the spatial information in SDMs, we want to highlight that instead of running hierarchical Bayesian models, other SDM techniques could be equally used, such as Generalized Linear Mixed Models using spatial random effects, or other SDMs algorithms that use spatial eigenvectors as a covariate containing information about the spatial structure in the study area (see Dormann, 2007, for a review on various methods).

5 CONCLUSIONS

Employing SDMs that account for spatial dependencies in conservation features provide a promising way forward to increase the quality of protected areas, however without increasing the area (and possibly costs) needed. They yield fundamentally different spatial conservation plans given more accurate species habitat suitability predictions, compared to SDMs that ignore such spatial dependencies. This highlights the importance of using best‐possible modelling practices sensu Dormann (2007) and Guisan et al. (2013), which is needed for a wider acceptance of SDMs in systematic conservation planning (Tulloch et al., 2016). Hence, we encourage modellers and practitioners to carefully assess the possibilities in adding spatial SDMs to their systematic conservation planning to derive sufficient options for choosing the optimal spatial arrangement of protected areas.

ACKNOWLEDGEMENTS

We wish to thank the following data providers for making their data available: the Flora Survey Module of the Atlas of NSW Wildlife by the Office of Environment and Heritage (OEH), the Southeast Reef Fish Survey (SERFS) comprising data from the Marine Resources Monitoring Assessment and Prediction program (MARMAP), the Southeast Area Monitoring and Assessment Program, the South Atlantic (SEAMAP‐SA) and the Southeast Fisheries Independent Survey (SEFIS), as well as the EFI+ and BioFresh consortia. This study was funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 642317. SDL has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska‐Curie grant agreement No. 748625, and SCJ from the German Federal Ministry of Education and Research (BMBF) for the “GLANCE” project (Global Change Effects in River Ecosystems; 01 LN1320A). We wish to thank Gwen Iacona and two anonymous referees for their constructive comments on an earlier version of the manuscript.

    CONFLICT OF INTEREST

    The authors declare no competing financial interests.

    DATA ACCESSIBILITY

    All results supporting the study are available at https://doi.pangaea.de/10.1594/PANGAEA.889033 (all model predictions, model evaluation scores and spatial conservation plans) for each realm. All environmental data used for modelling are available in the following repositories for download: www.worldclim.org, www.earthenv.org, http://sedac.ciesin.columbia.edu/data/set/grand-v1-dams-rev01, www.bio-oracle.org, www.marspec.org, and Supporting Information of Fithian et al. (2015). Species data are available for download in the following repositories: Supporting Information of Fithian et al. (2015), http://sero.nmfs.noaa.gov/sustainable_fisheries/caribbean/fish_indep_wkshp/surveys/sefis/, and www.freshwaterplatform.eu.

    BIOSKETCH

    Sami Domisch is an aquatic ecologist who is interested in freshwater biodiversity and biogeography. His research is based on spatial freshwater biodiversity science and biodiversity informatics, and how these tools can be further developed to yield robust estimates regarding freshwater biodiversity and conservation planning. The authors consist of a multidisciplinary team with a wide range of research interests, such as freshwater biodiversity, spatial modelling, river restoration, floodplain ecology and systematic conservation planning.

    Author contributions: SD, MF, SCJ and SDL designed the study; SD performed the modelling and spatial prioritization; and SD, MF and SDL analysed output data. TH, FP and AW collected and prepared data. SD, MF and SDL wrote the first draft of the manuscript, and all authors contributed substantially to revisions.

      Number of times cited according to CrossRef: 4

      • The role of spatial units in modelling freshwater fish distributions: Comparing a subcatchment and river network approach using MaxEnt, Ecological Modelling, 10.1016/j.ecolmodel.2020.108937, 418, (108937), (2020).
      • From topography to hydrology—The modifiable area unit problem impacts freshwater species distribution models, Ecology and Evolution, 10.1002/ece3.6110, 10, 6, (2956-2968), (2020).
      • Benthic diversity patterns and predictors: A study case with inferences for conservation, Marine Pollution Bulletin, 10.1016/j.marpolbul.2019.110748, (110748), (2019).
      • Identifying and applying an optimum set of environmental variables in species distribution models, Inland Waters, 10.1080/20442041.2019.1653111, (1-18), (2019).

      The full text of this article hosted at iucr.org is unavailable due to technical difficulties.