A data‐driven geospatial workflow to map species distributions for conservation assessments

Species distribution maps are essential for assessing extinction risk and guiding conservation efforts. However, most come sourced as expert‐drawn range maps with known issues of accuracy or are developed with overly complex modelling procedures. Thus, data‐driven alternatives that are accessible and reliable are a welcome addition to the spatial conservation toolkit. Here, we developed a geospatial workflow to refine the distribution of a species from its extent of occurrence (EOO) to area of habitat (AOH) within the species range map. The range maps are produced with an inverse distance weighted (IDW) interpolation procedure using presence and absence points derived from primary biodiversity data.


| INTRODUC TI ON
Mapping species distributions is crucial to support conservation actions in the current extinction crisis (Bachman et al., 2019). The International Union for Conservation of Nature (IUCN) Red List of Threatened Species employs two spatial metrics that represent species distributions and support extinction risk assessments (IUCN, 2012): extent of occurrence (EOO) and area of occupancy (AOO). These metrics represent the upper and lower bounds of a species distribution and have a different theoretical basis-the former measures the degree of risk spread and the latter is closely linked to population size (Gaston & Fuller, 2009 Recently, Brooks et al. (2019) suggested that IUCN Red List assessments should incorporate area of habitat (AOH), defined as the area of potentially suitable ecological conditions for a species within its range, also known as extent of suitable habitat (KBA/SAC, 2019).
AOH is relevant to guide conservation by quantifying habitat loss and fragmentation within a species range (Brooks et al., 2019) and is already part of the criteria for identifying Key Biodiversity Areas (KBA) (IUCN, 2016).
To map AOH, researchers have relied mostly on refining expertdrawn range maps by clipping unsuitable areas based on published elevational limits and known habitat preferences (Harris & Pimm, 2008;Ocampo-Peñuela et al., 2016). This approach has been useful to inform the conservation status of different terrestrial organisms on a global (Ficetola et al., 2015;Rondinini et al., 2011;Tracewski et al., 2016) and local scale (Negret et al., 2021). However, expert-drawn range maps can have low accuracy due to errors of omission (known presences outside of the mapped area) and commission (known absences inside the mapped area) (Beresford et al., 2011;Mainali et al., 2020;Peterson et al., 2018). This is problematic at local scales where it is necessary to avoid mischaracterization of species distributional patterns for conservation applications (Ficetola et al., 2014;Mainali et al., 2020;Rahbek, 2005).
Because the AOH is a subset of the range (Brooks et al., 2019), the critical step to reliably map species AOH is to obtain accurate species range maps, which depict "the current known limits of distribution of a species, accounting for all known, inferred or projected sites of occurrence" (IUCN/SSC, 2019; KBA/SAC, 2019). There are different methods to produce species range maps, and it is crucial to understand the limitations and trade-offs of the different alternatives (Cantú-Salazar & Gaston, 2013;Maréchaux et al., 2017).
A common approach is to use species distribution models (SDM) that correlate environmental variables with known occurrences (Elith & Leathwick, 2009). These methods have been successfully applied to map species ranges and inform Red List assessments (Breiner et al., 2017;Pena et al., 2014;Syfert et al., 2014). However, SDMs are challenging to implement for multiple species, with modelling decisions that may be idiosyncratic and, if not well understood or explicitly communicated, can cause reduced transparency and reproducibility (Feng et al., 2019;Guisan et al., 2013;Sofaer et al., 2019). Additionally, methodological shortcomings could cause overprediction of species distributions and optimistic assessments of extinction risk (Velazco et al., 2020). Thus, there is a need for developing more straightforward approaches to produce accurate range maps especially for hundreds (or more) species (García-Roselló et al., 2019).
Another approach to map species distributions for conservation applications is spatial interpolation (Bahn & McGill, 2007;Li & Heap, 2008). Of the different types of interpolation available such as kriging, spline, or inverse distance weighting (IDW), the latter one is a deterministic method that is accessible, fast, user-friendly and could be readily employed to develop species range maps. IDW uses known presences and absences to produce a surface of probability values for the occurrence of a species within an area (Hijmans & Elith, 2019). The main assumption is that species are more likely to be found closer to presence points and farther from absences (i.e. spatial autocorrelation), and the local influence of points (weighted average) diminishes with distance. The ranges produced with IDW are consistent with the observation that distributions are naturally porous and have discontinuities (holes) at higher resolutions (Hurlbert & Jetz, 2007).
Inverse distance weighted has been used for mapping invasive plants (Roberts et al., 2004) and to understand the distribution of coral reef sessile organisms with better performance than ordinary kriging (Zarco-Perello & Simões, 2017). Moreover, distribution maps generated with this method have been found to largely overlap with SDMs derived from Maxent for hyper-dominant Amazonian tree species (Gomes et al., 2018). This overlap occurs when the environmental variables used for correlative modelling do not provide more explanatory power than the spatial structure of the point data (Bahn & McGill, 2007;Hijmans, 2012;Journé et al., 2020). Similarly to SDMs, interpolation with IDW benefits from larger sample sizes and unbiased spatial data, so procedures that reduce spatial bias and class imbalance between presence and absence points are recommended (Steen et al., 2021).
Here, we developed a geospatial workflow to map species distributions with presence and absence points derived from primary biodiversity data. To evaluate our workflow, we mapped the distribution of 723 forest birds in the Americas and assessed its performance in comparison with the range maps of Birdlife International. Our intent was to study differences and trade-offs to aid conservationists K E Y W O R D S area of habitat, geospatial analysis, inverse distance weighting, Red List assessment, species range maps take informed decisions when using mapping alternatives (O'Hara et al., 2017). Overall, by analysing differences in accuracy, spatial overlap, range size, and AOH estimates, we present an alternative approach that is suitable to map species distributions at local scales for conservation planning and decision-making.

| Geospatial workflow
We developed a geospatial workflow that uses presence and absence data to refine the distribution of a species from its extent of occurrence (EOO) to a final area of habitat (AOH) map within the species range. Users should first gather occurrence data, and we suggest following recent guidelines (Araújo et al., 2019;Feng et al., 2019).
Absences can be derived from known surveys, monitoring schemes, or citizen science projects that provide a measure of sampling effort (Johnston et al., 2021).
Our proposed geospatial workflow has five main steps ( Figure 1 (Amatulli et al., 2018) 3. Overlay presence and absence points to the clipped EOO. These are the data points that will be used for generating the range maps.
4. Interpolate presence and absence points using inverse distance weighting (IDW) within the clipped EOO. The output is a raster surface within the clipped EOO where each cell has a value between 0 and 1. The species range is obtained by using a threshold to convert probability of occurrence to a binary map where the species is present or absent (Liu et al., 2005). Choosing a threshold depends on the map's intended purpose and a default cut-off of 0.5 is not recommended (Freeman & Moisen, 2008;Jiménez-Valverde & Lobo, 2007).

| Case study
We implemented our geospatial workflow using a dataset of 723 resident forest landbirds in the Americas (North, South, Central America, and the Caribbean)-20% of all forest birds in the region according to the BirdLife Datazone (BirdLife International, 2021).
Our species selection focused on forest birds with smaller ranges (<1 M km 2 ; [Jenkins et al., 2013;Orme et al., 2005]) because they are typically associated with increased risk of extinction (Chichorro et al., 2018;Lucas et al., 2019). On the other end, we excluded species classified as Extinct (EX), Extinct in the Wild (EW), and possibly extinct (CR-PE) based on the 2021 IUCN Red List. We also excluded species endemic to the Galapagos, Cocos, and Hawaiian Islands. Furthermore, after data processing (next two sections), we removed species with less than 15 presences and absences (Papeş & Gaubert, 2007;Proosdij et al., 2016).

| Gathering and processing occurrence data
We gathered occurrence data from the Global Biodiversity Information Facility (GBIF; https://www.gbif.org). GBIF is an aggregator of multiple occurrence datasets spanning from museum specimen records to the eBird and iNaturalist citizen science projects, and it is extensively used for mappings species distributions (Heberling et al., 2021). We generated a GBIF occurrence data download (https://doi.org/10.15468/ dl.rrrrf3) on 01 July 2021 with the R package rgbif (Chamberlain & Boettiger, 2017) for an initial list of 1,696 species that fulfilled our species selection criteria.
The GBIF download was restricted to occurrences starting in the year 2000 to derive conservative estimates of 21st-century distributions. Filters were set for data with no geospatial issues and basis of record (i.e. type of evidence) as "observation," "human observation," "literature," "preserved specimen" or "material sample." The occurrence data from GBIF follow a stable taxonomic backbone that is not aligned with the taxonomy used in BirdLife (Burfield et al., 2017).
Hence, we needed to confirm that taxonomic names in both datasets actually relate to the same species concept (McClure et al., 2020).
Thus, we only included species where the GBIF occurrence points and the BirdLife distribution maps broadly covered the same geographical regions based on a visual assessment.
We employed a two-stage cleaning protocol to improve the quality of our study dataset (Panter et al., 2020). First, we removed duplicates and used the "clean_coordinates" function of the R package CoordinateCleaner (Zizka et al., 2019) with default settings to automatically filter common erroneous coordinates, such as those assigned to the sea, country capitals, or biodiversity institutions (Maldonado et al., 2015). Second, for each species we examined the spatial arrangement of the occurrence points and removed obvious geographical outliers using the interactive modular R platform Wallace (Kass et al., 2018). To reduce spatial bias in the occurrence data, we kept records at least 5 km apart by subsampling within equal-area hexagon cells (Johnston et al., 2021;Strimas-Mackey et al., 2020).

| Deriving absences
We developed an approach to identify absences based on eBird hotspots-publicly accessible locations that can aggregate observations from multiple independent checklists over time (Sullivan et al., 2009). Here, we deemed absences as eBird hotspots where a given species has never been recorded. We used this approach as a stringent criterion to minimize the uncertainties regarding absences (Lobo et al., 2010).
We extracted hotspots from the eBird Basic Dataset (eBird Basic Dataset, 2020) for 57 countries and territories in the Americas with the R package auk (Strimas-Mackey et al., 2016). We kept hotspots that had at least 10 complete eBird checklists (La Sorte & Somveille, 2020) that had a "stationary" or "traveling" survey protocol, with a duration <5 hr, distance <5 km and with up to 10 observers (Johnston et al., 2021). We obtained 73,634-point locations of hotspots which were used to identify absences on a species-byspecies basis. F I G U R E 1 Mapping protocol to refine the distribution of a species from extent of occurrence (EOO) to area of habitat (AOH). This is an illustrative example for the Glittering Starfrontlet (Coeligena orina), an endemic and threatened hummingbird in the western Andes of Colombia

| Case study workflow
Here, we describe the specific implementation of our geospatial workflow (section 2.1) and we provide code to reproduce it using R (see Data Availability Statement): 1. We drew each species EOO around the occurrence data as the minimum convex polygon (MCP).
2. We clipped the EOO to each species' elevational limits by extracting elevational values from the occurrence points. The elevational data were gathered using the srtm 4.1 DEM at 250-m resolution (Jarvis et al., 2008). To obtain conservative estimates, we removed the extreme low and high values outside the 95% elevational range limit (Neate-Clegg et al., 2021).
3. We overlaid presence and absence points to the clipped EOO.
Then, we removed absences within a 5-km buffer of any presence points. This accounted for the geoprecision of the eBird hotspots from which we derived the absences (Johnston et al., 2021;Strimas-Mackey et al., 2020). 4. We interpolated presence and absence points using inverse distance weighting (IDW) in the R package dismo (Hijmans et al., 2017). We transformed the continuous IDW surface into a species range map (presence/absence) following a fivefold crossvalidation procedure in which each fold is used for training and testing at least once, generating five different model partitions (required species with at least 15 presences and absences). We then averaged the threshold that maximizes overall accuracy as a binary cut-off-see accuracy assessment; section 2.3.2.
5. To derive AOH, we clipped a forest layer inside each species range map. We used a recently developed 50-m global forest/nonforest layer from the TanDEM-X satellite (Martone et al., 2018).
This product is designed to specifically estimate forest cover instead of other approaches that are more focused on evaluating forest cover change (e.g. Global Forest Watch). We aggregated to 250 m with a nearest neighbour assignment to match the resolution of our elevational data.

| Expert-drawn range maps
We gathered expert-drawn range maps from BirdLife International and HBW (2020). Each bird species is represented by polygons coded with different attributes that describe a species distribution.
For each forest species in our list, we extracted polygons coded as presence = Extant; Origin = Native; Season = Resident. These settings excluded parts of the range where a species might be introduced, or known to be extirpated. We then dissolved these polygons to represent each species range into a single spatial feature (Cantú-

Salazar & Gaston, 2013).
To estimate area of habitat (AOH) from the BirdLife range maps, we refined the polygon of each species using published elevational limits and forest cover (Ocampo-Peñuela et al., 2016). Species elevational limits were obtained from the IUCN Red List and if missing supplemented with information from Cornell's Birds of the World (https://www.birds ofthe world.org). We derived the AOH map for the BirdLife ranges using the same procedure as in the case study workflow (step 5; previous section).

| Accuracy analysis
We evaluated the performance of the IDW and expert-drawn range maps using two different facets of accuracy: reliability and discriminant capacity (Leroy et al., 2018). Reliability measures if the areas F I G U R E 2 Accuracy analysis for the expert-drawn range map of the Glittering Starfrontlet (Coeligena orina). A confusion matrix is used to evaluate the range map ability to correctly classify presences and absences. The IDW range is shown for visual comparison. The area of habitat (AOH) is not shown because accuracy was evaluated at the range map level. TP = true presences; FP = false presences; TN = true negatives; FN = false negatives. Photo by Anderson Muñoz depicted by a range map agree (or match) with areas of high probability of occurrence (Pearce & Ferrier, 2000) whereas the discriminant capacity is the ability to distinguish between presences and absences (Liu et al., 2011).
To evaluate discriminant capacity, we created binary IDW range maps (section 2.2.3) using 70% of the presences and absences, leaving the remaining 30% for testing. The expert range maps were evaluated against the same set of independent testing data. We used three basic metrics of discrimination performance (Figure 2 for reference): omission errors (false negatives), commission errors (false positives), and overall accuracy (Anderson et al., 2003). The overall accuracy was calculated using Jaccard's similarity index, which summarizes the ability of the mapped range to (a) maximize true presences, (b) reduce both false negatives and false positives and (c) disregard true negatives that are easily classified because of prevalence (Leroy et al., 2018;Li & Guo, 2013).
A caveat to the accuracy assessment based on data points is that it is limited by the availability of public data and it cannot account for the performance of a range map in undersampled areas. This means that the expert-drawn range maps could exhibit a higher accuracy than what the evaluation of discriminant capacity suggests, provided their predictions correctly point to areas where a species is actually present or absent. Thus, to alleviate this limitation we assessed the reliability between the expert-drawn range map predictions and the IDW continuous probability surface. For this, we used the expert score of Mainali et al. (2020). Higher scores indicate that the expert range maps predict presences in areas of high probability of occurrence (even if sampling is scarce or not available) while avoiding areas of low probability (see Mainali et al., 2020 for details).

| Statistical analysis
We calculated average values of the accuracy metrics using 20% trimmed means as a robust measure of centrality and estimated 95% confidence intervals using a percentile bootstrap procedure (Wilcox, 2017). We further examined how the overall accuracy of the expert-drawn range maps influenced spatial overlap with the IDW range maps using a robust linear regression. For this, we first computed the spatial overlap between both sets of binary range maps as a ratio of the area of intersection to the area of union (Mainali et al., 2020).
We tested for a statistically significant difference in range size between both datasets with a two-sample Kolmogorov-Smirnov test (K-S test). We also evaluated differences across the entire distribution of range size values. We used a shift function with the rogme package in R (Rousselet et al., 2017) that allowed us to quantify differences between deciles of the expert-drawn and geoworkflow range maps (Figure 3). The same analyses were conducted for evaluating differences in AOH. We then computed a robust log-linear regression with an MM-type estimator in the R package robustbase (Maechler et al., 2019) to examine how omission and commission errors influence range size in the expert-drawn range maps.
The results are visualized in Figure 3, and the accuracy metrics for each of the species analysed are available in Appendix S1.
Notwithstanding, we found the reliability of the expert-drawn range maps was fairly high, with their delineated areas matching areas of high probability of occurrence in the IDW surface (expert score 68%; 95% CI = 0.66-0.70; p < .001).

| Range size and AOH estimates
For both sets of range maps (n = 723 each), range sizes spanned from 1,667 km 2 to 4,336,938 km 2 . We did not find a significant difference in the distributions of range sizes (Figure 4a; K-S test = 0.05, p = .365) but there was a different distribution of AOH F I G U R E 3 Comparison of accuracy metrics between the IDW and expert-drawn range maps assessed with an independent dataset of presences and absences for each species (see methods). The boxes show the interquartile range (IQR) which visualizes the spread of 50% of the data points between the 25th and 75th percentile. The notches represent 95% confidence intervals of the median indicating differences because of no overlap size (Figure 4b; K-S test = 0.20, p < .001). The robust linear regression showed that for the expert range maps, omission errors had a greater influence on decreasing range size (coef = −0.82, SE = 0.12, t = −6.8, <0.001) than the effect of commission errors on increasing it (coef = 0.5, SE = 0.08, t = 6.39, <0.001).
A shift function showed the differences between the deciles of the expert and IDW range maps did not differ significantly (Figure 4c) but we found the expert-derived AOH maps are consistently smaller across the entire distribution of values ( Figure 4d). Thus, the typical AOH derived from the expert range maps was substantially smaller than the AOH obtained from the IDW range (expert range map t mean = 98,838 km 2 ; IDW range map t mean = 58,019 km 2 ) with an estimated mean difference of 40,819 km 2 (95% CI: 30,503-51,163 km 2 , p < .001). On average, the AOH from our workflow represented 63% of the range, whereas for the AOH derived from the expert maps was 33%.

| D ISCUSS I ON
We developed a reproducible geospatial workflow to map species distributions starting from primary biodiversity data that we tested for 723 resident forest birds in the Americas. We used inverse distance weighting (IDW) interpolation with presence data from GBIF and obtained absence records from eBird hotspots. Our results showed that on average, the geospatial range maps had an improved accuracy over expert range maps, which have previously been shown to exhibit substantial errors of omission and commission (Hurlbert & Jetz, 2007;Jetz et al., 2008;Mainali et al., 2020;Peterson et al., 2018). In consequence, the derived AOH maps from our workflow can be used more reliably for activities such as establishing protected areas, designing habitat corridors, or identifying critical areas for conservation actions.
Our approach is useful for local-scale applications but is not intended for every scenario-there are no "silver bullets" to represent species distributions for every purpose (Guillera-Arroita et al., 2015;Qiao et al., 2015). The generated maps stay close to the known data, which is useful for systematic conservation planning (Margules & Pressey, 2000) but not for exploration of areas likely to contain a species. Furthermore, our range maps are not suitable for macroecological studies (e.g. species richness or endemism) where coarser spatial resolutions are recommended (Hurlbert & Jetz, 2007). The expert maps are more appropriate in this situation because their delineated areas characterize the main core of species distributions at a broad scale, based on measuring their agreement with areas of high probability of occurrence in the IDW continuous surface of our workflow (Mainali et al., 2020).
On the other hand, the derived estimates of AOH from the expert-drawn maps were consistently smaller than our estimated AOH despite being generated from similar range sizes (Figure 4).
We attribute this result to a drastic reduction in area that occurs when refining the expert maps twice: first by published elevational limits and then by forest cover (Peterson, 2017). Moreover, this approach risks mischaracterization of species distributional patterns at F I G U R E 4 (a and b) Distribution of range map sizes and area of habitat (AOH) between the IDW and expert-drawn range maps. The thicker line corresponds to the median of the distribution and the thinner lines represent deciles. The size for the range maps for both datasets spans from 1,667 km 2 (log 10 = 3.2) to 4,336,938 km 2 (log 10 = 6.6). (c and d) Decile differences between the expert and IDW range maps (y-axis) as a function of the expert-drawn range maps deciles. The vertical lines indicate 95% confidence intervals derived with a percentile bootstrap procedure (Rousselet et al., 2017). There are no significant differences between range sizes whereas the AOH of the expert-drawn range maps are consistently smaller (see discussion) local scales and critical populations might be left out of the mapped AOH (Peterson et al., 2018;Figure 5). We therefore do not recommend refining expert maps without a previous assessment of their accuracy. In turn, the maps of AOH derived from the geoworkflow range maps can be more readily used for conservation planning and decision-making.
The geospatial workflow emphasized confirmed records and excluded areas of known absence which increased the spatial accuracy of predictions (Elith et al., 2006;Mainali et al., 2020). Besides the more reliable estimates from using primary biodiversity data (Peterson et al., 2018;Rotenberry & Balasubramaniam, 2020), our approach has the advantage that it can be adjusted based on user needs, and experts can review the produced range maps for quality checks to further improve their accuracy Merow et al., 2017;Velásquez-Tibatá et al., 2019). This can facilitate the integration of mapping species ranges with extinction risk assessments for the IUCN Red List, especially since our protocol starts with defining species extent of occurrence (EOO), which serves both as a metric under criterion B of the Red List and as the limits of the mapping area for a given species.
In the urgency to evaluate the conservation status of as many species as possible and implementing measures to halt biodiversity loss, our geospatial workflow is a valuable addition to the toolkit of conservation practitioners and decision-makers as it is consistent F I G U R E 5 Examples of spatial mismatch between the geospatial workflow and expert-drawn range maps for three forest birds in the Americas. (a) The Wattled Guan (Aburria aburri) in the tropical Andes. The IDW range map and its derived area of habitat (AOH) are larger than the expert map which exhibits a high rate of omission errors, including omitting the range in ecoregions such as the Santa Marta mountains (northern Colombia), and many areas of the western and eastern Andes. (b) The Black-throated Jay (Cyanolyca pumilo) in Mesoamerica. The species has little habitat remaining which is apparent from the geoworkflow mapping, whereas the coarse resolution of the expert map gives an overly optimistic outlook of the distribution of this range-restricted bird. (c) The Streak-capped Antwren (Terenura maculata) in the lowlands of the Atlantic Forest region. The IDW range improved predictions by avoiding areas of known absence but also expanded to areas of presence that were not represented in the expert map. The AOH of the species is highly fragmented and easily applicable. Notwithstanding, we acknowledge the scarcity of spatial data is a limiting factor and our approach might not adequately map ranges for severely under sampled groups, or those from which it is not yet possible to derive absence data. Only with more field explorations and carefully designed biological surveys, such as standardized camera trapping surveys, will we overcome the scarce knowledge of species distributions around the globe for effective on-the-ground conservation actions (Wilson, 2017).

ACK N OWLED G EM ENTS
We thank S.L. Pimm and R. Huang for suggestions during the initial development of the geospatial workflow. We thank N. Hazzi and M.
Di Marco for useful comments and edits.

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interest.

PE E R R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/ddi.13424.

DATA AVA I L A B I L I T Y S TAT E M E N T
The bird species distribution maps of the world are available upon request to BirdLife International (http://dataz one.birdl ife.org/speci es/reque stdis). Species occurrences can be gathered from its data providers through GBIF.org. The eBird observations and associated metadata can be obtained upon completion of a data request form (https://ebird.org/data/download). The srtm 4.1 DEM is freely available for download from the CGIAR-CSI Consortium for Spatial Information (http://srtm.csi.cgiar.org/srtmd ata/) The TanDEM-X Forest/Non-Forest Map-Global can be freely downloaded from (https://downl oad.geose rvice.dlr.de/FNF50/). The R code to reproduce the geospatial workflow is available from the Dryad Digital Repository (https://doi.org/10.5061/dryad.9cnp5 hqjm).

R E FE R E N C E S B I OS K E TCH
Ruben Dario Palacio is the founder and science director of