Patterns of species richness hotspots and estimates of their protection are sensitive to spatial resolution

Species richness is a measure of biodiversity often used in spatial conservation assessments and mapped by summing species distribution maps. Commission errors inherent in those maps influence richness patterns and conservation assessments. We sought to further the understanding of the sensitivity of hotspot delineation methods and conservation assessments to commission errors and to the choice of threshold for hotspot delineation.


| INTRODUCTION
Broadscale spatial patterns of species richness are of great interest in the fields of macroecology and biodiversity conservation. There is a long tradition in which macroecologists have used species richness maps to identify the determinants of biodiversity (Currie, 1991; Hansen, Phillips, Flather, & Robison-Cox, 2011; Hurlbert & White, 2005; Rahbek, 2005). More recently, conservationists have been combining them with protected area maps to assess the efficiency of protected area networks (Di Marco, Watson, Possingham, & Venter, 2017; Jenkins, Van Houtan, Pimm, & Sexton, 2015; Veach, Di Minin, Pouzols, & Moilanen, 2017), and in hotspot analyses, which pair maps of the most species-rich areas ("richness hotspots") with maps of biodiversity threats to identify areas where conservation is most urgent (Martinuzzi et al., 2015; Myers, 1990, 2003; Orme et al., 2005). Additionally, species richness maps can be used in tandem with spatial conservation prioritization algorithms to design improvements to protected area networks (Di Marco et al., 2017; Shriner, Wilson, & Flather, 2006).
Whereas species richness can be modelled directly with species counts and measurements of environmental predictor variables at sampled locations (Hawkins et al., 2003; Hurlbert & White, 2005; Lawler et al., 2004), species richness maps are often created by summing maps of individual species distributions. A few different representations of species distributions are commonly used when deriving those summaries, each with its own strengths and weaknesses (Rondinini, Wilson, Boitani, Grantham, & Possingham, 2006), including (1) point-to-grid, (2) extent-of-occurrence (range) and (3) distribution models.
In the point-to-grid approach, species occurrence records are intersected with a coarse-resolution grid that covers the study extent to produce a map of grid cells where a species has been documented (Graham & Hijmans, 2006; Hurlbert & White, 2005). Such maps are known to have low rates of commission errors (i.e., falsely mapping a species where it does not actually exist) and high rates of omission errors (i.e., failing to map a species where it actually occurs) unless the grid cell size is very large (McPherson & Jetz, 2007; Rondinini et al., 2006).
In the extent-of-occurrence approach, a minimum bounding polygon within which a species occurs is delineated (Rondinini et al., 2006). Such maps are commonly called "expert drawn range maps" because they are usually drawn by specialists familiar with where species are known to occur or according to information gleaned from the scientific literature. As such, they are coarse depictions of species' distributions that have few or no delineations of where a species does not occur within the polygon ("porosity"; Hurlbert & White, 2005). Due to their coarse resolution and lack of porosity, range maps have high commission rates and low omission rates that can lead to overestimation of species protection and inefficient conservation prioritization decisions (Di Marco et al., 2017; Hurlbert & White, 2005; Rondinini et al., 2011).
Finally, spatially explicit distribution models can be used to mediate the high commission rates of range maps and the high omission rates of point-to-grid maps by estimating species occurrence using predictor variables along with environmental data layers (Rondinini et al., 2006; Scott & Jennings, 1998). These models can be built with empirical data in an inductive method or with information from the scientific literature and expert opinion in a deductive method (Rondinini et al., 2006; Scott et al., 1993). Furthermore, some models predict places where a species will occur ("species distribution models"), while others predict areas of suitable habitat ("habitat suitability models"). Although those two types of predictions are likely equivalent in some cases, that is not always true because suitable habitat is sometimes unoccupied (Pulliam, 2000). The error rates of distribution models can be better or worse than point-to-grid or range maps depending on the data or information used in the models (Graham & Hijmans, 2006; Rondinini et al., 2006).
Choices about what type of species distribution data to use for species richness maps are often determined by data availability (Meentemeyer, 1989), which is often limited when studies focus on broad extents or long lists of focal species. While fine-resolution maps are ideal, many species' distributions have not been mapped at such resolutions over broad extents (Wiens, 1989). One strategy for dealing with data scarcity is to relax data requirements and use whatever maps are available (or can feasibly be created) for all species of interest in the study extent; however, it is well documented that species richness patterns are scale-dependent and influenced by the type of underlying species distribution data (Fleishman, Noss, & Noon, 2006; Graham & Hijmans, 2006; Hess et al., 2006; Hurlbert & Jetz, 2007; Rahbek, 2005; Shriner et al., 2006; Stoms, 1994). Comparisons of the richness patterns derived from different types of distribution data (e.g., range, distribution models, point-to-grid) have shown that spatial patterns vary by data type (Di Marco et al., 2017; Graham & Hijmans, 2006). Range maps produce higher richness estimates and are more spatially autocorrelated compared to maps based on data types that incorporate porosity (Hurlbert & Jetz, 2007). Similarity generally increases with grain size such that differences between data types are less pronounced at coarser resolutions (Graham & Hijmans, 2006; Hurlbert & Jetz, 2007). In cases where different data types do produce similar richness maps, differences in the data's spatial structure may affect the study's conclusions (McPherson & Jetz, 2007). As Shriner et al. (2006) found that patterns of richness mapped with coarse-grain data do not subsume those mapped with fine-grain data, one cannot reliably assume that coarse-grain identifications of richness will identify general areas within which fine-grain features occur (e.g., richness hotspots).
While much has been learned about the sensitivity of species richness patterns to data type and resolution, studies have only been performed at grain sizes at or over 1 km², for limited geographic extents, or for limited groups of species (Graham & Hijmans, 2006; Shriner et al., 2006; Hansen et al., 2011; but see Rondinini et al., 2011). Therefore, we have a limited understanding of how sensitive species richness patterns are to data type and resolution and how much richness patterns mapped with intermediate-grain data resemble those mapped with fine-grain data in the conterminous United States (CONUS). These knowledge gaps add uncertainty and potential for error into conservation plans and assessments. -Gap Analysis Project, 2017a,b). These data are fine-grain 30-m (900 m²) rasters that match the scale at which conservation decisions are made (Shriner et al., 2006) and give us the opportunity for the first time to explore species richness patterns at such a fine grain for the United States.
In this study, we compare spatial patterns of species richness and richness hotspots generated from range and habitat maps with fine
We expected several results. First, because we used the range maps to constrain the 30-m species habitat maps, we anticipated that the range maps would produce higher species richness estimates than habitat maps and that intermediate-grain habitat maps would produce higher estimates than fine-grain habitat maps. Second, we expected vertebrate richness maps from GAP ranges to resemble the range-based richness maps compiled by Jenkins et al. (2015) in their assessment of protected areas relative to biodiversity priorities in the United States, because both were based on similar species lists and coarse-resolution range maps. Third, we expected that hotspot maps derived from habitat maps would not be nested within those derived from range maps and that intermediate-resolution (1-km) hotspot maps would more closely resemble fine-resolution (30-m) hotspot maps than those derived from range data. Through this analysis, we hope to further the understanding of resolution and data type in conservation assessments that incorporate species richness.

| Study area
We primarily examined the terrestrial portions of CONUS, but also included areas of open water, such as lakes, sounds and oceans, that are within 1 km of land. We carefully considered how much open water to include because large water bodies would have skewed the frequency distributions of richness map cell values if they were used by few GAP species. Conversely, excluding all standing water from our study area would have omitted important near-shore habitats for many species, and those habitats may be important for conservation planning. We determined that omitting areas beyond 1 km from land involved few species and provided an optimal trade-off. The data layer created with this criterion was the extent of analysis for all geoprocessing calculations.

| Species list
We created range and habitat maps for a list of 1,590 species that inhabit CONUS during summer or winter (282 amphibians, 322 reptiles, 621 birds and 365 mammals; see Supporting Information Table S1). We did not include species that only inhabit CONUS during migrations. We built the species list by adding species that were known to breed or overwinter in CONUS during any of the five years between 1999 and 2009. We then verified the validity of each species' taxonomic concept with authoritative sources for that species' class (e.g., Banks et al., 2008; Crother, 2008; Wilson & Reeder, 2005). Sometimes we discovered new taxonomic revisions in the scientific literature and chose to adopt them. For example, when a rigorous genetic study elevated a subspecies to species status, we added the new species to the list.

| Range maps
We created range maps by attributing information on species' distributions from the scientific literature and online sources (Supporting Information Table S2) to a polygon data layer that we derived from

| Habitat maps
We developed habitat maps using a deductive modelling process by compiling information on species' habitat associations into a habitat relationship database (Rondinini et al., 2006; Scott et al., 1993).
Information was compiled from the characterizations of species' habitat in books, databases and primary literature, as well as from expert opinion. We used habitat descriptions to determine which of the National GAP Land Cover ecological systems and land use classes (U.S. Geological Survey, Gap Analysis Project, 2011) the species could occur in. The dataset contains over 580 map classes and represents detailed and nationally consistent data on vegetation in the United States and provides the ecological context necessary to build fine-grain habitat associations. For many species, we used additional environmental variables including elevation, proximity to water features, proximity to wetlands, level of human development, forest ecotone width and forest edge. Each of these factors corresponded to a data layer that was available during map production (hereafter, "ancillary data layers"; see Supporting Information Table S2, Appendix S1). We generated habitat maps from the habitat models by querying the habitat relationship database for each model's parameters; reclassifying the 30-m GAP Land Cover and ancillary data layers within the species' range according to those parameters; and combining the reclassified layers to produce a 30-m (900 m²) resolution habitat map. Habitat maps, therefore, not only reflect the ecological systems or land use classes that are selected in the habitat model, but also any other constraints in the model represented by ancillary data layers. For a list of citations used in developing the species ranges and habitat models, see Supporting Information Appendix S2.
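The reclassify-and-combine step described above can be illustrated with a minimal sketch. It is not the GAP production workflow: the function name `habitat_map`, the land cover codes and the single elevation constraint are hypothetical, and real models may combine several ancillary layers. The sketch only shows the logic that a cell counts as habitat when it lies inside the range and satisfies every constraint in the model.

```python
def habitat_map(land_cover, elevation, in_range, suitable_classes, elev_min, elev_max):
    """Return a 0/1 grid: 1 where a cell is inside the range and meets all constraints.

    All inputs are equally sized lists of lists (hypothetical toy rasters).
    """
    rows, cols = len(land_cover), len(land_cover[0])
    habitat = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            ok_cover = land_cover[r][c] in suitable_classes      # reclassified land cover
            ok_elev = elev_min <= elevation[r][c] <= elev_max     # reclassified ancillary layer
            habitat[r][c] = int(in_range[r][c] == 1 and ok_cover and ok_elev)
    return habitat


# Toy 3 x 3 rasters; the class codes 101, 205 and 310 are invented for illustration.
land_cover = [[101, 101, 205],
              [101, 310, 205],
              [101, 310, 205]]
elevation = [[200, 900, 300],
             [250, 400, 1500],
             [100, 350, 320]]
in_range = [[1, 1, 1],
            [1, 1, 1],
            [0, 0, 1]]

example_habitat = habitat_map(land_cover, elevation, in_range, {101, 205}, 0, 1000)
```

In this toy case the lower-left cell has a suitable land cover class but is excluded because it falls outside the range, which mirrors how the range maps constrain the habitat maps.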
We created intermediate-grain size (1,020 m; 1 km², hereafter 1-km data) habitat maps from the fine-grain (30-m) maps with the ArcGIS Aggregate Tool (ESRI 2016) parameterized with a cell factor of 34 and aggregation type of "maximum." With these settings, any 1 km² cell that contained a 30-m cell of habitat was coded as habitat for the species. In other words, the total richness for each 1-km cell included any species with at least one 30-m cell of suitable habitat within its boundary. Other aggregation methods would have introduced omission errors into the 1-km distribution maps, and those errors would have restricted our inferences about the value of excluding unsuitable habitat at different grain sizes. We chose 1 km for the intermediate grain because it was the smallest cell size examined in related studies (Graham & Hijmans, 2006; Shriner et al., 2006; Hansen et al., 2011; but see Rondinini et al., 2011), a large proportion of protected areas are near 1 km² in area (Shriner et al., 2006) and it falls within the range of sizes for protected areas in CONUS (0.25-25,000 km², mean 19 km²; Aycrigg et al., 2013). Because we derived the 1-km habitat maps from the 30-m habitat maps, they shared the same sources of error as those finer resolution data. If we had modelled the habitat using 1-km resolution datasets (e.g., land cover, distance to water), those models would have inherited a different set of errors from their underlying data layers. The relationship between our 1-km and 30-m data was ideal for our analysis because it meant that differences in richness and hotspots from the two habitat datasets could be attributed to commission errors associated with spatial grain.
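The effect of a "maximum" aggregation can be sketched in a few lines of plain Python. This is an illustration of the block-maximum idea only, not a reimplementation of the ArcGIS tool: the function name `aggregate_max` is hypothetical, and keeping partial blocks at the grid edge is a choice made for this sketch. With a cell factor of 34, each output cell spans 34 × 30 m = 1,020 m on a side.

```python
def aggregate_max(grid, cell_factor):
    """Aggregate a 0/1 grid by taking the maximum over cell_factor x cell_factor blocks.

    A coarse cell is coded 1 if any fine cell inside it is 1.
    Partial blocks at the right and bottom edges are kept (a choice of this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    out = []
    for r0 in range(0, rows, cell_factor):
        out_row = []
        for c0 in range(0, cols, cell_factor):
            block = [grid[r][c]
                     for r in range(r0, min(r0 + cell_factor, rows))
                     for c in range(c0, min(c0 + cell_factor, cols))]
            out_row.append(max(block))
        out.append(out_row)
    return out


# Toy 4 x 4 fine-grain habitat map aggregated with a cell factor of 2.
fine = [[0, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]]
coarse = aggregate_max(fine, 2)  # [[1, 0], [0, 1]]
```

A single fine cell of habitat is enough to flag the whole coarse cell, which is why this aggregation adds commission error but no omission error relative to the fine-grain map.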

| Data analyses
We created species richness maps from GAP range and habitat maps so that we could examine spatial richness patterns within CONUS and compare maps across major taxa, for two data types (range, habitat map) and at two grain sizes (30 m, 1 km). We generated range-based richness (hereafter "range richness") by totalling the number of species within each 12-digit HUC. We created 30-m (hereafter, "30-m habitat richness") and 1-km ("1-km habitat richness") grain size species richness maps from the GAP habitat maps. GAP habitat maps are coded with values that denote seasonal habitat suitability (e.g., summer, winter or year-round), so we first reclassified each habitat map to one with values of 1 and 0, where 1 indicated that a cell was suitable habitat during any season, and 0 indicated that a cell was unsuitable during all seasons. Next, we summed the habitat maps of each species within taxa. Finally, we resampled the range and 1-km habitat richness maps to 30-m resolution with the ArcGIS Resample Tool (ESRI 2016) and "nearest neighbour" resampling. Each richness map was masked with the study area extent described above. We were interested in differences between richness estimates from the different datasets, so we calculated the cell-by-cell differences between range and 1-km habitat; range and 30-m habitat; and 1-km and 30-m habitat richness maps for each taxonomic group.
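The sequence of reclassifying, summing, resampling and differencing can be sketched as follows. The seasonal codes (0 = unsuitable, 1 = summer, 2 = winter, 3 = year-round) and all function names are assumptions made for illustration, and nearest-neighbour resampling is simplified to cell replication, which holds only when the coarse grid aligns with the fine grid and the cell size ratio is an integer.

```python
def to_binary(habitat):
    """Reclassify seasonal codes to 1 (suitable in any season) or 0 (never suitable)."""
    return [[1 if v > 0 else 0 for v in row] for row in habitat]


def richness(binary_maps):
    """Sum 0/1 species maps cell by cell to get species counts."""
    rows, cols = len(binary_maps[0]), len(binary_maps[0][0])
    return [[sum(m[r][c] for m in binary_maps) for c in range(cols)]
            for r in range(rows)]


def resample_nearest(grid, factor):
    """Replicate each coarse cell factor x factor times (aligned nearest neighbour)."""
    return [[grid[r // factor][c // factor]
             for c in range(len(grid[0]) * factor)]
            for r in range(len(grid) * factor)]


def difference(a, b):
    """Cell-by-cell subtraction a - b of two equally sized grids."""
    return [[x - y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Two toy species with hypothetical seasonal codes on a 2 x 2 fine grid.
species_a = [[0, 1], [3, 0]]
species_b = [[2, 2], [0, 0]]
fine_richness = richness([to_binary(species_a), to_binary(species_b)])  # [[1, 2], [1, 0]]

# A single coarse cell holding both species, brought to the fine grid.
coarse_on_fine = resample_nearest([[2]], 2)                             # [[2, 2], [2, 2]]
richness_diff = difference(coarse_on_fine, fine_richness)               # [[1, 0], [1, 2]]
```

Positive differences mark cells where the coarser map counts species that the fine-grain map excludes.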
We reclassified each richness map to identify hotspots: grid cells with species richness values in the top 5% (Prendergast, Quinn, Lawton, Eversham, & Gibbons, 1993). However, because richness values are discrete, using the 95th percentile as the threshold did not isolate exactly 5% of the cells in the study extent, which is optimal for comparisons of hotspot maps such as ours (Shriner et al., 2006). Therefore, we adjusted the quantile choice to produce a hotspot area as close to 5% of CONUS as possible (Table 1).
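One way to picture the threshold adjustment is to scan the distinct richness values and keep the one whose hotspot area is closest to the 5% target. The sketch below is an assumed procedure written for illustration, with a hypothetical function name and toy data; it is not necessarily how the thresholds in Table 1 were derived.

```python
def hotspot_threshold(richness_values, target_fraction=0.05):
    """Pick the richness value t so that the share of cells with richness >= t
    is as close as possible to target_fraction.

    Returns (threshold, resulting_fraction).
    """
    n = len(richness_values)
    best = None
    for t in sorted(set(richness_values)):
        frac = sum(1 for v in richness_values if v >= t) / n
        gap = abs(frac - target_fraction)
        if best is None or gap < best[0]:
            best = (gap, t, frac)
    return best[1], best[2]


# Toy study extent of 100 cells with discrete richness values 1 to 5.
cells = [1] * 50 + [2] * 30 + [3] * 14 + [4] * 4 + [5] * 2
threshold, fraction = hotspot_threshold(cells)  # threshold 4 covers 6% of cells
```

In this toy example no threshold yields exactly 5%: a threshold of 4 selects 6% of cells and a threshold of 5 selects 2%, so the former is retained.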
We were able to study the influence of porosity by comparing richness patterns derived from the two data types (ranges
Although we used the 95th or 96th percentiles as hotspot thresholds for our primary analyses (Table 1), we also calculated protection at percentiles between the 50th and 99th in order to explore how protection responded to threshold choice. We conducted all spatial analyses using ArcGIS 10.4 (ESRI 2016).
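The response of protection to threshold choice can be sketched by recomputing the protected share of hotspot cells at several percentiles. The nearest-rank percentile definition, the function names and the toy data are assumptions for illustration; `protected` stands in for a 0/1 mask of cells in status 1 or 2 lands.

```python
def percentile_value(sorted_values, p):
    """Nearest-rank percentile for an integer p in 1..100 (an assumed definition)."""
    n = len(sorted_values)
    k = max(1, -(-p * n // 100))  # integer ceiling of p * n / 100
    return sorted_values[k - 1]


def protection_by_percentile(richness, protected, percentiles):
    """Percentage of hotspot cells that are protected, for each percentile threshold.

    richness and protected are flat lists over the same cells.
    """
    ordered = sorted(richness)
    result = {}
    for p in percentiles:
        t = percentile_value(ordered, p)
        hot = [i for i, v in enumerate(richness) if v >= t]
        result[p] = 100 * sum(protected[i] for i in hot) / len(hot)
    return result


# Ten toy cells; 1 marks a cell inside status 1 or 2 lands.
toy_richness = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_protected = [0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
protection = protection_by_percentile(toy_richness, toy_protected, [50, 90])
```

Here the estimate rises from about 67% at the 50th percentile to 100% at the 90th, showing how strongly a protection estimate can depend on where the hotspot threshold is set.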

| Richness patterns and data types
For each taxonomic group, omitting unsuitable habitat (introducing porosity) from range maps decreased estimates of mean and
TABLE 1 Percentile thresholds used for the creation of hotspot maps, their corresponding species richness values, and the resulting total areas of hotspots
FIGURE 2 Frequency distributions of grid cell values from species richness maps created from species range and 30-m and 1-km resolution habitat data for the conterminous United States (a, b, c and d) and cell-by-cell comparisons (subtraction) of those richness maps (e, f, g and h). A relatively small number of range and 1-km difference values were negative because we used an aggregation process to create 1-km habitat maps that had the capacity to extend habitat beyond the edges of species' ranges, resulting in 1-km richness values that were greater than range richness in a small number of grid cells

| Hotspot locations and data types
The coincidence of hotspots derived from different data sources varied (Figure 6). Range and 1-km habitat hotspots overlapped the most for each species group with a maximum coincidence of 82.41% for amphibians. For birds, reptiles and amphibians, <30% of hotspots mapped with fine-resolution data overlapped with hotspots mapped with the coarser resolution data types. Mammal hotspot coincidence across data types was moderate (49.96%-62.59%).
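A coincidence percentage of this kind can be computed from two hotspot masks on the same grid. The definition used below, the share of hotspot cells in one map that are also hotspots in the other, is an assumption made for illustration, as are the function name and the toy masks.

```python
def coincidence_percent(hotspots_a, hotspots_b):
    """Percentage of hotspot cells in map A that are also hotspot cells in map B.

    Both inputs are flat 0/1 lists over the same cells (assumed definition).
    """
    both = sum(1 for a, b in zip(hotspots_a, hotspots_b) if a and b)
    total_a = sum(1 for a in hotspots_a if a)
    return 100 * both / total_a


# Toy masks: four hotspot cells in map A, two of which are shared with map B.
mask_a = [1, 1, 1, 1, 0, 0]
mask_b = [1, 1, 0, 0, 1, 0]
overlap = coincidence_percent(mask_a, mask_b)  # 50.0
```

Because every hotspot map covers close to 5% of the study extent, the two masks hold nearly the same number of cells and the result changes little with the choice of which map serves as the denominator.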
Fine-grained habitat data identified richness hotspots in ecoregions that other data types and grain sizes did not. Northern Rockies that were identified with 30-m data (Figure 7i).
Compared to 1-km data, range data omitted reptile hotspots in the Southern Coastal Plain, Southeastern Plains and South Central Plains (Figure 7k,l). Interestingly, fine-grain data identified fewer hotspots in the South Central Plains than did the 1-km data and, instead, indicated that concentrations of hotspots exist in the Madrean Archipelago, Sonoran Basin and Range and Chihuahuan Deserts that were not picked up with the coarser data types (Figure 7j).

| Hotspot protection
For each taxonomic group and each data type, the amount of coincidence with status 4 lands (those with no known mandate for protection) was largest, followed by status 3 (multiple use lands; Table 2). Overlap with status 1 and 2 lands, which are permanently protected and managed for biodiversity, was the smallest. Within species groups, however, our estimates of hotspot protection varied in unexpected ways; both data type and the percentile threshold used to create hotspots influenced estimates of protection, and their relationships were inconsistent among species groups (Figure 8). With a 95th percentile threshold, the estimates of protection were most sensitive to underlying data type for birds and reptiles (Table 2). The percentages of hotspot cells from fine-grain data that were in status 1 or 2 protection were below 14% for all species groups and were even lower according to the coarser data for amphibians, birds and reptiles (Table 2).
FIGURE 5 Mammal and reptile species richness in the conterminous United States from three data types. The legends present the minimum and maximum species counts. We resampled the range and 1-km resolution maps to 30 m

| DISCUSSION
Here, we demonstrated that incorporating porosity and varying grain size in the species distribution maps influence the patterns of species richness and hotspots, which directly impacts the assessment of conservation status of species-rich areas. The species richness maps and estimates of maximum species counts we created for CONUS using GAP ranges were similar to ones published by Jenkins et al. (2015) and Currie (1991). However, the patterns changed when we introduced porosity by linking species to suitable habitats within their range. In richness maps derived from 30-m GAP habitat maps, species counts were lower and species-rich areas were more dispersed.
Similar to Di Marco et al. (2017) and Rondinini et al. (2006), we found the coarse-grain data overestimated richness when compared to the finer grain maps. One-kilometre resolution versions of GAP habitat maps produced spatial patterns and maximum richness estimates that resembled those from range data, indicating that a decrease in grain size (from 1-km to 30-m) can have a large effect on the richness maps.
This study represents the first look at terrestrial vertebrate species richness patterns at a fine grain for CONUS. For each species group, high-resolution data identified hotspots within ecoregions not singled out by range data. We were able to identify these ecoregions because when we introduced fine-resolution porosity into species distribution maps, we reduced commission errors present in the range maps that caused overestimates of species richness in some locations. By reducing commission errors and holding the area of hotspots constant (i.e., close to 5% of CONUS for each data type), new hotspots were identified. It is important to note that because (1) many of the newly identified 30-m hotspot grid cells were sparsely distributed in landscapes, (2) individual 30-m × 30-m areas are unlikely to be selected for protection and (3) isolated or fragmented patches may not be valuable to wildlife (Crooks, Burdett, Theobald, Rondinini, & Boitani, 2011), it would be important to account for the size, configuration and landscape context of those hotspots if they were used to assess the conservation value of individual parcels of land. Nevertheless, our results clearly demonstrate that coarse-grain richness maps may not identify all of the general areas within which hotspots identified with finer resolution data could occur. Spatial resolution is important even for broadscale assessments with large extents.
We expected the 1-km hotspots to coincide with the 30-m hotspots, but found that they overlapped more with hotspots derived from ranges. This suggests that relatively modest improvements in the resolution of species distribution data may provide substantial improvements in richness maps. Investigations of the sensitivity of hotspot maps to resolution that examine more resolutions (e.g., 30, 60 and 90-m) might identify valuable thresholds in the benefits of increasing resolution.
We found that calculations of hotspot protection were sensitive to data type and grain size, as well as to the threshold used to define hotspots. Those sensitivities can be explained by differences in richness maps among data types. We used threshold values to define hotspots from the frequency distributions of the richness maps for each data type and grain size (Figure 2); those differences in spatial patterns of richness caused the locations of hotspots to also vary among data type and grain size for each of the taxa (Figure 6).
Because the protected area map remained unchanged (Figure 1
Across all data types, hotspot protection was low for each species group (<14%), especially for amphibians. The low amount of overlap highlights differences between drivers of species richness and land protection. Spatial patterns of richness are determined by the biogeographic evolution of species and current land use practices, whereas the locations of protected areas have evolved through time in response to historical events and circumstances of the day (Platt, 2004). In CONUS, most large protected areas are in the West; so species groups centred there appear to be better protected (i.e., birds, mammals and reptiles). As most amphibian hotspots are in the East, few of their hotspots are currently within the protected area network.
We know that errors in species distribution models contribute to errors in species richness maps (Dean, Wilson, & Flather, 1997 the three datasets used in this analysis, we know the fine-resolution data has the greatest potential for omission error, meaning we may be missing hotspots that would be identified if omission in each of the species distribution models could be removed. For this analysis, we assumed that the 30-m and therefore the 1-km and ranges contained no omission error to explore the impact of commission error. For this study, we chose to include all terrestrial vertebrates and not exclude naturalized or introduced species to represent the entire faunal diversity present on the landscape. There are many ways to parse the species list depending on the question being asked; because we were interested in exploring the influence of porosity and grain size on the identification of hotspots, the full species list was appropriate. We can expect that richness maps and the patterns of hotspots based on subsets of the full list (e.g., conservation status, life history or endemism) would be sensitive to those same factors.
Our results are in agreement with other studies that showed species distribution data type and resolution influence the results of efforts to design and assess conservation networks (Di Marco et al., 2017; Shriner et al., 2006). We demonstrated that porosity and resolution of species distribution maps influence species richness maps, hotspot maps and assessments of hotspot protection, which underscores points made by Hurlbert and White (2005), specifically that accounting for geographic variability within species' distributions leads to substantially different patterns of richness and could enhance our ability to identify meaningful relationships between species richness and environmental drivers.

DATA ACCESSIBILITY
Following a USGS data review and release process, all species ranges, habitat models and ancillary data used in the creation of the models will be made available through USGS ScienceBase.