Patterns, biases and prospects in the distribution and diversity of Neotropical snakes

Abstract
Motivation: We generated a novel database of Neotropical snakes (one of the world's richest herpetofaunas) by combining the most comprehensive, manually compiled distribution dataset with publicly available data. We assess, for the first time, the diversity patterns of all Neotropical snakes, as well as sampling density and sampling biases.
Main types of variables contained: We compiled three databases of species occurrences: a dataset downloaded from the Global Biodiversity Information Facility (GBIF), a verified dataset built through taxonomic work and the specialized literature, and a combined dataset comprising a cleaned version of the GBIF dataset merged with the verified dataset.
Spatial location and grain: Neotropics; Behrmann projection, equivalent to 1° × 1°.
Time period: Specimens housed in museums during the last 150 years.
Major taxa studied: Squamata: Serpentes.
Software format: Geographical information system (GIS).
Results: The combined dataset provides the most comprehensive distribution database for Neotropical snakes to date. It contains 147,515 records for 886 species across 12 families, representing 74% of all species of snakes, spanning 27 countries in the Americas. Species richness and phylogenetic diversity show broadly similar patterns. Amazonia is the least sampled Neotropical region, whereas most well-sampled sites are located near large universities and scientific collections. We provide a list and updated maps of the geographical distribution of all snake species surveyed.
Main conclusions: The biodiversity metrics of Neotropical snakes reflect patterns previously documented for other vertebrates, suggesting that similar factors may determine the diversity of both ectothermic and endothermic animals. We suggest that conservation strategies target high-diversity areas and that sampling efforts be directed towards Amazonia and poorly known species.

Keywords: conservation, data availability, GBIF, geographical distribution, phylogenetic diversity, sampling gaps, Serpentes, species richness

1 | INTRODUCTION
Reptiles are a highly diverse group of terrestrial vertebrates, with 10,450 known species, a number increasing by c. 100 species per year (Tonini, Beard, Ferreira, Jetz, & Pyron, 2016; Uetz & Hošek, 2016). Reptiles are probably the most neglected group in conservation prioritizations, as only 52% of the described species have been assessed for the International Union for Conservation of Nature (IUCN) Red List of Threatened Species (IUCN, 2017). Most assessed species have been categorized based on range size, and 20% of them are considered data deficient (Böhm, Collen, & Baillie, 2013) owing to the lack of appropriate data on taxonomy, ecology, distribution, population trends and threats (Bland & Böhm, 2016; Böhm et al., 2013). This contrasts with other vertebrates: for instance, only 0.6% of birds and 15% of mammals are data deficient (Butchart & Bird, 2010; Schipper et al., 2008).
Among reptiles, there are c. 3,500 snake species globally, inhabiting temperate to tropical environments in both terrestrial and aquatic habitats (Uetz & Hošek, 2016; Wallach, Williams, & Boundy, 2014). As for most reptiles, distribution data for snake species remain scarce, and consequently snakes are excluded from most large-scale studies of biodiversity and distribution patterns (e.g., Jenkins, Alves, Uezu, & Vale, 2015; Moura, Villalobos, Costa, & Garcia, 2016). Although reliable estimates of snake diversity would contribute to global and regional strategies for biological conservation, no detailed data have yet been compiled for the Neotropics, despite the region harbouring one of the world's richest herpetofaunas (Böhm et al., 2013; …). Here, we present a new database of snake occurrences covering the entire Neotropics and assess, for the first time, the diversity patterns of all Neotropical snakes as well as sampling artefacts. We hypothesize that snake diversity follows a pattern similar to those already described for other vertebrates in the Neotropics (Jenkins et al., 2015; Moura et al., 2016). We generated our novel database by combining the most comprehensive, manually compiled distribution dataset with publicly available data, from which we calculated species richness (SR) and phylogenetic diversity (PD), as well as sampling density and sampling biases. Finally, we discuss prospects for more informed conservation strategies and for the design of research agendas.

| Data sources
We compiled three datasets for snakes recorded in the Neotropical region (sensu Olson et al., 2001), from central Mexico to southern South America, including all Caribbean islands. We included only records identified at the species level.
The raw dataset (RD) comprised georeferenced snake records downloaded from the Global Biodiversity Information Facility (GBIF; http://doi.org/10.15468/dl.tdwbqp). We restricted our search to records linked to specimens, literature occurrences and material samples, excluding records lacking associated vouchers.
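As a minimal sketch of the record filter described above, assuming records parsed from a GBIF simple occurrence download (columns `species`, `basisOfRecord`, `decimalLatitude`, `decimalLongitude`), the snippet below keeps only vouchered, species-level, georeferenced records. The mapping of "specimens, literature occurrences and material samples" onto the `KEPT_BASIS` set is our assumption, not the authors' documented query.

```python
from typing import Iterable, List

# Hypothetical mapping of the text's "specimens, literature occurrences and
# material samples" onto GBIF basisOfRecord terms; the authors' exact filter is
# not given. MATERIAL_CITATION is GBIF's newer term for literature-derived records.
KEPT_BASIS = {"PRESERVED_SPECIMEN", "LITERATURE", "MATERIAL_CITATION", "MATERIAL_SAMPLE"}


def filter_gbif_records(records: Iterable[dict]) -> List[dict]:
    """Keep species-level, georeferenced records with a voucher-type basis of record."""
    kept = []
    for rec in records:
        if rec.get("basisOfRecord") not in KEPT_BASIS:
            continue  # e.g. HUMAN_OBSERVATION records have no voucher
        if not rec.get("species"):
            continue  # not identified to species level
        if rec.get("decimalLatitude") is None or rec.get("decimalLongitude") is None:
            continue  # not georeferenced
        kept.append(rec)
    return kept


sample = [
    {"species": "Bothrops jararaca", "basisOfRecord": "PRESERVED_SPECIMEN",
     "decimalLatitude": -23.5, "decimalLongitude": -46.6},
    {"species": "Bothrops jararaca", "basisOfRecord": "HUMAN_OBSERVATION",
     "decimalLatitude": -23.5, "decimalLongitude": -46.6},
    {"species": "", "basisOfRecord": "PRESERVED_SPECIMEN",
     "decimalLatitude": -10.0, "decimalLongitude": -55.0},
]
print(len(filter_gbif_records(sample)))  # 1
```

Only the first sample record survives: the second is an unvouchered observation and the third lacks a species-level identification.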
The verified dataset (VD) comprised geographical occurrences from vouchered specimens examined in natural history museums (Supporting Information Appendix S1), and its compilation required a large collaborative effort among herpetologists. The VD initially focused on Brazilian snakes, including their distribution outside the country. It was then expanded to include species and records from other Latin American countries, using point occurrence data from vouchers and the scientific literature. … applied to the RD in this process (without voucher verification) is provided in Supporting Information Appendix S2. We then merged the cleaned GBIF dataset with the VD to form the combined dataset (CD). Finally, we removed redundant coordinates for each species from the CD (i.e., records of the same species with identical latitude and longitude values).
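The merge and de-duplication step can be sketched as follows, assuming each record is a dict with hypothetical keys `species`, `lat` and `lon`. The text does not say which copy of a duplicated record is retained; keeping the VD copy here is an assumption.

```python
from typing import Iterable, List


def merge_and_deduplicate(cleaned_rd: Iterable[dict], vd: Iterable[dict]) -> List[dict]:
    """Combine two record lists, keeping one record per (species, latitude, longitude)."""
    seen = set()
    combined = []
    # VD records go first so that, for duplicated coordinates, the verified
    # record is the one retained (an assumption, not stated in the text).
    for rec in [*vd, *cleaned_rd]:
        key = (rec["species"], rec["lat"], rec["lon"])
        if key in seen:
            continue  # redundant coordinate for this species
        seen.add(key)
        combined.append(rec)
    return combined


vd = [{"species": "Bothrops jararaca", "lat": -23.5, "lon": -46.6, "source": "VD"}]
rd = [
    {"species": "Bothrops jararaca", "lat": -23.5, "lon": -46.6, "source": "RD"},
    {"species": "Crotalus durissus", "lat": -23.5, "lon": -46.6, "source": "RD"},
]
cd = merge_and_deduplicate(rd, vd)
print(len(cd), cd[0]["source"])  # 2 VD
```

Note that the key includes the species name, so two different species recorded at the same coordinates are both kept, matching the per-species de-duplication described above.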

| Species richness and phylogenetic diversity
We used the CD for the species richness (SR) and phylogenetic diversity (PD) analyses. Both analyses were performed at two spatial resolutions: the grid cell scale, based on an equal-area Behrmann projection with 360 columns (corresponding to 1° × 1° at 30° N or 30° S), and the ecoregion scale.
Phylogenetic diversity and species richness are usually correlated (Morlon et al., 2011). However, SR takes into account only the distribution data for each species, whereas PD is calculated from distribution data plus the branch lengths of the phylogeny. Thus, PD incorporates evolutionary history that is not expressed by SR (Faith, 1992, 2008; Tucker et al., 2017).
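To make the grid explicit, the sketch below assigns points to cells of a Behrmann grid (cylindrical equal-area, standard parallels at 30° N/S) with 360 columns and counts species per cell. It assumes square cells in projected units and a hypothetical record layout (`species`, `lat`, `lon`); the authors' GIS workflow is not described at this level of detail.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

N_COLUMNS = 360           # number of grid columns, as stated in the text
STANDARD_PARALLEL = 30.0  # Behrmann: cylindrical equal-area, true scale at 30° N/S


def behrmann_cell(lat: float, lon: float) -> Tuple[int, int]:
    """Return (row, col) of the square Behrmann cell containing a point.

    Projected coordinates are in units of Earth's radius, which cancels out.
    """
    c = math.cos(math.radians(STANDARD_PARALLEL))
    x = math.radians(lon) * c             # x = R * lambda * cos(30°)
    y = math.sin(math.radians(lat)) / c   # y = R * sin(phi) / cos(30°)
    cell = 2 * math.pi * c / N_COLUMNS    # cell side: map width / 360
    col = int((x + math.pi * c) // cell)  # col 0 at 180° W
    row = int((1 / c - y) // cell)        # row 0 at the north edge
    n_rows = math.ceil((2 / c) / cell)    # map height / cell side
    return min(row, n_rows - 1), min(col, N_COLUMNS - 1)


def species_richness(records: Iterable[dict]) -> Dict[Tuple[int, int], int]:
    """SR per cell: number of distinct species with at least one record in it."""
    cells = defaultdict(set)
    for rec in records:
        cells[behrmann_cell(rec["lat"], rec["lon"])].add(rec["species"])
    return {cell: len(spp) for cell, spp in cells.items()}


records = [
    {"species": "A", "lat": -23.50, "lon": -46.60},
    {"species": "B", "lat": -23.40, "lon": -46.50},
    {"species": "A", "lat": -23.45, "lon": -46.55},
]
print(species_richness(records))  # one cell with SR = 2
```

Because the map width is split into exactly 360 columns, each column spans 1° of longitude; the row count follows from keeping cells square, so every cell has the same area.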
The PD analysis was based on distribution data and a sample of 100 trees, from which we calculated mean values for each grid cell and ecoregion polygon to synthesize the PD results in a single map for each scale adopted. We used the phylogeny provided by Tonini et al. (2016). The variance in PD metrics across the sample of trees reported in their study was low; thus, we considered the mean across the 100 trees a sufficient approximation of PD. Phylogenetic diversity analyses require a precise match between distribution data and the terminals of the phylogeny. Of the 886 species in the CD, 847 (96%) were present on the tree, and only these were used in both the SR and PD analyses to allow a more direct comparison between the two.
… (14), Ivan Sazima (24, 35), Luiz C. Turci (7), Marcio Martins (4), Marco Sena (6), Martin Jansen (9, 13, 18, 23, 31), Otavio A. V. Marques (2,3,5,15,16,17,19,20,21,22,28,30,32), Ricardo J. Sawaya (33), Thaís B. Guedes (1,8,11,25,26,29,34)
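Faith's PD for a cell is the total branch length of the phylogeny connecting the species recorded there. A minimal sketch, assuming a rooted tree stored as a hypothetical child → (parent, branch length) dictionary and counting branches up to the root, computes PD for each tree and averages it over a sample of trees. Species absent from the tree are dropped, as described above. This is not the software the authors used.

```python
from statistics import mean
from typing import Dict, Iterable, List, Optional, Tuple

# node -> (parent, length of the branch to that parent); the root's parent is None
Tree = Dict[str, Tuple[Optional[str], float]]


def faith_pd(tree: Tree, species: Iterable[str]) -> float:
    """Rooted Faith's PD: total length of the branches linking the species to the root."""
    visited = set()
    total = 0.0
    for node in species:
        # climb towards the root, stopping once we reach a branch already counted
        while node is not None and node not in visited:
            visited.add(node)
            parent, length = tree[node]
            if parent is not None:
                total += length
            node = parent
    return total


def mean_cell_pd(trees: List[Tree], cell_species: Iterable[str]) -> float:
    """Mean PD of one cell across a sample of trees, ignoring species not on the tree."""
    spp = list(cell_species)
    return mean(faith_pd(t, [s for s in spp if s in t]) for t in trees)


tree = {"root": (None, 0.0), "n1": ("root", 1.0),
        "A": ("n1", 2.0), "B": ("n1", 3.0), "C": ("root", 4.0)}
tree2 = {k: (p, 2 * length) for k, (p, length) in tree.items()}  # same topology, doubled lengths
print(faith_pd(tree, ["A", "B"]))                  # 6.0
print(mean_cell_pd([tree, tree2], ["A", "B", "Z"]))  # 9.0 ("Z" is not on the tree)
```

The shared branch `n1` is counted only once, which is what distinguishes PD from a simple sum of tip-to-root distances.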

| Sampling gaps
We counted the number of occurrence records in each grid cell across the Neotropical region to quantify sampling intensity in each dataset.
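Sampling density can be sketched as a count of records per cell, including cells with no records. On an equal-area grid, the area of unsampled cells then follows by multiplying the number of empty cells by the constant cell area. The cell identifiers and the `cell_area_km2` value are hypothetical inputs.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def records_per_cell(record_cells: Iterable[Hashable],
                     all_cells: Iterable[Hashable]) -> Dict[Hashable, int]:
    """Number of occurrence records in every cell of the study grid (0 if none)."""
    counts = Counter(record_cells)
    # records falling outside all_cells are ignored
    return {cell: counts.get(cell, 0) for cell in all_cells}


def empty_area(density: Dict[Hashable, int], cell_area_km2: float) -> float:
    """Total area of cells with no records; valid only for equal-area cells."""
    return sum(1 for n in density.values() if n == 0) * cell_area_km2


density = records_per_cell([(0, 0), (0, 0), (0, 1)],
                           [(0, 0), (0, 1), (1, 0), (1, 1)])
print(density)                     # {(0, 0): 2, (0, 1): 1, (1, 0): 0, (1, 1): 0}
print(empty_area(density, 100.0))  # 200.0
```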

| Data availability
The RD includes 7,299 records of 659 species of snakes from 12 families (Table 1). The records are distributed over 25 countries (Figure 2a). The VD contains almost 20 times more records than the RD: 140,368 georeferenced records for 488 species from 10 families (Table 1).

| Sampling gaps
Based on the CD, the most poorly sampled Neotropical region is Amazonia, where all grid cells harbour < 500 records and 1,600,000 km² have no records at all (Figure 2a). The Andean region is also poorly sampled, with 900,000 km² lacking records and all other cells having < 500 records.
The Lesser Antilles and Central America are also poorly sampled. The best-sampled region is the Atlantic Forest (400,000 km², containing 1,000–3,000 occurrences; Figure 2a). Some cells are well sampled even though the surrounding cells have very few records.

| Data availability
We found errors associated with outdated nomenclature and erroneous georeferences in the RD. This reinforces previous suggestions (e.g., Ficetola et al., 2013; Maldonado et al., 2015; Meyer, Weigelt, & Kreft, 2016) that GBIF data should not be used without proper verification and cleaning. The verified dataset, albeit smaller in the absolute number of species and of records outside Brazil, can be considered well curated. Because the two datasets differ so markedly in geographical and taxonomic representation, merging them proved a suitable approach.
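The two error types mentioned above, outdated names and erroneous coordinates, are typically handled with checks like those below. This is a generic sketch, not the cleaning procedure of Supporting Information Appendix S2: the synonym table is a hypothetical input (the single pair in the example is illustrative only), and just two simple coordinate checks are applied.

```python
from typing import Dict, List


def clean_records(records: List[dict], synonyms: Dict[str, str]) -> List[dict]:
    """Drop records with impossible or placeholder coordinates and update outdated names."""
    cleaned = []
    for rec in records:
        lat, lon = rec["lat"], rec["lon"]
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # coordinates outside valid ranges
        if lat == 0 and lon == 0:
            continue  # (0, 0) is a common placeholder for missing coordinates
        # replace a synonym with its currently valid name; keep the name otherwise
        name = synonyms.get(rec["species"], rec["species"])
        cleaned.append({**rec, "species": name})
    return cleaned


synonyms = {"Liophis miliaris": "Erythrolamprus miliaris"}  # illustrative pair
raw = [
    {"species": "Liophis miliaris", "lat": -23.5, "lon": -46.6},
    {"species": "Bothrops jararaca", "lat": 0.0, "lon": 0.0},
    {"species": "Bothrops jararaca", "lat": -123.0, "lon": -46.6},
]
print(clean_records(raw, synonyms))
```

Real pipelines usually add further checks (e.g., points falling in the sea or outside a species' known range), which are beyond this sketch.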
Combining the cleaned RD with the verified dataset almost … (2013), which also included lizards, used species ranges instead of grid cells, adopted a different spatial scale, and was based on random sampling, which in theory should provide an adequate representation of species globally but in practice may be problematic.
A different picture of the areas harbouring high SR and PD emerges at the ecoregion scale (Figure 2d,e). For both indices, the Cerrado is the most diverse region. These results therefore suggest that snake diversity may be higher in seasonally dry tropical forests than in rain forests, a pattern not previously inferred. The Cerrado is a global biodiversity hotspot (Mittermeier, Turner, Larsen, Brooks, & Gascon, 2011; Myers et al., 2000), harbouring 153 species of snakes, of which 49 are endemic (Guedes, Nogueira, & Marques, 2014). It is also the world's most species-rich savanna in number of woody plant species and has higher diversity than any other dry forest in the Neotropics (DRYFLOR et al., 2016). However, our results could be biased by the ecoregion boundaries used here, which split the Atlantic Forest into distinct subregions but did not subdivide the Cerrado. As a whole, the Atlantic Forest harbours the richest snake fauna, with 236 species, of which 83 are endemic (Guedes et al., 2014). This reinforces the importance of refined species distribution data for assessing the influence of spatial scale on patterns of biodiversity.
Despite the close relationship between SR and PD, the most