Where are the bats? An environmental complementarity analysis in a megadiverse country

Field surveys are necessary to overcome Wallacean shortfalls. The task is even more important when human pressure on tropical—megadiverse—ecosystems is considered. However, due to financial constraints, spatial and temporal prioritization is required. Here, we used the concept of environmental complementarity to identify non‐surveyed regions for bats that are environmentally different from other already surveyed regions. We highlighted regions in Brazil where field inventories could be conducted to locate new occurrences or even new bat species.

surveyed regions. We highlighted regions in Brazil where field inventories could be conducted to locate new occurrences or even new bat species.

Location: Brazil.
Methods: We based our analysis on environmental characterization aiming to identify dissimilar regions to those already sampled for bats in Brazil. We used 21 environmental variables to characterize 1,531 unique localities where bats occur. Then, we applied the parameters of a generalized linear model (GLM) to extrapolate the expected values of the environmental variables for the entire country. We compared the predicted values of localities with newly described bat species occurrence against the values for other bat species.
Results: We found that sites from which recently discovered species were described are environmentally distinct from the sites where previously described species occur. Therefore, new occurrences and even new species could be found in regions that are environmentally dissimilar from those already surveyed. By crossing the model with a human footprint map, we defined temporal priorities for field inventories. Regions such as the Northern Cerrado and Western Caatinga should be surveyed first. Similar approaches could be undertaken for other biological groups or regions, allowing the identification of spatial congruence and the development of a comprehensive national programme for biological field inventories.
Main conclusion: Newly described species occurred in environments dissimilar to those previously identified, showing that environmental complementarity analysis is a valid approach to define priority regions for new bat inventories.

| INTRODUC TI ON
Knowledge concerning the spatial distribution of organisms on Earth is the cornerstone for understanding biogeographic patterns, historical speciation events, formation of centres of endemism, and how local and regional biodiversity might respond to increasing human pressure. The difference between what we know and what we should know about species distribution is called the "Wallacean shortfall" (Lomolino, 2004;Whittaker et al., 2005); this difference imposes immense constraints on the implementation of effective management and conservation actions (Cardoso, Erwin, Borges, & New, 2011).
The search for data on species distributions started with the first biogeographers at least two centuries ago; nonetheless that challenge persists, as primary biodiversity data for several terrestrial and marine areas are still incomplete or even completely missing (Funk, Richardson, & Ferrier, 2005;Lobo et al., 2018;Lomolino, 2004;Myers, Mittermeier, Mittermeier, Fonseca, & Kent, 2000;Olson & Dinerstein, 1998).
Nevertheless, these techniques try to solve the problem of how to do the surveys, but not where to do them, which remains a key question when time and money are major constraints. There have been some reports in the literature addressing the question of where to conduct additional biodiversity field surveys. Using a multicriteria modelling technique (the overlay of different standardized information, such as biodiversity, climate, terrain, and others, to map poorly surveyed areas), Chapman and McCaw (2017)  The use of spatial units for the identification of priority areas for biodiversity surveys is a common aspect between the studies of Chapman and McCaw (2017) and Brunbjerg et al. (2019). The first study defined "land conservation units" and the second "ecospaces." The definition of spatial units adds some potential subjectivity to the analysis because the parameters for its definition may vary between researchers. Other authors have suggested other strategies.
In the survey gap analysis proposed by Funk et al. (2005), there is no need to a-priori divide a region into spatial or environmental units.
However, there is a need for defining weights for each environmental variable in the linear regression model. According to these authors, a researcher with enough expertise may define "the relative importance of each variable in driving biological variation." Here we propose a novel method, aiming to diminish, as much as possible, the influence of the researcher on the definition of priority areas for biological inventories. Our method is based on the identification of non-surveyed sites that are environmentally dissimilar to those sites where biodiversity surveys have already been carried out. Thus, additional occurrences for taxa already described, but also unknown taxa, would be expected to occur as a result of surveying such distinct areas. We also combined the gradient of environmental similarity with a human pressure map to identify both spatial and temporal priorities and incorporate existing trends in the threats to the native ecosystems.
We used Brazilian bats as a model system; there are several reasons for this. First, Brazil is one of the most diverse countries for Chiroptera, harbouring at least 182 species (Nogueira et al., 2018); however, information on Brazilian bat diversity and distribution is still scarce. In fact, 60% of the 8.5 million km 2 constituting the continental Brazilian territory has never been surveyed for bats, and none of the six terrestrial Brazilian biomes-Amazonia, Cerrado, Caatinga, Atlantic Forest, Pantanal, and Pampas-is adequately surveyed (Bernard, Aguiar, & Machado, 2011). Second, human pressure on natural areas in Brazil is huge, and increasing alarmingly every day, with the constant loosening of biodiversity conservation policies. According to the Food and Agriculture Organization (FAO), over the last 15 years Brazil lost more forest area than any other tropical country (FAO, 2015). Several authors have demonstrated that although the responses of bats to human pressure are often species-specific, the prevailing trend is negative (see Voigt & Kingston, 2016 for details) which can cause decreases in populations and even local extinction of bat species (Aguiar, Brito, & Machado, 2010;Epstein et al., 2009). Thus, we used the human footprint index or map (Sanderson et al., 2002)  Recent reports have shown that increases in bat species richness across time are not stabilizing (Nogueira et al., 2018), meaning that the probability of finding new species is high if additional surveys are made in appropriately targeted regions. A synthesis of possible situations when and where to find new species is shown in Figure 1. The niche conservatism hypothesis states that closely related species tend to occupy similar niches (Peterson, Soberón, & Sánchez-Cordero, 1999;Wiens & Graham, 2005), suggesting that regions with environmental conditions similar to those more recently surveyed and where new species have been described could harbour undescribed-potentially cryptic-species. Indeed, over the past decade, 13 new bat species have been described in Brazil, of which 12 belong to an existing genus (Nogueira et al., 2018). Based on the niche conservatism hypothesis, new species could be found in regions environmentally similar to those of newly described species, regardless of whether the area is poorly or well-surveyed, that is, in regions C and D ( Figure 1). Alternatively, finding new species would be feasible if different environments are explored, preferable away from the already surveyed areas (regions A and B in Figure 1).
It is well known that the similarity between species communities decreases as distance between the communities increases; this is a consequence of environmental or climate gradients or a consequence of the organisms' dispersal limitations (Condit et al., 2002;Nekola & White, 1999).
Divergent natural selection then leads to distinct adaptations, evolutionary divergence and, eventually, to new species following a pattern of adaptive radiation into new environments (Schluter, 1996;Simpson, 1953). Adaptive radiation is likely to occur after the invasion of a new environment by one or a few lineages, or through the evolution of "key-innovations" opening new adaptive zones that may subsequently be partitioned ecologically (Simpson, 1953). So, speciation (and radiation) events may be occurring in regions A or B, which are environmentally different from regions C and D ( Figure 1). Furthermore, if the environmental characteristics of undersampled areas such as regions A and D are different from the regions already studied (i.e., regions B and C), then we should expect to find more new species in the regions that have been undersampled and that have a low similarity to the already surveyed regions (region A).
Thus, in this study, we test two hypotheses: (1) considering that the existing pattern of surveys in Brazil is very concentrated in the southeastern region, areas not surveyed in Brazil (regions A and D, Figure 1) will be environmentally different from those already surveyed (regions B and C, Figure 1); and (2) newly described bat species (in or after 2010) will tend to occur in areas that are environmentally distinct from those of the species described before 2010 (regions A and B, Figure 1).

| Bat species occurrence points
We compiled a list of all bat species known to occur in Brazil, considering the list of 182 species by Nogueira et al. (2018) (Table S1). We used a database organized by L.M.S. Aguiar over three decades by searching for distribution records in specialized literature, museums, and complementing this with records from online databases (e.g., Global Information Facility-GBIF). From the database of over 44,000 records for South America, we selected 17,055 records for bats occurring in Brazil. Recently obtained and other unpublished records from the personal collections of M.J.R. Pereira and M. Zortea were also added to this data set. We considered as a unique location any locality with one or more bats within a radius of 10 km. We used the package rangeBuilder (Rabosky et al., 2016) in R (R Core Team, 2019) to apply a spatial filter to eliminate all duplicated coordinates and keeping just one-point location (see details in R script-Supporting Information). Thus, we obtained a total of F I G U R E 1 Hypothetical situations where new bat species could be found in Brazil. The abscissa represents an environmental gradient related to the similarity between points. The ordinate represents a variation on the level of biological surveys in a region. The two lines represent the mean value of each axis. The contrast of the quadrants A + B versus C + D represents the hypothesis of the expected environmental difference between areas where newly described bat species occur in Brazil (after 2010) and areas where earlier described species occur. The contrast of the quadrants A + D versus B + C represents the hypothesis of the expected difference between regions with and without bat records in Brazil 8,110 records from 1,531 unique localities (considering a minimum distance of 10 km between localities). We categorized occurrence records of 13 bat species that were described in the last decade (after 2010) as recently described location records (Table 1). They represent the species described in a short period during the last decade, considering the cumulative curve of described species over time ( Figure S1).

| Environmental variables
We selected 21 environmental variables, 19 bioclimatic ( earth data.nasa.gov/) to mosaic and project them to geographical degrees with the WGS84 datum. Then, we used the raster package (Hijmans, 2017) to resample the map to the same spatial resolution as the bioclimatic variables (2.5 arc minutes or approximately 5 × 5 km per pixel). Additionally, we used a DEM derived from the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) available at https://www.usgs.gov/land-resou rces/eros/coast al-chang es-and-impac ts/gmted 2010. The GMTED2010 is a topographic data set covering all regions of the globe. We downloaded the file corresponding to the maximum elevation found in each cell at 30 arc second of spatial resolution. Then, we clipped the file to the boundaries of Brazil and followed the same procedures described above to resample the DEM image, downscaling to 2.5 arc minutes.

| Spatial analysis
We used a generalized linear model (GLM) to create a map of the environmental characteristics of the Brazilian territory comparing the localities with and localities without bat records. First, we used the packages sf (Pebesma, 2018) and raster (Hijmans, 2017) to create a 50-km buffer around each one of the 1,531 localities where bats had been recorded, using the buffer as a mask ( Figure S2). Then, we used the function kfold from package dismo (Hijmans, Phillips, Leathwick, & Elith, 2017)   TA B L E 2 Environmental predictors variables selected initially with a variance inflation factor (VIF) the data set into two groups: 80% (N = 2,449) were used as training points in the GLM analysis, and the other 20% (N = 613) were used as testing points to calculate the accuracy of the model (see below).

| Environmental variables selection and model accuracy
We used the package MASS (Venables & Ripley, 2002) for model selection based on a stepwise process (ran in both directions, i.e. forward and backward) with Akaike information criterion calculation and an analysis of variance (ANOVA) for the selected predictors.
Considering that bioclimatic variables can be highly correlated, we used the function vif (variance inflation factor-VIF) from the package car (Fox & Weisberg, 2019) to eliminate all correlated explanatory variables of the full model, that is the initial model with all 21 environmental variables. The variance inflation occurs when explanatory variables present some level of collinearity, which causes the inflation of the standard errors of the parameters with the square root of 1/(1 -R 2 j ) (Zuur, Ieno, & Elphick, 2010). We analysed the VIF value for all variables and eliminated those that presented VIF above 3 (Zuur et al., 2010) following the decreasing order from high to low values. After the VIF analysis, our initial model included eight environmental variables (Table 2). After running the stepwise analysis on the GLM, we ended up with six environmental variables, all presenting a VIF value below 3 ( Table 2).

Considering that our main goal was to predict regions in Brazil
where field inventories should be carried out to maximize the chance of finding new occurrences, or even new species, we evaluated how well the obtained model is performing with respect to predicting values from an independent set of data. Basically, the approach was to apply the parameters of the fitted model (predictors and their weight) using an independent set of testing data (the above mentioned 20% of testing data) to validate the predicted values.
We used the functions prediction and performance from the ROCR package (Sing, Sander, & Beerenwinkel, 2005) to test the model's accuracy. The function prediction standardizes the values from the GLM model, and the function performance tests different and increasing cut-off values (resulting from the prediction function) to identify the peak of the model's accuracy. The "peak" is the value where most of the testing points, which are classified as 0 (localities with no bat records) and 1 (localities with bat records), are correctly classified.
To test our first hypothesis (areas with no bat records are environmentally different from the areas with bat records), we extracted the model's predicted values for 1,531 localities with bat records and 1,531 random localities with no bat records. Considering that the model values do not follow a normal distribution (Shapiro test, W = 0.9486, p < .001), we used the Wilcoxon nonparametric test to identify whether the training data and the testing data had statistically similar distribution. We used the same procedures to extract values from the prediction model for localities where bat species had been reported before 2010, and for the newly described bat species.
The model values for earlier described (<2010) and newly described (≥2010) species also do not follow a normal distribution (Shapiro test, W = 0.9486, p < .001 for newly described species, and W = 0.96541, p < .01 for older species). Therefore, we also used the Wilcoxon test for this case. Considering the unbalanced sample size for earlier described species (N = 8,110) and newly described species (N = 110), we created a routine in R to randomly select 110 records from the earlier described species data set and test the difference by using the Wilcoxon nonparametric test. We repeated this routine 1,000 times to obtain the mean values for W and p for the performed tests.

| Spatial prioritization
We applied the function predict from the R raster package (Hijmans, 2017) to the resulting GLM to create a map for the entire country, to estimate the relative similarities or dissimilarities of the environmental predictors at any region to the conditions at the localities of known bat occurrences. The procedure is similar, but not identical, to an ecological niche modelling, where regions with occurrences of a species under analysis are environmentally characterized (e.g., ranges of temperature, precipitation or topography) and extrapolated to other regions that might have the same parameters (Peterson, 2001;Soberón & Peterson, 2005).
We superimposed the environmental model with the Human and maps were done with the packages sf (Pebesma, 2018), raster (Hijmans, 2017), and tmap (Tennekes, 2018). . From those roads, only 38% were classified as "fine" or "good" for use, which imposes an enormous restriction on access to the region. While this inaccessibility has meant that these areas have maintained their value with respect to conservation, access to non-surveyed regions is a primary condition to promote biodiversity inventories.

| RE SULTS
The conduction of systematic inventories in Brazil, not only for bats but for all biodiversity components, is a demand of the Brazilian Decree 4339 that establishes the National Policy for Biodiversity (Brasil, 2002). However, since its publication, no consistent or longterm programmes aiming to cover poorly studied regions have been implemented in the country. It has been estimated that if we keep the same pace in conducting inventories without a systematic and targeted programme for biodiversity surveys, we will take around 200 years to survey the whole country for bats (Bernard et al., 2011).
Our results suggest that a survey gap analysis based on an environmental complementarity approach highlights not only regions where new distribution records can be registered but also those regions where there is a higher chance of finding new species.
Contrary to the hypothesis of niche conservatism (Peterson et al., 1999;Wiens & Graham, 2005), we found that newly de- evolutionary process can change from group to group, and possibly from region to region, as suggested by the tropical niche conservatism hypothesis (Wiens & Donoghue, 2004). While a conservative evolution of niche characteristics was found for sister species of butterflies, birds and mammals in Mexico (Peterson et al., 1999), a more dynamic change in niche evolution was observed for ovenbirds (Geositta genus, composed by 11 species) (Ribeiro, Peterson, Werneck, & Machado, 2017) and bats (Peixoto, Villalobos, & Cianciaruso, 2017;Ramos Pereira & Palmeirim, 2013;Stevens, 2011;Villalobos, Lira-Noriega, Soberón, & Arita, 2014) in South America. For ovenbirds, it was suggested that the rapid environment changes along the Andes caused by orogenic events during the Miocene resulted in pronounced niche evolution for the birds (Ribeiro et al., 2017).

F I G U R E 4
Priority areas for bat inventories in Brazil resulting from the combination of environmental characteristics of the localities with and without bat occurrences (small map 'a'), and the Human Footprint Index (small map 'b'). Dark red areas represent the top priority for conduction new inventories (low environmental similarity plus high human pressure). Light red areas are those with low environmental similarity plus high human pressure (medium priority). Green areas are those with high environmental similarity and high human pressure (low priority). Divisions on the smaller map (right side) represent the biomes: 1-Amazonia, 2-Caatinga, 3-Cerrado, 4-Atlantic Forest, 5-Pantanal, 6-Pampas For bats, the importance of niche conservative can vary among families (Peixoto et al., 2017;Ramos Pereira & Palmeirim, 2013). For example, the Phyllostomidae seems to be firmly attached to the tropical niche conservatism theory, since the species are concentrated in their region of geographical origin (the tropics). However, the diversity of the Vespertilionidae, presumably of temperate origin, was not associated with the niche conservatism theory (Ramos Pereira & Palmeirim, 2013). Considering that half of the newly de-  (Stadelmann, Lin, Kunz, & Ruedi, 2007). These are, in fact, three highly interconnected explanations.
One possible explanation for our result is the lack of data. As indicated previously, bats have been little studied in Brazil (Bernard et al., 2011), and most of the country is undersampled. We were able to find only 110 points of occurrences for the 13 newly described species (an average of 8.4 points per species). Unfortunately, this is the current level of understanding about bat species occurrence in Brazil, and it is the exact point where our study can contribute the most.
We were able to show the existence of differences in the environmental characteristics between regions where earlier described Considering human pressure on the environment, some regions should receive urgent attention as there is a risk of the loss of biodiversity occurring before the species are even described by science. Previous studies have indicated that it will take at least 200 years to minimally survey the Brazilian territory for bats if we maintain the same effort as in the past decades of study (Bernard et al., 2011). Obviously, we cannot wait that long. Although Brazil is one of the countries with highest rates of deforestation in the world (FAO, 2015), we still have a huge knowledge gap concerning how that is impacting the country's biodiversity. Although Brazil was the first signatory to the Convention on Biological Diversity (CBD) in 1992, there is no official or comprehensive programme for biodiversity inventory in the country. According to the First National Report to the CBD-Brazil (Brasil, 2015), only a few taxonomic groups (e.g., plants) have specific programmes for field inventories, and most actions are concentrated on gathering information to contribute to databases. In summary, the identification of regions for conducting optimized field inventories is just the first step to fill gaps in species distributions-the so-called "Wallacean shortfall," sensu Lomolino (2004), Whittaker et al. (2005); however, a large-scale and long-term inventory programme is the key subsequent action.

| The application of our approach in other regions
The method that we developed can be applied in any region and at any geographical scale if an appropriate data set is available. With respect to climate data, we consider that for any regional of subcontinental analysis (as we did for Brazil), the Worldclim data set is adequate. However, analysis at a smaller scale will demand the preparation of environmental maps with a finer resolution. For such cases, perhaps the use of satellite images would be the best starting point. Images generated by Landsat (Operational Land Imager-OLI) or Aqua/Terra MODIS satellites can be used to represent the environmental characteristics of any region in the world. MODIS images have the advantage of offering specific products (e.g., surface temperature, primary productivity, vegetation indices) for free and ready to be used at 250-m, 500-m and 1-km spatial resolutions (https://lpdaac.usgs.gov/). If local data on temperature and precipitation are available, maps similar to the bioclimatic variables can be produced by interpolation techniques (Sluiter, 2009). Once the geographical database is set, the method applied in this study can be easily replicated.

| CON CLUS IONS
Environmental complementarity analysis proved to be useful in identifying important regions for future bat surveys in Brazil. The combination of our statistical model and a map of the human footprint provided an indication of priority areas in which to conduct bat field surveys. Areas with high human pressure, such as the Northern Cerrado (woodland savanna) and Western Caatinga (seasonal tropical dry forest), are gaps in bat knowledge. In such areas, our statistical model predicts that unknown bat species could well be found.
We also recommend that similar analyses are conducted with other taxonomic groups to identify spatial congruences and, therefore, to build a comprehensive national programme for field inventories.

ACK N OWLED G EM ENTS
The Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) granted fellowships to LMSA, MJRP, and RBM. We thank the anonymous reviewers and the editors Dr. O. Razgour and K.C. Burns for their constructive comments.

PEER R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/ddi.13137.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data used in this study is openly available in Dryad at https://doi.