Are we overestimating the niche? Removing marginal localities helps ecological niche models detect environmental barriers

Abstract Correlative ecological niche models (ENMs) estimate species niches using occurrence records and environmental data. These tools are valuable to the field of biogeography, where they are commonly used to infer potential connectivity among populations. However, a recent study showed that when locally relevant environmental data are not available, records from patches of suitable habitat protruding into otherwise unsuitable regions (e.g., gallery forests within dry areas) can lead to overestimations of species niches and their potential distributions. Here, we test whether this issue obfuscates detection of an obvious environmental barrier existing in northern Venezuela – that of the hot and xeric lowlands separating the Península de Paraguaná from mainland South America. These conditions most likely promote isolation between mainland and peninsular populations of three rodent lineages occurring in mesic habitat in this region. For each lineage, we calibrated optimally parameterized ENMs using mainland records only, and leveraged existing habitat descriptions to assess whether those assigned low suitability values corresponded to instances where the species was collected within locally mesic conditions amidst otherwise hot dry areas. When this was the case, we built an additional model excluding these records. We projected both models onto the peninsula and assessed whether they differed in their ability to detect the environmental barrier. For the two lineages in which we detected such problematic records, only the models built excluding them detected the barrier, while providing additional insights regarding peninsular populations. Overall, the study reveals how a simple procedure like the one applied here can deal with records problematic for ENMs, leading to better predictions regarding the potential effects of the environment on lineage divergence.

We obtained occurrence records throughout the range of these species from the literature and our own fieldwork. For Proechimys guairae, we included records from a karyological analysis of Venezuelan spiny rat species (Aguilera et al. 1995). We used all records of P. guairae reported in that study, except those corresponding to "P. g. Barinas subsp." (currently not recognized as part of this species; see Carleton and Musser 2005). Additionally, we included records formerly referred to as P. semispinosus (from Handley 1976) that according to current taxonomy correspond to P. guairaesee correspondence between provenance of localities of these records and the geographic range of P. guairae described in the karyological study of Aguilera et al. (1995). Finally, we also added records from our fieldwork in northern Venezuela (Anderson et al. 2012). For Rhipidomys venezuelae, we obtained the majority of our records from the most recent taxonomic revision of the genus (Tribe 1996). In that revision, R.
venezuelae was treated as "R. latimanus venezuelae", but later this taxon was reinstated to species level (Voss et al. 2001). This occurrence dataset was supplemented by records from our fieldwork as well (Anderson et al. 2012). For Heteromys anomalus, we used data from Soley-Guardia et al. (2014), who obtained occurrence records from Anderson (2003), Anderson and Gutiérrez (2009), and Anderson et al. (2012). We obtained records of H. oasicus from our field work, as well as from an exhaustive revision of museum specimens (Anderson 2003 and additional localities reported here). Peninsular collecting localities not described in any of the aforementioned studies are documented in the gazetteer at the end of this document.
We obtained geographic coordinates for these records following several steps. For many records, coordinates of collecting localities had already been published following extensive georeferencing efforts (e.g., Paynter 1982Paynter , 1997Anderson et al. 2012;Soley-Guardia et al. 2014), and we used those in the present study. However, for records lacking this information (or where published coordinates were suspected of large error), we obtained geographic coordinates using detailed 1:100,000 and 1:25,000 state maps, according to site descriptions on specimen tags, collecting catalogs, and/or field notes. Estimates of potential georeferencing error are not documented in several of the published sources; however, where documented, it is estimated to be < 5 km for most localities, but up to 10 km for some (e.g., Anderson et al. 2012). We estimate that most georeferences obtained in the preset study have a maximum potential error of < 5 km, but this was not explicitly calculated in all instances.
Aiming to reduce the effect of sampling biases in niche inferences (Hortal et al. 2008;Phillips et al. 2009), we spatially filtered the occurrence records (Kramer-Schadt et al. 2013;Syfert et al. 2013;Boria et al. 2014). Specifically, we followed the approach implemented by Anderson and Raza (2010), and retained the maximum number of records that were separated from each other by an Euclidean distance of at least 10 km. For each cluster of records, we measured distances in ARCGIS ® 9.2 (ESRI, Redlands, CA, USA), and assessed all possible solutions. If multiple co-optimal solutions were possible, we randomly chose one, or gave priority to those including records with GPS coordinates when possible. These resulted in a total of 56 records for Proechimys guairae (from 83 original unfiltered records), and 22 for Study regions used to calibrate the models were demarcated by a rectangle encompassing all records after filtering, which was delimited by the nearest even 0.5° that was at least 20 km away from the most peripheral record in each cardinal direction (exact coordinates given in the main text). This delimitation scheme was chosen to reduce the likelihood of violating modeling assumptions (Peterson et al. 2011, pp. 29, 40); namely that the species has been able to disperse throughout the study region, has been at least adequately sampled, and is not inhibited from establishment due to heterogeneous biotic contexts (Anderson and Raza 2010;Barve et al. 2011;Anderson 2013;Saupe et al. 2012).

Ecological niche models: calibration and evaluation
We built models using MAXENT 3.3.1, an algorithm that has performed well in comparisons of various modeling techniques (e.g., Elith et al. 2006;Wisz et al. 2008), and produces outputs of much interest to ecologists (Elith et al. 2010(Elith et al. , 2011. As potential environmental predictors, we used the 19 bioclimatic variables available from the WorldClim project (~1 km 2 resolution at the equator; Hijmans et al. 2005), which should be relevant for modeling aspects of the 'Grinellian niche' of these species (Luoto et al. 2007;Soberón 2010). With the aim of maximizing predictive ability under a machine learning approach, we used the complete set of variables to calibrate the models (Breiman 2001;Araújo and Guisan 2006). However, we took advantage of MAXENT's regularization to approach optimal model dimensionality and complexity (i.e., amount of variables and parameters modeling the response; Elith et al. 2011;Merow et al. 2013; see also Muscarella et al. 2014). Specifically, for each species, we produced preliminary models with various combinations of MAXENT settings and evaluated their predictive performance using spatially independent splits of the mainland data (Peterson et al. 2011, p. 161;Radosavljevic and Anderson 2014). After determining settings resulting in the highest average predictive performance (see below), we built a model for each species using those settings and all mainland records. This model was later projected onto the peninsula. For Heteromys anomalus, we built a model using the settings deemed as optimal for the same dataset by Soley-Guardia et al. (2014).
Specifically, we created preliminary models for Proechimys guairae and Rhipidomys venezuelae varying two important settings from MAXENT that together affect both the dimensionality of the model (i.e., how many of the given variables are actually incorporated), and its complexity (i.e., how many parameters are necessary to model the response to each of the incorporated variables). This is controlled by the types of feature classes that the user allows the program to consider, and the level of regularization applied to each (Phillips et al. 2006;Merow et al. 2013). The former pertains to the types of transformations applied to the raw variables, which allow the model to explore and fit responses of various shapes to a single variable (e.g., linear, quadratic, or threshold-dependent), each response being determined by a given parameter.
The latter determines how close of a match is required between the modeled responses and the empirical values (i.e., those given by the occurrence records). In this way, regularization controls both dimensionality and complexity by penalizing inclusion of variables (or response types) that do not result in substantially higher performance (i.e., model gain). Essentially, higher regularization values result in stronger penalties to complexity, reducing the potential for overfitting to any noise or bias present in the dataset (Anderson and Gonzalez 2011;Shcheglovitova and Anderson 2013).
For each species, we built the preliminary models using two different sets of feature classes under a variety of regularization values. One set consisted of those feature classes suggested by default (Phillips and Dudík 2008), while the other consisted of one in which we either added or removed feature classes (depending on whether the sample size was near that recommended by MAXENT for the use of more complex feature classes; see below). To vary the level of regularization, we modified the regularization multiplier value (hereafter regularization multiplier), which simultaneously affects the regularization coefficient (β j ) assigned to each j feature class by default (Phillips et al. 2006;Phillips and Dudík 2008;Anderson and Gonzalez 2011). Specifically, we varied the regularization multiplier every 0.5 interval between 0.5-4.0 (default settings use a regularization multiplier of 1). For Proechimys guairae, which had a sample size of 56 occurrence records, the default set of feature classes consisted of linear, quadratic, and hinge. We also employed a more 'complex set' adding the threshold and product For all models, we used the logistic output of MAXENT (Phillips and Dudík 2008), and kept all other settings as default (e.g., maximum number of 10,000 background points; maximum of 500 iterations). The interpretation of the logistic output of MAXENT has experienced recent critique concerning the assumption it makes about 'prevalence' (Royle et al. 2012;Hastie and Fithian 2013;Renner and Warton, 2013;Merow and Silander 2014). However, its use in the present study is not problematic for several reasons. First, we do not interpret the models as probability of occurrence, but rather as indices of relative suitability (i.e., relative occurrence rates; Fithian and Hastie 2013;Phillips and Elith 2013;Merow and Silander 2014). Second, the evaluation statistics employed here do not vary across MAXENT outputs, which preserve relative ranks (Phillips and Dudík 2008;Merow et al. 2013). Third, we interpret outputs independently for each lineage studied. The latter avoids the problem of determining the actual values of We evaluated the performance of preliminary models on spatially independent subsets of the data. This method reduces inflation of evaluation scores, typically occurring due to artifactual spatial autocorrelation between calibration and evaluation data, especially deriving from sampling biases across geography (Veloz 2009;Hijmans 2012;Wenger and Olden 2012). To do this, we used a 'masked geographically structured k-fold cross validation approach' . Specifically, we divided occurrence records into geographically structured 'folds' or 'bins' (i.e., subsets of records with a corresponding subset of the study region associated to them). We sequentially built models using all but one of these bins, which was withheld for evaluation (i.e., neither occurrence records nor background information were drawn from that region). We achieved this by masking the geographic region corresponding to the evaluation bin during each iteration (in order to avoid violating assumptions implicit in selection of the study region; Anderson and Raza 2010; Barve et al. 2011). We defined bins longitudinally, dissecting the geographic background associated with each set of records by the middle of the longitudinally closest records of adjacent bins. For each species, the number of bins chosen corresponded to one appropriate for using the same feature classes when excluding one bin, as when using all records for the final model (suggested by MAXENT according to sample size; Phillips and Dudík 2008). While multiple options were available, we chose a number that did not result in overly narrow geographic areas (i.e., to maintain spatial independence). In this way, we used six bins for Proechimys guairae (each with 9-10 records), and four for Rhipidomys venezuelae (each with 5-6 records). For Heteromys anomalus, Soley- As our criteria for evaluating performance of preliminary models, we used omission rates and values for the area under the curve (AUC) of the receiver operating characteristic plot (ROC) (Fielding and Bell 1997;Peterson et al. 2011, p. 162). For each iteration, omission rates were calculated over the masked bin only (withheld from calibration), whereas AUC values were calculated over the totality of the study region (as comparisons of this measure with presencebackground data are only valid across the same region; see Lobo et al. 2008;Peterson et al. 2011, p. 171). We used omission rate as our principal criterion, as this tests the predictive capacity of the model regarding known data (i.e., occurrence records). Hence, it allows selection of settings that minimize overfitting to the calibration data (Shcheglovitova and Anderson 2013;Radosavljevic and Anderson 2014). Complementarily, AUC provides a relative measure of the discriminatory ability of the model. However, in presence-background models, discrimination is gauged using data for which the truth is unknown (background pixels). In this way, models that assign occurrence records higher suitability values than background records are rewarded with a higher AUC value, regardless of whether the former truly hold better conditions for the species (Merow et al. 2013). For this reason, we only use this measure as a secondary criterion to assess discriminatory performance of models resulting in essentially equal omission rates. To obtain binary predictions necessary to calculate omission rates, we used the lowest presence and 10 th percentile thresholds (Pearson et al. 2007; respectively the minimum training presence and 10 percentile training omission of MAXENT). We deemed as co-optimal, settings that, when averaged across all bins, resulted in an omission rate closest to that theoretically expected at both thresholds (i.e., zero for the lowest presence threshold, 10% for the 10 th percentile threshold).
Among co-optimal settings, we chose as optimal those with the highest AUC value, unless the difference was small (arbitrarily defined as < 0.01), in which case we chose the settings with a lower regularization multiplier value (as this usually enhances model discrimination).
Results of these preliminary models showed that settings deemed as optimal did not correspond to those suggested by default (Tables S1, S2; although the latter performed relatively well for Proechimys guairae). In essence, omission rates decreased for both 'simple' and 'complex sets' of feature classes as regularization increased. However, AUC values either consistently increased or decreased with increasing regularization, depending on the species.
Optimal settings consisted of the 'simple set' of feature classes for both lineages. However, 'complex sets' employing high regularization also exhibited evaluation scores as good as those of simpler sets (see Soley-Guardia et al. 2014 for parallel patterns in models for Heteromys anomalus).
Using settings deemed as optimal in these evaluations, we built a model for each species with all records. The number of variables and parameters that were actually incorporated by MAXENT in the final model of each species are shown in Table S3. This information was obtained from the 'lambdas' file produced by each model, and it does not equate to the 'percent contribution' or 'permutation importance' provided by MAXENT (which respectively indicate the contribution of each variable during internal iterations leading to a MAXENT model, and how much a single variable affects training AUC in this model; see MAXENT tutorial). The information is also provided for the models built using the same settings but excluding the subset of records from protruding spatially marginal (PSM) localities (see below).

Detection and exclusion of records from protruding spatially marginal localities
In order to detect PSM localities, we followed the approach suggested by Soley-Guardia et al.
(2014), who developed it using one of the species included in the present study, Heteromys anomalus. Specifically, we ranked occurrence records of Proechimys guairae and Rhipidomys venezuelae according to the prediction values received by the MAXENT model made with optimal settings and all mainland records (Fig. S1). Given that the five lowest-ranking records in both species spanned two major gaps in suitability values, we retrieved descriptive habitat information for these, following results of Soley-Guardia et al. (2014).
This procedure detected two records at PSM localities for Rhipidomys venezuelae, but none for Proechimys guairae (Table S4). Given that in both lineages several of these lowestranking records were not associated with PSM localities, we did not assess whether additional records from PSM localities existed in the dataset at higher ranks (i.e., assuming that if they did exist, they did not affect the model through issues related to their spatial marginality; see main text). This simple approach led to detection of records responsible for substantial changes in estimates of suitability across the study region, which would have otherwise lead to wrongful conclusions regarding potential for connectivity among known and hypothetical populations However, we recognize that the approach of excluding only the records from PSM localities that received lower rankings than those not associated to PSM localities might present some limitations. Namely, because of MAXENT's machine-learning approach, the relative rank given to each record can vary according to the exact dataset used to calibrate the model (see also main text). This occurs because the actual variables and parameters incorporated into each model can change according to the specific occurrence dataset used (e.g., including vs. excluding records from PSM localities). In this way, records from PSM localities that did not affect the first model (i.e., variables most affected by issues of spatial marginality there were not incorporated into the model), might affect the second model once it is parameterized again (i.e., variables most affected by issues of spatial marginality now become incorporated).  (Table S3). Furthermore, for the latter species, Soley-Guardia et al.
(2014) reported four additional records from PSM localities receiving higher ranks than records not associated to PSM localities. In the second model built for that species, the relative ranking of records changed; however, the lowest one still corresponded to a record not associated to a PSM locality.

Projections of ecological niche models
Interpreting ENMs in novel environmental conditions (i.e., model extrapolation) is a significant challenge and an area in great need of research (Anderson 2013;Owens et al. 2013). In this study, for every model projected onto the peninsula, we took advantage of several MAXENT outputs to assess whether there were any issues associated to model extrapolation. These outputs consisted of the modeled response curves produced in the 'html' file, as well as the 'clamping', 'multivariate similarity surface' (MESS), and 'most dissimilar variable' (MoD) analyses (Elith et al. 2010).
Specifically, we first assessed whether response curves to variables with the highest contributions to the model (gauged by the 'percent contribution'; see MAXENT tutorial) were close to achieving minimum or maximum suitability within the calibration datasets (0 or 1, respectively, for the logistic output). When modeled responses have plateaued near maximum or minimum values, different ways of extrapolating them should lead to similar conclusions, as suitability can only vary slightly. In most instances this was the case, a condition also evident in the clamping analyses (the latter consisting of zero values only). This analysis shows whether two different ways of extrapolation-suitability is assumed to remain the same for novel values (clamping) or to continue responding following the same trend (no-clamping)-lead to different estimates of suitability (see MAXENT tutorial).
However, the fact that different ways of extrapolating do not yield different results, by no means guarantees that either way is correct (regarding estimates of suitability). This is particularly true when the response curves have plateaued near maximum values, as this response has to decrease at certain point (Anderson 2013). For this reason, we also examined results from the MESS analyses. These revealed that environments within the peninsula are at most as novel as those existing within the calibration region but not incorporated during calibration (i.e., showing blue or white, but never red colors; see MAXENT tutorial). Moreover, the MoD analyses revealed that when environments were slightly novel in the peninsula (according to the MESS analyses), this novelty was never due to the variables with highest 'percent contribution' during calibration.
For these reasons, we infer that issues associated with model extrapolation had little effect in the results of the present study. We acknowledge that these analyses only consider novel environmental values, rather than novel environmental combinations (Owens et al. 2013).
However, the issue of interactions among variables in general is one that clearly needs further exploration within the field of ENM, outside the scope of the currently available tools for MAXENT.

Transforming the models into categorical predictions of suitability
To test the experimental predictions, we transformed the continuous estimates of suitability given by MAXENT into categorical ones (continuous predictions shown in Fig. S2). We did so by applying two thresholds related to suitability values assigned to particular mainland records.
Specifically, we first used the lowest presence threshold (Pearson et al. 2007; minimum training presence of MAXENT), which deems as suitable any pixel that received a value equal to or higher than the least-suitable record used to calibrate the model (i.e., even if the suitability assigned to a known record is very low, it should be sufficient for the species, given that it was found there).
However, to reduce oversimplification, we also applied a stricter threshold denoting areas of higher suitability. Whereas multiple values can suit this purpose (e.g., excluding the 10 th or 25 th percentiles of calibration records), we chose one that at the same time would allow us to assess whether the effect of PSM localities could be reduced simply by applying a stricter threshold in the original model. Specifically, we used the value given to the lowest-ranking record that did not represent a PSM locality; hence, this threshold deems unsuitable areas that received values as low as those assigned to records from PSM localities. In this way, for Rhipidomys venezuelae, where we detected two records from PSM localities, the stricter threshold corresponded to the 9 th percentile. For Heteromys anomalus, Soley-Guardia et al. (2014) detected 15 records from PSM localities; hence, the stricter threshold corresponded to the 12 th percentile. For Proechimys guairae, where records from PSM localities were not detected, we arbitrarily set the stricter threshold using the 9 th percentile. For consistency, these percentile thresholds were also applied to the second models built without PSM localities, where they simply denote areas of higher suitability (as is also the case in the model built for P. guairae). The logistic values used to set these thresholds are given for all lineages in Table S5.   Heavily forested and more frequently flooded than the El Orinoco area" (Hershkovitz 1947 (Voss 1991, p. 68-70). Tall evergreen forest, only patches are anthropogenic (Voss R.S. field notes 1986). Rhipidomys was collected on trees in the forest, its border with coffee plantations, or on coffee plantations (Kafka, H.L. field notes 1986; Torrealba, I. field notes 1986).  (Hershkovitz 1947)".
[Villanueva] "the primitive forest cover has been reduced to small isolated stands, scattered trees, shrubs and thinly wooded stream banks" (Hershkovitz 1960 "low ridge that is predominately covered by thorn forests but that also includes relatively mesic vegetation along a seasonal watercourse… There, suitable habitat for the species exists due to mist coming off the ocean, surrounded by a matrix of xerophytic habitat on the remainder of the Península de Paraguaná... The most common vegetative formations belong to dry and very dry tropical forest, with denser formations in the gallery forests or streambeds. Due to its proximity to the sea, mist accumulates in the morning hours although it has no permanent rivers or streams... Ángela Martino-G. kindly provided unpublished information regarding the only known specimens of Heteromys oasicus from Monte Cano. All were captured among terrestrial bromeliads in the bed of a seasonal stream, which was dry at the time. This small stream, which is the only significant watercourse in Monte Cano reserve, flows through deciduous forest; however, trees along the stream form a semideciduous formation that maintains shade throughout the year… Traps were also placed in the reserve outside the streambed, but they yielded no Heteromys" (Anderson 2003 This locality corresponds to a local aquifer with a higher tree density that results in locally cooler conditions. The trees are covered with vines growing along the canopy, which grant the place its local name 'La Cueva de Yabuquiva' [i.e., the cave] (Ochoa-G., J. in litt.; translated by MSG). "A thorn forest-desert scrub area, with a lot of medium sized succulent leaved trees. There are two windmills [used to extract water by the locals] and scattered pools of water occur in the area. Very little ground cover. The trees are leafless up to about 1.5 m. It looks as if the water level gets that high during part of the year. Only a few thorn trees and Opuntia are found here. The soil is a clay sand. No rocks could be found... The pools of water are used as watering places... Area #24 [on the other side of the road from #19]... A thorn forest-desert scrub area west of area# 19... Thorn trees, Opuntia, column cactus, and some low shrubs and grasses can be found but not the succulent leaved trees of area #19. The thorn trees are about 5 m apart and about 8 m high. There are no water holes in this area" (Peterson, N.E. field notes 1968). Table S5. Thresholds used in each species to transform the continuous estimates of suitability of the MAXENT models into categorical ones. The 'lenient threshold' corresponds to the lowest presence threshold of MAXENT and denotes all areas suitable to the species; the 'stricter threshold' is a percentile threshold that denotes areas of higher suitability (see text for details).

Ana
Here we report all specimens examined from Península de Paraguaná, exclusive of Cerro Santa Ana (specimens from the latter are reported in Anderson et al. 2012). Secondary information deriving from sources other than the collector is mentioned within brackets, followed by the source when applicable. Localities are arranged from north to south. For each entry, boldface type indicates the place name to which geographic coordinates correspond. We estimate that most coordinates have a maximum error of < 0.5 km; numbers of decimals vary according to the