Determinants of habitat suitability models transferability across geographically disjunct populations: Insights from Vipera ursinii urs inii

Abstract Transferability of habitat suitability models (HSMs), essential to accurately predict outside calibration conditions, has been seldom investigated at intraspecific level. We targeted Vipera ursinii ursinii, a meadow viper from southeastern France and central Italy, to assess determinants of transferability among geographically disjunct populations. We fitted HSMs upon occurrences of the Italian and French populations separately, as well as on the combined occurrence dataset. Internal transferability of HSMs, on spatially independent test data drawn from the calibration region, and their external transferability on the geographically disjunct populations were evaluated according to (a) use of full or spatially rarefied presence datasets; (b) ecology‐driven or statistics‐driven filtering of predictors; (c) modeling algorithm, testing generalized additive models and gradient boosting models; and (d) multivariate environmental novelty within test data. Niche overlap between French and Italian populations was also tested. Niche overlap was low, but niche divergence between the two populations’ clusters was not corroborated. Nonetheless, wider niche breadth and heterogeneity of background environmental conditions characterizing the French populations led to low intercluster transferability. Although models fitted on the combined datasets did not attain consistently higher internal transferability than those separately fitted for the French and Italian populations, ensemble projection from the HSMs fitted on the joint occurrences produced more consistent suitability predictions across the full range of V. u. ursinii. Spatial thinning of occurrences ameliorated internal transferability but did not affect external transferability. The two approaches to predictors filtering did not differ in transferability of the respective HSMs but led to discrepant estimated environment–occurrence relationships and spatial predictions, while the two algorithms attained different relative rankings depending on the considered prediction task. Multivariate novelty of projection sites was negatively correlated to both internal transferability and external transferability. Our findings clarify issues researchers should keep in mind when using HSMs to get predictions across geographically disjunct populations.

separately, as well as on the combined occurrence dataset. Internal transferability of HSMs, on spatially independent test data drawn from the calibration region, and their external transferability on the geographically disjunct populations were evaluated according to (a) use of full or spatially rarefied presence datasets; (b) ecology-driven or statistics-driven filtering of predictors; (c) modeling algorithm, testing generalized additive models and gradient boosting models; and (d) multivariate environmental novelty within test data. Niche overlap between French and Italian populations was also tested. Niche overlap was low, but niche divergence between the two populations' clusters was not corroborated. Nonetheless, wider niche breadth and heterogeneity of background environmental conditions characterizing the French populations led to low intercluster transferability. Although models fitted on the combined datasets did not attain consistently higher internal transferability than those separately fitted for the French and Italian populations, ensemble projection from the HSMs fitted on the joint occurrences produced more consistent suitability predictions across the full range of V. u. ursinii. Spatial thinning of occurrences ameliorated internal transferability but did not affect external transferability. The two approaches to predictors filtering did not differ in transferability of the respective HSMs but led to discrepant estimated environment-occurrence relationships and spatial predictions, while the two algorithms attained different relative rankings depending on the considered prediction task. Multivariate novelty of projection sites was negatively correlated to both internal transferability and external transferability. Our findings clarify issues researchers should keep in mind when using HSMs to get predictions across geographically disjunct populations.

| INTRODUC TI ON
The distribution of species is shaped by the interplay across space and time of abiotic and biotic factors. The presence of a species in a certain region requires the realized environment (i.e., the combination of abiotic conditions characterizing a geographical area) matching its physiological tolerances (i.e., its fundamental niche) (Guisan et al., 2014). The lack of physical barriers impeding the species to reach favorable habitats and the availability of trophic resources represent additional critical factors for the persistence of populations (Soberón & Nakamura, 2009). Usually, not all the areas matching such requirements are occupied. For instance, species which are nowadays patchily distributed may have continuously occurred in the past across wide extents: dramatic climate change, such as during the Quaternary glaciations, may have driven environmental conditions within portions of their historic range out of their fundamental niche, leading to local extinctions (Hewitt, 2000;Iannella et al., 2017). When existing clusters of populations are geographically segregated, with scant possibility of gene flow, intraspecific diversification may arise due to genetic drift. Moreover, different abiotic conditions characterizing the disjunct occurrence areas, though still comprised within the species' fundamental niche, and compositional differences of the respective biotic communities may boost intercluster diversification (Borer et al., 2011;Maia-Carvalho et al., 2018;Serra-Varela et al., 2015). The degree to which disjunct distribution of intraspecific lineages or closely related species is associated with divergence or conservatism of the respective environmental niches has been widely studied in recent years (Martínez-Freiría et al., 2020;McCormack et al., 2010;Rato et al., 2015).
Habitat suitability models (HSMs) are correlative models that estimate the relationship between the target biological entity and environmental predictors (e.g., climate, vegetation, topography, trophic resources) based on information about occurrence patterns of such entity across the study area . HSMs, also termed species distribution models (SDMs) or ecological niche models (ENMs) depending on whether the "modeling landscape" is a concrete geographical space or only an environmental one (Owens et al., 2013), are increasingly used to investigate the way historical and current abiotic and biotic drivers shape distributions at different scales (Acevedo et al., 2012;Iannella et al., 2018;Reino et al., 2017).
Indeed, once the entity-environment relationship has been estimated upon the available occurrence data (i.e., model calibration), HSMs may be projected to new scenarios (i.e., model transfer), such as past or future temporal horizons or distant geographical regions, predicting how suitable conditions to the entity would be distributed therein (Elith & Leathwick, 2009).
Given this predictive potential of HSMs, various researches aimed at evaluating their spatial and/or temporal transferability (Heikkinen et al., 2012;Qiao et al., 2019;Veloz et al., 2012), namely the degree to which their predictions would be reliable and biologically realistic outside calibration conditions. Transferability of HSMs has been related to several factors, such as sampling bias in calibration data, species' traits, model complexity, and similarity of environmental conditions between calibration and projection spatiotemporal scenarios (Werkowska et al., 2017;Yates et al., 2018).
Particularly, when environmental conditions in the projection region/timeframe fall outside the range of values characterizing one or more variables within calibration data (i.e., univariate/strict novelty), or include novel combinations of values which do not exceed the calibration range considering each variable separately (i.e., multivariate novelty), HSMs are forced to extrapolate the estimated entity-environment relationship into these "unknown territories," exacerbating the risk of spurious predictions (Zurell et al., 2012). Some populations' clusters of rare or cryptic species showing patchy distributions may be more extensively investigated than others, as emerged for some European bats , viperids (Mizsei et al., 2016), beetles (Bosso et al., 2013), and plants (Fois et al., 2015). In this context, calibrating HSMs on a thoroughly sampled portion of the species' range and projecting them to less explored regions seems a promising strategy to guide new field campaigns in these latter (Fois et al., 2018), possibly leading to the discovery of previously unknown populations (Mizsei et al., 2016;. However, if calibration data do not fully cover the range of environmental conditions characterizing the species' realized niche (i.e., its fundamental niche constrained by realized environment and biotic interactions), HSMs may fail in predicting species' potential presence in areas actually suitable but not surveyed yet . Moreover, the possibility of noticeable differences in the realized environment between calibration and projection areas usually increases with their geographic distance, worsening projection pitfalls due to extrapolation requirements (Qiao et al., 2019). Finally, if populations' clusters have been segregated for dozens of millennia, possible intraspecific niche differentiation may have occurred in the meanwhile, making the HSMs calibrated on a cluster difficultly transferable to a different one (Carretero & Sillero, 2016).
HSMs have been increasingly applied at intraspecific level, for instance to quantify the contribution of niche divergence to the genetic diversification of population lineages with allopatric distribution (Maia-Carvalho et al., 2018;Martínez-Freiría et al., 2020) or to assess the relative weight of abiotic and biotic drivers in microhabitat K E Y W O R D S extrapolation, habitat suitability models, niche differentiation, spatial thinning, transferability, Vipera ursinii ursinii selection (Peñalver-Alcázar et al., 2016). Nonetheless, to the best of our knowledge, the factors affecting the transferability of HSMs among populations' clusters having been geographically segregated for a long time were seldom targeted, although some previous attempts to unveil them do exist (Acevedo et al., 2014;Carretero & Sillero, 2016).
To fill in this gap of knowledge, we investigated HSMs transferability among geographically segregated populations targeting Vipera ursinii ursinii (Bonaparte, 1835) as a model system. Phylogeography of V. ursinii was shaped by the Quaternary glacial-interglacial cycles, with Balkan Peninsula probably acting as diversification center : Since 2.1 m.y.a., the ancestral lineage leading to the current Italian and French V. u. ursinii and Croatian V. u. macrops (Méhely, 1911) populations moved northwards from Greece to Croatia and Slovenia, while the lineage from which V. u. moldavica Nilson, Andrén andJoger, 1993 andV. u. rakosiensis Méhely, 1893 diversified emerged between Greece and Montenegro and later moved to the northeast toward the steppes of Hungary, Romania, and Moldova Zinenko et al., 2015). Successively, V. u. ursinii colonized the Alpine arc and then spread southwards in Italy . The current disjunct distribution of V. u. ursinii between western Alps in southeastern France and the Apennines massif in central Italy, with populations occurring in isolated patches dominated by alpine and subalpine grasslands (Luiselli, 2004;Lyet et al., 2013), is assumed to derive from contractions of suitable habitats during the Pleistocene interglacial periods, when forests replaced steppes at lower altitudes . Since the divergence between the Italian and French V. u. ursinii populations started around 0.6 m.y.a. , niche differentiation between them might have occurred: In this case, HSMs fitted on occurrences from the French populations would likely be poorly transferable to the Italian ones, and vice versa. Therefore, because of its distributional patterns and phylogeographic history, we chose V. u. ursinii to investigate the factors affecting transferability among disjunct populations. For this purpose, we built state-of-the-art HSMs upon occurrence data of the French populations, of the Italian ones, and of the two populations' clusters together, using either all the available occurrences or spatially rarefied subsets thereof. Moreover, since the way predictors are chosen is deemed to be an important driver of predictive performance (Bucklin et al., 2015;Werkowska et al., 2017), HSMs were fitted using either a set of uncorrelated predictors entirely filtered through statistical procedures or a set of predictors resulting from a preliminary check for multicollinearity followed by critical selection of variables linked to V. u. ursinii known autoecology. Once HSMs were calibrated, internal transferability (i.e., predictive performance on test data drawn from the calibration region) was evaluated on spatially independent test samples, while external transferability (i.e., predictive performance on a disjunct geographic area) was assessed validating the HSMs built for the French populations on the Italian ones and vice versa. Moreover, we implemented the analytical framework proposed by Broennimann et al. (2012) to investigate whether V. u. ursinii disjunct distribution is associated with some degree of divergence between the environmental niches of the French and Italian populations.

| Occurrence data, spatial thinning, and pseudoabsences
We gathered V. u. ursinii ( Figure 1) occurrence data (GPS coordinates or specific toponym for which uncertainty about observation point was less than 1 km 2 ) ranging from 1980 to 2017. Presence records for the Italian populations were extracted from the database on European occurrences of V. ursinii built by Console et al. (2020); for the French populations, we integrated occurrences from this database with observations extracted from the "SILENE faune" platform managed by the CEN-PACA (Conservatoire d'Espaces Naturels de Provence-Alpes-Côte d'Azur). This way, we collected 1,376 presence records for the French populations and 302 records for the Italian ones. Based on these two sets of occurrences, we used the "alphahull" (Rodríguez Casal & Pateiro López, 2010) R (R Core Team, 2019) package to draw alphahull-based polygons, which provide less biasprone range estimates than minimum convex polygons (Burgman & Fox, 2003): Such polygons were successively overlaid on maps obtained from HSMs projection across the study area (see "Ensemble predictions and climate-occurrence relationships") to more intuitively show how predicted suitability was distributed within the estimated range of the two populations' clusters.
Occurrence records were then allocated to three different calibration groups: (a) "France," containing only occurrences of the French populations; (b) "Italy," containing only the Italian records; (c) "Joint," comprising occurrences from both populations' clusters.
Occurrences were spatially rarefied through the "spThin" R package (Aiello-Lammens et al., 2015): We performed three thinning replicates for each populations' cluster (i.e., "France" and "Italy"), setting 1.5 km as minimum allowed distance between two thinned occurrences to rarefy presence points falling within neighboring raster cells of the predictors (see "Environmental predictors"); then, for F I G U R E 1 A specimen of Vipera ursinii ursinii (Bonaparte, 1835) pictured by one of the authors on emerging rocks within a mountainous meadow in central Italy each cluster, we selected the replicate providing the highest number of residual presence points, to preserve as much information as possible about the environment-occurrence relationship. The thinned occurrences retained for the French and Italian populations were then joined to build the "Joint" thinned dataset. Thus, we used six different presence datasets, resulting from three calibration groups * two occurrence data sizes (hereafter, "Full" and "Thin"), to calibrate the HSMs.
Since the algorithms used to fit the HSMs (see "Model fitting and evaluation") require both presences and absences, we generated, for each presence dataset, geographically buffered pseudoabsences following the "1° far" approach, claimed to assure good performances with the chosen algorithms (Barbet-Massin et al., 2012). By means of the "rgdal" R package (Bivand et al., 2019), we first drew an inner buffer polygon of 1.5 km radius and an outer buffer polygon of 120 km (approximately 1°) radius around each occurrence; the two polygons were subsequently subtracted to get a final buffer ranging from 1.5 to 120 km around each occurrence; the so-obtained buffer polygons were merged and then clipped to the boundaries of the study area, to finally generate 10 sets of 1,000 pseudoabsences each by random selection within the merged and clipped polygon. This way, neither a presence point could be incidentally selected as pseudoabsence, nor pseudoabsences could fall too close to any occurrence site. We finally joined each presence dataset with the corresponding 10 sets of pseudoabsences, obtaining the presence-pseudoabsence (hereafter Pres-PseudoAbs) datasets used for model fitting.
Elevation gradient across the study area, alphahull-based range estimates for the French and the Italian populations, and the respective buffer polygons from which pseudoabsences were drawn are show in Figure 2a.

| Climate-related predictors
Climatic variables represent important predictors of the environmental suitability for herptiles (Guisan & Hofer, 2003;Lunghi et al., 2018;Mizsei et al., 2016). Raster files of nineteen temperatureand precipitation-related variables (Bio1-Bio19) at 30 arc-seconds resolution (~1 km 2 at the equator) were downloaded from WorldClim 2.0 online repository (Fick & Hijmans, 2017). From the same repository, we also downloaded raster files of monthly averaged daily solar radiation (kJ m −2 day −1 ) and wind speed (m/s) at the same resolution and processed them in R to obtain ten new variables related to (a) annual trends, represented by yearly averaged daily solar radiation (Srad_Ann_Mean) and wind speed (Wspeed_Ann_Mean) and their standard deviation (Srad_Ann_Sd and Wspeed_Ann_Sd); F I G U R E 2 (a) Altitudinal gradient characterizing the study area; alphahull-based polygons estimating the current range of the Italian (brown) and French (blue) Vipera ursinii ursinii populations are overlaid on the map, along with the background polygons (gray contour) from which the pseudoabsences needed to fit the HSMs were generated. (b) Raster maps representing the percent cover of grasslands within each cell in the occurrence range (and its surroundings) of the Italian (left) and French (right) populations. (c) Raster maps representing the percent cover of rocky or sparsely vegetated areas within each cell in the occurrence range (and its surroundings) of the Italian (left) and French (right) populations (b) activity period of V. u. ursinii, which goes from late spring to early autumn (Lisse et al., 2012;Luiselli, 2004), represented by mean and standard deviation of daily solar radiation and wind speed during the May-October period (Srad_MayOct_Mean, Srad_MayOct_Sd, Wspeed_MayOct_Mean, and Wspeed_MayOct_Sd); (c) solar radiation extremes, namely mean daily solar radiation of the least sunny month (Srad_LSM) and of the sunniest month (Srad_MSM). The raster files of all these climate-related variables were then projected from the original WGS84 geographic coordinate system to ETRS89-LAEA to maintain equal-sized cells across the latitudinal range of the study area.

| Habitat-related predictors
Previous studies implementing HSMs on V. u. ursinii (Lyet et al., 2013) and other Vipera species (Santos et al., 2006) included as predictors also vegetation types, land use, and human disturbances, which actually affect the composition of animal communities and the distribution of single taxa at regional-to-local scales (Iannella et al., 2016;Luoto et al., 2007). Here, to derive predictors representing habitat types linked to V. u. ursinii autoecology, we downloaded raster files of Corine Land Cover ("CLC") data at of presence records fall within "Natural grasslands," "Bare rocks," and "Sparsely vegetated areas," we aggregated through bilinear interpolation the raster cells of CLC_2012 to the same resolution of climate-related rasters and derived two habitat-related predictors: (a) the percent cover of the "Natural grasslands" class within each aggregated raster cell (hereafter termed "Grasslands") and (b) the percent cover of the "Bare rocks" and "Sparsely vegetated areas" classes within each aggregated cell (hereafter, "Rocks_SparseVeg").

| Ecology-based versus statistics-based filtering of predictors
Joining climate-related variables and habitat-related ones, we considered a total of 31 candidate predictors. Since a high number of collinear variables may lead to unstable estimation of model parameters and biased contributions of the single predictors to the environment-occurrence relationship (Dormann et al., 2013), we first individuated the pairs of predictors showing Pearson's r correlation coefficient ≥ 0.7 across the study area extent.
Some authors suggest that selecting the variable presumed to be more significant for the species' autoecology from each pair of correlated predictors makes HSMs predictions less bias-prone and more easily interpretable (Rissler & Apodaca, 2007). Differently, others claim that hidden correlation structures may persist after a preliminary filtering of predictors based on pairwise correlations  and suggest more statistically robust techniques such as the variance inflation factor (VIF) analysis to filter candidate predictors Werkowska et al., 2017). Here, to evaluate the effect of different predictor filtering approaches upon HSMs internal and external transferability, we generated two sets of predictors. The former, hereafter named "Ecol", was built selecting the variables deemed to more directly influence V. u. ursinii ecological requirements among the ones showing multicollinearity issues (i.e., Pearson's r ≥ 0.7). The second, hereafter termed "VIF," derived from a two-stage procedure. First, we performed a VIF analysis across the study area and discarded the variables exceeding the rec-
GAM is based on smoothing functions permitting to fit the modeled response to the data when their relationship would be difficultly fitted through standard parametric methods .
Differently, GBM implements stochastic boosting to build several simple regression trees in a forward stagewise fashion, focusing at each iteration on residuals from the set of trees built up to that point, finally combining them in a single optimized model (Elith et al., 2008).
We chose GAM and GBM because these algorithms attained high predictive performance in previous transferability-focused studies considering multiple species (Heikkinen et al., 2012) or virtual species (Qiao et al., 2019), but to the best of our knowledge, their transferability at intraspecific level among disjunct populations has not been evaluated yet, contrarily to some presence-only algorithms (Carretero & Sillero, 2016).
Here, GAMs were fitted by means of the "gam" R package (Hastie, 2018), setting binomial distribution and cubic-spline smoother with maximum smoothing degree ("s") of 3; the initial model was then refined by means of stepwise selection (with both the forward and backward directions tested) of predictors via Akaike's information criterion (AIC) . GBMs were instead fitted through the "gbm" R package (Greenwell et al., 2019), setting learning rate = 0.001, three-way interactions (i.e., "tree complexity" = 3), maximum number of trees = 10,000, and fivefold internal cross-validation.
Occurrence data often show internal dependence structures possibly leading to model overfitting, incorrect estimation of errors, and inflated performance metrics (Roberts et al., 2017;Veloz, 2009). In our case, the restricted range of the two populations' clusters could exacerbate spatial autocorrelation issues; in such situations, the commonly used cross-validation approaches (e.g., k-fold or repeated split-sample) would lead to overoptimistic error estimates and differences in predictive performance between nearby and distant test localities (Roberts et al., 2017;Valavi et al., 2019). Thus, we opted for spatial blocking cross-validation. To define an appropriate block size we fitted a GAM and a GBM model for each Pres-PseudoAbs dataset, we computed residuals from the predictions of these "full data" HSMs on calibration data, and we subsequently drew correlograms on residuals through the "ncf" R package (Bjornstad, 2019) to visualize the variations of Moran's Index at increasing interpoint distances (10 km pace): The distance at which Moran's Index stably approaches 0 represents the range of spatial autocorrelation, and block size should be greater than this distance to adequately assess prediction errors (Roberts et al., 2017;Valavi et al., 2019). Once individuated the optimal size, blocks were built through the "blockCV" R package (Valavi et al., 2019) and assigned to the training and test folds in a checkerboard fashion: this blocking structure makes environmental heterogeneity more evenly distributed between the training and the test fold compared with the grouping of contiguous blocks into a same fold, which would favor extrapolation-related low predictive performance on test data (Roberts et al., 2017).
We assessed internal transferability of the obtained HSMs on the respective test folds (hereafter, "SpBlock CV") by means of two metrics, the area under the curve (AUC) of the receiver operating characteristic (ROC) plot (Fielding & Bell, 1997) and the Continuous Boyce Index (Hirzel et al., 2006). AUC, a discrimination metric widely used in HSM literature Mammola et al., 2018;Yates et al., 2010), is measured plotting true-positive rate (i.e., sensitivity) against false-positive rate (i.e., 1-specificity) along increasing suitability thresholds. When HSMs are built using pseudoabsences instead of true absences, AUC can only provide comparative evaluations among HSMs built for a same entity through various algorithms and/or parameterizations, or among HSMs built for entities whose occurrences and pseudoabsences are collected from a same extent (Jiménez-Valverde et al., 2013;Lobo et al., 2008). Differently, the Continuous Boyce Index considers how occurrences are distributed within the range of predicted suitability: A window of fixed width is iteratively moved along the suitability range, each time computing the predicted-to-expected ratio Fi (i.e., proportion of occurrences divided by the proportion of raster cells falling within the window) and plotting it against the mean/median suitability value of the window; the Spearman correlation coefficient between Fi and the mean/median suitability values is then calculated (Hirzel et al., 2006). The Continuous Boyce Index plot and the respective correlation coefficient (hereafter indicated as B), ranging between −1 (counter-predictions) and 1 (optimal predictions), can be used to contrast predictive performance of HSMs calibrated on Pres-PseudoAbs data from different areas, as this metric refers only to presences . Therefore, we used AUC to assess accuracy differences between HSMs fitted through GAM and GBM,

| Ensemble predictions and climate-occurrence relationships
Previous researches suggested that predictions based on ensemble of HSMs could be more informative than predictions from single models (Araújo & New, 2007;Marmion et al., 2009)

| Niche overlap
We implemented the PCA-Env approach described in Broennimann et al. (2012) using the "ecospat" package: a PCA performed on sites randomly sampled from the entire study area permits to derive a 2D space whose axes represent the components best summarizing environmental variability across the study area; within the gridded 2D environmental space, the "niche occupancy" of the target entity is estimated comparing the kernel-smoothed density of occurrence of the entity in each cell with the density of available environmental conditions within the same cell. We extracted kernel-smoothed niche occupancies for the Italian and French populations, subsequently computing Schoener's D metric (Schoener, 1970) to estimate their overlap. Then, we performed the niche equivalency and niche similarity tests (Warren et al., 2008). The equivalency test compares the observed D to a set of D values from virtual niches obtained through random reallocation of occurrences between the two entities, and niche equivalency cannot be rejected unless the observed For both tests, 1,000 virtual niches were created; for the similarity test, we considered as background area for each populations' cluster the buffer polygon drawn around the respective "Full" occurrence dataset to generate pseudoabsences.

| RE SULTS
The spatial thinning led to the retention of 51 occurrences for the French populations and 49 occurrences for the Italian ones. Thus, while the "Full" datasets comprised 1,376 presence points for the "France" calibration group, 302 for the "Italy" group, and 1,678 for the "Joint" one, the corresponding "Thin" datasets comprised 51, 49 and 100 occurrences, respectively.
The two approaches to predictors filtering produced final sets differing both in the number of predictors and in the climatic trends represented (Table 1). Out of the nineteen temperature-and precipitation-related variables (Bio1-19), six were selected for the "Ecol" set while five were retained in the "VIF" one, with the two sets sharing only Bio9 (mean temperature of driest quarter) and Bio15 (precipitation seasonality). The variables related to solar radiation  (Figure 2b,c), and these habitat-related variables were retained both in the "Ecol" and in the "VIF" sets.
Correlograms showed that Moran's Index approached 0 within the 50-60 km distance bin for all the calibration group * occurrence data size combinations except the "Joint-Full" one; for this latter, spatial auto correlation became negligible (i.e., Moran's Index = 0 ± 0.05) only at 250-300 km interpoint distance ( Figure S1); thus, we performed checkerboard blocking choosing 275 km as block size for the "Joint-Full" combination, and 60 km as block size for the others ( Figure S2).

Predictive performance in terms of AUC and Continuous Boyce
Index (B) of the 480 fitted HSMs is reported, for both "SpBlock CV" and "External validation," in a table provided as Supplementary Information.
Neither normality nor homoscedasticity was clearly fulfilled within any of the linear models relating the various considered factors to AUC and B values within "SpBlock CV" and "External validation" (see below), so we used nonparametric testing for subsequent comparisons.
No evident differences in AUC within "SpBlock CV" emerged between the HSMs fitted upon the "Ecol" set of predictors and those fitted upon the "VIF" one, while GBM apparently provided better discrimination than GAM (Figure 3a). The linear models AUC ∼ Algorithm * Set of predictors fitted for each combination of calibration group * occurrence data size confirmed this trend, as only the algorithm factor emerged to significantly affect discrimination performance for most of the combinations (Table S1). Thus, onetailed Mann-Whitney U tests were performed to evaluate interalgorithm differences: GBM was confirmed to attain significantly higher AUC than GAM for "France-Full" (n = 80, U = 1,022, p = 0.02), "Italy-Full" (n = 80, U = 1,155.5, p < 0.001), "Italy-Thin" (n = 80, U = 1,233.5, p < 0.001), and "Joint-Thin" (n = 80, U = 1,094.5, p = 0.002), though estimated differences between the two algorithms were mild (est. diff. = 0.01-0.09).
The additive linear model B ∼ Calibration group + Occurrence data size + Algorithm + Set of predictors showed that, within "SpBlock CV", B scores were not influenced by the set of predictors the HSMs were fitted upon ( Figure 3b, Table S2). The refined linear model B ∼ Calibration group * Occurrence data size + Algorithm confirmed significant effect of the other three factors on B scores; moreover, the positive effect of data thinning emerged as significantly less pronounced for the "Italy" and "Joint" groups than for the "France" one ( Figure 3c, Table S2). The Kruskal-Wallis test confirmed significant differences across the three calibration groups (χ 2 = 38.6, df = 2, p < 0.001). The subsequent pairwise Wilcoxon signed-rank tests showed that B from "Italy" group was significantly higher than B from the "France" (p < 0.001) and "Joint" (p < 0.001) ones, and that B from the "Joint" group was higher than B from the "France" one (p = 0.001).
Within "External validation," the linear models AUC ∼ Algorithm * Set of predictors showed that algorithm affected discrimination performance for all the calibration group * occurrence data size combinations (Figure 4a, Table S3), while the set of predictors emerged as influential only for "France-Full" (AUC lower for "VIF" than for "Ecol") and "Italy-Full" (AUC higher for "VIF" than for "Ecol") ( Figure 4b, Table S3). One-tailed Mann-Whitney U tests showed that GBM attained significantly higher AUC than GAM for all the combinations ("Italy-Full": n = 80, U = 1,230.5, p < 0.001;  Note: "Ecol" indicates the set of predictors resulting from a preliminary check for pairwise collinearity through Pearson's r correlation coefficient, followed by critical evaluation of which variables could be more directly linked to the known autoecology of Vipera ursinii ursinii. "VIF" indicates the set of predictors obtained through a filtering procedure totally based on variance inflation factor analysis. The additive linear model B ∼ Calibration group + Occurrence data size + Set of predictors + Algorithm fitted for "External validation" did not confirm the positive effect of spatial thinning on B scores which emerged within "SpBlock CV" (Figure 5a, Table S4).
The refined linear model B ∼ Calibration group + Algorithm * Set of predictors suggested that HSMs from the "Italy" group performed better than those from the "France" one in terms of B, while GBM appeared to perform worse than GAM; in this case, the chosen set of predictors did not emerge as influential factor, though marginally significant positive interaction resulted between GBM and "VIF" Table S4). One-tailed Mann-Whitney U tests confirmed that HSMs fitted for the "Italy" group actually attained higher B scores than those fitted using only the French occurrences (n = 156, U = 1,909.5, p = 0.001, est. diff. = 0.43), and that GBM models obtained significantly lower B scores than GAM ones (n = 156, U = 3,030, p < 0.001, est. diff. = −0.35).
Comparing the two validation approaches, the HSMs fitted on the "Italy" and "France" groups attained higher predictive performance, in terms of both AUC and B, within "SpBlock CV" than within "External validation" (Figures 4, 5). The linear models AUC ∼ Validation approach (fitted for each calibration group * occurrence data size combination) and B ∼ Validation approach confirmed significant effect of validation approach on both metrics (Table S5). One-tailed Mann-Whitney U tests showed that AUC was significantly higher in "SpBlock CV" than in "External validation" for all the combinations ("France- The percentage of test sites showing multivariate novelty was clearly higher within "External validation" (more than 99%) than within "SpBlock CV" (between 60% and 90%); contextually, variability in AUC values was higher in the former validation approach than in the latter, especially for the "France" group ( Figure S3). Within "SpBlock CV," the decrease in AUC and B scores at increasing multivariate novelty was more pronounced for GAM than for GBM ( Figures S3, S4).
While no differences emerged between "Ecol" and "VIF" considering predictive performance in terms of Continuous Boyce Index at increasing percentage of non-analog test sites ( Figure S4a), the lower F I G U R E 3 (a) Box and whiskers plots showing, for each combination of calibration group ("France," "Italy," "Joint") * occurrence data size ("Full," "Thin"), the AUC scores that the HSMs fitted through the GAM or GBM algorithm, using the "Ecol" or "VIF" set of predictors, attained when validated on spatially independent test samples drawn from the calibration region ("SpBlock CV"). (b) Box and whiskers plots showing the Continuous Boyce Index scores that the HSMs fitted upon the different calibration groups, using the "Ecol" or "VIF" set of predictors, attained in "SpBlock CV." (c) Box and whiskers plots showing the Continuous Boyce Index scores that the HSMs fitted upon the different calibration groups, using the "Full" or "Thin" occurrence dataset, attained in "SpBlock CV"

F I G U R E 4 Box and whiskers plots
showing, for each combination of calibration group ("France," "Italy") * occurrence data size ("Full," "Thin"), the AUC scores that the HSMs attained when validated on spatially independent test samples drawn from the calibration region ("SpBlock CV") or on test samples drawn from the geographically disjunct populations' cluster ("External"), according to the (a) algorithm (GAM, GBM) and (b) set of predictors ("Ecol," "VIF") used to fit the model F I G U R E 5 Box and whisker plots showing, for the "France" and "Italy" calibration groups, the Continuous Boyce Index scores that the HSMs attained when validated on spatially independent test samples drawn from the calibration region ("SpBlock CV") or on test samples drawn from the geographically disjunct populations' cluster ("External"), according to the (a) occurrence data size ("Full," "Thin") and (b) set of predictors ("Ecol," "VIF") used to fit the model multivariate novelty of the "Thin" datasets compared with the "Full" ones within "SpBlock CV" appeared to be associated with higher B scores for HSMs fitted on the thinned data ( Figure S4b).
No major differences in suitability patterns predicted within the occurrence range of the French and Italian populations emerged between ensemble projections from "Ecol" (Figure 6a1,b1,c1) and "VIF" Differently, ensemble projections from the "Italy" and "France" groups fairly modeled the current distribution of the respective populations but predicted relatively low suitability (Weighted_Avg_ HS = 0.1-0.4) within the range of the geographically disjunct cluster (Figure 6a1,a2,b1,b2; Figure S5a1,a2,b1,b2).
Considering the weighted standard deviation of suitability (Weighted_StdDev_HS), ensemble projections from the "Joint" group showed lower variability than those from the "France" and "Italy" ones in suitability patterns predicted outside the range of the two populations' clusters, particularly for the thinned datasets ( Figure 7). Nonetheless, most of areas outside the French and Italian V. u. ursinii ranges predicted as highly suitable within ensemble projections from the "Joint" group still showed moderateto-high variability among predictions of the component HSMs (i.e., Weighted_StdDev_HS > 0.5) (Figure 7c1,c2; Figure S6c1,c2).

F I G U R E 8
Analysis of niche overlap between the Italian and the French Vipera ursinii ursinii populations through the "PCA-Env" approach.
(a) contributions of the input predictors to the first two principal components resulting from the PCA; (b) position of occurrence points from the Italian (brown) and the French (blue) populations, as well as of background points used to calibrate the PCA-Env (green), within the 2D environmental space defined by the two principal components; (c) density of occurrence of the Italian and French populations in the 2D environmental space, with solid contour lines representing the full environmental background and dashed contour lines representing 50% of the background environment; (d) simulated (histograms) and observed (red line) niche overlap D values compared within the niche equivalency and niche similarity (alternative hypothesis: niche divergence) tests Ensemble projections from "Ecol" and "VIF" did not evidently differ in terms of Weighted_StdDev_HS, although those from HSMs fitted on the thinned datasets for the "France" and "Italy" groups showed somewhat higher variability for "Ecol" than for "VIF" outside the range of the two populations' clusters ( Figure 7a1,a2,b1,b2).
The PCA-Env performed on the full set of occurrences of the Italian and French populations, considering all the 31 candidate predictors, resulted in 60.1% of the total variability explained by the first two principal components (PrinComp). Occurrences of the two populations' clusters partially overlapped in the upper-right quadrant of the 2D environmental space, apparently associated to the habitatrelated variables (i.e., "Grasslands" and "Rocks_SparseVeg") and to precipitation-related ones such as Bio13 (precipitation of the wettest month) and Bio16 (precipitation of wettest quarter) (Figure 8a,b). nonetheless, the niche similarity test performed considering niche divergence as alternative hypothesis indicated that the observed overlap was not lower (n = 1,000, p = 0.93) than expected based on the background environment of the two populations' clusters.
Contrarily, niche conservatism could be accepted under a "relaxed" α = 0.1 according to a second niche similarity test with conservatism as alternative hypothesis (n = 1,000, p = 0.08).
The two habitat-related variables obtained weighted average importance scores (Weighted_Avg_VarImp) higher than 5% for most of the combinations of calibration group * occurrence data size * set of predictors, but they were not comprised among the most influential predictors for any of them (Table 2). Precipitation-related variables, particularly Bio16 and Bio15 (precipitation seasonality), were the most influential predictors for the HSMs fitted upon the French occurrences using the "Ecol" set of predictors (Table 2, Figure S7a,c).
Nonetheless, inflated response curves drawn for the "France-Ecol-Thin" combination resulted in unclear predictor-suitability relationships ( Figure S7c), probably due to the high weighted standard deviation of importance score (Weighted_StdDev_VarImp) attained by Bio16 and Bio15 (Table 2). Differently, suitability predicted from the HSMs fitted for the "France" group through the "VIF" set of predictors was positively related to annual solar radiation (Srad_ Ann_Mean) and Bio13, while negatively related to variability in wind speed during V. u. ursinii activity period (Wspeed_MayOct_Sd) (  Figure S7b,d). Considering the HSMs fitted for the "Italy" group using the "Ecol" set, variability in solar radiation during the May-October period (Srad_MayOct_Sd), precipitation of the warmest quarter (Bio18) and mean temperature of warmest quarter (Bio10) attained the highest Weighted_Avg_VarImp, with different relative rankings depending on the considered occurrence data size (Table 2). Inflated curves drawn for these variables considering HSMs fitted upon the full datasets suggested negative association with Srad_MayOct_Sd and Bio10 ( Figure S8a), while the corresponding curves from the HSMs fitted upon the thinned datasets could barely retrieve a relationship between these variables and suitability ( Figure S8c). The HSMs fitted upon the Italy group using the "VIF" set isolated, for both the "Thin" and the "Full" datasets, mean temperature of the driest quarter (Bio9), annual variability of solar radiation (Srad_Ann_Sd), and Wspeed_MayOct_Sd as the most influential variables, predicting negative association between suitability and Srad_Ann_Sd, positive association with Wspeed_MayOct_Sd and unclear patterns for Bio9 (Table 2, Figure S8b,d). Finally, considering the HSMs fitted for the "Joint" group upon both the "Full" and the "Thin" datasets, Bio10 resulted as the most influential variable within the "Ecol" set, with an optimum range for suitability between 10°C and 15°C (Table 2, Figure S9a,c), while Bio9 attained the highest Weighted_Avg_VarImp within the "VIF" set, showing decreasing suitability at increasing temperatures (Table 2, Figure S9b,d).

| D ISCUSS I ON
Transferability of HSMs represents a compelling challenge when predictions to new spatiotemporal scenarios are required (Franklin, 2013;Yates et al., 2018). Our modeling framework per- Italian range (Luiselli, 2004), while the activity period may extend until October-November in the French range (Lisse et al., 2012). The known association of V. u. ursinii with typical mountainous habitats such as grasslands and rocky areas (Luiselli, 2004;Lyet et al., 2013) was retrieved for both the French and the Italian populations in the respective niche occupancy modeled through the PCA-env approach (Figure 8a,b). On the other hand, percent cover of grasslands or rocky and sparsely vegetated areas were not selected as critical predictors for the environment-occurrence relationships estimated by the HSMs. Contrarily, different sets of climate-related predictors attained the highest weighted importance scores depending on the considered combination of calibration group * occurrence data size * set of predictors ( Table 2). The HSMs fitted upon occurrences TA B L E 2 Weighted average percent importance score (Avg.), its standard deviation (SD), and the resulting coefficient of variation (Coeff. Var.) of the variables attaining Avg. ≥ 5%, for each combination of calibration group ("France," "Italy," "Joint") * occurrence data size ("Full," "Thin") * set of predictors ("Ecol," "VIF")  (Araújo et al., 2006;Guisan & Hofer, 2003), while precipitation, solar radiation, and wind speed emerge as important predictors when modeling herptiles' suitability at more local scale due to their influence on thermoregulation and other critical factors such as prey availability (Mizsei et al., 2016;Ortega et al., 2017). Moreover, the fact that the obtained HSMs primarily correlated suitability for V. u. ursinii to climatic patterns rather than to the habitat-related variables echoes previous findings suggesting that, once the important climatic drivers are included, HSMs predictions at the meso-to macroscale may not benefit from the addition of predictors representing land cover or vegetation types (Bucklin et al., 2015;. Nonetheless, the upscaling of information about the percent cover of grasslands or rocky and sparsely vegetated areas from the original 100 m resolution to that of the climate-related predictors may have hidden part of the fine-scale association between V. u. ursinii and such habitat types found in previous modeling studies conducted upon the French populations (Lyet et al., 2013). The most proximal drivers of V. u. ursinii presence and activity patterns could be better understood gaining additional information from mechanistic models relying on experimental studies targeting its ecophysiology, but this is beyond the scope of the present work.
The realized niche of the Italian V. u. ursinii populations emerged as narrower than that of the French ones, which also showed background environmental conditions extending far beyond those characterizing the Italian range ( Figure 8c)

TA B L E 2 (Continued)
and was less sensitive to multivariate novelty than GAM when HSMs were validated on spatially independent test samples from the calibration region ("SpBlock CV"). Differently, GAM outperformed GBM when HSMs were transferred on the geographically disjunct populations' cluster ("External validation"), echoing results from Duque-Lazo et al. (2016). All in all, it should be noted that discrepant rankings of the single algorithms in transferability tasks among different studies may also derive from the way algorithms were parameterized and the resulting HSMs evaluated (e.g., spatial blocking cross-validation versus repeated split-sample), as highlighted in Yates et al. (2018).
Spatial thinning of occurrence data ameliorated internal transferability for all the three calibration groups according to Continuous Boyce Index scores in "SpBlock CV", corroborating the usefulness of rarefying geographically close occurrences when sampling biases in data collection could not be a priori excluded (Boria et al., 2014). The fact that the "France" group was the one showing the worst performance in internal transferability and at the same time the one benefitting the most from spatial thinning is worth of attention. The "France" group underwent the highest proportional reduction in the number of occurrences from the "Full" dataset to the "Thin" one; contextually, environmental heterogeneity within V. u. ursinii French range was clearly more pronounced than within the Italian one ( Figure 8c): Reducing the influence of possible sampling biases as well as the novel combinations of environmental conditions in test sites, spatial thinning eased interpolation for the HSMs fitted on the French occurrences. This "environmental noise" emerging from the French datasets probably also made HSMs from the "Joint" group performing worse than those from the "Italy" one in internal transferability, as it increased the difficulty for the HSMs to project the environment-occurrence relationships fitted on the training fold to the environmentally heterogeneous test fold derived from checkerboard blocking (Roberts et al., 2017).
On the other hand, spatial thinning did not provide beneficial effects for external transferability, and the HSMs separately fitted upon the French and the Italian occurrences showed consistently lower predictive performance in "External validation" than in "SpBlock CV". This echoes results from previous studies about HSMs' internal versus external transferability (Barbosa et al., 2009;Duque-Lazo et al., 2016;Randin et al., 2006) and suggests that transferability across wide geographical extents is more related to the environmental novelty of test data than to the number and spatial arrangement of calibration data. Indeed, validation on test localities far from the calibration area increases the probability of HSMs being projected on conditions significantly different from those upon which they fitted the environment-occurrence relationships, forcing them to extrapolate (Qiao et al., 2019). Our finding that the degree of multivariate extrapolation, which was expectedly higher in "External validation" than in "SpBlock CV," was significantly related to the decrease in predictive performance considering both AUC and Continuous Boyce Index further strengthens this hypothesis.
Weighted ensemble projections derived from the HSMs fitted on the "Joint" group, considering both the "Full" and "Thin" Pres-PseudoAbs datasets, correctly predicted as suitable the patches currently occupied by V. u. ursinii within both its French and Italian range. Contrarily, ensemble projections from HSMs fitted on partial Pres-PseudoAbs datasets did not properly predict suitable areas within the geographically disjunct range not considered for model calibration. Moreover, ensemble projections from the Joint datasets showed lower variability among the component HSMs than ensemble projections separately obtained for the two populations' clusters, especially outside the occurrence areas of these latter. These trends confirm that spatial predictions of suitability benefit from calibration data covering as much as possible the range of environmental conditions the target entity experiences (Qiao et al., 2019), particularly when intraspecific niche differences at the local scale are presumed to exist (Barbosa et al., 2009;Carretero & Sillero, 2016).
In conclusion, we could summarize our findings about spatial transferability of HSMs at intraspecific level as follows: (a) In case the aim is to get predictions within the surroundings of a populations' cluster, for instance to individuate promising sites looking for unknown populations, spatial thinning of occurrence data is recom-

ACK N OWLED G M ENTS
We thank all the organisms contributing to SILENE (Système Préalpes d'Azur. We also sincerely thank all the authors of the Vipera ursinii ursinii observations provided by SILENE, that we do not list here just for conciseness.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Since Vipera ursinii represents a highly threatened species, classified as "Vulnerable" by the IUCN, we prefer not to share exact coordinates of occurrence localities upon publicly accessible repositories (Lunghi et al., 2019). However, exact coordinates of Italian and French V. ursinii ursinii occurrences gathered from Console et al. (2020), as well as the R code written to conduct all the analyses, are freely available from authors upon motivated request. The French occurrence data obtained from SILENE cannot be stored on public repositories as well, because the agreement signed with SILENE does not allow the authors to share the data, but they are freely available at the highest resolution upon request (and freely downloadable at 5 km 2 resolution) at www.silene.eu.