National assessments of species vulnerability to climate change strongly depend on selected data sources

Correlative species distribution models (SDMs) are among the most frequently used tools for conservation planning under climate and land use changes. Conservation‐focused climate change studies are often conducted on a national or local level and can use different sources of occurrence records (e.g., local databases, national biodiversity monitoring) collated at different geographic extents. However, little is known about how these restrictions in geographic space (i.e., Wallacean shortfall) can lead to restrictions in environmental space (i.e., Hutchinsonian shortfall) and accordingly affect conclusions about a species’ vulnerability to climate change.


| INTRODUCTION
Over the last decade, species distribution models (SDMs; also called ecological niche, habitat suitability and (bio)climatic envelope models, among various other names, hereafter all included under the acronym "SDM"; see Franklin, 2010; Guisan et al., 2013, 2017) have become an important tool to estimate the impact of climate and land use changes on species' potential distributions (Franklin, 2010; Guisan et al., 2017; Peterson et al., 2011).
The first SDM package (BIOCLIM) became available in 1984 (Booth et al., 2014). SDMs statistically relate environmental variables to presence-absence data (or presence-background data) to quantify the species-environment relationship and accordingly approximate the realized environmental niche (Austin et al., 1990; Guisan et al., 2017). From early approaches based on simple envelopes and bioclimatic interpolations, more advanced ways to quantify realized environmental niches and model species distributions have emerged (Booth et al., 2014; Elith et al., 2006). The quantified realized niche (i.e., the statistical model) can then be projected in geographic space and/or time to map the potential distribution of a species and predict changes affecting it. The ability of SDMs to produce spatial and temporal models based on readily available climate/environmental and species occurrence data makes them important tools for conservation planning (Guisan et al., 2013). Importantly, because they are based on field (in situ) observations and correlative statistics, SDMs are empirical in nature and, unless fitted with more mechanistic data and approaches (Catullo et al., 2015; Fordham et al., 2018) as, for example, in ex situ conditions (Booth & McMurtrie, 1988; Vetaas, 2002), do not allow estimation of the fundamental niche (Araujo & Guisan, 2006; Austin et al., 1990). Accordingly, and whatever the scale of the observation data used, most SDMs are assumed to project various estimates of the realized niche in time and space, as considered in the present study.
One crucial issue when fitting realized niches and modelling species distributions is the selection of the in situ field species occurrence data (thus not considering ex situ observations; see Booth, 2017; Booth & McMurtrie, 1988; Vetaas, 2002) used to construct the models (Anderson et al., 2020; Rondinini et al., 2006).
Preparing occurrence data for modelling can include a large number of essential processes encompassing partitioning (Fielding & Bell, 1997; Muscarella et al., 2014), thinning of occurrences (Aiello-Lammens et al., 2015; Varela et al., 2014), integrating historic records (Lima-Ribeiro et al., 2017; Nogués-Bravo et al., 2016) and selecting absences and background data (Senay et al., 2013; Wisz & Guisan, 2009). Two approaches are commonly found in the literature: (1) using databases that typically combine occurrences from disparate sources, such as natural history collections, citizen science programmes or formal monitoring programmes (e.g., Global Biodiversity Information Facility, GBIF; Anderson et al., 2020); or (2) using local datasets often collected in surveys by the study authors themselves (Gu & Swihart, 2004; Randin et al., 2010). While self-collected data typically follow some sampling design and therefore have a lower and constant bias (e.g., misidentification, omission errors, location errors), heterogeneous biodiversity databases (e.g., GBIF) usually have substantially more occurrence records from a broader geographic area but suffer from unknown or spatially varying data bias and errors (Anderson et al., 2016; Haque et al., 2017, 2020). The choice of a certain dataset is rarely justified or discussed in the literature, and very few studies have compared the conclusions drawn from models based on different sources of data (but see Hannemann et al., 2015). The question therefore remains of when to use which data source and spatial extent for SDMs aimed at national conservation planning and assessments of species vulnerability to environmental change. Data from databases, such as GBIF, may span the species' known distribution, but biases in these data (i.e., different sources, collection methods, collection years) can add significant noise to the model (Anderson et al., 2020; Graham et al., 2004).
Additionally, environmental data (e.g., climate, land use, soil) for global or continental scales are often sparse or at a lower spatial resolution than desired. Conversely, reliance on data from a localized survey may result in truncation of the species' environmental niche (Chevalier et al., 2021) or niche unfilling, as only a part of its distribution has been sampled (Wallacean shortfall; Whittaker et al., 2005). As such, data from a localized survey may not reflect the entire set of physical (or abiotic) characteristics that a given species can tolerate (Hutchinsonian shortfall; Booth, 2017; Hortal et al., 2015). This may result in large errors when interpolating and extrapolating models (i.e., projections; Booth, 2017; Booth & McMurtrie, 1988; Owens et al., 2013). Nevertheless, conservation studies are focused on a national or even local scale for a number of reasons: (a) while species do not care about political borders, funding for conservation projects and priority species often does; and (b) the desired spatial resolution and required input data to create conservation maps (e.g., land use, climate, soil) are often not available in comparable quality/resolution across countries/regions (Cayuela et al., 2009; CONABIO, 2007).
It is therefore important to differentiate between the geographic and environmental space that a species can occupy (Colwell & Rangel, 2009; Hutchinson, 1957; MacArthur, 1972). While a local dataset might only span part of the species' geographic distribution (Wallacean shortfall), it may still cover the breadth of the species' realized environmental niche (e.g., in a mountain area with wide environmental gradients). Conversely, even records from a large part of a species' geographic range might not span its full realized environmental niche (Hutchinsonian shortfall). There are indeed at least three distinct patterns when considering how sampling in geographic space captures the environmental niche: (a) the subsample in geographic space is also a subsample of the niche in environmental space; (b) the subsample in geographic space covers the total environmental space; and (c) the subsample in geographic space represents a distinct subset (e.g., ecotype or subspecies) of the environmental niche. Depending on these patterns, the outcomes of SDMs and their projections onto future scenarios might differ substantially. However, these patterns remain largely understudied, raising questions as to what portion of niches is typically sampled in species' geographic surveys. The limited availability of information about the geographic distribution (Wallacean shortfall) and environmental tolerances (Hutchinsonian shortfall) might be especially relevant for rare/under-sampled species, habitat specialists, or narrow-ranged and non-charismatic species (Albert et al., 2010; Guisan et al., 2006), and strongly dependent on the spatial extent and source of the species occurrence data.
Here, we illustrate how both Wallacean and Hutchinsonian shortfalls might affect national conservation planning by developing niche-based SDMs for three Mexican tree species associated with mountain cloud forests, based on calibration datasets with varying spatial extents (i.e., explicit creation of a Wallacean shortfall) and sampling designs. These datasets reflect, for each species, different geographic scales (i.e., spatial extent) based on their:

| Species data and study area
In this study, we selected three tree species, with contrasting natural distributions, that build dominant stands at different elevations in the Trans-Mexican Volcanic Belt: Alnus acuminata Kunth, Quercus xalapensis Bonpl. and Liquidambar styraciflua L. (Table S1). Alnus acuminata grows between 500 and 2,800 m a.s.l. in the mountain ranges in tropical Central and South America, from Mexico to northern Argentina (CATIE, 1995). Liquidambar styraciflua has a lower altitudinal range between 400 and 1,800 m a.s.l. and is native to warm temperate areas of eastern North America and tropical montane regions of Mexico and Central America (Rzedowski, 2006).
Quercus xalapensis is distributed between 400 and 2,700 m a.s.l., is native to Central America and Mexico and is threatened by habitat loss in parts of its native range (Jerome, 2018). These species are characteristic trees of cloud forests, growing at middle elevations in various mountainous areas where the climate is humid and temperate (Rzedowski, 1996, 2006). Cloud forests are considered the most threatened terrestrial ecosystem at a national level because of changes in land use, the effects of global climate change, and local and regional environmental changes (CONABIO, 2010, 2014). These species are vulnerable to climate change, particularly to changes in precipitation, with L. styraciflua being drought-sensitive (Esperón-Rodríguez & Barradas, 2015a, 2015b). Additionally, these species show very different patterns of Hutchinsonian shortfall or niche unfilling (sensu Guisan et al., 2014; Petitpierre et al., 2012) based on occurrence datasets with different geographic extents.
For each species, we collected three datasets with different extents in geographic space: (a) the entire geographic range ("global," i.e., the Americas); (b) the national distribution ("national," i.e., Mexico); and (c) the local distribution based on sampling associated with cloud forest ecosystems of eastern Mexico ("local," i.e., cloud forests of eastern Mexico). The global dataset was created by downloading all occurrence records of the target species from GBIF (GBIF.org 14 January 2020, GBIF Occurrence Download https://doi.org/10.15468/dl.g2yss3) and then cleaning the data by removing records with no geographic coordinates, a coordinate uncertainty greater than 1,000 m, incorrect or duplicate coordinates, or an observation date before 1950. We therefore only kept records with no known coordinate issues and for which the basis of observation in GBIF was reported as "human observation," "observation," "specimen," "living specimen," "literature occurrence" or "material sample." All records outside of the Americas were removed as these represent non-native locations often associated with (botanical) gardens and parks (e.g., Vetaas, 2002) or other urban areas (Booth, 2017). These highly human-modified environments (e.g., watering or shelter from extreme climate) rather represent the fundamental niche and are often not well reflected by global climate data, and would therefore introduce other niche dimensions or bias to the model. Figure S1).
To limit the effects of spatial autocorrelation and sampling bias, we disaggregated the occurrence records in all datasets by removing occurrences closer than 5 km from each other using the R-package spThin (Aiello-Lammens et al., 2015). The final numbers of occurrences per species and dataset are given in Table 1.
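The distance-based thinning described above can be illustrated with a minimal sketch. It assumes a simple greedy rule (keep a record only if it lies at least 5 km from every record already kept) and is not the randomized algorithm implemented in spThin; the function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def thin_occurrences(points, min_dist_km=5.0):
    """Greedy thinning: keep a point only if it is at least min_dist_km
    away from every point that has already been kept.
    Illustrative only; the result depends on the input order."""
    kept = []
    for p in points:
        if all(haversine_km(p, q) >= min_dist_km for q in kept):
            kept.append(p)
    return kept


# Hypothetical occurrence records (lat, lon) in eastern Mexico
records = [(19.50, -97.00), (19.51, -97.00), (19.60, -97.00)]
thinned = thin_occurrences(records, min_dist_km=5.0)
```

Because a greedy pass depends on record order, repeated runs on shuffled input (keeping the largest retained set) would be closer in spirit to a randomized thinning procedure.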

| Climate data
For both baseline and future climate data, we downloaded 19 bioclimatic variables at a spatial resolution of 30 arc-second (≈1 km at the equator) from CHELSA (Climatologies at high resolution for the earth's land surface areas; Karger et al., 2017). Baseline data include monthly mean temperature and precipitation patterns for the period 1979-2013. For future climate data, we downloaded CMIP5 clima- To assess whether our results were mostly related to the effects of the estimation of the climatic niche (i.e., Hutchinsonian shortfall) rather than problems with extrapolation into non-analogue climates (Fitzpatrick & Hargrove, 2009), we also conducted a MESS analysis

| Niche comparison
The climatic niches of the three species were estimated based on the six aforementioned bioclimatic variables. To compare the climatic niches across the different datasets, we calculated Schoener's D (Barbosa, 2015; Schoener, 1968), which expresses the proportion of niche overlap based on the first two PCA axes, using the <ecospat.niche.overlap> function from the ecospat R package (Broenniman et al., 2017; Di Cola et al., 2017) in R (v3.5.2; R Core Team, 2020).
This index ranges from 0 (no overlap) to 1 (complete overlap). To test whether the estimates of the climatic niches significantly differ among datasets (i.e., niche divergence), we used the function <ecospat.niche.equivalency> (Broenniman et al., 2017). This test compares the observed niche overlap with the expectation based on null models (i.e., random permutations; Broennimann et al., 2012; see also Warren et al., 2008).
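As an illustration of the overlap index, Schoener's D between two occupancy densities p1 and p2 defined on the same grid can be written as D = 1 - 0.5 * sum(|p1 - p2|). The sketch below assumes plain histogram binning of points on two PCA axes; ecospat instead uses kernel-smoothed densities, so values will differ from those of <ecospat.niche.overlap>. All names and example points are hypothetical.

```python
def grid_density(points, bounds, n_bins=10):
    """Bin (x, y) points into an n_bins x n_bins grid and normalize
    the counts so that they sum to 1 (a crude occupancy density)."""
    if not points:
        raise ValueError("at least one point is required")
    (xmin, xmax), (ymin, ymax) = bounds
    counts = [[0.0] * n_bins for _ in range(n_bins)]
    for x, y in points:
        # Clamp indices so points on (or beyond) the edges fall in border cells
        i = min(max(int((x - xmin) / (xmax - xmin) * n_bins), 0), n_bins - 1)
        j = min(max(int((y - ymin) / (ymax - ymin) * n_bins), 0), n_bins - 1)
        counts[i][j] += 1.0
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def schoener_d(p1, p2):
    """Schoener's D: 0 means no overlap, 1 means identical densities."""
    diff = sum(abs(a - b) for r1, r2 in zip(p1, p2) for a, b in zip(r1, r2))
    return 1.0 - 0.5 * diff


# Hypothetical PCA scores for two datasets of the same species
bounds = ((0.0, 10.0), (0.0, 10.0))
global_pts = [(0.5, 0.5), (9.5, 9.5)]
local_pts = [(0.5, 0.5), (0.5, 0.5)]
d_gl = schoener_d(grid_density(global_pts, bounds),
                  grid_density(local_pts, bounds))
```

In this toy case the local points occupy only one of the two cells used by the global points, so half of the global density is unmatched.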

| Niche-based species distribution models
For our SDMs, we used the same six bioclimatic predictors described above. For each species and calibration dataset, we ran SDMs with the biomod2 R package (Thuiller, 2003;Thuiller et al., 2019) using

| Comparisons of predictions
We avoided the creation of binary maps by thresholding our habitat suitability predictions, as this process always leads to the loss of information and introduces a bias, especially in cases where no reliable absence data are available (Liu et al., 2013).
However, to be able to (statistically) compare the area predicted suitable among projections, we defined an area as "suitable" if the predicted suitability was ≥0.5 (which in our case was almost identical to choosing a threshold based on AUC or maximizing TSS). This very simple definition allowed a statistical comparison of predictions across time (current, 2050, 2070) by analysing the areas that remained "suitable" (stable), became unsuitable (loss) or became newly suitable for the species (gain). In addition to the simple threshold of 0.5, we tested two further thresholding approaches based on the predicted suitability either including 95% of the training occurrences ("wide niche") or only including the 20% of training occurrences with the highest predicted suitability ("core niche").
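The stable/gain/loss bookkeeping and the two occurrence-based thresholds can be sketched as follows. This is a minimal illustration under stated assumptions: cells are given as paired lists of current and future suitability, and the "wide niche" and "core niche" thresholds are taken as the lowest suitability among the top 95% and top 20% of training occurrences, respectively. Function names and values are hypothetical.

```python
import math


def classify_change(current, future, threshold=0.5):
    """Count cells that stay suitable (stable), become suitable (gain),
    become unsuitable (loss) or are unsuitable in both periods."""
    counts = {"stable": 0, "gain": 0, "loss": 0, "unsuitable": 0}
    for c, f in zip(current, future):
        now, later = c >= threshold, f >= threshold
        if now and later:
            counts["stable"] += 1
        elif later:
            counts["gain"] += 1
        elif now:
            counts["loss"] += 1
        else:
            counts["unsuitable"] += 1
    return counts


def occurrence_threshold(train_suitability, fraction_included):
    """Lowest suitability such that the given fraction of training
    occurrences (ranked from highest to lowest) lies at or above it."""
    ranked = sorted(train_suitability, reverse=True)
    k = max(1, math.ceil(fraction_included * len(ranked)))
    return ranked[k - 1]


# Hypothetical suitability values for four grid cells
change = classify_change([0.8, 0.6, 0.2, 0.1], [0.7, 0.3, 0.9, 0.2])

# Hypothetical predicted suitability at ten training occurrences
train = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
wide = occurrence_threshold(train, 0.95)   # "wide niche"
core = occurrence_threshold(train, 0.20)   # "core niche"
```

A "wide niche" threshold close to 0 marks almost every cell as suitable, which is worth checking before comparing areas across models.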

| Species distributions and climatic niches
The effects of the different calibration datasets (global, national, local) on the estimation of the climatic niche differed strongly among species (Figure 1). For A. acuminata, the subsamples in geographic space (Wallacean shortfall) also reflected subsamples in environmental space (i.e., resulting in a Hutchinsonian shortfall).
Alnus acuminata is found not only in the study region ( In contrast to the other two species, the geographic distribution of L. styraciflua was split into a North American and a Central American range, resulting in a likewise split in its climatic niche ( Figure 1). Therefore, while the local and national occurrences underestimated the actual (total) niche, we found that the omitted proportion (i.e., the North American occurrences) covered conditions not found in Mexico under current and future climate (Figure 1).

| SDM predictions
Our SDMs performed well when evaluated based on cross-validation, with most values of AUC >0.9 (Figure S2) and maxTSS >0.7 (Figure S3). There was no consistent difference in model performance among the three spatial datasets (global, national and local). Conversely, the relative importance of the six climatic predictors varied strongly across the three spatial datasets and species with no clear trends (Figure S4).
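For readers less familiar with the two evaluation metrics, the sketch below shows how AUC and maxTSS can be computed from predicted suitability at presence and absence (or background) points. It assumes AUC as the probability that a presence outranks an absence, and TSS = sensitivity + specificity - 1 maximized over candidate thresholds; it is not the biomod2 implementation, and the example scores are hypothetical.

```python
def auc(presence_scores, absence_scores):
    """AUC as the probability that a random presence scores higher
    than a random absence (ties count as half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


def max_tss(presence_scores, absence_scores):
    """Return (best TSS, threshold), where
    TSS = sensitivity + specificity - 1 and a site is predicted
    present when its score is >= threshold."""
    best_tss, best_thr = -1.0, None
    for thr in sorted(set(presence_scores) | set(absence_scores)):
        sens = sum(p >= thr for p in presence_scores) / len(presence_scores)
        spec = sum(a < thr for a in absence_scores) / len(absence_scores)
        tss = sens + spec - 1.0
        if tss > best_tss:
            best_tss, best_thr = tss, thr
    return best_tss, best_thr


# Hypothetical held-out predictions
pres = [0.9, 0.8, 0.6]
absn = [0.7, 0.3, 0.2, 0.1]
auc_value = auc(pres, absn)
tss_value, tss_threshold = max_tss(pres, absn)
```

Both metrics are computed only against the evaluation data supplied, so a model fitted to a truncated niche can still score well when it is evaluated within the same restricted extent.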
The global SDMs usually predicted the largest amount of suitable climate within Mexico (Figure 2). However, the difference in suitable habitat between global, national and local models varied substantially across species (Figure 2), although all models for all species predicted a higher loss than gain in suitable habitat under the future climate scenarios (Figure 2). It is important to keep in mind that we restricted our projections to Mexico, and differences in the area projected as suitable might vary much more across models in areas outside of Mexico.
In the case of A. acuminata (Figure 3 for RCP 8.5; see Appendix S2 for specific GCM and other RCP and years), there was a significant difference in suitable habitat between the global, national and local model (suitability threshold = 0.5; p < .05, pairwise Wilcoxon;  (Figure 2; Appendix S1). Using a suitability threshold for the "wide niche" (i.e., encompassing 95% of the training occurrences) showed identical patterns across datasets (global, national and local) and future climate change but showed overall more suitable habitat probably due to the much lower suitability threshold (Appendix S1).
Choosing a suitability threshold focusing on the "core niche" (i.e., threshold based on the 20% of training occurrences with the highest predicted suitability) resulted in smaller differences among models (global, national and local) than the other thresholds, and showed that almost all highly suitable habitat will be lost by 2070 (Appendix S1).

FIGURE 1
Distribution of recorded occurrences of Alnus acuminata, Liquidambar styraciflua and Quercus xalapensis in geographic space (top row) and environmental space (bottom row). Green points/areas are based on the global occurrence dataset (Americas), blue on the national dataset (Mexico) and red on the local dataset (eastern Mexico). The environmental space is represented by the two main PCA axes based on six climatic variables used for SDM. The total available climate of Mexico (national) and the Americas (global) is represented by a green and pink contour plot (100th and 90th percentiles), respectively. The numbers in the top-right corner represent Schoener's overlap (D) for G-N = global and national, G-L = global and local, and N-L = national and local. The asterisk indicates significant differences (p < .05 in D). PCA axis 1 and PCA axis 2 explain 47% and 25% of the variation, respectively. BIO4 = temperature seasonality; BIO5 = max temperature of warmest month; BIO12 = annual precipitation; BIO15 = precipitation seasonality; BIO17 = precipitation of driest quarter; and BIO18 = precipitation of warmest quarter (for details, see Figure S6).
In the case of L. styraciflua (Figure 4; Appendix S2), the prediction of the current suitable habitat was unaffected by the spatial extent used for model calibration (i.e., Wallacean shortfall; suitability threshold = 0.5; p > .05; Figure 2). In general, all models predicted a slight loss of suitable habitat under future conditions (significance varies among climate change scenarios; Figure 2; Appendix S1).
Similar to A. acuminata, the local model predicted a higher turnover of suitable habitat with less stable and more gain/loss (Figure 2; Appendix S1). Selecting different suitability thresholds focusing either on the "core niche" or on the "wide niche" revealed very similar patterns except for the local model, which created a model artefact predicting almost all of Mexico as suitable habitat as the "wide niche" threshold was close to 0 (Appendix S1).
Similar to L. styraciflua, the estimation of the currently suitable habitat for Q. xalapensis (Figure 5; Appendix S2) was unaffected by the spatial extent of the calibration data (Figure 2). Quercus xalapensis showed the largest amount of stable suitable habitat under climate warming and the predicted changes were usually very consistent among the three spatial extents and independent of the suitability threshold (suitability threshold = 0.5; Figure 2; see other suitability thresholds in Appendix S1).
Model transferability among different datasets (global, national and local) was strongly linked to the Hutchinsonian shortfalls (i.e., estimation of realized climatic niche). In the case of A. acuminata, models showed a weaker performance when evaluated with a dataset from larger geographic extents (i.e., local to national/global and national to global; Figure S5). This is directly linked to the Hutchinsonian shortfalls at the local and national scales leading to high omission errors at the global scale. The global model showed high transferability to national and local scales but tended to overpredict the occurrences.
For L. styraciflua, both the local and national models had poor predictive power at the global scale (Figure S5) as these models were unable to predict the occurrences found in North America occupying a completely disjunct niche/climate space (Figure 1). Most importantly, for Q. xalapensis, which showed no Hutchinsonian shortfall across different datasets, model transferability in all directions was very high (Figure S5).
The MESS analysis revealed minimal projections of non-analogue climates in Mexico for both the global and national dataset (Figure 6; see Appendix S3 for analysis for individual GCM and RCPs). In contrast, the local dataset showed a large amount of non-analogue climate for both the current and future conditions (Figure 6; Appendix S3).
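To make the notion of non-analogue climate concrete, the following minimal sketch computes a multivariate environmental similarity (MESS) score for a single location, assuming the commonly used formulation in which each variable is scored against the calibration (reference) values and the lowest per-variable score is retained. Negative scores indicate conditions outside the calibration range. The function name, variable names and values are hypothetical, and each reference variable is assumed to vary (max > min).

```python
def mess_score(point, reference):
    """MESS for one location.
    point: dict mapping variable name -> value at the location.
    reference: dict mapping variable name -> list of calibration values.
    Returns the minimum per-variable similarity; values below 0
    flag non-analogue conditions for at least one variable."""
    scores = []
    for var, p in point.items():
        ref = reference[var]
        lo, hi = min(ref), max(ref)
        # Percentage of reference values strictly below the location's value
        f = 100.0 * sum(v < p for v in ref) / len(ref)
        if f == 0:
            s = (p - lo) / (hi - lo) * 100.0
        elif f <= 50:
            s = 2.0 * f
        elif f < 100:
            s = 2.0 * (100.0 - f)
        else:
            s = (hi - p) / (hi - lo) * 100.0
        scores.append(s)
    return min(scores)


# Hypothetical calibration climate (e.g., from a local dataset)
reference = {"bio5": [10, 20, 30, 40], "bio12": [500, 1000, 1500, 2000]}

inside = mess_score({"bio5": 25, "bio12": 1200}, reference)   # analogue
outside = mess_score({"bio5": 50, "bio12": 1200}, reference)  # non-analogue
```

Here the second location is warmer than any calibration record, so its score is negative even though its precipitation is well within the sampled range.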

FIGURE 2
Current (first column) and future predicted suitable area (second column) under climate change (average of three GCMs for RCP 8.5) for the year 2070 based on a suitability threshold of 0.5 for three tree species. Columns 3 to 5 show the relative change in the suitable area divided into stable, gain and loss, respectively. Black letters above the boxplots indicate significant differences (p < .05, paired Wilcoxon rank sum test) among the three spatial extent datasets (global, national and local). Red letters indicate significant differences (p < .05) between the current and future suitable area for each dataset

| DISCUSSION
Our study demonstrates that occurrence datasets from alternate sources covering different geographic extents (i.e., varying Wallacean shortfalls) can result in different estimations of species' environmental niches (i.e., Hutchinsonian shortfalls). Consequently, models calibrated with different datasets (in our case global, national and local) can lead to divergent spatial predictions under current and future climatic conditions. Inconsistencies among models calibrated on varying spatial extents, however, were species-specific and driven by a Hutchinsonian rather than a Wallacean shortfall. While our three study species show similar levels of Wallacean shortfalls across the global, national and local datasets, the magnitude of the Hutchinsonian shortfall differed significantly across species. Unsurprisingly, the larger the Hutchinsonian shortfall (i.e., niche truncation: the underestimation of the species' realized environmental niche), the more divergent the SDM predictions (as these are based on the Hutchinsonian niche) and the lower the model transferability. Furthermore, models with different degrees of Wallacean shortfalls showed similar predictive performance as evaluated by the most commonly used metrics (AUC and TSS), making it challenging to determine which predictions are the most accurate and reliable for a national assessment of species' vulnerability. This means that models based on truncated niches can fit as well as, or even better than, models based on entire niches.

| Geographic and environmental space
Our study design was centred on the nesting of datasets in geographic space (i.e., global, national and local), thus creating different levels of Wallacean shortfalls. We then demonstrated how these shortfalls can translate into very different configurations in environmental (Figure 1) and geographic space under baseline and future climates. Alnus acuminata showed a strong nesting both in geographic and in environmental space, indicating that, for this species, the Wallacean shortfall directly translated into a Hutchinsonian shortfall: the environmental niche quantified by the local dataset was almost entirely contained within that quantified by the national dataset, the latter being within that of the global dataset. As a result, predictions based on the local or national dataset underestimated the global realized climatic niche and, consequently, predicted substantially smaller areas of suitable habitat in Mexico. This is an example of niche truncation, and our observed effects are comparable to previous studies on this phenomenon in European trees (Hannemann et al., 2015; Thuiller et al., 2004). In addition, the local model for A. acuminata predicted some gains in suitable areas under future conditions, while the national and global models mostly predicted a loss of suitable areas. Hence, national conservation decisions/actions regarding the vulnerability of this species to climate change might be highly affected by the choice of initial data for model calibration.
However, it is important to highlight that the gain in suitable areas under the local model only represents a subset of the suitable climate of the national and global models, which overall predicted broader suitable areas under current and future conditions. The large difference in habitat predicted as suitable by different datasets might also hint at the specialization of ecotypes across the distribution of A. acuminata. A large proportion of the area predicted as suitable by the global model may require the migration/introduction of ecotypes from North and South America, as these habitats seem currently unoccupied by the "Mexican" ecotype, which is adapted to warmer and drier conditions (CATIE, 1995). Our study indicates that a Wallacean shortfall alone might not directly affect national assessments but can potentially lead to a Hutchinsonian shortfall. Therefore, if a local or national dataset covers the species' entire realized climatic niche, it may be preferable for internal consistency to use these data (e.g., all data collected in a similar manner or by the same source, as such data may be of higher quality). Additionally, at the national or even local level, environmental data (e.g., climate, land use, soil) are often available at both finer spatial resolution and more consistent quality than at global or continental scales. However, when predictions are made into different areas (here using our local model to predict to the whole of Mexico) or into different environmental conditions (here, future climate change), issues of model transferability arise (Randin et al., 2006; Yates et al., 2018), and models calibrated with data representing only a part of the species' range may have limitations (Barbet-Massin et al., 2010; Chevalier et al., 2021; Thuiller et al., 2004).

FIGURE 3
Influence of climate change (average of three GCMs for RCP 8.5) on habitat suitability for Alnus acuminata depending on the spatial extent of the calibration datasets (global, national and local). Habitat suitability represents average values across all background datasets and modelling techniques (i.e., GLM, RF, BRT and MAXENT).

FIGURE 5
Influence of climate change (average of three GCMs for RCP 8.5) on habitat suitability for Quercus xalapensis depending on the spatial extent of the calibration datasets (global, national and local). Habitat suitability represents average values across all background datasets and modelling techniques (i.e., GLM, RF, BRT and MAXENT).

Different geographic regions/time periods often contain new combinations of environmental conditions not observed in the calibration data (i.e., non-analogue climate; Williams & Jackson, 2007), and predictions to these data may be unreliable due to extrapolation errors (i.e., problems with response curves; Fitzpatrick & Hargrove, 2009; Owens et al., 2013) and related transferability problems. The amount of error and uncertainty introduced by extrapolation is dependent on the amount of new conditions and the modelling algorithms used (Qiao et al., 2019). This is illustrated in our study by using the local model to predict (i.e., extrapolate) to the national level (Mexico). The MESS analysis highlights a large amount of non-analogue climate, making the model inappropriate for these parts of Mexico. Issues with non-analogue climate are sometimes unavoidable at global scales (i.e., globally new climates due to climate change). However, at local or national scales, non-analogue conditions rarely reflect new conditions, as corresponding conditions could be found elsewhere on Earth. In such cases, extrapolation uncertainty could be avoided by expanding the dataset to include conditions reflecting the non-analogue climate of the extrapolation area (Broennimann & Guisan, 2008). We stress this is true even if such conditions are outside of the species' climatic niche, as these records still contain valuable information about the available climate space (i.e., background data) and potential absences.
In our study, the problems with the non-analogue climate could be avoided by selecting background data across the whole territory of Mexico (i.e., the planned prediction area/environment), rather than restricting the sampling to the local area.
Given potential problems with niche truncation and non-analogue conditions in spatially restricted datasets, why not always use the global distribution of a species, even when modelling at national or local scales? First, there may be problems of data heterogeneity and quality, as the entries in global databases often stem from disparate sources and temporal periods (e.g., data from GBIF).
Second, when working with widespread species, the global distribution might include multiple ecotypes, provenances or subspecies not present in the target area (i.e., local or national area desired for projection; Randin et al., 2006; Wright et al., 2006). Including different ecotypes (or all populations at a larger spatial scale; Trivedi et al., 2008) in the estimation of the climatic niche and SDMs might lead to an overestimation of the potential distribution of the species, as ecotypes/subspecies may have diverging niches and be locally adapted (e.g., due to dispersal limitations; Hu et al., 2017; Trivedi et al., 2008). This might be the case of L. styraciflua, which has disjunct populations in geographic (North America vs. Central America) and environmental space. The use of global data for L. styraciflua might therefore overestimate this species' ability to tolerate climate change by not considering dispersal restrictions and spatial segregation of the two distinct ecotypes.

FIGURE 6
MESS analysis for non-analogue climate (red colour) under climate change (average of three GCMs and RCP 8.5) based on the spatial extent (global, national and local) used for SDM calibration data (occurrences + background data).

Also, in the case of A. acuminata, the potential species distribution in Mexico under current and future conditions might be overestimated based on the global dataset, as these data are based on the assumption of unlimited dispersal of all ecotypes throughout the species' entire range. The limited pool of available ecotypes within a region might therefore increase local extinction risk, especially in species with very wide geographic distributions, and poses the question of whether assisted migration of certain provenances might be a solution to mitigate some climate change effects (Thiel et al., 2012).
Furthermore, biotic interactions among species might be altered under future climate change, and consequently, the estimations of the current realized environmental niche (i.e., under current biotic interactions) are no longer valid (i.e., changes in the species-environment equilibrium). In such cases, it might be useful to approximate the fundamental rather than the realized environmental niche by including occurrences from non-native ranges (i.e., different biotic interactions) or even ecophysiological data from controlled experiments. Unfortunately, such data are often hard to integrate into SDMs as ecophysiological experiments are difficult to translate into global climate data (e.g., soil matric potential vs. 30-year average of evapotranspiration at km resolution) and non-native occurrences often stem from strongly human-modified environments (e.g., watering), such as gardens and parks.
Global data and models, however, can still be used in combination with local, regional or national models in a hierarchical way to account for the complete species climatic niche. This approach may pose technical challenges for studies modelling local species distributions at very high spatial resolutions. At such resolutions, the data needed for model calibration (i.e., climate or land cover layers) may not be available, and the use of (sometimes complex) hierarchical modelling approaches to account for varying spatial resolutions in the calibration data can pose computational limitations (Chevalier et al., 2021).

| CONCLUSION
Conservation planning and associated assessments of species vulnerability to climate change can be conducted at a national level and based on corresponding data sources (e.g., national biodiversity monitoring programmes, meteorological data, habitat mapping).
Depending on the Wallacean and Hutchinsonian shortfalls of these national datasets compared with the complete distribution of a target species, the resulting suitability maps and estimations of vulnerability to climate change might be misleading. Artefacts of niche truncation (as a result of Hutchinsonian shortfall) and problems of non-analogue climates might lead to an underestimation of species climatic tolerances and adaptability and, consequently, to an overestimation of spatial turnover and extinction risk.
Our study cannot give a definite answer to which dataset provides the most accurate future predictions for Mexico or any chosen geographic extent (only time will tell), partly because each country or region has its own geographic and environmental specificities; nevertheless, here we provide insights to resolve this conundrum.
Given that species and environmental data from varied sources are becoming increasingly accessible, we suggest that local or national datasets be checked for potential problems posed by the Hutchinsonian shortfall and non-analogue climate, and for how these problems relate to the Wallacean shortfall, using niche equivalency tests and MESS, among other tools. Such a check can minimize the effects of niche truncation and extrapolation uncertainties due to "locally novel" non-analogue climate conditions, thereby minimizing or at least quantifying the uncertainty in estimations of a species' vulnerability to environmental/climate change. Finally, in cases where local or national datasets cover the species' entire realized environmental niche (i.e., no Hutchinsonian shortfall), we recommend using these datasets over a global one to increase data quality. We also suggest exploring model calibration with local or national datasets to evaluate species with broad distributions and different ecotypes, as a preliminary step before model calibration with a global dataset.
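As an illustration of the kind of MESS check suggested here, the following minimal Python sketch computes a multivariate environmental similarity score for a set of projection sites relative to a calibration (reference) dataset, following the commonly used definition of MESS. It is not the implementation used in this study; the function name `mess`, the two-variable layout and the toy values are hypothetical, and the sketch assumes that every variable varies within the reference data (non-zero range).

```python
import numpy as np


def mess(reference, projection):
    """Multivariate environmental similarity (MESS) score per projection site.

    reference:  array (n_ref, n_vars), environmental values of the calibration data
    projection: array (n_proj, n_vars), environmental values of the sites to assess
    Negative scores flag sites where at least one variable lies outside the
    range covered by the reference data (non-analogue conditions).
    Assumption: each variable has a non-zero range in the reference data.
    """
    reference = np.asarray(reference, dtype=float)
    projection = np.asarray(projection, dtype=float)

    ref_min = reference.min(axis=0)
    ref_max = reference.max(axis=0)
    ref_range = ref_max - ref_min

    # Percentage of reference values strictly smaller than each projection value
    f = 100.0 * (reference[None, :, :] < projection[:, None, :]).mean(axis=1)

    # Per-variable similarity, using the four cases of the MESS definition
    similarity = np.where(
        f == 0,
        100.0 * (projection - ref_min) / ref_range,       # below reference range
        np.where(
            f <= 50,
            2.0 * f,                                       # lower half of range
            np.where(
                f < 100,
                2.0 * (100.0 - f),                         # upper half of range
                100.0 * (ref_max - projection) / ref_range,  # above reference range
            ),
        ),
    )
    # MESS is the similarity of the most dissimilar variable
    return similarity.min(axis=1)


# Hypothetical toy data: columns are two climate variables
calibration = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
sites = np.array([[5.0, 20.0], [15.0, 20.0], [-2.0, 20.0]])
scores = mess(calibration, sites)
print(scores)
```

In this toy example, the first site lies within the calibration range and receives a positive score, whereas the second and third sites fall outside the range of the first variable and receive negative scores, which is how non-analogue conditions would be flagged before interpreting projections.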

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13275.

DATA AVAILABILITY STATEMENT
Data on species occurrences for the global and national datasets were downloaded from GBIF (GBIF.org, 14 January 2020, GBIF Occurrence Download https://doi.org/10.15468/dl.g2yss3), and the cleaned and thinned data used for modelling can be found on Dryad together with the local dataset (https://doi.org/10.5061/dryad.qnk98sfg5). All climate data used in the study are freely available on CHELSA (Climatologies at high resolution for the earth's land surface areas; https://chelsa-climate.org/).