Species distributions models may predict accurately future distributions but poorly how distributions change: A critical perspective on model validation

Species distribution models (SDMs) are widely used to predict how species distributions may change in response to climate change. To assess the reliability of those predictions, they need to be critically validated with respect to what they are used for. While ecologists are typically interested in how and where distributions will change, we argue that SDMs have seldom been evaluated in terms of their capacity to predict such change. Instead, typical retrospective validation methods estimate a model's ability to predict only a single static point in time in the future. Here, we apply two validation methods, one that predicts and evaluates a static pattern and another that measures change, and compare their estimates of predictive performance.


| INTRODUCTION
Understanding how species will respond to climate change is one of the current key challenges in ecology and nature conservation (Malhi et al., 2020; Sofaer et al., 2018). To gain such understanding, various types of species distribution models (hereafter SDMs) are widely used to predict where species occur now and to project how species distributions are likely to change in response to predicted climate change (Huntley et al., 2007; Leathwick et al., 1996; Massimino et al., 2017). Thus, SDMs are important tools for ecologists and can have large implications for management and conservation (Kujala et al., 2013). SDMs can, for example, help to identify conservation needs, to define alternative conservation actions and to evaluate the effects of such actions (Guisan et al., 2013).
The utility of predictions clearly depends on how much they can be trusted, and to estimate reliability the predictions need to be critically validated with respect to what they are used for. In this paper, we seek to simulate a situation where an SDM is intended to be used in projecting community changes to a future time period approximately 20 years from the current moment. As Vaughan and Ormerod (2005) claim, the ideal way to measure a model's generalizability is to test the model by incorporating situations that resemble its future applications as closely as possible. We take as given the idea that if a model is intended to predict, for example, 20 years into the future, then its validation should reflect this by using data from the past to make predictions into the present (a 'future' time period in relation to the model fitting data), which lies a similar length of time ahead. This means ignoring information from the latest part of the data for model fitting, but it simulates the real-life data gap that an ecologist making projections into a more distant future always faces.
Here, we argue that while ecologists are typically interested in how species distributions will change in a changing climate, SDMs have seldom been critically evaluated in terms of their capacity to predict such change. This has led to a situation where our ability to make reliable range change forecasts may often be overestimated.
Even though the benefits of using temporally independent data for testing an SDM's predictive performance were documented decades ago (Araújo et al., 2005), a typical SDM used for forecasting is still not validated with such data (Uribe-Rivera et al., 2022). Typically, when temporally independent data exist and an attempt has been made to validate the model by retrospective forecasting, this has been done by splitting the dataset into two temporal blocks: the earliest records form a training data block and the latest a test data block.
There might or might not be a temporal gap between the training and testing datasets. Depending on the length of the gap, the test data can be seen as temporally independent data with respect to the training data. The model is then calibrated with training data and predictions are made to one static point in time (either to data from 1 year only or to a mean of multiple years). Lastly, these predictions are compared with observations ('static validation method' in Figure 1), and the predictive performance of the model is considered high if the predictions match the observations. Several authors have used temporally independent static validation, for example on butterflies (Eskildsen et al., 2013), plants (Dobrowski et al., 2011) and birds (Araújo et al., 2005; Regos et al., 2019).
However, we argue that if the aim was to evaluate how well a model can predict changes in the patterns of occupancy or abundance of the species, then using a static validation method that only considers the spatial distribution of occupancy or abundance (predicts and validates a single static point in time) is not adequate. While the static method is able to evaluate how well the model predicts a future distribution on a general level, it does not evaluate whether or not the model is able to predict if and how the species distribution changes. This is because the species ranges may remain mostly stable and thus there can be a high correlation between locations where the species has occurred in the past and where it will occur in the near future (Rapacciuolo et al., 2014).
So even if the model fails to predict change events, which typically happen at the range margins, it may still predict a major part of the unchanged distribution correctly and thus achieve an overoptimistic evaluation of predictive performance when only a static validation method is applied (Rapacciuolo et al., 2012; Sofaer et al., 2018).

Results: Even though the static validation method evaluated predictive performance as good, the change method indicated very poor performance. Predictive performance was not strongly related to any trait.
Main Conclusions: The static validation method might overestimate predictive performance by not revealing the model's inability to predict change events. If species' distributions remain mostly stable, then even an unfit model can predict the near future well due to temporal autocorrelation. We urge caution when working with forecasts of changes in spatial patterns of species occupancy or abundance, even for SDMs that are based on time series datasets, unless they are critically validated for forecasting such change.

KEYWORDS
birds, climate change, Fennoscandia, forecasting, land use, model validation, prediction, species distribution modelling, species traits, temporal transferability
As a solution on how to effectively test the performance of SDMs intended for predicting biodiversity changes in future, we propose that the model validation method should be explicitly based on measuring and validating a change over a given time period. By 'change', we mean the difference in a population measure between two points in time, that is increase, decrease or stability in occurrence probability or abundance. Ideally, the time period over which a change is calculated should be independent in relation to the model training data ('change validation method' in Figure 1).
We base our change validation method on the work of Rapacciuolo et al. (2014), in which the authors developed a temporal validation plot to visualize and measure the agreement between predicted and observed changes and applied it to two bird species from Great Britain and a virtual species. The temporal validation plots solved the issue of generally static distributions by focussing the assessment of model performance only on locations where range change events (a grid cell either gaining or losing occupancy) over time were either observed or predicted. By focussing on change events only, the authors were able to reveal aspects of the relationship between species' range change and model variables that might not have been identified through range-wide measures (which also include grid cells with stable presences or absences).
While the need to validate the predicted change rather than the predicted future may sound self-evident, to our knowledge our work is one of only a few papers to apply such an approach in the context of climate change predictions (Johnston et al., 2013; Rapacciuolo et al., 2014). A greater proportion of studies have investigated models' ability to predict to new geographical areas by applying spatial validation methods (Bahn & McGill, 2013; Charney et al., 2021; Journé et al., 2020; Rousseau & Betts, 2022). While there is a certain analogy between spatial and temporal block cross-validation through the space-for-time substitution in distribution modelling (Blois et al., 2013), we think that spatial validation methods are not best suited for testing a model that is intended to predict not to new areas but to a new time. Therefore, results obtained from spatial validation studies do not directly apply in the context of temporal validation and making predictions to a future time.

FIGURE 1 Schematic picture of the two validation methods compared in the study. The evaluated model uses data from 1975 to 1999 for model training (indicated by the grey bar). The static validation method uses the model to predict a static pattern, species occurrence or abundance in only one independent time period t2 (mean of 2013-2016, indicated by circle P2). Predictive performance is calculated as the correlation between the prediction and the corresponding observed value (diamond O2) over study locations. The change method predicts both time periods t1 (1996-1999, circle P1) and t2 (circle P2) and then calculates the predicted change (ΔP), that is, the difference in species occurrence or abundance between these two separate time periods, and compares it to the corresponding observed change (ΔO).
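The contrast between the two validation measures can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the synthetic data are assumptions made for illustration. Each site gets a stable "suitability" value, and observations and predictions at t1 and t2 are that value plus independent noise, so nothing systematic changes between the periods.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def static_performance(pred_t2, obs_t2):
    """Static method: correlate predicted and observed values at t2 over sites."""
    return pearson(pred_t2, obs_t2)


def change_performance(pred_t1, pred_t2, obs_t1, obs_t2):
    """Change method: correlate predicted change with observed change over sites."""
    d_pred = [p2 - p1 for p1, p2 in zip(pred_t1, pred_t2)]
    d_obs = [o2 - o1 for o1, o2 in zip(obs_t1, obs_t2)]
    return pearson(d_pred, d_obs)


# Synthetic data (hypothetical): a stable spatial pattern plus independent noise.
rng = random.Random(1)
n_sites = 500
suitability = [rng.random() for _ in range(n_sites)]
obs_t1 = [s + rng.gauss(0, 0.05) for s in suitability]
obs_t2 = [s + rng.gauss(0, 0.05) for s in suitability]
pred_t1 = [s + rng.gauss(0, 0.05) for s in suitability]
pred_t2 = [s + rng.gauss(0, 0.05) for s in suitability]

static_r = static_performance(pred_t2, obs_t2)
change_r = change_performance(pred_t1, pred_t2, obs_t1, obs_t2)
print(f"static r = {static_r:.2f}, change r = {change_r:.2f}")
```

In this toy setting the static correlation is high because the stable spatial pattern dominates, while the change correlation stays near zero because the predicted and observed differences contain only unrelated noise.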
In multispecies studies, there is usually a high level of variability in predictive performance among species (Araújo et al., 2005; Dobrowski et al., 2011; Sofaer et al., 2018; Venne & Currie, 2021), meaning that some species are easier to model than others. One reason for this variation can be ecological differences between species (Tessarolo et al., 2021). For instance, species traits such as range size, migratory behaviour, rarity and body size have been linked with model performance in birds (McPherson & Jetz, 2007). Studying the connection between species' traits and predictive performance can give us an understanding of how those traits affect modelling and how to interpret results.
In this study, we compared our 'change validation method' with the more routinely used 'static validation method' using large-scale monitoring data on 120 bird species from Finland, Sweden and Norway. We studied whether predictive performance differs between these two validation methods and whether the variation in predictive performance is linked with certain species' traits.
Concerning the influence of species traits, we expected that common, large-sized, non-tropical migrant species that prefer naturally patchy habitats would show higher predictive performance than rare, small-sized, long-distance migratory species, which prefer broad and structurally more varied habitats such as forest, or are more influenced by factors outside their breeding range (Johnston et al., 2014; Laaksonen & Lehikoinen, 2013; McPherson & Jetz, 2007).

| Bird data
We gathered data from national bird monitoring surveys performed in northern Europe: Finland, Sweden and Norway (Lehikoinen et al., 2014; Lindström et al., 2015)   Traits of each bird species (prevalence, body mass, migratory behaviour and habitat preference) are listed in Table S1.2: Appendix S1. Source data for species traits and more precise hypotheses concerning species traits are presented in Table S1.1: Appendix S1.

| Environmental variables
To describe the habitat where birds were observed, we drew a   (Howard et al., 2015), which supports our goal to study especially the impacts of changing climate.

| Species distribution models
In the analysis, we used the Hierarchical Modelling of Species Communities approach (HMSC; Ovaskainen et al., 2017), which belongs to the class of hierarchical Bayesian joint species distribution models (JSDMs; Warton et al., 2015). We considered each visit to each survey site (line transect or point count survey) as one sampling unit. We included in the HMSC analyses data on species abundances (matrix Y of sampling units times species) and environmental covariates (matrix X of sampling units times the included covariates).
As the response variable, that is components of the matrix Y, we considered the abundance of species, that is the vector of species counts for each sampling unit. To account for the zero-inflated nature of the data, we applied a hurdle approach, in which we first modelled species presence/absence data with a probit regression and then modelled abundance conditional on presence with a lognormal regression. We included all species that had the prevalence (fraction of surveys in which the species was observed) of at least 0.05, which limited the data to 120 species.
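The data preparation implied by the hurdle approach can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the HMSC implementation: the helper names are hypothetical, the counts are invented, and the actual probit and lognormal regressions are fitted by the modelling software. The sketch only shows how raw counts split into a presence/absence response and a log-abundance response that is defined where the species is present, and how the prevalence threshold is evaluated.

```python
import math


def hurdle_components(counts):
    """Split counts into presence/absence and log-abundance conditional on presence.

    Sampling units where the species is absent get None for log-abundance,
    because the abundance part of a hurdle model only uses presences.
    """
    presence = [1 if c > 0 else 0 for c in counts]
    log_abundance = [math.log(c) if c > 0 else None for c in counts]
    return presence, log_abundance


def prevalence(counts):
    """Fraction of sampling units in which the species was observed."""
    return sum(1 for c in counts if c > 0) / len(counts)


# Hypothetical counts of one species over eight sampling units.
counts = [0, 3, 0, 1, 0, 0, 7, 0]
presence, log_abund = hurdle_components(counts)

# Inclusion rule from the text: keep species with prevalence of at least 0.05.
include_species = prevalence(counts) >= 0.05
```

Treating absences as missing in the abundance response keeps the two parts of the hurdle model separate, so zero inflation is handled entirely by the presence/absence part.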
As fixed effects, as components of the matrix X, we included the mean (1) winter and (2)  We examined MCMC convergence by examining the potential scale reduction factors (Gelman & Rubin, 1992) of the model parameters that measure species responses to the covariates included in the X matrix.
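For readers unfamiliar with the potential scale reduction factor, the following Python sketch computes its basic form for one parameter from several chains. It is an assumption-laden illustration: it omits the degrees-of-freedom correction that some software applies, and the simulated chains are invented rather than taken from the study.

```python
import math
import random


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin statistic for one parameter.

    chains: list of m chains, each a list of n posterior draws.
    Compares between-chain variance (B) with within-chain variance (W).
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    var_hat = (n - 1) / n * w + b / n  # pooled estimate of posterior variance
    return math.sqrt(var_hat / w)


# Hypothetical chains: two that sample the same distribution, and a pair
# in which one chain is stuck in a different region.
rng = random.Random(42)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
stuck = [mixed[0], [x + 5 for x in mixed[1]]]

r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
```

Values close to 1 indicate that the chains agree with each other, whereas values well above 1 indicate that the chains have not yet converged to the same distribution.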
Since differences between modelling approaches are typically less impactful than other factors (especially modelling strategy, included covariates, and the study system) when evaluating temporal predictive performance (Rapacciuolo et al., 2012), we used a single model instead of consensus modelling approaches, which put more emphasis on precision than on accuracy (Dobrowski et al., 2011). We specifically chose HMSC because it generally performed best in terms of predictive power in a recent comparison among single-species models and JSDMs (Norberg et al., 2019).
One of the benefits of this multispecies approach is that when the model estimates the environmental responses of each species, it can borrow information across all species through its hierarchical structure. This is likely to increase predictive performance, especially for rare species for which the data alone are not sufficiently informative for accurate parameterization. Accordingly, Norberg et al. (2019) showed that HMSC models that included this feature had better predictive performance than the corresponding single-species models.
We report our data and modelling process according to the ODMAP protocol (Zurell et al., 2020) in Appendix S2. We emphasize that the details regarding model algorithms were secondary in this study as the focus was on the comparison of validation methods.

| Model validation methods
To test the ability of the model to make predictions for the future, we first parameterized the model using data until the year 1999 only (3486 visits to 901 unique routes). We then used the model to make To evaluate the presence-absence part of the hurdle model, we used as the measure y the probability of occurrence, that is, the proportion of years in which the survey site was occupied. For evaluating the abundance part of the hurdle model, we used the mean of log-transformed counts over the years. Predicted mean abundances were calculated only for years corresponding to species' presence.
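The two site-level measures y described above can be written out as a short Python sketch. The function names and the example counts are hypothetical, and the sketch assumes that the log transformation is applied only to years with non-zero counts, in line with the restriction of abundance to years of presence.

```python
import math


def occurrence_probability(yearly_counts):
    """Proportion of surveyed years in which the site was occupied."""
    occupied = sum(1 for c in yearly_counts if c > 0)
    return occupied / len(yearly_counts)


def mean_log_abundance(yearly_counts):
    """Mean of log-transformed counts over years with the species present.

    Returns None when the species was never observed at the site.
    """
    logs = [math.log(c) for c in yearly_counts if c > 0]
    return sum(logs) / len(logs) if logs else None


# Hypothetical counts for one species at one site over four survey years.
site_counts = [0, 2, 8, 0]
y_occurrence = occurrence_probability(site_counts)
y_abundance = mean_log_abundance(site_counts)
```

For these counts the site was occupied in two of four years, and the abundance measure is the mean of log(2) and log(8), which equals log(4).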
We chose correlation as our measure of predictive performance because we were interested in identifying locations or environmental conditions which will influence the species especially favourably or unfavourably, that is on measures of discrimination. Evaluating predictive performance in terms of accuracy, calibration and precision would clearly be relevant as well (Norberg et al., 2019), but for the above reason we considered discrimination to be of the highest relevance and focussed on that.
For occurrence data, we chose only survey sites that had been visited at least twice, either during only the latter time period (static method) or during both time periods (change method). This resulted in 1214 and 264 sites, respectively. For abundance, the data were further restricted to only those survey sites where the species was observed at least once. This resulted in an average of 422 sites per species (range 67-1182) for the latter time period (static method) and 83 sites (range 3-262) for both time periods (change method).
Despite the smaller sample size and geographical coverage of validation data for the change method (Figure 2), the validation data still cover both the northern range edge and the core breeding areas for over 90% of study species (Keller et al., 2020). Species with observations from <10 sites were excluded from the validation data. Thus, in the abundance data for the change method, hazel grouse (Bonasa bonasia) and rustic bunting (Emberiza rustica) were excluded.
To quantify the overall amount of change per species, we calculated species-specific mean observed and predicted changes across all the study sites that contributed to the change validation datasets (n = 10-264 depending on species and type of data). To ask whether the model was able to predict which species increased and which decreased, we calculated the correlation coefficient between the mean predicted and observed changes across species.

| Effect of validation method
To investigate the differences in predictive performance between the two validation methods, we fitted a linear mixed-effects model separately for the occurrence and abundance datasets. For both occurrence and abundance data, we used the species-specific Pearson's correlation coefficient, that is, predictive performance, as the response variable. As a fixed effect, we entered the validation method (categorical variable: static or change). Species was included as a random effect, and species' phylogeny was modelled in the variance structure to account for the possibility that closely related species show similar predictive performance. The phylogenetic tree was obtained from birdtree.org (Jetz et al., 2012). The modelling was conducted using the R package and function MCMCglmm (Hadfield & Nakagawa, 2010) with a Gaussian error distribution and 1,003,000 iterations, of which the first 3000 were discarded as burn-in, with a thinning interval of 1000.
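The MCMC settings above determine how many posterior samples are retained. A small worked calculation in Python makes this explicit. The helper name is hypothetical, and the sketch assumes the reported settings correspond to the `nitt`, `burnin` and `thin` arguments of MCMCglmm.

```python
def retained_samples(n_iter, burn_in, thin):
    """Number of posterior samples kept after burn-in and thinning."""
    return (n_iter - burn_in) // thin


# Settings reported in the text: 1,003,000 iterations, 3000 burn-in, thinning 1000.
n_kept = retained_samples(n_iter=1_003_000, burn_in=3_000, thin=1_000)
print(n_kept)
```

Under these settings, (1,003,000 − 3000) / 1000 = 1000 posterior samples are retained for inference.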

| Effect of species traits
To investigate how the predictive performance for each validation method was related to species prevalence (continuous variable), migration strategy (long or short migration or resident), body mass (continuous) and habitat preference (forests, wetlands, cultural environments, mountains and mires), we fitted a set of 17 competing models. Prevalence and body mass variables were log-transformed prior to analysis to reduce non-normality. Our  Table S1.6: Appendix S1). Species was included as a random effect and species' phylogeny was modelled in the variance structure.
For model selection, we used the multimodel inference approach (Burnham & Anderson, 2002) by calculating the corrected deviance information criterion (DIC) for each model and selecting the one with the lowest DIC (Spiegelhalter et al., 2002).

| Robustness test
We assessed the sensitivity of our results to the use of single-species models instead of a JSDM. To do so, we considered HMSC models that were otherwise identical to those used in the joint species distribution modelling but fitted them separately for each of the 120 study species. We then calculated the same predictive performance metrics with the static and change methods for each species.
To investigate the effect of modelling approach (single vs. joint SDM), we fitted a linear mixed-effects model where both validation method and modelling approach were used as fixed effects to explain predictive performance. Species was included as a random effect and species' phylogeny was modelled in the variance structure. The model was fitted with the MCMCglmm function (Hadfield & Nakagawa, 2010) with a Gaussian error distribution (1,003,000 iterations, 3000 warm-up iterations, thinning interval of 1000).

| RESULTS
Both HMSC models, for presence/absence data and for abundance data, reached good MCMC convergence and thus their posterior distributions were adequately sampled (Table S1.3: Appendix S1).
Species occurrences and abundances were more often statistically connected with climatic variables rather than land use variables (Figures S1.1 and S1.2: Appendix S1).
The overall mean species-specific observed and predicted changes across all study sites varied almost symmetrically around zero, with some species declining and some increasing in their overall distribution and abundance. The observed mean changes varied between −0.11 and 0.15 in the occurrence probability and between −0.74 and 0.95 in the log-transformed abundance. The predicted mean changes varied much less, between −0.05 and 0.10 in the occurrence probability, and between −0.30 and 0.25 in the log-transformed abundance (Figures S1.3 and S1.5: Appendix S1). For most species, observed changes were more negative (less increase or greater decrease) than predicted (Figures S1.4 and S1.6: Appendix S1). The correlation coefficient between mean observed and mean predicted changes was 0.21 (p = .023) for occurrence data and 0.25 (p = .0056) for abundance data (Figures S1.3 and S1.5: Appendix S1), meaning that the model predicted better than random which species declined and which increased in general.

| Predictive performances
For occurrence data, the mean correlation coefficient for the static validation method was high, 0.54 (SD 0.18), whereas for the change method it was very low, 0.024 (SD 0.13; Figure 3, species-specific correlations in Table S1.2). For abundance data, the corresponding mean correlations were 0.31 (SD 0.19) for the static method and 0.003 (SD 0.18) for the change method (Figure 3, Table S1.2: Appendix S1). Thus, for both occurrence and abundance data, the static validation method resulted in much higher predictive performance than the change method. When focussing on the static validation method, the mean predictive performances were higher for occurrence than for abundance data, but for the change method no such difference was detected (Figure 3).

FIGURE 3 Predictive performance, measured as the correlation between observed and predicted values, for each species (n = 118-120). The two panels correspond to occurrence and abundance analyses, and for each panel, results using static and change validation methods are shown. The dot and the error bar show the overall mean for all species and its 95% confidence interval.  (Tables S1.4 and S1.5: Appendix S1).

| Species traits
Habitat preference and prevalence were the most important traits affecting predictive performance in occurrence data (Table S1.6: Appendix S1). However, these effects were strongly moderated by the validation method used, as evidenced by the significant interactions between the terms (Table S1.7: Appendix S1). Species that prefer cultural (urban and agricultural areas) and mountain and mire habitats were associated with higher predictive performance but only with the static method, while no such connection was detected with the change method (Figure S1.7: Appendix S1). Higher prevalence was also associated with higher predictive performance with the static method, whereas with the change method the positive association was very weak (Figure S1.8: Appendix S1).
In abundance data, habitat preference, prevalence and migration behaviour were the most important traits affecting predictive performance (Table S1.8: Appendix S1). More common, short-migrating species preferring cultural habitats were associated with higher performances when conducting validation with the static method (Table S1.9, Figures S1.9-S1.11: Appendix S1). However, again these connections were much weaker or nonexistent with the change method.
In summary, the effects of species traits were rather weak and less visible for the change method, which suffers from low predictive performance in general.

| Single versus joint species distribution modelling
When comparing the results from single-species SDMs with those from joint SDMs, the predictive performances for the static method were much lower (but still higher than for the change method), whereas for the change method the performances remained near zero (Table S1.10, Figure S1.12: Appendix S1). The model results showed that the single-species modelling approach had a negative effect on predictive performance. The posterior mean estimate for the effect of modelling approach was −0.13, 95% CI [−0.15, −0.10] (Table S1.11: Appendix S1).
In summary, our robustness test indicated that using a joint instead of a single-species SDM does not influence the relative order of the estimates of model performance between the static and change validation methods.

| DISCUSSION
Our study aimed at developing the concept of SDM validation by comparing two different validation methods for assessing temporal predictive performance, one that measures and predicts a static distribution in a population, that is only 1 year or mean of several years, and one that measures a change, that is difference in distribution between two separate time periods.

| Static validation ignores change events
Our results showed a sharp drop in the estimate of predictive performance when change validation was applied, compared with the static method. This means that the interpretation of model performance changes drastically depending on which type of validation is applied, static or change. This highlights that, even if validation is done as recommended, with temporally independent data (Harris et al., 2018), the estimates of a model's predictive performance might still be overly optimistic if the future static distribution is evaluated instead of the change in distribution.
The main reason for this difference lies in the fact that species ranges are often rather static, expanding or retracting mainly at their edges. This stability increases the likelihood that, even as time passes, a species is still present, and in roughly the same numbers, in most parts of its previous range. Spurious species-environment as-  (Eskildsen et al., 2013; Rapacciuolo et al., 2012; Sofaer et al., 2018).

| Reasons for failing to predict change events
One reason we failed to predict change could be that only minor changes happened in bird distributions, land cover or climate variables in the study area within the forecasted time period (approximately 16 years). The smaller the variation in the calibration data, the harder it is for the model to estimate the underlying species-environment associations and thus to predict the spatial pattern of change events correctly. However, our bird data and climate records from Fennoscandia show that poleward shifts in both species' abundance (1.5 km/year) and climate (temperature, 7.4 km/year) have occurred in our study area (Lehikoinen & Virkkala, 2016). In addition, logging volumes, for instance, have increased by c. 20% in the 21st century within the study area (Virkkala et al., 2020). Therefore, we cannot conclude that our SDMs failed to predict changes simply because nothing had changed.
A more likely explanation for failing to predict change involves missing information. Even though we used relatively long-term data, the variables that we thought would best associate with species distributions over the study area may not be the ones that are most important in driving the change. We suspect that our classification of the land cover variables (six categories) might have been too coarse, and thus, it might not have included enough information on fine-resolution changes in habitat quality, such as changes in the forest structure due to increased logging intensity (Virkkala et al., 2020), changes in agricultural landscape due to increased farming intensity (Laaksonen & Lehikoinen, 2013) or degradation of peatlands due to draining (Fraixedas et al., 2017). While the level of resolution of the abiotic variables that we used in our model (temperature and land cover) was probably adequate to determine suitable sites for species, that is, where the species was present or absent, the missing fine-resolution detail of the land cover variable may have hindered our predictions of species abundance and especially of where local changes have happened. Furthermore, different species may be affected by the same variables but at different spatial scales. For example, species with small territories are more likely affected by local changes in the habitat, whereas species with larger territories can be more affected by landscape-level changes (Table S1.1: Appendix S1).
Another source of prediction error could originate from the fact that we did not model the whole species' ranges. Undoubtedly, it would be preferable to calibrate an SDM with information from the species' full environmental space (Owens et al., 2013), although global calibration may not always improve predictive performance either (El-Gabbas & Dormann, 2018; Giovanelli et al., 2010). However, we note that our study area covered a circa 1900 km latitudinal gradient, which is a large spatial coverage and relatively broad in SDM studies in general. Further, for about 10% of the species, mainly northern boreal species, our study area did cover the full climatic conditions of their range, yet they (as mainly rare species) did not show particularly high predictive performance in the species trait analysis.
In addition, our test data for the change validation method were geographically biased towards southern Sweden (Figure 2). However, we assume that this geographical bias was negligible because most of our study species had a southerly distribution and because this sample still covers mostly the same temperature (cold to warm) and land cover (from agricultural to forested and mountain landscape) gradients as a sample with a more northerly coverage would. One gradient that is not covered with this sampling is the increasing human population density towards southern latitudes. This could be important because in Norway, Husby et al. (2021) found that farmland birds have a stronger decline near urban areas than in rural areas. We acknowledge the possibility that omitting this effect of urban landscape may have distorted our results, yet we assume its effect to be marginal as only a small proportion of the study species breed in farmlands.
Lastly, an ongoing debate questions the use of correlative SDMs for projections of future distribution changes altogether (Dormann et al., 2012; Ehrlén & Morris, 2015; Zurell et al., 2009). In essence, correlative SDMs were not originally designed to generate accurate projections of change, as opposed to more mechanistic, process-based models, for example dynamic occupancy models, which directly model occupancy states in space-time (Kéry et al., 2013). Despite their limitations, correlative SDMs are still often preferred because their data requirements are more feasible than those of alternative mechanistic approaches. This may result in a larger spatial coverage of predictions, but of poorer quality.

| Influence of species traits
Regarding species traits, our results were not clearly consistent across the validation methods and data types, and thus did not show strong support for our hypotheses. The hypothesis concerning lower prevalence leading to lower predictability gained support in both occurrence and abundance data, but significantly only when associated with prediction performances calculated with the static method. Higher predictive performance of more common species could be related to better detection probability compared with rarer species, which would decrease bias in the data (Johnston et al., 2014; McPherson & Jetz, 2007).

| Importance for conservation
Species distribution models can be used in various ways in conservation ecology (Guisan et al., 2013; Sofaer et al., 2019). However, if the models are blind to the factors that drive changes, and the spatial predictions derived from them completely miss the exact locations where changes are likely to happen, then one should not base any local conservation action, such as establishing finely targeted new protected areas for endangered species, on these projections. When data allow, an alternative might be to use, for example, species-specific dynamic occupancy models, yet their predictions also need to be equally rigorously validated. Although our model failed to predict local change events, it still fared decently in predicting overall which species declined and which increased at the level of the whole study area. Therefore, these predictions can still prove helpful when preparing conservation actions over larger spatial scales.

| CONCLUSION
In conclusion, as SDMs are becoming increasingly important tools for predicting future distributions of species, and as time series are getting longer and thereby enabling validation by measuring change, we should pay more attention to the reliability of the predictions.
We recommend that when the aim is to forecast changes in species distributions, the models used for the purpose are validated using temporally independent data and with methods that are able to evaluate not only the model's ability to make future predictions on a general level but also its ability to predict change events in the spatial pattern of species occupancy or abundance. We note that the choice of modelling approach and validation method should be determined by the intended application of the model. We also conclude that the field should proceed towards more critical model validation and thus take up the challenge of making time series datasets more broadly available and of building better models that can be shown to successfully predict fine-scaled change events in species distributions as well.

ACKNOWLEDGEMENTS
We acknowledge the tremendous fieldwork effort made by the vol- ERC-synergy project LIFEPLAN). We acknowledge the E-OBS dataset from the EU-FP6 project UERRA (http://www.uerra.eu) and the Copernicus Climate Change Service, and the data providers in the ECA&D project (https://www.ecad.eu). We thank also the anonymous reviewers.

CONFLICT OF INTEREST STATEMENT
All authors declare that they have no conflicts of interest.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/ddi.13687.

DATA AVAILABILITY STATEMENT
The original bird survey dataset includes 63 species whose occurrence and location information are classified as 'sensitive' according to Scandinavian laws and regulations and thus cannot be openly shared as such. Therefore, in the openly shared dataset, the coordinates for the survey sites have been made coarser. These data, as well as R-code and additional data for analysis, are available in Dryad at https://doi.org/10.5061/dryad.bzkh189br. The original, accurate data are available from the corresponding author upon