Effective strategies for correcting spatial sampling bias in species distribution models without independent test data

Spatial sampling bias (SSB) is a feature of opportunistically sampled species records. Species distribution models (SDMs) built using these data (i.e. presence‐background models) can produce biased predictions of suitability across geographic space, confounding species occurrence with the distribution of sampling effort. A wide range of SSB correction methods have been developed but simulations suggest effects on predictive performance are highly variable. Here, we aim to identify the SSB correction methods that have the highest likelihood of improving true predictive performance and evaluation strategies that provide a reliable indicator of model performance when independent test data are unavailable.


| INTRODUCTION
Species distribution models (SDMs) are used frequently to infer the environmental suitability of locations across landscapes for species based on correlations between species records and environmental conditions (Elith & Leathwick, 2009; Guillera-Arroita et al., 2015; Guisan et al., 2013). Presence-only (PO) data, collected opportunistically by recorders, are often the only data available for modelling species' distributions. This has driven development of SDM methods for exploiting PO data, most notably presence-background (PB) modelling frameworks, where in lieu of absence data, a sample of background points is used to contrast available environmental conditions with conditions at sites of known occurrences (Tsoar et al., 2007; Valavi et al., 2022). A major challenge for modelling species' distributions using PO data is that opportunistically collected species records are usually affected by spatial sampling biases (SSB) (Hughes et al., 2021; Meyer et al., 2016).
These arise because when recorders are unguided by a sampling protocol they are more likely to visit and record species in locations with certain characteristics; for example, accessible sites that contain ecologically valuable habitats (Geldmann et al., 2016; Sousa-Baena et al., 2014). SSB hinders the performance of SDMs built using PO data because it violates the assumption that environmental space is sampled in proportion to its availability in geographic space (Phillips et al., 2009). SSB in geographic space typically results in biased sampling of environmental spaces (Ranc et al., 2017), leading to model predictions that confound the suitability of environmental space with spatial patterns of recording effort (Cosentino & Maiorano, 2021; Phillips et al., 2009), potentially affecting decisions or inferences informed by these models (Muscatello et al., 2021). SSB correction methods are frequently used in SDM workflows with the aim of reducing the effects of SSB on model predictions.
Filtering methods aim to reduce the effects of oversampling occurrence records in parts of environmental space by removing records from apparently oversampled parts of geographic and/or environmental space (Aiello-Lammens et al., 2015; Beck et al., 2014; Chauvier et al., 2021). Whereas filtering methods alter the pattern of species occurrence records, an alternative is to alter background samples from a random sample of geographic space to a distribution that reflects the suspected pattern of SSB, with the aim of factoring out bias that is common to both the presence and background data (Phillips & Dudík, 2008). Common approaches for altering background samples include aggregating data towards landscape features thought to be associated with biasing processes (e.g. roads and pathways) or clustering the background near to occurrence records (Monsarrat et al., 2019; Mutascio et al., 2018; Vollering et al., 2019). Phillips et al. (2009) introduced the idea of inferring the pattern of SSB in presence-only records for a focal species from the spatial distribution of records for a target group of species assumed to be affected by a similar SSB. This target group approach for altering background data is now widely used (Barber et al., 2022; Mair et al., 2017). Model-based approaches have also been developed to correct for SSB, including covariate-based methods that aim to reduce the effects of SSB on model performance by including covariates in the model associated with sampling effort, such as those capturing distance from roads or cities where SSB is known to be strongly associated with these factors (Chauvier et al., 2021; Inman et al., 2021; Warton et al., 2013).
While these methods can improve the predictive performance of SDMs compared to uncorrected models, they are not universally effective and can even degrade model predictive performance (Barber et al., 2022; Inman et al., 2021; Kramer-Schadt et al., 2013).
Occurrence filtering is generally evaluated as a reliable and safe method of SSB correction (Gábor et al., 2020; Kramer-Schadt et al., 2013; Moua et al., 2020), but the benefits of filtering are inconsistent (Inman et al., 2021; Moua et al., 2020; Varela et al., 2014). Inman et al. (2021) found that environmental filtering improved model performance 55% of the time, but led to a decrease in predictive performance in 43% of treatments. Filtering approaches appear to reduce model predictive performance by causing a loss of data (Varela et al., 2014) and often perform poorly compared to other methods when sample sizes are low (Moua et al., 2020; Vollering et al., 2019). Altered background approaches have been shown to perform well in several simulation studies (Barber et al., 2022; Vollering et al., 2019), but overall the performance of this class of correction method is also highly variable (Fourcade et al., 2014; Ranc et al., 2017; Stolar & Nielsen, 2015; Dubos et al., 2022). Fourcade et al. (2014) found that while systematic sampling (geographic filtering) improved predictive performance in 66% of models, altering the bias in the background data via prior weights in Maxent led to improvements in only 23% of SDMs. Ranc et al. (2017) found that uncorrected SDMs performed better or no worse than those corrected using the target group approach for altering the background sample for several species groups (simulated 'real' species), and ultimately cautioned against 'uninformed and ubiquitous' use of this method.
KEYWORDS: environmental niche model, observer bias, opportunistic records, presence-background, presence-only, preferential sampling

TABLE 1 Results and conclusions from simulation studies evaluating spatial sampling bias correction methods applied to presence-only species distribution models. Env = environmental; Geo = geographic (i.e. filtering in environmental or geographic space). See Table 2 for SSB correction method classifications and abbreviations.
• 25 presences: background thickening performed best, but no difference between OccFilter (Geo) and no correction

While this literature highlights the variable effects of SSB correction on SDM predictive performance, there is no overall consensus on how to ensure that SSB correction methods improve performance in real-world modelling scenarios where, as is typically the case, the 'true' distribution or probability of occurrence is unknown and independent test data are unavailable. Here, we assess how best to implement SSB corrections when independent test data are unavailable for model selection and validation. First, we conduct a systematic review and meta-analysis of the primary research literature to evaluate changes in predictive performance between corrected and uncorrected SDMs for major classes of SSB correction methods, with the aim of identifying methods with consistent positive effects on performance. For a group of commonly used SSB correction methods, we then use simulations to identify: (i) when evaluation using non-independent data (i.e. internal cross-validation using multiple withheld data blocks) provides a reliable signal of true model predictive performance, as measured using spatially independent test data, and (ii) the SSB correction method that has the highest likelihood of producing improvements in predictive performance over uncorrected models. From these results, we aim to find combinations of SSB correction method and evaluation strategy that most consistently identify models with the best predictive performance in the absence of independent test data.

| Systematic review and meta-analysis
We searched the peer-reviewed literature (Web of Science and SCOPUS databases both searched on the 13 February 2023) using the following search string: ALL=(("species distribution*" OR SDM OR "environmental niche" OR ENM OR "resource selection" OR "habitat selection" OR suitability OR occurrence) AND ("presence-only" OR "presence data" OR "presence-background" OR "pseudo absence" OR opportunistic OR "citizen science" OR preferential OR maxent OR biomod)).
After removing duplicates, the search returned 8564 unique studies. These were further filtered to remove studies that fell outside the review subject area based on title and abstract.
The remaining studies were filtered by content based on the criteria that they involved building SDMs using PO data (i.e. no absence information, including absences inferred from complete species lists) and that the study included a direct comparison between SDMs that attempted to correct for SSB and models without this correction. To avoid ambiguity, studies were required to mention explicitly that a particular analytical approach was designed to account for SSB (i.e. not 'filtering to reduce spatial autocorrelation', which is ambiguous as to the cause of the spatial autocorrelation). This resulted in 70 studies from which information on the effect of SSB correction on model performance was extracted, along with metadata on species taxonomy, sample sizes of occurrence data, and details of the SDM methods and SSB correction approach used.
For analysis, we classified the SSB correction methods into major categories (summarised in Table 2). Where multiple SSB correction methods applied in separate models were presented in a single study, these were considered separate treatments nested within a study and were extracted separately but linked by a study ID. Maxent includes the option to provide a biasfile, which adds priors to the model that weight background data according to the bias described in that file, but a similar effect can be achieved by supplying pre-altered background data. Because it was often unclear whether a Maxent model was being corrected via a biasfile or via pre-altered background data, for consistency we classified all Maxent approaches that apply SSB corrections via the background data as types of altered-background approach.
For studies reporting model evaluation metrics in a format other than the mean ± one standard deviation (SD), where possible we converted them to this format using the following approaches. Point estimates for model performance were often provided for multiple individual species using the same SDM algorithm and SSB correction method, and these were used to calculate the mean ± SD at the treatment level (SDM and SSB correction methods). If only a single point estimate per treatment level was given, these data are plotted in Figure S1 but not used in the meta-analysis. For studies reporting the mean ± 95% confidence intervals, the SD was calculated as SD = √n (u_cl − l_cl)/3.92 (Cochrane, 2022), where u_cl and l_cl are the upper and lower bounds of the 95% confidence interval and n the sample size. For studies reporting the median and interquartile range per treatment, where we were unable to acquire the raw data from the authors to calculate the mean ± 1SD, data were converted to this format following Wan et al. (2014). SDM ensemble mean statistics were in some cases presented without an accompanying estimate of variation; these data are plotted in Figure S1 but not used in the meta-analysis.
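These conversions can be sketched in a few lines (an illustrative snippet; the function names are ours, and the interquartile-range conversion uses the large-sample approximation from Wan et al. (2014)):

```python
import math

def sd_from_ci95(upper_cl, lower_cl, n):
    """Recover the SD from a 95% confidence interval, assuming a normal
    sampling distribution: SD = sqrt(n) * (u_cl - l_cl) / 3.92."""
    return math.sqrt(n) * (upper_cl - lower_cl) / 3.92

def sd_from_iqr(q1, q3):
    """Approximate SD from the interquartile range (Wan et al. 2014,
    large-sample case): SD ~= (q3 - q1) / 1.35."""
    return (q3 - q1) / 1.35

# e.g. a reported mean AUC of 0.80 with 95% CI (0.74, 0.86) over n = 20 species
sd = sd_from_ci95(0.86, 0.74, 20)
```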
Meta-analysis results are presented in Figure 1 and Figure S2 provides the study ID corresponding to each effect size.
From each treatment-level mean ± SD, we calculated the Hedges' g statistic, an estimate of the standardised mean difference between two independent groups that corrects for biases introduced by small sample sizes (Borenstein et al., 2021). We present mean effect sizes ± 95% CI, calculated from the pooled SD, for each study separately. We estimated the overall mean effect size by modelling Hedges' g as a function of an overall intercept with a random effect for study ID. Hedges' g was modelled using a Student's t-distribution to accommodate extreme outliers; this model captured the distribution of the observed data better than a Gaussian response. The inverse of the Hedges' g standard deviation for each study was included as a model weight (i.e. the contribution of the observation to the likelihood) to account for variation in the within-study variance of effect sizes across studies (Borenstein et al., 2021). We evaluated the hypothesis that Hedges' g > 0 (i.e. a positive effect of SSB correction on predictive performance) using the evidence ratio, the ratio of the posterior probability that β > 0 to the posterior probability that β < 0. Ratios exceeding one indicate increasing degrees of support for the hypothesis, which can be assessed using the posterior probability of observing this ratio. All models were fitted in a Bayesian framework using the brms package (Bürkner, 2017).
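The effect-size calculation can be illustrated as follows (a minimal sketch of the standard Hedges' g formula with small-sample correction; not the study's own code, and the example numbers are hypothetical):

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference with small-sample correction.
    Group 1 = SSB-corrected models, group 2 = uncorrected models."""
    # pooled standard deviation
    s_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))
    d = (m1 - m2) / s_pooled           # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)    # small-sample correction factor
    return j * d

# e.g. corrected AUC 0.78 +/- 0.05 vs uncorrected 0.75 +/- 0.06, 15 species each
g = hedges_g(0.78, 0.05, 15, 0.75, 0.06, 15)  # positive -> correction helped
```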

| Simulation models
Simulation models were developed to test the effects of SSB correction on model predictive performance evaluated using independent and internal testing under a range of scenarios.

| Virtual environments
Virtual environmental landscapes were created by simulating multiple surfaces as a multivariate Gaussian distribution using the rmvn function from the mgcv R package (Wood, 2020), with the spatial autocorrelation (SAC) of each simulated surface occurring over different spatial scales to represent different types of environmental feature; see Baker et al. (2022) for similar methodology and Table S4 for a visual example of the spatial scales of SAC in each environmental variable. Four variables per virtual environment were created. Three of the variables were passed through additional functions to alter their characteristics, creating in turn V1 ('elevation'), V2 ('habitat') and V3 ('temperature') (see Table S4 for further details). Variables were simulated across a 50 × 50 grid that was then disaggregated using bilinear interpolation to create a 250 × 250 grid on which species distributions could be generated. Artificial roads were placed in the landscape by calculating multiple least-cost pathways through the landscape, and the distance from each pixel to the nearest road was calculated. One hundred virtual environments were pre-simulated.

| Virtual species and sampling
Virtual species were simulated onto each virtual environment based on plausible responses to the V1 (elevation), V2 (habitat) and V3 (temperature) variables. The responses to 'elevation' and 'habitat' were defined by a logistic function,

P(x_i) = K / (1 + e^(−(A + B x_i))),

where x_i is the value of V1 or V2 in cell i, and A and B are distribution parameters that were varied randomly (drawn from a uniform distribution over the range [−0.5, 0.5]) to create virtual species with different environmental niche breadths. K can be interpreted as the carrying capacity and was set to 1 throughout. The response to 'temperature' was defined by a Gaussian function with the mean (uniform distribution over the range [0, 1]) and standard deviation (normal distribution with mean = 0.5 and standard deviation = 0.1) varied to alter niche breadth. The distributions from which parameter values were sampled were chosen pragmatically to generate virtual species with a broad range of prevalence. The probability of occurrence (P_occ) was calculated as the product of the responses across the three environmental variables, rescaled as P_occ = (P_occ − min(P_occ)) / (max(P_occ) − min(P_occ)).
Species occupancy in a grid cell was simulated as a Bernoulli random trial with the probability of success equal to the cell-specific P_occ. For each of the 100 virtual environments we generated 100 virtual species, calculating the prevalence of the species in the environment as the proportion of grid cells occupied. We then randomly selected 50 virtual species from within each prevalence class (<5% to 50% in 5% bins) to ensure that the suite of virtual species selected for modelling represented a broad range of niche breadths, resulting in a suite of 550 virtual species to use in the analysis. SSB was generated in relation to the simulated linear feature variable, with the intensity of the bias varied using a logistic function to create spatially biased surfaces with different degrees of spatial variation in sampling intensity (see Figure S3). Occurrence records were virtually sampled across the study grid with probability weights given by an SSB surface. A 'no bias' scenario was also included to evaluate the effects of applying SSB correction where no bias exists. In the 'no bias' scenario, each pixel had an equal likelihood of being sampled. We evaluated the effect of the number of records sampled (n = 50, 100, 200, 400 and 800).

TABLE 2 The major classes of SSB correction used to group approaches.

OccFilter (Geo): Any approach that aims to filter (thin or rarefy) records in geographic space to reduce the influence of spatially biased sampling.
OccFilter (Env): Any approach that aims to filter (thin or rarefy) records in environmental space to reduce the influence of spatially biased sampling.
AdjBkGrd (sppTgGrp): Any approach that aims to reduce SSB by altering the spatial pattern of background points based on the spatial distribution of occurrence records for other species groups thought to be sampled with similar bias to the focal species.
AdjBkGrd (Occurrence): Any approach that aims to reduce SSB by altering the spatial pattern of background points based on the distribution of occurrences for the focal species (e.g. sampling from within a buffered distance from known occurrence records or weighted by distance from the occurrence records).
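The virtual-species generation described above can be sketched as follows (our own minimal reimplementation; the exact logistic parameterisation and the use of plain random surfaces in place of the spatially autocorrelated environments are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)

def logistic_response(x, A, B, K=1.0):
    # assumed parameterisation of the logistic suitability response
    return K / (1.0 + np.exp(-(A + B * x)))

def gaussian_response(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2)

# three environmental surfaces on a 250 x 250 grid (stand-ins for V1-V3)
v1, v2, v3 = (rng.random((250, 250)) for _ in range(3))

# product of the three responses, with randomly drawn niche parameters
p = (logistic_response(v1, *rng.uniform(-0.5, 0.5, 2))
     * logistic_response(v2, *rng.uniform(-0.5, 0.5, 2))
     * gaussian_response(v3, rng.uniform(0, 1), rng.normal(0.5, 0.1)))

# rescale to [0, 1]: P_occ = (P - min) / (max - min)
p_occ = (p - p.min()) / (p.max() - p.min())

# occupancy as a Bernoulli trial per cell; prevalence = occupied fraction
occupied = rng.random(p_occ.shape) < p_occ
prevalence = occupied.mean()
```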

| Species distribution modelling
We found no statistically significant difference in Hedges' g scores across SDM methods (Table S1) and, consequently, used a single SDM approach throughout the simulation study. Point process models were used to model the simulated PB data, as they are a widely used and computationally efficient PB-SDM method that can be implemented as a downweighted Poisson regression (see Renner et al., 2015 for details). Poisson generalised linear models were fitted to the PB data with downweighting implemented by setting weights for presence locations equal to a very small value (10^−6), while weights for background (also called quadrature) points were set equal to the area of the study divided by the number of background points (Renner et al., 2015). The number of background points was set to 10,000 throughout. The three variables used to create the virtual species were included as predictors in the SDMs. Because the aim was to compare models with and without SSB correction, and not to identify the overall best model, we kept the number of terms low (i.e. we did not include quadratics or interactions). We acknowledge that a more flexible model may improve overall model performance but is unlikely to affect the relative performance of SSB correction methods.
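The downweighting scheme can be illustrated with a small numpy sketch (our own iteratively reweighted least squares implementation of a downweighted Poisson regression; in practice this is a standard GLM call in R, and the toy one-covariate data below are illustrative):

```python
import numpy as np

def dwpr_fit(X, is_presence, area, n_iter=100):
    """Down-weighted Poisson regression for presence-background data
    (after Renner et al. 2015): presences get a tiny weight (1e-6) and
    background points share the study area; response is y_i / w_i."""
    n_bg = (~is_presence).sum()
    w = np.where(is_presence, 1e-6, area / n_bg)
    z = is_presence / w
    eta = np.log(z + 0.1)              # GLM-style starting values
    mu = z + 0.1
    for _ in range(n_iter):            # IRLS for a weighted Poisson GLM
        W = w * mu                     # working weights
        zwork = eta + (z - mu) / mu    # working response
        beta = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (W * zwork))
        eta = X @ beta
        mu = np.exp(eta)
    return beta

# toy example: point intensity proportional to exp(2 * x) on [0, 1]
rng = np.random.default_rng(1)
x_bg = rng.random(10000)                                   # background points
cand = rng.random(5000)
x_pres = cand[rng.random(5000) < np.exp(2 * cand) / np.exp(2)]  # thinned presences
x = np.concatenate([x_pres, x_bg])
X = np.column_stack([np.ones_like(x), x])
is_pres = np.concatenate([np.ones_like(x_pres, bool), np.zeros_like(x_bg, bool)])
beta = dwpr_fit(X, is_pres, area=1.0)   # beta[1] should be near the true slope, 2
```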

| Spatial sampling bias correction methods
Informed by the systematic review, four approaches to SSB correction were implemented, with multiple settings for each approach to allow optimisation of the SSB correction. Two frequently employed altered-background approaches were implemented: AdjBkGrd (Feature) and AdjBkGrd (Occurrence). These approaches generate sampling weights for background data with the aim of producing a background sample with SSB similar to that in the occurrence data.
AdjBkGrd (Feature) was implemented as a distance decay from the road features, with the decay defined using a logistic function to give varying degrees of decay. These values were rescaled to between 0 and 1 to give the sampling weights for the background points. AdjBkGrd (Occurrence) was implemented by placing a buffer around each occurrence point, with the radius varying across three different distances. Pixels within a buffer were given a weight of 1 and those outside a weight of 0. Background points were chosen using a random sampling approach weighted by the sampling weights. OccFilter (Geo) was implemented by defining a coarse-resolution grid across the study area and then using the reciprocal of the number of observations per pixel as the probability of success in a Bernoulli random trial applied to each record (i.e. if two records occurred in a pixel, then each record had a 0.5 chance of being selected). Grids of multiple pixel sizes were created to allow optimisation of the filtering distance. For the Covariate (CondPred) method, models were fitted with distance to road as an additional covariate, specified as either a first-, second- or third-order polynomial, which together accommodate several different functional forms of the spatial sampling bias distance-decay relationship that might be hypothesised. Predictions were then made after setting this covariate to the mean distance to roads (i.e. conditioning on a common value, following Warton et al. (2013)).
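The Bernoulli thinning used for OccFilter (Geo) can be sketched as follows (an illustrative snippet; the grid size and the clustered toy records are assumptions, not the study's settings):

```python
import numpy as np

rng = np.random.default_rng(7)

def thin_occurrences(xy, cell_size, rng):
    """Geographic occurrence filtering: each record is kept with
    probability 1 / (number of records in its coarse grid cell),
    implemented as a Bernoulli trial per record."""
    cells = np.floor(xy / cell_size).astype(int)
    cell_id = cells[:, 0] * 100000 + cells[:, 1]   # unique 1-D cell index
    _, inverse, counts = np.unique(cell_id, return_inverse=True,
                                   return_counts=True)
    keep_prob = 1.0 / counts[inverse]              # reciprocal of cell count
    return xy[rng.random(len(xy)) < keep_prob]

# 500 records biased towards one corner of a 250 x 250 landscape
xy = rng.random((500, 2)) ** 3 * 250
thinned = thin_occurrences(xy, cell_size=25.0, rng=rng)
```

In expectation, each occupied coarse cell retains a single record, flattening the spatial pattern of sampling effort.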

| Model evaluation and analysis
Ten-fold cross-validation was used in model building and evaluation, with models in turn fitted to nine blocks and tested on the remaining block. Blocks for internal evaluation (i.e. withheld portions of the same data used to fit the model) were created either by randomly assigning pixels to blocks, or by using spatial systematic or environmental blocking, as implemented in the blockCV package (Valavi et al., 2019). For the systematic blocking, the range of spatial autocorrelation in the environmental datasets was explored and a grid size was selected that reflected this scale. Environmental blocks were created using a k-means clustering algorithm to identify similar environmental conditions based on environmental covariates, and species occurrences were then assigned to one of these environmental blocks (Valavi et al., 2019). At each iteration, spatially independent test data were also generated by taking a random unbiased sample from the virtual species distribution equal in size to the test block. These data represent the type that might be collected with independent, spatially unbiased fieldwork (i.e. presence-absence validation data) and provide a measure of how model performance would be judged with spatially independent test data. The area under the curve (AUC), continuous Boyce index, and Spearman's rank (r_s) and Pearson's (r_p) correlation coefficients were calculated on each test block. Analysis was based on the mean of the 10-fold cross-validation for each test statistic, calculated for each virtual species and simulation scenario.
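Two of the threshold-independent metrics can be computed directly on a test block; for example (a minimal sketch using the rank-based definition of AUC; the toy predictions and presence-absence draws are illustrative):

```python
import numpy as np

def auc(pred, obs):
    """AUC as the Mann-Whitney statistic: the probability that a randomly
    chosen presence is scored above a randomly chosen absence."""
    pos, neg = pred[obs == 1], pred[obs == 0]
    gt = (pos[:, None] > neg[None, :]).mean()
    eq = (pos[:, None] == neg[None, :]).mean()
    return gt + 0.5 * eq               # ties count half

# toy test block: predicted suitability vs presence-absence observations
rng = np.random.default_rng(0)
suitability = rng.random(500)                            # model predictions
observed = (rng.random(500) < suitability).astype(int)   # PA validation data
test_auc = auc(suitability, observed)    # informative model -> AUC > 0.5
r_p = np.corrcoef(suitability, observed)[0, 1]           # Pearson's r_p
```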
In evaluating model performance, we aimed to identify the SSB correction methods with the greatest likelihood of producing an improvement in model predictive performance, and the combination of SSB correction method and internal evaluation approach that provides the most accurate assessment of the true model predictive performance. To do this, we calculated the difference in each test metric (Δ[metric]) between corrected and uncorrected SDMs. We also ask whether tuning the implementation of the SSB correction using internal cross-validation consistently identifies the true best model and consistently avoids the true worst model.

FIGURE 1 Effect of spatial sampling bias correction on model predictive performance, quantified for SDMs built using presence-only data with and without attempts to correct for spatial sampling bias and evaluated using (a) independent or (b) internal (i.e. withheld) test data. Results were extracted from published studies, with each point estimate representing the mean within-study effect size (Hedges' g ± 95% confidence intervals) for a particular method of SSB correction (shown on the y-axis). See Figure S2 for links to study citations.
The overall effects of sample size and species prevalence on SSB correction were evaluated using linear models, fitted separately for each metric and blocking approach. The response variable Δ[metric] was modelled as a function of the SSB strength (none, weak, moderate and strong), SSB correction method, number of presences used to build SDMs, and species prevalence, plus all two-way interactions between correction method, prevalence and number of presences. Species ID was included as a random intercept. Models were fitted as robust linear mixed-effects models using the robustlmm package (Koller, 2016) to account for the presence of extreme values in estimates of the mean effects. Marginal effects were calculated using the ggeffects package (Lüdecke, 2018).
All analysis and simulation modelling were conducted in R 4.3.1 (R Core Team, 2023) and code for the analysis is available at https://github.com/davidjbaker79/spatialSamplingBiasPerformance.

| Meta-analysis of effects of SSB on SDM performance
Only 10 studies were identified that tested SDMs built using PO data on independent data and also reported sufficient information to calculate standardised effect sizes. These studies produced 13 effect size estimates across five SSB correction method types, and in all cases the 95% CI for the mean standardised effect size (Hedges' g) overlapped zero (Figure 1a). However, the mean effect size across all methods was 0.35 (−0.66, 1.57) and the posterior probability of Hedges' g > 0 (i.e. a positive effect of SSB correction) was 0.77 (evidence ratio = 3.27), suggesting weak support for a positive effect of SSB correction on model performance (Table S2). This positive change in model performance, however, reflects a large change from a single study; the remaining studies showed a mean difference in AUC of −0.028, indicating on average a small decline in performance with SSB correction.
A further 31 studies provided comparisons between corrected and uncorrected models based on internal cross-validation approaches, and these studies produced 55 effect size estimates over nine classes of SSB correction method. There was much greater variation in standardised effects between studies and between SSB correction methods for models evaluated using internal cross-validation (Figure 1b). Of the 14 studies evaluating the species target-group approach for adjusting the background data, six showed support for a decline in the mean effect size when SSB correction was implemented in the SDM (i.e. the 95% CI did not include zero), while one showed an increase. Similar patterns of variation in mean effect sizes were seen across the other methods.
However, no significant difference in mean effect sizes was found for AdjBkGrd (Effort) and Covariate (CondPred). The estimated mean Hedges' g across all studies was −0.57 (−1.29, 0.12) and the posterior probability that Hedges' g > 0 (i.e. a positive effect of SSB correction) was 0.05 (evidence ratio = 0.05), suggesting little support for a positive effect of SSB correction on model performance when evaluated on internal test data (Table S3). For models evaluated using AUC, the mean difference between models with and without SSB correction was −0.016, suggesting a small decrease in model performance on average.

| Performance of SSB correction methods and evaluation strategies (simulation)
Simulations support observations from the meta-analysis, showing that the effects of SSB correction on true model predictive performance were on average small (mean absolute ΔAUC: OccFilter (Geo) = 0.003; AdjBkGrd (Occurrence) = 0.003; AdjBkGrd (Feature) = 0.017; Covariate (CondPred) = 0.015). Generally, across all model scenarios, SSB correction was equally likely to decrease as to increase true model performance (Table 3), with the exception of the AdjBkGrd (Feature) method, where performance improved with strong or no SSB but worsened with weak or moderate SSB, and Covariate (CondPred), where ΔAUC increased in up to 84% of the models with strong SSB, giving a mean change of 0.048 to 0.06, depending on blocking strategy (Table 3). These broad patterns observed for AUC and random blocking were similar across evaluation metrics and blocking strategies (Tables S5-S7; Figures S4-S10).
The effect of SSB correction measured using AUC generally became increasingly positive with higher species prevalence, with the effect strongest for AdjBkGrd (Feature) (Figures S11-S22). The effect of the number of presence records used to model species distributions varied across methods and with species prevalence but was generally small. The effects of prevalence and number of presences differed slightly from the AUC results for Boyce and r_p, whereas effects on r_s were similar to AUC (Figures S11-S22).
Overall, agreement between internal cross-validation and independent evaluation in the direction of the effect of SSB correction on model performance was c. 50% for OccFilter (Geo) (49.5%), AdjBkGrd (Occurrence) (49.9%) and Covariate (CondPred) (49.4%). While agreement was slightly higher for AdjBkGrd (Feature) (62.5%), this reflected wide variation between different SSB strengths: agreement was good when SSB was strong (>84%), but consistently lower with weaker SSB strengths and for the other SSB correction methods (Figure 2; Table S8). For this method, agreement was also more variable between evaluation metrics and blocking methods, including being notably lower when measured using the continuous Boyce index with random or environmental blocking (Figures S4-S10; Tables S8-S11). Thus, for most methods and scenarios, internal cross-validation provides little information on whether SSB correction has improved true model performance.
Tuning of the SSB implementation using internal cross-validation identified the best model (n = 3 parameter settings), as evaluated on independent test data using AUC, in 15%-32% of models, depending on the strength of SSB and blocking strategy, and was on average highest with spatial systematic (26%) and lowest with random (20%) blocking (Tables S12-S15). Tuning using internal cross-validation avoided the worst model in c. 50% of model runs, with little difference on average across blocking scenarios, and was similarly effective using the continuous Boyce index and r_s, but less so using r_p. Together, these statistics suggest that tuning using internal cross-validation identifies the best model and avoids the worst model no better than, and often worse than, random.

| DISCUSSION
Here, we provide guidance on best approaches for implementing SSB correction in PO-SDMs when, as is most often the case, independent test data are unavailable to evaluate model predictive performance. The meta-analysis showed some support for a positive effect of SSB correction on model performance when evaluated using independent test data, suggesting that SSB correction does improve model performance on average, although no single method consistently increased model performance. Predictive performance measured using internal test data was much more variable and showed low correspondence with evaluations on independent test data. Our simulation results indicate that internal and spatially independent evaluation statistics generally have low concordance, with the exception of AdjBkGrd (Feature) when SSB was strong. The risk of causing decreases in predictive performance was c. 50% overall, but notably smaller for the AdjBkGrd (Feature) and Covariate (CondPred) methods in some cases. Importantly, tuning of the SSB implementation using internal cross-validation was not helpful in identifying the best models and often led to a higher likelihood of finding the worst model than a random implementation. Together, these results highlight the difficulty of applying SSB corrections in the absence of independent test data, but also emphasise the importance of understanding the processes generating bias in order to inform the implementation.
Adjusting background data relative to a known driver of SSB proved most reliable in our simulations when SSB was strong, but where the drivers of SSB are unknown these approaches will likely be difficult to implement, as illustrated by the poorer performance with weaker SSB. While several studies have investigated the drivers of SSB in opportunistic datasets (Meyer et al., 2016; Sousa-Baena et al., 2014), the Covariate (CondPred) method might achieve a better characterisation of the SSB structure in data under these conditions, assuming covariate proxies for the drivers can be identified. When identifying these proxies it is important to consider correlations between SSB drivers and a species' niche (Baker et al., 2022; Warton et al., 2013) because, as is common to regression adjustments, collinearity between SSB and occurrence can make the occurrence function unidentifiable (Fithian & Hastie, 2013).
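Where a covariate proxy for the SSB driver can be identified, this kind of background adjustment can be sketched in a few lines. The sketch below is illustrative only and is not the implementation used in the studies discussed here: the effort surface, the exponential decay shape and the decay rate are all assumptions a modeller would need to justify for their own system.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical landscape: candidate background cells with a known
# proxy for sampling effort (e.g. distance to the nearest road, km).
n_cells = 10_000
dist_to_road = rng.uniform(0, 20, n_cells)

# Assume recorder effort decays exponentially with distance; the
# decay rate (0.3 per km) is a tuning choice, not a measured value.
effort = np.exp(-0.3 * dist_to_road)

# Draw background points with probability proportional to effort,
# so background conditions mirror the biased sampling of presences.
weights = effort / effort.sum()
background_idx = rng.choice(n_cells, size=1000, replace=False, p=weights)

# The biased draw over-represents near-road cells relative to a
# uniform sample of the landscape.
mean_dist_background = dist_to_road[background_idx].mean()
mean_dist_landscape = dist_to_road.mean()
```

Because the background sample now shares the presence data's bias towards accessible cells, the model's contrast between presence and background conditions is less confounded with effort.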
OccFilter (Geo) and AdjBkGrd (Occurrence) are frequently used methods and are likely employed where the drivers of SSB are unknown. These methods make no particular assumptions about the structure of the SSB and are quite similar in performance (Vollering et al., 2019). Occurrence filtering has generally been shown to improve model performance over uncorrected models (Fourcade et al., 2014; Kramer-Schadt et al., 2013), although effect sizes for this method are often small regardless of the direction of the effect (Inman et al., 2021; Varela et al., 2014), which aligns with the results reported here. Larger improvements in model performance have been documented, but these studies also tend to highlight substantial variation in the differences between filtered and unfiltered data depending on the implementation of occurrence filtering (Kramer-Schadt et al., 2013; Varela et al., 2014). Clearly, the exact implementation of filters can be important, and considerable care should be taken to test and validate filtering decisions.
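As an illustration of the implementation decisions involved, a minimal grid-based geographic thinning sketch follows; the cell size is the key tuning choice, and this is not the specific filter used in the studies cited above.

```python
import numpy as np

def grid_thin(coords, cell_size):
    """Keep at most one occurrence record per grid cell.

    coords: (n, 2) array of x/y coordinates.
    cell_size: grid resolution in the same units as coords.
    Returns the thinned (m, 2) array (first record per cell kept).
    """
    coords = np.asarray(coords)
    cells = np.floor(coords / cell_size).astype(int)
    # return_index gives the index of the first record in each cell
    _, keep = np.unique(cells, axis=0, return_index=True)
    return coords[np.sort(keep)]

# Example: two clusters of records thinned to one record per 1x1 cell
pts = np.array([[0.1, 0.2], [0.3, 0.4], [2.5, 2.5], [2.9, 2.1]])
thinned = grid_thin(pts, cell_size=1.0)
```

Doubling `cell_size` can markedly change how many records survive in clustered data, which is one concrete reason filtering decisions need validation.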
Another frequently used alternative for altering the background sample when the drivers of SSB are unknown is the species target-group method (AdjBkGrd (sppTgGrp)). The meta-analysis showed inconsistent effects of this approach on model performance, but large improvements are clearly possible based on independent evaluation (Figure 1). Warton et al. (2013) comment that the target-group approach to altered background correction replaces observer effort bias with species richness bias, and this likely explains the poor performance of the target-group SSB correction approach in some studies. We did not include this approach in our simulation study because of the complexity of exploring, within a relatively simple simulation, the consequences of choosing different target groups for this richness bias. This approach does, however, warrant further detailed study to determine when it is reliable for correcting SSB and when greater caution is warranted.
Several studies have raised concerns over the use of internal cross-validation to evaluate models when independent test data are not available, because the internal test data inevitably contain the same biases as the training dataset, which can make it difficult to detect improvements in model predictive performance (Matutini et al., 2021; Roberts et al., 2017). Our results strongly support these concerns, but also show that the problem is not equal across SSB correction methods.
Discrimination metrics (i.e. the ability to correctly classify presences and absences) dominated the SDM studies reviewed here, but several studies have cautioned against using such metrics because of a potential lack of sensitivity to changes in model calibration caused by the SSB correction (Fourcade et al., 2014; Stolar & Nielsen, 2015; Dubos et al., 2022). These metrics have been shown to be inappropriate for presence-only data (Leroy et al., 2018) and have proven misleading when calculated using internal cross-validation (Dubos et al., 2022).
Improved calibration is likely to be more useful than increased discrimination when the purpose of modelling is to generate spatially continuous predictions of habitat suitability (Warren et al., 2020), which is most frequently the intended use of the model outputs. Our simulation results show moderate agreement between discrimination and correlation metrics, suggesting that in most cases the same models are likely to be identified as the best performing regardless of the test statistic used. However, it is good practice to use multiple test statistics to assess model performance, as divergences between statistics may reveal important characteristics of model performance that might be overlooked by a single metric (Elith & Graham, 2009; Dubos et al., 2022). The development of metrics designed specifically to measure the effects of SSB correction might provide considerable advantages over existing metrics; a recent example is the Relative Overlap Index, which measures the effect of correction relative to model stochasticity (Dubos et al., 2022). Furthermore, while small differences were observed with different blocking methods, particularly for environmental blocking, the broad conclusions about model performance were not altered by these differences. Nevertheless, there are likely to be circumstances where the blocking specification becomes more important, such as where spatial or temporal autocorrelation in the data is strong, and we therefore recommend checking for consistency across different blocking approaches (Valavi et al., 2019).
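Reporting multiple test statistics is cheap in practice. The sketch below computes a discrimination metric (AUC, via the Mann-Whitney rank statistic) and a rank-correlation metric (Spearman's r_s) for the same toy predictions; the data are hypothetical and ties are ignored for brevity.

```python
import numpy as np

def auc_score(labels, scores):
    """Discrimination: AUC via the Mann-Whitney rank statistic."""
    labels = np.asarray(labels, bool)
    ranks = np.argsort(np.argsort(scores)) + 1  # 1-based ranks (no ties)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def spearman_rs(x, y):
    """Rank correlation between predicted and reference suitability."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return np.corrcoef(rx, ry)[0, 1]

# Toy example: predictions compared against labels and against a
# reference suitability surface (available here only by construction)
labels = np.array([1, 1, 1, 0, 0, 0])
pred = np.array([0.9, 0.8, 0.4, 0.5, 0.3, 0.1])
truth = np.array([0.95, 0.7, 0.45, 0.5, 0.2, 0.05])

auc = auc_score(labels, pred)
rs = spearman_rs(pred, truth)
```

Divergence between the two numbers (good ranking of sites but poor classification of presences, or vice versa) is exactly the kind of signal a single metric would hide.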
In conclusion, while corrections to account for SSB are widely used when building SDMs using presence-only data, there is still considerable uncertainty as to whether and when these efforts lead to improved model performance in real-world modelling scenarios (Dubos et al., 2022). Model predictive performance evaluated on independent test data suggests that effects on model performance are typically small, but improvements are clearly possible. Simulations show that decreases in model performance are common, suggesting that great caution is required when applying these methods where independent test data are not available. Despite these concerns, SSB remains important, and we therefore recommend trying to understand the drivers of SSB in a study system before building SDMs.

Δ[metric] was calculated between corrected and uncorrected SDMs using spatially independent test data, after identifying the model with the highest internal cross-validation score (e.g. optimising buffer distance or the shape of the distance decay for weighting background points). This represents the real-world usage scenario where SSB correction might be tuned using internal cross-validation to identify the parameterisation with the strongest positive effect on predictive performance. We then plot the mean (±SD) Δ[metric] against the percentage agreement (0%-100%) in the direction of Δ[metric] measured using internal versus independent cross-validation; the former measures the true change in performance with SSB correction, while the latter indicates the degree to which internal evaluation provides information on the true change in model performance. The mean difference in model performance for a given SSB correction method was calculated as the grand mean and pooled SD of the 10-fold cross-validation metrics across virtual species, accounting for variation within and between virtual species.
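The grand mean and pooled SD described above can be computed as follows; the pooled-variance form (within-species plus between-species components) is the standard decomposition and is assumed here, as the exact weighting is not spelled out in the text.

```python
import numpy as np

def grand_mean_pooled_sd(fold_means, fold_sds, n_folds_per_group):
    """Combine per-species CV summaries of a Δ[metric] into one estimate.

    fold_means / fold_sds: per-species mean and SD across CV folds.
    n_folds_per_group: number of folds contributing to each summary.
    Pooled SD combines variation within species (fold SDs) and
    between species (spread of species means).
    """
    means = np.asarray(fold_means, float)
    sds = np.asarray(fold_sds, float)
    n = np.asarray(n_folds_per_group, float)
    grand_mean = np.average(means, weights=n)
    within = np.sum((n - 1) * sds**2)
    between = np.sum(n * (means - grand_mean) ** 2)
    pooled_sd = np.sqrt((within + between) / (n.sum() - 1))
    return grand_mean, pooled_sd

# Three hypothetical virtual species, 10 folds each
gm, psd = grand_mean_pooled_sd([0.02, -0.01, 0.05],
                               [0.03, 0.02, 0.04],
                               [10, 10, 10])
```

Note that the pooled SD exceeds any single species' fold SD here because the species means themselves disagree in sign, which is exactly the between-species variation the summary is meant to capture.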

TABLE 3 The percentage of SDMs (simulations) showing an increase or decrease in AUC with SSB correction, and the mean change in AUC (ΔAUC) measured across all models, for four different SSB correction methods, four different strengths of SSB, and three blocking methods (random, spatial systematic and environmental).

FIGURE 2 The mean difference (μ ± SD) in the true change (Δ) in AUC between presence-background SDMs with and without SSB correction, evaluated using independent test data (y-axis), and the agreement in the direction of change in model predictive performance (ΔAUC) between SDMs with and without SSB correction, measured using internal versus independent cross-validation (x-axis). The panels show results for models where the data used in the SDMs were sampled with (a) moderate and (b) strong SSB. Blocking was either random, spatially systematic or environmental. N indicates the number of occurrence records in the virtual sample available for modelling and Prev. indicates the prevalence of the virtual species in the landscape. On the y-axis, positive values indicate improved model performance with SSB correction compared to the uncorrected model.

Inman et al. (2021): Maxent with SSB correction via OccFilter (Geo or Env) and AdjBkGrd (Occurrence) [the latter implemented using FactorBiasOut (biased prior) in Maxent based on a kernel density of observations].
• OccFilter (Geo) improved model performance in 82% of cases.
• OccFilter (Env) improved model performance in 55% of cases.
• AdjBkGrd (Occurrence) improved model performance in 96% of cases.
AdjBkGrd (Feature): As above, but based on a buffer or distance decay from landscape features (e.g. roads and cities).
AdjBkGrd (Effort): As above, but based on the known distribution of effort, such as a buffer or distance decay from shipping routes.
Covariate (Observed): A covariate included in the model solely to address variations in sampling (e.g. distance from road), but where predictions are made assuming the observed value of the variable across space.
Covariate (CondPred): As above, but where predictions are made assuming a constant value of the variable across space (e.g. setting distance from road to zero across all areas).
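The distinction between the two Covariate variants is purely a prediction-time choice. A minimal sketch with illustrative (not fitted) logistic-regression coefficients, where `dist_to_road` stands in for a hypothetical sampling covariate:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical fitted PB model: suitability driven by an environmental
# covariate (env) plus a sampling covariate (dist_to_road) included
# solely to soak up SSB. Coefficients are illustrative, not fitted.
b0, b_env, b_road = -1.0, 2.0, -0.5

env = np.array([0.2, 0.5, 0.8])
dist_to_road = np.array([0.1, 3.0, 6.0])  # far cells were under-sampled

# Covariate (Observed): predict with the observed sampling covariate;
# predictions still confound suitability with recording effort.
pred_observed = sigmoid(b0 + b_env * env + b_road * dist_to_road)

# Covariate (CondPred): hold the sampling covariate at a constant
# (here zero, i.e. "as if every cell sat next to a road").
pred_condpred = sigmoid(b0 + b_env * env + b_road * 0.0)
```

In this toy case the Observed predictions decline with distance from roads despite rising environmental suitability, while the CondPred predictions track the environmental covariate alone, which is the point of fixing the sampling covariate at prediction time.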