Geographic selection bias of occurrence data influences transferability of invasive Hydrilla verticillata distribution models



Due to socioeconomic differences, the accuracy and extent of reporting on the occurrence of native species differs among countries, which can impact the performance of species distribution models. We assessed the importance of geographical biases in occurrence data on model performance using Hydrilla verticillata as a case study. We used Maxent to predict potential North American distribution of the aquatic invasive macrophyte based upon training data from its native range. We produced a model using all available native range occurrence data, then explored the change in model performance produced by omitting subsets of training data based on political boundaries. We also compared those results with models trained on data from which a random sample of occurrence data was omitted from across the native range. Although most models accurately predicted the occurrence of H. verticillata in North America (AUC > 0.7600), data omissions influenced model predictions. Omitting data based on political boundaries resulted in larger shifts in model accuracy than omitting randomly selected occurrence data. For well-documented species like H. verticillata, missing records from single countries or ecoregions may minimally influence model predictions, but for species with fewer documented occurrences or poorly understood ranges, geographic biases could misguide predictions. Regardless of focal species, we recommend that future species distribution modeling efforts begin with a reflection on potential spatial biases of available occurrence data. Improved biodiversity surveillance and reporting will provide benefit not only in invaded ranges but also within under-reported and unexplored native ranges.