
The uncertain nature of absences and their importance in species distribution modelling


  • Jorge M. Lobo,

  • Alberto Jiménez-Valverde,

  • Joaquín Hortal

J. M. Lobo, Dept Biodiversidad y Biología Evolutiva, Museo Nacional de Ciencias Naturales, c/ José Gutiérrez Abascal 2, ES-28006, Madrid, Spain. – A. Jiménez-Valverde, Natural History Museum and Biodiversity Research Center, The Univ. of Kansas, Lawrence, KS 66045, USA. – J. Hortal, NERC Centre for Population Biology, Div. of Biology, Imperial College London, Silwood Park Campus, Ascot, Berkshire SL5 7PY, UK.


Species distribution models (SDMs) are commonly used to obtain hypotheses about either the realized or the potential distribution of species. The reliability and meaning of these hypotheses depend on the kind of absences included in the training data, the variables used as predictors and the methods employed to parameterize the models. Information about the absence of species from particular localities is usually lacking, so pseudo-absences are often incorporated into the training data. We explore the effect of using different kinds of pseudo-absences on SDM results. To do this, we use presence information on Aphodius bonvouloiri, a dung beetle species with a well-known distribution. We incorporate different types of pseudo-absences to create training datasets that account for absences of methodological (i.e. false absences), contingent and environmental origin. We use these datasets to calibrate SDMs, with generalized additive models (GAMs) as the modelling technique and climatic variables as predictors, and compare the results with independently created geographical representations of the potential and realized distributions of the species. Our results confirm that the kind of absences used determines which aspect of the species distribution the SDM identifies. Estimating the potential distribution requires absences located farther from the presences, in geographic and/or environmental space, than estimating the realized distribution. Methodological absences produce poor models overall, and absences that are too far from the presence points in either environmental or geographic space may be uninformative, yielding substantial overestimations. Generalized linear models (GLMs) and artificial neural networks yielded similar results. Synthetic discrimination measures such as the area under the receiver operating characteristic curve (AUC) must be interpreted with caution, as they can produce misleading comparative results. Instead, the joint examination of omission and commission errors provides a better understanding of the reliability of SDM results.
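The caution about AUC can be illustrated with a toy calculation. The sketch below is not the authors' procedure and uses invented suitability scores: it assumes a single already-fitted model, evaluates its predictions against two hypothetical sets of absences (one environmentally close to the presences, one far away and trivially unsuitable), and uses an arbitrary threshold of 0.5 to convert scores into predicted presence or absence. The names `auc`, `omission_commission`, `near_absences` and `distant_absences` are illustrative only.

```python
from typing import Sequence, Tuple


def auc(presence: Sequence[float], absence: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen presence scores higher than a randomly chosen absence
    (Mann-Whitney formulation; ties count as one half)."""
    wins = 0.0
    for p in presence:
        for a in absence:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence) * len(absence))


def omission_commission(
    presence: Sequence[float], absence: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Omission: fraction of presences predicted absent (score < threshold).
    Commission: fraction of absences predicted present (score >= threshold)."""
    omission = sum(s < threshold for s in presence) / len(presence)
    commission = sum(s >= threshold for s in absence) / len(absence)
    return omission, commission


# Hypothetical predicted suitabilities from one fitted model (invented numbers).
presences = [0.9, 0.8, 0.7, 0.6, 0.4]
near_absences = [0.75, 0.65, 0.5, 0.3, 0.2]      # environmentally close to presences
distant_absences = [0.05, 0.02, 0.01, 0.0, 0.0]  # far away, trivially unsuitable

for label, absences in [
    ("near only", near_absences),
    ("near + distant", near_absences + distant_absences),
]:
    om, com = omission_commission(presences, absences, threshold=0.5)
    print(f"{label:15s} AUC={auc(presences, absences):.2f} "
          f"omission={om:.2f} commission={com:.2f}")
```

With these made-up scores, adding the distant absences raises AUC from 0.76 to 0.88 and halves the commission rate (0.60 to 0.30), while omission stays at 0.20. The model itself has not changed; the apparent improvement comes entirely from absences that are easy to classify, which is why looking at omission and commission separately, and at where the absences lie, is more informative than a single summary score.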