Selecting thresholds for the prediction of species occurrence with presence-only data
Article first published online: 7 JAN 2013
© 2013 Blackwell Publishing Ltd
Journal of Biogeography
Volume 40, Issue 4, pages 778–789, April 2013
How to Cite
Liu, C., White, M., Newell, G. (2013), Selecting thresholds for the prediction of species occurrence with presence-only data. Journal of Biogeography, 40: 778–789. doi: 10.1111/jbi.12058
- Issue published online: 16 MAR 2013
- lift curve
- ROC curve
- species distribution model
Species distribution models have been widely used to tackle ecological, evolutionary and conservation problems. Most species distribution modelling techniques produce continuous suitability predictions, but many real applications (e.g. reserve design, species invasion and climate change impact assessment) and model evaluations require binary outputs, and thresholds are needed for these transformations. Although there are many threshold selection methods for presence/absence data, it is unclear whether these are suitable for presence-only data. In this paper, we investigate mathematically and empirically which of the existing threshold selection methods can be used confidently with presence-only data.
We used real spatially explicit environmental data derived from the western part of the state of Victoria, south-eastern Australia, and simulated species distributions within this area.
Thirteen existing threshold selection methods were investigated mathematically to see whether the same threshold can be produced using either presence/absence data or presence-only data. We further adopted a simulation approach, created many virtual species with differing prevalences in a real landscape in south-eastern Australia, generated data sets with different proportions of pseudo-absences, built eight types of models with four modelling techniques, and investigated the behaviours of four threshold selection methods in these situations.
Three threshold selection methods were not affected by pseudo-absences: max SSS (which is based on maximizing the sum of sensitivity and specificity), the prevalence of model training data, and the mean predicted value of a set of random points. Max SSS produced higher sensitivity in most cases, and higher true skill statistic and kappa in many cases, than the other methods. The remaining methods produced thresholds from presence-only data that differed from those determined from presence/absence data.
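The max SSS criterion described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `max_sss_threshold`, the toy scores and labels, and the tie-breaking rule (the lowest qualifying threshold wins) are all assumptions made for illustration. Labels of 0 stand for absences; with presence-only data they would be pseudo-absences such as random background points.

```python
def max_sss_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) maximizing sensitivity + specificity.

    scores: predicted suitability values, one per site
    labels: 1 for presence, 0 for absence (or pseudo-absence)
    A site is predicted present when its score >= threshold.
    Hypothetical helper; ties are resolved in favour of the lowest threshold.
    """
    n_pres = sum(labels)
    n_abs = len(labels) - n_pres
    if n_pres == 0 or n_abs == 0:
        raise ValueError("need at least one presence and one absence")

    best = None
    # Every distinct predicted value is a candidate threshold.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        sens = tp / n_pres  # proportion of presences correctly predicted
        spec = tn / n_abs   # proportion of absences correctly predicted
        if best is None or sens + spec > best[1] + best[2]:
            best = (t, sens, spec)
    return best


# Toy data (made up for illustration, not taken from the study).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
threshold, sens, spec = max_sss_threshold(scores, labels)
print(threshold, sens, spec)  # 0.4 1.0 0.75
```

In this toy case the thresholds 0.4 and 0.7 give the same sum (1.75), and the sketch keeps the lower one, which favours sensitivity. A different tie-breaking rule would be an equally reasonable choice.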
Max SSS is a promising method for threshold selection when only presence data are available.