How to assess the prediction accuracy of species presence–absence models without absence data?
Article first published online: 17 APR 2013
© 2013 The Authors
Volume 36, Issue 7, pages 788–799, July 2013
How to Cite
Li, W. and Guo, Q. (2013), How to assess the prediction accuracy of species presence–absence models without absence data?. Ecography, 36: 788–799. doi: 10.1111/j.1600-0587.2013.07585.x
- Issue published online: 17 JUN 2013
- Paper manuscript accepted 12 February 2013
In ecological niche modeling, it is very common for only presence data to be available. However, most existing methods for evaluating the accuracy of presence–absence (binary) predictions of species require presence–absence data. The aim of this study is to present a new method for accuracy assessment that does not rely on absence data.
Two new statistics, Fpb and Fcpb, were derived based on presence–background data. Using six generated virtual species, we applied DOMAIN, generalized linear modeling (GLM), and maximum entropy (MAXENT) to produce different species presence–absence predictions. To investigate the effectiveness of the new statistics in accuracy assessment, we used Fpb, Fcpb, the traditional F-measure (F), kappa coefficient, true skill statistic (TSS), area under the receiver operating characteristic curve (AUC), and the contrast validation index (CVI) to evaluate the accuracy of the predictions, and we compared the behaviors of these accuracy measures. The effectiveness of Fpb for threshold selection and estimation of species prevalence was also investigated.
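For orientation, the three traditional presence–absence measures named above (F, kappa, and TSS) can be sketched from a confusion matrix as follows. This is a minimal illustration using the standard textbook definitions, not the authors' code. The function name `binary_accuracy` and the example counts are hypothetical. The formulas for Fpb and Fcpb are not stated in this abstract, so they are deliberately not reproduced here.

```python
def binary_accuracy(tp, fp, fn, tn):
    """Standard accuracy measures for a binary presence-absence prediction.

    tp: predicted present, truly present
    fp: predicted present, truly absent
    fn: predicted absent, truly present
    tn: predicted absent, truly absent
    (Hypothetical helper; requires true absence data, unlike Fpb/Fcpb.)
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)   # recall on presences
    specificity = tn / (tn + fp)   # recall on absences
    precision = tp / (tp + fp)

    # Traditional F-measure: harmonic mean of precision and sensitivity.
    f_measure = 2 * precision * sensitivity / (precision + sensitivity)

    # True skill statistic: sensitivity + specificity - 1.
    tss = sensitivity + specificity - 1

    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = (tp + tn) / n
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (p_observed - p_expected) / (1 - p_expected)

    return {"F": f_measure, "kappa": kappa, "TSS": tss}


# Illustrative counts only; not data from the study.
metrics = binary_accuracy(tp=40, fp=10, fn=20, tn=30)
print(metrics)
```

All three measures need the false-positive and true-negative counts, which is why they cannot be computed from presence-only data.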
Experimental results show that Fcpb is an estimate of F. The Pearson's correlation coefficient (COR) between Fcpb and F is 0.9882, with a root-mean-square error (RMSE) of 0.0171. In general, Fpb, Fcpb, F, kappa coefficient, TSS, and CVI can rank models by the accuracy of their binary predictions, whereas AUC is not appropriate for evaluating the accuracy of binary predictions. For DOMAIN, GLM, and MAXENT, selecting the threshold by maximizing Fpb yields accuracies similar to those obtained by maximizing F. In addition, estimating species prevalence from the binary output, with the threshold chosen by maximizing Fpb, is significantly more accurate than simply averaging the original continuous output. The best estimate of prevalence is provided by the binary output of MAXENT, with an RMSE of 0.0116.
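The thresholding and prevalence comparison described above can be sketched roughly as follows. This is an assumption-laden illustration, not the authors' procedure: because the Fpb formula is not given in this abstract, the sketch maximizes the traditional F-measure (which needs true labels) as a stand-in for maximizing Fpb. All function names and the toy data are hypothetical.

```python
import math


def f_measure(y_true, y_pred):
    """Traditional F-measure, written as 2TP / (2TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_threshold(y_true, scores):
    """Scan candidate thresholds and keep the one with the highest F.

    Stand-in for the paper's 'maximize Fpb' rule (assumption).
    """
    best_t, best_f = None, -1.0
    for t in sorted(set(scores)):
        pred = [1 if s >= t else 0 for s in scores]
        f = f_measure(y_true, pred)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


def prevalence_estimates(scores, threshold):
    """Return (prevalence from binary output, mean of continuous output)."""
    binary = sum(1 for s in scores if s >= threshold) / len(scores)
    continuous = sum(scores) / len(scores)
    return binary, continuous


def rmse(estimates, truths):
    """Root-mean-square error between estimates and true values."""
    return math.sqrt(
        sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)
    )


# Toy data for illustration only; not from the study.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]
threshold, f_at_threshold = best_threshold(y_true, scores)
print(threshold, f_at_threshold)
print(prevalence_estimates(scores, threshold))
```

In a study like the one summarized here, `rmse` would be applied across species or replicates to compare prevalence estimates against the known prevalence of the virtual species.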
Finally, we conclude that the new method is promising in accuracy assessment, threshold selection, and estimation of species prevalence, all of which are important but challenging problems with presence-only data. Because it does not require absence data, the new method will have important applications in ecological niche modeling.