Biodiversity Viewpoint
On evaluating species distribution models with random background sites in place of absences when test presences disproportionately sample suitable habitat
Article first published online: 19 JAN 2013
DOI: 10.1111/ddi.12031
© 2013 Blackwell Publishing Ltd
Issue

Diversity and Distributions
Early View (Online Version of Record published before inclusion in an issue)
Additional Information
How to Cite
Smith, A. B. (2013), On evaluating species distribution models with random background sites in place of absences when test presences disproportionately sample suitable habitat. Diversity and Distributions. doi: 10.1111/ddi.12031
Funded by
- US Institute of Museum and Library Services
Keywords:
- AUC;
- background sites;
- biased data;
- model evaluation;
- species distribution models
Abstract
When modelling the distribution of rare and invasive species, reliable absences for evaluating model performance are often unavailable. However, predictions at randomly located sites, or ‘background’ sites, can stand in for true absences. The maximum value of the area under the receiver operating characteristic curve (AUC) calculated with background sites is believed to be 1 − a/2, where a is the typically unknown prevalence of the species on the landscape. Using a simple example of a species' range, I show how AUC can achieve values > 1 − a/2 when test presences do not represent each inhabited region of a species' range in proportion to its area. Values of AUC that surpass 1 − a/2 are associated with higher model predictions in areas overrepresented in the test data set, even if those areas are less environmentally suitable than other regions the species occupies. Pursuit of high AUC values can encourage the inclusion of spurious predictors in the final model if they help to differentiate areas with disproportionate representation in the test data. When test presences disproportionately represent inhabited areas, modelling choices made to increase AUC calculated with background sites, on the assumption that higher scores connote more accurate models, can decrease actual accuracy.
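The 1 − a/2 ceiling follows from a standard argument: if a model gives every occupied cell the same top score and every other cell a lower one, a background site falls inside the range with probability a, where it ties a test presence (a tie counts as one half); otherwise the presence scores higher. AUC is therefore (1 − a) + a/2 = 1 − a/2. The minimal Python sketch below illustrates both that ceiling and the abstract's main point on a hypothetical landscape. The 100-cell grid, the two equal-area occupied regions, the prediction values and the helper names (`auc`, `predictions`, `evaluate`) are illustrative assumptions, not the author's example or code. Background sites are taken as every cell once, a deterministic stand-in for random sampling.

```python
# Hypothetical landscape (an assumption for illustration): 100 equal-area cells.
# The species occupies cells 0-19 (prevalence a = 0.2), split into two
# equally sized inhabited regions, A and B.
N_CELLS = 100
REGION_A = range(0, 10)
REGION_B = range(10, 20)
prevalence = (len(REGION_A) + len(REGION_B)) / N_CELLS


def auc(presence_scores, background_scores):
    """Pairwise (Mann-Whitney) AUC: the fraction of (presence, background)
    pairs in which the presence scores higher, counting ties as one half."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def predictions(score_a, score_b):
    """Hypothetical model output: score_a in region A, score_b in region B,
    and 0.0 everywhere outside the species' range."""
    preds = [0.0] * N_CELLS
    for cell in REGION_A:
        preds[cell] = score_a
    for cell in REGION_B:
        preds[cell] = score_b
    return preds


# Model 1 matches the true range exactly; model 2 under-predicts region B,
# even though the species occupies B just as fully as A.
true_range = predictions(1.0, 1.0)
under_predicts_b = predictions(1.0, 0.5)

# Background sites: every cell once, so some fall inside the range.
background_cells = list(range(N_CELLS))

# Test presences: proportional to area (5 per region) vs. region A only.
proportional_test = list(range(0, 20, 2))
region_a_only_test = list(REGION_A)


def evaluate(preds, test_cells):
    """AUC of a model's predictions at test presences vs. background sites."""
    return auc([preds[c] for c in test_cells],
               [preds[c] for c in background_cells])


print(f"1 - a/2: {1 - prevalence / 2:.2f}")
print(f"true range, proportional test: {evaluate(true_range, proportional_test):.2f}")
print(f"under-predicts B, proportional test: {evaluate(under_predicts_b, proportional_test):.2f}")
print(f"true range, region-A-only test: {evaluate(true_range, region_a_only_test):.2f}")
print(f"under-predicts B, region-A-only test: {evaluate(under_predicts_b, region_a_only_test):.2f}")
```

Run as a script, this prints 0.90 for 1 − a/2 and for the first three evaluations, but 0.95 for the model that under-predicts region B when every test presence comes from region A. In this toy case the less accurate model exceeds the ceiling only because the test data over-represent one inhabited region; with proportional test presences it gains nothing. The pairwise loop is quadratic in sample size, which is fine for a toy landscape; a rank-based formula would be the usual choice for large data sets.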
