Modelling distribution and abundance with presence-only data
Article first published online: 20 DEC 2005
Journal of Applied Ecology
Volume 43, Issue 3, pages 405–412, June 2006
How to Cite
PEARCE, J. L. and BOYCE, M. S. (2006), Modelling distribution and abundance with presence-only data. Journal of Applied Ecology, 43: 405–412. doi: 10.1111/j.1365-2664.2005.01112.x
- Issue published online: 20 DEC 2005
- Received 12 May 2005; final copy received 1 June 2005 Editor: Rob Freckleton
Keywords
- logistic discrimination
- logistic regression
- presence-only studies
- resource selection functions
Summary
- 1. Presence-only data, for which there is no information on locations where the species is absent, are common in both animal and plant studies. In many situations, these may be the only data available on a species. We need effective ways to use these data to explore species distribution or species use of habitat.
- 2. Many analytical approaches have been used to model presence-only data, some inappropriately. We provide a synthesis and critique of statistical methods currently in use to both estimate and evaluate these models, and discuss the critical importance of study design in models where only presence can be identified.
- 3. Profile or envelope methods exist to characterize environmental covariates that describe the locations where organisms are found. Predictions from profile approaches are generally coarse, but may be useful when species records, environmental predictors and biological understanding are scarce.
- 4. Alternatively, one can build models to contrast environmental attributes associated with known locations with a sample of random landscape locations, termed either ‘pseudo-absences’ or ‘available’. Great care needs to be taken when selecting random landscape locations, because the way in which they are selected determines the modelling techniques that can be applied.
- 5. Regression-based models can provide predictions of the relative likelihood of occurrence, and in some situations predictions of the probability of occurrence. The logistic model is frequently applied, but can rarely be used directly to estimate these models; instead, case–control or logistic discrimination should be used depending on the sample design.
- 6. Cross-validation can be used to evaluate model performance and to assess how effectively the model reflects a quantity proportional to the probability of occurrence. However, more research is needed to develop a single measure or statistic that summarizes model performance for presence-only data.
- 7. Synthesis and applications. A number of statistical procedures are available to explore patterns in presence-only data; the choice among them depends on the quality of the presence-only data. Presence-only records can provide insight into the vulnerability, historical distribution and conservation status of species. Models developed using these data can inform management. Our caveat is that researchers must be mindful of study design and the biases inherent in presence data, and be cautious in the interpretation of model predictions.
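
The ideas in summary points 4 to 6 (contrasting presence records with random landscape locations, reading regression output as a relative likelihood, and checking the model by cross-validation) can be illustrated with a small simulation. What follows is only a minimal sketch under stated assumptions, not the authors' procedure. The single covariate `x`, the simulated selection strength `beta`, the function names, and the hand-written gradient-ascent fit are all hypothetical choices made for illustration. The held-out check at the end is a deliberately crude stand-in for a proper evaluation statistic, and with a single covariate it only tests the direction of the fitted effect.

```python
import math
import random


def sample_used_available(n_used, n_avail, beta, rng):
    """Simulate presence ('used') and random landscape ('available') points.

    Assumed setup: one covariate x ~ Uniform(-2, 2) across the landscape and
    a true relative selection strength w(x) = exp(beta * x).
    """
    w_max = math.exp(abs(beta) * 2.0)
    used = []
    while len(used) < n_used:
        # Rejection sampling: keep a location with probability w(x) / w_max.
        x = rng.uniform(-2.0, 2.0)
        if rng.random() < math.exp(beta * x) / w_max:
            used.append(x)
    # Available points are a simple random sample of the landscape.
    available = [rng.uniform(-2.0, 2.0) for _ in range(n_avail)]
    return used, available


def fit_logistic(xs, ys, steps=500, lr=0.5):
    """Fit y ~ logistic(b0 + b1 * x) by gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


def kfold_rank_check(used, available, k=5):
    """Crude k-fold check: for each fold, refit without the held-out presences
    and return the share of them scoring above the median available point."""
    shares = []
    for i in range(k):
        held_out = used[i::k]
        train = [x for j, x in enumerate(used) if j % k != i]
        xs = train + available
        ys = [1] * len(train) + [0] * len(available)
        _, b1 = fit_logistic(xs, ys)
        scores = sorted(math.exp(b1 * x) for x in available)
        median = scores[len(scores) // 2]
        above = sum(math.exp(b1 * x) > median for x in held_out)
        shares.append(above / len(held_out))
    return shares


def demo():
    rng = random.Random(42)  # fixed seed so the sketch is repeatable
    used, available = sample_used_available(300, 300, beta=1.0, rng=rng)
    xs = used + available
    ys = [1] * len(used) + [0] * len(available)
    # The intercept is discarded: it depends on the ratio of sample sizes,
    # so only the relative score exp(b1 * x) is interpreted here.
    _, b1 = fit_logistic(xs, ys)
    print(f"estimated slope: {b1:.2f} (simulated truth: 1.0)")
    print("relative likelihood, x=1 versus x=-1:", round(math.exp(2 * b1), 1))
    print("held-out shares above available median:", kfold_rank_check(used, available))


if __name__ == "__main__":
    demo()
```

In this sketch the fitted scores are treated only as proportional to the probability of occurrence. Turning them into actual probabilities would need extra information about the sampling design, which is the caution raised in point 5. If the model had no predictive value, roughly half of the held-out presences would score above the median available point, so shares well above one half suggest the ranking carries signal.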