Can we model the probability of presence of species without absence data?


  • Wenkai Li,

  • Qinghua Guo,

  • Charles Elkan

W. Li and Q. Guo, Sierra Nevada Research Inst., School of Engineering, Univ. of California at Merced, CA 95343, USA. – C. Elkan, Dept of Computer Science and Engineering, Univ. of California at San Diego, CA 92093, USA.


In ecological studies, it is useful to estimate the probability that a species occurs at given locations. The probability of presence can be modeled by traditional statistical methods, if both presence and absence data are available. However, the challenge is that most species records contain only presence data, without reliable absence data. Previous presence-only methods can estimate a relative index of habitat suitability, but cannot estimate the actual probability of presence.

In this study, we develop a presence and background learning algorithm (PBL) that successfully models the conditional probability of presence of a simulated species. The model is trained on two completely separate data sets: observed presence data and background data. Assuming that the probability of presence is one for ‘prototypical presence’ locations, where the habitat is maximally suitable for a species, we can estimate a constant that calibrates the trained model into the actual probability of presence. Experimental results show that the PBL method performs similarly to a presence-absence method, and significantly better than the widely used maximum entropy method. The new algorithm enables us to model the probability that a species occurs, conditional on environmental covariates, without absence data. Hence, it has the potential to improve modeling of the geographical distributions of species.
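The calibration idea can be illustrated with a minimal sketch, which is not necessarily the exact estimator used in the study. It assumes a classifier has already been trained to separate presence records from background points and that it outputs a score g(x) between 0 and 1. If the background is a random sample of the landscape, the odds g(x) / (1 - g(x)) are proportional to the probability of presence, so dividing by the odds at prototypical presence locations (where that probability is assumed to be one) yields a calibrated estimate. The function name, the `n_prototypes` parameter, and the toy numbers are hypothetical and chosen only for illustration.

```python
def calibrate_to_probability(g_scores, n_prototypes=1):
    """Rescale presence-vs-background scores into probabilities of presence.

    g_scores: outputs of a classifier trained to separate presence records
        (label 1) from background points (label 0), each in [0, 1).
    n_prototypes: hypothetical tuning knob (an assumption of this sketch);
        how many of the highest-scoring locations are treated as
        'prototypical presence' locations with probability one.
    """
    if not g_scores:
        raise ValueError("g_scores must not be empty")
    if n_prototypes < 1:
        raise ValueError("n_prototypes must be at least 1")
    if any(not 0.0 <= g < 1.0 for g in g_scores):
        raise ValueError("scores must satisfy 0 <= g < 1")

    # Odds of 'presence record' versus 'background point' at each location.
    odds = [g / (1.0 - g) for g in g_scores]

    # Calibration constant: average odds over the prototypical locations.
    top = sorted(odds, reverse=True)[:n_prototypes]
    reference = sum(top) / len(top)
    if reference == 0.0:
        raise ValueError("all scores are zero; nothing to calibrate against")

    # Probability of presence, capped at one.
    return [min(1.0, o / reference) for o in odds]


# Toy check with made-up numbers: build scores from a known probability
# of presence, then confirm that calibration recovers it.
true_prob = [0.0, 0.25, 0.5, 0.75, 1.0]
ratio = 0.3  # arbitrary constant linking odds to probability of presence
g = [ratio * p / (1.0 + ratio * p) for p in true_prob]
recovered = calibrate_to_probability(g)
```

In this toy setting the recovered values match `true_prob` up to floating-point error, because the highest-scoring location really does have a probability of presence of one. If no sampled location is maximally suitable, the same procedure would overestimate the probability everywhere, which is why the prototypical presence assumption matters.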