Presence-Only Data and the EM Algorithm
Article first published online: 28 AUG 2008
© 2008, The International Biometric Society
Volume 65, Issue 2, pages 554–563, June 2009
How to Cite
Ward, G., Hastie, T., Barry, S., Elith, J. and Leathwick, J. R. (2009), Presence-Only Data and the EM Algorithm. Biometrics, 65: 554–563. doi: 10.1111/j.1541-0420.2008.01116.x
- Issue published online: 28 MAY 2009
- Received September 2006. Revised April 2008. Accepted April 2008.
Keywords
- Boosted trees;
- EM algorithm;
- Logistic model;
- Presence-only data;
- Use-availability data
Summary
In ecological modeling of the habitat of a species, it can be prohibitively expensive to determine species absence. Presence-only data consist of a sample of locations with observed presences and a separate group of locations sampled from the full landscape, with unknown presences. We propose an expectation–maximization algorithm to estimate the underlying presence–absence logistic model for presence-only data. This algorithm can be used with any off-the-shelf logistic model. For models with stepwise fitting procedures, such as boosted trees, the fitting process can be accelerated by interleaving expectation steps within the procedure. Preliminary analyses based on sampling from presence–absence records of fish in New Zealand rivers illustrate that this new procedure can reduce both deviance and the shrinkage of marginal effect estimates that occur in the naive model often used in practice. Finally, it is shown that the population prevalence of a species is only identifiable when there is some unrealistic constraint on the structure of the logistic model. In practice, it is strongly recommended that an estimate of population prevalence be provided.
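The summary does not spell out the update equations, so the following is only a minimal sketch of how such an EM scheme might look, not the authors' implementation. It assumes that the unknown responses at background locations are imputed with the current fitted presence probabilities (E-step), that a logistic model is refit to the combined sample (M-step), and that the fitted intercept is corrected by log((n_p + π n_u) / (π n_u)), where n_p and n_u are the numbers of presence and background points and π is a user-supplied estimate of population prevalence. The function names `fit_logistic` and `em_presence_only` are hypothetical, and a plain NumPy Newton solver stands in for "any off-the-shelf logistic model".

```python
import numpy as np


def sigmoid(eta):
    # Clip the linear predictor to avoid overflow in exp.
    return 1.0 / (1.0 + np.exp(-np.clip(eta, -30.0, 30.0)))


def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Logistic regression by Newton's method; y may be fractional in [0, 1]."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = sigmoid(X @ beta)
        grad = X.T @ (y - mu)
        # Tiny ridge term keeps the Hessian invertible (an assumption of this sketch).
        hess = (X * (mu * (1.0 - mu))[:, None]).T @ X + 1e-8 * np.eye(X.shape[1])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def em_presence_only(X_pres, X_bg, prevalence, n_em=200, tol=1e-6):
    """Hypothetical EM loop for presence-only data with a supplied prevalence.

    X_pres: covariates at observed presences, shape (n_p, d)
    X_bg:   covariates at background locations, shape (n_u, d)
    Returns coefficients (intercept first) of the presence-absence model.
    """
    n_p, n_u = len(X_pres), len(X_bg)
    X = np.column_stack([np.ones(n_p + n_u), np.vstack([X_pres, X_bg])])
    # Assumed intercept correction for the presence/background sampling scheme.
    offset = np.log((n_p + prevalence * n_u) / (prevalence * n_u))
    y_bg = np.full(n_u, prevalence)  # initial guess for the unknown responses
    beta = np.zeros(X.shape[1])
    for _ in range(n_em):
        # M-step: fit the logistic model to presences (y = 1) and imputed background.
        beta_star = fit_logistic(X, np.concatenate([np.ones(n_p), y_bg]))
        beta_new = beta_star.copy()
        beta_new[0] -= offset
        # E-step: re-impute background responses from the corrected model.
        y_bg = sigmoid(X[n_p:] @ beta_new)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


# Synthetic illustration (simulated data, not the New Zealand fish records).
rng = np.random.default_rng(0)
x_pool = rng.normal(size=20000)
p_pool = sigmoid(-1.0 + 1.5 * x_pool)
is_present = rng.random(20000) < p_pool
X_pres = x_pool[is_present][:500, None]   # presence-only sample
X_bg = rng.normal(size=(2000, 1))         # background sample from the landscape
beta_hat = em_presence_only(X_pres, X_bg, prevalence=float(p_pool.mean()))
print("estimated (intercept, slope):", beta_hat)
```

In this sketch the prevalence is passed in rather than estimated, in line with the recommendation above that an estimate of population prevalence be provided. Swapping `fit_logistic` for another logistic fitting routine that accepts fractional responses would leave the outer loop unchanged.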