Location-Only and Use-Availability Data
The point process use-availability or presence-only likelihood and comments on analysis
- Use-availability and presence-only analyses are synonyms. Both require two samples (one containing known locations, one containing potential locations), both estimate the same parameters, and both use the same fundamental likelihood.
- Use-availability and presence-only designs compare characteristics of points where an organism was located to those where the organism could have been located. These designs can be generalized to estimate the relative probability that any event occurred at a set of locations.
- This article generalizes the use-availability likelihood given in Johnson et al. (Resource selection functions based on use-availability data: theoretical motivation and evaluation methods, Journal of Wildlife Management, 2006) to point locations. The derivation arrives at the same likelihood as Fithian & Hastie (Statistical Models for Presence-Only Data: Finite-Sample Equivalence and Addressing Observer Bias, 2012) but uses a different technique and allows a more general link function. Fithian & Hastie (2012) derive the likelihood from a case–control argument and Bayes' theorem, whereas this article maximizes the two-sample likelihood using Lagrange multipliers.
- Resource selection functions (RSFs), as defined here, are ratios of density functions. RSFs must be positive and unbounded. Proper link functions must provide proportionality over their entire range. Given these conditions, the exponential link is the most logical and appropriate link function for RSFs. These conditions exclude the logistic link.
- This article affirms that estimation of an RSF does not involve ‘running logistic regression’. By assigning 0 and 1 (pseudo-)responses to the vectors of covariates associated with locations in the used and available samples, it is possible to ‘trick’ logistic regression software into maximizing the use-availability likelihood. Representing the analysis as ‘logistic regression’ is misleading because it implies use of the logistic link, which is inappropriate for RSFs. It is more accurate to state that the ‘use-availability likelihood was maximized’.
- RSFs are more general, intuitive and useful than resource selection probability functions (RSPFs). RSPFs depend heavily on the sampling mechanism and on the number of used and available locations selected. Consequently, the objective of estimation in use-availability studies should be the RSF, not the RSPF.
- Two simple examples and R code in the Supporting Information illustrate computations. These examples maximize the general log likelihood without the aid of logistic regression software.
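
The direct maximization described in the points above can be sketched in a few lines. What follows is a minimal illustration in Python, not the R code from the Supporting Information. It assumes an exponential link, w(x) = exp(b1 * x), and a log likelihood with the same algebraic form that logistic regression software maximizes when used locations carry pseudo-response 1 and available locations carry 0. The intercept b0 is treated as a nuisance term that absorbs the sampling fractions. The covariate values, the function names (log_lik, fit, rsf) and the single-covariate setup are all hypothetical choices made for illustration.

```python
import math

# Hypothetical covariate values (e.g. an elevation index) at used and
# available locations; illustrative numbers only, not data from the article.
USED = [2.1, 3.4, 2.8, 4.0, 3.1, 1.9]
AVAIL = [0.5, 1.2, 2.5, 3.0, 1.8, 0.9, 3.6, 2.2, 1.1, 2.9]


def log_lik(b0, b1, used, avail):
    """Assumed use-availability log likelihood under an exponential link.

    Used points contribute (b0 + b1*x); every point, used or available,
    contributes -log(1 + exp(b0 + b1*x)).
    """
    ll = sum(b0 + b1 * x for x in used)
    ll -= sum(math.log1p(math.exp(b0 + b1 * x)) for x in used + avail)
    return ll


def fit(used, avail, max_iter=100, tol=1e-10):
    """Maximize log_lik with Newton-Raphson; no regression software involved."""
    b0, b1 = 0.0, 0.0
    points = used + avail
    for _ in range(max_iter):
        # Gradient of the log likelihood with respect to (b0, b1).
        g0 = float(len(used))
        g1 = float(sum(used))
        # Entries of the negative Hessian (symmetric 2 x 2 matrix).
        h00 = h01 = h11 = 0.0
        for x in points:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 -= p
            g1 -= p * x
            w = p * (1.0 - p)
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: solve (negative Hessian) * step = gradient.
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def rsf(x, b1):
    """Estimated RSF, exp(b1 * x); the nuisance intercept b0 is dropped."""
    return math.exp(b1 * x)


b0_hat, b1_hat = fit(USED, AVAIL)
print("b1 estimate:", b1_hat)
print("relative selection, x=3 versus x=2:", rsf(3.0, b1_hat) / rsf(2.0, b1_hat))
```

Because the RSF is only defined up to proportionality, the ratio rsf(3.0, b1_hat) / rsf(2.0, b1_hat) is the interpretable quantity here: it estimates how many times more likely use is at x = 3 than at x = 2, and it equals exp(b1_hat) regardless of the value of b0.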