## Introduction

Species distribution is naturally characterized by the probability of occurrence of a species, say *ψ*(*x*) = Pr(*y*(*x*) = 1), where *y*(*x*) is the true occurrence state of a species at some location (pixel) *x* (Kéry 2011). Inference about *ψ*(*x*) can be achieved directly from presence–absence data using logistic regression and related models (MacKenzie *et al*. 2002). However, ecologists do not always have presence–absence data, and many existing data sets contain only the locations where a species was recorded: so-called presence-only data.
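For the presence–absence case, the following is a minimal sketch of logistic regression for *ψ*(*x*). It assumes a single covariate, perfect detection (so the observed state equals *y*(*x*)) and hypothetical toy data. It fits logit *ψ*(*x*) = *β*₀ + *β*₁*x* by Newton–Raphson maximization of the Bernoulli likelihood. The function names are ours and purely illustrative; in practice a standard GLM routine would be used.

```python
import math


def expit(eta):
    """Inverse of the logit link: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def psi(x, b0, b1):
    """Occurrence probability psi(x) = Pr(y(x) = 1) with logit(psi) = b0 + b1*x."""
    return expit(b0 + b1 * x)


def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood estimates of (b0, b1) by Newton-Raphson.

    xs: covariate value at each surveyed pixel
    ys: observed state at each pixel, 1 = presence, 0 = absence
    Assumes detection is perfect (observed y equals true occurrence) and
    that presences and absences are not perfectly separated by x.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # score vector (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # information matrix (negative Hessian)
        for x, y in zip(xs, ys):
            p = psi(x, b0, b1)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: beta <- beta + I(beta)^{-1} * score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


if __name__ == "__main__":
    # Hypothetical toy data: covariate values and presence/absence at 8 pixels
    xs = [-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]
    b0, b1 = fit_logistic(xs, ys)
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, psi(1) = {psi(1.0, b0, b1):.3f}")
```

Because the Bernoulli log-likelihood with a logit link is concave in (*β*₀, *β*₁), Newton–Raphson typically converges in a few iterations, provided the covariate does not perfectly separate presences from absences.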

The software package maxent (e.g. Phillips *et al*. 2006) is a popular tool for producing ‘species distribution’ maps from presence-only data. Notably, maxent does not produce estimates of occurrence probability; instead, it produces estimates of an ill-defined ‘suitability index’ (Elith *et al*. 2011). Because maxent does not correspond to an explicit model of species occurrence, it is not suitable for making explicit predictions of an actual state variable or for testing hypotheses about factors that influence occurrence probability. The practice of producing indices of species distribution from presence-only data, rather than estimates of occurrence probability, has been justified in the literature on the basis of the *incorrect* assertion that occurrence probability *ψ* (sometimes referred to as ‘prevalence’ or occupancy) cannot be estimated from presence-only data.

The principal aim of our paper is to show that occurrence probability *can* be estimated from presence-only data. We consider a formal model-based approach to the analysis of presence-only data. We emphasize the critical assumption required for statistical inference about species occurrence probability from presence-only data: that presence-only observations are accumulated by random sampling of space. In addition, the estimator we devise here is relevant only when species detection probability is constant. We conclude that, under these assumptions, inference about occurrence probability can be achieved directly from presence-only data using conventional likelihood methods (e.g. Lancaster and Imbens 1996). We suspect that this will surprise many users of maxent and related species distribution modelling tools, given repeated statements to the contrary in the literature (e.g. Phillips and Dudík 2008; Elith *et al*. 2011; Kéry 2011) asserting that occurrence probability is not identifiable. For example, Elith *et al*. (2010) state that

> Formally, we say that prevalence is not identifiable from presence-only data (Ward *et al*. 2009). This means that it cannot be exactly determined, regardless of the sample size; this is a fundamental limitation of presence-only data.

In fact, Ward *et al*. (2009) do not make such a definitive claim. Their precise claim is

> [Occurrence probability] is identifiable only if we make unrealistic assumptions about the structure of [the relationship between occurrence probability and covariates] such as in logistic regression...

In that context, it appears that subsequent references to Ward *et al*. (2009) misconstrue their result: identifiability depends on the assumed model structure; it is not ruled out by the nature of presence-only data itself. In our view, assuming a logistic (or other binary regression) model is hardly unrealistic. Indeed, such models are the most common approach to modelling binary variables in ecology (and probably in all of statistics), especially in the context of modelling species occurrence (MacKenzie *et al*. 2002; Tyre *et al*. 2003; Kéry *et al*. 2010). More generally, the logit is the canonical link function of the binomial GLM (McCullagh and Nelder 1989, p. 38); as such, it is the customary default, and entire books have been devoted to logistic regression (Hosmer and Lemeshow 2000).
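Why the assumed functional form matters for identifiability can be made concrete with a short sketch. Under random sampling of space and constant detection probability, one way to write the presence-only likelihood on a discrete landscape is the product of *ψ*(*x*ᵢ) over the *n* presence locations, divided by (Σ *ψ*(*x*))ⁿ with the sum running over all pixels. The Python below is a hypothetical illustration: the function name, the toy data and the exponential comparison model are our own choices, not part of any package. If *ψ* takes the exponential form exp(*β*₀ + *β*₁*x*) (which can exceed 1, so it is really a relative intensity), the intercept *β*₀ cancels between numerator and denominator and cannot be estimated. Under the logistic form it does not cancel.

```python
import math


def expit(eta):
    """Inverse logit: maps a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def presence_only_loglik(b0, b1, presence_x, landscape_x, link="logit"):
    """Log-likelihood of presence-only records on a discrete landscape.

    Illustrative assumptions (ours): every pixel is equally likely to be
    sampled and detection probability is constant, so a presence record
    falls on pixel s with probability psi(s) / sum_t psi(t).
    presence_x:  covariate value at each presence location
    landscape_x: covariate value at every pixel of the study region
    link: "logit" for psi = expit(b0 + b1*x);
          "log" for the exponential form psi = exp(b0 + b1*x)
    """
    if link not in ("logit", "log"):
        raise ValueError("link must be 'logit' or 'log'")

    def psi(x):
        eta = b0 + b1 * x
        return expit(eta) if link == "logit" else math.exp(eta)

    total = sum(psi(x) for x in landscape_x)
    return sum(math.log(psi(x)) for x in presence_x) - len(presence_x) * math.log(total)


if __name__ == "__main__":
    # Hypothetical toy landscape of 5 pixels and 4 presence records
    landscape = [-2.0, -1.0, 0.0, 1.0, 2.0]
    presence = [0.0, 1.0, 2.0, 2.0]
    for link in ("log", "logit"):
        values = [presence_only_loglik(b0, 1.0, presence, landscape, link)
                  for b0 in (-1.0, 0.5)]
        # Under "log" the two values agree (b0 cancels); under "logit" they differ.
        print(link, [round(v, 4) for v in values])
```

A discrete landscape keeps the normalizing sum explicit; for a continuous study region the sum would be replaced by an integral over space.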

We demonstrate the application of the formal model-based framework for estimating occurrence probability from presence-only data using a data set derived from the North American Breeding Bird Survey, and we provide an R package for estimating species distribution model parameters from presence-only data.

Before proceeding, we note that the statistical principle of maximum entropy (Jaynes 1957, 1963; Jaynes and Bretthorst 2003) is widely applied to problems in statistics and other disciplines, and our development here is not critical of these ideas. Rather, we are critical of the routine use of the maxent software package for species distribution modelling. We specifically object to the pervasive views in the maxent user community that one should avoid characterizing species distribution by occurrence probability, that occurrence probability is not identifiable, and that one should instead use maxent to obtain indices of species occurrence probability.