In wildlife and ecology literature, an equivalence exists between the terms presence-only and use-availability. Papers that analyse previously collected or historical locations of an organism (e.g. museum samples, historical reports) tend to appear in the ecological literature and generally call their data presence-only (Pearce & Boyce 2006; Warton & Shepherd 2010; Fithian & Hastie 2012). Papers that analyse an organism's locations collected during field studies (e.g. pellet surveys, telemetry) tend to appear in the wildlife literature and generally call their data use-availability (Hobbs & Hanley 1990; Aebischer, Robertson & Kenward 1993; Cherry 1996; Johnson et al. 2006; Beyer et al. 2010). Close inspection of these papers reveals that the basic assumptions and estimated parameters under both methods are identical, a fact noted by Aarts, Fieberg & Matthiopoulos (2012) and Fithian & Hastie (2012). The analyses implied by both terms ultimately involve two independent samples: one containing locations where an organism has been and the other containing locations where an organism could have been. Furthermore, both terms imply analyses that relate characteristics of the environment to the relative probability that an organism is located in a particular habitat. Consequently, the terms ‘presence-only data’ and ‘use-availability data’ should be treated as synonyms. This article will stick with the use-availability term because it clearly implies two samples and thereby seems more descriptive.
Presence-only data, or the used sample from use-availability data, consist of locations (in n-dimensional space, but commonly 2-D geographic space) where organisms have been located and observed in the past (Manly, McDonald & Thomas 1993; Johnson et al. 2006; Pearce & Boyce 2006; Warton & Shepherd 2010; Fithian & Hastie 2012). Broader definitions of ‘use’ that take into account the time spent or activity at a location are possible (Hebblewhite, Merrill & McDonald 2005; Buskirk & Millspaugh 2006). Indeed, this type of analysis can be used to analyse the locational characteristics of any event (e.g. locations of terrorist activity, earthquakes, human infrastructure development, etc.). As the names imply, presence-only or use-availability data do not contain the converse information on where organisms have not been located. Instead, these studies rely on data collected from locations where the organism could have been. The objective of the analysis is to identify characteristics of the environment that influence where organisms were located (Johnson et al. 2006; Fithian & Hastie 2012). Mathematically, this objective amounts to estimating the relative probability of an organism using a particular habitat. In the wildlife literature, achieving this objective is generally called estimating resource selection (Manly et al. 2002) or habitat use (Johnson et al. 2006).
The analysis of presence-only and use-availability data has become common (Elith et al. 2006; Fithian & Hastie 2012). In the early 1990s, Manly, McDonald & Thomas (1993) devoted a chapter to analysis of use-availability data and made the analysis accessible to the average researcher. Since then, modelling of use-availability and presence-only data has seen increased application. Warton & Shepherd (2010) reported that 343 publications between 2005 and 2008 in the ISI Web of Science contained the term ‘presence-only data’. A search of Google Scholar in April 2013 for the terms ‘presence-only’ and ‘use-availability’ revealed an approximate fivefold increase in the number of papers with the terms presence-only and use-availability since 1991 and a twofold increase between 2001–2005 and 2006–2010.
Estimating the relative probability that an organism used a habitat fundamentally involves estimating two densities (Fithian & Hastie 2012). One density is for the characteristics of used or presence points, the other is for the characteristics of available or background points. The fact that two densities are involved is easy to overlook because the densities themselves are estimated simultaneously and ‘behind the scenes’ as part of the likelihood. Furthermore, these two densities are not typically interesting; rather, interest lies in the relative heights of these densities for a fixed set of characteristics. That is, researchers are typically interested in the ways that these two distributions differ, not in the densities themselves. A useful way to express the differences between these distributions is to take their ratio over all unique sets of characteristics. Johnson et al. (2006) defined the set of all such ratios to be the resource selection function, or RSF, although they were not the first to use that term. Manly, McDonald & Thomas (1993) were among the first to use the term RSF, although they did not explicitly write the RSF as a ratio of densities.
Several analyses estimate an RSF (Neu, Byers & Peek 1974; Aebischer, Robertson & Kenward 1993; MacKenzie et al. 2002). Manly et al. (2002) and Johnson et al. (2006) approached the RSF estimation problem from a finite sampling point of view. A finite sampling point of view, in general, implies that a finite population of sampled entities exists, and that it is theoretically possible to observe every single one. Manly et al. (2002) and Johnson et al. (2006) defined habitat units to be discrete geographic regions with positive area (e.g. pixels or quadrats), and they proceeded to estimate either the resource selection probability function (RSPF) or the RSF. In most cases, the RSPF is not estimable because it requires knowledge of the exact sampling mechanisms. Estimates of the RSF can always be produced. The RSF estimates the relative probability of selecting one habitat unit with a particular set of characteristics relative to another unit with different characteristics. Keating, Cherry & Lubow (2004) pointed out that the RSPF defined by Manly et al. (2002) was not constrained to the interval [0,1] and questioned the validity of the entire method despite the fact that the RSPF is less than 1 in the vast majority of cases. Johnson et al. (2006) extended the finite population method by directly estimating the RSF (rather than the RSPF) and empirically demonstrated the utility of the method. Since 2004, the finite population method and data type have been the focus of several papers (Johnson et al. 2006; Lele & Keim 2006; Lele 2009; Baddeley et al. 2010; Warton & Shepherd 2010; Aarts, Fieberg & Matthiopoulos 2012).
Warton & Shepherd (2010) approached the problem of RSF estimation from an infinite population point of view. In this view, an organism's locations are true mathematical points that have no area. Warton & Shepherd (2010) proposed modelling such point locations as an inhomogeneous Poisson process (IPP), and in so doing were able to apply methods from other bodies of statistical theory (specifically, spatial statistics). At first, the IPP view seems to require only the sample of used locations (hence the name presence-only). However, the IPP likelihood involves the integral of the intensity surface over the entire study area (Cressie 1993; Warton & Shepherd 2010; Fithian & Hastie 2012). To estimate this integral, a second sample is required that disregards the used locations. So, in practical terms, both the finite population approach and the infinite population approach require two samples.
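The role the second sample plays in the IPP likelihood can be sketched numerically. The following Python fragment is illustrative only: the one-dimensional study area, covariate surface and coefficients are all hypothetical. It approximates the integral of an exponential-form intensity surface by averaging the intensity over a uniform background sample, which is exactly the job the availability sample does in practice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical 1-D study area A = [0, 10] with one covariate surface x(s).
area = 10.0

def covariate(s):
    return np.sin(s)  # an arbitrary environmental gradient

def intensity(s, b0, b1):
    # lambda(s) under an exponential form: exp(b0 + b1 * x(s))
    return np.exp(b0 + b1 * covariate(s))

b0, b1 = 0.5, 1.2

# 'Exact' integral of lambda over A by a dense midpoint rule.
edges = np.linspace(0.0, area, 200001)
mids = 0.5 * (edges[:-1] + edges[1:])
exact = intensity(mids, b0, b1).sum() * (area / mids.size)

# Monte Carlo approximation using a uniform background sample:
# integral ~ |A| * mean(lambda(s_j)) over background points s_j.
background = rng.uniform(0.0, area, size=100_000)
approx = area * intensity(background, b0, b1).mean()

print(exact, approx)
```

The background points do nothing more than estimate the normalizing integral; this is why a second, availability sample is unavoidable even for nominally ‘presence-only’ data.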
An additional comment about one aspect of the theoretical justification for both the finite and infinite population approaches is appropriate. Animals in general select some type, size or shape of discrete habitat unit in the wild. Animals cannot actually select a single point; they must select some region surrounding a point. A theoretical problem arises under both the finite and infinite population approaches because the true size and shape of the regions being selected (i.e. the habitat units) cannot be known unless the animal's cognitive processes can be measured. Uncertainty surrounding the true habitat units is a theoretical problem for the finite population view because the true size and shape of habitat units is needed to simply define the population. If the true size and shape of habitat units is unknown, the hypothesized population of habitat units is an approximation of the true population of habitat units and estimated relative probabilities of selection are approximations to true relative probabilities. Likewise, the infinite population view is an approximation to reality simply because animals cannot actually be located at a single massless point. Because both approaches are approximations to reality, one approach cannot always be favoured. Both approaches have positive and negative characteristics. One positive characteristic of the finite population approach is that different sizes and shapes of habitat units can be posited and studied. A positive characteristic of the infinite population approach is its connection to other bodies of statistical theory and the additional flexibility this affords. From a practical point of view, the approximations made by both approaches do not impede implementation of the method.
The purpose of this article is twofold. First, the use-availability likelihood derivation of Johnson et al. (2006) is generalized to the infinite population formulation. Fithian & Hastie (2012) derive the same result for fixed samples from a case–control perspective after employing Bayes' rule. The derivation presented here is different because it defines the two-sample likelihood and maximizes it using a Lagrangian multiplier method. Both derivations make the connection between analysis of use-availability data and presence-only data, and the different techniques provide additional perspective for both. The second purpose of this article is to present simple examples of the analysis in hopes that some of the controversy surrounding this analysis (Keating, Cherry & Lubow 2004; Lele & Keim 2006) can be put to rest. Specifically, it is hoped that readers will realize that RSPFs are not generally useful and that the exponential function is an appropriate form for RSFs. The derivation also confirms that standard logistic regression software maximizes the use-availability likelihood if the link function is exponential, a fact noted by Manly et al. (2002), Johnson et al. (2006) and Fithian & Hastie (2012). Furthermore, the logistic link function proposed by Lele & Keim (2006) is inappropriate, even for estimating the RSPF, because it cannot produce a function that is everywhere proportional to the RSF.
The use-availability likelihood
This derivation of the use-availability likelihood follows Johnson et al. (2006), but generalizes it to the IPP and expands a number of steps. The derivation in Johnson et al. (2006), in turn, is closely related to Seber's (1984, pp. 308–315) derivation of logistic discriminant functions for the case of separate sampling. A key part of the Lagrangian multiplier technique is attributed to Anderson & Blair (1982).
Following Warton & Shepherd (2010) and Fithian & Hastie (2012), the sampling process for the set of used locations is formulated as an IPP as follows. Assume that an organism selects and is observed in a small region associated with each of a set of points S_u = {z_1, …, z_{n_u}} contained in some domain 𝒜. The ‘small region associated with’ a location in S_u is the (unknown size and shape) habitat unit being utilized by the organism. As a side note, these small regions or habitat units are allowed to overlap. Assume further that S_u is a realization of an IPP with intensity function λ(z). The IPP assumption implies that the total number of points in S_u follows a Poisson distribution with mean

\[ \mu = \int_{\mathcal{A}} \lambda(z)\,dz \]
and that the locations of points in S_u are independent and identically distributed with density

\[ f(z) = \frac{\lambda(z)}{\int_{\mathcal{A}} \lambda(z')\,dz'} \]
(Fithian & Hastie 2012).
Associated with every point z in S_u is a vector of attributes x = x(z). Attribute vector x may contain both discrete and continuous variables, polynomial terms, interactions, spline bases, etc. Furthermore, the elements of x can be computed on small regions surrounding location z (e.g. edge or patch density in circles or buffers around z). The important characteristic of x is that it represent as accurately as possible salient characteristics of the (unknown size) habitat unit being utilized by the organism.
The (multivariate) density of x associated with S_u is denoted f_u(x) (Fithian & Hastie 2012). Fithian & Hastie (2012) call the set of distinct x feature space, and f_u(x) the density of features in feature space. Here, this nomenclature is adopted, but values in x are called characteristics and the set of distinct x is occasionally called the support set for f_u(x). Let the set of vectors associated with locations in S_u be denoted by X_u (i.e. X_u = {x(z_1), …, x(z_{n_u})}). The fact that S_u is an independent and identically distributed sample from f(z) implies X_u is an independent and identically distributed sample from f_u(x). That is, the function x(z) transforms S_u, which is a sample from f(z) in geographic space, into a random sample from f_u(x) in feature space.
Separately, through a combination of field work and other methods (notably, programming in geographic information systems), assume that a second set of random points S_a = {z_1, …, z_{n_a}} are obtained from 𝒜. Assume further that the points in S_a arise from a (homogeneous) Poisson process with constant intensity function λ_a. The constant intensity function associated with S_a implies that the locations themselves are independent and identically distributed with density

\[ f(z) = \frac{1}{|\mathcal{A}|}, \]

where |𝒜| is the area of 𝒜.
That is, S_a is a random sample from 𝒜 independent of the locations in S_u. Practically, the important feature of S_a is that it represent all areas of 𝒜 with equal probability. Simple random samples from 𝒜, grid samples from 𝒜 and other equiprobable samples from 𝒜 [such as BAS (Robertson et al. in press) or GRTS (Stevens & Olsen 2004)] all satisfy this condition.
Similar to the used points, the (multivariate) density of characteristics x associated with locations in S_a is denoted f_a(x). Let the vectors associated with the locations in S_a be denoted by X_a. Assuming S_a is an independent and identically distributed sample from the uniform density on 𝒜, X_a is an independent and identically distributed sample from f_a(x).
The object of estimation, the resource selection function, denoted by w(x), is defined to be the function that multiplicatively transforms f_a(x) into f_u(x), that is,

\[ f_u(x) = w(x)\,f_a(x) \]

(Johnson et al. 2006). In reality, the support sets for f_u(x) and f_a(x) are identical. In practice, portions of the support set may not be observed. Later, this is dealt with by conditioning on the composite sample X = X_u ∪ X_a. The function w(x) is unique because it must be scaled correctly to produce a valid probability density function [i.e. f_u(x)] that integrates to 1. Rearranging,

\[ w(x) = \frac{f_u(x)}{f_a(x)}. \]
It is clear from this expression that w(x) is the proportion of used locations with characteristics x divided by the proportion of available locations with characteristics x. As a ratio of proportions, w(x) maps feature space into [0,∞).
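The ratio interpretation can be illustrated with a toy calculation (the habitat classes and counts below are invented for illustration): the empirical w(x) is simply the used proportion divided by the available proportion for each value of x.

```python
from collections import Counter

# Made-up habitat classes at used and available locations.
used = ["forest"] * 40 + ["meadow"] * 60
available = ["forest"] * 70 + ["meadow"] * 30

f_u = {k: v / len(used) for k, v in Counter(used).items()}
f_a = {k: v / len(available) for k, v in Counter(available).items()}

# w(x) = f_u(x) / f_a(x): > 1 suggests selection, < 1 suggests avoidance.
w = {k: f_u[k] / f_a[k] for k in f_a}
print(w)  # meadow is preferred (w = 2.0), forest is avoided (w < 1)
```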
A note on the interpretation of w(x) is warranted. When no selection takes place, the organism's path through 𝒜 is a pure random walk, and it encounters x vectors at random. A random walk causes f_u(x) = f_a(x), which in turn causes f_u(x)/f_a(x) = 1. In this case, w(x) = 1 and characteristics x are said to be used in proportion to availability. In contrast, when w(x) > 1, the proportion of x among the used locations is higher than the proportion of x among the available locations. In this case, the characteristics x are said to be selected more often than random chance, or that they are preferred. Characteristics x are selected less often than random chance when w(x) < 1 and are said to be avoided in this case.
The goal of analysis is to estimate both w(x) and how particular characteristics in x influence changes in w(x). To accomplish the latter part of this goal, w(x) is linked to a linear function of x via an arbitrarily chosen non-linear function. In other words, w(x) is reparameterized as,

\[ w(x) = \psi\,\eta(x'\beta), \]

where ψ is an unknown scalar, β is a vector of unknown coefficients, and η() is an unbounded monotonically increasing positive function. η() is called the link function because it links (transforms) the estimated coefficients in β to real parameters w(x). ψ is essentially a scaling constant that causes f_u(x) [and w(x)f_a(x)] to integrate to one. ψ accounts for varying scales of measurements in x and differences in the sizes of S_u and S_a. Because ψ is present, the covariate vector x does not contain a constant.
It remains to compute the likelihood for β and to maximize it appropriately. To write the likelihood for coefficients in β, we use the fact that X_u is a random sample from f_u(x) and that X_a is a random sample from f_a(x). We also assume that X_u and X_a are mutually independent. In particular, the latter assumption means that the set of x in X_u cannot depend on the set of x in X_a, and vice versa. In practical terms, this assumption is satisfied if the set of locations in S_u do not depend on the set of locations in S_a, and vice versa. If the set of locations in S_a do depend on the set of locations in S_u, like when locations in S_a are placed in proximity to locations in S_u, the assumptions that X_u and X_a are random samples from their respective distributions, and independent, must be carefully justified. In particular, spatial or temporal dependencies in S_u or S_a should be investigated and eradicated (if possible) because they have the potential to produce biased samples of feature space [i.e. the x's do not well represent f_u(x) or f_a(x)]. Spatial or temporal correlation is primarily a concern for the used sample because in some situations the sampling mechanism is out of the control of researchers (e.g. reported events, museum samples, etc.). If S_u and S_a are dependent on one another, it may be acceptable to modify the study's inference scope by redefining the study area to be the union of all areas sampled by S_a. Another acceptable action in this case may be to consider application of a discrete choice model (McCracken, Manly & Vander Heyden 1998a; McDonald et al. 2006), which relaxes this particular dependency assumption.
Assuming X_u and X_a are independent and identically distributed samples from their respective distributions, the likelihood for coefficients in β is,

\[ L(\beta) = \prod_{i=1}^{n_u} f_u(x_{ui}) \prod_{j=1}^{n_a} f_a(x_{aj}) = \prod_{i=1}^{n_u} \psi\,\eta(x_{ui}'\beta)\,f_a(x_{ui}) \prod_{j=1}^{n_a} f_a(x_{aj}). \]

The corresponding log likelihood is,

\[ \log(L(\beta)) = \sum_{i=1}^{n_u} \log\big(\psi\,\eta(x_{ui}'\beta)\big) + \sum_{x \in \mathcal{X}} n(x)\,\log\big(f_a(x)\big), \quad (1) \]

where n(x) is the number of covariate vectors in the composite sample X = X_u ∪ X_a with values equal to x, and 𝒳 is the observed support for both densities, or the set of unique x vectors in the composite sample X.
There are four sets of unknowns in log(L(β)). One set consists of the values f_u(x), another consists of the values f_a(x), another is ψ, and the last unknown is β. In this setting, the f_u(x), f_a(x) and ψ are thought of as nuisance parameters. Real interest lies in β.
If x contains discrete covariates only, implying f_u(x) and f_a(x) are (multivariate) probability mass functions, and if all points in the support of both functions are observed (or known), log(L(β)) can be maximized directly (see section ‘Example 1: Discrete Covariates’). Otherwise, the likelihood must be constrained to ensure f_u(x) and f_a(x) are proper density functions. This is carried out by conditioning on the composite sample X and constraining the likelihood so that f_u(x) and f_a(x) are proper densities over the observed support 𝒳. Conditioning is necessary because the full support for f_u(x) and f_a(x) may not have been observed and the functional form of the densities has not been specified. If the functional form of f_u(x) or f_a(x) was specified (e.g. multivariate normal or multivariate gamma), the true support for both densities would be known and conditioning on X would not be necessary. It would still be necessary in this case to somehow ensure that f_u(x) and f_a(x) were properly scaled densities during estimation. Once conditioned on X, f_u(x) and f_a(x) become discrete probability mass functions, and the only values needed from these functions to complete estimation are the values corresponding to the distinct x.
Mathematically, conditioning on X and constraining the likelihood amounts to restricting the universe of unknown density heights in log(L(β)) to those that satisfy the following constraints,

\[ \sum_{x \in \mathcal{X}} f_a(x) = 1 \quad (2) \]

\[ \sum_{x \in \mathcal{X}} \psi\,\eta(x'\beta)\,f_a(x) = 1. \quad (3) \]
Once constrained, log(L(β)) can be maximized to obtain estimates of β that satisfy the constraints.
To constrain and maximize the likelihood, the method of Lagrangian multipliers is employed. The method involves subtracting a multiple of the constraint equations from log (L(β)) and solving for values of the multipliers that make the gradient of the expression in the direction of the nuisance parameters zero. A zero gradient of the expression in the direction of parameters involved in the constraints assures that the gradient of the log likelihood and both constraint functions in those directions are parallel and thus that any value of the free parameters will satisfy the constraints.
The Lagrangian expression is,

\[ \Lambda = \log(L(\beta)) - \alpha\Big(\sum_{x\in\mathcal{X}} f_a(x) - 1\Big) - \gamma\Big(\sum_{x\in\mathcal{X}} \psi\,\eta(x'\beta)\,f_a(x) - 1\Big). \quad (4) \]

Following Anderson & Blair (1982), differentiating the above with respect to f_a(x) and multiplying through by f_a(x) yield d gradient equations (one for each of the d distinct vectors in 𝒳) of the form,

\[ n(x) - \alpha\,f_a(x) - \gamma\,\psi\,\eta(x'\beta)\,f_a(x) = 0. \]

Summing these d gradient equations implies

\[ n_u + n_a - \alpha - \gamma = 0 \quad (5) \]

by constraints (2) and (3). An additional equation associated with the final nuisance parameter is necessary to solve for α and γ. Differentiating (4) with respect to ψ yields

\[ \frac{n_u}{\psi} - \gamma \sum_{x\in\mathcal{X}} \eta(x'\beta)\,f_a(x) = 0, \]

which implies γ = n_u by constraint (3), which in turn implies α = n_a by (5). Finally, substituting the values of α and γ into each of the d gradient equations, the values of f_a(x) that satisfy the constraints are

\[ f_a(x) = \frac{n(x)}{n_a + n_u\,\psi\,\eta(x'\beta)}. \]

Substituting these into (1) yields the final constrained log likelihood,

\[ \log(L(\beta)) = \sum_{i=1}^{n_u} \log\big(\psi\,\eta(x_{ui}'\beta)\big) - \sum_{x\in\mathcal{X}} n(x)\,\log\big(n_a + n_u\,\psi\,\eta(x'\beta)\big), \quad (6) \]
which drops terms of the form n(x) log(n(x)) because they do not involve parameters. Numerical maximization of (6) with respect to ψ and β can be carried out by conventional methods such as the Simplex method (Nelder & Mead 1965), and others (e.g. Nocedal & Wright 1999) (see section ‘Example 2: Maximization of the Constrained Likelihood’). The likelihood in (6) is not proportional to a logistic regression likelihood unless conditions of the next section are satisfied.
The exponential link
If the link function is exponential, the constrained likelihood simplifies considerably and becomes proportional to a logistic regression likelihood. It is therefore possible to obtain estimates of β using logistic regression software and thereby avoid programming the Simplex or other maximization algorithms. This is ‘the trick’ mentioned earlier. Assuming η(x′β) = e^{x′β}, the values of f_a(x) that satisfy the constraints in (2) and (3) are,

\[ f_a(x) = \frac{n(x)}{n_a + n_u\,\psi\,e^{x'\beta}}. \]
Substituting this form of f_a(x) into eqn (6), the constrained log likelihood reduces to,

\[ \log(L(\beta)) = \sum_{i=1}^{n_u} \big(\beta_0 + x_{ui}'\beta\big) - \sum_{x\in\mathcal{X}} n(x)\,\log\big(1 + e^{\beta_0 + x'\beta}\big) + \text{constant}, \quad (7) \]

where β_0 = log(n_u ψ / n_a),
which is proportional to a logistic regression likelihood in which the x_{ui} are associated with a (pseudo-)response of 1 and the x_{aj} are associated with a (pseudo-)response of 0. Proof of the preceding statement is contained in the Supporting Information. A by-product of this ‘trick’ is that the intercept reported by the logistic regression software must be discarded because it estimates log(n_u ψ / n_a), but this does not discard any information about the relative probabilities in the RSF. This result agrees with eqn (40) of Fithian & Hastie (2012).
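As a sketch of ‘the trick’, the following Python fragment fits a logistic regression by Newton-Raphson to pseudo-responses built from hypothetical use-availability counts, then exponentiates the slope and discards the intercept. The counts are invented for illustration; the article's own examples use R.

```python
import numpy as np

# Hypothetical use-availability counts for one binary covariate.
# Available (pseudo-response 0): 400 zeros, 100 ones.
# Used      (pseudo-response 1):  70 zeros,  30 ones.
x = np.concatenate([np.zeros(400), np.ones(100), np.zeros(70), np.ones(30)])
y = np.concatenate([np.zeros(500), np.ones(100)])

# Logistic regression by Newton-Raphson (intercept + slope).
X = np.column_stack([np.ones_like(x), x])
beta = np.zeros(2)
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    W = p * (1.0 - p)
    beta += np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))

# 'The trick': discard the intercept; the RSF uses only the slope.
w_hat = np.exp(beta[1])
print(w_hat)  # equals the empirical odds ratio (30*400)/(70*100)
```

For a single binary covariate, the fitted slope is exactly the log of the empirical odds ratio, so the exponentiated slope can be checked by hand.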
When ‘the trick’ is employed, the predicted values of w(x) should be computed as,

\[ \hat{w}(x) = e^{x'\hat{\beta}}, \quad (8) \]
where β̂ are the estimated coefficients reported by the logistic regression software. Note that the values of ŵ(x) do not include the intercept, do not utilize the inverse of the logistic link function and are therefore not equivalent to the predicted values typically produced by standard logistic regression software. The predicted values from standard logistic regression software are 1/(1 + e^{−(β̂_0 + x′β̂)}) and are not proportional to ŵ(x). Values of ŵ(x) must usually be computed outside logistic regression software. However, if interest lies in the relative rank of RSF values, it is acceptable to rank the logistic predicted values because they rank identically to the ŵ(x).
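A small numerical check of the rank statement, using arbitrary simulated linear predictors and a hypothetical intercept value:

```python
import numpy as np

rng = np.random.default_rng(1)
xb = rng.normal(size=20)   # hypothetical linear predictors x'beta
b0 = -2.3                  # hypothetical intercept reported by the software

rsf = np.exp(xb)                            # w-hat(x) = exp(x'beta), no intercept
logit = 1.0 / (1.0 + np.exp(-(b0 + xb)))    # standard logistic predictions

# Both are monotone increasing in x'beta, so their ranks agree even
# though the values themselves are not proportional to one another.
print(np.array_equal(np.argsort(rsf), np.argsort(logit)))  # True
```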
The following examples will demonstrate maximization of the general use-availability likelihood. The first example assumes all covariates are discrete and uses both an exponential and linear link function. The second example illustrates maximization of the more general constrained log likelihood function.
Example 1: Discrete covariates
In this example, simulated use-availability data are generated and analysed by maximizing the log likelihood in eqn (1). When all covariates are discrete, constraints (2) and (3) can be satisfied by explicitly scaling estimates of f_u(x) and f_a(x) to sum to 1. Lagrangian multipliers are not needed in this case.
Assume that locations in a study area possess a single binary characteristic, such as (false, true), (forest, meadow), etc., and assume they are coded as 0 for false and 1 for true. Assume the probability that a location with a characteristic of 1 is used is 0·35 (i.e. f_u(1) = 0·35, f_u(0) = 0·65). Assume the proportion of locations in the study area with characteristic 1 is 0·15 (i.e. f_a(1) = 0·15, f_a(0) = 0·85). The R code in Table S1 of the Supporting Information draws a random sample of n_u = 100 pixels from the used population, and a random sample of n_a = 1000 pixels from the available population. Output from the R code in Table S1 is contained in Table S2.
The realization of data generated by code in Table S1 yields 866 values of 0 in the available sample, 134 values of 1 in the available sample, 72 values of 0 in the used sample and 28 values of 1 in the used sample. The observed selection ratios are (28/100)/(134/1000) = 2·0895 for characteristic 1 and (72/100)/(866/1000) = 0·8314 for characteristic 0. The observed relative probability of selecting a location with a characteristic of 1 over a pixel with a characteristic of 0 is 2·0895/0·8314 = 2·5133.
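These selection ratios are simple arithmetic on the realized counts; the following Python fragment (an analogue of the R computations, not the Supporting Information code itself) reproduces them:

```python
# Realized counts from Example 1.
used_1, used_0 = 28, 72        # used sample, n_u = 100
avail_1, avail_0 = 134, 866    # available sample, n_a = 1000

ratio_1 = (used_1 / 100) / (avail_1 / 1000)  # selection ratio, characteristic 1
ratio_0 = (used_0 / 100) / (avail_0 / 1000)  # selection ratio, characteristic 0
relative = ratio_1 / ratio_0                 # relative probability of selection

print(ratio_1, ratio_0, relative)  # ratio_1 ~ 2.09, ratio_0 ~ 0.83, relative ~ 2.5133
```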
Direct maximization of eqn (1) using function optim() in R and assuming an exponential link function yields an estimate of β̂ = 0·9216. The estimated standard error of β̂, computed as the root of the inverse of the Hessian of maximization, is 0·2227. Because the likelihood was maximized directly, an intercept is not needed. If present, the intercept cancels from the relative ratios inherent in the computations. The predicted RSF value, or the relative probability of selecting a location with characteristic 1 over a pixel with characteristic 0, is exp(0·9216) = 2·5133, which agrees with the empirical ratio of ratios in the previous paragraph.
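Direct maximization can be sketched in Python as well (the article uses optim() in R; here scipy's Nelder-Mead stands in, and parameterizing the availability proportion on the logit scale as a nuisance parameter is an implementation choice, not part of the original code):

```python
import numpy as np
from scipy.optimize import minimize

# Counts from Example 1: used n_u = 100 (28 ones), available n_a = 1000 (134 ones).
u1, u0, a1, a0 = 28, 72, 134, 866

def neg_loglik(theta):
    beta, logit_p = theta
    p = 1.0 / (1.0 + np.exp(-logit_p))   # f_a(1), availability of characteristic 1
    # Exponential link: f_u(1) is proportional to f_a(1) * exp(beta).
    q = p * np.exp(beta) / (p * np.exp(beta) + (1.0 - p))
    return -(u1 * np.log(q) + u0 * np.log(1.0 - q)
             + a1 * np.log(p) + a0 * np.log(1.0 - p))

fit = minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead",
               options={"xatol": 1e-8, "fatol": 1e-10})
beta_hat = fit.x[0]
print(beta_hat, np.exp(beta_hat))  # beta_hat ~ 0.9216, RSF ratio ~ 2.5133
```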
To illustrate use of a different link, consider the linear link function η(x′β) = 4x′β + 12. This link function is not strictly admissible as a link because it is positive only for values of x′β > −3; however, the range over which η() is positive is wide enough in this example to yield valid estimates of selection. If extreme avoidance was observed in a real problem and a linear link was desired, it might be possible to either recode the covariates (e.g. recode 0's to 1's and 1's to 0's) or redefine the link to be a function that is positive over a wider domain (e.g. η(x′β) = 4x′β + c for large c). Assuming η(x′β) = 4x′β + 12 and imposing a lower bound of −2·9999 on β in the call to optim(), the maximum likelihood estimate of β is 4·5398. The predicted RSF value for a location with characteristic 1 is 4(4·5398)+12 = 30·1592, while the predicted RSF value for a location with characteristic 0 is 4(0)+12 = 12. The relative probability of selecting a location with characteristic 1 over a location with characteristic 0 is 30·1592/12, which again equals 2·5133.
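A Python analogue of the bounded maximization under the linear link (scipy's L-BFGS-B stands in for the bounded call to optim(); as before, putting the availability proportion on the logit scale is an implementation choice):

```python
import numpy as np
from scipy.optimize import minimize

u1, u0, a1, a0 = 28, 72, 134, 866   # counts from Example 1

def eta(u):
    return 4.0 * u + 12.0           # linear link, positive for u > -3

def neg_loglik(theta):
    beta, logit_p = theta
    p = 1.0 / (1.0 + np.exp(-logit_p))
    q = p * eta(beta) / (p * eta(beta) + (1.0 - p) * eta(0.0))
    return -(u1 * np.log(q) + u0 * np.log(1.0 - q)
             + a1 * np.log(p) + a0 * np.log(1.0 - p))

# Impose the lower bound beta > -3 so that eta(beta) stays positive.
fit = minimize(neg_loglik, x0=[0.0, 0.0], method="L-BFGS-B",
               bounds=[(-2.9999, None), (None, None)])
beta_hat = fit.x[0]
rsf_ratio = eta(beta_hat) / eta(0.0)
print(beta_hat, rsf_ratio)  # beta_hat ~ 4.5398, ratio again ~ 2.5133
```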
To check that the constrained likelihood in eqn (6) yields the same result as the previous two techniques, the link function was switched back to exponential and the data were passed to a logistic regression routine. Doing so yields an estimate of β̂ = 0·9216 (SE = 0·2413), along with an intercept that is subsequently discarded. Using eqn (8), the predicted RSF value for locations with characteristic 0 is exp(0) = 1·0, while the predicted RSF value for locations with characteristic 1 is exp(0·9216) = 2·5133, which agrees with the previous two methods.
It is interesting to note that the estimated standard error of β̂ output by logistic regression software is slightly higher than that reported by direct maximization (cf. 0·2227 vs. 0·2413). The difference in standard error estimates results because the logistic regression routine assumes ψ (via the intercept) is an unknown parameter. Under the use-availability design, ψ does not enter the likelihood when covariates are discrete and the entire support set is observed. As a result, the variance reported by logistic regression software for both β̂ and any derived RSF values will be artificially high.
Finally, note that when the logistic link function is assumed [i.e. η(x′β) = e^{x′β}/(1 + e^{x′β})], direct maximization of the likelihood converges to a different solution. The value of β that maximizes the use-availability likelihood under a logistic link is 8·3005, which yields a relative RSF value for values of 1 equal to 1·9995 (cf. 2·5133). The reason a logistic link yields an inappropriate solution is that the logistic link function is bounded at 1·0. The asymptotes at 0·0 and 1·0 inherent in the logistic link function cause non-proportionalities that are inappropriate for RSFs. In simple situations, such as this example, it may be possible to recode the covariate values (e.g. recode 0 to −1) or modify the logistic link (e.g. by shifting or rescaling its argument) to achieve near proportionality of RSF values away from the asymptotes; but, true proportionality can never be achieved under a logistic link, and such modifications are difficult to implement in more complex real-world problems.
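The bound at 1·0 can be seen directly: under a logistic link with a binary covariate, the ratio of ‘RSF’ values for characteristics 1 and 0 can never exceed 2, because the numerator is capped at 1 while the denominator is expit(0) = 0.5. A short check:

```python
import numpy as np

def expit(u):
    return 1.0 / (1.0 + np.exp(-u))

beta = 8.3005  # value reported for the logistic link in this example

# Ratio of 'RSF' values for characteristics 1 vs. 0 under a logistic link.
ratio_logistic = expit(beta * 1) / expit(beta * 0)
print(ratio_logistic)  # close to 2.0, not the correct 2.5133

# The bound at 1.0 is the culprit: no matter how large beta grows,
# the ratio can never exceed expit(+inf)/expit(0) = 1.0/0.5 = 2.
print(expit(50.0) / expit(0.0))
```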
Both Warton & Shepherd (2010) and Fithian & Hastie (2012) make connections between the IPP approach, Poisson regression and logistic regression. They note that if the number of used points is fixed and the size of the background sample increases, estimates from true logistic regression converge to those from the IPP method. This behaviour is caused by the near linearity (and thus near proportionality) of the logistic link function near zero. As the size of the background sample increases, the average prediction from logistic regression converges to zero (i.e. the estimated intercept approaches −∞) and the estimates themselves become closer to proportional and thus closer to the IPP approach. In an attempt to make logistic regression agree with the Poisson regression approach, Fithian & Hastie (2012) propose ‘infinitely weighted’ logistic regression wherein background points are upweighted by the inverse of the area they represent. McCracken, Manly & Vander Heyden (1998b) employed a similar upweighting technique. Given the results and examples in this article and indeed the results in other parts of Fithian & Hastie (2012), the ‘infinitely weighted’ logistic regression approach is neither necessary (because the IPP approach is available) nor warranted (because it uses a bounded link and weights need to be calculated). This article shows that results from the IPP approach can be obtained exactly via unweighted logistic regression software with the realization that the actual link function is not logistic but exponential.
Manly et al. (2002) used a finite population sampling argument to derive the use-availability likelihood, and in that derivation w(x) was both exponential and defined to be a probability (i.e. the RSPF). Keating, Cherry & Lubow (2004) correctly pointed out that w(x) was not constrained to be ≤1 in the derivation of Manly et al. (2002) and cases when w(x) > 1 exist. The derivation of Manly et al. (2002) is valid if the RSPF induced by the sampling design does not approach 1·0 over the full support of x, which happens when the available sample is either much larger than the used sample or if rare habitats are not excessively used. These conditions cover many, but not all, cases. The derivation of this article, which is essentially that of Johnson et al. (2006), directly estimates the RSF and assumes w(x) is simply a ratio, not a probability. The results of this article are more general than those of Manly et al. (2002) and Keating, Cherry & Lubow (2004), yet in most situations give the same result as the method of Manly et al. (2002).
The main difference between the current derivation and that of Johnson et al. (2006) is that the likelihood in this article formulates the sampling process as an IPP rather than selection of pixels from a finite population. A general link function is incorporated into both derivations to illuminate the assumed form of w(x). The derivation and likelihood bear repeating here because it has proven easy to mistake use-availability modelling as ‘running logistic regression’. This mistake is easy to make because the logistic link function is nearly linear when its range is near zero, and it approximates an appropriate link function as the size of the available sample grows to infinity (Warton & Shepherd 2010). The use-availability likelihood (eqn 6) and the logistic regression likelihood (eqn 7) are equal only when w(x) is exponential, and even in this case, the analyst has not ‘run logistic regression’ because doing so implies a logistic (not exponential) link.
Valid link functions
While the use-availability likelihood was derived using a general link function, it is difficult to actually postulate and use an appropriate link function other than the exponential. To be a general purpose link, a function must have a domain of (−∞,∞), a range of (0,∞) and be monotonically increasing. Few common functions, other than the exponential, satisfy these requirements. A linear link was successfully used in Example 1 only because x was one-dimensional and the model relatively simple. If two or more covariates are present in the model, it will be difficult to define an appropriate linear link because the function cannot be positive over the entire real line. The difficulty defining alternative links, and the fact that the exponential link reduces the likelihood to be proportional to a logistic regression likelihood, make the exponential link the most obvious and legitimate choice. This derivation also makes it clear that link functions which are appropriate for logistic regression, such as the logistic (Lele & Keim 2006), log–log, or probit, are inappropriate for use-availability designs. Such functions are inappropriate because their ranges are not (0,∞) and their values are not proportional to one another.
Resource selection probability functions
Some of the past controversy surrounding the use-availability method stems from efforts to estimate RSPFs or otherwise make values of the RSF into probabilities. This effort and focus on RSPFs is misguided and doomed to failure if one seeks a general method. Estimation of RSPFs is difficult and not useful because the RSPF depends heavily on the ways in which S_u and S_a were sampled. For example, sampling with replacement will yield a different RSPF than sampling without replacement. Sampling using independent Bernoulli trials for each unit will result in a different RSPF than sampling using a fixed size sampling scheme. Fithian & Hastie (2012) made the same remark, noting that even when researchers know the sampling mechanisms involved, absolute probabilities are generally not useful because they depend so heavily on sampling methods. It is more general (fewer assumptions), intuitive and useful to estimate an RSF directly than it is to estimate an RSPF and rescale. Consequently, the objective of estimation in use-availability studies should generally be the RSF, not the RSPF. For selection and habitat use, there is no information contained in an RSPF that is not also contained in the RSF.
Finally, in the future, it may be possible to assume a functional form for f_a(x) (e.g. normal, gamma, exponential, etc.) and thereby alleviate the necessity for Lagrangian constraints. By making such a parametric assumption, it may be possible to improve precision of RSF estimates when the assumption is true. Making a parametric assumption may also facilitate Bayesian estimation of the parameters in β. If so, it may be possible to extend the use-availability design to cases with latent variables or unobserved states. For example, Bayesian analysis may offer a method other than that of Nielson et al. (2009) to estimate selection when observations are missing or when the organism is in different unobserved behavioural states.