The point process use-availability or presence-only likelihood and comments on analysis

Authors


Summary

  1. Use-availability and presence-only analyses are synonyms. Both require two samples (one containing known locations, one containing potential locations), both estimate the same parameters, and both use the same fundamental likelihood.
  2. Use-availability and presence-only designs compare characteristics of points where an organism was located to those where the organism could have been located. These designs can be generalized to estimate the relative probability that any event occurred at a set of locations.
  3. This article generalizes the use-availability likelihood given in Johnson et al. (Resource selection functions based on use-availability data: theoretical motivation and evaluation methods, Journal of Wildlife Management, 2006) to point locations. This derivation arrives at the same likelihood as Fithian & Hastie (Statistical Models for Presence-Only Data: Finite-Sample Equivalence and Addressing Observer Bias, 2012) but uses a different technique and allows a more general link function. Fithian & Hastie (2012) use a case–control argument and Bayes theorem to derive the likelihood. This article uses Lagrangian multipliers to maximize the two-sample likelihood.
  4. Resource selection functions (RSF) defined here are ratios of density functions. RSFs must be positive and unbounded. Proper link functions must provide proportionality over their entire range. Given these conditions, the exponential link is the most logical and appropriate link function for RSFs. These conditions exclude the logistic link.
  5. This article affirms that estimation of a RSF does not involve ‘running logistic regression’. By assigning 0 and 1 (pseudo-)responses to vectors of covariates associated with locations in the used and available sample, it is possible to ‘trick’ logistic regression software into maximizing the use-availability likelihood. Representing the analysis as ‘logistic regression’ is misleading because that implies use of the logistic link, which is inappropriate for RSFs. It is more accurate to state that the ‘use-availability likelihood was maximized’.
  6. RSFs are more general, intuitive and useful than resource selection probability functions (RSPF). RSPFs depend heavily on sampling mechanisms and the number of used and available locations selected. Consequently, the objective of estimation in use-availability studies should be the RSF, not the RSPF.
  7. Two simple examples and R code in the Supporting Information illustrate computations. These examples maximize the general log likelihood without the aid of logistic regression software.

Introduction

In wildlife and ecology literature, an equivalence exists between the terms presence-only and use-availability. Papers that analyse previously collected or historical locations of an organism (e.g. museum samples, historical reports) tend to appear in the ecological literature and generally call their data presence-only (Pearce & Boyce 2006; Warton & Shepherd 2010; Fithian & Hastie 2012). Papers that analyse an organism's locations collected during field studies (e.g. pellet surveys, telemetry) tend to appear in the wildlife literature and generally call their data use-availability (Hobbs & Hanley 1990; Aebischer, Robertson & Kenward 1993; Cherry 1996; Johnson et al. 2006; Beyer et al. 2010). Close inspection of these papers reveals that the basic assumptions and estimated parameters under both methods are identical, a fact noted by Aarts, Fieberg & Matthiopoulos (2012) and Fithian & Hastie (2012). The analyses implied by both terms ultimately involve two independent samples: one containing locations where an organism has been and the other containing locations where an organism could have been. Furthermore, both terms imply analyses that relate characteristics of the environment to the relative probability that an organism is located in a particular habitat. Consequently, the terms ‘presence-only data’ and ‘use-availability data’ should be treated as synonyms. This article will stick with the use-availability term because it clearly implies two samples and thereby seems more descriptive.

Presence-only data, or the used sample from use-availability data, consist of locations (in n-dimensional space, but commonly 2-D geographic space) where organisms have been located and observed in the past (Manly, McDonald & Thomas 1993; Johnson et al. 2006; Pearce & Boyce 2006; Warton & Shepherd 2010; Fithian & Hastie 2012). Broader definitions of ‘use’ that take into account the time spent or activity at a location are possible (Hebblewhite, Merrill & McDonald 2005; Buskirk & Millspaugh 2006). Indeed, this type of analysis can be used to analyse the locational characteristics of any event (e.g. locations of terrorist activity, earthquakes, human infrastructure development, etc.). As the names imply, presence-only or use-availability data do not contain the converse information on where organisms have not been located. Instead, these studies rely on data collected from locations where the organism could have been. The objective of the analysis is to identify characteristics of the environment, which influence where organisms were located (Johnson et al. 2006; Fithian & Hastie 2012). Mathematically, this objective amounts to estimating the relative probability of an organism using a particular habitat. In the wildlife literature, achieving this objective is generally called estimating resource selection (Manly et al. 2002) or habitat use (Johnson et al. 2006).

The analysis of presence-only and use-availability data has become common (Elith et al. 2006; Fithian & Hastie 2012). In the early 1990s, Manly, McDonald & Thomas (1993) devoted a chapter to analysis of use-availability data and made the analysis accessible to the average researcher. Since then, modelling of use-availability and presence-only data has seen increased application. Warton & Shepherd (2010) reported that 343 publications between 2005 and 2008 in the ISI Web of Science contained the term ‘presence-only data’. A search of Google Scholar in April 2013 for the terms ‘presence-only’ and ‘use-availability’ revealed an approximate fivefold increase in the number of papers with the terms presence-only and use-availability since 1991 and a twofold increase between 2001–2005 and 2006–2010.

Estimating the relative probability that an organism used a habitat fundamentally involves estimating two densities (Fithian & Hastie 2012). One density is for the characteristics of used or presence points, the other is for the characteristics of available or background points. The fact that two densities are involved is easy to overlook because the densities themselves are estimated simultaneously and ‘behind the scenes’ as part of the likelihood. Furthermore, these two densities are not typically interesting; rather, interest lies in the relative heights of these densities for a fixed set of characteristics. That is, researchers are typically interested in the ways that these two distributions differ, not in the densities themselves. A useful way to express the differences between these distributions is to take their ratio over all unique sets of characteristics. Johnson et al. (2006) defined the set of all such ratios to be the resource selection function, or RSF, although they were not the first to use that term. Manly, McDonald & Thomas (1993) were among the first to use the term RSF, although they did not explicitly write the RSF as a ratio of densities.

Several analyses estimate a RSF (Neu, Byers & Peek 1974; Aebischer, Robertson & Kenward 1993; MacKenzie et al. 2002). Manly et al. (2002) and Johnson et al. (2006) approached the RSF estimation problem from a finite sampling point of view. A finite sampling point of view, in general, implies that a finite population of sampled entities exists, and that it is theoretically possible to observe every single one. Manly et al. (2002) and Johnson et al. (2006) defined habitat units to be discrete geographic regions with positive area (e.g. pixels or quadrats), and they proceeded to estimate either the resource selection probability function (RSPF) or the RSF. In most cases, the RSPF is not estimable because it requires knowledge of the exact sampling mechanisms. Estimates of the RSF can always be produced. The RSF estimates the relative probability of selecting one habitat unit with a particular set of characteristics relative to another unit with different characteristics. Keating, Cherry & Lubow (2004) pointed out that the RSPF defined by Manly et al. (2002) was not constrained to the interval [0,1] and questioned the validity of the entire method despite the fact that the RSPF is less than 1 in the vast majority of cases. Johnson et al. (2006) extended the finite population method by directly estimating the RSF (rather than the RSPF) and empirically demonstrated the utility of the method. Since 2004, the finite population method and data type have been the focus of several papers (Johnson et al. 2006; Lele & Keim 2006; Lele 2009; Baddeley et al. 2010; Warton & Shepherd 2010; Aarts, Fieberg & Matthiopoulos 2012).

Warton & Shepherd (2010) approached the problem of RSF estimation from an infinite population point of view. In this view, an organism's locations are true mathematical points that have no area. Warton & Shepherd (2010) proposed modelling such point locations as an inhomogeneous Poisson process (IPP), and in so doing were able to apply methods from other bodies of statistical theory (specifically, spatial statistics). At first, the IPP view seems to require only the sample of used locations (hence the name presence-only). However, the IPP likelihood involves the integral of the intensity surface over the entire study area (Cressie 1993; Warton & Shepherd 2010; Fithian & Hastie 2012). To estimate this integral, a second sample is required that disregards the used locations. So, in practical terms, both the finite population approach and the infinite population approach require two samples.

An additional comment about one aspect of the theoretical justification for both the finite and infinite population approaches is appropriate. Animals in general select some type, size or shape of discrete habitat unit in the wild. Animals cannot actually select a single point, they must select some region surrounding a point. A theoretical problem arises under both the finite and infinite population approaches because the true size and shape of the regions being selected (i.e. the habitat units) cannot be known unless the animal's cognitive processes can be measured. Uncertainty surrounding the true habitat units is a theoretical problem for the finite population view because the true size and shape of habitat units is needed to simply define the population. If the true size and shape of habitat units is unknown, the hypothesized population of habitat units is an approximation of the true population of habitat units and estimated relative probabilities of selection are approximations to true relative probabilities. Likewise, the infinite population view is an approximation to reality simply because animals cannot actually be located at a single massless point. Because both approaches are approximations to reality, one approach cannot always be favoured. Both approaches have positive and negative characteristics. One positive characteristic of the finite population approach is that different sizes and shapes of habitat units can be posited and studied. A positive characteristic of the infinite population approach is its connection to other bodies of statistical theory and the additional flexibility this affords. From a practical point of view, the approximations made by both approaches do not impede implementation of the method.

The purpose of this article is twofold. First, the use-availability likelihood derivation of Johnson et al. (2006) is generalized to the infinite population formulation. Fithian & Hastie (2012) derive the same result for fixed samples from a case–control perspective after employing Bayes' rule. The derivation presented here is different because it defines the two-sample likelihood and maximizes it using a Lagrangian multiplier method. Both derivations make the connection between analysis of use-availability data and presence-only data, and the different techniques provide additional perspective for both. The second purpose of this article is to present simple examples of the analysis in hopes that some of the controversy surrounding this analysis (Keating, Cherry & Lubow 2004; Lele & Keim 2006) can be put to rest. Specifically, it is hoped that readers will realize that RSPFs are not generally useful and that the exponential function is an appropriate form for RSFs. The derivation also confirms that standard logistic regression software maximizes the use-availability likelihood if the link function is exponential, a fact noted by Manly et al. (2002), Johnson et al. (2006) and Fithian & Hastie (2012). Furthermore, the logistic link function proposed by Lele & Keim (2006) is inappropriate, even for estimating the RSPF, because it cannot produce a function that is everywhere proportional to the RSF.

The use-availability likelihood

This derivation of the use-availability likelihood follows Johnson et al. (2006), but generalizes it to the IPP and expands a number of steps. The derivation in Johnson et al. (2006), in turn, is closely related to Seber's (1984, pp. 308–315) derivation of logistic discriminant functions for the case of separate sampling. A key part of the Lagrangian multiplier technique is attributed to Anderson & Blair (1982).

Following Warton & Shepherd (2010) and Fithian & Hastie (2012), the sampling process for the set of used locations is formulated as an IPP as follows. Assume that an organism selects and is observed in a small region associated with a set of points inline image contained in some domain inline image. The ‘small region associated with’ a location in inline image is the (unknown size and shape) habitat unit being utilized by the organism. As a side note, these small regions or habitat units are allowed to overlap. Assume further that inline image is a realization of an IPP with intensity function inline image. The IPP assumption implies that the total number of points in inline image follows a Poisson distribution with mean

display math

and that the locations of points in inline image are independent and identically distributed with density

display math

(Fithian & Hastie 2012).
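The two IPP properties above (a Poisson-distributed count and independent locations whose density follows the intensity surface) can be illustrated by simulation. The following is a minimal Python sketch, not the article's own code (which is in R in the Supporting Information). It assumes a unit-square domain and a made-up intensity function, both hypothetical, and uses the standard thinning approach: simulate a homogeneous process at a rate that bounds the intensity, then keep each point with probability proportional to the intensity at that point.

```python
import math
import random

def poisson_draw(mean, rng):
    """Draw a Poisson variate by Knuth's multiplication method (adequate for modest means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def intensity(z):
    """Hypothetical intensity surface on the unit square, increasing toward the east edge."""
    easting, _northing = z
    return 50.0 + 150.0 * easting

def simulate_ipp(intensity_fn, max_intensity, rng):
    """Simulate an inhomogeneous Poisson process on [0, 1] x [0, 1] by thinning."""
    # Step 1: homogeneous process with constant rate max_intensity (the area is 1).
    n_candidates = poisson_draw(max_intensity, rng)
    candidates = [(rng.random(), rng.random()) for _ in range(n_candidates)]
    # Step 2: retain each candidate with probability intensity(z) / max_intensity.
    return [z for z in candidates if rng.random() < intensity_fn(z) / max_intensity]

rng = random.Random(1)  # arbitrary seed
used_points = simulate_ipp(intensity, max_intensity=200.0, rng=rng)
# The expected number of points is the integral of the intensity: 50 + 150 / 2 = 125.
print(len(used_points))
```

The bound `max_intensity` must be at least as large as the intensity everywhere in the domain; here 200 is the maximum of the assumed surface.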

Associated with every point inline image is a vector of attributes inline image. Attribute vector x may contain both discrete and continuous variables, polynomial terms, interactions, spline bases, etc. Furthermore, the elements of x can be computed on small regions surrounding location z (e.g. edge or patch density in circles or buffers around z). The important characteristic of x is that it represent as accurately as possible salient characteristics of the (unknown size) habitat unit being utilized by the organism.

The (multivariate) density of x associated with inline image is

display math

where

display math

(Fithian & Hastie 2012). Fithian & Hastie (2012) call the set of distinct x feature space, and fu(x) the density of features in feature space. Here, this nomenclature is adopted, but values in x are called characteristics and the set of distinct x is occasionally called the support set for fu(x). Let the set of vectors associated with locations in inline image be denoted by inline image (i.e. inline image). The fact that inline image is an independent and identically distributed sample from inline image implies inline image is an independent and identically distributed sample from fu(x). That is, the function inline image transforms the inline image, which is a sample from inline image in geographic space, into a random sample from fu(x) in feature space.

Separately, through a combination of field work and other methods (notably, programming in geographic information systems), assume that a second set of random points inline image are obtained from inline image. Assume further that inline image arise from a (homogeneous) Poisson process with intensity function inline image. The constant intensity function associated with inline image implies that the locations themselves are independent and identically distributed with density

display math

That is, inline image is a random sample from inline image independent of the locations in inline image. Practically, the important feature of inline image is that it represent all areas of inline image with equal probability. Simple random samples from inline image, grid samples from inline image and other equiprobable samples from inline image [such as BAS (Robertson et al. in press) or GRTS (Stevens & Olsen 2004)] all satisfy this condition.
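As a rough illustration of two of the equiprobable designs mentioned above, the Python sketch below draws a simple random sample and a randomly offset grid sample. It assumes a rectangular study area given by hypothetical bounds; a real study area would usually be an irregular polygon handled in GIS software, and BAS or GRTS samples would require dedicated packages.

```python
import random

def simple_random_sample(n_points, bounds, rng):
    """Independent uniform locations inside a rectangular study area."""
    xmin, xmax, ymin, ymax = bounds
    return [(rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)) for _ in range(n_points)]

def grid_sample(n_per_side, bounds, rng):
    """Regular grid with one random offset, so every location is equally likely to be sampled."""
    xmin, xmax, ymin, ymax = bounds
    dx = (xmax - xmin) / n_per_side
    dy = (ymax - ymin) / n_per_side
    offset_x, offset_y = rng.random() * dx, rng.random() * dy
    return [(xmin + offset_x + i * dx, ymin + offset_y + j * dy)
            for i in range(n_per_side) for j in range(n_per_side)]

rng = random.Random(42)                 # arbitrary seed
study_area = (0.0, 10.0, 0.0, 5.0)      # hypothetical bounds: xmin, xmax, ymin, ymax
available_random = simple_random_sample(1000, study_area, rng)
available_grid = grid_sample(30, study_area, rng)
print(len(available_random), len(available_grid))
```

Either set of locations would then be intersected with the covariate layers to obtain the attribute vectors for the available sample.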

Similar to the used points, the (multivariate) density of characteristics x associated with locations in inline image is

display math

where

display math

Let the vectors associated with the locations in inline image be denoted by inline image. Assuming inline image is an independent and identically distributed sample from inline image, inline image is an independent and identically distributed sample from inline image.

The object of estimation, the resource selection function, denoted by w(x), is defined to be the function that multiplicatively transforms fa(x), the density of characteristics at available locations, into fu(x), that is,

$$f_u(x) = w(x)\, f_a(x)$$

(Johnson et al. 2006). In reality, the support sets for inline image and inline image are identical. In practice, portions of the support set may not be observed. Later, this is dealt with by conditioning on the composite sample inline image. The function w(x) is unique because it must be scaled correctly to produce a valid probability density function (i.e. inline image) that integrates to 1. Rearranging,

$$w(x) = \frac{f_u(x)}{f_a(x)}$$

It is clear from this expression that w(x) is the proportion of used locations with characteristics x divided by the proportion of available locations with characteristics x. As a ratio of proportions, w(x) maps feature space into [0,∞).

A note on the interpretation of w(x). When no selection takes place, the organism's path through inline image is a pure random walk, and it encounters x vectors at random. A random walk causes inline image which in turn causes inline image In this case, w(x) = 1 and characteristics x are said to be used in proportion to availability. In contrast, when w(x) > 1, the proportion of x among the used locations is higher than the proportion of x among the available locations. In this case, the characteristics x are said to be selected more often than random chance, or that they are preferred. Characteristics x are selected less often than random chance when w(x) < 1 and are said to be avoided in this case.
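The interpretation above can be made concrete with a small Python sketch. The habitat classes and proportions below are invented for illustration only; the function names are likewise hypothetical.

```python
def selection_ratio(prop_used, prop_available):
    """w(x): proportion of used locations with x over proportion of available locations with x."""
    return prop_used / prop_available

def interpret(w, tol=1e-9):
    """Translate a selection ratio into the verbal categories used in the text."""
    if abs(w - 1.0) <= tol:
        return "used in proportion to availability"
    return "selected (preferred)" if w > 1.0 else "avoided"

# Hypothetical proportions for three habitat classes (each column sums to 1).
prop_used = {"forest": 0.50, "meadow": 0.30, "rock": 0.20}
prop_available = {"forest": 0.25, "meadow": 0.30, "rock": 0.45}

results = {}
for habitat in prop_used:
    w = selection_ratio(prop_used[habitat], prop_available[habitat])
    results[habitat] = (w, interpret(w))
    print(habitat, round(w, 3), results[habitat][1])
```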

The goal of analysis is to estimate both w(x) and how particular characteristics in x influence changes in w(x). To accomplish the latter part of this goal, w(x) is linked to a linear function of x via an arbitrarily chosen non-linear function. In other words, w(x) is reparameterized as,

display math

where inline image is an unknown scalar, β is a vector of unknown coefficients, and η() is an unbounded monotonically increasing positive function. η() is called the link function because it links (transforms) the estimated coefficients in β to real parameters w(x). inline image is essentially a scaling constant that causes inline image [and inline image] to integrate to one. inline image accounts for varying scales of measurements in x and differences in the sizes of inline image and inline image. Because inline image is present, the covariate vector x does not contain a constant.

It remains to compute the likelihood for β and to maximize it appropriately. To write the likelihood for coefficients in β, we use the fact that inline image is a random sample from inline image and that inline image is a random sample from inline image. We also assume that inline image and inline image are mutually independent. In particular, the latter assumption means that the set of x in inline image cannot depend on the set of x in inline image, and vice versa. In practical terms, this assumption is satisfied if the set of locations in inline image do not depend on the set of locations in inline image, and vice versa. If the set of locations in inline image do depend on the set of locations in inline image, like when locations in inline image are placed in proximity to locations in inline image, the assumptions that inline image and inline image are random samples from their respective distributions, and independent, must be carefully justified. In particular, spatial or temporal dependencies in inline image or inline image should be investigated and eradicated (if possible) because they have the potential to produce biased samples of feature space [i.e. the x's do not well represent inline image or inline image]. Spatial or temporal correlation is primarily a concern for the used sample because in some situations the sampling mechanism is out of the control of researchers (e.g. reported events, museum samples, etc.). If inline image and inline image are dependent on one another, it may be acceptable to modify the study's inference scope by redefining the study area to be the union of all areas sampled by inline image. Another acceptable action in this case may be to consider application of a discrete choice model (McCracken, Manly & Vander Heyden 1998a; McDonald et al. 2006), which relaxes this particular dependency assumption.

Assuming inline image and inline image are independent and identically distributed samples from their respective distributions, the likelihood for coefficients in β is,

display math

The corresponding log likelihood is,

display math(eqn 1)

where n(x) is the number of covariate vectors in the composite sample inline image with values equal to x, inline image, and inline image is the observed support for both densities, or the set of unique x vectors in the composite sample inline image.

There are four sets of unknowns in  log (L(β)). One set consists of the values inline image, another consists of the values inline image, another is ψ, and the last unknown is β. In this setting, the inline image and ψ are thought of as nuisance parameters. Real interest lies in β.

If x contains discrete covariates only, implying inline image and inline image are (multivariate) probability mass functions, and if all points in the support of both functions are observed (or known),  log (L(β)) can be maximized directly (see section ‘Example 1: Discrete covariates’). Otherwise, the likelihood must be constrained to ensure inline image and inline image are proper density functions. This is carried out by conditioning on the composite sample inline image and constraining the likelihood so that inline image and inline image are proper densities over the observed support inline image. Conditioning is necessary because the full support for inline image and inline image may not have been observed and the functional form of the densities has not been specified. If the functional form of inline image or inline image was specified (e.g. multivariate normal or multivariate gamma), the true support for both densities would be known and conditioning on inline image would not be necessary. It would still be necessary in this case to somehow ensure that inline image and inline image were properly scaled densities during estimation. Once conditioned on inline image, inline image and inline image become discrete probability mass functions, and the only values needed from these functions to complete estimation are the inline image values corresponding to the distinct x.

Mathematically, conditioning on inline image and constraining the likelihood amounts to restricting the universe of unknown density heights in  log (L(β)) to those that satisfy the following constraints,

display math(eqn 2)

and

display math(eqn 3)

Once constrained, log (L(β)) can be maximized to obtain estimates of β that satisfy the constraints.

To constrain and maximize the likelihood, the method of Lagrangian multipliers is employed. The method involves subtracting a multiple of the constraint equations from  log (L(β)) and solving for values of the multipliers that make the gradient of the expression in the direction of the nuisance parameters zero. A zero gradient of the expression in the direction of parameters involved in the constraints assures that the gradient of the log likelihood and both constraint functions in those directions are parallel and thus that any value of the free parameters will satisfy the constraints.

The Lagrangian expression is,

display math(eqn 4)

Following Anderson & Blair (1982), differentiating the above with respect to inline image and multiplying through by inline image yield d gradient equations of the form,

display math

Summing these d gradient equations implies

display math(eqn 5)

by constraints (2) and (3). An additional equation associated with the final nuisance parameter is necessary to solve for inline image and inline image. Differentiating (4) with respect to ψ yields

display math

which implies inline image by constraint (3), which in turn implies inline image by (5). Finally, substituting the values of inline image and inline image into each of the d gradient equations, the values of inline image that satisfy the constraints are

display math

Substituting these into (1) yields the final constrained log likelihood,

display math(eqn 6)

which drops terms of the form n(x) log (n(x)) because they do not involve parameters. Numerical maximization of (6) with respect to ψ and β can be carried out by conventional methods such as the Simplex method (Nelder & Mead 1965), and others (e.g. Nocedal & Wright 1999) (see section ‘Example 2: Maximization of the constrained likelihood’). The likelihood in (6) is not proportional to a logistic regression likelihood unless conditions of the next section are satisfied.

The exponential link

If the link function inline image is exponential, the constrained likelihood simplifies considerably and becomes proportional to a logistic regression likelihood. It is therefore possible to obtain estimates of β using logistic regression software and thereby avoid programming the Simplex or other maximization algorithms. This is ‘the trick’ mentioned earlier. Assuming inline image, the values of inline image that satisfy the constraints in (2) and (3) are,

display math

Substituting this form of inline image into eqn (6), the constrained log likelihood reduces to,

display math(eqn 7)

which is proportional to a logistic regression likelihood in which the inline image are associated with a (pseudo-)response of 1 and the inline image are associated with a (pseudo-)response of 0. Proof of the preceding statement is contained in the Supporting Information. A by-product of this ‘trick’ is that the intercept reported by the logistic regression software must be discarded because it estimates inline image, but this does not discard any information about the relative probabilities in the RSF. This result agrees with eqn (40) of Fithian & Hastie (2012).

When ‘the trick’ is employed, the predicted values of w(x) should be computed as,

display math(eqn 8)

where inline image are the estimated coefficients reported by the logistic regression software. Note that the values of inline image do not include the intercept, do not utilize the inverse of the logistic link function and are therefore not equivalent to the predicted values typically produced by standard logistic regression software. The predicted values from standard logistic regression software are inline image and are not proportional to inline image. Values of inline image must usually be computed outside logistic regression software. However, if interest lies in the relative rank of RSF values, it is acceptable to rank the inline image because they rank identically to the inline image.
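The distinction between RSF predictions and the fitted values reported by logistic regression software can be sketched in Python as follows. The coefficients and covariate values are hypothetical; the point is only that exponentiating the slopes without the intercept gives values that are not proportional to the logistic fitted probabilities, although the two rank identically.

```python
import math

def rsf_prediction(slopes, x):
    """exp(beta'x) using slope coefficients only; the intercept is deliberately left out."""
    return math.exp(sum(b * v for b, v in zip(slopes, x)))

def logistic_fitted_value(intercept, slopes, x):
    """Fitted probability that standard logistic regression software would report."""
    linear = intercept + sum(b * v for b, v in zip(slopes, x))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients, as if reported by logistic regression software.
intercept, slopes = -2.0, [0.8]
covariates = [[0.0], [1.0], [3.0]]

rsf = [rsf_prediction(slopes, x) for x in covariates]
fitted = [logistic_fitted_value(intercept, slopes, x) for x in covariates]

# Ratios relative to the first location: constant multiples would give equal ratios.
rsf_ratios = [value / rsf[0] for value in rsf]
fitted_ratios = [value / fitted[0] for value in fitted]
print([round(r, 3) for r in rsf_ratios])
print([round(r, 3) for r in fitted_ratios])
```

The two lists of ratios differ, which is the non-proportionality described above, while sorting locations by either quantity gives the same order.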

Examples

The following examples will demonstrate maximization of the general use-availability likelihood. The first example assumes all covariates are discrete and uses both an exponential and linear link function. The second example illustrates maximization of the more general constrained log likelihood function.

Example 1: Discrete covariates

In this example, fake use-availability data is generated and analysed by maximizing the log likelihood in eqn (1). When all covariates are discrete, constraints (2) and (3) can be satisfied by explicitly scaling estimates of inline image and inline image to sum to 1. Lagrangian multipliers are not needed in this case.

Assume that locations in a study area possess a single binary characteristic, such as (false, true), (forest, meadow), etc., and assume they are coded as 0 for false and 1 for true. Assume the probability that a location with a characteristic of 1 is used is 0·35 (i.e. inline image for 1, 0·65 for 0). Assume the proportion of locations in the study area with characteristic 1 is 0·15 (i.e. inline image for 1, 0·85 for 0). The R code in Table S1 of the Supporting Information draws a random sample of inline image pixels from the used population, and a random sample of inline image pixels from the available population. Output from the R code in Table S1 is contained in Table S2.
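A data set of this kind can be generated in a few lines. The Python sketch below is only an analogue of the R code in Table S1, which is not reproduced here. The sample sizes of 100 used and 1000 available locations are inferred from the counts reported for the realization that follows, and the seed is arbitrary, so the simulated counts will not match that realization.

```python
import random

def draw_binary_sample(n, prob_one, rng):
    """Draw n binary characteristics, each equal to 1 with probability prob_one."""
    return [1 if rng.random() < prob_one else 0 for _ in range(n)]

rng = random.Random(2013)                           # arbitrary seed
used = draw_binary_sample(100, 0.35, rng)           # f_u(1) = 0.35
available = draw_binary_sample(1000, 0.15, rng)     # f_a(1) = 0.15

print(sum(used), len(used) - sum(used))                 # used: count of 1s, count of 0s
print(sum(available), len(available) - sum(available))  # available: count of 1s, count of 0s
```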

The realization of data generated by code in Table S1 yields 866 values of 0 in the available sample, 134 values of 1 in the available sample, 72 values of 0 in the used sample and 28 values of 1 in the used sample. The observed selection ratios are (28/100)/(134/1000) = 2·0895 for characteristic 1 and (72/100)/(866/1000) = 0·8314 for characteristic 0. The observed relative probability of selecting a location with a characteristic of 1 over a pixel with a characteristic of 0 is 2·0895/0·8314 = 2·5133.
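The arithmetic in the preceding paragraph can be checked directly from the four counts. This short Python sketch uses only those counts; the variable names are illustrative.

```python
# Counts from the realization described above: characteristic -> number of points.
used_counts = {0: 72, 1: 28}
available_counts = {0: 866, 1: 134}

def proportions(counts):
    """Convert counts to proportions of the sample."""
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

prop_used = proportions(used_counts)
prop_available = proportions(available_counts)

# Observed selection ratio for each characteristic: used proportion / available proportion.
ratio = {value: prop_used[value] / prop_available[value] for value in used_counts}

# Relative probability of selecting characteristic 1 over characteristic 0.
relative = ratio[1] / ratio[0]
print(ratio[1], ratio[0], relative)
```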

Direct maximization of eqn (1) using function optim() in R and assuming an exponential link function yields an estimate of inline image. The estimated standard error of inline image, computed as the square root of the inverse of the Hessian of maximization, is inline image. Because the likelihood was maximized directly, an intercept is not needed. If present, the intercept cancels from the relative ratios inherent in the computations. The predicted RSF value, or the relative probability of selecting a location with characteristic 1 over a pixel with characteristic 0, is inline image, which agrees with the empirical ratio of ratios in the previous paragraph.
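A minimal Python sketch of this kind of direct maximization is given below. It is not the article's R code. It assumes that the available density is held fixed at the observed available proportions and that only β is optimized, using a golden-section search in place of optim(). Under that assumption, the standard error can be compared with the 0·2227 quoted later in this example for direct maximization.

```python
import math

# Counts from the realization described above: characteristic -> number of points.
USED = {0: 72, 1: 28}
AVAILABLE = {0: 866, 1: 134}

def log_likelihood(beta):
    """Log likelihood of the used sample under an exponential link.

    Assumption: f_a is held fixed at the available-sample proportions.
    """
    n_available = sum(AVAILABLE.values())
    f_a = {x: count / n_available for x, count in AVAILABLE.items()}
    # Normalizing constant so that f_u(x) = exp(beta * x) * f_a(x) / scale sums to 1.
    scale = sum(math.exp(beta * x) * f_a[x] for x in f_a)
    return sum(count * math.log(math.exp(beta * x) * f_a[x] / scale)
               for x, count in USED.items())

def golden_section_max(fn, lower, upper, tol=1e-10):
    """Maximize a unimodal function on [lower, upper] by golden-section search."""
    shrink = (math.sqrt(5.0) - 1.0) / 2.0
    while upper - lower > tol:
        left = upper - shrink * (upper - lower)
        right = lower + shrink * (upper - lower)
        if fn(left) < fn(right):
            lower = left
        else:
            upper = right
    return (lower + upper) / 2.0

beta_hat = golden_section_max(log_likelihood, -5.0, 5.0)

# Standard error from a central-difference estimate of the second derivative.
step = 1e-4
curvature = (log_likelihood(beta_hat + step) - 2.0 * log_likelihood(beta_hat)
             + log_likelihood(beta_hat - step)) / step ** 2
std_error = math.sqrt(-1.0 / curvature)

print(beta_hat, std_error, math.exp(beta_hat))
```

The exponentiated estimate is the predicted relative RSF value for characteristic 1 over characteristic 0.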

To illustrate use of a different link, consider the linear link function inline image. This link function is not strictly admissible as a link because it is positive only for values of x > −3; however, the range over which inline image is positive is wide enough in this example to yield valid estimates of selection. If extreme avoidance was observed in a real problem and a linear link was desired, it might be possible to either recode the covariates (e.g. recode 0's to 1's and 1's to 0's) or redefine the link to be a function that is positive over a wider domain (e.g. inline image). Assuming inline image and imposing a lower bound of −2·9999 in the call to optim(), the maximum likelihood estimate of inline image is 4·5398 (inline image). The predicted RSF value for a location with characteristic 1 is 4(4·5398)+12 = 30·1592, while the predicted RSF value for a location with characteristic 0 is 4(0)+12 = 12. The relative probability of selecting a location with characteristic 1 over a location with characteristic 0 is 30·1592/12, which again equals 2·5133.
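The linear-link computation can be sketched in the same way. The Python code below assumes the link has the form 4v + 12 evaluated at v = βx, which is inferred from the arithmetic in the paragraph above (4(4·5398)+12) rather than from a surviving formula. It also assumes the available density is fixed at the observed available proportions, and it uses an arbitrary upper search bound of 50.

```python
import math

# Counts from the realization described earlier: characteristic -> number of points.
USED = {0: 72, 1: 28}
AVAILABLE = {0: 866, 1: 134}

def linear_link(v):
    """Assumed linear link; positive only for v > -3."""
    return 4.0 * v + 12.0

def log_likelihood(beta):
    """Used-sample log likelihood with f_a fixed at the available proportions (an assumption)."""
    n_available = sum(AVAILABLE.values())
    f_a = {x: count / n_available for x, count in AVAILABLE.items()}
    scale = sum(linear_link(beta * x) * f_a[x] for x in f_a)
    return sum(count * math.log(linear_link(beta * x) * f_a[x] / scale)
               for x, count in USED.items())

def golden_section_max(fn, lower, upper, tol=1e-9):
    """Maximize a unimodal function on [lower, upper] by golden-section search."""
    shrink = (math.sqrt(5.0) - 1.0) / 2.0
    while upper - lower > tol:
        left = upper - shrink * (upper - lower)
        right = lower + shrink * (upper - lower)
        if fn(left) < fn(right):
            lower = left
        else:
            upper = right
    return (lower + upper) / 2.0

# Lower bound of -2.9999 keeps the link positive, as in the text.
beta_hat = golden_section_max(log_likelihood, -2.9999, 50.0)
rsf_one = linear_link(beta_hat * 1)
rsf_zero = linear_link(beta_hat * 0)
print(beta_hat, rsf_one, rsf_zero, rsf_one / rsf_zero)
```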

To check that the constrained likelihood in eqn (6) yields the same result as the previous two techniques, the link function was switched back to exponential and the data were passed to a logistic regression routine. Doing so yields estimates of inline image (inline image) and inline image (inline image). Using eqn (8), the predicted RSF value for locations with characteristic 0 is  exp (0) = 1·0, while the predicted RSF value for locations with characteristic 1 is  exp (0·9216) = 2·5133, which agrees with the previous two methods.

It is interesting to note that the estimated standard error of the estimate of β1 output by logistic regression software is slightly higher than that reported by direct maximization (cf. 0·2227 from direct maximization vs. 0·2413 from logistic regression). The difference in standard error estimates results because the logistic regression routine assumes inline image is an unknown parameter. Under the use-availability design, inline image does not enter the likelihood when covariates are discrete and the entire support set is observed. As a result, the variance reported by logistic regression software for both inline image and any derived RSF values will be artificially high.
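The logistic regression check can be sketched with pseudo-responses of 1 for used and 0 for available locations. The counts below are hypothetical, and fit_logistic is an illustrative Newton-Raphson routine rather than the routine used in the article. The closing comparison assumes availability proportions of 0·7 and 0·3 are treated as known in the direct likelihood, in which case its variance reduces to 1/40 + 1/60 for these counts.

```python
import math

# Hypothetical counts (not the article's data): (x, used count, available count).
cells = [(0, 40, 700), (1, 60, 300)]

def fit_logistic(cells, iters=25):
    """Newton-Raphson for logit P(y = 1) = b0 + b1 * x, used = 1, available = 0."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, u, a in cells:
            n = u + a
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = u - n * p              # score contribution of this cell
            w = n * p * (1.0 - p)      # information weight of this cell
            g0 += r
            g1 += r * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (i00 * g1 - i01 * g0) / det
    se_b1 = math.sqrt(i00 / det)       # from the inverse information matrix
    return b0, b1, se_b1

b0, b1, se_b1 = fit_logistic(cells)

# Slope implied by the empirical ratio of ratios in the same counts.
log_ratio = math.log((60 / 300) / (40 / 700))

# Direct-likelihood standard error with availability treated as known
# (closed form for this two-class case; an assumption of the sketch).
se_direct = math.sqrt(1 / 40 + 1 / 60)

print(b1, log_ratio, se_b1, se_direct)
```

In this hypothetical setup the slope matches the log ratio of ratios, and the standard error from the logistic information matrix is the larger of the two, which is the same direction of difference as reported above.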

Finally, note that when the logistic link function is assumed [i.e. inline image], direct maximization of the likelihood converges to a different solution. The value of inline image that maximizes the use-availability likelihood under a logistic link is 8·3005 (inline image), which yields a relative RSF value for values of 1 equal to inline image inline image (cf. 2·5133). The reason a logistic link yields an inappropriate solution is that the logistic link function is bounded above by 1·0. The asymptotes at 0·0 and 1·0 inherent in the logistic link function cause non-proportionalities that are inappropriate for RSFs. In simple situations, such as this example, it may be possible to recode the covariate values (e.g. recode 0 to −1) or modify the logistic link [e.g. inline image] to achieve near proportionality of RSF values away from the asymptotes; however, true proportionality can never be achieved under a logistic link, and such modifications are difficult to implement in more complex real-world problems.
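The effect of the upper bound can be illustrated numerically. The sketch assumes a logistic link with no intercept, w(x) = 1/(1 + exp(−βx)); this form is an assumption, because the article's own formula did not survive extraction. Under that form w(0) = 0·5, so the ratio w(1)/w(0) can never exceed 2 however large β becomes.

```python
import math

def logistic(eta):
    """Assumed logistic link without an intercept."""
    return 1.0 / (1.0 + math.exp(-eta))

target = 2.5133  # ratio obtained under the exponential and linear links

ratios = {}
for beta in (1.0, 8.3005, 50.0):
    # Relative RSF value of characteristic 1 versus characteristic 0.
    ratios[beta] = logistic(beta * 1) / logistic(beta * 0)
    print(beta, ratios[beta])

# Supremum of the ratio as beta grows without bound.
max_ratio = 1.0 / logistic(0.0)
print(max_ratio, target)
```

Because the attainable ratio is capped below the target, an optimizer can only push β upward towards the cap, which is consistent with the large estimate reported in the text.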

Example 2: Maximization of the constrained likelihood

When one or more covariates come from a continuous distribution and the functional form of inline image is not specified, it is necessary to introduce the scaling constant ψ (or inline image) and utilize the Lagrangian multiplier technique to write a constrained likelihood, which can then be maximized. To illustrate that the Lagrangian multiplier technique successfully constrains the likelihood, this example generates data containing both a discrete and a continuous covariate and maximizes the constrained likelihood in eqn (6).

The R code in Table S3 contains an objective function, which computes the general constrained likelihood of eqn (6). Output from the R code in Table S3 is contained in Table S4. Maximization of the constrained likelihood using function optim() via objective function objFun.continuous, and assuming an exponential link, yields estimates of inline image, inline image and inline image. Estimated standard errors were inline image, inline image and inline image. The estimates and standard errors for inline image and inline image (Table S4) agree with those reported by logistic regression software, further confirming the derivations in section ‘The exponential link’.

Discussion

Both Warton & Shepherd (2010) and Fithian & Hastie (2012) make connections between the IPP approach, Poisson regression and logistic regression. They note that if the number of used points is fixed and the size of the background sample increases, estimates from true logistic regression converge to those from the IPP method. This convergence is caused by the near linearity (and thus near proportionality) of the logistic link function near zero. As the size of the background sample increases, the average prediction from logistic regression converges to zero (i.e. the estimated intercept approaches −∞) and the estimates themselves become closer to proportional and thus closer to the IPP approach. In an attempt to make logistic regression agree with the Poisson regression approach, Fithian & Hastie (2012) propose ‘infinitely weighted’ logistic regression wherein background points are upweighted by the inverse of the area they represent. McCracken, Manly & Heyden (1998b) employed a similar upweighting technique. Given the results and examples in this article and indeed the results in other parts of Fithian & Hastie (2012), the ‘infinitely weighted’ logistic regression approach is neither necessary (because the IPP approach is available) nor warranted (because it uses a bounded link and weights need to be calculated). This article shows that results from the IPP approach can be obtained exactly via unweighted logistic regression software with the realization that the actual link function is not logistic but exponential.
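The limiting behaviour described above can be illustrated numerically. The sketch below is not the argument of the cited papers; it simply evaluates the ratio of two logistic predictions that differ by a fixed slope while the intercept is pushed towards −∞. The slope value is the estimate from Example 1, and the intercept values are arbitrary choices for illustration.

```python
import math

def logistic(eta):
    return 1.0 / (1.0 + math.exp(-eta))

b1 = 0.9216                              # slope from Example 1
intercepts = [0.0, -2.0, -5.0, -10.0]    # arbitrary, decreasing intercepts

ratios = []
for b0 in intercepts:
    # Ratio of logistic predictions for x = 1 versus x = 0.
    ratio = logistic(b0 + b1) / logistic(b0)
    ratios.append(ratio)
    print(b0, ratio)

# Value the ratio approaches as the intercept decreases.
limit = math.exp(b1)
print(limit)
```

As the intercept decreases, the predictions shrink towards zero and their ratio approaches exp(b1), the value given by the exponential link.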

Manly et al. (2002) used a finite population sampling argument to derive the use-availability likelihood, and in that derivation w(x) was both exponential and defined to be a probability (i.e. the RSPF). Keating, Cherry & Lubow (2004) correctly pointed out that w(x) was not constrained to be ≤1 in the derivation of Manly et al. (2002) and cases when w(x) > 1 exist. The derivation of Manly et al. (2002) is valid if the RSPF induced by the sampling design does not approach 1·0 over the full support of x, which happens when the available sample is much larger than the used sample or when rare habitats are not excessively used. These conditions cover many, but not all, cases. The derivation of this article, which is essentially that of Johnson et al. (2006), directly estimates the RSF and assumes w(x) is simply a ratio, not a probability. The results of this article are more general than those of Manly et al. (2002) and Keating, Cherry & Lubow (2004), yet in most situations give the same result as the method of Manly et al. (2002).

The main difference between the current derivation and that of Johnson et al. (2006) is that the likelihood in this article formulates the sampling process as an IPP rather than selection of pixels from a finite population. A general link function is incorporated into both derivations to illuminate the assumed form of w(x). The derivation and likelihood bear repeating here because it has proven easy to mistake use-availability modelling for ‘running logistic regression’. This mistake is easy to make because the logistic link function is nearly linear when its range is near zero, and it approximates an appropriate link function as the size of the available sample grows to infinity (Warton & Shepherd 2010). The use-availability likelihood (eqn 6) and the logistic regression likelihood (eqn 7) are equal only when w(x) is exponential, and even in this case, the analyst has not ‘run logistic regression’ because doing so implies a logistic (not exponential) link.

Valid link functions

While the use-availability likelihood was derived using a general link function, it is difficult to actually postulate and use an appropriate link function other than the exponential. To be a general purpose link, a function must have a domain of (−∞,∞), a range of (0,∞) and be monotonically increasing. Few common functions, other than the exponential, satisfy these requirements. A linear link was successfully used in Example 1 only because inline image was one-dimensional and relatively simple. If two or more covariates are present in the model, it will be difficult to define an appropriate linear link because the function cannot be positive over the entire real line. The difficulty of defining alternative links, and the fact that the exponential link reduces the likelihood to be proportional to a logistic regression likelihood, make the exponential link the most obvious and legitimate choice. This derivation also makes it clear that link functions that are appropriate for logistic regression, such as the logistic (Lele & Keim 2006), log–log or probit, are inappropriate for use-availability designs. Such functions are inappropriate because their ranges are not (0,∞) and their values are not proportional to one another.
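The three requirements can be screened numerically for candidate links. The sketch below evaluates each candidate on a finite grid, so it can only flag violations and cannot prove that a link is admissible. The helper check_link, the grid and the flag named exceeds_one (a crude proxy for a range that is not capped at 1) are illustrative assumptions.

```python
import math

def check_link(link, grid):
    """Flag violations of the link requirements on a finite grid of eta values."""
    values = [link(eta) for eta in grid]
    return {
        "positive": all(v > 0 for v in values),
        "increasing": all(a < b for a, b in zip(values, values[1:])),
        "exceeds_one": max(values) > 1.0,
    }

# Integer grid of linear-predictor values from -20 to 20.
grid = [float(e) for e in range(-20, 21)]

links = {
    "exponential": math.exp,
    "linear 4*eta + 12": lambda eta: 4.0 * eta + 12.0,
    "logistic": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
}

report = {name: check_link(f, grid) for name, f in links.items()}
for name, flags in report.items():
    print(name, flags)
```

On this grid only the exponential passes all three checks: the linear link fails positivity for arguments below −3, and the logistic never exceeds 1.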

Resource selection probability functions

Some of the past controversy surrounding the use-availability method stems from efforts to estimate RSPFs or otherwise make values of the RSF into probabilities. This effort and focus on RSPFs is misguided and doomed to failure if one seeks a general method. Estimation of RSPFs is difficult and not useful because the RSPF depends heavily on the ways in which inline image and inline image were sampled. For example, sampling inline image with replacement will yield a different RSPF than sampling without replacement. Sampling inline image using independent Bernoulli trials for each unit will result in a different RSPF than sampling inline image using a fixed size sampling scheme. Fithian & Hastie (2012) made the same remark, noting that even when researchers know the sampling mechanisms involved, absolute probabilities are generally not useful because they depend so heavily on sampling methods. It is more general (fewer assumptions), intuitive and useful to estimate an RSF directly than it is to estimate an RSPF and rescale. Consequently, the objective of estimation in use-availability studies should generally be the RSF, not the RSPF. For selection and habitat use, there is no information contained in an RSPF that is not also contained in the RSF.

Finally, in the future, it may be possible to assume a functional form for inline image (e.g. normal, gamma, exponential, etc.) and thereby remove the need for Lagrangian constraints. By making such a parametric assumption, it may be possible to improve precision of RSF estimates when the assumption is true. Making a parametric assumption may also facilitate Bayesian estimation of the parameters in β. If so, it may be possible to extend the use-availability design to cases with latent variables or unobserved states. For example, Bayesian analysis may offer a method other than that of Nielson et al. (2009) to estimate selection when observations are missing or when the organism is in different unobserved behavioural states.

Acknowledgements

The author would like to thank an anonymous reviewer for pointing out that the IPP approach was applicable, Dr Bryan Manly and the author's father, Dr Lyman McDonald, for their many helpful discussions, and Aidan McDonald, a high-school calculus student, who helped with the Lagrangian multiplier technique.
