## Introduction

Management and conservation of animal populations require a clear understanding of how environmental variables influence the spatial distribution of species. This need has led to increased interest in the analysis of existing (historical) data on species distributions, and has fuelled the collection of new animal location data, e.g. using telemetry. The high diversity of such data (e.g. counts, location-only, and presence-absence) has led to a parallel development of a variety of statistical models that quantify the link between the distribution of a species and environmental conditions (Buckland & Elston 1993; Manly, McDonald & Thomas 1993; Boyce & McDonald 1999; MacKenzie *et al*. 2005; Johnson *et al*. 2006; Phillips, Anderson & Schapire 2006; Chakraborty *et al*. 2011). Although these approaches may look dissimilar superficially, most are, in fact, exactly or approximately equivalent to an *inhomogeneous Poisson point process* (IPP – Baddeley *et al*. 2010; Warton & Shepherd 2010; Aarts, Fieberg & Matthiopoulos 2012; Renner & Warton 2013).

IPP models treat animal observations as points and quantify variations in their spatial density as a function of geographic or environmental covariates. This setting encompasses well-known use-availability designs traditionally used to estimate habitat or resource selection (Manly, McDonald & Thomas 1993). The logic behind the use-availability design is that if animals moved randomly in space, use of habitats would be proportional to habitat availability. Hence, modelling the deviations from proportionality is thought to quantify preferential selection.

Although this concept is widely accepted by ecologists, it assumes that changes in availability lead to proportional changes in usage. However, this assumption may not be valid. Mysterud & Ims (1998) illustrated how, under a dichotomous habitat classification, relative use changes with relative availability measured at the home-range level. Also, Beyer *et al*. (2010) showed that when organisms use two habitat types for a fixed amount of time, changes in (apparent) availability can lead to changes in the sign of the coefficients that define the direction and strength of selection for a particular environmental variable. This phenomenon, where use-availability ratios change non-linearly with availability, is known as a functional response in habitat selection (Arthur *et al*. 1996; Mysterud & Ims 1998; Mauritzen *et al*. 2003; Matthiopoulos *et al*. 2011; Moreau *et al*. 2012), and causes traditional resource selection functions to fail when predicting space use in novel environments. Thus, alternative methods may be needed to elucidate which environmental conditions are critical to the organism's existence.

The objective of this study is, first, to illustrate *how* habitat selection functions vary with availability under a variety of habitat use strategies (Tilman 1986), and, second, to explore various ways of incorporating availability into habitat selection models. For the selection of discrete habitats, we propose a model framework similar to discrete choice models (McCracken, Manly & Heyden 1998; Manly *et al*. 2002), but with log availability included as a covariate (i.e. we estimate a regression parameter that quantifies the influence of availability on use). When habitats are defined using continuous covariates (or a mix of continuous and discrete covariates), the influence of availability on habitat use can be modelled flexibly by specifying the parameters of the resource selection function itself as functions of availability (Matthiopoulos *et al*. 2011). An appealing aspect of all such approaches is that the influence of habitat availability on the output of the models is not subjective, but data-driven.

Throughout, we will use simulations to illustrate key concepts, starting with a simple simulation scenario involving discrete habitats and then building up to a more realistic scenario in which habitat is defined using continuous covariates.

### Definitions

Resources are defined as substances or objects required by an organism (e.g. water, food items), the quantity of which can be reduced by the activity of the organism (Begon, Harper & Townsend 1996). A habitat can be defined as a collection of resources and environmental conditions (abiotic and biotic) that determine the presence, survival and reproduction of a population (Sinclair, Fryxell & Caughley 2006; Gaillard *et al*. 2010). Under the finest habitat classification scheme, every point in geographical space can be treated as a unique habitat.

Attempts to quantify selection as a function of environmental variables have led to the development of *resource selection functions* (RSFs; reviewed in Boyce & McDonald 1999; Manly *et al*. 2002). Since the environmental variables included in such models do not always relate to resources, we prefer to use the name *habitat selection functions* (HSFs). HSFs are used to model the disproportionality between habitat usage and availability. The ratio of the use of a habitat over its availability, conditional on the availability of all habitats to the study animals, is often also termed habitat preference (Krausman 1999). Often, the response variable in HSFs is said to be proportional to the probability of use. However, probabilities of use can only be non-zero for areas; hence, it seems necessary to define a discrete spatial unit (e.g. a grid cell), which is not necessarily defined *a priori* in most use-availability designs. Further, the probabilities of use will change non-linearly with the size of the spatial unit (Baddeley *et al*. 2010). Thus, there is appeal to fitting and interpreting models under a spatial Poisson point process framework (i.e. IPP), with use modelled via a spatially-varying intensity function (Warton & Shepherd 2010; Chakraborty *et al*. 2011). Since the weighted distribution likelihood (eqn 1) often used to fit HSFs is identical to the conditional IPP likelihood, the HSF actually models an intensity (or density of observations) rather than a probability (Aarts, Fieberg & Matthiopoulos 2012).
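The dependence of the probability of use on the size of the spatial unit can be illustrated with a short sketch. It assumes, purely for illustration, a homogeneous Poisson process with a hypothetical intensity of 2 points per unit area, for which the probability that a cell contains at least one point is 1 − exp(−intensity × area). The function name `prob_use` and all numerical values are our own assumptions.

```python
import math

def prob_use(intensity, cell_area):
    """P(at least one point in a cell) under a homogeneous Poisson process."""
    return 1.0 - math.exp(-intensity * cell_area)

intensity = 2.0                  # hypothetical expected number of points per unit area
areas = [0.1, 0.2, 0.4, 0.8]     # hypothetical cell sizes
probs = [prob_use(intensity, a) for a in areas]

for a, p in zip(areas, probs):
    # The expected count scales linearly with area; the probability of use does not.
    print(f"cell area {a:.1f}: expected count = {intensity * a:.2f}, P(use) = {p:.3f}")
```

Doubling the cell area doubles the expected count but less than doubles the probability of use, which is one reason an intensity is the more convenient quantity to model.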

### Specification of the habitat selection function and inhomogeneous Poisson process likelihood

The probability density function, *f*^{u}(*X*), describing relative use of a habitat with environmental covariates, *X*, is typically defined using:

$$
f^{u}(X) = \frac{w(X)\,f^{a}(X)}{\int_{E} w(X)\,f^{a}(X)\,\mathrm{d}X} \tag{1}
$$

where *f*^{a}(*X*) gives the relative availability of a habitat composed of environmental covariates *X* in the study area, and the integral applies to all possible combinations of environmental conditions in environmental space *E* (i.e. the multi-dimensional space represented by environmental variables). This specification (eqn 1) can be seen as a weighted distribution (Patil & Rao 1978; Patil 2002; Lele & Keim 2006), where the organism samples the available distribution *f*^{a}(*X*) with a probability proportional to *w*(*X*) in order to obtain the used distribution *f*^{u}(*X*). Hence, in environmental space, *w*(*X*) is proportional to the *ratio* between habitat use and availability. *w*(*X*) is known as the habitat or resource selection function and is most often modelled as an exponential function of covariates, *w*(*X*) = exp(*X*^{T}*β*), where *X*^{T} is the transpose of *X* (McDonald, Manly & Raley 1990; Lele & Keim 2006; Johnson *et al*. 2008). Although rarely explicitly stated, eqn 1 assumes that, conditional on *w*(*X*), changes in *f*^{a}(*X*) lead to proportional changes in *f*^{u}(*X*). Assuming the parameters in the habitat selection function are conditional on *f*^{a}(*X*), and *f*^{a}(*X*) is uniform in geographical space (*G*), the likelihood function for the entire dataset of *n* animal locations *s*_{1}, …, *s*_{n} can be simplified to:

$$
L(\beta) = \prod_{i=1}^{n} \frac{w\big(X(s_i)\big)}{\int_{A} w\big(X(s)\big)\,\mathrm{d}s} \tag{2}
$$

where the integral of eqn 1, originally defined in environmental space, is replaced by the integral evaluated over all of geographical space; here *A* is the entire study area. This integral can be approximated by evaluation and averaging of *w*(*X*) at random or regular points in geographical space. This alternative specification of the likelihood function illustrates that the habitat selection model fitted in environmental space (see eqn 1), is equivalent to a model which quantifies use in geographical space as a function of the underlying environmental conditions (i.e. eqn 2 – Aarts, Fieberg & Matthiopoulos 2012). An advantage of specifying the likelihood according to eqn 1, however, is that it becomes readily apparent that the estimated habitat selection function *w*(*X*) is *conditional* on what is considered to be available to the organism [i.e. *f*^{a}(*X*)]. As will be illustrated later (see e.g. Figs 1-4), the estimated *w*(*X*) may vary drastically as a function of absolute habitat availability, even though the animal uses the same movement rules to explore and exploit space.
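The two ideas above can be sketched numerically. The example assumes a single covariate, an exponential selection function *w*(*x*) = exp(*βx*), a discrete analogue of the weighted distribution (*f*^{u} = *w f*^{a} / Σ *w f*^{a}), and a conditional log-likelihood of the form Σ log *w* − *n* log ∫ *w*, with the integral approximated by averaging *w* over regular grid points. All function names and numerical values are hypothetical and chosen only for illustration.

```python
import math

def w(x, beta, intercept=0.0):
    """Exponential habitat selection function for a single covariate x."""
    return math.exp(intercept + beta * x)

def used_distribution(f_avail, x_values, beta):
    """Discrete analogue of the weighted distribution: f_u = w * f_a / sum(w * f_a)."""
    weighted = [w(x, beta) * fa for x, fa in zip(x_values, f_avail)]
    total = sum(weighted)
    return [v / total for v in weighted]

def conditional_loglik(x_used, x_grid, area, beta, intercept=0.0):
    """Conditional log-likelihood; the integral of w is approximated by a grid average."""
    integral = area * sum(w(x, beta, intercept) for x in x_grid) / len(x_grid)
    log_w = sum(math.log(w(x, beta, intercept)) for x in x_used)
    return log_w - len(x_used) * math.log(integral)

# Three hypothetical habitat classes observed in two landscapes with different availability.
x_values = [0.0, 1.0, 2.0]
beta = 0.5
avail_a = [0.6, 0.3, 0.1]
avail_b = [0.2, 0.3, 0.5]
use_a = used_distribution(avail_a, x_values, beta)
use_b = used_distribution(avail_b, x_values, beta)

# With w held fixed, the use-availability ratio of class 3 relative to class 1
# equals w(2) / w(0) in both landscapes.
ratio_a = (use_a[2] / avail_a[2]) / (use_a[0] / avail_a[0])
ratio_b = (use_b[2] / avail_b[2]) / (use_b[0] / avail_b[0])
print(ratio_a, ratio_b, w(2.0, beta) / w(0.0, beta))

# An intercept cancels from the conditional likelihood.
x_grid = [(i + 0.5) / 100 for i in range(100)]   # regular points on a unit-length study area
x_used = [0.2, 0.7, 0.9]                         # hypothetical covariate values at animal locations
ll_no_intercept = conditional_loglik(x_used, x_grid, area=1.0, beta=beta)
ll_with_intercept = conditional_loglik(x_used, x_grid, area=1.0, beta=beta, intercept=3.0)
print(ll_no_intercept, ll_with_intercept)
```

The identical ratios hold only because *β* is held fixed across the two landscapes; a functional response corresponds to *β* itself changing with availability.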

The likelihood function in eqn 2 is equivalent to that of a *conditional inhomogeneous Poisson process* (CIPP – Cressie 1993) and identical to the likelihood used by MaxEnt when an exponential function is used to model *w*(*X*) (Phillips, Anderson & Schapire 2006; Aarts, Fieberg & Matthiopoulos 2012; Renner & Warton 2013). Alternatively, one can fit an *unconditional inhomogeneous Poisson process* (UIPP) model [Cressie (1993), eqn. 8.5.16, page 655, and Aarts, Fieberg & Matthiopoulos (2012), eqn. 4]. In contrast to the CIPP, the UIPP estimates an intercept, which relates to the mean intensity. In the UIPP likelihood, the integral in the denominator of eqn 2 is exponentiated, and must still be evaluated numerically. In both the CIPP and the UIPP, the (relative) intensity of the point process is given by *w*(*X*). The two approaches will result in equivalent slope parameters associated with environmental covariates [see Aarts, Fieberg & Matthiopoulos (2012), Appendix A], but the conditional approach cannot estimate an intercept parameter, since it cancels between the numerator and denominator in eqn 2 (Lele & Keim 2006). We fitted models using the unconditional likelihood; such models can easily be fitted as a Poisson log-linear model in most statistical software packages using the following numerical trick (Baddeley & Turner 2000; Warton & Shepherd 2010). First, a regular grid with a spatial resolution similar to the environmental data is constructed, an availability point is located at the centre of each grid cell, and for each grid cell (*i*) the total number of used and available locations (*n*_{i}) is calculated. Next, a Poisson log-linear model is fitted to the data, where the value of the response variable is set to 0 for each availability point, but to *n*_{i} for each animal location, and 1/*n*_{i} is specified as the ‘prior weight’ for all points in the GLM. For more details see Baddeley & Turner (2000).
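The weighting scheme can be sketched as follows. This is an illustration under stated assumptions, not the code used for the analyses: the landscape is one-dimensional, the covariate is the cell-centre position, animal locations are simulated by thinning uniform points, and the weighted Poisson log-linear model is fitted with a small hand-written Newton-Raphson routine (`fit_poisson_loglinear`, a hypothetical helper) rather than a statistical package. Because the covariate is taken at grid resolution, the weighted fit should coincide with an ordinary Poisson regression on the number of animal locations per cell.

```python
import math
import random

def fit_poisson_loglinear(x, y, weights, n_iter=50, tol=1e-10):
    """Weighted Poisson regression, log(mu) = b0 + b1 * x, by Newton-Raphson."""
    b0 = math.log(sum(wt * yi for wt, yi in zip(weights, y)) / sum(weights))
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi, wt in zip(x, y, weights):
            mu = math.exp(b0 + b1 * xi)
            g0 += wt * (yi - mu)          # score for the intercept
            g1 += wt * (yi - mu) * xi     # score for the slope
            h00 += wt * mu                # information matrix entries
            h01 += wt * mu * xi
            h11 += wt * mu * xi * xi
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Simulate animal locations on [0, 1] with intensity proportional to exp(TRUE_SLOPE * s).
random.seed(1)
TRUE_SLOPE = 2.0   # hypothetical selection coefficient
N_CELLS = 50       # hypothetical grid resolution
candidates = [random.random() for _ in range(4000)]
animals = [s for s in candidates if random.random() < math.exp(TRUE_SLOPE * (s - 1.0))]

centres = [(i + 0.5) / N_CELLS for i in range(N_CELLS)]
counts = [0] * N_CELLS
for s in animals:
    counts[min(int(s * N_CELLS), N_CELLS - 1)] += 1
n = [c + 1 for c in counts]   # used locations plus one availability point per cell

# One row per availability point (response 0) and per animal location (response n_i),
# all with prior weight 1 / n_i.
x_rows, y_rows, w_rows = [], [], []
for i in range(N_CELLS):
    x_rows.append(centres[i]); y_rows.append(0.0); w_rows.append(1.0 / n[i])
    for _ in range(counts[i]):
        x_rows.append(centres[i]); y_rows.append(float(n[i])); w_rows.append(1.0 / n[i])

trick_fit = fit_poisson_loglinear(x_rows, y_rows, w_rows)
count_fit = fit_poisson_loglinear(centres, [float(c) for c in counts], [1.0] * N_CELLS)
print("weighted-point fit (intercept, slope):", trick_fit)
print("cell-count fit     (intercept, slope):", count_fit)
```

In this sketch the intercept is on the scale of expected locations per grid cell; converting it to an intensity per unit area would require the cell area, which is omitted here for brevity.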

### Modelling the influence of availability on habitat use in discrete environmental space

Deciding *a priori* how habitat use depends on availability can be difficult, since it largely depends on the unknown mechanism of habitat selection. Consider *k* discrete habitats from which organisms may choose. Assuming that changes in availability have a proportional effect on usage (i.e. there is preferential selection), the log of availability (i.e. amount of area *A*_{i}) of the *i*th habitat (*i* = 1, 2, …, *k*) can be treated as an offset and added to the linear predictor (Heisey 1985; Kneib, Knauer & Küchenhoff 2011),

$$
P(u = i) = \frac{\exp\big(\beta_i + \log A_i\big)}{\sum_{j=1}^{k} \exp\big(\beta_j + \log A_j\big)} \tag{3}
$$

where *P*(*u* = *i*) is the probability that the animal will be found in habitat type *i*, and *β*_{i} defines the preference for habitat *i*. Under the unit-sum constraint, one of the *β*'s is non-identifiable, so we set *β*_{1} = 0 (Kneib, Knauer & Küchenhoff 2011). However, using availability as an offset may not appropriately capture the mechanism with which animals use certain habitat types. We can relax this assumption by including parameters (*ω*_{i}; *i* = 2, 3, …, *k*) that quantify a non-linear effect of availability on the use of the *k* different habitat types:

$$
P(u = i) = \frac{\exp\big(\beta_i + \omega_i \log A_i\big)}{\sum_{j=1}^{k} \exp\big(\beta_j + \omega_j \log A_j\big)} \tag{4}
$$

Again, under the unit-sum constraint, similar to *β*_{1}, we set *ω*_{1} = 0. If availability of a particular habitat has no impact on observed habitat use, the estimates of *ω*_{i} should be close to zero, and if changes in availability have a proportional effect on usage, estimates of *ω*_{i} should be close to 1. Information criteria, such as AIC (Burnham & Anderson 2002), can then be used to evaluate whether a more complex model (i.e. eqn 4) is better supported than the traditional preferential selection model (eqn 3).
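The interpretation of *ω* can be made concrete with a small sketch. It assumes a multinomial-logit form in which *P*(*u* = *i*) is proportional to exp(*β*_{i} + *ω*_{i} log *A*_{i}). For simplicity, and unlike the identifiability constraint described above, it sets *ω* to a common value across all habitats, so it only illustrates the two limiting cases (*ω* = 1 and *ω* = 0) plus an intermediate one. The function name, areas and preference values are hypothetical.

```python
import math

def habitat_use_probs(areas, beta, omega):
    """P(u = i) proportional to exp(beta_i + omega_i * log(A_i))."""
    scores = [math.exp(b + o * math.log(a)) for a, b, o in zip(areas, beta, omega)]
    total = sum(scores)
    return [s / total for s in scores]

beta = [0.0, 0.5, 1.0]           # hypothetical preferences, beta_1 = 0 as reference
areas_a = [50.0, 30.0, 20.0]     # hypothetical habitat areas in landscape A
areas_b = [20.0, 30.0, 50.0]     # the same habitats in landscape B

for omega_value in (1.0, 0.5, 0.0):
    omega = [omega_value] * 3
    p_a = habitat_use_probs(areas_a, beta, omega)
    p_b = habitat_use_probs(areas_b, beta, omega)
    # Use-availability ratio of habitat 3 relative to habitat 1 in each landscape.
    sel_a = (p_a[2] / areas_a[2]) / (p_a[0] / areas_a[0])
    sel_b = (p_b[2] / areas_b[2]) / (p_b[0] / areas_b[0])
    print(f"omega = {omega_value}: P(A) = {[round(p, 3) for p in p_a]}, "
          f"P(B) = {[round(p, 3) for p in p_b]}, ratio A = {sel_a:.3f}, ratio B = {sel_b:.3f}")

p_prop_a = habitat_use_probs(areas_a, beta, [1.0] * 3)
p_prop_b = habitat_use_probs(areas_b, beta, [1.0] * 3)
p_flat_a = habitat_use_probs(areas_a, beta, [0.0] * 3)
p_flat_b = habitat_use_probs(areas_b, beta, [0.0] * 3)
```

With *ω* = 1 the use-availability ratio is the same in both landscapes (use tracks availability proportionally); with *ω* = 0 the use probabilities themselves are the same in both landscapes, so the ratio changes with availability.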

### Generalized functional response (GFR) in continuous environmental space

In recent years, many species distribution models fitted to count or use-availability data have quantified the distribution of a species as a function of continuous covariates. Similar to the discrete case, it may be necessary to allow for a more flexible influence of availability on use.

Traditionally, the HSF parameters were assumed to be fixed across different study sites. Matthiopoulos *et al*. (2011) showed how the coefficient, *β*_{i}, associated with the *i*th environmental covariate can be explicitly modelled as a function of the availability of all covariates. In the simplest case, where the influence of availability is allowed to vary as a linear function of the covariates, the coefficient associated with the *k*th disjoint region (or sampling instance), *β*_{i,k}, can be quantified using region-specific expectations:

$$
\beta_{i,k} = \gamma_{i,0} + \sum_{j} \delta_{i,j}\,\bar{X}_{j,k} \tag{5}
$$

where $\bar{X}_{j,k}$ is the average value of the *j*th environmental variable calculated for the conditions prevailing in the *k*th region, *δ*_{i,j} is the corresponding fixed effect slope coefficient and *γ*_{i,0} is the intercept coefficient for the mixed-effect coefficient *β*_{i,k}. For more details on such GFR models, see Matthiopoulos *et al*. (2011). Incorporating eqn 5 into the linear predictor of the HSF results in a random intercept (for each region), fixed and random effect terms for all environmental covariates, and fixed effects terms for all pairwise interactions between these covariates and their expectations in each region. Flexibility can be further increased by allowing the influence of availability to vary as a polynomial function of the covariates, which leads to the use of higher order expectations [eqn 6 in Matthiopoulos *et al*. (2011)]. In Appendix S1, we show how the GFR approach can be extended to the case where the log-intensity function is modelled as a non-linear function of environmental covariates using a set of B-spline basis functions. Allowing the selection coefficients (*β*) to vary as a function of the interactions between these basis functions and their expectations provides increased flexibility to capture non-linear effects of habitat availability on selection.
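How availability-dependent coefficients translate into interaction terms can be sketched as below. The example assumes the simplest linear form, in which each region-specific coefficient equals an intercept plus a weighted sum of the regional covariate means, and it ignores the random-effect components. The function names, the two regions and all numerical values are hypothetical.

```python
import math

def gfr_coefficient(gamma_i0, delta_i, region_means):
    """Region-specific coefficient: gamma_i0 + sum_j delta_ij * mean_j."""
    return gamma_i0 + sum(d * m for d, m in zip(delta_i, region_means))

def linear_predictor_via_coefficients(x, gamma, delta, region_means):
    """Sum over covariates of (region-specific coefficient) * (covariate value)."""
    return sum(gfr_coefficient(g, d, region_means) * xi
               for xi, g, d in zip(x, gamma, delta))

def linear_predictor_via_interactions(x, gamma, delta, region_means):
    """Same predictor written as main effects plus covariate-by-mean interactions."""
    main = sum(g * xi for g, xi in zip(gamma, x))
    inter = sum(delta[i][j] * region_means[j] * x[i]
                for i in range(len(x)) for j in range(len(region_means)))
    return main + inter

gamma = [0.8, -0.3]                     # hypothetical intercepts gamma_{i,0}
delta = [[-0.5, 0.2], [0.1, -0.4]]      # hypothetical slopes delta_{i,j}
regions = {"region_1": [0.2, 1.0],      # hypothetical regional means of two covariates
           "region_2": [0.9, 0.4]}
x = [0.5, 1.5]                          # covariate values at one location

for name, means in regions.items():
    betas = [gfr_coefficient(g, d, means) for g, d in zip(gamma, delta)]
    eta_coef = linear_predictor_via_coefficients(x, gamma, delta, means)
    eta_inter = linear_predictor_via_interactions(x, gamma, delta, means)
    print(name, "coefficients:", betas, "log-intensity:", eta_coef, eta_inter)
```

The two ways of writing the linear predictor agree, which is why such a model can be fitted as a regression with main effects and covariate-by-expectation interaction terms.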