## Introduction

Resource utilization function (RUF) analyses (Marzluff *et al*. 2004) are widely employed in the study of animal space use and enjoy the advantages of being relatively intuitive and comparatively easy to implement. Based on the estimation of an individual-based ‘utilization distribution’ (UD; e.g. Millspaugh *et al*. 2006), RUF analyses are commonly intended to obtain inference about the relationship between an animal's or population's use of space and the underlying environmental niche. This desired inference is a critical component in the field of ecology (Krebs 1978). In linking the UD to the underlying environment, RUF analyses go beyond home range and core area estimation (e.g. Wilson *et al*. 2010) and also relate to resource selection analyses, where desired inference pertains to whether the use of resources is disproportionate to their availability (Manly *et al*. 2002). One potential advantage of the RUF approaches is that they may improve selection inference when the telemetry data are subject to measurement error (Millspaugh *et al*. 2006).

Resource utilization function analyses hold an appeal because of their simplicity, but the specific way in which they relate to resource selection functions (RSFs) has not been described. In this paper, we attempt to reconcile RUF analysis with RSF analysis and examine several sources of potential bias in doing so. We begin by describing the RUF and RSF analyses as they are traditionally employed. We then explore several potential sources of bias that could affect RUF analyses and use simulation to demonstrate our findings. Finally, we suggest a few simple diagnostics that could be employed in selection analyses and illustrate them using a real data set pertaining to the spatial ecology of mountain lions (*Puma concolor*) in Colorado, USA.

### Resource utilization functions

The conventional perspective in animal space use studies is that the UD is a spatial probability distribution that gives rise to a spatial point process (i.e. the observed telemetry locations). That is, one assumes there is a surface over a spatial domain ($\mathcal{S}$) of interest that specifies the likelihood (*f*) an animal will occur at any given location (**s**) in the domain. Thus, for a finite set of times at which an animal's location is observed, say *t* = 1,…,*T*, we have a statistical model for location $\mathbf{s}_t$ where $\mathbf{s}_t \sim f(\mathbf{s})$.

The RUF procedure outlined by Marzluff *et al*. (2004) assumes that the probability distribution *f* (i.e. the UD) then depends on the underlying environment **X** (i.e. *f*(**s**)≡*f*(**s**|**X**,**β**)) and adopts a two-stage estimation approach for the coefficients **β**. The first step in the analysis is concerned with estimating the UD (with, say, $\hat{f}$), while the second stage links the UD to a set of underlying covariates **X**.

To estimate the UD, a wide variety of density estimation techniques can be employed to find $\hat{f}$ based on the telemetry data ($\mathbf{s}_t$ for *t* = 1,…,*T*); however, we will focus on kernel density estimation (KDE), because (i) this is a commonly applied technique familiar to many animal ecologists and (ii) Marzluff *et al*. (2004) employed this approach in their seminal paper on the topic. It should be noted, however, that many of the following results would apply to RUFs based on any form of UD estimation technique.

In KDE, one takes a nonparametric approach to estimating *f* whereby for any location of interest in the spatial domain, $\mathbf{s} \equiv (s_1, s_2)' \in \mathcal{S}$, the estimate of the UD is as follows:

$$\hat{f}(\mathbf{s}) = \frac{1}{T b_1 b_2} \sum_{t=1}^{T} k\!\left(\frac{s_1 - s_{1,t}}{b_1}\right) k\!\left(\frac{s_2 - s_{2,t}}{b_2}\right) \qquad \text{(eqn 1)}$$

where $\mathbf{s}_t \equiv (s_{1,t}, s_{2,t})'$, *k* represents the kernel (which we assume to be Gaussian) and the parameters $b_1$ and $b_2$ are bandwidth parameters that control the diffuseness of the kernel (Venables & Ripley 2002, Chapter 5). There are various ways to choose the bandwidth parameters, and these are well described in the literature (e.g. Silverman 1986). In practice, the UD, $\hat{f}$, is estimated for a large but finite set of points (or grid cells, *i* = 1,…,*m*) in the spatial domain $\mathcal{S}$ for the purposes of graphical display or further use in a RUF model.
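As a rough illustration of the KDE step, the following sketch evaluates a product-Gaussian kernel estimate of the UD on a regular grid. The function names (`gaussian_kernel`, `kde_ud`), the bandwidth arguments `b1` and `b2`, the simulated telemetry locations, and the fixed bandwidth value are all assumptions made for this example. In practice a data-driven bandwidth selector would typically be used.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard Gaussian kernel k(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def kde_ud(grid, locs, b1, b2):
    """Product-kernel estimate of the UD at each row of `grid`.

    grid: (m, 2) array of evaluation points (s1, s2)
    locs: (T, 2) array of telemetry locations
    b1, b2: bandwidths in the two coordinate directions
    """
    T = locs.shape[0]
    u1 = (grid[:, None, 0] - locs[None, :, 0]) / b1  # shape (m, T)
    u2 = (grid[:, None, 1] - locs[None, :, 1]) / b2
    return (gaussian_kernel(u1) * gaussian_kernel(u2)).sum(axis=1) / (T * b1 * b2)

# Hypothetical telemetry data clustered near the centre of the unit square.
rng = np.random.default_rng(0)
locs = rng.normal(loc=0.5, scale=0.1, size=(200, 2))

# Regular grid of m = 50 x 50 evaluation points.
axis = np.linspace(0.0, 1.0, 50)
g1, g2 = np.meshgrid(axis, axis)
grid = np.column_stack([g1.ravel(), g2.ravel()])

f_hat = kde_ud(grid, locs, b1=0.05, b2=0.05)
cell_area = (axis[1] - axis[0]) ** 2
total_mass = f_hat.sum() * cell_area  # near 1 when the grid covers the data
```

The vector `f_hat` plays the role of the gridded UD estimate that a second-stage RUF regression would take as its response.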

Consider, as an illustration, the situation where there is a single covariate of interest *x* and telemetry locations are simulated from $f(\mathbf{s}|x,\boldsymbol{\beta})$ (Fig. 1). In this case, the coefficients were chosen to provide a positive relationship between the covariate and the UD (i.e. $\beta_1 > 0$, where the intercept $\beta_0$ only has an effect on the total number of observed telemetry locations *T*). Figure 1 depicts a large-scale spatial pattern in the covariate where the telemetry data are constrained to the unit square region shown; this constraint serves as the ‘home range’ and could take any shape, but the rectangular shape is used here for display purposes only. We will show that the spatial pattern in the covariate, which is only a function of the spatial arrangement of the landscape, will prove to be an important factor in the spatially explicit models that follow.

A conventional RUF analysis typically proceeds by fitting a linear model with $\hat{f}(\mathbf{s}_i)$ as the response variable and $\mathbf{x}_i$, a *p* × 1 vector, representing the covariates (i.e. environmental resources) at location $\mathbf{s}_i$. That is, the second stage of the RUF analysis for an individual involves fitting the regression model:

$$\hat{f}(\mathbf{s}_i) = \beta_0 + \mathbf{x}_i'\boldsymbol{\beta} + \varepsilon_i \qquad \text{(eqn 2)}$$

for *i* = 1,…,*m* and $\varepsilon_i \sim N(0, \sigma^2)$, where the regression coefficients **β** control the linear relationship between the environmental covariates and the UD, and $\beta_0$ corresponds to an intercept parameter that is not typically interpreted.
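The second-stage regression can be sketched with ordinary least squares. In this minimal example the response is a synthetic stand-in for a gridded UD estimate rather than the output of a real KDE, and the covariate values, coefficient values, and variable names are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
m = 400

# Hypothetical covariate at m grid cells and a synthetic stand-in for the
# gridded UD estimate (intercept 1.0, slope 0.5, independent errors).
x = rng.normal(size=m)
f_hat = 1.0 + 0.5 * x + rng.normal(scale=0.1, size=m)

# Design matrix with an intercept column, then the least-squares fit.
X = np.column_stack([np.ones(m), x])
coef, *_ = np.linalg.lstsq(X, f_hat, rcond=None)

# Residual variance and standard errors under independent errors.
resid = f_hat - X @ coef
sigma2_hat = resid @ resid / (m - X.shape[1])
se = np.sqrt(np.diag(sigma2_hat * np.linalg.inv(X.T @ X)))
```

The standard errors computed here rely on the independent-error assumption, which is exactly the assumption that smoothing by the KDE may violate.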

At the individual level, RUF analysis provides inference about the regression coefficients **β** in terms of significance and possibly subset selection, thereby illuminating the potential environmental influences on space use. In a population-level analysis, where telemetry data exist for multiple individuals (say, for *j* = 1,…,*J* individuals), one would index the regression coefficients such that they are labelled $\boldsymbol{\beta}_j$ for each individual. Then, the focus shifts towards the expectation or variance in coefficient estimates among individuals; for example, we may be interested in learning about $\boldsymbol{\mu}_\beta \equiv E(\boldsymbol{\beta}_j)$ for all *j* = 1,…,*J* animals. In this latter case, the individual becomes the sample unit and the sample size *J* most heavily influences the uncertainty concerning $\boldsymbol{\mu}_\beta$.

In implementing the RUF approach described previously, Marzluff *et al*. (2004) wisely noticed that there may be lurking forms of dependence in the regression errors $\varepsilon_i$. They posited that such forms of dependence might arise from the smoothing induced by the KDE approach for estimating the UD (eqn 1) [in addition to other possible sources of latent autocorrelation such as missing covariates in eqn 2]. Marzluff *et al*. (2004) propose a geostatistical approach (Cressie 1993) that involves modelling the covariance structure between the errors in a spatially explicit manner. A simple geostatistical model for the RUF analysis is the exponential spatial model given by:

$$\text{cov}(\varepsilon_i, \varepsilon_l) = \sigma_0^2 \, 1_{\{i=l\}} + \sigma_1^2 \exp\!\left(-\frac{\lVert \mathbf{s}_i - \mathbf{s}_l \rVert}{\phi}\right) \qquad \text{(eqn 3)}$$

where the numerator in the exponential refers to the Euclidean distance between cell *i* and cell *l*, and the denominator ϕ is a range parameter that controls the decay in the spatial structure of ɛ with distance. The two variance components $\sigma_0^2$ (nugget) and $\sigma_1^2$ (sill) account for the variance associated with a non-spatially structured and spatially structured source of error, respectively. In matrix notation, the model for the errors is then often expressed as **ɛ**∼*N*(**0**,**Σ**), where $\boldsymbol{\varepsilon} \equiv (\varepsilon_1, \ldots, \varepsilon_m)'$ and the (*i*, *l*)th element of the covariance matrix **Σ** is equal to $\text{cov}(\varepsilon_i, \varepsilon_l)$ (eqn 3). Often, the covariance matrix is written as $\boldsymbol{\Sigma} = \sigma_0^2 \mathbf{I} + \sigma_1^2 \mathbf{R}(\phi)$, where $\mathbf{R}(\phi)$ is the correlation matrix with elements $\exp(-\lVert \mathbf{s}_i - \mathbf{s}_l \rVert / \phi)$.
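To make the exponential covariance model concrete, the sketch below builds the covariance matrix for a handful of cell centres as a nugget term on the diagonal plus a sill term that decays exponentially with Euclidean distance. The function name `exponential_cov`, its argument names, the coordinates, and the parameter values are assumptions chosen only for illustration.

```python
import numpy as np

def exponential_cov(coords, nugget, sill, phi):
    """Covariance matrix: nugget * I + sill * exp(-distance / phi).

    coords: (m, 2) array of cell-centre coordinates
    nugget: variance of the non-spatial error component
    sill:   variance of the spatially structured component
    phi:    range parameter controlling decay with distance
    """
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)  # pairwise Euclidean distances
    return nugget * np.eye(len(coords)) + sill * np.exp(-dist / phi)

# Four hypothetical cell centres; the last is 5 units from the first.
coords = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 4.0]])
Sigma = exponential_cov(coords, nugget=0.1, sill=1.0, phi=2.0)
```

Larger values of `phi` produce correlation that persists over longer distances, while a larger `nugget` relative to `sill` makes the errors behave more like independent noise.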

The conventional procedure used to fit geostatistical models to continuous spatial data involves a multi-step process of first (i) fitting the linear regression model assuming independent errors, then (ii) characterizing the spatial structure in the residuals using variogram estimation (Cressie 1993), and finally (iii) using generalized (or weighted) least squares (GLS) to estimate the regression coefficients (**β**) while taking into account the correlated errors. Other approaches such as maximum likelihood can also be used, but for simplicity, we retain the GLS method in our simulations.
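The three steps can be sketched end to end on simulated data. This is a simplified illustration under several assumptions: the data are simulated with known exponential covariance, the variogram model is fitted by a crude grid search rather than a formal variogram-fitting routine, and all parameter values, bin edges, and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical 15 x 15 grid with one covariate and spatially correlated errors.
side = 15
g1, g2 = np.meshgrid(np.arange(side), np.arange(side))
coords = np.column_stack([g1.ravel(), g2.ravel()]).astype(float)
m = len(coords)
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)

true_Sigma = 0.05 * np.eye(m) + 0.5 * np.exp(-d / 3.0)
x = rng.normal(size=m)
X = np.column_stack([np.ones(m), x])
y = X @ np.array([1.0, 0.5]) + np.linalg.cholesky(true_Sigma) @ rng.normal(size=m)

# (i) Ordinary least squares, assuming independent errors.
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_ols

# (ii) Empirical semivariogram of the residuals, binned by distance.
iu = np.triu_indices(m, k=1)
dist = d[iu]
sqdiff = 0.5 * (resid[iu[0]] - resid[iu[1]]) ** 2
edges = np.linspace(0.0, 10.0, 11)
which = np.digitize(dist, edges) - 1
centers, gamma_hat = [], []
for b in range(len(edges) - 1):
    mask = which == b
    if mask.any():  # skip empty distance bins
        centers.append(dist[mask].mean())
        gamma_hat.append(sqdiff[mask].mean())
centers, gamma_hat = np.array(centers), np.array(gamma_hat)

# Crude grid search for the exponential variogram parameters.
best = None
for nug in np.linspace(0.0, 0.3, 7):
    for sill in np.linspace(0.1, 1.0, 10):
        for phi in np.linspace(0.5, 6.0, 12):
            gamma = nug + sill * (1.0 - np.exp(-centers / phi))
            sse = np.sum((gamma - gamma_hat) ** 2)
            if best is None or sse < best[0]:
                best = (sse, nug, sill, phi)
_, nug, sill, phi = best

# (iii) Generalized least squares using the estimated covariance matrix.
Sigma = nug * np.eye(m) + sill * np.exp(-d / phi)
Si_X = np.linalg.solve(Sigma, X)               # Sigma^{-1} X
beta_gls = np.linalg.solve(X.T @ Si_X, Si_X.T @ y)
cov_gls = np.linalg.inv(X.T @ Si_X)            # covariance of beta_gls
```

The GLS coefficient covariance `cov_gls` accounts for the estimated spatial dependence, whereas the usual OLS standard errors would not.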

### Resource selection functions

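Resource selection is the differential use of resources given those resources available. In describing the conventional approach for estimating RSFs (e.g. Manly *et al*. 2002; Johnson *et al*. 2006), we note that most recent applications of RSFs take a weighted distribution approach where the probability distribution of use, $f_u(\mathbf{x})$, can be expressed as an updated distribution of availability, $f_a(\mathbf{x})$, given the RSF *g*(**x**,**β**), which is usually expressed in an exponential form as *g*(**x**,**β**) = exp(**x**′**β**) (although other functional forms are possible, e.g. Lele & Keim 2006). This equivalence between use and the updated version of availability can be written as:

$$f_u(\mathbf{x}) = \frac{g(\mathbf{x},\boldsymbol{\beta})\, f_a(\mathbf{x})}{\int g(\mathbf{x},\boldsymbol{\beta})\, f_a(\mathbf{x})\, d\mathbf{x}} \qquad \text{(eqn 4)}$$

Because the distribution of use is not observed directly, a maximum likelihood approach can be taken to maximize a product over the right-hand side of eqn 4, evaluated at the covariates $\mathbf{x}_t$ for each telemetry location (*t* = 1,…,*T*), with respect to **β**:

$$\prod_{t=1}^{T} \frac{g(\mathbf{x}_t,\boldsymbol{\beta})\, f_a(\mathbf{x}_t)}{\int g(\mathbf{x},\boldsymbol{\beta})\, f_a(\mathbf{x})\, d\mathbf{x}} \qquad \text{(eqn 5)}$$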

Various tricks can be employed to maximize eqn 5 without having to analytically solve the integral in the denominator (e.g. Johnson *et al*. 2006; Lele 2009). The most common approach involves taking a ‘background’ sample (sometimes referred to as an availability sample) of locations from the spatial domain $\mathcal{S}$ and labelling those as zeros in a binary response vector with the ones corresponding to the observed telemetry locations. A logistic regression is then fit to the binary data using the covariates at all of the used and available locations. Under certain conditions, the parameter estimates have been shown to be equivalent to those obtained by maximizing eqn 5. Incidentally, Warton and Shepherd (2010) and Aarts *et al*. (2012) have recently shown that maximizing eqn 5 is equivalent to maximizing the likelihood of an inhomogeneous spatial point process for the purpose of estimating **β**. Furthermore, Aarts *et al*. (2012) show that the required maximization can be achieved using a Poisson generalized linear model (GLM), with an offset term corresponding to availability.
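The background-sample approach can be illustrated with a small simulation. In this sketch, used locations are drawn (by rejection sampling) with density proportional to exp(*βx*) over the unit square, availability is assumed to be uniform, and the logistic regression is fitted with a hand-written Newton-Raphson loop. The covariate function, sample sizes, and true coefficient value are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
beta_true = 1.5

def covariate(s):
    """Hypothetical covariate: a west-east gradient over the unit square."""
    return s[:, 0]

# Used locations: rejection sampling from a density proportional to
# exp(beta_true * x(s)); acceptance probability is exp(beta_true * (x - 1)) <= 1.
cand = rng.uniform(size=(20000, 2))
accept = rng.uniform(size=len(cand)) < np.exp(beta_true * (covariate(cand) - 1.0))
used = cand[accept][:2000]

# Background (availability) sample drawn uniformly from the domain.
avail = rng.uniform(size=(20000, 2))

# Binary response: ones for used locations, zeros for background locations.
y = np.concatenate([np.ones(len(used)), np.zeros(len(avail))])
X = np.column_stack([np.ones(len(y)), covariate(np.vstack([used, avail]))])

# Logistic regression fitted by Newton-Raphson.
beta = np.zeros(2)
for _ in range(50):
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    step = np.linalg.solve(hess, grad)
    beta += step
    if np.max(np.abs(step)) < 1e-8:
        break
```

Only the slope `beta[1]` is interpreted as a selection coefficient; the intercept mainly reflects the ratio of used to background sample sizes.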

To fit the Poisson GLM, one bins the telemetry locations into a large set of grid cells spanning the spatial domain $\mathcal{S}$, and the resulting response variable ($y_i$ for *i* = 1,…,*m* grid cells) consists of cell counts where the model is expressed as $y_i \sim \text{Pois}(\lambda_i)$, and a log link is used to model the intensities $\lambda_i$:

$$\log(\lambda_i) = \log(a_i) + \beta_0 + \mathbf{x}_i'\boldsymbol{\beta} \qquad \text{(eqn 6)}$$

where if the availability weights $a_i$ are all equal (i.e. even availability within the region $\mathcal{S}$), then this procedure becomes a regular Poisson log-linear regression of the cell counts on the covariates without weights. In what follows, we set all $a_i = 1$; however, if the $a_i$ are set to be the area of the grid cells, then $\exp(\beta_0 + \mathbf{x}_i'\boldsymbol{\beta})$ can be interpreted as the average number of points per unit area.
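A Poisson log-linear regression with an availability offset can be sketched using iteratively reweighted least squares (IRLS). The example below simulates cell counts directly rather than binning real telemetry locations, and the covariate, the weights `a`, the true coefficients, and the starting values are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(4)
m = 900

# Hypothetical covariate per grid cell and equal availability weights.
x = rng.normal(size=m)
a = np.ones(m)
X = np.column_stack([np.ones(m), x])
beta_true = np.array([0.5, 1.0])

# Simulated cell counts with intensity a_i * exp(beta_0 + beta_1 * x_i).
counts = rng.poisson(a * np.exp(X @ beta_true))

# IRLS for a Poisson GLM with log link and offset log(a).
offset = np.log(a)
beta = np.array([np.log(counts.mean()), 0.0])  # simple starting values
for _ in range(100):
    eta = offset + X @ beta
    lam = np.exp(eta)
    z = (eta - offset) + (counts - lam) / lam   # working response, offset removed
    XtW = X.T * lam                             # IRLS weights equal the intensities
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)
    converged = np.max(np.abs(beta_new - beta)) < 1e-10
    beta = beta_new
    if converged:
        break
```

With all weights equal to one the offset vanishes, and the fit reduces to an ordinary Poisson regression of counts on the covariate.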