## Introduction

Accurate modelling of habitat selection by animals is critical to developing effective management plans. Resource selection functions (RSFs) are used to compare used with available habitat (Manly *et al*. 2002). Recent progress in GPS technology has made enormous amounts of location data available. However, sequentially surveyed locations may be correlated at intervals as long as 1 month apart (Cushman, Chase & Griffin 2005), and are obviously correlated at intervals measured in minutes or hours (e.g. Fortin *et al*. 2005). Such data violate the assumption of independence of observations, which may increase the frequency of type I errors (Clifford, Richardson & Hémon 1989).

One approach to dealing with this autocorrelation has been to adopt an analysis that assumes absence of correlation, and then to manipulate the data to meet this assumption. For example, telemetry locations may be recorded only every few hours or days (e.g. Johnson, Seip & Boyce 2004) on the assumption that this time-lag results in independence. However, the increased time-lag may not be sufficient to produce independent observations, and the reduced amount of data may increase bias and reduce accuracy (Gustine *et al*. 2006). Destructive sampling, accomplished by dropping data until independence is reached (Way, Ortega & Strauss 2004), is similarly problematic, and may require dropping as many as 95% of the data collected (e.g. Saher 2005).
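To make the cost of destructive sampling concrete, the following minimal sketch thins a hypothetical telemetry track to a fixed minimum time-lag. The function name `thin_fixes`, the 4 h fix interval, the 30-day track, and the 3-day lag are all illustrative assumptions rather than values taken from the studies cited above. In practice the lag would be chosen from an estimate of the time to independence.

```python
from datetime import datetime, timedelta


def thin_fixes(timestamps, min_lag):
    """Keep the first fix, then each later fix that falls at least
    `min_lag` after the most recently kept fix."""
    kept = []
    for t in sorted(timestamps):
        if not kept or t - kept[-1] >= min_lag:
            kept.append(t)
    return kept


# Hypothetical track: one fix every 4 hours for 30 days (180 fixes).
start = datetime(2005, 1, 1)
fixes = [start + timedelta(hours=4 * i) for i in range(180)]

# Assumed time to independence of 3 days (illustrative only).
thinned = thin_fixes(fixes, min_lag=timedelta(days=3))
fraction_dropped = 1 - len(thinned) / len(fixes)
print(len(fixes), len(thinned), round(fraction_dropped, 3))
```

Under these assumed settings, most of the fixes are discarded, which is the loss of information that motivates the alternatives discussed below.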

Some approaches that have been proposed for controlling for temporal autocorrelation are problematic. For example, information-theoretic approaches do not sufficiently correct for autocorrelation (cf. Boyce 2006; Aarts *et al*. 2008) because calculation of standard errors (sensitive to independence) is a critical component of the model selection paradigm, and because the likelihood used to calculate information criteria assumes independence (Burnham & Anderson 1998). Conditional logistic regression (e.g. Johnson & Gillingham 2005) assumes independence among groups of points (observed paired with random), which is not met when telemetry points are recorded frequently. To address this problem, Fortin *et al*. (2005) incorporated robust standard errors and destructive sampling to obtain long time-lags between clusters of points. Although useful at the ‘step-scale’, their approach does not allow for evaluation of habitat selection at the home-range scale.

Gillies *et al*. (2006) recommended models that include fixed and random (clustering) effects, such as generalized linear mixed-effects models (GLMMs), to control for the correlation that arises from recording multiple locations from each animal. Mixed models have been applied to correlated ecological data (e.g. Bolker *et al*. 2009), but Gillies *et al*. (2006) are among the first to apply them to RSFs (see also Aarts *et al*. 2008). However, there are at least two potential problems with applying GLMMs to RSFs. First, the models are analytically complex (Fitzmaurice, Laird & Ware 2004: 326), which may inhibit convergence. Second, hypothesis tests in GLMMs are highly sensitive to misspecification of the model and correlation structure (Overall & Tonidandel 2004) when model-based standard errors are used. Because telemetry locations are sampled sequentially, they are autocorrelated. However, random points selected from an animal's home range (e.g. Gillies *et al*. 2006) do not show autocorrelation, as they are not sampled sequentially over time. Because the correlation structures of telemetry and random points differ, it is impossible to correctly specify the within-cluster correlation structure. The data Gillies *et al*. provide suggest that grizzly bear *Ursus arctos* L. locations were determined approximately every 4 h. At this sampling frequency, it is unlikely that these locations are independent. Gillies *et al*. (2006: 890) misspecified the correlation structure, as they assumed that all data within a cluster (animal) were equally correlated. Therefore, their approach does not meet the assumptions of GLMMs. Nonetheless, we believe their approach is promising, and can be developed further.

One possible modification is to use empirical (Huber–White sandwich) variance estimates within the GLMM to make the analysis robust to misspecification of the correlation structure (SAS Institute Inc. 2006), as Nielsen *et al*. (2002) did with a logistic regression model. Gillies *et al*. (2006) found that GLMMs were more effective for the development of RSFs than were logistic regression models with empirical standard errors, but did not evaluate GLMM combined with empirical standard errors. We suggest that GLMM with empirical standard errors may be robust to both among- and within-animal correlations, in contrast to GLMM without empirical standard errors.
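The difference between model-based and empirical standard errors can be illustrated in the simplest possible setting: estimating a pooled mean from clustered data. The sketch below is not a GLMM and is not the procedure used in any of the studies cited. It is an intercept-only linear model with simulated data, and the function name, the number of animals, and the variance values are assumptions chosen for illustration. No small-sample correction is applied to the sandwich estimate.

```python
import math
import random


def naive_and_cluster_se(clusters):
    """Return (model-based SE, cluster-robust SE) of the pooled mean.

    `clusters` is a list of lists; each inner list holds one animal's values.
    """
    values = [y for c in clusters for y in c]
    n = len(values)
    mean = sum(values) / n
    # Model-based SE: treats all n observations as independent.
    naive_var = sum((y - mean) ** 2 for y in values) / (n - 1) / n
    # Empirical (sandwich) SE: residuals are summed within each cluster
    # before squaring, so within-cluster correlation enters the estimate.
    cluster_var = sum(sum(y - mean for y in c) ** 2 for c in clusters) / n ** 2
    return math.sqrt(naive_var), math.sqrt(cluster_var)


random.seed(1)
# Hypothetical data: 20 animals, 50 fixes each, sharing a per-animal effect.
clusters = []
for _ in range(20):
    animal_effect = random.gauss(0, 1)
    clusters.append([animal_effect + random.gauss(0, 1) for _ in range(50)])

naive_se, robust_se = naive_and_cluster_se(clusters)
print(round(naive_se, 3), round(robust_se, 3))
```

With within-animal correlation this strong, the empirical standard error is expected to be considerably larger than the model-based one, which is how ignoring correlation can inflate type I error rates.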

Generalized linear models (GLMs) with generalized estimating equations (GEEs) may provide a useful alternative. GEEs include an additional variance component to accommodate correlated data, and to allow for differences among clusters. GEEs have several favourable properties for ecological analyses; for example, parameter estimates and empirical standard errors are robust to misspecification of the correlation structure (Overall & Tonidandel 2004), and they are usually less analytically complex than GLMMs (Agresti 2002: 365), so model convergence is more likely. GEEs have been used extensively in a variety of disciplines, such as epidemiology (Wu *et al*. 1999) and political science (Zorn 2001). In ecology, they have been used to control for lack of independence among nests clustered within sites (Driscoll *et al*. 2005) and among related species (Duncan 2004). GEEs have been used only occasionally in habitat-selection studies. Storch (2002) and Dorman *et al*. (2007) demonstrate their use for controlling for spatial autocorrelation. In a conditional logistic regression context, Fortin *et al*. (2005) developed RSFs using estimating equations with an independence working correlation structure and robust standard errors, which they implemented using Cox proportional hazards regression. Although GEEs with other correlation structures have not previously been used for building RSFs, robust standard errors have been applied to control for correlation among telemetry locations (Nielsen *et al*. 2002). However, pooling data across animals (e.g. Nielsen *et al*. 2002) biases results towards data-rich individuals (Gillies *et al*. 2006; Aarts *et al*. 2008) if data are not missing at random. Applying robust standard errors while using a working correlation structure other than ‘independence’ in the estimation procedure should help overcome this problem.
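For orientation, the general form of the GEE and its empirical variance estimate can be written as follows. The notation is an assumption introduced here, not taken from the sources cited above: $i$ indexes clusters (animals), $y_i$ is the response vector for cluster $i$, $\mu_i(\beta)$ its mean under the chosen link, $A_i$ the diagonal matrix of variance-function values, $R_i(\alpha)$ the working correlation matrix, and $\phi$ a scale parameter.

```latex
% Estimating equations solved for \beta
U(\beta) = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R_i(\alpha) A_i^{1/2}

% Model-based and empirical (sandwich) variance estimates
\widehat{\operatorname{Var}}_{\text{model}}(\hat\beta) = B^{-1},
\qquad
\widehat{\operatorname{Var}}_{\text{empirical}}(\hat\beta) = B^{-1} M B^{-1}

B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1}
    \bigl( y_i - \hat\mu_i \bigr) \bigl( y_i - \hat\mu_i \bigr)^{\top}
    V_i^{-1} D_i
```

Setting $R_i(\alpha)$ to the identity matrix gives the independence working structure. The middle term $M$ is built from the observed residuals rather than from the assumed $V_i$, which is why the empirical estimate remains valid when the working correlation is misspecified.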

Nonetheless, GEEs, like GLMMs, involve tradeoffs. Whereas GLMMs are sensitive to the choice of correlation structure, GEEs are sensitive to the link function (Pendergast *et al*. 1996: 101), which can affect model fit (Lele & Keim 2006). It is, therefore, important to compare these approaches according to both their performance and analytical paradigm, to evaluate the appropriateness of their tradeoffs under different management scenarios.

Another fundamental issue is the interpretation of parameter estimates. Conditional (subject-specific) coefficients describe how an individual's response changes with respect to independent variables. Marginal (population) parameter estimates describe the effects of independent variables on a population. This distinction has a strong effect on parameter estimates, standard error estimates, and significance testing (Fitzmaurice *et al*. 2004: 365). Whereas GLMMs generate conditional parameter estimates, from which marginal estimates can be derived (Agresti 2002: 499), GEEs produce only marginal ones. However, marginal parameter estimates derived from GLMMs are biased, in that their absolute value is too small, and this bias increases as the variance of the random effect increases (Agresti 2002: 499). Although RSFs do not produce estimates of actual probabilities of use, they produce estimates that are proportional to probability of use (Manly *et al*. 2002), and thus, this bias could be problematic. Further, the relationships among covariates, and the parameter estimates themselves, are not easily interpreted for marginal estimates derived from conditional models, and models are more likely to be misspecified (Agresti 2002: 499; Fitzmaurice *et al*. 2004: 364). It is therefore preferable to use a marginal model, such as GEE, when marginal population estimates are of interest (Agresti 2002: 501).
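The gap between conditional and marginal coefficients under a logit link can be illustrated numerically. The sketch below assumes a random-intercept logistic model with a hypothetical conditional slope of 1 and a random-intercept standard deviation of 2, averages the individual response curves over the random effect, and compares the resulting marginal log-odds change with a commonly used closed-form approximation. All names and values are illustrative assumptions. The sketch does not reproduce the bias described above; it only shows that the two kinds of coefficient are on different scales.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_prob(eta, sigma, n=2001, width=8.0):
    """Average expit(eta + u) over u ~ Normal(0, sigma^2), trapezoid rule."""
    if sigma == 0:
        return expit(eta)
    lo, hi = -width * sigma, width * sigma
    step = (hi - lo) / (n - 1)
    total = 0.0
    for k in range(n):
        u = lo + k * step
        weight = 0.5 if k in (0, n - 1) else 1.0
        density = math.exp(-0.5 * (u / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        total += weight * density * expit(eta + u) * step
    return total


beta_conditional = 1.0  # hypothetical subject-specific slope
sigma = 2.0             # hypothetical SD of the random intercept

# Marginal log-odds difference between x = 0 and x = 1 (intercept of 0).
beta_marginal = (logit(marginal_prob(beta_conditional, sigma))
                 - logit(marginal_prob(0.0, sigma)))

# Common approximation to the attenuation under a logit link.
c = 16 * math.sqrt(3) / (15 * math.pi)
beta_approx = beta_conditional / math.sqrt(1 + (c * sigma) ** 2)
print(round(beta_marginal, 3), round(beta_approx, 3))
```

Increasing `sigma` widens the gap between the conditional and marginal values, and setting it to 0 makes them coincide.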

Accurate resource selection functions make an important contribution to the conservation of rare or threatened species (Johnson, Seip & Boyce 2004). The boreal population of woodland caribou *Rangifer tarandus caribou* L. is threatened in Canada (COSEWIC 2002). It is sensitive to habitat composition and anthropogenic activities (Brown *et al*. 2007), and therefore, accidental misspecification of RSFs would have important conservation consequences. We compared RSFs developed using GLMMs and GEEs, at two spatial scales, using data on woodland caribou. We compared effects of empirical and model-based standard errors on statistical significance. Finally, we compared our results with an analysis done on a destructively sampled subset of the data. Because GEEs have rarely been applied to RSFs, we provide an overview of this approach (see also Dorman *et al*. 2007).