### Introduction

Accurate modelling of habitat selection by animals is critical to developing effective management plans. Resource selection functions (RSFs) are used to compare used with available habitat (Manly *et al*. 2002). Recent progress in GPS technology development has resulted in enormous amounts of data being made available. However, sequentially surveyed locations may be correlated at intervals as long as 1 month apart (Cushman, Chase & Griffin 2005), and are obviously correlated at intervals measured in minutes or hours (e.g. Fortin *et al*. 2005). Such data violate assumptions of independence of observations, which may increase frequency of type I errors (Clifford, Richardson & Hémon 1989).

One approach to dealing with this autocorrelation has been to adopt an analysis that assumes absence of correlation, and then to manipulate the data to meet this assumption. For example, telemetry locations may be recorded every few hours or days (e.g. Johnson, Seip & Boyce 2004) on the assumption that this time-lag results in independence. However, the increased time-lag may not be sufficient to produce independent observations, and the reduced amount of data may increase bias and reduce accuracy (Gustine *et al*. 2006). Destructive sampling, accomplished by dropping data until independence is reached (Way, Ortega & Strauss 2004), is similarly problematic, and may require dropping as many as 95% of data collected (e.g. Saher 2005).
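As a rough illustration of how much data destructive sampling can discard, the following minimal sketch thins a sequence of fixes to a minimum time-lag. The function name `thin_by_min_lag`, the 4-h collar schedule and the 24-h lag are all hypothetical choices made for illustration; they are not taken from any of the studies cited above.

```python
from datetime import datetime, timedelta


def thin_by_min_lag(timestamps, min_lag):
    """Keep a fix only if at least `min_lag` has passed since the last kept fix."""
    kept = []
    for t in sorted(timestamps):
        if not kept or t - kept[-1] >= min_lag:
            kept.append(t)
    return kept


# Hypothetical collar schedule: one fix every 4 h for 10 days (60 fixes).
start = datetime(2005, 1, 1)
fixes = [start + timedelta(hours=4 * i) for i in range(60)]

# Thin to one fix per 24 h and report the share of data that is discarded.
daily = thin_by_min_lag(fixes, timedelta(hours=24))
fraction_dropped = 1 - len(daily) / len(fixes)
print(len(fixes), len(daily), round(fraction_dropped, 3))  # 60 10 0.833
```

Even this modest lag removes five of every six fixes in the toy schedule, and a longer lag would remove more, without guaranteeing that the retained fixes are independent.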

Some approaches that have been proposed for controlling for temporal autocorrelation are problematic. For example, information-theoretic approaches do not sufficiently correct for autocorrelation (cf. Boyce 2006; Aarts *et al*. 2008) because calculation of standard errors (sensitive to independence) is a critical component of the model selection paradigm, and because the likelihood used to calculate information criteria assumes independence (Burnham & Anderson 1998). Conditional logistic regression (e.g. Johnson & Gillingham 2005) assumes independence among groups of points (observed paired with random), which is not met when telemetry points are recorded frequently. To address this problem, Fortin *et al*. (2005) incorporated robust standard errors and destructive sampling to obtain long time-lags between clusters of points. Although useful at the ‘step-scale’, their approach does not allow for evaluation of habitat selection at the home-range scale.

Gillies *et al*. (2006) recommended models that include fixed and random (clustering) effects, such as generalized linear mixed-effects models (GLMMs), to control for the correlation that arises from recording multiple locations from each animal. Mixed models have been applied to correlated ecological data (e.g. Bolker *et al*. 2009), but Gillies *et al*. (2006) are among the first to apply them to RSFs (see also Aarts *et al*. 2008). However, there are at least two potential problems with applying GLMMs to RSFs. First, the models are analytically complex (Fitzmaurice, Laird & Ware 2004: 326), which may inhibit convergence. Second, hypothesis tests in GLMMs are highly sensitive to model and correlation structure misspecification (Overall & Tonidandel 2004) when model-based standard errors are used. Because telemetry locations have been sampled sequentially, they are autocorrelated. However, random points selected from an animal's home range (e.g. Gillies *et al*. 2006) do not show autocorrelation, as they are not sampled sequentially over time. Because the correlation structures of telemetry and random points differ, it is impossible to correctly specify the within-cluster correlation structure. The data Gillies *et al*. provide suggest that grizzly bear *Ursus arctos* L. locations were determined approximately every 4 h. At this sampling frequency, it is unlikely that these locations are independent. Gillies *et al*. (2006: 890) misspecified the correlation structure, as they assumed that all data within a cluster (animal) were equally correlated. Therefore, their approach does not meet the assumptions of GLMM. Nonetheless, we believe their approach is promising, and can be developed further.

One possible modification is to use empirical (Huber–White sandwich) variance estimates within the GLMM to make the analysis robust to misspecification of the correlation structure (SAS Institute Inc. 2006), as Nielsen *et al*. (2002) did with a logistic regression model. Gillies *et al*. (2006) found that GLMMs were more effective for the development of RSFs than were logistic regression models with empirical standard errors, but did not evaluate GLMM combined with empirical standard errors. We suggest that GLMM with empirical standard errors may be robust to both among- and within-animal correlations, in contrast to GLMM without empirical standard errors.
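To make the contrast between model-based and empirical standard errors concrete, here is a minimal sketch for the simplest possible case: an intercept-only linear model (a mean) with observations clustered by animal. It is not a GLMM or a logistic RSF; the function name, the toy data and the absence of any small-sample correction are all assumptions made purely for illustration.

```python
import math


def mean_standard_errors(clusters):
    """Return (mean, model-based SE, cluster-robust SE) for an intercept-only model.

    `clusters` maps a cluster label (e.g. an animal ID) to its observations.
    """
    values = [y for ys in clusters.values() for y in ys]
    n = len(values)
    mean = sum(values) / n

    # Model-based SE: treats all n observations as independent.
    resid_ss = sum((y - mean) ** 2 for y in values)
    model_se = math.sqrt(resid_ss / (n - 1) / n)

    # Sandwich SE: bread = 1/n, meat = sum over clusters of (summed residuals)^2.
    meat = sum(sum(y - mean for y in ys) ** 2 for ys in clusters.values())
    robust_se = math.sqrt(meat) / n

    return mean, model_se, robust_se


# Hypothetical data: values are identical within an animal (perfect correlation).
toy = {"animal_A": [1.0] * 4, "animal_B": [3.0] * 4}
mean, model_se, robust_se = mean_standard_errors(toy)
print(mean, round(model_se, 3), round(robust_se, 3))  # 2.0 0.378 0.707
```

In this toy case the eight observations carry the information of only two independent units, and the empirical standard error is nearly twice the model-based one. The same logic, in which residuals are summed within clusters before being squared, underlies the sandwich estimator in more complex models.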

Generalized linear models (GLMs) with generalized estimating equations (GEEs) may provide a useful alternative. GEEs include an additional variance component to accommodate correlated data, and to allow for differences among clusters. GEEs have several favourable properties for ecological analyses. For example, parameter estimates and empirical standard errors are robust to misspecification of the correlation structure (Overall & Tonidandel 2004), and GEEs are usually less analytically complex than GLMMs (Agresti 2002: 365), so model convergence is more likely. GEEs have been used extensively in a variety of disciplines, such as epidemiology (Wu *et al*. 1999) and political science (Zorn 2001). In ecology, they have been used to control for lack of independence among nests clustered within sites (Driscoll *et al*. 2005) and among related species (Duncan 2004). Generalized estimating equations have been used only occasionally in habitat-selection studies. Storch (2002) and Dorman *et al*. (2007) demonstrate their use for controlling for spatial autocorrelation. In a conditional logistic regression context, Fortin *et al*. (2005) developed RSFs using estimating equations with an independence-working correlation structure and robust standard errors, which they implemented using Cox proportional hazards regression. Although GEEs with other correlation structures have not previously been used for building RSFs, robust standard errors have been applied to control for correlation among telemetry locations (Nielsen *et al*. 2002). However, pooling data across animals (e.g. Nielsen *et al*. 2002) biases results towards data-rich individuals (Gillies *et al*. 2006; Aarts *et al*. 2008), if data are not missing at random. Applying robust standard errors while using a working correlation structure other than ‘independence’ in the estimation procedure should help overcome this problem.
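For readers unfamiliar with the approach, the estimating equations and the empirical variance estimator can be written in their standard textbook form, as below. The notation is generic and is our assumption rather than notation used elsewhere in this paper: $K$ is the number of clusters (animals), $y_i$ and $\mu_i$ are the response and mean vectors for cluster $i$, $A_i$ is the diagonal matrix of variance functions, $R(\alpha)$ is the working correlation matrix, and $\phi$ is a dispersion parameter.

```latex
% GEE for the regression coefficients \beta
\sum_{i=1}^{K} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} R(\alpha) A_i^{1/2}

% Empirical (sandwich) variance estimator
\widehat{\operatorname{Var}}(\hat{\beta}) = B^{-1} M B^{-1},
\qquad
B = \sum_{i=1}^{K} D_i^{\top} V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{K} D_i^{\top} V_i^{-1}
    (y_i - \hat{\mu}_i)(y_i - \hat{\mu}_i)^{\top} V_i^{-1} D_i
```

Setting $R(\alpha)$ to the identity matrix gives the independence working structure. A compound-symmetric (exchangeable) structure instead assigns a single common correlation to every pair of observations within a cluster.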

Nonetheless, as with GLMMs, the benefits of GEEs come with tradeoffs. Whereas GLMMs are sensitive to the choice of correlation structure, GEEs are sensitive to the link function (Pendergast *et al*. 1996: 101), which can affect model fit (Lele & Keim 2006). It is, therefore, important to compare these approaches according to both their performance and analytical paradigm, to evaluate the appropriateness of their tradeoffs under different management scenarios.

Another fundamental issue is the interpretation of parameter estimates. Conditional (subject-specific) coefficient interpretation means that coefficients model how individual responses change with respect to independent variables. Marginal (population) parameter estimates describe the effects of independent variables on a population. The choice between these interpretations has a strong effect on parameter estimates, standard error estimates, and significance testing (Fitzmaurice *et al*. 2004: 365). Whereas GLMMs generate conditional parameter estimates, from which marginal estimates can be derived (Agresti 2002: 499), GEEs only produce marginal ones. However, marginal parameter estimates derived from GLMMs are biased, in that their absolute value is too small, and this bias increases as the variance of the random effect increases (Agresti 2002: 499). Although RSFs do not produce estimates of actual probabilities of use, they produce estimates that are proportional to probability of use (Manly *et al*. 2002), and thus, this bias could be problematic. Further, the relationships among covariates, and the parameter estimates themselves, are not easily interpreted for marginal estimates derived from conditional models, and models are more likely to be misspecified (Agresti 2002: 499; Fitzmaurice *et al*. 2004: 364). It is therefore preferable to use a marginal model, such as GEE, when marginal population estimates are of interest (Agresti 2002: 501).
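The difference in magnitude between conditional and marginal coefficients can be illustrated numerically. The sketch below assumes a logistic model with a single binary covariate and a normally distributed random intercept, and averages the conditional response curve over that random intercept to obtain the population-level (marginal) log-odds ratio. The parameter values, function names and grid-based integration are hypothetical choices for illustration. The closed-form line uses a commonly quoted approximation for the logistic-normal model that is not taken from this paper.

```python
import math


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


def marginal_probability(eta, sigma, n_grid=4001, width=8.0):
    """Average expit(eta + u) over u ~ Normal(0, sigma^2) on a fixed grid."""
    if sigma == 0:
        return expit(eta)
    step = 2 * width / (n_grid - 1)
    total = 0.0
    weight_sum = 0.0
    for k in range(n_grid):
        z = -width + k * step        # grid point on the standard-normal scale
        w = math.exp(-0.5 * z * z)   # unnormalised normal density
        total += w * expit(eta + sigma * z)
        weight_sum += w
    return total / weight_sum


def marginal_slope(beta0, beta1, sigma):
    """Marginal log-odds difference between x = 1 and x = 0."""
    p0 = marginal_probability(beta0, sigma)
    p1 = marginal_probability(beta0 + beta1, sigma)
    return logit(p1) - logit(p0)


conditional_beta = 1.0                      # hypothetical subject-specific coefficient
c = 16 * math.sqrt(3) / (15 * math.pi)      # constant in the closed-form approximation
slopes = {}
for sigma in (0.0, 1.0, 2.0):               # SD of the random intercept
    slopes[sigma] = marginal_slope(0.0, conditional_beta, sigma)
    approx = conditional_beta / math.sqrt(1 + (c * sigma) ** 2)
    print(sigma, round(slopes[sigma], 3), round(approx, 3))
```

With no among-animal variation the two coefficients coincide. As the random-intercept standard deviation grows, the marginal coefficient shrinks towards zero even though the conditional coefficient is held fixed, which is why the two kinds of estimate should not be compared directly.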

Accurate resource selection functions make an important contribution to the conservation of rare or threatened species (Johnson, Seip & Boyce 2004). The boreal population of woodland caribou *Rangifer tarandus caribou* L. is threatened in Canada (COSEWIC 2002). It is sensitive to habitat composition and anthropogenic activities (Brown *et al*. 2007), and therefore, accidental misspecification of RSFs would have important conservation consequences. We compared RSFs developed using GLMMs and GEEs, at two spatial scales, using data on woodland caribou. We compared effects of empirical and model-based standard errors on statistical significance. Finally, we compared our results with an analysis done on a destructively sampled subset of the data. Because GEEs have rarely been applied to RSFs, we provide an overview of this approach (see also Dorman *et al*. 2007).

### Discussion

Destructive sampling reduced power and could lead to higher probability of type II errors (see also Gustine *et al*. 2006) when analysed using GEE. In our study, we would have underestimated selection for spruce and jack pine (using the independent structure) with destructive sampling. Degree of habitat selection by woodland caribou may, therefore, be greater than previously recorded in studies that used destructive sampling or long intervals between relocations (e.g. Johnson, Seip & Boyce 2004). Alternatively, destructive sampling may not be sufficient for ensuring independence among sequential points (Cushman, Chase & Griffin 2005). Not accounting for such correlations (e.g. Johnson, Seip & Boyce 2004; Gillies *et al*. 2006) may overestimate the sample size of independent data, and may overestimate habitat selection (Clifford, Richardson & Hémon 1989). However, it may still be important to use destructive sampling to minimize the effects of measurement error (Jerde & Visscher 2005).

Analytical methods that can use all the data are a better choice. Both GEEs and GLMMs hold promise, but both methods must be applied appropriately. Neither GLMMs nor GEEs with model-based variance estimators meet the assumption of correctly selecting the correlation structure. Evaluation of parameter significance was strongly influenced by type of standard error. Empirical standard errors must be used to determine significance if correlation structures are misspecified (Hardin & Hilbe 2003). This will almost invariably be the case with RSFs developed using GPS telemetry data. If GLMMs are used to develop RSFs, the methods described by Gillies *et al*. (2006) should be modified to ensure that empirical standard errors are used. The methods proposed by Fortin *et al*. (2005) avoid some of the differences in the correlation structure between used and available points, and thus are suitable if habitat selection is assessed at a local, rather than home-range scale.

Statistical results, and therefore conservation implications, sometimes differed between GEE and GLMM. One reason is that we used only conditional parameter estimates from GLMMs, while coefficient interpretation for GEEs was marginal; therefore, each model addressed a different question. Because the interpretation of parameter estimates is different for a marginal vs. conditional design, their selection is of critical importance and must be based on appropriate biological or management rationale. The population-specific response will be of interest if management actions are intended to influence whole populations. This would be typical of many scenarios in applied ecology, such as landscape-level conservation, in which a subset of the population is monitored to understand effects of management on the whole population or other populations. Marginal models such as GEEs are preferred for generating marginal population estimates (Agresti 2002: 501). Conditional estimates are most appropriate when management focuses on individuals (Fitzmaurice *et al*. 2004: 369); for example, when conservation of specific individuals is the management goal, such as with endangered species management (Gillies *et al*. 2006), and when individuals are monitored to understand how future management plans will affect them. If management goals are conditional, GLMMs can be used for their analysis.

We monitored a subset of individuals to develop management recommendations for the population; therefore, marginal estimates were of interest. We would have underestimated the degree of habitat selection for jack pine and spruce if we incorrectly applied a conditional approach to the analysis at the IHR scale. This would have serious conservation implications, as jack pine and spruce are economically valuable and therefore at significant risk of harvest.

Road avoidance was only detected at the HHR scale (see also Apps & McLellan 2006). Many previous studies have recognized differences in habitat selection among spatial scales (e.g. Gustine *et al*. 2006; Mayor *et al*. 2007), and thus, the analytical approach must be selected based on the appropriate spatial scale for answering the ecological or management question. The herd home range is the spatial scale at which many management plans are developed (e.g. Crichton & Duncan 2005), and therefore, this spatial scale is of particular importance in applied ecology.

Evaluating the predictive capacity of models is important for determining their usefulness for conservation and management (Boyce *et al*. 2002). Appropriate evaluation of model prediction using *k*-fold cross-validation (Boyce *et al*. 2002) is sensitive to whether a marginal or conditional approach is taken. Although we are not aware of any precedent for using *k*-fold cross-validation for evaluating fit of GEE or GLMM, it should be effective. However, the approach must be applied differently for a marginal vs. conditional design. For discussion purposes, we use the example of withholding 20% for evaluating performance of an RSF (*sensu* Boyce *et al*. 2002). With a marginal design, we hope to predict habitat selection of all animals in a population, from a subset of monitored animals. In that case, we should withhold data from 20% of the *individuals* and evaluate model performance with those. However, with a conditional design, the model describes habitat selection of specific animals. In that case, we should withhold data from 20% of the *points from each animal*, and evaluate model performance with those. Obviously, the results of the *k*-fold cross-validation will differ under each scenario, and address different questions about how well the RSF model predicts. It is unsurprising that we found RSFs were more likely to correctly predict habitat selection of those animals used to develop the models (conditional approach), than to predict habitat selection of other animals in the population (marginal approach); there is greater variation among than within individuals. Although threshold guidelines are not yet available (Pearce & Boyce 2006), guidelines for *k*-fold cross-validation thresholds would clearly differ for a marginal vs. conditional approach. This represents an important avenue for future research.
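The two withholding schemes described above can be sketched as follows. This is a minimal illustration of a single 80/20 split rather than a full *k*-fold procedure, and the function names, record layout and toy data set (10 animals with 50 locations each) are hypothetical.

```python
import random


def split_by_individual(points, holdout_fraction=0.2, seed=0):
    """Marginal design: withhold every point from a random subset of animals."""
    rng = random.Random(seed)
    animals = sorted({p["animal"] for p in points})
    n_holdout = max(1, round(holdout_fraction * len(animals)))
    held_out = set(rng.sample(animals, n_holdout))
    train = [p for p in points if p["animal"] not in held_out]
    test = [p for p in points if p["animal"] in held_out]
    return train, test


def split_within_individual(points, holdout_fraction=0.2, seed=0):
    """Conditional design: withhold a fraction of the points from each animal."""
    rng = random.Random(seed)
    train, test = [], []
    for animal in sorted({p["animal"] for p in points}):
        own = [p for p in points if p["animal"] == animal]
        rng.shuffle(own)
        n_holdout = max(1, round(holdout_fraction * len(own)))
        test.extend(own[:n_holdout])
        train.extend(own[n_holdout:])
    return train, test


# Hypothetical data set: 10 animals with 50 locations each.
points = [{"animal": f"caribou_{a}", "fix": i} for a in range(10) for i in range(50)]

train_m, test_m = split_by_individual(points)       # marginal: 2 whole animals withheld
train_c, test_c = split_within_individual(points)   # conditional: 10 points per animal
print(len(test_m), len({p["animal"] for p in test_m}))
print(len(test_c), len({p["animal"] for p in test_c}))
```

Both splits withhold the same number of points in this balanced example, but the marginal split tests prediction for animals the model has never seen, whereas the conditional split tests prediction for the same animals used in fitting.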

Suitability of GEEs and GLMMs for developing resource selection functions differs in several other ways. GLMMs may be less likely to converge than GEEs due to their added complexity (Fitzmaurice *et al*. 2004: 326). While we circumvented this problem using alternative optimization procedures, this is not always effective. Gillies *et al*. (2006) argue that point estimates obtained under a working independence assumption weight results towards animals with more samples. However, GEEs may be used in combination with empirical variance estimators to produce an estimator robust to moderate deviations in sample sizes (Fitzmaurice *et al*. 2004: 320). Parameter estimates from GEEs may be sensitive to degree of correlation (Pepe & Anderson 1994), which may vary among seasons. This suggests that, as is usually the case with RSFs, GEEs should be used to analyse habitat selection within reasonably discrete, biologically appropriate seasons. In our study, the correlation structure had few effects on GEE results. If greater differences in habitat selection among animals are expected in other studies, researchers should consider both independent and compound-symmetric correlation structures.

Both GEEs and GLMMs hold promise for the development of RSFs when used with empirical variance estimates. The optimal approach will depend on study design and management goals (see also Bolker *et al*. 2009). Selection of a marginal or conditional approach is a key step in the study design process, and should be based on ecological or management goals. GEEs may be more likely to converge for some data sets as they are simpler analytically, and are preferred when marginal population estimates are needed (Agresti 2002: 501). GLMMs are required for generating conditional population estimates, and may be preferred if the link function is likely to be misspecified (see also Lele & Keim 2006). GLMMs may be preferred if variances differ widely among groups within categorical explanatory variables (Agresti 2002: 501). Further research is required to adapt either approach for the development of resource selection probability functions (Lele & Keim 2006) to account for contamination of random points with used points (Keating & Cherry 2004). Contamination is likely to be minimal in our system, as the species density is low (Johnson *et al*. 2006). Nonetheless, this problem is beyond the scope of our study, and therefore, we restrict our discussion to RSFs. Caution must also be taken if relative selection is very small, as in this case, parameter estimates for RSFs may be inaccurate (Lele & Keim 2006). We also recognize that the correlation itself may be of interest, and recommend use of the many analytical procedures that are available, such as Mantel correlograms, in this case. However, other biological questions may need to be addressed within an RSF that does not focus on the autocorrelation, for which GEEs and GLMMs are useful.