If animals moved randomly in space, the use of different habitats would be proportional to their availability. Hence, deviations from proportionality between use and availability are considered the tell-tale sign of preference. This principle forms the basis for most habitat selection and species distribution models fitted to use-availability or count data (e.g. MaxEnt and Resource Selection Functions).
Yet, once an essential habitat type is sufficiently abundant to meet an individual's needs, increased availability of this habitat type may lead to a decrease in the use/availability ratio. Accordingly, habitat selection functions may estimate negative coefficients when habitats are superabundant, incorrectly suggesting an apparent avoidance. Furthermore, not accounting for the effects of availability on habitat use may lead to poor predictions, particularly when applied to habitats that differ considerably from those for which data have been collected.
Using simulations, we show that habitat use varies non-linearly with habitat availability, even when individuals follow simple movement rules to acquire food and avoid risk. The results show that the impact of availability strongly depends on the type of habitat (e.g. whether it is essential or substitutable) and how it interacts with the distribution and availability of other habitats.
We demonstrate the utility of a variety of existing and new methods that enable the influence of habitat availability to be explicitly estimated. Models that allow for non-linear effects (using b-spline smoothers) and interactions between environmental covariates defining habitats and measures of their availability were best able to capture simulated patterns of habitat use across a range of environments.
An appealing aspect of some of the methods we discuss is that the relative influence of availability is not defined a priori, but directly estimated by the model. This feature is likely to improve model prediction, hint at the mechanism of habitat selection, and may signpost habitats that are critical for the organism's fitness.
Management and conservation of animal populations require a clear understanding of how environmental variables influence the spatial distribution of species. This need has led to increased interest in the analysis of existing (historical) data on species distributions, and has fuelled the collection of new animal location data, e.g. using telemetry. The high diversity of such data (e.g. counts, location-only, and presence-absence) has led to a parallel development of a variety of statistical models that quantify the link between the distribution of a species, and environmental conditions (Buckland & Elston 1993; Manly, McDonald & Thomas 1993; Boyce & McDonald 1999; MacKenzie et al. 2005; Johnson et al. 2006; Phillips, Anderson & Schapire 2006; Chakraborty et al. 2011). Although these approaches may look dissimilar superficially, most are, in fact, exactly or approximately equivalent to an inhomogeneous Poisson point process (IPP – Baddeley et al. 2010; Warton & Shepherd 2010; Aarts, Fieberg & Matthiopoulos 2012; Renner & Warton 2013).
IPP models treat animal observations as points and quantify variations in their spatial density as a function of geographic or environmental covariates. This setting encompasses well-known use-availability designs traditionally used to estimate habitat or resource selection (Manly, McDonald & Thomas 1993). The logic behind the use-availability design is that if animals moved randomly in space, use of habitats would be proportional to habitat availability. Hence, modelling the deviations from proportionality is thought to quantify preferential selection.
Although this concept is widely accepted by ecologists, it assumes that changes in availability lead to proportional changes in usage. However, this assumption may not be valid. Mysterud & Ims (1998) illustrated how, under a dichotomous habitat classification, relative use changes with relative availability measured at the home-range level. Also, Beyer et al. (2010) showed that when organisms use two habitat types for a fixed amount of time, changes in (apparent) availability can lead to changes in the sign of the coefficients that define the direction and strength of the response to a particular environmental variable. This phenomenon, where use-availability ratios change non-linearly with availability, is known as a functional response in habitat selection (Arthur et al. 1996; Mysterud & Ims 1998; Mauritzen et al. 2003; Matthiopoulos et al. 2011; Moreau et al. 2012), and causes traditional resource selection functions to fail when predicting space use in novel environments. Thus, alternative methods may be needed to elucidate which environmental conditions are critical to the organism's existence.
The objective of this study is first, to illustrate how habitat selection functions vary with availability under a variety of habitat use strategies (Tilman 1986), and second, to explore various ways of incorporating availability into habitat selection models. For the selection of discrete habitats, we propose a model framework similar to discrete choice models (McCracken, Manly & Heyden 1998; Manly et al. 2002), but with log availability included as a covariate (i.e. we estimate a regression parameter that quantifies the influence of availability on use). When habitats are defined using continuous covariates (or a mix of continuous and discrete covariates), the influence of availability on habitat use can be modelled flexibly, by specifying the parameters of the resource selection function itself as functions of availability (Matthiopoulos et al. 2011). An appealing aspect of all such approaches is that the influence of habitat availability on the output of the models is not subjective, but data-driven.
Throughout, we will use simulations to illustrate key concepts, starting with a simple simulation scenario involving discrete habitats and then building up to a more realistic scenario in which habitat is defined using continuous covariates.
Resources are defined as substances or objects required by an organism (e.g. water, food items), the quantity of which can be reduced by the activity of the organism (Begon, Harper & Townsend 1996). A habitat can be defined as a collection of resources and environmental conditions (abiotic and biotic) that determine the presence, survival and reproduction of a population (Sinclair, Fryxell & Caughley 2006; Gaillard et al. 2010). Under the finest habitat classification scheme, every point in geographical space can be treated as a unique habitat.
Attempts to quantify selection as a function of environmental variables have led to the development of resource selection functions (RSFs; reviewed in Boyce & McDonald 1999; Manly et al. 2002). Since the environmental variables included in such models do not always relate to resources, we prefer to use the name habitat selection functions (HSFs). HSFs are used to model the disproportionality between habitat usage and availability. The ratio of the use of a habitat over its availability, conditional on the availability of all habitats to the study animals, is often also termed habitat preference (Krausman 1999). Often, the response variable in HSFs is said to be proportional to the probability of use. However, probabilities of use can only be non-zero for areas; hence, it seems necessary to define a discrete spatial unit (e.g. a grid cell), which is not necessarily defined a priori in most use-availability designs. Further, the probabilities of use will change non-linearly with the size of the spatial unit (Baddeley et al. 2010). Thus, there is appeal in fitting and interpreting models under a spatial Poisson point process framework (i.e. IPP), with use modelled via a spatially-varying intensity function (Warton & Shepherd 2010; Chakraborty et al. 2011). Since the weighted distribution likelihood (eqn 1), often used to fit HSFs, is identical to the conditional IPP likelihood, the HSF actually models an intensity (or density of observations) rather than a probability (Aarts, Fieberg & Matthiopoulos 2012).
Specification of the habitat selection function and inhomogeneous Poisson process likelihood
The probability density function, fu(X), describing relative use of a habitat with environmental covariates, X, is typically defined using:

$$f_u(X) = \frac{w(X)\, f_a(X)}{\int_E w(X)\, f_a(X)\, dX} \qquad \text{(eqn 1)}$$
where fa(X) gives the relative availability of a habitat composed of environmental covariates X, in the study area and the integral applies to all possible combinations of environmental conditions in environmental space E (i.e. the multi-dimensional space represented by environmental variables). This specification (eqn 1) can be seen as a weighted distribution (Patil & Rao 1978; Patil 2002; Lele & Keim 2006), where the organism samples the available distribution fa(X) with a probability proportional to w(X) in order to obtain the used distribution fu(X). Hence, in environmental space, w(X) is proportional to the ratio between habitat use and availability. w(X) is known as the habitat or resource selection function and is most often modelled as an exponential function of covariates, $w(X) = \exp(X^{T}\beta)$, where XT is the transpose of X and β is the vector of selection coefficients (McDonald, Manly & Raley 1990; Lele & Keim 2006; Johnson et al. 2008). Although rarely explicitly stated, eqn 1 assumes that, conditional on w(X), changes in fa(X) lead to proportional changes in fu(X). Assuming the parameters in the habitat selection function are conditional on fa(X), and fa(X) is uniform in geographical space (G), the likelihood function for the entire dataset of n locations s1, …, sn can be simplified to:

$$L(\beta) = \prod_{i=1}^{n} \frac{w\big(X(s_i)\big)}{\int_A w\big(X(s)\big)\, ds} \qquad \text{(eqn 2)}$$
where the integral of eqn 1, originally defined in environmental space, is replaced by the integral evaluated over all of geographical space; here A is the entire study area. This integral can be approximated by evaluation and averaging of w(X) at random or regular points in geographical space. This alternative specification of the likelihood function illustrates that the habitat selection model fitted in environmental space (see eqn 1), is equivalent to a model which quantifies use in geographical space as a function of the underlying environmental conditions (i.e. eqn 2 – Aarts, Fieberg & Matthiopoulos 2012). An advantage of specifying the likelihood according to eqn 1, however, is that it becomes readily apparent that the estimated habitat selection function w(X) is conditional on what is considered to be available to the organism [i.e. fa(X)]. As will be illustrated later (see e.g. Figs 1-4), the estimated w(X) may vary drastically as a function of absolute habitat availability, even though the animal uses the same movement rules to explore and exploit space.
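The weighted-distribution idea behind eqn 1 can be illustrated numerically. The following is a minimal sketch, assuming a single covariate that takes three discrete values, a hypothetical availability distribution, and an exponential selection function w(x) = exp(βx); the function name `used_distribution` and all numbers are illustrative and do not come from the study.

```python
import math

def used_distribution(avail, beta):
    """Weight availability f_a(x) by w(x) = exp(beta * x) and renormalise.

    The sum over covariate values is a discrete stand-in for the
    integral over environmental space in the weighted distribution.
    """
    weighted = {x: math.exp(beta * x) * fa for x, fa in avail.items()}
    total = sum(weighted.values())
    return {x: v / total for x, v in weighted.items()}

# Hypothetical availability of three covariate values (sums to 1)
f_a = {0.0: 0.5, 1.0: 0.3, 2.0: 0.2}
f_u = used_distribution(f_a, beta=0.8)

# Use/availability ratios are proportional to w(x), so the slope of the
# log ratio against x recovers the selection coefficient beta.
ratios = {x: f_u[x] / f_a[x] for x in f_a}
log_ratio_slope = math.log(ratios[2.0] / ratios[0.0]) / (2.0 - 0.0)
```

Under these assumptions the normalising constant cancels in the ratio, which is why only relative (not absolute) selection strength is identifiable from such a comparison.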
The likelihood function in eqn 2 is equivalent to that of a conditional inhomogeneous Poisson process (CIPP – Cressie 1993) and identical to the likelihood used by MaxEnt when an exponential function is used to model w(X) (Phillips, Anderson & Schapire 2006; Aarts, Fieberg & Matthiopoulos 2012; Renner & Warton 2013). Alternatively, one can fit an Unconditional Inhomogeneous Poisson Process (UIPP) model [Cressie (1993), eqn. 8.5.16, page 655, and Aarts, Fieberg & Matthiopoulos (2012), eqn. 4]. In contrast to the CIPP, the UIPP estimates an intercept, which relates to the mean intensity. In the UIPP likelihood, the integral in the denominator of eqn 2 is exponentiated, and must still be evaluated numerically. In both the CIPP and the UIPP, the (relative) intensity of the point process is given by w(X). The two approaches will result in equivalent slope parameters associated with environmental covariates [see Aarts, Fieberg & Matthiopoulos (2012), Appendix A], but the conditional approach cannot estimate an intercept parameter since it cancels from both the numerator and denominator in eqn 2 (Lele & Keim 2006). We fitted models using the unconditional likelihood; such models can easily be fitted as a Poisson log-linear model in most statistical software packages using the following numerical trick (Baddeley & Turner 2000; Warton & Shepherd 2010): first, a regular grid with a spatial resolution similar to the environmental data is constructed, an availability point is located at the centre of each grid cell, and for each grid cell (i) the total number of used and available locations (ni) is calculated. Next, a Poisson log-linear model is fitted to the data, where the value of the response variable is set to 0 for each availability point, but to ni for each animal location, and 1/ni is specified as the ‘prior weight’ for all points in the GLM. For more details see Baddeley & Turner (2000).
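The numerical trick described above can be sketched in a few lines. The example below is an illustration under several assumptions that are not part of the study: grid cells of unit area arranged along one dimension, a single covariate, deterministic (rounded) counts in place of random Poisson draws, and a hand-written Newton-Raphson fitter (`fit_weighted_poisson`, a hypothetical helper) instead of a GLM routine. It checks that the weighted fit to individual points gives the same coefficients as an ordinary Poisson regression on cell counts.

```python
import math

def fit_weighted_poisson(rows, n_iter=50):
    """Newton-Raphson fit of log(mu) = b0 + b1 * x with prior weights.

    rows: list of (x, response, weight) tuples.
    """
    tot_w = sum(w for _, _, w in rows)
    tot_y = sum(w * y for _, y, w in rows)
    b0, b1 = math.log(tot_y / tot_w), 0.0      # start at the null model
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, w in rows:
            mu = math.exp(b0 + b1 * x)
            r = w * (y - mu)                   # weighted score contribution
            g0 += r
            g1 += r * x
            h00 += w * mu                      # weighted information matrix
            h01 += w * mu * x
            h11 += w * mu * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

# Hypothetical study area: 20 unit-area cells, covariate x in [0, 1],
# with the number of animal locations set to the rounded intensity.
cells = [(c / 19.0, round(50.0 * math.exp(1.5 * c / 19.0))) for c in range(20)]

quad_rows = []
for x, m in cells:
    n = m + 1                                  # m animal locations + 1 availability point
    quad_rows += [(x, float(n), 1.0 / n)] * m  # animal locations: response n_i
    quad_rows.append((x, 0.0, 1.0 / n))        # availability point: response 0

count_rows = [(x, float(m), 1.0) for x, m in cells]

b_quad = fit_weighted_poisson(quad_rows)       # point-level fit with weights 1/n_i
b_count = fit_weighted_poisson(count_rows)     # ordinary count regression
```

With unit-area cells, the weighted log-likelihood of the point-level data reduces term by term to the Poisson log-likelihood of the cell counts, so `b_quad` and `b_count` should agree up to numerical error.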
Modelling the influence of availability on habitat use in discrete environmental space
Deciding a priori how habitat use depends on availability can be difficult, since it largely depends on the unknown mechanism of habitat selection. Consider k discrete habitats from which organisms may choose. Assuming that changes in availability have a proportional effect on usage (i.e. there is preferential selection), then the log of availability (i.e. amount of area Ai) of the ith habitat (i = 1, 2, …, k) can be treated as an offset and added to the likelihood (Heisey 1985; Kneib, Knauer & Küchenhoff 2011),

$$P(u = i) = \frac{A_i \exp(\beta_i)}{\sum_{j=1}^{k} A_j \exp(\beta_j)} \qquad \text{(eqn 3)}$$
where P(u = i) is the probability that the animal will be found in habitat type i, and βi defines the preference for habitat i. Under the unit-sum constraint, one of the β's is non-identifiable, so we set β1 = 0 (Kneib, Knauer & Küchenhoff 2011). However, using availability as an offset may not appropriately capture the mechanism with which animals use certain habitat types. We can relax this assumption by including parameters (ωi; i = 2, 3, …, k) that quantify a non-linear effect of availability on the use of the (k) different habitat types:
Again, under the unit-sum constraint, similar to β1, we set ω1 = 0. If availability of a particular habitat has no impact on observed habitat use, the estimates of ωi should be close to zero, and if changes in availability have a proportional effect on usage, estimates of ωi should be close to 1. Information criteria such as AIC (Burnham & Anderson 2002) can then be used to evaluate whether the more complex model (eqn 4) is better supported than the traditional preferential selection model (eqn 3).
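The interpretation of ω (0 for no effect of availability, 1 for a proportional effect) can be illustrated for the two-habitat case. The sketch below is not the fitting procedure used in the study: it assumes a simplified parameterization in which the log-odds of using habitat B are linear in the log of the availability ratio, log(AB/AA), it uses expected (noise-free) use proportions, and it estimates the slope by ordinary least squares rather than maximum likelihood. The 3:1 preference ratio is hypothetical.

```python
import math

def ols(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def log_odds(p):
    return math.log(p / (1.0 - p))

# Availability of habitat B from 10% to 90% in steps of 10%
avail_b = [i / 10.0 for i in range(1, 10)]
log_avail_ratio = [math.log(a / (1.0 - a)) for a in avail_b]

# Fixed time budget: half of the time in each habitat, whatever its availability
use_fixed = [0.5 for _ in avail_b]
# Proportional use: availability weighted by a hypothetical 3:1 preference for B
use_prop = [3.0 * a / (3.0 * a + (1.0 - a)) for a in avail_b]

beta_fixed, omega_fixed = ols(log_avail_ratio, [log_odds(p) for p in use_fixed])
beta_prop, omega_prop = ols(log_avail_ratio, [log_odds(p) for p in use_prop])
```

Under this simplified parameterization the fixed time budget yields a slope of 0 and proportional use yields a slope of 1 with an intercept of log(3), mirroring the two limiting interpretations of ω given above.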
Generalized functional response (GFR) in continuous environmental space
In recent years, many species distribution models fitted to count or use-availability data have quantified the distribution of a species as a function of continuous covariates. Similar to the discrete case, it may be necessary to allow for a more flexible influence of availability on use.
Traditionally, the HSF parameters were assumed to be fixed across different study sites. Matthiopoulos et al. (2011) showed how the coefficient, βi, associated with the ith environmental covariate can be explicitly modelled as a function of the availability of all covariates. In the simplest case, where the influence of availability is allowed to vary as a linear function of the covariates, the coefficient associated with the kth disjoint region (or sampling instance), βi,k, can be quantified using region-specific expectations:

$$\beta_{i,k} = \gamma_{i,0} + \sum_{j} \delta_{i,j}\, \bar{x}_{j,k} \qquad \text{(eqn 5)}$$

where $\bar{x}_{j,k}$ is the average value of the jth environmental variable calculated for the conditions prevailing in the kth region, δi,j is the corresponding fixed effect slope coefficient and γi,0 is the intercept coefficient for the mixed-effect coefficient βi,k. For more details on such GFR models, see Matthiopoulos et al. (2011). Incorporating eqn 5 into the linear predictor of the HSF results in a random intercept (for each region), fixed and random effect terms for all environmental covariates, and fixed effects terms for all pairwise interactions between these covariates and their expectations in each region. Flexibility can be further increased by allowing the influence of availability to vary as a polynomial function of the covariates, which leads to the use of higher order expectations [eqn 6 in Matthiopoulos et al. (2011)]. In Appendix S1, we show how the GFR approach can be extended to the case where the log-intensity function is modelled as a non-linear function of environmental covariates using a set of b-spline basis functions. Allowing the selection coefficients (β) to vary as a function of the interactions between these basis functions and their expectations provides increased flexibility to capture non-linear effects of habitat availability on selection.
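The fixed-effect part of such a GFR linear predictor can be assembled mechanically: each covariate enters as a main effect and is also multiplied by the region-level mean of every covariate. The sketch below shows only this bookkeeping, under the assumption that the rows supplied for each region represent the conditions available in that region; the function name `gfr_design` and the toy data are hypothetical, and the random effects are not represented.

```python
def gfr_design(regions):
    """Build main-effect and interaction columns for a GFR-type model.

    regions: dict mapping a region label to a list of rows, each row being
             a list of covariate values [x_1, ..., x_p].
    Returns a list of (region, columns), where columns holds
    x_1..x_p followed by x_i * mean_j for every pair (i, j).
    """
    design = []
    for name, rows in regions.items():
        p = len(rows[0])
        # Region-specific expectation of each covariate
        means = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
        for r in rows:
            interactions = [r[i] * means[j] for i in range(p) for j in range(p)]
            design.append((name, list(r) + interactions))
    return design

# Hypothetical data: two regions, two covariates
toy = {"r1": [[1.0, 2.0], [3.0, 4.0]], "r2": [[0.0, 1.0]]}
design = gfr_design(toy)
```

With p covariates this produces p main-effect columns and p × p interaction columns per row; restricting the interactions to i = j gives the smaller model in which each covariate interacts only with its own expectation.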
Materials and methods
The objectives of the simulation study were to: (i) illustrate how habitat use varies as a function of habitat availability under different selection strategies; (ii) illustrate the relationship between estimated regression parameters and habitat availability in traditional habitat selection models for each of these different selection strategies; and (iii) show how the influence of habitat availability on habitat use can be explicitly estimated by extending the traditional HSFs, using the models defined in eqns 4 and 5.
The strength and functional form of the HSF will depend on the type of habitats used and also the availability and distribution of these habitats. We considered three simulation scenarios: In the first, the organism required two essential habitat types for resting and foraging. The time spent in each habitat was fixed and the duration of travel between habitats was assumed negligible. In the second, the organism chose between two substitutable habitats. Two habitat types are perfectly substitutable, when either can be wholly replaced by the other, but without implying that they are of equal quality (Begon, Harper & Townsend 1996). The preference for each habitat type was expressed by habitat-specific movement rules: animals moved to a new and random location and stayed there with probability = 0.25 (when in habitat type A) or 0.75 (when in habitat type B). Under this second scenario the total number of individuals in each habitat not only depended on how much it was preferred by the organisms, but also on availability, e.g. more individuals might be observed in habitat A if it is more available. The simulated behaviour is in line with most species distribution modelling techniques (i.e. the IPP type approaches). To investigate how habitat use was influenced by availability under these two scenarios, the availability of each habitat was allowed to vary between 90% and 10%, at increments of 10% (Fig. 1).
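The movement rule of the second scenario can be sketched as a two-state process. The code below rests on an assumed reading of the rule: at each time step the animal stays where it is with the habitat-specific probability (0.25 in A, 0.75 in B) and otherwise jumps to a uniformly random location, which lies in habitat B with probability equal to the availability of B. The function name, number of steps and seed are illustrative choices.

```python
import random

def simulate_use(avail_b, stay_prob, n_steps=100_000, seed=1):
    """Return the fraction of time steps spent in habitat B."""
    rng = random.Random(seed)
    habitat = "B" if rng.random() < avail_b else "A"
    time_in_b = 0
    for _ in range(n_steps):
        if rng.random() >= stay_prob[habitat]:
            # Leave: relocate to a random point; its habitat depends on availability
            habitat = "B" if rng.random() < avail_b else "A"
        time_in_b += habitat == "B"
    return time_in_b / n_steps

stay = {"A": 0.25, "B": 0.75}
# Availability of habitat B from 10% to 90%
results = {a / 10.0: simulate_use(a / 10.0, stay) for a in range(1, 10)}
```

Under this reading, the long-run use of B is 3a / (1 + 2a) for availability a, so use remains proportional to availability weighted by a constant 3:1 preference. In the first scenario, by contrast, use is fixed at the time budget (for example 0.5), so the use/availability ratio of a habitat falls as that habitat becomes more available.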
In the third and most complex simulation, individuals required two essential habitat types, food and cover, both of which were defined using continuous variables. Cover should be interpreted flexibly, and could represent low levels of predation pressure, disturbance or anthropogenic noise, or it could represent the amount of shelter, e.g. against extreme weather conditions. Organisms sought out the food-rich habitat when hungry, and retreated to habitat with good cover once satiated (see Matthiopoulos et al. 2011). Spatial variability in the amount of food was simulated using spatially correlated Gaussian random fields. Cover was assumed to be negatively correlated (correlation coefficient = −1) with food. The organism perceived resource values within its current cell and the four nearest neighbouring cells, but perception was made imperfect by adding noise. The animal accumulated energy (E) through food (u) at a rate εH(u) until reaching a satiation threshold (E1 = 100). The constant ε was a food-to-energy conversion efficiency and H(u) was a Holling type II model of food consumption. Upon satiation, the animal stopped feeding and climbed up the gradient of cover. Accumulated energy decayed at a rate G(E) = kE irrespective of whether the animal was foraging or not. When E fell below a threshold (E < 50) the animal climbed up the gradient of food, and when E < 0, the animal died. Parameter values were selected to ensure several switches between foraging and hiding (Fig. 3 and see Matthiopoulos et al. 2011). Each movement step was stored, but since our objective was to generate species distribution data similar to those obtained by intensive line transect or block surveys, a subsample of 1000 locations sampled at regular intervals from each scenario was used for model fitting. This also ensured a reduction in serial correlation and computation time.
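The energy-driven switching between foraging and hiding can be sketched without the spatial component. The example below is a simplification: food density is held constant rather than varying in space, and every parameter value other than the two thresholds (100 and 50) is hypothetical, including the conversion efficiency, the decay rate and the Holling type II attack rate and handling time.

```python
def holling_type2(food, attack=1.0, handling=0.1):
    """Holling type II intake rate (hypothetical attack rate and handling time)."""
    return attack * food / (1.0 + attack * handling * food)

def simulate_energy(food, n_steps=500, eff=1.0, decay=0.02,
                    e_start=60.0, e_sat=100.0, e_hungry=50.0):
    """Track energy and behavioural mode for an animal at constant food density."""
    energy, foraging = e_start, True
    switches, modes = 0, []
    for _ in range(n_steps):
        gain = eff * holling_type2(food) if foraging else 0.0
        energy += gain - decay * energy        # intake minus decay G(E) = k * E
        if energy < 0.0:
            break                              # the animal dies
        if foraging and energy >= e_sat:
            foraging, switches = False, switches + 1   # satiated: seek cover
        elif not foraging and energy < e_hungry:
            foraging, switches = True, switches + 1    # hungry: seek food
        modes.append(foraging)
    return energy, switches, modes

rich_energy, rich_switches, rich_modes = simulate_energy(food=10.0)
poor_energy, poor_switches, poor_modes = simulate_energy(food=0.5)
foraging_fraction = sum(rich_modes) / len(rich_modes)
```

With these illustrative values, an animal in a food-rich setting alternates repeatedly between the two modes and spends only a minority of its time foraging, whereas one in a food-poor setting never reaches the satiation threshold and forages continuously.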
Model fitting and validation
First, models were fitted to the data generated in simulations 1 and 2 (see Fig. 1). The model defined in eqn 3 was fitted to each individual sampling region (represented by the different panels in Fig. 1). The model defined in eqn 4 was fitted to the pooled data and provided a means to quantify the influence of availability on use by estimating ω.
Second, to illustrate how estimates of habitat selection parameters can vary as a function of availability in simulation 3 (i.e. the simulation in continuous environmental space, see Fig. 3), separate HSF models were fitted to each region. Models were fitted using the unconditional IPP likelihood, where the integral was approximated using quadrature points located at the centre of each grid cell (Baddeley & Turner 2000). The density of species locations was assumed to vary as a smooth function of food abundance, here defined by 5 b-spline basis functions (Ramsay & Silverman 2005), using the function ‘bs’ [package ‘splines’ and see Hastie & Tibshirani (1990)] of the r software (R Development Core Team 2011).
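For readers unfamiliar with b-spline bases, the following sketch evaluates a cubic basis with the Cox-de Boor recursion. It is not a reimplementation of the R function ‘bs’: the knot positions (two interior knots at 1/3 and 2/3 on a covariate scaled to [0, 1]) are assumptions, and the sketch returns all six basis functions of the clamped basis, from which five columns would remain after dropping the first one.

```python
def bspline_basis(x, knots, degree=3):
    """Evaluate all B-spline basis functions at x via Cox-de Boor recursion."""
    n_int = len(knots) - 1
    # Degree 0: indicator of the half-open knot interval containing x
    b = [1.0 if knots[i] <= x < knots[i + 1] else 0.0 for i in range(n_int)]
    if x == knots[-1]:
        # Right boundary: assign x to the last non-empty interval
        last = max(i for i in range(n_int) if knots[i] < knots[i + 1])
        b[last] = 1.0
    for d in range(1, degree + 1):
        nb = []
        for i in range(len(knots) - d - 1):
            left_den = knots[i + d] - knots[i]
            right_den = knots[i + d + 1] - knots[i + 1]
            # Terms with a zero-width support are dropped by convention
            left = (x - knots[i]) / left_den * b[i] if left_den > 0 else 0.0
            right = (knots[i + d + 1] - x) / right_den * b[i + 1] if right_den > 0 else 0.0
            nb.append(left + right)
        b = nb
    return b

# Clamped cubic knot vector on [0, 1] with two assumed interior knots
knots = [0.0] * 4 + [1.0 / 3.0, 2.0 / 3.0] + [1.0] * 4
basis_at = {x: bspline_basis(x, knots) for x in (0.0, 0.1, 0.5, 1.0)}
```

The basis functions are non-negative and sum to one at every point of the covariate range, so a linear combination of them with estimated coefficients gives a smooth curve for the log-intensity.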
Third, we fitted several models to the pooled data from simulation 3. We began by illustrating how a use-only model can be fitted in continuous environmental space [i.e. M1 defined below, where habitat availability is assumed to have no influence on use, see also Fig. 3d in Aarts, Fieberg & Matthiopoulos (2012)]. This model was fitted using the unconditional form of the likelihood defined in eqn 1, but with fa(X) held constant. Consequently, it was necessary to approximate the integral using quadrature points distributed uniformly in environmental space.

$$\text{M1:}\quad \eta(x) = \beta_0 + \sum_{m=1}^{5} \beta_m\, g_m(x)$$

where η is the log-intensity function in environmental space, β0 is the intercept, βm are fixed effect parameters, and gm( · ) are a set of b-spline basis functions.
Finally, we fitted three models (M2–M4 below) to illustrate how a traditional habitat selection model (M2) can be extended by allowing its coefficients to vary as a function of habitat availability. In these models, availability is summarized by the expectation of gm(x) in the kth sampling instance.
These models were fitted using the unconditional IPP likelihood function, with quadrature points distributed uniformly in geographical space. M2 corresponds to a traditional HSF model, where the log-intensity is modelled as a smooth function of food density. M3 and M4 correspond to GFR models [eqn 6 – see Matthiopoulos et al. (2011)]. Specifically, the parameters, βm, associated with each basis function, gm(·), were modelled as a function of the availabilities of one or more of the basis functions. M3 includes pairwise interactions between each basis function and its region-specific expectation, and M4 also includes pairwise interactions between each basis function and region-specific expectations for all of the other basis functions. Justification for the latter model is given in Appendix S1.
Models in discrete environmental space were fitted in R (R Development Core Team 2011) using the Nelder–Mead optimizer ‘optim’ (Nelder & Mead 1965). Models fitted in continuous environmental space were fitted as Bayesian fixed-effects GLMs (M1 and M2) or generalized linear mixed-effects models (GLMMs) (M3 and M4) by means of Integrated Nested Laplace approximation, using the package ‘INLA’ (Rue, Martino & Chopin 2009). Although most standard statistical software (e.g. the ‘glm’ function in R) could be used to fit models (M1 and M2), a major advantage of INLA is that it can fit GLMMs (e.g. M3, M4) to large data sets very efficiently. Prior distributions of the fixed effects and the precision parameters of the random effects were set as Log-Gamma(1,0.001). To assess the predictive performance of the models fitted to the pooled data in simulation 3, we used 9-fold likelihood cross-validation (Geisser 1993; Boyce et al. 2002), using the summed log-likelihood values for the holdout data (with parameters set to values estimated from the training data) as a goodness-of-fit measure (Matthiopoulos 2003; Horne & Garton 2006). See Table 1 for more details.
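The logic of likelihood cross-validation can be sketched with a deliberately simple model. The example below assumes Poisson counts, treats each fold as one held-out group, and compares a model whose rate is estimated from the training folds with a poorly chosen fixed rate; the data, the three folds and the function names are hypothetical and much simpler than the models compared in Table 1.

```python
import math

def poisson_loglik(counts, rate):
    """Poisson log-likelihood of the held-out counts at a given rate."""
    return sum(y * math.log(rate) - rate - math.lgamma(y + 1) for y in counts)

def kfold_loglik(folds, fit):
    """Sum of holdout log-likelihoods, refitting on the remaining folds each time."""
    total = 0.0
    for k, holdout in enumerate(folds):
        training = [y for j, fold in enumerate(folds) if j != k for y in fold]
        total += poisson_loglik(holdout, fit(training))
    return total

def fit_mean(training):
    """Maximum likelihood estimate of a constant Poisson rate."""
    return sum(training) / len(training)

def fit_fixed(training):
    """A deliberately poor model that ignores the training data."""
    return 1.0

# Hypothetical counts split into three folds
folds = [[4, 5, 6], [5, 5, 4], [6, 4, 5]]
cv_mean = kfold_loglik(folds, fit_mean)
cv_fixed = kfold_loglik(folds, fit_fixed)
```

The model with the higher summed holdout log-likelihood predicts unseen data better; because parameters are always estimated without the held-out fold, added flexibility is rewarded only when it generalizes.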
Table 1. Goodness-of-fit values for models fitted to the simulated data from organisms moving in continuous environmental space (simulation 3). “(CV) LogL” are the (cross-validation) marginal log-likelihoods. The best-fitting model is shown in bold.
When individuals used two essential habitats, each for a fixed amount of time (simulation 1), estimates of habitat selection parameters (from eqn 3) varied greatly depending on the absolute availability of the habitat in the study area (Fig. 2a). When habitat was highly abundant, the values for the estimated coefficients were negative, suggesting an apparent avoidance (similar to Beyer et al. 2010). When the model defined in eqn 4 was fitted to pooled data, ω was estimated to be 0, correctly suggesting that habitat use was not impacted by the relative availability of habitats. Furthermore, when we extended the standard habitat selection function (eqn 3) by including the estimation of ω (eqn 4), the log-likelihood increased from −2533 to −2024; hence, the latter more flexible model was preferable (likelihood ratio test, Deviance = 1016.5, P-value <0.001).
In contrast, when organisms selected from two substitutable, but unequally preferred habitats (i.e. simulation 2), parameter estimates of models that included availability as an offset (eqn 3) were consistent across the different availability scenarios. When fitting eqn 4 to the pooled data, the estimate of ω was close to 1, indicating that habitat use was proportional to availability across the range of scenarios (Fig. 2b).
Several models were fitted to the data from the third simulation, where individuals followed food gradients until satiation was achieved, after which they reverted to climbing the cover gradient (Fig. 3). The shape of the habitat selection function estimated for each sampling instance (Fig. 4a) varied greatly depending on the absolute availability of food. When food was highly abundant, the estimated preference function (w) was a relatively flat dome-shaped curve, with apparent avoidance at high food densities; this result occurs because the organisms were able to meet their food requirements in regions close to areas with good cover but moderate food levels. Only when absolute food availability was low did preference increase sharply with increasing food availability. The low food availability scenario also led to 49% mortality, hence the lower number of points in the bottom-right panel of Fig. 3.
Table 1 shows the (cross-validation) goodness-of-fit measures of the different models fitted to the pooled data from simulation 3. When the contribution of availability was set to 0 (M1, Table 1), the log-likelihood was −62474 (df = 6). Modelling habitat selection using the standard IPP approach in which use is assumed to be proportional to availability (and the integral in the likelihood function is approximated in geographical space), led to an increase in the log-likelihood (i.e. −61101, df = 6, M2, Table 1). Yet, the best models were those where the influence of availability was allowed to interact with the habitat selection parameters, using the GFR approach (i.e. −60988 (df = 11) and −60814 (df = 31) for M3 and M4, respectively; Table 1).
Figure 4b shows the estimated linear predictor of each sampling instance, based on the best model (M4, Table 1) fitted to all data. Although the results (in Fig. 4a,b) look similar, a major advantage of the single functional response model is that it can predict habitat selection for scenarios with all combinations of food and cover availabilities that lie within the range of the observed data (Matthiopoulos et al. 2011).
The influence of habitat availability in shaping species distributions
The objective of most species distribution models is to quantify the link between species and the habitats they rely on, and to use the resulting function to predict in space or time (Buckland & Elston 1993; Boyce & McDonald 1999; Guisan & Zimmermann 2000; Phillips, Anderson & Schapire 2006). Despite the fact that individuals may follow simple rules to acquire resources and seek cover, this study shows that the functional form of habitat selection functions can vary non-linearly with the absolute availability of resources (see Fig. 4a). Furthermore, the impact of availability on the estimated habitat selection function strongly depends on the utility of a particular habitat, and how it interacts with the distribution and abundance of other habitats (e.g. Figs 2 and 4). For example, when food was superabundant, organisms in simulation 3 did not increase their fitness by spending more time in areas of high food density. In such cases, increasing the availability of the resource will result in a diminishing use-availability ratio, which eventually may lead to an apparent avoidance (Mysterud & Ims 1998; Beyer et al. 2010). Also, when other factors play a role, such as the necessity to forage during daylight only, increased availability of foraging habitat may not necessarily lead to increased use of such habitats (see e.g. Fig. 2a).
In natural settings, the distribution of species is driven by multiple environmental conditions and resources. For the study species, the resources can be essential, substitutable, antagonistic, complementary or inhibiting (Tilman 1986). In addition, several environmental variables will define the constraints and costs (e.g. heat loss, handling time, searching time and inflicted mortality) of resource acquisition. Our simple simulation in which an organism responds to two resources, already leads to complex space use patterns. Real-life systems probably require more complex functional response models.
Quantifying the effect of habitat availability on species distributions
Although rarely stated explicitly, IPP models (Cressie 1993) in which the intensity of space use is assumed to be a log-linear function of covariates assume that changes in availability lead to proportional changes in habitat use. The equivalence between these IPP models and other popular species distribution and habitat selection models [e.g. MaxEnt (Phillips, Anderson & Schapire 2006), weighted distribution models (Lele & Keim 2006), HSFs fitted using logistic regression (Boyce & McDonald 1999; Johnson et al. 2006)] implies that similar assumptions underlie these models as well. For a particular sampling instance, IPP models are capable of quantifying how the distribution of a species is shaped by environmental covariates (conditional on availability being defined correctly). Nonetheless, IPP models may not adequately account for non-linear changes in space use that can follow from changes in availability, and as a consequence, these models may poorly predict space-use patterns in habitats that differ considerably from those for which data have been collected. These problems are particularly pertinent when organisms display a strong functional response in habitat selection.
To measure the influence of availability on use, a first step is to fit separate models to different sampling instances, and to inspect how absolute availability impacts the estimated habitat selection functions (Mysterud & Ims 1998; and see e.g. Fig. 4a). Although easy to implement, the variability in the estimated habitat selection functions can also be the result of sampling error, which can be particularly apparent when sample size is small. Alternatively, 2-step approaches have been used. First, a mixed effect model is fitted to all data, treating the parameters as random variables which are allowed to vary by region (or sampling instance). In the second step, the random coefficients are modelled or plotted as a function of availability (Moreau et al. 2012). GFRs, by contrast, perform this task in a single step under the assumption that HSF parameters vary as a function of availability.
The strategy of this paper was to start from first principles: inspecting how availability is incorporated into traditional habitat selection functions, and illustrating the consequences with regard to the interpretation of fitted habitat selection models. Next, we illustrate different ways to extend the HSF, allowing for a more flexible role of availability. For selection of discrete habitats, discrete choice models (McCracken, Manly & Heyden 1998) are commonly used with log-availability of each habitat included as an offset (Kneib, Knauer & Küchenhoff 2011). Here, we show how the approach of Mysterud & Ims (1998), which is based on a dichotomous classification, can be extended by estimating multiple parameters (i.e. ω's) that control the influence of availability of all habitats (eqn 3). When simulated animals used two essential habitats for a fixed amount of time, independent of availability, the ω parameter was estimated to be close to 0 (see also Fig. 2a). In contrast, when simulated organisms selected among two substitutable habitats, with increases in availability resulting in proportional increases in use, ω was close to 1, as in traditional discrete choice models.
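The two limiting cases can be reproduced with a small numerical sketch. It assumes that expected use of habitat i is proportional to a_i^ω_i · exp(β_i), where a_i is availability; this form is an assumption made for illustration and may differ in detail from eqn 3. The function name and parameter values are hypothetical.

```python
import math

def expected_use(avail, beta, omega):
    """Expected proportion of use per habitat.

    Assumed form: weight_i = avail_i ** omega_i * exp(beta_i), normalised
    to sum to one. Availabilities are assumed to be strictly positive.
    """
    weights = [a ** w * math.exp(b) for a, b, w in zip(avail, beta, omega)]
    total = sum(weights)
    return [w / total for w in weights]

beta = [0.0, 0.0]  # hypothetical: no intrinsic preference for either habitat

for avail in ([0.2, 0.8], [0.6, 0.4]):
    # omega = 1: log-availability acts as an offset, so use tracks availability.
    as_offset = expected_use(avail, beta, omega=[1.0, 1.0])
    # omega = 0: use is independent of availability.
    fixed_use = expected_use(avail, beta, omega=[0.0, 0.0])
    print(avail, as_offset, fixed_use)
```

Values of ω between 0 and 1 give a less-than-proportional response of use to availability, which is why estimating ω rather than fixing it at 1 lets the data determine how strongly availability matters.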
When the number of habitats increases, the number of parameters needed to define the role of habitat availability would increase proportionally under the approach described above. In the extreme case, when species distributions are modelled using continuous covariates, an infinite number of additional parameters are required. An alternative approach is to model the habitat selection coefficients as a (non-) linear function of availability. To allow for even more flexibility, an interaction between the covariate of interest and a measure of its availability, or that of other covariates, may be included (a key feature in GFR models, see Matthiopoulos et al. 2011). Here we extended the GFR approach by modelling the log-intensity function using a set of b-spline basis functions. In addition, the corresponding coefficients (β) are allowed to vary as a function of the interactions between these basis functions and their expectations. This increased flexibility was able to capture non-linear effects of habitat availability on selection.
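The core idea of an interaction between a covariate and its availability can be sketched as below. This is a simplified first-order version under stated assumptions: availability is summarised by the mean of the covariate within each sampling instance, the b-spline basis is omitted, and the function names, data and coefficient values are hypothetical rather than fitted.

```python
def gfr_design(instances):
    """Design rows [x, x * xbar_k] for every point in every sampling instance k."""
    rows = []
    for xs in instances:
        xbar = sum(xs) / len(xs)  # availability summarised by the instance mean
        rows.extend([x, x * xbar] for x in xs)
    return rows

def effective_slope(coefs, xbar):
    """Selection coefficient for x implied at mean availability xbar."""
    b_main, b_interaction = coefs
    return b_main + b_interaction * xbar

# Hypothetical covariate values at points (used or available) in two
# sampling instances that differ strongly in mean availability.
instances = [[0.0, 1.0, 2.0], [4.0, 5.0, 6.0]]
design = gfr_design(instances)

# Hypothetical coefficients: selection weakens as the covariate becomes
# more available.
coefs = (2.0, -0.4)
slopes = [effective_slope(coefs, sum(xs) / len(xs)) for xs in instances]
print(design)
print(slopes)
```

Because the interaction column enters the linear predictor like any other covariate, the model can be fitted with the same software as a traditional HSF; only the design matrix changes.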
The results of this study illustrate that GFR models tend to fit data better than traditional IPP models and offer better predictions, particularly in novel environments (see Table 1). GFR approaches that allow the habitat selection parameters to vary as a non-linear function of availability (e.g. eqn 5) seem particularly appealing.
Correctly defining availability
The main objective of this study was to illustrate the influence of availability on habitat use and to explore various modelling approaches that can quantify relationships between availability and use. However, we did not address the problem of how best to quantify availability in the first place. Whether a particular point in space (and its underlying environmental conditions and resources) is available is highly time-dependent. In the short term (i.e. seconds), only points in the direct vicinity of the organism are available. However, on an evolutionary time scale (i.e. millions of years), the entire globe may potentially be available for a species to colonize. Although misspecification of availability can heavily influence the results for single sampling instances, we suspect it may not be as problematic for prediction if the model is flexible in the way it incorporates the effect of availability on habitat use (e.g. a GFR type of approach that utilizes spline-based smoothers), and if the data are collected from multiple environments [so that the effect of (potentially misspecified) availability on use is adequately captured].
A further improvement to the definition of what is available could be achieved by estimating the parameters of the GFR using geographically weighted regression (GWR – Fotheringham, Brunsdon & Charlton 2002; Páez, Uchida & Miyamoto 2002). The GFR method presented here models the parameters of the HSF as a (non-linear) function of the mean or higher order moments of the covariate values present within each sampling instance. These sampling instances are assumed to be discrete, spatially disjoint regions. In practice, however, the perceived availability of covariate values may differ even within such sampling instances. Therefore, an alternative solution is to estimate availability at each point in space as a kernel-weighted average of its surrounding environmental conditions, and allow the GFR coefficients to vary as a function of these locally defined availabilities. An additional appealing aspect of this approach is that it may be possible to fit GFR models for single sampling instances, as long as the environmental conditions within that sampling instance are sufficiently diverse.
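A locally defined availability of this kind can be sketched as a kernel-weighted mean. The example below is illustrative only: it assumes a Gaussian kernel, a one-dimensional transect instead of a two-dimensional landscape, and hypothetical covariate values and bandwidths; it is not an implementation of GWR itself.

```python
import math

def local_availability(coords, values, bandwidth):
    """Gaussian kernel-weighted mean of covariate values around each location."""
    out = []
    for s in coords:
        weights = [math.exp(-0.5 * ((s - t) / bandwidth) ** 2) for t in coords]
        weighted_sum = sum(w * v for w, v in zip(weights, values))
        out.append(weighted_sum / sum(weights))
    return out

# Hypothetical 1-D transect: the covariate is low on the left and high on
# the right.
coords = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
values = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

# A narrow kernel gives each location its own perceived availability.
narrow = local_availability(coords, values, bandwidth=0.5)
# A very wide kernel approaches a single mean for the whole region.
wide = local_availability(coords, values, bandwidth=50.0)
print(narrow)
print(wide)
```

The bandwidth plays the role of the spatial scale over which the organism is assumed to perceive its surroundings; with a very wide kernel the local availabilities converge on the single per-instance mean used in the GFR described above.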
Species distributions are often shaped by a large variety of environmental variables, such as food density, predation pressure, human disturbance, temperature exposure and pollution. Capturing the multitude of non-linear effects of these covariates on species distribution probably requires complex models. Increasing model complexity by capturing the non-linear influence of availability on habitat use indeed improves both model fit and prediction (this study and see also Matthiopoulos et al. 2011).
However, this study also illustrates that simple movement rules can result in complex space-use patterns and habitat selection functions. Also, varying foraging strategies (i.e. varying patch-departure and interpatch-movement rules) may affect the inference from habitat selection functions (Bastille-Rousseau, Fortin & Dussault 2010). These results suggest that mechanistic movement models may be more parsimonious (i.e. require fewer parameters and yield better predictions under novel environmental conditions) than species distribution models. One challenge to building a realistic movement model, however, is where to start (e.g. what covariates to include)? The most influential covariates may be unknown, and in addition, they may strongly depend on the absolute availability of all resource types present in the study area. This is where GFR models may play an important role: GFRs model the disproportionality between use and availability and thus, may extract those covariates likely to be most influential, and in addition, determine how the influence of these covariates varies as a function of the availability of that and other resources. GFR models may also hint at whether a habitat is essential or substitutable. For example, the slope of the preference function increased with decreasing availability for essential covariates in both the discrete and continuous simulations. After careful interpretation, these insights could then be used to construct a simple mechanistic movement model focused on the most influential covariates. Next, instead of modelling the disproportionality between use and availability (where availability is essentially a reflection of a null-model of uniform movement), one could model the disproportionality between use and the movement model (see e.g. Aarts et al. 2008 for an example), thereby identifying any missing components of this (movement) model. 
This process could be repeated iteratively, or ideally incorporated into a single model framework [akin to Moorcroft & Barnett (2006); Forester, Kyung & Rathouz (2009)]. In summary, empirical distribution models can extract the most influential habitat types, while mechanistic movement models are most proficient in quantifying movement within and between these habitat types. Perhaps the complementary use of these methods is where the best predictions of natural or human-induced environmental change can be attained.
Implications for species conservation
For species management, we need to know where animals are, why they are there, and where else they could be. This information can be used to define special areas of conservation (Embling et al. 2010), restore or preserve habitats, and facilitate immigration or introduce new specimens into those regions with high predicted density (Olsson & Rogers 2009). HSFs can play an important role in these questions. Yet, given the findings of this and other studies, one may conclude that traditional HSFs will not always adequately capture the effect of all essential covariates. In an extreme case, when an organism lives under optimal conditions, it may be able to acquire all necessary resources by simply moving randomly in space. In this scenario, habitat use would be approximately equal to habitat availability, and the habitat selection parameters would be estimated to be close to zero.
We suspect the utility of HSFs will largely depend on the specific underlying modelling objectives. Since HSFs capture the disproportionality between use and availability, they will estimate significant positive coefficients for those environmental conditions or resource densities where use far exceeds availability. From a conservation perspective, these findings may be sufficient for managing habitat where threatened populations are present; those environmental conditions with low availability and high usage are probably most limiting. However, if we wish to generalize to other members of the population or predict how species will respond to new environments (e.g. following climate change or reintroduction efforts), the effect of habitat availability needs to be accounted for. One of the primary objectives of this study was to present techniques for doing so. However, before these methods can be implemented, data from multiple sampling instances (i.e. different regions in space and periods in time) are needed on the distribution of a species (Albert et al. 2010). This ensures that model predictions are based on interpolation, rather than extrapolation, and hence, improves model transferability between regions (Randin et al. 2006). Currently, there is a tendency to increase sample size (e.g. more tagged individuals or higher spatial and temporal resolution) in a single spatiotemporal frame. If we wish to use GFR models to understand why individuals select certain habitats and to predict consequences of climate change on the distribution and abundance of a species, we may first need to reconsider how we collect our data.
We thank Lyman McDonald, Wayne Thogmartin and Bryan Manly for inviting GA to the 2011 Wildlife Society meeting and organising the special theme session. We are grateful to the associate editor, Devin Johnson, and two other anonymous reviewers for their constructive comments. The Wildlife Society biometrics working group is acknowledged for contributing towards travel costs. GA is funded by the NWO-ZKO grant ‘Effects of underwater noise of fish and marine mammals in the North Sea’.