Keywords:

  • habitat suitability
  • occupancy
  • presence-only data
  • step selection functions
  • telemetry
  • weighted distributions

Summary

  1. During the last decade, there has been a proliferation of statistical methods for studying resource selection by animals. While statistical techniques are advancing at a fast pace, there is confusion in the conceptual understanding of the meaning of various quantities that these statistical techniques provide.

  2. Terms such as selection, choice, use, occupancy and preference often are employed as if they are synonymous. Many practitioners are unclear about the distinctions between different concepts such as ‘probability of selection,’ ‘probability of use,’ ‘choice probabilities’ and ‘probability of occupancy’.

  3. Similarly, practitioners are not always clear about the differences between and relevance of ‘relative probability of selection’ vs. ‘probability of selection’ to effective management.

  4. Practitioners also are unaware that they are using only a single statistical model for modelling resource selection, namely the exponential probability of selection, when other models might be more appropriate. Currently, such multimodel inference is lacking in the resource selection literature.

  5. In this paper, we attempt to clarify the concepts and terminology used in animal resource studies by illustrating the relationships among these various concepts and providing their statistical underpinnings.


Introduction


Habitat and food-selection studies have played a significant role in our attempts to understand the distribution of animals in the environment (Lack 1933; Svärdson 1949; Hildén 1965). Quantification of animal selection patterns in early studies was limited to a handful of approaches. During the last decade, however, there has been a proliferation of statistical techniques for analysing data from field investigations of resource selection by animals. This has allowed biologists to conduct their studies more efficiently and to apply their results to important conservation problems. Accompanying the presentation of these refinements has been discussion of the appropriateness of the study design and model alternatives for the data (Aldredge & Ratti 1998; Keating & Cherry 2004; Johnson et al. 2006; Lele & Keim 2006; Thomas & Taylor 2006). While this discourse has heightened the practitioner's appreciation for the issues involved, considerable confusion has emerged both in the understanding of the meaning of these concepts and in the interpretation of the outcomes of these various approaches that hinders their correct application to real-world situations. In this paper, we revisit some of the basic concepts of resource selection, use, choice and occupancy and clarify in precise terms the statistical underpinnings of these concepts. We illustrate how the probability of selection is distinct from these related concepts. Our goal is to bridge what we see as an emerging gap between the development of new statistical techniques and the practitioners' understanding of these approaches in resource selection and habitat studies.

Concepts revisited


In the following, we review and define concepts used in resource selection studies and contrast resource selection with resource use, choice and occupancy.

Resource unit and resource type

It is critical that we clearly understand the distinction between a ‘resource unit’ and a ‘resource type’. Resource units are tangible items that are either distributed over space and time naturally as discrete units, such as eggs that are available for consumption by a fox, or devised by a researcher to be discrete units, such as pixels imposed on a landscape. These resource units are assumed to be equivalent to each other in some sense, such as area or ‘an egg’ or ‘a nesting site’. Resource units may have one or more attributes that can be described categorically, such as egg colour, or continuously, such as forage biomass (g/m²) on a pixel of land. Attributes of a unit determine its type. If two distinct resource units have identical attributes, those units are of the same ‘resource type’ although they are different ‘resource units’. In cases where multiple attributes of a resource unit are of interest, all units with the same set of attributes are considered to be of the same resource type. For example, two locations with exactly the same elevation and soil type are considered to have the same resource type, although they are clearly different resource units. In mathematical notation, $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})$ denotes the collection of attributes of a resource unit, henceforth termed covariates, which characterize the resource type of the i-th resource unit.

Available resource units and their distribution

Available units are those units that could potentially be encountered by the animal. The available distribution is commonly denoted by fA(x), and it is a distribution on the environmental covariate space. It tells us in what proportion the different environmental covariates occur in the set of all available resource units. In the simple case of categorical resource types, the available distribution reflects the proportion of various resource types in the collection of available resource units. For example, consider an environment where a fox depredates eggs. Suppose there are 1313 available eggs, of which 851 are blue and 462 are brown (Table 1). The proportion of blue eggs in the total available units is 851/1313 = 0·6481, and the proportion of brown eggs is 462/1313 = 0·3519. These proportions add up to 1 and correspond to the available distribution, fA(x), of egg types. Thus, the available distribution answers the question: if a resource unit is an available unit, what is the probability that its resource type is x? (Note: In a strict statistical sense, the numbers 0·6481 and 0·3519 here are estimates of the true available distribution. For pedagogical simplicity, we will use these numbers as if they were true probabilities and not distinguish between estimates and the true values. Furthermore, these and other related values are referred to and used throughout the rest of the paper.)

Table 1. Data used as a hypothetical example of a fox selecting from two types of eggs, blue and brown, that it encounters (i.e. that are available) while moving through the environment

Resource type   Encountered eggs   Used eggs   Unused eggs
Blue                         851         709           142
Brown                        462         154           308
Total                       1313         863           450

Used resource units and their distribution

Used units are those resource units that are encountered and selected and are part of a set of resource units that have received some investment by an animal (i.e. used set) during a sampling period. The investment can be an event such as a visit to a nest site or the removal of a twig from a shrub, or an amount of time spent at a roost site (Buskirk & Millspaugh 2006). When a resource selection function (RSF) is used with telemetry data, the ‘quantity’ of use at any specific instance is typically not taken into account. For example, suppose we collect telemetry data every 30 min. In that case, a location that is visited for 1 min and a location that is visited for 20 min are not distinguished. They are each counted as one instance of use. If a location is visited on three separate instances, that specific resource unit gets counted three times and hence gets weight proportional to the number of instances it was used when computing the use distribution. The use distribution, denoted by fU(x), is a distribution on the environmental covariate space. It tells us in what proportion different covariates occur if we consider only the set of used resource units. In the simple case of categorical resource types, the use distribution reflects the proportion of various resource types in the collection of used resource units. In the fox example, suppose the total number of used units is 863, of which 709 are of the blue egg type (Table 1). Hence, the proportion of blue eggs in the used units is 709/863 = 0·8215 (Table 2). Similarly, the proportion of brown eggs in the used units is 154/863 = 0·1785. Note that these again sum to 1. Thus, the use distribution answers the question: if a resource unit is a used unit, what is the probability that its resource type is x? It, thus, computes P(Unit is of type x | Unit is a used unit).
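The available and use distributions for the fox example can be checked with a few lines of Python. This is only an illustrative sketch: the dictionary and function names are ours, and the counts are those of Table 1.

```python
# Counts from Table 1 (fox and eggs example).
encountered = {"blue": 851, "brown": 462}  # a(x): available (encountered) units
used = {"blue": 709, "brown": 154}         # u(x): used units


def distribution(counts):
    """Turn counts per resource type into proportions that sum to 1."""
    total = sum(counts.values())
    return {rtype: count / total for rtype, count in counts.items()}


f_available = distribution(encountered)  # fA(x) = a(x) / N
f_used = distribution(used)              # fU(x) = u(x) / n

for rtype in encountered:
    print(f"{rtype}: fA = {f_available[rtype]:.3f}, fU = {f_used[rtype]:.3f}")
```

Both dictionaries describe how resource types are spread within a set of units (available or used); neither is a probability that a given unit is used.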

Table 2. Glossary of terms

Each entry gives the term, its mathematical definition or notation, and the fox example calculation.

Number of available (and encountered) units
  Notation: N
  Fox example: 1313

Number of used units
  Notation: n
  Fox example: 863

Number of available (and encountered) units of type x
  Notation: a(x)
  Fox example: a(Blue) = 851, a(Brown) = 462

Number of used units of type x
  Notation: u(x)
  Fox example: u(Blue) = 709, u(Brown) = 154

Probability of selection (RSPF)
  Definition: $s(x) = u(x)/a(x)$
  Fox example: s(Blue) = 709/851 = 0·8331, s(Brown) = 154/462 = 0·3333

Available distribution
  Definition: $f^A(x) = a(x)/N$
  Fox example: fA(Blue) = 851/1313 = 0·6481, fA(Brown) = 462/1313 = 0·3519

Use distribution
  Definition: $f^U(x) = u(x)/n$
  Fox example: fU(Blue) = 709/863 = 0·8215, fU(Brown) = 154/863 = 0·1785

Resource selection function (RSF)
  Definition: w(x) = cs(x), where c > 0 is any constant
  Fox example: If we take c = 1·9, w(Blue) = 1·9 × 0·8331 = 1·5829 and w(Brown) = 1·9 × 0·3333 = 0·6333. Different constants lead to different RSF values.

Relative probability of selection (RS)
  Definition: $s(x)/s(x_R) = w(x)/w(x_R)$, where $x_R$ is a reference resource type
  Fox example: s(Blue)/s(Brown) = 0·8331/0·3333 = 2·4995

Odds of selection for resource type x
  Definition: $s(x)/(1 - s(x))$
  Fox example: Odds for a blue egg being selected = 0·8331/(1 − 0·8331) = 4·9916. Odds for a brown egg being selected = 0·3333/(1 − 0·3333) = 0·4999.

Odds ratio of selection for resource type x vs. resource type xR
  Definition: $\frac{s(x)/(1 - s(x))}{s(x_R)/(1 - s(x_R))}$
  Fox example: Odds ratio for blue vs. brown eggs is 4·9916/0·4999 = 9·9852

Inter-relationships and related concepts

Use distribution, available distribution and RSPF
  Definition: $f^U(x) = \frac{s(x) f^A(x)}{\sum_{x'} s(x') f^A(x')}$
  Fox example: fU(Blue) = (0·8331 × 0·6481)/(0·8331 × 0·6481 + 0·3333 × 0·3519) = 0·8215

Use distribution, available distribution and RSF
  Definition: $f^U(x) = \frac{w(x) f^A(x)}{\sum_{x'} w(x') f^A(x')}$. Recall that w(x) = cs(x) for some c > 0. This constant does not affect the use distribution.
  Fox example: Same as above.

Probability of choice for a resource unit of type x
  Definition: $c(x) = \frac{s(x) a_c(x)}{\sum_{x'} s(x') a_c(x')}$, where $a_c(x)$ denotes the number of available units of type x in the choice set.
  Fox example: If the choice set is {3 blue eggs, 7 brown eggs}, c(Blue) = (0·8331 × 3)/(0·8331 × 3 + 0·3333 × 7) = 0·5172

Probability of choice for a specific resource unit of type x
  Definition: $c(x, i) = \frac{s(x)}{\sum_{x'} s(x') a_c(x')}$
  Fox example: P(Unit #15 of type blue is chosen) = 0·8331/(0·8331 × 3 + 0·3333 × 7) = 0·1724

Probability of use for a specific resource unit that is of type x
  Definition: [inline image]
  Fox example: In the fox example, for n = 5 and N = 100, u(Blue, i) = [inline image]

Probability of occupancy for resource type x
  Definition: [inline image]
  Fox example: Assuming there are 65 blue eggs in 100 (N) available eggs, it follows that the probability of obtaining at least 1 blue egg in the used sample is given by [inline image]

We want to emphasize that the probability given by the use distribution is quite different from the commonly and loosely employed term ‘probability of use’. The term ‘probability of use’ by itself does not have a proper meaning. To give it a proper meaning, we need a restrictive modifier that specifies the unit and the resource type of the unit. We can then answer the question: what is the probability that a specific resource unit with resource type x will be used? This answer, as will be described later, depends on two quantities: the probability of selection for the resource type x and the probability that the resource unit will be encountered. The use distribution does not answer this question.

Selection, probability of selection and resource selection probability function

We define selection of a resource unit by an animal as the act of using a resource unit if it is encountered. The probability of selection of a resource unit depends on its resource type. The resource selection probability function, RSPF, models this and is defined as the probability that a resource unit of type x is selected (or becomes part of the use set) when encountered. We denote the selection probability as s(x), where s stands for selection and x = (x1, x2, …, xp) is the collection of attributes that characterize the resource type of that resource unit.

Selection is defined to be strictly a binary decision with outcomes of use or non-use of a resource unit when encountered. This makes the probability of selection a fundamentally different metric from the probabilities of use, choice and occupancy, as we discuss later. In the fox example, if a fox encounters 462 brown eggs and consumes 154 of them (Table 1), then the probability of selection of a brown egg is s(brown egg) = 154/462 = 0·3333 (Table 2). If the fox consumed 709 of the 851 blue eggs that it encountered, then s(blue egg) = 709/851 = 0·8331. We again remind the reader that although, strictly speaking, these are estimated probabilities of selection, we will ignore the distinction between estimates and true probabilities in the following discussion. We will behave as if these are true probabilities of selection.
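As a minimal sketch, the selection probabilities and the derived quantities listed in the glossary (relative probability of selection, odds and odds ratio) can be computed directly from the Table 1 counts. The variable and function names are illustrative only. Because the code works from the raw counts, the last digits can differ slightly from values computed with the rounded probabilities 0·8331 and 0·3333.

```python
# Counts from Table 1 (fox and eggs example).
encountered = {"blue": 851, "brown": 462}  # a(x)
used = {"blue": 709, "brown": 154}         # u(x)

# Probability of selection: s(x) = u(x) / a(x), a per-type binary decision.
s = {x: used[x] / encountered[x] for x in encountered}


def odds(p):
    """Odds corresponding to a probability p (requires p < 1)."""
    return p / (1.0 - p)


relative_selection = s["blue"] / s["brown"]      # s(x) / s(x_R)
odds_ratio = odds(s["blue"]) / odds(s["brown"])  # blue vs. brown

print("s(x):", s)
print("sum of s(x):", sum(s.values()))  # need not equal 1
print("relative probability of selection:", relative_selection)
print("odds ratio:", odds_ratio)
```

Unlike the available and use distributions, the values of s(x) are not constrained to sum to 1 across resource types, because each one is the probability of a separate use or non-use decision.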

Resource selection function and selection ratio

A resource selection function is defined as any function, w, such that w(x) = ks(x) for all values of x and for a fixed number k > 0. The selection ratio, also referred to as forage ratio or preference index (Manly et al. 2002: Table 1·1), is the origin of the concept of the resource selection function. Although originally defined only for categorical resource types, it can be extended to the general concept of resource type. The selection ratio is denoted by w(x) and is defined as $w(x) = f^U(x)/f^A(x)$. Following this definition, the selection ratio for blue eggs is w(blue) = 0·8215/0·6481 ≈ 1·2676. Because fU(x)/fA(x) = [u(x)/n]/[a(x)/N] = (N/n)s(x), the selection ratio is an RSF with k = N/n and hence is proportional to the probability of selection (RSPF). Thus, the RSF or selection ratio is proportional to the probability of selection, not to the use distribution.
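The proportionality between the selection ratio and the probability of selection can be verified numerically. The sketch below uses the Table 1 counts; the names are illustrative, and the constant N/n follows from writing the use and available distributions as u(x)/n and a(x)/N.

```python
# Counts from Table 1 (fox and eggs example).
encountered = {"blue": 851, "brown": 462}  # a(x)
used = {"blue": 709, "brown": 154}         # u(x)

N = sum(encountered.values())  # number of available units
n = sum(used.values())         # number of used units

s = {x: used[x] / encountered[x] for x in encountered}  # probability of selection
f_available = {x: encountered[x] / N for x in encountered}
f_used = {x: used[x] / n for x in used}

# Selection ratio (forage ratio): w(x) = fU(x) / fA(x).
selection_ratio = {x: f_used[x] / f_available[x] for x in encountered}

# The ratio w(x) / s(x) is the same constant, N / n, for every resource type.
constants = {x: selection_ratio[x] / s[x] for x in encountered}
print("selection ratios:", selection_ratio)
print("w(x) / s(x):", constants, "N / n:", N / n)
```

The constant depends on the sample sizes N and n, which is one reason an RSF can be read only in relative terms.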

Choice probability

Discrete choice models also have been interpreted as resource selection functions (RSF, Cooper & Millspaugh 1999; McDonald et al. 2006), but the choice probability predicted by the discrete choice model is not the same as the probability of selection as defined above. A choice set is a collection of resource units from which the animal may select a unit to use. This is identical to the set of available resource units as defined earlier. Choice is defined as the event of choosing one and only one discrete unit from the given choice set where all members of this set are assumed to be equally accessible and known to the individual at the time of choice. The choice probability, which we denote by c(x), c standing for choice, is the probability that a chosen resource unit will be of resource type x. This probability depends on the probability of selection of a resource, s(x), and also on the proportion of each resource type in the choice set (Table 2). Thus, it is not the probability of selection as sometimes considered. An implicit assumption in the following calculations is that all resource units in the choice set are equally accessible. Suppose the fox encounters 10 eggs at a nest site (the choice set) that consists of 3 blue and 7 brown eggs. With this constrained availability, the available distribution can be computed as fA(blue) = 0·30 and fA(brown) = 0·70. Then, given the fox chooses and consumes one and only one egg from this choice set, the probability that the consumed egg is of the colour blue is given by $c(\text{blue}) = \frac{s(\text{blue}) f^A(\text{blue})}{s(\text{blue}) f^A(\text{blue}) + s(\text{brown}) f^A(\text{brown})}$ = (0·8331 × 0·30)/(0·8331 × 0·30 + 0·3333 × 0·70) = 0·5172. The choice probability for a brown egg from this choice set is c(brown) = (0·3333 × 0·70)/(0·8331 × 0·30 + 0·3333 × 0·70) = 0·4828. Suppose we change the choice set to one consisting of 1313 units (eggs) of which 851 are blue and 462 are brown (Table 2). Now, the available distribution for this choice set is fA(blue) = 0·6481 and fA(brown) = 0·3519. In this case, the probability that a chosen egg is blue is given by c(blue) = (0·8331 × 0·6481)/(0·8331 × 0·6481 + 0·3333 × 0·3519) = 0·8215. Similarly, the choice probability for a brown egg from this choice set is c(brown) = (0·3333 × 0·3519)/(0·8331 × 0·6481 + 0·3333 × 0·3519) = 0·1785. Notice that, for a fixed choice set, the choice probabilities always sum to 1 because the fox must choose one of the eggs from the choice set and that egg will be either blue or brown. In fact, the choice probability, c(x), reflects the probability that a used unit is of a specific resource type and is nearly equivalent to the use distribution fU(x) defined earlier. They are, of course, identical if the choice set is the same as the set of available units.

We also can compute the probability that the chosen resource unit is a specific unit of type x. For example, suppose we want to compute the probability that a specific egg (denoted egg #15), which is blue, is chosen among the three blue eggs and seven brown eggs in a choice set. This probability is denoted by c(blue,15) and can be computed using elementary probability rules as $c(\text{blue}, 15) = \frac{s(\text{blue})}{3\,s(\text{blue}) + 7\,s(\text{brown})}$ = 0·8331/(3 × 0·8331 + 7 × 0·3333) = 0·1724. Because we are now specifying that a particular blue egg be chosen and not just any blue egg, this probability is one-third of the probability of choosing any of the three blue eggs.
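The choice probabilities described above can be sketched as a small function that weights each resource type by its probability of selection and its count in the choice set. The function name and its return structure are our own choices for illustration; the selection probabilities are the rounded values used in the text.

```python
# Probabilities of selection from the fox example (rounded as in the text).
s = {"blue": 0.8331, "brown": 0.3333}


def choice_probabilities(selection, choice_set):
    """Return (by_type, by_unit) choice probabilities for a choice set.

    selection  : probability of selection s(x) for each resource type
    choice_set : number of units of each type in the choice set
    by_type[x] : probability that the single chosen unit is of type x
    by_unit[x] : probability that one specific unit of type x is chosen
    """
    weights = {x: selection[x] * count for x, count in choice_set.items()}
    total = sum(weights.values())
    by_type = {x: w / total for x, w in weights.items()}
    by_unit = {x: selection[x] / total for x in choice_set}
    return by_type, by_unit


nest_by_type, nest_by_unit = choice_probabilities(s, {"blue": 3, "brown": 7})
all_by_type, _ = choice_probabilities(s, {"blue": 851, "brown": 462})

print("nest choice set, by type:", nest_by_type)
print("nest choice set, specific blue egg:", nest_by_unit["blue"])
print("all encountered eggs, by type:", all_by_type)
```

Changing the composition of the choice set changes c(x) even though s(x) is held fixed, which is the sense in which a choice probability is not a probability of selection.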

Probability of occupancy of a specific resource unit

One can consider a specific resource unit as an ‘occupied unit’ if that specific resource unit is used at least once during a specified time frame. The probability that a specific resource unit will be used at least once is identical to the inclusion probability of a specific population unit when we sample from a finite population with replacement under an unequal probability sampling design (Thompson 2002:57-58). This probability is a function of the total number of available resource units (N), the number of resource units in the used set and the probability of selection for the various resource types, s(x). For example, suppose there are 100 available pixels in the home range of a GPS-collared fox that are classified as having either high or low egg density, and the probability of a fox selecting a pixel with high egg density is 0·65. At first, we observe the fox for a short time period, say for five relocations. We can compute the probability that a specific pixel (e.g. #15) known to have a high egg density is used during this period. This probability, ψ(x,15), is given by: [inline image] (Table 2). Now imagine that the fox is observed for 200 relocations. Then the probability that at least one of the relocations of the fox is in pixel #15 increases to [inline image]. Similarly, suppose the fox is observed for only five relocations but the resolution of the pixels is changed and the number of available pixels is now 200 even though the home range remains the same. Now, the probability, ψ(x,15), that the pixel labelled #15 will be included in the used set decreases to [inline image]. If the fox is observed for a very long time, eventually all available pixels are likely to be visited and this probability increases to 1. Of course, if a pixel in the home range is not accessible to the animal, occupancy over time is not possible. Thus, probability of occupancy depends on the probability of selection, the total number of available units and the total number of encounters. Although occupancy is related to the probability of selection, it is not the same as the probability of selection.
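The qualitative behaviour described above can be illustrated with the standard inclusion probability for sampling with replacement, 1 − (1 − p)^n, where p is the probability that a given relocation falls in the specific unit and n is the number of relocations. The sketch below is not the paper's own calculation: in particular, taking p = s(x)/N is a simplifying assumption introduced here only to show how the probability responds to n and N, and all names are hypothetical.

```python
def occupancy_probability(p_single, n_relocations):
    """P(unit used at least once) = 1 - (1 - p)^n for sampling with replacement."""
    return 1.0 - (1.0 - p_single) ** n_relocations


def per_relocation_probability(s_type, n_available):
    """ASSUMPTION (ours): a relocation falls in the unit with probability s(x) / N."""
    return s_type / n_available


S_HIGH_DENSITY = 0.65  # probability of selection for a high egg density pixel (from the text)

few_relocations = occupancy_probability(per_relocation_probability(S_HIGH_DENSITY, 100), 5)
many_relocations = occupancy_probability(per_relocation_probability(S_HIGH_DENSITY, 100), 200)
finer_pixels = occupancy_probability(per_relocation_probability(S_HIGH_DENSITY, 200), 5)

print("N = 100, n = 5:  ", few_relocations)
print("N = 100, n = 200:", many_relocations)
print("N = 200, n = 5:  ", finer_pixels)
```

Under this assumption the probability rises with the number of relocations and falls when the same home range is divided into more pixels, matching the direction of the changes described in the text even though s(x) never changes.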

Finally, we briefly mention the term preference. Johnson (1980) defined preference as selection when the available distribution is uniform over the different resource types. There may be a couple of reasons behind equalizing availabilities. First, the intent might be to remove any influence of the order, amount or perceived ‘lost future opportunity’ that might influence an organism's decision and thus to focus the decision on the perceived value of a resource only. Secondly, if availabilities are equal, then the relative probability of use of one resource type to a second type is equal to the relative probability of selection of these types. To us, it is unclear whether equal availability will remove the former influences and the latter is unnecessary if the concepts of use distribution, probability of selection and their relationships are clearly understood. More importantly, while an experimental design can provide equal availabilities, most commonly employed designs in field studies do not support estimation of preference as defined by Johnson (1980), and hence, we do not discuss this concept any further.

In the literature, the terms ‘occupancy’ and ‘occurrence’ are often conflated. The common linguistic usage of occurrence connotes frequency of an event, whereas the term occupancy connotes at least once. Similarly, it appears that researchers are using the concept of probability of occupancy (sensu MacKenzie et al. 2006) and the probability of selection interchangeably. We believe that this confusion arises from the lack of proper restrictive modifiers, which leads to incomplete and imprecise articulation of the question. One cannot compute the probability of an event until the event is fully defined. Take, for example, the event of occupancy. Is this occupancy of a specific resource unit at a specific time point or is it occupancy of the resource unit at least once during a season? These two events are distinct, and hence, the computation of probability of occupancy without specifying the proper restrictive modifiers is impossible and, if the modifiers are not stated explicitly, can lead to confusion. Similar confusion occurs in defining probability of use: Is it use of a specific resource unit once? Or is it use of that specific resource unit at least once during the study period? It is imperative that researchers specify the events completely and then compute the probability for that event. Furthermore, distinct events need distinct names. In the resource selection literature, the lack of complete specification of the events and the lack of distinct names for different events have led to substantial confusion. To reduce the confusion and to summarize our discussion, we propose the following definitions of the key concepts of selection, use, occupancy and occurrence, specifying appropriate restrictive modifiers.

  • (a)
    Probability of selection refers strictly to a binary decision an animal makes. Given that a resource unit of resource type x is encountered, what is the probability that an individual will select it? This decision depends solely on the resource type and nothing else; neither on what other resources are potentially available nor on the encounter probability.
  • (b)
    Probability of occupancy always refers to the occupancy of a specific resource unit: What is the probability that a specific resource unit will be used at least once during a specified period? The restrictive modifiers, ‘specific unit’ and ‘specific time period’, are critical to define this term properly. For a unit to be occupied, it first needs to be encountered by an individual and then, depending on its resource type, the animal may or may not select it. Thus, probability of occupancy is a function of its resource type (which affects the probability of selection) along with its location in the study area and the total number of animals in the study area (which affect the probability of encounter and frequency of use). When we refer to probability of occupancy, we should always specify which unit and not simply its resource type. Recall that, as illustrated earlier, if the number of individuals in the study area is large, the probability of occupancy for a specific unit can be close to 1 although the probability of selection for its resource type is small. Probability of occupancy and probability of selection are not the same.
  • (c)
    Probability of occurrence refers to the frequency of occupancy of a specific resource unit: What is the probability that a specific resource unit will be used ‘k’ times during a specified time period? This, thus, depends not only on the probability of selection but also on the frequency with which that specific resource unit will be encountered. Thus, occupancy of a unit is the censored version of the occurrence; one or more instances of use are recorded as 1 in occupancy.
  • (d)
    Probability of use refers to a single instance of use of a specific resource unit and hence, similar to the probability of occupancy (that refers to at least one instance of use), needs the restrictive modifier of which resource unit. In order for a specific resource unit to be used, it first needs to be encountered and then selected for consumption. Probability of selection and probability of use are different. A resource unit with a highly desirable resource type, and hence high probability of selection, may have small probability of use if it is not easily encountered (Keim, DeWitt & Lele 2011).
  • (e)
    Probability of choice refers to a single instance of use of a specific resource unit (or a specific resource type) out of a well-defined choice set of resource units. Hence, this requires a restrictive modifier of which resource unit (or, resource type) and the choice set. Without this information, probability of choice is ill defined.
  • (f)
    Manly et al. (2002) discuss RSF in the context of use and available study design. Johnson et al. (2006, appendix) provide a mathematically precise description of the definition of RSF in terms of weighted distributions. Our definition (see also Lele & Keim 2006) generalizes this definition to extend it to the probability of selection rather than limiting it to the relative probability of selection. In this situation, implicitly corresponding to each RSF, there is an underlying RSPF and vice versa. Manly et al. (2002) also discuss RSPF in the context of used and unused study designs when a finite number of resource units are studied with each unit classified as either used or unused. Then, a binary regression is used to estimate the probability of use. This corresponds to our definition of probability of occupancy. This is not the same as probability of resource selection as has been shown before. The RSF in use and available design and the RSPF in the used and unused design are not proportional to each other. They are entirely different concepts although unfortunately similar terminology was employed.

Statistical inference and interpretation


In resource selection studies, the information we have is the list of the resource units in the study area that were used during the investigation, together with information on the resource units that could have been encountered during the study duration. Notice that a particular resource unit may be used repeatedly. The list of used units is sufficient to obtain an estimate of fU(x), the use distribution. Given this limited information, the goal is to infer the probability of selection (RSPF). Towards this goal, the key relationship is: $f^U(x) = \frac{s(x) f^A(x)}{\sum_{x'} s(x') f^A(x')}$ (with the sum replaced by an integral for continuous covariates). In the fox example, the use distribution answers the question: what is the probability that an egg picked randomly from those found in the stomach of a fox (use distribution of 863 eggs that were consumed) is blue? This probability is 0·8215 (Table 2). This probability is derived by examining only those units that were used and assumes that the set of observed used units is a random sample from all used units. We also can calculate this probability using the RSPF or the RSF and the available distribution, fA(x). If the probabilities of selection (RSPF) are known, fU(blue) is given by (0·8331 × 0·6481)/(0·8331 × 0·6481 + 0·3333 × 0·3519) = 0·8215. Because the selection ratio is proportional to the probability of selection, in the above formula, one can replace the probabilities of selection by the RSF or selection ratios to obtain $f^U(x) = \frac{w(x) f^A(x)}{\sum_{x'} w(x') f^A(x')}$; for example, with the RSF values of Table 2 (c = 1·9), fU(blue) = (1·5829 × 0·6481)/(1·5829 × 0·6481 + 0·6333 × 0·3519) = 0·8215. Thus, knowledge of the probabilities of selection (RSPF), the RSF or even the selection ratios, together with the available distribution, is sufficient to obtain the use distribution, fU(x). In contrast, generally, it is not possible to estimate the probability of selection given knowledge only of fU(x) or presence-only data. For example, the probability of selection could not be estimated if knowledge of the 863 eggs used by the fox were collected from a museum (Pearce & Boyce 2006), without the accompanying data on what the fox encountered.
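The asymmetry described above, that selection plus availability determines use but use alone does not determine selection, can be illustrated with a short sketch. The function name is ours, and the second scenario (equal selection probabilities with a different available distribution) is a hypothetical constructed only to make the point.

```python
def use_distribution(selection, available):
    """fU(x) = s(x) fA(x) / sum over x' of s(x') fA(x'); also works with an RSF."""
    weights = {x: selection[x] * available[x] for x in available}
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


f_available = {"blue": 0.6481, "brown": 0.3519}  # from the fox example
rspf = {"blue": 0.8331, "brown": 0.3333}         # probabilities of selection
rsf = {x: 1.9 * p for x, p in rspf.items()}      # RSF with c = 1.9 (Table 2)

f_used_from_rspf = use_distribution(rspf, f_available)
f_used_from_rsf = use_distribution(rsf, f_available)  # the constant cancels

# Hypothetical alternative: no selection at all (equal s for both types), but
# an available distribution that happens to equal the use distribution above.
no_selection = {"blue": 0.5, "brown": 0.5}
f_used_alternative = use_distribution(no_selection, f_used_from_rspf)

print("from RSPF:       ", f_used_from_rspf)
print("from RSF:        ", f_used_from_rsf)
print("from alternative:", f_used_alternative)
```

All three calls return the same use distribution, so data on used units alone cannot distinguish strong selection under one availability from no selection under another.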

In practice, the complete set of encountered units is seldom known. Instead, researchers postulate which resource units may be encountered and use that assumption to estimate the available distribution. For example, one may consider all the study area to be equally accessible and take a random sample from it to estimate the available distribution. Or, one may consider only a small buffer around the current location as accessible and use that area to estimate the available distribution. Unfortunately, there is no practical way to check whether the assumed available distribution is appropriate or not. Moreover, the choice of the available distribution strongly affects the inference about the probability of selection and other quantities. Researchers should be aware that the major assumptions behind various RSF-related methods are (a) the available distribution is correctly specified, (b) selection depends only on the characteristics of the encountered resource unit and is independent of knowledge of the resource types of other resource units, and (c) the probabilities of selection remain unchanged during the period of investigation.
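How strongly the assumed available distribution can drive the inference is easy to see in the two-type fox example. In the sketch below the use distribution is held fixed and only the assumed availability changes; the second available distribution is a hypothetical alternative invented for illustration, and the function name is ours.

```python
f_used = {"blue": 0.8215, "brown": 0.1785}  # use distribution from the fox example


def relative_selection(f_used, f_available, x, reference):
    """Relative probability of selection implied by use and assumed availability."""
    w_x = f_used[x] / f_available[x]
    w_ref = f_used[reference] / f_available[reference]
    return w_x / w_ref


assumed_a = {"blue": 0.6481, "brown": 0.3519}  # availability from Table 1
assumed_b = {"blue": 0.85, "brown": 0.15}      # hypothetical alternative assumption

rs_a = relative_selection(f_used, assumed_a, "blue", "brown")
rs_b = relative_selection(f_used, assumed_b, "blue", "brown")

print("blue vs. brown under assumption A:", rs_a)
print("blue vs. brown under assumption B:", rs_b)
```

With the same used units, blue eggs appear to be selected over brown eggs under the first assumption and selected against under the second.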

At its heart, resource selection involves active behavioural decisions by an organism. When designing controlled feeding studies, for example, researchers must decide whether to offer a food item to a consumer sequentially (Stage 1 experiments) or simultaneously with other food items (Stage 2 experiments) (Underwood & Clarke 2005; Taplin 2007; Manly 2006). When observing free-ranging animals, the way animals move and encounter resources has implications for the interpretation of outcomes (Matthiopoulos 2003; Martin et al. 2008). If resources are encountered sequentially, prey selection can be influenced by the order of prey presentation due to different cumulative handling times or gut saturation even if the prey's ‘tastiness’ remains constant. In habitat studies, travel cost to a habitat and recent memory of resources can alter expectations and future selection of habitats. As a result, the time-scale over which selection studies are conducted can be important. Although researchers are aware of these complications, existing methods are unable to account for them.

One of the most commonly used RSPF models is the exponential RSPF. It is known that the intercept parameter (β0) of the exponential RSPF is non-estimable (Lele & Keim 2006). A computationally simple way to estimate the non-intercept parameters is to use any standard Logistic regression package (Johnson et al. 2006; McDonald et al. 2006). Use of a Logistic regression package to estimate the non-intercept parameters of the exponential RSPF has led to confusion about its interpretation. For example, a common misconception is that the parameters can be interpreted as log-odds ratios (Hebblewhite, Merrill & McDonald 2005). However, the interpretation of the parameters is based on the model that is being fitted, not on the computational procedure that is used to fit it. The coefficients in the exponential RSF model give relative risk (Ramsay & Schafer 2002) and not the log-odds ratio. Sometimes, researchers standardize the estimated exponential RSF model by dividing the RSF values by their sum over the entire study area. Another standardization is to divide the estimated exponential RSF values by their maximum over the study area. Such standardized values do not correspond to the probability of selection. This is because the standardized values depend on the number of resource units in the study area and the types of units available in the study area, whereas by definition, probability of selection depends only on the characteristics of the unit and not on the characteristics of the other units in the study area nor on how many resource units are in the study area. RSF values can be interpreted only in relation to each other. For example, one can use RSF values to answer the question: given two resource units, which one is more likely to be selected? Thus, the plot of standardized RSF values gives the correct visual impression of which areas are more likely to be selected and which ones are less likely to be selected, assuming encounter probability is the same for all resource units.
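
The dependence of standardized RSF values on the composition of the study area can be illustrated with a minimal numerical sketch. The single covariate, the slope `beta` and the two study areas below are hypothetical values chosen only for illustration; they are not estimates from any data set.

```python
import math

def exponential_rsf(x, beta):
    """Exponential RSF value exp(beta * x) for one covariate (intercept omitted)."""
    return math.exp(beta * x)

def standardize_by_sum(values):
    """Divide each RSF value by the sum over the study area."""
    total = sum(values)
    return [v / total for v in values]

def standardize_by_max(values):
    """Divide each RSF value by the maximum over the study area."""
    top = max(values)
    return [v / top for v in values]

beta = 0.8                            # hypothetical slope
area_a = [0.0, 1.0, 2.0]              # covariate values of the units in study area A
area_b = [0.0, 1.0, 2.0, 3.0, 3.0]    # study area B: the same units plus two more

rsf_a = [exponential_rsf(x, beta) for x in area_a]
rsf_b = [exponential_rsf(x, beta) for x in area_b]

sum_a, sum_b = standardize_by_sum(rsf_a), standardize_by_sum(rsf_b)
max_a, max_b = standardize_by_max(rsf_a), standardize_by_max(rsf_b)

# The unit with x = 1.0 receives a different standardized value in each area,
# so the standardized value cannot be a property of the unit alone.
unit_in_a, unit_in_b = sum_a[1], sum_b[1]

# The ratio between the units with x = 2.0 and x = 1.0 is exp(beta) in both areas.
ratio_a = sum_a[2] / sum_a[1]
ratio_b = sum_b[2] / sum_b[1]

print(unit_in_a, unit_in_b)
print(ratio_a, ratio_b, math.exp(beta))
```

Only the ratios survive a change of study area, which is why the values are interpretable solely in relation to each other.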

The exponential model is the unique model when both the used and the available distributions follow the Normal distribution. In practice, the case of both used and available distributions being Normal is rare because at least some of the covariates are categorical or are strictly positive and may have skewed distributions. Nevertheless, the exponential model may be appropriate when the distributions are not Normal. This needs to be checked using model comparisons between the exponential and other models such as Logistic or Probit. We also note that, similar to the above result, if both used and available distributions are multinomial, the exponential model is the only model that is permissible. This result was implicit in Lele & Keim (2006) and led to the conclusion that one cannot estimate probability of selection when all covariates are categorical.
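
A short derivation sketch may help to show why Normal used and available distributions lead to the exponential form. It assumes a single covariate x and the weighted-distribution relation between the used and available densities; the symbols f^U, f^A and the Normal means and variances are notation introduced here for illustration.

```latex
% Assumed notation (not from the text): f^U and f^A are the used and available
% densities of a single covariate x; s(x) is the probability of selection.
\begin{align*}
f^U(x) &= \frac{s(x)\, f^A(x)}{\int s(u)\, f^A(u)\, du}
  \quad\Longrightarrow\quad s(x) \propto \frac{f^U(x)}{f^A(x)}, \\
\log \frac{f^U(x)}{f^A(x)}
  &= \log\frac{\sigma_A}{\sigma_U}
     - \frac{(x-\mu_U)^2}{2\sigma_U^2} + \frac{(x-\mu_A)^2}{2\sigma_A^2} \\
  &= c + \left(\frac{\mu_U}{\sigma_U^2} - \frac{\mu_A}{\sigma_A^2}\right) x
       + \left(\frac{1}{2\sigma_A^2} - \frac{1}{2\sigma_U^2}\right) x^2 ,
\end{align*}
% where c collects the terms that do not depend on x.
```

Under these assumptions s(x) is proportional to exp(β1 x + β2 x²), which is of exponential form; with equal variances σ² the quadratic term vanishes and β1 = (μ_U − μ_A)/σ². The proportionality constant is not determined by the two densities, which is consistent with the intercept being non-estimable.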

The exponential model is somewhat limited because it allows estimation of only a relative probability of selection (RSF). The major deterrent to fitting models other than the exponential form was the lack of available software, but this capability is now evolving. Parametric functions such as the Logistic, Probit or Complementary log-log models can be fitted to telemetry data, with either common or changing availability. In contrast to the exponential RSPF, the Logistic, Probit or Complementary log-log models allow estimation of the actual probability of selection and not simply the relative probability of selection (Lele 2009; R package ‘ResourceSelection’), making them more useful than the exponential model. The weighted distribution approach to estimate RSPF is applicable provided the covariate space contains at least one continuous covariate. It also needs a substantial number of used points, generally 300 to 500, to obtain stable estimators of the probability of selection. The new capability for estimating an RSPF makes it pertinent to ask when relative measures of selection (i.e. RSF) are adequate for the research or conservation question at hand. As argued by Lele (2009), although the relative change in probability of selection from 0·05 to 0·01 and from 0·9 to 0·18 is the same (one probability is 5 times the other), the management interpretations are likely to be quite different. In the first case, one might perceive that an already bad environment is made a bit worse; in the second case, a really good environment is worsened substantially. We believe RSPF provides more information about the value of the resource type than RSF. Of course, if one really wants to use the relative change in probability of selection, it can always be computed when the s(x) of two resource types is known.
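
The arithmetic behind this comparison can be written out as a small sketch. The helper function name is hypothetical; the four probabilities are the ones quoted above.

```python
import math

def ratio_and_difference(before, after):
    """Return (relative change, absolute change) between two selection probabilities."""
    return before / after, before - after

# Poor habitat: probability of selection drops from 0.05 to 0.01.
poor_ratio, poor_drop = ratio_and_difference(0.05, 0.01)
# Good habitat: probability of selection drops from 0.9 to 0.18.
good_ratio, good_drop = ratio_and_difference(0.9, 0.18)

print(f"poor habitat: ratio {poor_ratio:.1f}, absolute drop {poor_drop:.2f}")
print(f"good habitat: ratio {good_ratio:.1f}, absolute drop {good_drop:.2f}")
```

An RSF can recover only the ratio, which is identical in the two cases; the absolute drop, which differs by more than an order of magnitude, requires the probabilities themselves.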

We note that when a Logistic model is fitted using the weighted distribution approach (Lele & Keim 2006), the parameters should be interpreted as log-odds ratios for selection. Logistic regression is also used to estimate probability of occupancy (Manly et al. 2002), but the odds ratios in selection and occupancy have different interpretations. In the former, it is the odds of selecting vs. not selecting a resource unit when it is encountered, whereas in the latter, it is the odds that a resource unit is used at least once or never.
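
The difference between the two interpretations of a coefficient can be checked numerically. The sketch below assumes a single covariate and hypothetical parameter values `beta0` and `beta1`; it compares a one-unit increase in the covariate under a Logistic probability of selection and under an exponential RSF.

```python
import math

def logistic_selection(x, beta0, beta1):
    """Logistic probability of selection for one covariate."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def odds(p):
    """Odds of selecting vs. not selecting."""
    return p / (1.0 - p)

beta0, beta1 = -1.0, 0.7   # hypothetical values

p0 = logistic_selection(0.0, beta0, beta1)
p1 = logistic_selection(1.0, beta0, beta1)

# Logistic model: exp(beta1) is the odds ratio, not the ratio of probabilities.
logistic_odds_ratio = odds(p1) / odds(p0)
logistic_prob_ratio = p1 / p0

# Exponential RSF: exp(beta1) is the ratio of (relative) selection values.
exponential_ratio = math.exp(beta1 * 1.0) / math.exp(beta1 * 0.0)

print(logistic_odds_ratio, logistic_prob_ratio, exponential_ratio, math.exp(beta1))
```

The same number exp(β1) therefore answers a different question depending on which model was fitted, regardless of the software used to obtain it.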

The weighted distribution approach to estimate RSPF is applicable for data arising from telemetry studies where repeat visits to the same resource unit are counted as multiple data points. On the other hand, this method is inappropriate for presence-only data arising from occupancy surveys because in these data occupancy usually means ‘used at least once’. The weighted distribution based analysis of occupancy surveys (e.g. Royle et al. 2012) does not estimate probability of selection or habitat suitability.
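
A minimal sketch of why ‘used at least once’ is a different quantity from probability of selection follows. It assumes, purely for illustration, that a unit is encountered a fixed number of times and that each encounter independently leads to use with probability `s`; neither the independence assumption nor the numerical values come from the text.

```python
import math

def prob_used_at_least_once(s, n_encounters):
    """P(unit is used at least once) over n independent encounters,
    each leading to use with probability s (illustrative assumption)."""
    return 1.0 - (1.0 - s) ** n_encounters

s = 0.2   # hypothetical probability of selection per encounter
occupancy_by_encounters = {n: prob_used_at_least_once(s, n) for n in (1, 5, 20)}

for n, p in occupancy_by_encounters.items():
    print(f"{n:2d} encounters: P(used at least once) = {p:.3f}")
```

Under these assumptions the same per-encounter probability yields very different occupancy probabilities depending on how often the unit is encountered, so an occupancy record does not by itself identify the probability of selection.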

Discussion

In this paper, we presented precise definitions of common statistical quantities used in resource selection analyses. The lack of distinction between the various terms discussed above has led to considerable confusion in the literature about appropriate modelling approaches. Probability of selection is inherent in use distribution, choice, occupancy and occurrence, but they all involve additional components such as the available distribution, the choice set or the number of individuals in the population. One cannot compute the probability that a used unit is of type x solely from information on the characteristics of a random sample of resource units with known use (i.e. presence-only data) without knowing the selection probabilities explicitly. As a result, use distribution, based only on the relative proportion of a resource type within the use set, is often employed to ascribe the value of the resource type. However, this metric does not explicitly consider constraints imposed by what is available. Overlooking this critical point seems to have led many researchers to conclude that selection and use are synonymous.

There are two, not necessarily mutually exclusive, philosophies of statistical modelling (Breiman 2001). One approach emphasizes the importance of models for better understanding of the mechanisms underlying the phenomenon, presuming that better understanding can lead to better prediction and forecasting. The other approach emphasizes prediction and hence leads to models that do not claim to improve understanding of nature but that are good at prediction. The machine-learning-based maximum entropy approach (e.g. Elith & Leathwick 2009) is commonly used in predicting species distributions using presence-only data. Mathematically, the MaxEnt approach is identical to the exponential RSF model described earlier. Most applications of the exponential RSF tend to use linear or polynomial parametric models, whereas the models in MaxEnt are more flexible. They both provide only the relative probability of selection and are highly dependent on the specification of the availability distribution. There is one important but subtle difference between the situations where RSF models are used and where MaxEnt usually has been used. The MaxEnt model is commonly used when occupancy data, where the location is occupied at least once during the study period, are available, whereas RSF models are used for telemetry data where the same location may be used multiple times. Thus, MaxEnt seems to answer the question of occupancy when the non-occupancy data are unavailable. Mathematically, however, there is no reason why MaxEnt cannot be used for telemetry data as well.

Aside from the issue of choice of available distribution, the RSF, MaxEnt or RSPF models are, inherently, not process-based models, as pointed out by Austin (2002). They do not explicitly relate the inferences to survival or the process of growth of a population. Instead, these are descriptive models that are useful for generating plausible hypotheses of what animals might select. However, any such inferences need to be corroborated by other pieces of evidence that relate directly to the behaviour and survival of the species (e.g. McLoughlin et al. 2010; Wasser et al. 2011).

Statistical models provide structure for characterizing selection, occupancy, use and choice, and all these concepts have useful applications. Selection, use and occupancy are all related concepts with different uses in management. Precise definitions and understanding of these concepts are important for applied ecological research. Our paper has attempted to clarify the assumptions and relationships behind different concepts used in selection studies. The main culprit for the confusion seems to be the lack of precise statements about various quantities that are being studied. Often, the same nomenclature is used to describe mathematically distinct events. We can avoid such confusion in future by describing the quantities and the associated events precisely and completely.

Acknowledgements

We would like to thank the referees and the associate editor for their careful reading and insightful comments that have improved our paper substantially.

References
