practical sampling situation
We envisage a practical situation where N locations are monitored for the presence or absence of target species. The monitoring locations may represent user-specified quadrats or sites within an area of interest, or discrete habitats such as ponds, islands or patches of vegetation. Each location is surveyed for the species on multiple (not necessarily an equal number of) occasions, and species are either detected or not detected during each survey. For the duration of the surveying the locations are closed to changes in the occupancy state with respect to each species, i.e. a species is either always present, or always absent from the location over the surveying period (this requirement may be relaxed in some situations, see the discussion).
The sequence of detections and non-detections at a location for each species may be recorded as a ‘detection history’: a vector of 1s (detection) and 0s (non-detection). For example, the detection history $X_i^A = 101$ indicates that location i was surveyed on three occasions, with species A being detected only in the first and third surveys. Similarly, the detection history $X_i^B = 000$ indicates that species B was never detected at location i.
We define the model here for situations involving only two species, but the approach can easily be extended to a greater number of species. However, the number of parameters in the model increases exponentially with the number of species; hence this technique could become very ‘data hungry’ and not all of the parameters may be estimable for a given data set. In addition it could be difficult to interpret meaningfully the interactions among a large number of species; therefore, we recommend that users focus their research questions on a small (< 4) number of target species.
A monitoring location may be considered to be in one of four mutually exclusive states of occupancy for two species (more generally there are $2^k$ possible states for k species): (1) occupied by both species A and B; (2) occupied by species A only; (3) occupied by species B only; or (4) occupied by neither species. Using the notation introduced in Table 1, we define a row vector for the probability of location i being in each of the four respective states as:

$$\boldsymbol{\phi}_i = \begin{bmatrix} \psi_i^{AB} & \psi_i^{A} - \psi_i^{AB} & \psi_i^{B} - \psi_i^{AB} & 1 - \psi_i^{A} - \psi_i^{B} + \psi_i^{AB} \end{bmatrix} \qquad \text{(eqn 1)}$$

Note that the elements of $\boldsymbol{\phi}_i$ sum to 1.

Table 1. Notation for the parameters used in the model
|$\psi_i^{AB}$|Probability of both species being present at location i|
|$\psi_i^{A}$|Probability of species A being present at location i, regardless of occupancy status of species B|
|$\psi_i^{B}$|Probability of species B being present at location i, regardless of occupancy status of species A|
|$p_{ij}^{A}$|Probability of detecting species A during the jth survey of location i, given only species A is present|
|$p_{ij}^{B}$|Probability of detecting species B during the jth survey of location i, given only species B is present|
|$r_{ij}^{AB}$|Probability of detecting both species during the jth survey of location i, given both species are present|
|$r_{ij}^{Ab}$|Probability of detecting species A, but not B, during the jth survey of location i, given both species are present|
|$r_{ij}^{aB}$|Probability of detecting species B, but not A, during the jth survey of location i, given both species are present|
|$r_{ij}^{ab}$|Probability of detecting neither species during the jth survey of location i, given both species are present; $= 1 - r_{ij}^{AB} - r_{ij}^{Ab} - r_{ij}^{aB}$|
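The construction of the state-probability vector can be sketched in a few lines of Python. This is only an illustration: the function name `occupancy_state_probs` and the numerical values are hypothetical, and parameters are treated as constants for a single location.

```python
def occupancy_state_probs(psi_a, psi_b, psi_ab):
    """Return [both, A only, B only, neither] for one location (eqn 1)."""
    phi = [
        psi_ab,                        # occupied by both species
        psi_a - psi_ab,                # occupied by species A only
        psi_b - psi_ab,                # occupied by species B only
        1.0 - psi_a - psi_b + psi_ab,  # occupied by neither species
    ]
    # A small tolerance guards against floating-point rounding at the boundary.
    if min(phi) < -1e-12:
        raise ValueError("inputs imply a negative state probability")
    return phi


# Hypothetical values chosen only for illustration.
phi = occupancy_state_probs(psi_a=0.6, psi_b=0.5, psi_ab=0.4)
print(phi)       # approximately [0.4, 0.2, 0.1, 0.3]
print(sum(phi))  # approximately 1.0
```

The validity check reflects the fact that the three occupancy probabilities cannot be chosen freely: each of the four state probabilities must be non-negative.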
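Conditional upon the occupancy state of the location, the probability of observing the detection histories for the two species can be stated in terms of the detection probabilities defined in Table 1. For example, the probability of observing the detection histories given in the previous section ($X_i^A = 101$ and $X_i^B = 000$), conditional upon the location being occupied by both species, is:

$$r_{i1}^{Ab}\, r_{i2}^{ab}\, r_{i3}^{Ab}$$

because species A alone was detected in the first and third surveys (probability $r_{ij}^{Ab}$) and neither species was detected in the second (probability $r_{ij}^{ab}$).

Another possibility for this example would be that the location is occupied by species A only, in which case the probability of not observing species B is 1·0. The conditional probability of observing the two detection histories in this situation would be:

$$p_{i1}^{A}\,(1 - p_{i2}^{A})\, p_{i3}^{A}$$

where $p_{ij}^{A}$ is the probability of detecting species A in survey j when only species A is present.

The probability of observing this combination of histories for all other occupancy states (occupied by species B only and occupied by neither species) is 0, as both states prohibit species A from being at the location, yet species A was actually observed there. Therefore, we define a column vector representing the probability of observing the detection histories conditional upon each state. For instance, using the above example:

$$\mathbf{p}_{X,i} = \begin{bmatrix} r_{i1}^{Ab}\, r_{i2}^{ab}\, r_{i3}^{Ab} \\ p_{i1}^{A}\,(1 - p_{i2}^{A})\, p_{i3}^{A} \\ 0 \\ 0 \end{bmatrix} \qquad \text{(eqn 2)}$$

The unconditional probability for observing the two detection histories could then be calculated as:

$$\Pr(X_i^A, X_i^B) = \boldsymbol{\phi}_i\, \mathbf{p}_{X,i} \qquad \text{(eqn 3)}$$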
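A minimal Python sketch of this calculation follows. It assumes detection probabilities that are constant across surveys, represents detection histories as strings of `0` and `1`, and uses hypothetical names (`conditional_probs`, `unconditional_prob`, the dictionary keys `"AB"`, `"Ab"`, `"aB"`, `"ab"`) and hypothetical parameter values.

```python
import math

# Maps (A detected?, B detected?) in one survey to the matching r parameter.
R_KEY = {("1", "1"): "AB", ("1", "0"): "Ab", ("0", "1"): "aB", ("0", "0"): "ab"}


def single_species_prob(history, p):
    """Probability of a 0/1 history when only that species is present."""
    return math.prod(p if x == "1" else 1.0 - p for x in history)


def conditional_probs(x_a, x_b, p_a, p_b, r):
    """Pr(histories | state) for [both, A only, B only, neither] (eqn 2)."""
    both = math.prod(r[R_KEY[(a, b)]] for a, b in zip(x_a, x_b))
    # A state is impossible if a species it excludes was actually detected.
    a_only = 0.0 if "1" in x_b else single_species_prob(x_a, p_a)
    b_only = 0.0 if "1" in x_a else single_species_prob(x_b, p_b)
    neither = 0.0 if "1" in x_a + x_b else 1.0
    return [both, a_only, b_only, neither]


def unconditional_prob(phi, cond):
    """Inner product of state and conditional probabilities (eqn 3)."""
    return sum(w * c for w, c in zip(phi, cond))


# Hypothetical parameter values for illustration only.
phi = [0.4, 0.2, 0.1, 0.3]                        # both, A only, B only, neither
r = {"AB": 0.3, "Ab": 0.3, "aB": 0.2, "ab": 0.2}  # the four values sum to 1
cond = conditional_probs("101", "000", p_a=0.5, p_b=0.5, r=r)
print(cond)
print(unconditional_prob(phi, cond))
```

For the histories 101 and 000 only the first two states contribute, so the result is the weighted sum of the "both species" and "species A only" terms.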
By using the probability vectors we account for potential uncertainties in the occupancy state of a location due to not detecting one or both of the target species during the surveys. Note that our use of different detection probability parameters for the cases of single-species and two-species occupancy is very general and permits the possibility that detection probability of one species depends on whether the site is occupied by the other species (e.g. the detection probability for a prey species may depend upon whether a predator species is also present). Some examples of detection histories and the unconditional probabilities of observing them are given in Table 2.
Table 2. Example detection histories ($X^A$, $X^B$) and the probabilities of observing them ($\Pr(X^A, X^B)$)
Assuming that the detection histories collected at the N locations are independent, we can define the model likelihood as:
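$$L(\psi, p, r \mid X_1^A, X_1^B, \ldots, X_N^A, X_N^B) = \prod_{i=1}^{N} \Pr(X_i^A, X_i^B) \qquad \text{(eqn 4)}$$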
This can then be maximized numerically to obtain the maximum likelihood estimates (MLEs) of the parameters.
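As an illustration of how such a likelihood might be evaluated and maximized, the sketch below computes the log-likelihood for a small invented data set and then performs a crude grid search over the co-occurrence probability alone, holding all other parameters fixed. Everything here is an assumption made for the example: the function names, the toy data, the parameter values, the use of constant parameters, and the grid search itself, which stands in for the general-purpose numerical optimizer that would be used in practice.

```python
import math

# Maps (A detected?, B detected?) in one survey to the matching r parameter.
R_KEY = {("1", "1"): "AB", ("1", "0"): "Ab", ("0", "1"): "aB", ("0", "0"): "ab"}


def history_prob(x_a, x_b, psi_a, psi_b, psi_ab, p_a, p_b, r):
    """Unconditional probability of one location's pair of histories."""
    phi = [psi_ab, psi_a - psi_ab, psi_b - psi_ab, 1 - psi_a - psi_b + psi_ab]
    both = math.prod(r[R_KEY[(a, b)]] for a, b in zip(x_a, x_b))
    a_only = 0.0 if "1" in x_b else math.prod(
        p_a if a == "1" else 1 - p_a for a in x_a)
    b_only = 0.0 if "1" in x_a else math.prod(
        p_b if b == "1" else 1 - p_b for b in x_b)
    neither = 0.0 if "1" in x_a + x_b else 1.0
    return sum(w * c for w, c in zip(phi, [both, a_only, b_only, neither]))


def log_likelihood(data, **params):
    """Log of the likelihood: sum of log-probabilities over locations."""
    return sum(math.log(history_prob(x_a, x_b, **params)) for x_a, x_b in data)


# Invented detection histories (species A, species B) for five locations.
data = [("101", "000"), ("110", "011"), ("000", "000"),
        ("000", "010"), ("111", "101")]

# Hypothetical fixed values for every parameter except psi_ab.
fixed = dict(psi_a=0.6, psi_b=0.5, p_a=0.5, p_b=0.5,
             r={"AB": 0.3, "Ab": 0.3, "aB": 0.2, "ab": 0.2})

# Grid over psi_ab values strictly inside the admissible range (0.1, 0.5).
grid = [k / 100 for k in range(11, 50)]
best = max(grid, key=lambda v: log_likelihood(data, psi_ab=v, **fixed))
print(best, log_likelihood(data, psi_ab=best, **fixed))
```

Working on the log scale avoids numerical underflow when the product in the likelihood runs over many locations.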
For generality, we have presented the above model using location-specific parameters (as denoted by the subscript i); however, there is never sufficient information in the type of data considered here to estimate a different parameter for each location. Constraints are required in order to obtain MLEs (e.g. requiring all locations, or groups of locations, to share a common parameter). Another approach is to let the location-specific parameters be defined by some function of the features that characterize a location, e.g. habitat type or patch size. We consider how these covariates could be accommodated by the model in a later section.
testing and quantifying interactions between species
The general likelihood-based framework presented above provides the opportunity both to test for and to quantify the level of interaction between two species. There are two mechanisms through which we can investigate species interactions, which may reflect different questions of biological interest: the species occupancy probabilities, or the detection probabilities given that the species are present. In both cases we can determine whether the occupancy (or detection) events for one species appear to occur independently of the presence (or detection) of the other species, i.e. do both species occur at a site (or, similarly, are both species detected in a survey) more or less often than expected under an assumption of independence? In addition, we can determine whether there is evidence that the probability of detecting one species changes in the presence of the other species, i.e. if species B is also present at a site, are we more or less likely to detect species A (regardless of whether species B is detected)?
To investigate potential interactions between species, one has the choice of using hypothesis testing or a model selection approach, depending upon the goals of the research. Standard likelihood ratio tests (LRT) could be used to test for independence of the species with respect to either occupancy or detection. For example, if species occupy sites independently then, based upon the statistical definition of independence, it would be expected that $\psi^{AB} = \psi^{A} \times \psi^{B}$. An LRT could be constructed by comparing the likelihood values from two models: a full model where $\psi^{AB}$, $\psi^{A}$ and $\psi^{B}$ are each estimated; and a reduced model where only $\psi^{A}$ and $\psi^{B}$ are estimated, with $\psi^{AB}$ being calculated as the product of $\psi^{A}$ and $\psi^{B}$ (note that the structure for all other parameters is unchanged between the full and reduced models). By conducting the test it is possible to determine whether there is sufficient evidence to reject the null hypothesis of independence. Examples of the constraints that could be imposed are given in Table 3. Alternatively, it may be appropriate to explore the data using information-theoretic model selection approaches (e.g. Akaike's information criterion, AIC), where the intent is to find a set of parsimonious models upon which inferences about the species' biology could be made (e.g. Burnham & Anderson 2002).
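The arithmetic behind both approaches can be sketched as follows, assuming constant parameters so that the reduced model has exactly one fewer parameter than the full model (1 degree of freedom). The function names, log-likelihood values and parameter counts are hypothetical and are not results from any data set.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced):
    """Return (test statistic, p-value) for a 1-degree-of-freedom LRT."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)); tiny negative
    # statistics caused by optimizer noise are clamped to zero.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


def aic(loglik, n_params):
    """Akaike's information criterion: -2 log-likelihood + 2 K."""
    return -2.0 * loglik + 2.0 * n_params


# Hypothetical maximized log-likelihoods and parameter counts.
loglik_full, k_full = -120.00, 8        # psi_AB estimated freely
loglik_reduced, k_reduced = -121.92, 7  # psi_AB = psi_A * psi_B imposed
stat, p_value = likelihood_ratio_test(loglik_full, loglik_reduced)
print(stat, p_value)
print(aic(loglik_full, k_full), aic(loglik_reduced, k_reduced))
```

With more than one constrained parameter the reference chi-square distribution has correspondingly more degrees of freedom, and the closed-form p-value used here no longer applies.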
Table 3. Examples of the constraints that should be imposed for testing the independence of occupancy and detection probabilities, where $r_j^{A}$ and $r_j^{B}$ are the marginal detection probabilities for the respective species in survey j, given both species are present
The magnitude of the interaction between species could be estimated from the parameter estimates of the full model (e.g. as $\hat{\gamma} = \hat{\psi}^{AB} / (\hat{\psi}^{A}\hat{\psi}^{B})$), which we term a species interaction factor (SIF). Values of $\hat{\gamma} < 1$ would suggest species avoidance (i.e. the species co-occur less frequently than if they were distributed independently), while values > 1 would suggest contagion, or a tendency to co-occur more frequently than expected under independence. Note that $\hat{\gamma} = 1$ would suggest the species occur independently. However, it may often be advantageous to reparameterize the model so that the SIF is estimated directly, i.e. $\psi^{AB} = \psi^{A} \times \psi^{B} \times \gamma$. Similarly, we can redefine the detection probabilities $r^{AB}$ as $r^{AB} = r^{A} \times r^{B} \times \delta$, where $r^{A}$ and $r^{B}$ are the overall probabilities of detecting species A and B during a survey, given both species are present, and $\delta$ is the SIF for the detection probabilities.
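Both interaction factors are simple ratios, as the following sketch shows. The function names, the tolerance used to decide whether a value equals 1, and the input values are hypothetical; the overall detection probabilities are obtained by summing the relevant two-species detection probabilities.

```python
def occupancy_sif(psi_a, psi_b, psi_ab):
    """gamma = psi_AB / (psi_A * psi_B)."""
    return psi_ab / (psi_a * psi_b)


def detection_sif(r_both, r_a_not_b, r_b_not_a):
    """delta = r_AB / (r_A * r_B), given both species are present."""
    r_a = r_both + r_a_not_b  # overall probability of detecting species A
    r_b = r_both + r_b_not_a  # overall probability of detecting species B
    return r_both / (r_a * r_b)


def interpret(sif, tol=1e-9):
    """Label a SIF relative to the independence value of 1."""
    if sif < 1.0 - tol:
        return "avoidance"
    if sif > 1.0 + tol:
        return "contagion"
    return "independent"


# Hypothetical estimates for illustration only.
gamma = occupancy_sif(psi_a=0.6, psi_b=0.5, psi_ab=0.4)
delta = detection_sif(r_both=0.3, r_a_not_b=0.3, r_b_not_a=0.2)
print(gamma, interpret(gamma))
print(delta, interpret(delta))
```

In practice a point estimate close to 1 would be judged against its standard error or a formal test rather than against a fixed tolerance.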
To consider whether the probability of detecting species A during a survey is different when species B is also present, we could compare models with and without the constraint $p^{A} = r^{A}$, i.e. that species A is detected with the same probability whether or not species B is present (and similarly $p^{B} = r^{B}$ for species B when species A is also present). Note that this issue is distinct from the question of whether detections of the two species occur independently given that both species are present (i.e. does $\delta = 1$?).
incorporating covariate information
Potentially, the probability that a species occupies a location may be affected by characteristics of the location. For example, some species may prefer particular habitat types over other available habitats; have a higher occupancy rate at locations near permanent water sources; require a minimum patch size for a sustainable population; or show reduced probability of occurrence in isolated patches (e.g. Verner, Morrison & Ralph 1986; Scott et al. 2002). Similarly, the probability of detecting species at the location may also be affected by location-specific covariates (e.g. old growth forest vs. rejuvenating forest). Detection probabilities may also be affected by conditions at the time of the survey, such as air temperature, cloud cover, or time since a rain event.
One method for incorporating such covariates is to use the multinomial logistic model (eqn 5).
- (eqn 5)
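The generic form of a multinomial logistic model for the four occupancy states can be sketched as below. This is the textbook formulation, not necessarily the exact parameterization of eqn 5: the choice of "neither species" as the reference state, the function name and the coefficient values are all assumptions made for illustration.

```python
import math


def state_probs_from_covariates(covariates, betas):
    """Multinomial-logit probabilities for [both, A only, B only, neither].

    covariates: covariate values for one location, with a leading 1.0
        for the intercept.
    betas: three coefficient lists, one each for the states 'both',
        'A only' and 'B only'; 'neither' is the reference state, whose
        linear predictor is fixed at 0.
    """
    linear = [sum(b * y for b, y in zip(beta, covariates)) for beta in betas]
    linear.append(0.0)  # reference state
    # Subtracting the maximum before exponentiating avoids overflow.
    m = max(linear)
    weights = [math.exp(v - m) for v in linear]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical location: intercept plus one standardized habitat covariate.
covariates = [1.0, 0.8]
betas = [[0.2, 1.0], [-0.5, 0.3], [-0.1, -0.6]]
phi = state_probs_from_covariates(covariates, betas)
print(phi, sum(phi))
```

Because the four probabilities are produced jointly, each lies between 0 and 1 and they sum to 1 whatever the covariate values.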
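When the occupancy probabilities are instead modelled separately, as with the SIF parameterization, $\psi^{AB}$ must remain consistent with the two marginal occupancy probabilities:

$$\max(0,\ \psi^{A} + \psi^{B} - 1) \le \psi^{AB} \le \min(\psi^{A}, \psi^{B}) \qquad \text{(eqn 7)}$$

For example, if $\psi^{A} = 0.6$ and $\psi^{B} = 0.6$, then the two species must both occur at a minimum of 20% of the locations, while if they exactly co-occur then it can only be at 60% of sites at most. This restriction must be enforced when using SIFs (which may cause numerical problems), but when using the first approach the restriction is automatically imposed because of the different parameterization of the covariate relationship. Similar reasoning applies when using the SIF parameterization with respect to the r parameters.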
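A small check of this restriction might look like the following sketch. It relies on the fact that a joint probability cannot be smaller than max(0, ψA + ψB − 1) or larger than min(ψA, ψB); the function names and tolerance are hypothetical.

```python
def psi_ab_bounds(psi_a, psi_b):
    """Smallest and largest co-occurrence probability given the marginals."""
    lower = max(0.0, psi_a + psi_b - 1.0)
    upper = min(psi_a, psi_b)
    return lower, upper


def sif_bounds(psi_a, psi_b):
    """Range of the SIF gamma implied by the bounds on psi_AB."""
    lower, upper = psi_ab_bounds(psi_a, psi_b)
    return lower / (psi_a * psi_b), upper / (psi_a * psi_b)


def sif_is_admissible(psi_a, psi_b, gamma, tol=1e-12):
    """True if psi_A * psi_B * gamma is a valid co-occurrence probability."""
    lower, upper = sif_bounds(psi_a, psi_b)
    return lower - tol <= gamma <= upper + tol


print(psi_ab_bounds(0.6, 0.6))  # approximately (0.2, 0.6)
print(sif_bounds(0.6, 0.6))
print(sif_is_admissible(0.6, 0.6, 1.0), sif_is_admissible(0.6, 0.6, 2.0))
```

A check of this kind could be applied inside an optimizer, for instance by rejecting or penalizing parameter values for which it fails.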
In many wildlife studies it is likely that, on some occasions, not all locations will be surveyed for the target species. This may be due to logistical constraints (it is simply not possible to survey all locations virtually simultaneously), study design, or unforeseen circumstances such as a vehicle breakdown en route. We define such an occasion as a missing observation. The flexible modelling framework presented above can be modified easily to accommodate missing observations. As in MacKenzie et al. (2002), for occasions when the location was not surveyed, the respective detection probability (or probabilities) is set to zero, effectively removing it from the probabilistic statement about the observed detection history for that location.
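The effect of setting the detection probability to zero can be seen in a single-species sketch, in which an unsurveyed occasion contributes a factor of (1 − 0) = 1 to the product. The missing-value code `-`, the function name and the constant detection probability are assumptions made for this example.

```python
MISSING = "-"  # hypothetical code for an occasion with no survey


def single_species_history_prob(history, p):
    """Pr(history | species present) with constant detection probability p."""
    prob = 1.0
    for obs in history:
        if obs not in ("0", "1", MISSING):
            raise ValueError(f"unexpected observation code: {obs!r}")
        # An unsurveyed occasion has detection probability zero, so its
        # factor is (1 - 0) = 1 and it drops out of the product.
        p_j = 0.0 if obs == MISSING else p
        prob *= p_j if obs == "1" else 1.0 - p_j
    return prob


print(single_species_history_prob("1-1", 0.5))  # 0.25
print(single_species_history_prob("11", 0.5))   # 0.25
```

The history with a missing second occasion has the same probability as a two-survey history, which is why locations with different numbers of surveys can be combined in one likelihood.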
An important point is that by being able to accommodate missing observations, the model does not require equal sampling effort across all locations. This provides a great deal of flexibility for study design. For example, under certain conditions it may be appropriate to survey a subsample of locations more frequently to gain adequate information about the detection probabilities, and elsewhere survey only once or twice.