Investigating species co-occurrence patterns when species are detected imperfectly


Proteus Research and Consulting Ltd, PO Box 5193, Dunedin, New Zealand.


  1. Over the last 30 years there has been a great deal of interest in investigating patterns of species co-occurrence across a number of locations, which has led to the development of numerous methods to determine whether there is evidence that a particular pattern may not have occurred by random chance.
  2. A key aspect that seems to have been largely overlooked is the possibility that species may not always be detected at a location when present, which leads to ‘false absences’ in a species presence/absence matrix that may cause incorrect inferences to be made about co-occurrence patterns. Furthermore, many of the published methods for investigating patterns of species co-occurrence do not account for potential differences in the site characteristics that may partially (at least) explain non-random patterns (e.g. due to species having similar/different habitat preferences).
  3. Here we present a statistical method for modelling co-occurrence patterns between species while accounting for imperfect detection and site characteristics. This method requires that multiple presence/absence surveys for the species be conducted over a reasonably short period of time at most sites. The method yields unbiased estimates of probabilities of occurrence, and is practical when the number of species is small (< 4).
  4. To illustrate the method we consider data collected on two terrestrial salamander species, Plethodon jordani and members of the Plethodon glutinosus complex, collected in the Great Smoky Mountains National Park, USA. We find no evidence that the species do not occur independently at sites once site elevation has been allowed for, although we find some evidence of a statistical interaction between species in terms of detectability that we suggest may be due to changes in relative abundances.


One approach to ecological science seeks to draw inferences about community dynamics and function based on observed patterns (e.g. Brown 1995; Rosenzweig 1995; Marquet 2000; Hubbell 2001). One type of pattern that has attracted much attention from ecologists is the spatial occurrence of species. Indeed, a simple presence–absence matrix of species occurrence in spatial units has been termed ‘the fundamental unit of analysis in community ecology and biogeography’ (Gotelli 2000; also see McCoy & Heck 1987). Investigations of such matrices have led to the development of interesting ecological hypotheses (e.g. the community assembly rules of Diamond 1975) and to the identification of interesting empirical patterns (e.g. the nested subset structure of Patterson & Atmar 1986; Patterson 1987).

A key issue in the investigation of presence–absence matrices involves how to draw appropriate inferences about whether an observed matrix is unusual with respect to either random processes or processes that are neutral with respect to some purported ecological mechanism (Harvey et al. 1983; Gotelli & Graves 1996). For example, could a particular matrix have been generated by random species colonizations or is it more likely to have arisen as a result of interspecific competition? This issue of appropriate inferential methods has led to heated debate (Connor & Simberloff 1979, 1983, 1984; Diamond & Gilpin 1982; Gilpin & Diamond 1982, 1984) and continued methodological development (Kelt, Taper & Mesevre 1995; Manly 1995; Gotelli 2000; Gotelli & McCabe 2002).

In this paper we address a problem that has not received adequate attention in previous work, the assumption that all species present at a location are detected with certainty. In many, if not most, practical situations it is not realistic to obtain a census of all species. Few species are so conspicuous that they will always be detected when present at a location and in many cases, even after exhaustive searches, some species may still go undetected when present. This feature of the data collection will lead to ‘false absences’ in the presence–absence matrix, which may lead in turn to incorrect inferences about the patterns of species co-occurrence. Cam et al. (2000) presented methods that can be used to deal with species non-detection when testing hypotheses about nested subset community patterns (Patterson & Atmar 1986; Patterson 1987). The methods of Cam et al. (2000) are based on estimates of the fraction of species present at one location that are also present at another (Nichols et al. 1998). However, these estimation methods are based on groups of species and cannot be used to draw inferences about specific patterns of co-occurrence of a small number of species.

Another potential problem with attempts to draw inferences about interspecific interactions from presence–absence matrices involves other factors (e.g. habitat preferences and physiological tolerances) which are likely to result in non-random patterns of species co-occurrence, yet have nothing to do with interspecific interactions. This class of problem is inherent in all attempts to draw inferences about process based on pattern and has been recognized in previous efforts to analyse presence–absence matrices (e.g. Connor & Simberloff 1984; Gilpin & Diamond 1984; Peres-Neto, Olden & Jackson 2001). One approach to dealing with such factors is to identify them a priori and incorporate them into analyses. For example, one approach is to develop a regression model to predict detections of one species as a function of both habitat variables and detections of other species (Schoener 1974; Crowell & Pimm 1976).

Here, we present a method that deals with both problems by incorporating both non-detection and possible habitat preferences directly into the model set. This method is based on the approach of MacKenzie et al. (2002), who developed a single-species model for estimating the fraction of locations occupied by the species, allowing for the possible non-detection of the species when present. They considered the realistic situation where multiple surveys are conducted at the locations, over a relatively short time period. Straightforward probabilistic arguments are used to model the sequence of species detections and non-detections from the repeated surveys, enabling the probability of an observed sequence to be calculated. By combining the information from all locations, the model parameters (probability of occupancy and probability of detection given occupancy) can be estimated using maximum likelihood techniques. Importantly, the model of MacKenzie et al. (2002) does not require equal sampling effort across all locations, and the parameters can be functions of covariates such as habitat type.

Here we extend the work of MacKenzie et al. (2002) to estimate and model co-occurrence patterns between two or more species across a landscape, when species are not detected with certainty when present at a location. The likelihood-based framework detailed below enables the magnitude of interspecific interactions in probabilities of occurrence to be estimated directly, while accounting explicitly for imperfect detectability. The flexibility of our approach also enables the level of co-occurrence to be estimated, above and beyond any habitat preferences exhibited by the species. We envisage that this model could be most useful to address questions about the importance of interspecific interactions such as competition and predator–prey relationships as potential determinants of community structure.

In this paper we begin by discussing the practical sampling framework required for the method; detail the straightforward probabilistic arguments used to construct the model likelihood; show, via simulation, that the estimators have reasonable properties in terms of bias and precision; and apply the method to a field study of terrestrial salamanders in Great Smoky Mountains National Park (Bailey, Simons & Pollock 2004). Throughout this paper we refer to interactions between species. When we do so, we use the term ‘interaction’ in the statistical sense to mean that the species are not occurring independently at sites. Our use of ‘interaction’ does not imply any particular biological mechanism (e.g. predation, resource competition, behavioural dominance) that could produce a lack of independence in pattern of co-occurrence.


Practical sampling situation

We envisage a practical situation where N locations are monitored for the presence or absence of target species. The monitoring locations may represent user-specified quadrats or sites within an area of interest, or discrete habitats such as ponds, islands or patches of vegetation. Each location is surveyed for the species on multiple (not necessarily an equal number of) occasions, and species are either detected or not detected during each survey. For the duration of the surveying the locations are closed to changes in the occupancy state with respect to each species, i.e. a species is either always present, or always absent from the location over the surveying period (this requirement may be relaxed in some situations, see the discussion).

The sequence of detections and non-detections at a location for each species may be recorded as a ‘detection history’: a vector of 1s (detection) and 0s (non-detection). For example, the detection history $X_i^A = 101$ represents that location i was surveyed on three occasions, with species A being detected only in the first and third surveys. Similarly, the detection history $X_i^B = 000$ would represent that species B was never detected at location i.

Statistical model

We define the model here for situations involving only two species, but the approach can easily be extended to a greater number of species. However, the number of parameters in the model increases exponentially with the number of species; hence this technique could become very ‘data hungry’ and not all of the parameters may be estimable for a given data set. In addition it could be difficult to interpret meaningfully the interactions among a large number of species; therefore, we recommend that users focus their research questions on a small (< 4) number of target species.

A monitoring location may be considered to be in one of four mutually exclusive states of occupancy for two species (more generally there are $2^k$ possible states for k species): (1) occupied by both species A and B; (2) occupied by species A only; (3) occupied by species B only; or (4) occupied by neither species. Using the notation introduced in Table 1, we define a row vector for the probability of location i being in each of the four respective states as:

Table 1.  Notation for the parameters used in the model

Parameter          Definition
$\psi_i^{AB}$      Probability of both species being present at location i
$\psi_i^{A}$       Probability of species A being present at location i, regardless of occupancy status of species B
$\psi_i^{B}$       Probability of species B being present at location i, regardless of occupancy status of species A
$p_{ij}^{A}$       Probability of detecting species A during the jth survey of location i, given only species A is present
$p_{ij}^{B}$       Probability of detecting species B during the jth survey of location i, given only species B is present
$r_{ij}^{AB}$      Probability of detecting both species during the jth survey of location i, given both species are present
$r_{ij}^{Ab}$      Probability of detecting species A, but not B, during the jth survey of location i, given both species are present
$r_{ij}^{aB}$      Probability of detecting species B, but not A, during the jth survey of location i, given both species are present
$r_{ij}^{ab}$      Probability of detecting neither species during the jth survey of location i, given both species are present; $= 1 - r_{ij}^{AB} - r_{ij}^{Ab} - r_{ij}^{aB}$

$$\phi_i = \left[\, \psi_i^{AB} \quad \psi_i^{A} - \psi_i^{AB} \quad \psi_i^{B} - \psi_i^{AB} \quad 1 - \psi_i^{A} - \psi_i^{B} + \psi_i^{AB} \,\right]$$ (eqn 1)

Note that the elements of φi sum to 1.

Conditional upon the occupancy state of the location, the probability of observing the detection histories for the two species can be stated in terms of the detection probabilities defined in Table 1. For example, the probability of observing the detection histories given in the previous section, conditional upon the location being occupied by both species, is:

$$\Pr(X_i^A = 101, X_i^B = 000 \mid \text{both species present}) = r_{i1}^{Ab}\, r_{i2}^{ab}\, r_{i3}^{Ab}$$

Another possibility for this example would be that the location is occupied by species A only, in which case the probability of not observing species B is 1·0. The conditional probability of observing the two detection histories in this situation would be:

$$\Pr(X_i^A = 101, X_i^B = 000 \mid \text{species A only}) = p_{i1}^{A}\,(1 - p_{i2}^{A})\,p_{i3}^{A}$$

The probability of observing this combination of histories for all other occupancy states (occupied by species B only and occupied by neither species) is 0, as both states prohibit species A from being at the location, yet species A was actually observed there. Therefore, we define a column vector $p_{X,i}$ representing the probability of observing the detection histories conditional upon each state. For instance, using the above example:

$$p_{X,i} = \begin{bmatrix} r_{i1}^{Ab}\, r_{i2}^{ab}\, r_{i3}^{Ab} \\ p_{i1}^{A}\,(1 - p_{i2}^{A})\,p_{i3}^{A} \\ 0 \\ 0 \end{bmatrix}$$ (eqn 2)

The unconditional probability for observing the two detection histories could then be calculated as:

$$\Pr(X_i^A = 101, X_i^B = 000) = \phi_i\, p_{X,i}$$ (eqn 3)

By using the probability vectors we account for potential uncertainties in the occupancy state of a location due to not detecting one or both of the target species during the surveys. Note that our use of different detection probability parameters for the cases of single-species and two-species occupancy is very general and permits the possibility that detection probability of one species depends on whether the site is occupied by the other species (e.g. the detection probability for a prey species may depend upon whether a predator species is also present). Some examples of detection histories and the unconditional probabilities of observing them are given in Table 2.

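Table 2.  Example detection histories ($X_i^A$, $X_i^B$) and the probabilities of observing them ($\Pr(X_i^A, X_i^B)$)

$X_i^A$   $X_i^B$   $\Pr(X_i^A, X_i^B)$
101       001       $\psi_i^{AB}\, r_{i1}^{Ab}\, r_{i2}^{ab}\, r_{i3}^{AB}$
000       110       $\psi_i^{AB}\, r_{i1}^{aB}\, r_{i2}^{aB}\, r_{i3}^{ab} + (\psi_i^{B} - \psi_i^{AB})\, p_{i1}^{B}\, p_{i2}^{B}\,(1 - p_{i3}^{B})$
000       000       $\psi_i^{AB} \prod_{j=1}^{3} r_{ij}^{ab} + (\psi_i^{A} - \psi_i^{AB}) \prod_{j=1}^{3} (1 - p_{ij}^{A}) + (\psi_i^{B} - \psi_i^{AB}) \prod_{j=1}^{3} (1 - p_{ij}^{B}) + (1 - \psi_i^{A} - \psi_i^{B} + \psi_i^{AB})$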
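The calculation behind eqns 1–3 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: detection probabilities are held constant across surveys, histories are coded as strings, and the function name `history_probability` and the parameter values in the example call are hypothetical rather than taken from the study.

```python
from math import prod

def history_probability(xa, xb, psi_a, psi_b, psi_ab, p_a, p_b, r_AB, r_Ab, r_aB):
    """Pr(X^A, X^B) for one location, detection probabilities constant over surveys.

    xa and xb are equal-length strings of '1' (detected) and '0' (not detected).
    """
    r_ab = 1.0 - r_AB - r_Ab - r_aB  # neither species detected, both present
    # Occupancy state probabilities (eqn 1): both, A only, B only, neither.
    phi = [psi_ab, psi_a - psi_ab, psi_b - psi_ab, 1.0 - psi_a - psi_b + psi_ab]

    both = {("1", "1"): r_AB, ("1", "0"): r_Ab, ("0", "1"): r_aB, ("0", "0"): r_ab}
    seen_a, seen_b = "1" in xa, "1" in xb

    # Probability of the histories conditional upon each state (eqn 2).
    given_both = prod(both[(a, b)] for a, b in zip(xa, xb))
    given_a = 0.0 if seen_b else prod(p_a if a == "1" else 1.0 - p_a for a in xa)
    given_b = 0.0 if seen_a else prod(p_b if b == "1" else 1.0 - p_b for b in xb)
    given_none = 0.0 if (seen_a or seen_b) else 1.0

    cond = [given_both, given_a, given_b, given_none]
    return sum(f * c for f, c in zip(phi, cond))  # eqn 3


# Hypothetical parameter values, for illustration only.
example = history_probability("101", "001", 0.5, 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.2)
print(example)  # approximately 0.0054: only the 'both present' state is possible
```

Because both species were detected in the first example history pair, only the first element of the state vector contributes; for the all-zero histories every state contributes a term.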

Assuming that the detection histories collected at the N locations are independent, we can define the model likelihood as:

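$$L(\psi^{AB}, \psi^{A}, \psi^{B}, p^{A}, p^{B}, r^{AB}, r^{Ab}, r^{aB} \mid X_1^A, X_1^B, \ldots, X_N^A, X_N^B) = \prod_{i=1}^{N} \Pr(X_i^A, X_i^B)$$ (eqn 4)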

This can then be maximized numerically to obtain the maximum likelihood estimates (MLEs) of the parameters.
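As a rough sketch of how eqn 4 might be evaluated in practice, the function below returns the negative log-likelihood for a list of detection-history pairs, assuming parameters that are constant across locations and surveys. The function name, the string coding of histories and the parameter values are illustrative assumptions; in a real analysis the function would be handed to a general-purpose numerical optimiser, with the parameters transformed to respect their bounds.

```python
from math import log, prod

def neg_log_likelihood(params, histories):
    """Negative log of eqn 4 for location- and survey-constant parameters.

    params: (psi_a, psi_b, psi_ab, p_a, p_b, r_AB, r_Ab, r_aB)
    histories: list of (xa, xb) pairs of '0'/'1' strings, one pair per location.
    """
    psi_a, psi_b, psi_ab, p_a, p_b, r_AB, r_Ab, r_aB = params
    r = {"11": r_AB, "10": r_Ab, "01": r_aB, "00": 1 - r_AB - r_Ab - r_aB}
    # State probabilities (eqn 1): both, A only, B only, neither.
    phi = (psi_ab, psi_a - psi_ab, psi_b - psi_ab, 1 - psi_a - psi_b + psi_ab)
    total = 0.0
    for xa, xb in histories:
        no_a, no_b = "1" not in xa, "1" not in xb
        cond = (
            prod(r[a + b] for a, b in zip(xa, xb)),                  # both present
            no_b * prod(p_a if a == "1" else 1 - p_a for a in xa),  # A only
            no_a * prod(p_b if b == "1" else 1 - p_b for b in xb),  # B only
            float(no_a and no_b),                                    # neither
        )
        total -= log(sum(f * c for f, c in zip(phi, cond)))
    return total


# Hypothetical data and parameter values, for illustration only.
data = [("101", "001"), ("000", "110"), ("000", "000")]
print(neg_log_likelihood((0.5, 0.4, 0.3, 0.6, 0.5, 0.2, 0.3, 0.2), data))
```

Working on the log scale turns the product in eqn 4 into a sum, which avoids numerical underflow when N is large.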

For generality, we have presented the above model using location-specific parameters (as denoted by the subscript i); however, there is never sufficient information in the type of data considered here to estimate a different parameter for each location. Constraints are required in order to obtain MLEs (e.g. require all or groups of locations to have a common parameter). Another approach is to let the location-specific parameters be defined by some function of the features that characterize a location, i.e. habitat type, patch size, etc. We consider how these covariates could be accommodated by the model in a later section.

Testing and quantifying interactions between species

The general likelihood-based framework presented above provides the opportunity both to test for, and to quantify, the level of interaction between two species. Species interactions can be investigated through two mechanisms, which may reflect different questions of biological interest: occupancy probabilities, or detection probabilities given that the species are present. In both cases we can determine whether the occupancy (or detection events) for one species appears to be occurring independently of the presence (or detection) of the other species, i.e. do the species both occur at a site (or, similarly, do we detect both species in a survey) more or less often than expected under an assumption of independence? In addition, we can determine whether there is evidence that the probability of detecting one species changes in the presence of the other species, i.e. if species B is also present at a site, are we more or less likely to detect species A (regardless of detecting species B)?

To investigate potential interactions between species, one has the choice of using hypothesis testing or a model selection approach, depending upon the goals of the research. Standard likelihood ratio tests (LRT) could be used to test for independence of the species with respect to either occupancy or detection. For example, if species occupy sites independently then, based upon the statistical definition of independence, it would be expected that ψAB = ψA × ψB. An LRT could be constructed by comparing the likelihood values from two models: a full model where ψAB, ψA and ψB are each estimated; and a reduced model where only ψA and ψB are estimated, with ψAB being calculated as the product of ψA and ψB (note that the structure for all other parameters is unchanged between the full and reduced models). By conducting the test it is possible to determine whether there is sufficient evidence to reject the null hypothesis of independence. Examples of the constraints that could be imposed are given in Table 3. Alternatively, it may be appropriate to explore the data using information-theoretic model selection approaches (e.g. Akaike's information criterion, AIC), where the intent is to find a set of parsimonious models upon which inferences about the species biology could be made (e.g. Burnham & Anderson 2002).

Table 3.  Examples of the constraints that should be imposed for testing the independence of occupancy and detection probabilities, where $r_j^{A}$ and $r_j^{B}$ are the marginal detection probabilities for the respective species in survey j, given both species are present

Test of independence   Constraint
Occupancy              $\psi^{AB} = \psi^{A}\,\psi^{B}$
Detection              $r_j^{AB} = r_j^{A}\, r_j^{B}$
                       $r_j^{Ab} = r_j^{A}\,(1 - r_j^{B})$
                       $r_j^{aB} = (1 - r_j^{A})\, r_j^{B}$
                       $r_j^{ab} = (1 - r_j^{A})(1 - r_j^{B})$
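A likelihood ratio test of a single constraint, such as independence of occupancy, might be computed as in the following sketch. It assumes the usual large-sample chi-square approximation with one degree of freedom, and the function name is hypothetical. The example call uses the two fit statistics reported for the salamander models without elevation (747·0 for the model without γ and 736·6 for the model with γ), read here as −2 × log-likelihood.

```python
from math import erfc, sqrt

def lrt_one_df(neg2loglik_reduced, neg2loglik_full):
    """Likelihood ratio statistic and its chi-square (1 d.f.) p-value.

    The statistic must be non-negative, i.e. the reduced model is nested
    within the full model and both have been fitted successfully.
    """
    stat = neg2loglik_reduced - neg2loglik_full
    # For 1 d.f., Pr(chi-square > x) = Pr(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value


stat, p_value = lrt_one_df(747.0, 736.6)
print(round(stat, 1), p_value)  # statistic of 10.4; p-value a little above 0.001
```

The closed form via `erfc` holds only for one degree of freedom; a test of several constraints at once (e.g. the detection constraints in Table 3) would need the general chi-square tail probability.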

The magnitude of the interaction between species could be estimated from the parameter estimates of the full model (e.g. as $\hat{\gamma} = \hat{\psi}^{AB} / (\hat{\psi}^{A}\,\hat{\psi}^{B})$), which we term a species interaction factor (SIF). Values of γ̂ < 1 would suggest species avoidance (i.e. the species co-occur less frequently than if they were distributed independently), while values > 1 would suggest contagion, or a tendency to co-occur more frequently than expected under independence. Note that γ̂ = 1 would suggest the species occur independently. However, often it may be advantageous to reparameterize the model so that the SIF is estimated directly, i.e. ψAB = ψA × ψB × γ. Similarly, we can redefine the detection probabilities rAB as rAB = rA × rB × δ, where rA and rB are the overall probabilities of detecting species A and B during a survey, given both species are present, and δ is the SIF for the detection probabilities.

To consider whether the probability of detecting species A during a survey is different when species B is also present, we could compare models with and without the constraint $r_j^{A} = p_j^{A}$ (and similarly for species B when species A is also present). Note that this issue is distinct from the question of whether detections of the two species occur independently given that both species are present (i.e. does δ = 1?).

Incorporating covariate information

Potentially, the probability that a species occupies a location may be affected by characteristics of the location. For example, some species may prefer particular habitat types over other available habitats; have a higher occupancy rate at locations near permanent water sources; require a minimum patch size for a sustainable population; or show reduced probability of occurrence in isolated patches (e.g. Verner, Morrison & Ralph 1986; Scott et al. 2002). Similarly, the probability of detecting species at the location may also be affected by location-specific covariates (e.g. old growth forest vs. rejuvenating forest). Detection probabilities may also be affected by conditions at the time of the survey, such as air temperature, cloud cover, or time since a rain event.

One method for incorporating such covariates is to use the multinomial logistic model (eqn 5).

$$\theta_i^{(k)} = \frac{\exp(Y_i \beta_k)}{1 + \sum_{l=1}^{m-1} \exp(Y_i \beta_l)}, \quad k = 1, \ldots, m - 1$$ (eqn 5)

where $\theta_i^{(k)}$ is the probability of the kth outcome at the ith location, $Y_i$ is a row vector of the covariate values for the ith location, $\beta_k$ is the column vector of coefficients to be estimated for the kth outcome and m is the number of discrete outcomes. For example, when modelling the probabilities for detecting/not-detecting both species at a survey occasion, e.g. $r_{ij}^{AB}$, $r_{ij}^{Ab}$, $r_{ij}^{aB}$ and $r_{ij}^{ab}$, there are four discrete outcomes. Three of these probabilities could be modelled using eqn 5, with the final probability being obtained by subtraction. Note that when m = 2 (i.e. only two discrete outcomes), eqn 5 reduces to the more familiar binomial logistic model that could be used, for instance, for modelling the $p_{ij}^{A}$s or $p_{ij}^{B}$s, where the individual species may be either detected or not detected.
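The multinomial logistic link can be illustrated with a short Python sketch. It assumes one coefficient vector per modelled outcome, with the last outcome treated as the reference category whose probability is obtained by subtraction; the function names and covariate values are hypothetical.

```python
from math import exp

def linear_predictor(covariates, coefficients):
    """Inner product of a covariate row vector and a coefficient column vector."""
    return sum(y * b for y, b in zip(covariates, coefficients))

def multinomial_logit(linear_predictors):
    """Map m - 1 linear predictors to m outcome probabilities.

    The final probability belongs to the reference outcome and is obtained
    by subtraction, so the m probabilities always sum to 1.
    """
    expo = [exp(v) for v in linear_predictors]
    denom = 1.0 + sum(expo)
    probs = [e / denom for e in expo]
    probs.append(1.0 - sum(probs))
    return probs


# Hypothetical site with an intercept and one standardized covariate.
site = [1.0, 0.5]
betas = [[0.2, -1.0], [0.0, 0.3], [-0.5, 0.1]]  # one vector per modelled outcome
print(multinomial_logit([linear_predictor(site, b) for b in betas]))

# With a single linear predictor (m = 2) this is the binomial logistic model.
print(multinomial_logit([0.0]))  # two equal probabilities
```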

For modelling the occupancy probabilities, one could use the multinomial logistic model on the elements of $\phi_i$, although the results may not be biologically meaningful, e.g. when interpreting the effect of a covariate on an element such as $\psi_i^{A} - \psi_i^{AB}$. Another approach would be to use the SIFs, so that modelling of $\psi_i^{A}$ and $\psi_i^{B}$ is achieved using separate binomial logistic models, while $\gamma_i$ could be modelled as:

$$\gamma_i = \exp(Y_i \beta_\gamma)$$ (eqn 6)

However, when using such an approach, users must be mindful of the natural relationship among $\psi_i^{A}$, $\psi_i^{B}$ and $\psi_i^{AB}$, which restricts the values that $\psi_i^{AB}$ and, hence, $\gamma_i$ can possibly take, reflecting limits to the degree of overlap that is possible between the two species, i.e.:

$$\max(\psi_i^{A} + \psi_i^{B} - 1,\ 0) \le \psi_i^{AB} \le \min(\psi_i^{A},\ \psi_i^{B})$$ (eqn 7)

For example, if $\psi_i^{A} = 0·6$ and $\psi_i^{B} = 0·6$, then the two species must both occur at a minimum of 20% of the locations, while if they exactly co-occur then it can only be at 60% of sites at most. This restriction must be enforced when using SIFs (which may cause numerical problems), but when using the first approach the restriction is automatically imposed because of the different parameterization of the covariate relationship. Similar reasoning applies when using the SIF parameterization with respect to the r parameters.
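The limits on the joint occupancy probability, and the implied limits on the SIF, can be computed directly. The sketch below assumes both marginal occupancy probabilities lie strictly between 0 and 1; the function names are illustrative.

```python
def psi_ab_bounds(psi_a, psi_b):
    """Smallest and largest joint occupancy probability compatible with the margins."""
    lower = max(psi_a + psi_b - 1.0, 0.0)
    upper = min(psi_a, psi_b)
    return lower, upper

def sif_bounds(psi_a, psi_b):
    """Allowable range of the SIF, gamma = psi_ab / (psi_a * psi_b)."""
    lower, upper = psi_ab_bounds(psi_a, psi_b)
    independent = psi_a * psi_b  # joint probability under independence
    return lower / independent, upper / independent


print(psi_ab_bounds(0.6, 0.6))  # roughly (0.2, 0.6), the example in the text
print(sif_bounds(0.6, 0.6))     # roughly (0.56, 1.67)
print(sif_bounds(0.95, 0.95))   # a much narrower range, close to 1
```

The last call shows why the SIF becomes hard to estimate when both species occupy nearly every site: the allowable range collapses towards 1.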

Missing observations

A probable feature of many wildlife studies is that occasionally not all locations will be surveyed for the target species. This may be due either to logistical constraints (it is simply not possible to survey all locations virtually simultaneously); study design; or unforeseen circumstances such as a vehicle breakdown en route. We define such occasions as a missing observation. The flexible modelling framework presented above can be modified easily to accommodate missing observations. As in MacKenzie et al. (2002), for occasions when the location was not surveyed, the respective detection probability (or probabilities) is set to zero, effectively removing it from the probabilistic statement about the observed detection history for that location.

An important point is that by being able to accommodate missing observations, the model does not require equal sampling effort across all locations. This provides a great deal of flexibility for study design. For example, under certain conditions it may be appropriate to survey a subsample of locations more frequently to gain adequate information about the detection probabilities, and elsewhere survey only once or twice.
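One way to picture the treatment of missing observations is the single-species building block below, in which a survey that was not conducted is coded as `None`. Skipping the survey is equivalent to setting its detection probability to zero for a recorded non-detection, since the corresponding factor (1 − 0) equals 1. The coding and the function name are assumptions made for illustration; the two-species terms would be handled in the same way.

```python
def prob_history_given_present(history, p):
    """Pr(history | species present) with missing surveys coded as None.

    history: sequence of 1 (detected), 0 (not detected) or None (not surveyed).
    p: per-survey detection probability, assumed constant here.
    """
    prob = 1.0
    for obs in history:
        if obs is None:  # location not surveyed: contributes a factor of 1
            continue
        prob *= p if obs == 1 else 1.0 - p
    return prob


# A location surveyed on occasions 1 and 3 only, versus a two-survey location.
print(prob_history_given_present([1, None, 0], 0.5))  # 0.25
print(prob_history_given_present([1, 0], 0.5))        # 0.25
```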

Simulation study

To assess the performance of the above modelling a simulation study was conducted, with four basic patterns in species occupancy being investigated. Both species were given the same occupancy probability, set at either a moderate or a high level, and were assumed to exhibit either a strong association or a strong disassociation. The four combinations of {ψA, ψB, ψAB} used in the simulations were: (i) {0·4, 0·4, 0·08}; (ii) {0·4, 0·4, 0·24}; (iii) {0·7, 0·7, 0·4}; and (iv) {0·7, 0·7, 0·6125}. In addition, three other factors were varied to assess their influence on the estimation of the model parameters: (1) total number of locations surveyed (N) = 50, 100 or 200; (2) number of repeat surveys (T) = 3 or 5; and (3) probability of detecting each species during a survey, given presence (p) = 0·214 or 0·5. For simplicity, the detection of each species was assumed to be independent of detection of the other (δ = 1), and detection probabilities were made constant across time, equal for both species (pA = pB), and equal regardless of whether one or both species were present (r = p). The values of p were chosen such that the probability of never detecting the species given it was actually there, i.e. $(1 - p)^T$, was approximately 0·5 and 0·3 when p = 0·214 (for T = 3 and 5, respectively); and 0·125 and 0·03 when p = 0·5.
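The arithmetic behind these design choices is easy to check. The sketch below computes the SIF implied by each occupancy scenario, γ = ψAB / (ψA × ψB), and the probability of never detecting a species that is present, (1 − p) raised to the power T; the variable names are illustrative.

```python
# Occupancy scenarios from the text: (psi_a, psi_b, psi_ab).
scenarios = {
    "i": (0.4, 0.4, 0.08),
    "ii": (0.4, 0.4, 0.24),
    "iii": (0.7, 0.7, 0.4),
    "iv": (0.7, 0.7, 0.6125),
}

# Species interaction factor implied by each scenario.
sif = {
    name: psi_ab / (psi_a * psi_b)
    for name, (psi_a, psi_b, psi_ab) in scenarios.items()
}

# Probability that a species present at a site is missed in all T surveys.
never_detected = {(p, T): (1 - p) ** T for p in (0.214, 0.5) for T in (3, 5)}

for name, gamma in sif.items():
    print(name, round(gamma, 2))
for (p, T), q in never_detected.items():
    print(p, T, round(q, 3))
```

Scenarios (i) and (iii) therefore correspond to γ < 1 (disassociation) and scenarios (ii) and (iv) to γ > 1 (association).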

For each scenario, 1000 sets of simulated data were generated and a model with the following parameters was fitted to the data, ψΑ, ψΒ, ψΑΒ, pA, pB, rAB, rAb and raB. This represents a model where neither the occupancy nor the detection probabilities are assumed to be independent between the two species, and detection probabilities are constant across time (and locations). From each set of data, parameter estimates were obtained and their standard errors approximated by inverting the matrix of second partial derivatives (a standard numerical technique). The average of the 1000 parameter estimates was used to assess unbiasedness, while the standard deviation of the 1000 parameter estimates was compared to the average of the 1000 standard errors to ensure that the approximated standard errors fairly reflected the true level of uncertainty in the parameter estimates.
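A single simulated data set of the kind described above could be generated along the following lines. This is a minimal sketch rather than the code used in the study: it assumes δ = 1 and r = p for both species, as in the simulation design, and the function name, the string coding of histories and the use of a seed are illustrative choices.

```python
import random

def simulate_histories(n_sites, n_surveys, psi_a, psi_b, psi_ab, p, seed=None):
    """Simulate detection histories for two species with delta = 1 and r = p."""
    rng = random.Random(seed)
    # State probabilities: both, A only, B only, neither (clamped against
    # tiny negative values caused by floating-point rounding).
    phi = [psi_ab, psi_a - psi_ab, psi_b - psi_ab,
           max(0.0, 1 - psi_a - psi_b + psi_ab)]
    states = [(True, True), (True, False), (False, True), (False, False)]
    data = []
    for _ in range(n_sites):
        has_a, has_b = rng.choices(states, weights=phi)[0]
        # A species can only be detected where it is present.
        xa = "".join("1" if has_a and rng.random() < p else "0"
                     for _ in range(n_surveys))
        xb = "".join("1" if has_b and rng.random() < p else "0"
                     for _ in range(n_surveys))
        data.append((xa, xb))
    return data


# Scenario (i) with N = 50, T = 3 and p = 0.5.
data = simulate_histories(50, 3, psi_a=0.4, psi_b=0.4, psi_ab=0.08, p=0.5, seed=1)
print(data[:5])
```

Repeating this 1000 times per scenario and fitting the model to each data set would give the sampling distribution of the estimates that the text compares with the approximated standard errors.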

In approximately 8·5% of the simulations (on average) the matrix of second partial derivatives could not be inverted. This was not unexpected and is a common feature of likelihood-based methods when parameters are estimated very close to the bounds of allowable values (e.g. 0 or 1). These simulation results were discarded, which may introduce a small bias, but our results and further investigations suggest any such bias is negligible.

The results of the simulations suggest the parameter estimates are virtually unbiased for most scenarios considered, and have a reasonable level of precision. The standard errors are generally in good agreement with the true level of uncertainty. Figure 1 presents the percentage bias for the estimated joint probability of occupancy (ψAB) and its standard error. In this instance, the bias is minimal except when N = 50, T = 3, p = 0·5 and occupancy for both species was moderate with a strong disassociation, {ψA, ψB, ψAB} = {0·4, 0·4, 0·08}, in which case ψAB tended to be overestimated and its standard error underestimated. Full results for the simulation study can be obtained by contacting the corresponding author.

Figure 1.

Approximate percentage bias of estimated joint probability of occupancy ($\hat{\psi}^{AB}$) and its associated standard error ($SE(\hat{\psi}^{AB})$), obtained from a simulation study, plotted against the factors: number of sites; number of surveys; detection probability per survey (p); and true value of ψAB.

Example: terrestrial salamanders in Great Smoky Mountains National Park

We illustrate the utility of this approach using monitoring data collected on terrestrial salamanders at 88 sites within the Roaring Fork Watershed, Great Smoky Mountains National Park (GSMNP, Mt LeConte USGS Quadrangle). Sites were located adjacent to trails and spaced approximately 250 m apart (see Hyde & Simons 2001 for sampling details). Two parallel transects were sampled at each site: a natural cover transect (50 m long × 3 m wide) and coverboard transect consisting of five stations placed 10 m apart (see Hyde & Simons (2001) for details). Sites were sampled five times between 4 April 1999 and 27 June 1999, with approximately 2 weeks between successive sampling occasions. Relative abundance information was collected for each species but here we consider only detection/non-detection data (pooled for both transects) for two species: the red cheek variation of Jordan's salamander (Plethodon jordani Blatchley; PJ) and members of the Plethodon glutinosus complex including Plethodon glutinosus (Green) and Plethodon oconaluftee (Hairston; PG). We stress that the following analyses are presented only as an example of the above method, and they should not be used to draw definitive conclusions about co-occurrence patterns between these two species.

Several previous studies have sought to document the spatial distributions of these two species and explain geographical variation in their altitudinal overlap. Hairston (1980) found that competitive interactions were stronger in areas of little altitudinal overlap (GSMNP and Black Mountains, NC) than in areas of broad altitudinal overlap (Balsam Mountains, NC). Other studies have found no evidence of competitive exclusion, suggesting the species’ distributions are either independent (Rissler, Barber & Wilbur 2000) or determined by habitat or environmental factors (Dakin 1978).

Here we are interested in determining whether there is any evidence that the two species exhibit strong co-occurrence patterns after allowing for any elevational gradient in occupancy probabilities. Throughout the following analysis we use the SIF parameterization of the model and assume that δ = 1, i.e. the species are detected independently when both are present. We feel this is a reasonable assumption to make given the field design and known biology of these species.

Table 4 shows the model fit and selection statistics for models that do not acknowledge a potential elevational gradient in occupancy and detection probabilities. Based upon AIC, the most parsimonious model among those considered for the data is ψ(S)γ(·)p(S)r(S), which suggests that the detection probability for each species is different if the other species is also present (for PG: p̂ = 0·54 and r̂ = 0·48; for PJ: p̂ = 0·91 and r̂ = 0·55), and that there is very strong evidence that the two species avoid each other, γ̂ (SE) = 0·67 (0·11).

Table 4.  Summary of model fit and selection statistics for models without elevation as a covariate, where K is the number of estimated parameters in the model and ΔAIC is the absolute difference in AIC values relative to the model with the smallest AIC. The terms in parentheses represent the factors in the model for the respective parameter, with ‘S’ denoting that species has been used as a factor and ‘·’ indicating that the parameter is constant. For example, ψ(S) indicates that the occupancy probability has been estimated separately for both species, whereas γ(·) indicates that this parameter has a constant value to be estimated. Absence of the γ parameter in the model notation implies γ = 1 and absence of r(S) implies r(S) = p(S)

Model                −2 log-likelihood   K   ΔAIC
ψ(S)γ(·)p(S)r(S)     736·6               7   0·0
ψ(S)p(S)r(S)         747·0               6   8·3
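The selection statistics in Table 4 can be reproduced approximately from the reported fit statistics, using AIC = −2 × log-likelihood + 2K. The sketch below reads the first numeric column as −2 × log-likelihood, an interpretation that reproduces the reported ΔAIC values to within rounding; the dictionary keys are plain-text stand-ins for the model names.

```python
def aic(neg2loglik, k):
    """Akaike's information criterion from -2 log-likelihood and parameter count."""
    return neg2loglik + 2 * k


# (-2 log-likelihood, K) for the two models without elevation.
models = {
    "psi(S)gamma(.)p(S)r(S)": (736.6, 7),
    "psi(S)p(S)r(S)": (747.0, 6),
}

aics = {name: aic(*fit) for name, fit in models.items()}
best = min(aics.values())
delta = {name: round(value - best, 1) for name, value in aics.items()}
print(delta)
```

From the rounded inputs the difference comes out at 8·4 rather than the tabulated 8·3, presumably because the published value was computed from unrounded likelihoods.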

However, once we allow these parameters to vary with elevation, we obtain models that provide much better descriptions of the data (Table 5). Unfortunately, models that included a γ term together with occupancy as a function of elevation did not converge satisfactorily. Once the probability of occupancy for PJ was modelled as a function of elevation, the predicted occupancy probability was < 0·15 for elevations below 750 m and > 0·80 for elevations above 902 m (Fig. 2). At lower elevations, this means there are very few data on which to judge whether the species were acting independently, while at the higher elevations there is a very small range of allowable values for γ, implying that there is little scope to evaluate nonindependent behaviour in terms of occupancy for these species (i.e. the lower and upper bounds on allowable values for γ, from eqn 7, both tend to 1·0 as elevation increases). The most parsimonious model we were able to fit to the data, ψ(S × E)p(S × E)r(S × E), was indicated clearly as the ‘best’ model in terms of AIC. Figures 2 and 3 illustrate how the estimated occupancy and detection probabilities vary with elevation.

While we have no evidence of an interaction between the species in terms of occupancy probabilities, there is strong evidence that the detection functions differ when both species are present at a site. For instance, the detection probability for PG increases with elevation when PJ is not present but decreases when PJ is present, whereas for PJ the effect of elevation is much larger when both species are present than when PJ is present alone. From an observational study such as this it is difficult to identify the cause of this phenomenon, but a plausible explanation involves effects of relative abundance, a potentially important determinant of species detection probability. For example, it may be that PG becomes more abundant as elevation increases until PJ is also reasonably abundant. At that point, the abundance of PG starts to decrease while PJ continues to become even more abundant (perhaps through competition for resources). This reasoning is consistent with other field studies, which conclude that while PG and PJ have shown no tendency to be mutually exclusive, PG is more tolerant of the dry locations usually found at lower elevations (Grover 2000; Rissler et al. 2000) and PJ seems to have a numerical advantage in the moist microhabitats common at higher elevations (Hairston 1951; Dakin 1978). Although this reasoning is supported by published studies, it is speculative, and we caution against inferring ecological process from spatial patterns without the support of experimental studies. In addition, in this example analysis we have not considered the potential effects of other habitat variables through more complicated models.
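
The narrowing of the allowable range for γ at high occupancy can be illustrated numerically. The sketch below is not a reproduction of eqn 7. It assumes that γ is the ratio of the probability that both species occupy a site to the product of the two marginal occupancy probabilities, and that the joint probability is constrained only by the usual limits max(0, ψA + ψB − 1) ≤ ψAB ≤ min(ψA, ψB). The function name and the occupancy values are hypothetical and are not estimates from the salamander data.

```python
def gamma_bounds(psi_a, psi_b):
    """Return (lower, upper) limits on gamma = psi_ab / (psi_a * psi_b).

    Assumes only that the joint occupancy probability psi_ab satisfies
    max(0, psi_a + psi_b - 1) <= psi_ab <= min(psi_a, psi_b).
    """
    if not (0.0 < psi_a <= 1.0 and 0.0 < psi_b <= 1.0):
        raise ValueError("marginal occupancy probabilities must lie in (0, 1]")
    product = psi_a * psi_b
    lower = max(0.0, psi_a + psi_b - 1.0) / product
    upper = min(psi_a, psi_b) / product
    return lower, upper


if __name__ == "__main__":
    psi_pg = 0.7  # hypothetical occupancy for PG, held fixed for illustration
    for psi_pj in (0.15, 0.50, 0.80, 0.95, 0.99):  # hypothetical values for PJ
        lo, hi = gamma_bounds(psi_pj, psi_pg)
        print(f"psi_PJ = {psi_pj:.2f}: {lo:.3f} <= gamma <= {hi:.3f}")
```

Under these assumptions the interval for γ closes in on 1·0 as the occupancy probability of one species approaches 1, which leaves little room for the data to indicate nonindependence.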

Table 5.  Summary of model fit and selection statistics for models with elevation as a covariate, where K is the number of estimated parameters in the model and ΔAIC is the absolute difference in AIC values relative to the model with the smallest AIC. The terms in parentheses represent the factors in the model for the respective parameter, with ‘S’ denoting that species has been used as a factor, ‘E’ indicating use of elevation as a factor, and ‘·’ indicating a parameter set equal across species and elevation. The best model from Table 4, ψ(S)γ(·)p(S)r(S), has been included to show how including elevation as a covariate substantially improves the fit of the models. Absence of the γ parameter in the model notation implies γ(·) and absence of r(S) implies r(S) = p(S).

Model                          −2 log-likelihood   K    ΔAIC
ψ(S × E)p(S × E)r(S × E)       617·3               12   0·0
ψ(S × E)p(S)r(S × E)           623·6               10   2·3
ψ(S × E)p(S × E)r(S)           660·1               10   38·8
ψ(S × E)p(S × E)               675·6               8    50·2
ψ(S × E)p(S)r(S)               676·1               8    50·8
ψ(S)p(S × E)r(S × E)           673·2               10   51·8
ψ(S)γ(·)p(S × E)r(S × E)       671·8               11   52·5
ψ(S)γ(·)p(S)r(S)               736·6               7    109·3
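
The ΔAIC values in Table 5 can be checked from the other tabulated quantities. The sketch below assumes that the fit statistic reported for each model is twice the negative log-likelihood, so that AIC = −2l + 2K. This assumption is not stated in the caption, but it is consistent with the tabulated ΔAIC values. The dictionary and function names are illustrative only.

```python
# Fit statistic (assumed to be -2 * log-likelihood) and K for each model in Table 5.
TABLE5 = {
    "psi(SxE) p(SxE) r(SxE)": (617.3, 12),
    "psi(SxE) p(S) r(SxE)": (623.6, 10),
    "psi(SxE) p(SxE) r(S)": (660.1, 10),
    "psi(SxE) p(SxE)": (675.6, 8),
    "psi(SxE) p(S) r(S)": (676.1, 8),
    "psi(S) p(SxE) r(SxE)": (673.2, 10),
    "psi(S) gamma(.) p(SxE) r(SxE)": (671.8, 11),
    "psi(S) gamma(.) p(S) r(S)": (736.6, 7),
}


def delta_aic(models):
    """Return AIC differences relative to the smallest AIC in `models`."""
    # AIC = -2 log-likelihood + 2 * (number of estimated parameters)
    aic = {name: m2ll + 2 * k for name, (m2ll, k) in models.items()}
    best = min(aic.values())
    return {name: value - best for name, value in aic.items()}


if __name__ == "__main__":
    for name, diff in delta_aic(TABLE5).items():
        print(f"{name:32s} {diff:6.1f}")
```

Because the inputs are rounded to one decimal place, the recomputed differences can differ from the published ΔAIC values by up to 0·1.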
Figure 2. Estimated probability of occupying a site for Plethodon jordani and members of the Plethodon glutinosus complex as a function of elevation according to the model.

Figure 3. Estimated probability of detecting the species Plethodon jordani (PJ) and Plethodon glutinosus (PG) in a survey, as a function of elevation according to the model ψ(S × E)p(S × E)r(S × E).


A number of previous authors have suggested various methods to test the null hypothesis of independence of species occurrence and to provide related interaction metrics both for two-species systems (Forbes 1907; Dice 1945; Cole 1949; Pielou 1977; Hayek 1994) and for more complex multispecies systems (Connor & Simberloff 1979, 1984; Gilpin & Diamond 1982, 1984; Kelt et al. 1995; Manly 1995; Gotelli 2000; Gotelli & McCabe 2002). However, with the exception of the work of Cam et al. (2000) directed at specific questions about nested subset structures (Patterson & Atmar 1986), we believe that the approach presented here is the first attempt to account explicitly for the imperfect detectability of species while modelling species co-occurrence data. Failure to allow for the fact that a species may have been present, but not detected, can result in misleading conclusions about species associations and interactions, as some species may have been classified falsely as absent. The flexible likelihood-based modelling framework we present is based on simple probabilistic arguments that are used commonly in other areas of statistical ecology such as mark–recapture (Lebreton et al. 1992), and are used widely in many statistical disciplines. Hence there is already a vast body of literature supporting the general approach. The modelling of the different occupancy states involves the same kind of thinking that has been used to develop previous approaches to testing for independence in the case of perfect detection (Forbes 1907; Dice 1945; Cole 1949; Pielou 1977; Hayek 1994). Thus, our approach to modelling and inference unites two approaches that are themselves very familiar to ecologists.

Initial investigations into the different possible methods for incorporating covariates into the occupancy probabilities suggest that using the multinomial logistic model on the elements of is the most numerically robust approach. However, as suggested earlier, this may give results that are difficult to interpret biologically. Our preference is for the use of the species interaction factors (SIFs), as they provide a meaningful interpretation for the strength of a covariate relationship on the nonindependence of two species.
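
The contrast between the two parameterizations can be sketched in code. The example assumes that the multinomial logistic model is applied to the probabilities of four mutually exclusive occupancy states (both species, species A only, species B only, neither), with ‘neither’ as the baseline category, and that the SIF is the joint occupancy probability divided by the product of the marginal probabilities. All function names and coefficients are hypothetical.

```python
import math


def occupancy_state_probs(eta_ab, eta_a, eta_b):
    """Multinomial logit over four occupancy states; 'neither' is the baseline."""
    etas = [eta_ab, eta_a, eta_b, 0.0]
    shift = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(e - shift) for e in etas]
    total = sum(weights)
    p_ab, p_a, p_b, p_none = (w / total for w in weights)
    return {"both": p_ab, "A_only": p_a, "B_only": p_b, "neither": p_none}


def species_interaction_factor(probs):
    """SIF: joint occupancy probability over the product of the marginals."""
    psi_a = probs["both"] + probs["A_only"]
    psi_b = probs["both"] + probs["B_only"]
    return probs["both"] / (psi_a * psi_b)


if __name__ == "__main__":
    # Hypothetical (intercept, slope) pairs on a standardized elevation scale.
    coef_ab, coef_a, coef_b = (-1.0, 0.5), (0.2, 1.5), (0.4, -0.6)
    for elev in (-1.0, 0.0, 1.0):
        etas = [b0 + b1 * elev for b0, b1 in (coef_ab, coef_a, coef_b)]
        probs = occupancy_state_probs(*etas)
        print(f"elevation {elev:+.1f}: SIF = {species_interaction_factor(probs):.3f}")
```

In this sketch the species are independent (SIF = 1) only when the linear predictor for the ‘both’ state equals the sum of the two single-species predictors. The strength of any interaction is therefore spread across several coefficients, which is one way to see why the multinomial form can be hard to interpret biologically.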

Although in the terrestrial salamander example we were unable to obtain convergence for models that involved both γ and occupancy as a function of elevation (and hence could not investigate possible species interactions in occupancy after allowing for the effects of elevation), the example does highlight the importance of considering factors that may affect the marginal probabilities of species occurring at study sites when exploring patterns of species co-occurrence. When we did not use elevation as a covariate in our models, there was very strong evidence that the species were less likely to both occupy a site than they would have been if they were acting independently (Table 4). However, once we began to consider models that included elevation as a covariate, this strong evidence of an interaction disappeared. For example, consider the models ψ(S)p(S × E)r(S × E) and ψ(S)γ(·)p(S × E)r(S × E) in Table 5. Here we have only allowed the detection probabilities to be functions of elevation, yet already there is little indication that including γ gives a better model, given that both models have similar AIC values. By ignoring potential factors that may affect a researcher's ability to detect target species, or factors that may affect whether a species occupies a particular location (such as habitat variables), erroneous conclusions may be reached concerning patterns of co-occurrence.

Above we have presented the estimation of model parameters in terms of maximizing the likelihood. However, another approach would be to assign appropriate prior distributions on the model parameters, representing current knowledge (or ignorance), and use the likelihood within a Markov chain Monte Carlo framework to obtain posterior distributions for the parameters. Such an approach may provide some benefits, enabling models to be explored that are intractable using standard maximum likelihood theory.
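
As a rough indication of what such an approach involves, the sketch below runs a random-walk Metropolis sampler. It is deliberately simplified and is not the two-species model of this paper: it assumes a single species, a constant occupancy probability ψ and detection probability p, and independent Uniform(0, 1) priors on both. The detection counts are invented for illustration, and all names are hypothetical.

```python
import math
import random

# Invented data: number of surveys (out of 5) with a detection at each of 20 sites.
DETECTIONS = [0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 1, 2, 4, 3, 2, 1, 3, 2, 2, 3]
N_SURVEYS = 5


def log_likelihood(psi, p, detections, n_surveys):
    """Single-species occupancy log-likelihood (constant terms omitted)."""
    total = 0.0
    for y in detections:
        # Site occupied, with y detections in n_surveys surveys.
        site = psi * p**y * (1.0 - p) ** (n_surveys - y)
        if y == 0:
            # No detections: the site may also be unoccupied.
            site += 1.0 - psi
        total += math.log(site)
    return total


def metropolis(detections, n_surveys, n_iter=20000, step=0.1, seed=1):
    """Random-walk Metropolis sampler with flat priors on (0, 1)."""
    rng = random.Random(seed)
    psi, p = 0.5, 0.5
    current = log_likelihood(psi, p, detections, n_surveys)
    samples = []
    for _ in range(n_iter):
        psi_new = psi + rng.gauss(0.0, step)
        p_new = p + rng.gauss(0.0, step)
        # Proposals outside (0, 1) have zero prior density and are rejected.
        if 0.0 < psi_new < 1.0 and 0.0 < p_new < 1.0:
            proposed = log_likelihood(psi_new, p_new, detections, n_surveys)
            if rng.random() < math.exp(min(0.0, proposed - current)):
                psi, p, current = psi_new, p_new, proposed
        samples.append((psi, p))
    return samples


if __name__ == "__main__":
    kept = metropolis(DETECTIONS, N_SURVEYS)[5000:]  # discard burn-in
    print("posterior mean psi:", sum(s[0] for s in kept) / len(kept))
    print("posterior mean p:  ", sum(s[1] for s in kept) / len(kept))
```

The same structure would carry over to a more complex likelihood by replacing `log_likelihood`, although in practice the proposal step size and the number of iterations would need tuning.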

In some circumstances it may be appropriate to relax the assumption that all locations are closed to any changes with respect to occupancy for the duration of the surveying. If the species move in and out of the study locations in a completely random manner, such as for a highly mobile species, then based upon the results of Kendall (1999) in a closely related mark–recapture context we believe that parameter estimates will still be valid, although their interpretation should change. What we have referred to as ‘occupied locations’ above should be interpreted as ‘used locations’, and ‘probability of detection’ is now ‘probability species is present and detected’. However, parameter estimates are no longer valid if the changes in occupancy are non-random, i.e. if animals move to a location during the middle of the seasonal survey period or vacate the location before the sampling has been completed.

Finally, although we believe that the methods proposed here can yield strong inferences about species co-occurrence using presence–absence data from multiple locations at a single point (e.g. season) in time, we warn that this does not imply strong inference about the processes that generated any observed patterns of co-occurrence. Despite the popularity of inferring process from pattern in ecology, strong inference about process typically requires some sort of manipulative experimentation. Although not generally as powerful as experimentation, it is often useful to observe system dynamics over time. MacKenzie et al. (2003) presented a model structure for estimating the vital rates associated with occupancy dynamics (local probabilities of extinction and colonization) based on multiple seasons or years of detection/nondetection data. It might be useful to extend this dynamic modelling approach to the multispecies case in order to estimate effects of one species on the vital rates of another. We believe that the methods presented in this paper will be very useful in drawing inferences about species co-occurrence, and that such inferences can be combined with other kinds of studies and analyses in order to investigate mechanisms underlying community dynamics.

This approach to modelling detection/nondetection data for two species has been implemented in program PRESENCE, which may be downloaded freely from


We would like to thank Evan Cooch and an anonymous referee for their helpful comments on an earlier draft of this paper.