Correspondence site: http://www.respond2articles.com/MEE/

# Models for species-detection data collected along transects in the presence of abundance-induced heterogeneity and clustering in the detection process

Article first published online: 10 OCT 2011

DOI: 10.1111/j.2041-210X.2011.00159.x

© 2011 The Authors. Methods in Ecology and Evolution © 2011 British Ecological Society

#### How to Cite

Guillera-Arroita, G., Ridout, M. S., Morgan, B. J. T. and Linkie, M. (2012), Models for species-detection data collected along transects in the presence of abundance-induced heterogeneity and clustering in the detection process. Methods in Ecology and Evolution, 3: 358–367. doi: 10.1111/j.2041-210X.2011.00159.x

#### Publication History

- Issue published online: 4 APR 2012
- Article first published online: 10 OCT 2011
- Received 14 June 2011; accepted 20 August 2011 Handling Editor: Robert Freckleton

### Keywords:

- clustered data;
- Markov-modulated Poisson process;
- Poisson mixture;
- replicated counts;
- species occupancy;
- Sumatran tiger;
- superposition of point processes

### Summary

**1.** Models have been devised previously that allow the estimation of abundance from detection data of unmarked individuals while accounting for imperfect detection, but these are restricted to models for discrete sampling protocols, i.e. replicated detection/non-detection or count data. Furthermore, these models assume that the detections from each individual are independent; however, there are cases in which this assumption is likely to be violated. For example, in surveys along transects, clustering in the signs left by each individual could be expected.

**2.** Here, we propose models to estimate abundance from species-detection data collected continuously along transects considering two cases: (i) independent detections and (ii) clustering within the detections of each individual. We account for clustering by describing the detection process as a Markov-modulated Poisson process. We study the properties of the estimators via simulation, assessing the impact of unmodelled detection clustering.

**3.** We show that bias may be induced in the estimator of abundance if clustering in individual detections is not accounted for and how an estimator with better coverage properties is obtained if clustering is modelled. We demonstrate that both abundance and the clustering pattern can be well estimated simultaneously, given enough data.

**4.** To illustrate our approach, we fit the models to tiger pugmark detection data from transect surveys in Kerinci Seblat National Park in Sumatra. The analysis suggested strong abundance-induced heterogeneity in detections when clustering was disregarded, but the evidence reduced drastically when clustering was accounted for. This example illustrates how unmodelled clustering can affect the estimation of abundance.

**5.** Estimates of abundance need to be reliable to ensure that conservation and management interventions are not misguided. Provided certain model assumptions are met, abundance can be estimated from detection data of unmarked individuals. This requires an adequate description of the detection process, or otherwise, bias may be induced in the abundance estimator. The models and discussion provided here deal with the issue of clustering within the detections of individuals and are of relevance for ecologists interested in methodological developments for the estimation of animal abundance.

### Introduction

Drawing inferences about the state of wildlife populations is of central interest for ecology, wildlife management and conservation, as it allows the evaluation of scientific hypotheses concerning the behaviour of the system, the assessment of whether management objectives are met and state-dependent decisions to be made (Yoccoz, Nichols, & Boulinier 2001). The modelling framework proposed independently by MacKenzie *et al.* (2002) and Tyre *et al.* (2003) for the analysis of species occupancy data has attracted wide attention from ecologists and conservationists over the past decade and has been applied in many studies as can be seen from recent published literature (490 and 142 citations, respectively, in *Web of Science* by 12th August 2011). This approach provides an important step forward, as it involves the joint modelling of species occupancy and imperfect detection, an issue traditionally disregarded in the context of species distribution modelling (Kéry 2011) and which, if not accounted for, can introduce bias in the occupancy estimator. The framework is based on a discrete sampling protocol in which replicate detection/non-detection samples (also called ‘presence/absence’ data) are obtained at a number of sampling sites. The sampling is at the species level, without the identification of individuals. The replication provides information to model the detection process and is often achieved by visiting each sampling site at different times within the sampling season (e.g. MacKenzie *et al.* 2002). Other methods used to obtain the required replication include the simultaneous use of multiple independent observers or detection methods, or selecting a number of spatial subunits within each sampling site and treating these as separate replicates (MacKenzie *et al.* 2006, 161–162).

Since the publication of the basic occupancy model, several developments have been proposed, two of which are of particular relevance for the work presented in this paper. First, Royle & Nichols (2003) propose an approach for estimating species occupancy from detection/non-detection data when there is heterogeneity in detection probability owing to differences in site abundance. By linking heterogeneity in abundance to heterogeneity in detection probability, the proposed model also allows the estimation of the underlying abundance distribution. Second, Hines *et al.* (2010) propose an extension to relax the standard assumption of independence among replicates and allow first-order Markovian dependence between the detection/non-detection samples from consecutive replicates. Dependence can be induced by the way in which replicates are chosen, as replicates that are close in time or space will tend to be correlated. Both heterogeneity in detection probability and the lack of independence among replicates can induce bias in the occupancy estimator, and these two model extensions are useful tools for dealing with each of these issues separately. However, to date, there has been no model available to model species-detection/non-detection data accounting simultaneously for both dependence and abundance-induced heterogeneity in the detection process.

In addition, in some cases, rather than using a discrete replicate sampling protocol, detection data are collected continuously along a transect or during an interval of time (e.g. camera-trap surveys). Traditionally, continuous data have been analysed by discretizing the transect (or time interval) into shorter segments, assigning a ‘1’ to each segment when there was at least one detection in the segment and a ‘0’ otherwise, and then using an appropriate model from among those developed for discrete sampling protocols. For instance, Hines *et al.* (2010) model tiger pugmark (footprint) detections collected along transects in India by discretizing the transects into 1-km segments. As an alternative for the analysis of such data, Guillera-Arroita *et al.* (2011) propose an occupancy-modelling framework that describes detection as a continuous process. They present a model that assumes independence among detections as well as a model generalization that accounts for one-dimensional clustering in detections. Various mechanisms can give clustering in species detections. For instance, in sign surveys along transects, clustering may be due to individuals intermittently following the paths used as transects or due to patches of different substrate conditions. In camera-trap surveys, clustering can be expected if the movement patterns of individuals are such that they remain for a while in the area around the camera before moving to other parts of their territory.

To deal with the problem of accounting for both spatial clustering and abundance-induced heterogeneity in detection data collected along transects (i.e. continuous protocols), Hines *et al.* (2010) suggest a 2-step *ad hoc* approach: to use their discrete clustering model to explore how the dependence among adjacent replicates decreases as data are collapsed using larger segments and then, using the data from the chosen segment length, to carry out the actual analysis with the standard Royle–Nichols model. Here, we propose an alternative solution to this problem based on a description of the detection process that allows accounting for both aspects simultaneously. To do this, we extend the models proposed by Guillera-Arroita *et al.* (2011) to incorporate abundance-induced heterogeneity in the detection process. We first present a model that assumes independence among detections from each individual; this can be seen as the continuous counterpart of the model proposed by Royle & Nichols (2003), although in practice it is closer to the *N*-mixture model for repeated counts of Royle (2004), as discussed later. We then present a model generalization that relaxes the assumption of independence within the detections of each individual, which we describe using so-called Markov-modulated Poisson processes (MMPPs). This model is, as far as we are aware, the first that allows species-detection data from continuous sampling protocols to be analysed by explicitly accounting for both clustering and abundance-induced heterogeneity in the detection process. By providing a description of the detection process that explicitly incorporates both aspects, the model allows not only the estimation of abundance but also the parameters associated with the clustering pattern. Finally, we explore the properties of the models proposed via simulations and assess the impact of unmodelled clustering in the estimation of abundance. 
We also illustrate the approach by analysing a data set of Sumatran tigers *Panthera tigris sumatrae* from Kerinci Seblat National Park in Sumatra, Indonesia.

### Methods

#### Models: formulation and assumptions

Consider a study in which species-detection surveys are carried out along one or more transects at a number of sampling sites (e.g. forest patches or quadrats), recording the location of each detection. The detection data do not allow individuals to be uniquely identified. The models proposed here assume that the species-detection process at each site results from the superposition of *n*_{i} identical independent point processes, where *n*_{i} is the number of individuals present at site *i*. Each point process describes the detections corresponding to one individual, and the assumption is that all individuals present at the site are detectable from the transect. As *n*_{i} is unknown, it is modelled as a random variable with some probability distribution and inference is made on the marginal likelihood, which is a discrete mixture over all the possible values of *n*_{i}.

##### Basic model: independent detections

Let us first consider the case where the successive detections of each individual along the transect can be considered independent of one another and so can be modelled as a homogeneous Poisson process with rate γ, where γ is the average number of detections per unit length. The detection process for the species at site *i*, modelled as the superposition of *n*_{i} identical Poisson processes, results in a Poisson process with rate γ*n*_{i}. Under this model, the detection data can be summarized by the number of detections at each site, *d*_{i} at site *i*, and the likelihood is

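$$\mathcal{L}(\gamma, \boldsymbol{\theta}) = \prod_{i=1}^{S} \sum_{n_i} (\gamma n_i)^{d_i} \exp(-\gamma n_i L_i)\, \Pr(n_i) \qquad \text{(eqn 1)}$$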

where *L*_{i} is the total length surveyed in site *i*, *S* is the total number of sampling sites and Pr(*n*_{i}) denotes the probability distribution, with parameter vector **θ**, describing site abundance. Species occupancy ψ is therefore 1 − Pr(0). Note that the likelihood in eqn 1 lacks the factors that would be associated with the Poisson probability function. These factors do not involve the parameters of interest and therefore do not affect the maximum-likelihood estimates. Equation 1 arises by looking at the detection data in each site as a series of exponentially distributed inter-detection distances rather than as a Poisson count. This is necessary to make sure that the likelihood function is comparable to that corresponding to the more general model based on MMPP described later and so to be able to compare the fit of the models.
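As a minimal sketch of how this mixture likelihood could be evaluated, the function below computes the log-likelihood for a nonparametric abundance distribution, taking each site's contribution to be the sum over *n* of Pr(*n*)(γ*n*)^{d} exp(−γ*nL*), with an empty site able to produce only zero detections. The function and argument names are illustrative assumptions, not taken from the authors' implementation.

```python
import math

def basic_log_likelihood(gamma, abundance_probs, detections, lengths):
    """Log-likelihood of the basic model (independent detections).

    abundance_probs[n] is Pr(n individuals at a site) for n = 0, 1, 2, ...
    detections[i] is d_i and lengths[i] is L_i for site i.
    """
    total = 0.0
    for d_i, length_i in zip(detections, lengths):
        site_lik = 0.0
        for n, pr_n in enumerate(abundance_probs):
            if n == 0:
                # An empty site is only compatible with zero detections.
                kernel = 1.0 if d_i == 0 else 0.0
            else:
                rate = gamma * n  # superposition of n Poisson processes
                kernel = rate ** d_i * math.exp(-rate * length_i)
            site_lik += pr_n * kernel
        # site_lik is zero (log undefined) only if the site has detections
        # and all probability mass sits on n = 0.
        total += math.log(site_lik)
    return total

# Hypothetical data: three sites surveyed for 30 length units each.
theta = [0.05, 0.5, 0.3, 0.15]
print(basic_log_likelihood(0.25, theta, [0, 7, 12], [30.0, 30.0, 30.0]))
```

Maximum-likelihood estimates would then be obtained by passing the negative of this function to a numerical optimizer, with γ log-transformed and the probabilities reparameterized so that they sum to one.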

Site abundance can be described by parametric or nonparametric distributions. In the nonparametric approach, the number of parameters required increases with the number of support points in the abundance distribution, and so this method may become impractical when working with certain species; however, this approach provides flexibility and can be useful for instance when working with low-abundance species. Under the parametric approach, the natural first candidate is the Poisson distribution, which provides an appropriate description when the spatial distribution of individuals is completely random. In this case, the likelihood for the model is

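$$\mathcal{L}(\gamma, \delta) = \prod_{i=1}^{S} \sum_{n_i=0}^{\infty} (\gamma n_i)^{d_i} \exp(-\gamma n_i L_i)\, \frac{e^{-\delta}\, \delta^{n_i}}{n_i!} \qquad \text{(eqn 2)}$$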

where δ is the average site abundance for the species. Here, the resulting distribution describing *d*_{i} is a Poisson mixture of Poisson distributions, which gives rise to a Neyman type A distribution with parameters (γ*L*_{i}, δ) (Johnson, Kemp, & Kotz 2005, 403–410). The likelihood function in eqn 2 involves an infinite summation. In practice, in the numerical evaluation of the likelihood, the infinite summation would be truncated to the support points with non-negligible probability. An alternative form for the likelihood function in eqn 2, based on one of the two standard expressions for the probability mass function of the Neyman type A distribution, is

$$\mathcal{L}(\gamma, \delta) = \prod_{i=1}^{S} \gamma^{d_i} \exp\{-\delta\,(1 - e^{-\gamma L_i})\} \sum_{k=0}^{d_i} S(d_i, k) \left(\delta\, e^{-\gamma L_i}\right)^{k}$$

where *S* (*d*_{i}, *k*) are Stirling numbers of the second kind (Graham, Knuth, & Patashnik 1988, 243). As this expression involves only finite summations, it is better computationally than eqn 2 for low values of *d*_{i} and high δ. General design recommendations can be derived for this model looking at the asymptotic properties of its estimators. In Appendix S1, we present such recommendations and explore the performance of the estimators under small sample size. Apart from the Poisson distribution, other models for species site abundance are possible. For instance, to allow for overdispersion, the negative binomial (e.g. Royle 2004) or the Poisson log-normal (e.g. Kéry *et al.* 2009) could be used. Underdispersion could be dealt with using a weighted Poisson distribution (see e.g. Ridout & Besbeas 2004).
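The equivalence between the truncated infinite sum and the finite Stirling-number form can be checked numerically. The sketch below assumes the per-site kernel (γ*n*)^{d} exp(−γ*nL*) with Poisson(δ) abundance, and uses the identity Σ_{n} *n*^{d} μ^{n}/*n*! = e^{μ} Σ_{k} *S*(*d*, *k*) μ^{k} with μ = δ exp(−γ*L*). All function names and the truncation point `n_max` are assumptions made for illustration.

```python
import math

def stirling2_row(d):
    """Return [S(d, 0), ..., S(d, d)], Stirling numbers of the second kind."""
    row = [1]  # S(0, 0) = 1
    for n in range(1, d + 1):
        new = [0] * (n + 1)
        for k in range(1, n + 1):
            above = row[k] if k < n else 0
            new[k] = k * above + row[k - 1]  # S(n,k) = k S(n-1,k) + S(n-1,k-1)
        row = new
    return row

def site_lik_truncated(d, length, gamma, delta, n_max=200):
    """Site contribution as an infinite sum truncated at n_max individuals."""
    total = 0.0
    for n in range(n_max + 1):
        log_pois = -delta + n * math.log(delta) - math.lgamma(n + 1)
        if n == 0:
            kernel = 1.0 if d == 0 else 0.0
        else:
            kernel = (gamma * n) ** d * math.exp(-gamma * n * length)
        total += kernel * math.exp(log_pois)
    return total

def site_lik_stirling(d, length, gamma, delta):
    """Same site contribution using a finite sum over Stirling numbers."""
    mu = delta * math.exp(-gamma * length)
    finite_sum = sum(s * mu ** k for k, s in enumerate(stirling2_row(d)))
    return gamma ** d * math.exp(-delta + mu) * finite_sum

print(site_lik_truncated(5, 30.0, 0.25, 1.5))
print(site_lik_stirling(5, 30.0, 0.25, 1.5))
```

The Stirling numbers grow quickly with *d*, so this form is attractive mainly when the number of detections per site is small, in line with the remark in the text.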

The basic model proposed here can be interpreted as the continuous counterpart of the Royle–Nichols model (Royle & Nichols 2003), which relies on a discrete sampling protocol that records the detection/non-detection of the species in a series of replicates at each site (records consist of 0s and 1s) and describes them as independent Bernoulli trials. However, our model is closer to the *N*-mixture model for repeated counts (Royle 2004) with the difference that the repeated counts model, based on discrete replicates, describes the number of detections in each replicate (the counts) as a binomial distribution. This implies that an already-detected individual cannot be detected again in the same replicate. Our model arises if the counts are instead described using a Poisson distribution, where one can detect in the same replicate an individual that has already been detected.

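Note also that if the mixing distribution in eqn 1 is a nonparametric distribution with mass only at *n*_{i} = 0 and *n*_{i} = *m*, the model is equivalent to the occupancy model with Poisson detection process presented by Guillera-Arroita *et al.* (2011), with likelihood (writing *I*(·) for the indicator function)

$$\mathcal{L}(\lambda, \psi) = \prod_{i=1}^{S} \left[ \psi\, \lambda^{d_i} \exp(-\lambda L_i) + (1 - \psi)\, I(d_i = 0) \right]$$

where γ*m* = λ and the abundance distribution is Pr(0) = 1 − ψ and Pr(*m*) = ψ.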

##### Model generalization: clustered detections

Suppose now that the detections from each individual exhibit some degree of clustering and therefore cannot be considered independent (e.g. individuals tend at times to use the route followed by the transect). One possible way to account for this is to describe the detection process for each individual as a two-state Markov-modulated Poisson process (2-MMPP), a type of Cox process (Cox & Isham 1980, 70) in which the intensity is governed by an unobserved two-state Markov process. See the study of Fischer & Meier-Hellstern (1992) for an accessible summary of useful results on MMPPs. The 2-MMPP provides a model for one-dimensional clustering (along a line or in time) and has been proposed previously to model species detections along transects (e.g. Skaug 2006; Guillera-Arroita *et al.* 2011). According to the 2-MMPP, detections take place at two different rates, γ_{1} and γ_{2}, and the interval spent in each state is stochastic and controlled by parameters μ_{12} and μ_{21}, the switching intensities between states. Detection clusters would therefore correspond to the detections in the high-detection-rate state.

Let us assume that the detection process for individuals of the species of interest is a 2-MMPP with parameters μ = [μ_{12}, μ_{21}] and γ = [γ_{1}, γ_{2}]. Assuming that detections among individuals are independent and the detection process has the same characteristics for all individuals, the resulting detection process for the species at site *i* can be modelled as the superposition of *n*_{i} independent identical 2-MMPPs. For this model, the detection data can no longer be summarized by the number of detections at each site, and the distances between consecutive detections are required.  The likelihood is given by
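To make the data-generating mechanism concrete, the following sketch simulates detection locations from a 2-MMPP for one individual and pools several individuals into a single unmarked record for a site. It is an illustration under stated assumptions: the starting state is drawn from the stationary distribution of the two-state chain, and the parameter values in the usage example are arbitrary.

```python
import random

def simulate_2mmpp(length, gammas, mus, rng):
    """Detection locations of one individual on [0, length].

    gammas = (gamma_1, gamma_2): detection rate in each state.
    mus = (mu_12, mu_21): switching intensity out of state 1 and out of state 2.
    """
    mu12, mu21 = mus
    p_state1 = mu21 / (mu12 + mu21)  # stationary share of the line in state 1
    state = 0 if rng.random() < p_state1 else 1
    position, detections = 0.0, []
    while True:
        total_rate = gammas[state] + mus[state]
        # Distance to the next event (a detection or a switch of state).
        position += rng.expovariate(total_rate)
        if position >= length:
            return detections
        if rng.random() < gammas[state] / total_rate:
            detections.append(position)
        else:
            state = 1 - state

def simulate_site(n_individuals, length, gammas, mus, rng):
    """Superpose independent 2-MMPPs; individual identities are discarded."""
    pooled = []
    for _ in range(n_individuals):
        pooled.extend(simulate_2mmpp(length, gammas, mus, rng))
    return sorted(pooled)

rng = random.Random(2011)
print(simulate_site(2, 30.0, (0.5, 0.125), (1 / 6, 1 / 12), rng))
```

Only the sorted locations are returned, which mirrors the survey situation in which detections cannot be attributed to individuals.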

$$\mathcal{L}(\boldsymbol{\mu}, \boldsymbol{\gamma}, \boldsymbol{\theta}) = \prod_{i=1}^{S} \sum_{n_i} \Pr(n_i) \prod_{j=1}^{R_i} M_{ij|n_i} \qquad \text{(eqn 3)}$$

where *R*_{i} is the number of independent transects surveyed in site *i* and *M*_{ij|n_i} is the likelihood contribution of transect *j* in site *i*, given that detections along it are described by the superposition of *n*_{i} independent realizations of a 2-MMPP. To construct *M*_{ij|n_i}, we make use of a key result on MMPPs: the superposition of MMPPs is an MMPP and, in particular, the superposition of *n*_{i} independent identical 2-MMPPs is an (*n*_{i} + 1)-MMPP. Details on the construction of *M*_{ij|n_i} can be found in Appendix S2.
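The superposition result can be sketched as follows. The code builds the (*n* + 1)-state MMPP obtained by tracking how many of the *n* individuals are in state 1, and evaluates a transect likelihood with the standard MMPP expression π [∏ exp((Q − Λ)*x*_{k}) Λ] exp((Q − Λ)*x*_{last}) **1**. This is an assumption about the general form only; the authors' actual construction is given in their Appendix S2 and may differ in detail (for instance in the initial distribution, taken here to be stationary). The small matrix-exponential helper is a plain Taylor series with scaling and squaring, written to keep the example dependency-free.

```python
import math

def superposed_mmpp(n, gammas, mus):
    """Generator, detection rates and stationary start of n identical 2-MMPPs.

    State k (k = 0..n) is the number of individuals currently in state 1.
    """
    (g1, g2), (mu12, mu21) = gammas, mus
    size = n + 1
    Q = [[0.0] * size for _ in range(size)]
    rates = [k * g1 + (n - k) * g2 for k in range(size)]
    for k in range(size):
        if k > 0:
            Q[k][k - 1] = k * mu12        # one individual leaves state 1
        if k < n:
            Q[k][k + 1] = (n - k) * mu21  # one individual enters state 1
        Q[k][k] = -sum(Q[k])
    p1 = mu21 / (mu12 + mu21)
    start = [math.comb(n, k) * p1 ** k * (1 - p1) ** (n - k) for k in range(size)]
    return Q, rates, start

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def mat_exp(A, terms=25):
    """Matrix exponential: Taylor series with scaling and squaring."""
    size = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    squarings = max(0, math.ceil(math.log2(norm))) + 1 if norm > 0 else 0
    scaled = [[x / 2 ** squarings for x in row] for row in A]
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for m in range(1, terms):
        term = [[x / m for x in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def transect_likelihood(positions, length, n, gammas, mus):
    """Likelihood of detections at sorted positions on [0, length], given n."""
    Q, rates, vec = superposed_mmpp(n, gammas, mus)
    size = n + 1
    # Q - Lambda governs state changes over stretches without detections.
    sub = [[Q[r][c] - (rates[r] if r == c else 0.0) for c in range(size)]
           for r in range(size)]
    previous = 0.0
    for pos in positions:
        step = mat_exp([[x * (pos - previous) for x in row] for row in sub])
        vec = mat_mul([vec], step)[0]
        vec = [v * rate for v, rate in zip(vec, rates)]  # a detection occurs
        previous = pos
    step = mat_exp([[x * (length - previous) for x in row] for row in sub])
    return sum(mat_mul([vec], step)[0])

print(transect_likelihood([2.0, 5.5, 20.0], 30.0, 3, (0.5, 0.125), (1 / 6, 1 / 12)))
```

When γ_{1} = γ_{2} = γ the expression collapses to (*n*γ)^{d} exp(−*n*γ*L*), the kernel of the basic model, which is a convenient check on any implementation.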

Above, we have assumed independence in the clustering patterns between individuals (Fig. 1a). However, it is also interesting to consider the case when the 2-MMPPs describing individuals’ detections are ‘synchronized’, such that at a given point, all the individuals are in the same detection-rate state (high or low) but, within this structure, detections are still independent (Fig. 1b). This can be a useful model for scenarios in which the clustering in individuals’ detections arises because of the difference in substrate conditions (e.g. some patches better than others for capturing pugmarks) or if the individuals in the site move closely together as a group. The likelihood for this model is the one given in eqn 3, now with the superposition of *n*_{i} aligned identical 2-MMPPs with parameters μ and γ corresponding to a 2-MMPP with parameters μ and *n*_{i}γ, and constructed accordingly.

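Setting the mixing distribution in eqn 3 to a nonparametric distribution with mass at *n* = 0 and *n* = 1, the model is equivalent to the occupancy model with 2-MMPP detection process proposed by Guillera-Arroita *et al.* (2011), with likelihood (writing *I*(·) for the indicator function and *d*_{i} for the total number of detections at site *i*)

$$\mathcal{L}(\boldsymbol{\mu}, \boldsymbol{\gamma}, \psi) = \prod_{i=1}^{S} \left[ \psi \prod_{j=1}^{R_i} M_{ij} + (1 - \psi)\, I(d_i = 0) \right]$$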

where Pr(0) = 1 − ψ, Pr(1) = ψ and *M*_{ij} = *M*_{ij|1}. Note also that a discrete counterpart of the model could be formulated by describing the detection process as a superposition of Markov-modulated Bernoulli processes (2-MMBPs).

##### Incorporating covariates

The models can be expanded to incorporate covariates following a generalized linear model approach. Under the parametric approach, the abundance distribution can be readily allowed to depend upon site characteristics, for instance via a log link function on the density parameter when a Poisson distribution is used. Covariates can be incorporated in the same manner to allow the detection rate to vary with respect to site characteristics, while within-site variation in the detection rates can also be accommodated (see Guillera-Arroita *et al.* 2011).

#### Simulation study

We explored model performance via simulations. For this, we assumed that true site abundance was distributed so that the probabilities of having 0–3 individuals were 0·05, 0·5, 0·3 and 0·15, respectively, a distribution that is plausible ecologically for our tiger example. Individual detection data were generated according to independent 2-MMPPs. We explored four detection scenarios with equal average individual detection rate and increasing levels of clustering: (nc) γ = 0·25 (no clustering), (c1) γ = [0·5, 0·125], ρ = [6, 12], (c2) γ = [1, 0·0625], ρ = [3, 12] and (c3) γ = [5, 0·0125], ρ = [0·6, 12], where ρ = [1/μ_{12}, 1/μ_{21}] represents the average lengths spent in each state before switching to the other state. In our study, we programmed the models in MATLAB and obtained maximum-likelihood estimates by numerical maximization using the function *fminsearch*, which implements a simplex search algorithm. As this is an unconstrained optimization function, parameters were log- or logit-transformed as appropriate. To ensure that the estimated probabilities of the nonparametric abundance distribution summed to unity, we used a multinomial logit transformation

$$\theta_r = \frac{\exp(\varphi_r)}{1 + \sum_{k=1}^{R} \exp(\varphi_k)}, \quad r = 1, \ldots, R, \qquad \theta_{R+1} = 1 - \sum_{r=1}^{R} \theta_r$$

where *R* + 1 is the number of support points in the nonparametric distribution, θ are the corresponding probabilities and φ are the *R* unconstrained parameters used in the optimization process. As a measure of estimator quality, we computed the mean and root mean square error (RMSE) of the maximum-likelihood estimates obtained over all the simulated data sets. We also assessed estimator coverage by counting the number of simulations in which the estimated confidence intervals included the true parameter value.
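A multinomial logit transformation of this kind can be sketched in a few lines. The version below maps *R* unconstrained values to *R* + 1 probabilities; which category serves as the reference (here the first one returned) is an assumption of this sketch, as is the function name. Subtracting a common shift before exponentiating does not change the result and guards against overflow during optimization.

```python
import math

def multinomial_logit(phi):
    """Map R unconstrained reals to R + 1 probabilities that sum to one.

    The first returned probability belongs to the reference category,
    whose implicit unconstrained value is fixed at zero.
    """
    shift = max(0.0, max(phi)) if phi else 0.0
    weights = [math.exp(-shift)] + [math.exp(p - shift) for p in phi]
    total = sum(weights)
    return [w / total for w in weights]

# Three free parameters give a four-category abundance distribution.
print(multinomial_logit([2.3, 1.8, 1.1]))
```

An unconstrained optimizer can then search freely over the φ values while the implied probabilities always remain valid.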

First, we explored the impact that clustering in the detections of individuals has on the estimators of abundance in the basic model that assumes independent detections. We simulated a study design in which 100 sites were surveyed for 30 length units and ran 100 simulations per scenario. We fitted the data assuming first a nonparametric abundance distribution with mass in categories 0–3, as used for data generation, and then assuming a distribution with mass in 0–5 to explore the impact of allowing for this extra flexibility. We used the Akaike Information Criterion (AIC) to compare the fit of the models based on each of these two abundance distributions.
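That the clustered scenarios share the same average individual detection rate as the no-clustering scenario can be verified directly: in the long run an individual spends a fraction ρ_{1}/(ρ_{1} + ρ_{2}) of the transect in state 1, so the average rate is the corresponding weighted mean of γ_{1} and γ_{2}. The helper name below is an illustrative assumption.

```python
def average_rate(gammas, rhos):
    """Long-run detection rate of a 2-MMPP with mean state lengths rhos."""
    share_state1 = rhos[0] / (rhos[0] + rhos[1])
    return share_state1 * gammas[0] + (1.0 - share_state1) * gammas[1]

scenarios = {
    "c1": ((0.5, 0.125), (6.0, 12.0)),
    "c2": ((1.0, 0.0625), (3.0, 12.0)),
    "c3": ((5.0, 0.0125), (0.6, 12.0)),
}
for name, (gammas, rhos) in scenarios.items():
    print(name, round(average_rate(gammas, rhos), 6))
```

All three clustered scenarios evaluate to 0·25, the rate used in scenario (nc), so differences in estimator behaviour across scenarios reflect the clustering pattern rather than the overall amount of detection.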

We then explored the performance of the model generalization for independent clustering in the detection of individuals. Data were fitted assuming a nonparametric abundance distribution with mass in categories 0–3. We considered first a design with *S* = 100, *L* = 30 and then increased the sampling effort by either adding more sites, increasing survey length or both. As fitting this model is more demanding in terms of computational time, we ran only 20 simulations for each scenario. In our implementation, the duration of each simulation varied from <1 h to several hours. Data were also analysed using the basic model for comparison.

#### Application example: tiger data

As an illustration, we fitted the proposed models to Sumatran tiger data from Kerinci Seblat National Park in Sumatra (Indonesia). The data were collected by Fauna and Flora International/Durrell Institute of Conservation and Ecology in a survey in which a total of 89 sites of 17 × 17 km were surveyed, recording the location of tiger pugmark detections along transects. Typically, 15–45 km were surveyed per site, with transect lengths varying from 0·5 to 40 km (10 km on average). These data were analysed in the study of Guillera-Arroita *et al.* (2011), assuming no abundance-induced heterogeneity in the detection process. Here, we reconsidered the analysis, relaxing this assumption. We analysed the data using the new model structures proposed and used AIC to assess relative model fit. First, we used the model structure that assumes independence between detections, i.e. the detection process of individuals is described as a Poisson process. Second, we analysed the data with the model structure that incorporates independent clustering in the detection process of individuals. Finally, we fitted the model that allows for clustering in individuals’ detections but assumes that this clustering is ‘synchronized’. For the abundance process, we considered nonparametric distributions with support points from 0 to a maximum of 5 individuals. As tigers are territorial, one might expect underdispersion in the abundance distribution with respect to a Poisson distribution. Using a nonparametric distribution provides the flexibility to account for this. The scale of the survey (i.e. sampling site size) was chosen based on the size of a male tiger territory, so it is reasonable to expect a low number of individuals per site and therefore a density of more than five individuals per site to be highly unlikely.

### Results

#### Model performance

##### Impact of unmodelled detection clustering in the abundance estimators

The simulation results showed that the abundance distribution can be estimated relatively well with moderate sampling effort when, as the model assumes, there is no clustering in the detections (Table 1). However, as clustering increased, the estimators became biased. For instance, the means for the abundance distribution estimators were [0·10, 0·51, 0·14, 0·25] in scenario (c3). The number of empty sites was overestimated, influenced by the long stretches without detections. The highest abundance category was also overestimated, which suggests that detection clusters may be interpreted by the model as a result of many individuals being present at the site. In fact, a model based on an abundance distribution with mass in the range 0–5 tended to provide a better fit for the clustered data. For instance, in all 100 simulations for scenario (c3), this model was selected as a better explanation for the data based on its AIC. However, the model did not provide a satisfactory estimation of abundance. The distribution probabilities were biased, and the overall abundance was overestimated. The probability mass for the highest abundance category increased with the level of clustering.

**Table 1.** Mean and RMSE (in brackets) of the estimators. Results were obtained from 100 simulations of a design with 100 sites surveyed for 30 units of length, assuming a true site abundance distribution with probabilities θ = [0·05, 0·5, 0·3, 0·15] for 0–3 individuals (average abundance *N* = 1·55). Four detection scenarios with equal average individual detection rate and increasing levels of clustering were tested: (nc) γ = 0·25 (no clustering), (c1) γ = [0·5, 0·125], ρ = [6, 12], (c2) γ = [1, 0·0625], ρ = [3, 12] and (c3) γ = [5, 0·0125], ρ = [0·6, 12]. The upper part of the table shows the results of fitting a model based on a nonparametric abundance distribution with four abundance categories. nAIC indicates the number of simulations in which this model produced an AIC smaller than that from a model with six abundance categories. The lower part of the table shows the results of the six-category model for the simulations in which it was selected as best model. RMSE, root mean square error.

| Scenario | θ_{0} | θ_{1} | θ_{2} | θ_{3} | θ_{4} | θ_{5} | *N* | nAIC |
|---|---|---|---|---|---|---|---|---|
| *Four abundance categories* | | | | | | | | |
| (nc) | 0·05 (0·021) | 0·50 (0·082) | 0·29 (0·078) | 0·16 (0·084) | – | – | 1·56 (0·154) | 93 |
| (c1) | 0·05 (0·022) | 0·48 (0·082) | 0·25 (0·091) | 0·22 (0·092) | – | – | 1·63 (0·148) | 25 |
| (c2) | 0·06 (0·026) | 0·51 (0·069) | 0·18 (0·135) | 0·24 (0·113) | – | – | 1·60 (0·134) | 1 |
| (c3) | 0·10 (0·059) | 0·51 (0·059) | 0·14 (0·172) | 0·25 (0·120) | – | – | 1·54 (0·106) | 0 |
| *Six abundance categories* | | | | | | | | |
| (nc) | 0·06 (0·023) | 0·31 (0·269) | 0·40 (0·150) | 0·12 (0·111) | 0·08 (0·128) | 0·03 (0·059) | 1·95 (0·546) | |
| (c1) | 0·05 (0·022) | 0·28 (0·238) | 0·29 (0·104) | 0·22 (0·142) | 0·05 (0·099) | 0·11 (0·130) | 2·26 (0·774) | |
| (c2) | 0·06 (0·026) | 0·36 (0·161) | 0·21 (0·133) | 0·22 (0·117) | 0·02 (0·047) | 0·13 (0·143) | 2·17 (0·669) | |
| (c3) | 0·10 (0·058) | 0·38 (0·139) | 0·15 (0·173) | 0·20 (0·101) | 0·00 (0·028) | 0·16 (0·168) | 2·11 (0·598) | |
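The average abundance *N* reported for each fitted distribution is simply the mean of that distribution, Σ *n* θ_{n}. A short check, using the true distribution and the mean estimates quoted for scenario (c3) under the four-category model, shows how a clearly distorted distribution can still imply an average abundance close to the true value. The function name is an illustrative assumption.

```python
def mean_abundance(theta):
    """Average number of individuals per site implied by the distribution."""
    return sum(n * p for n, p in enumerate(theta))

true_theta = [0.05, 0.5, 0.3, 0.15]
c3_mean_estimates = [0.10, 0.51, 0.14, 0.25]
print(round(mean_abundance(true_theta), 2))         # true average abundance
print(round(mean_abundance(c3_mean_estimates), 2))  # scenario (c3) estimates
```

The two values are 1·55 and 1·54: the overestimation of the lowest and highest categories largely cancels out in the mean, so a near-unbiased *N* does not by itself indicate that the abundance distribution has been recovered.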

##### Performance of the model generalization for clustering

The results of fitting the model generalization for clustering (Table 2) showed that the model can estimate all the parameters and that the estimators are unbiased and precise given large, yet realistic, data sets. The precision of the abundance estimators was poor for the initial survey design considered. As expected, increasing the amount of sampling effort improved the quality of the estimators. In the scenario with strong clustering (c3), it was more effective to increase the length of the survey rather than the number of sampling sites: doubling the survey length gave better results than a fivefold increase in sampling sites. The initial survey length used was relatively short considering the average interval spent in the low-detection state, in which the rate of detections was very low (0·0125). Individuals were therefore likely to remain undetected at surveyed sites. Increasing the survey length provided data that reflected better the number of individuals present at each surveyed site, and this was more critical than increasing the number of surveyed sites to obtain a better estimation of the abundance distribution.
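The argument about survey length can be illustrated by computing the probability that a single individual leaves no detection at all along a transect of a given length. The sketch below uses the standard result that, for an MMPP started from its stationary distribution π, this probability is π exp((Q − Λ)*L*) **1**, with the 2 × 2 matrix exponential written in closed form from its two real eigenvalues. The stationary start and the function name are assumptions of this illustration, not part of the original analysis.

```python
import math

def prob_no_detection(length, gammas, rhos):
    """Probability that one individual's 2-MMPP yields no detection."""
    g1, g2 = gammas
    mu12, mu21 = 1.0 / rhos[0], 1.0 / rhos[1]
    # Entries of A = Q - Lambda for the two-state process.
    a, b = -(mu12 + g1), mu12
    c, d = mu21, -(mu21 + g2)
    trace, det = a + d, a * d - b * c
    root = math.sqrt(trace * trace - 4.0 * det)  # positive: eigenvalues differ
    lam_hi, lam_lo = (trace + root) / 2.0, (trace - root) / 2.0
    e_hi, e_lo = math.exp(lam_hi * length), math.exp(lam_lo * length)

    def entry(x, on_diagonal):
        # exp(A L) = [e_hi (A - lam_lo I) - e_lo (A - lam_hi I)] / (lam_hi - lam_lo)
        hi = x - lam_lo if on_diagonal else x
        lo = x - lam_hi if on_diagonal else x
        return (e_hi * hi - e_lo * lo) / root

    exp_a = [[entry(a, True), entry(b, False)],
             [entry(c, False), entry(d, True)]]
    pi1 = rhos[0] / (rhos[0] + rhos[1])
    return pi1 * sum(exp_a[0]) + (1.0 - pi1) * sum(exp_a[1])

for survey_length in (30.0, 60.0):
    print(survey_length, prob_no_detection(survey_length, (5.0, 0.0125), (0.6, 12.0)))
```

Comparing the two printed values for scenario (c3) with exp(−0·25 × 30), the corresponding probability without clustering, indicates how much more often an individual goes unrecorded when detections are strongly clustered, and how much of that is recovered by doubling the survey length.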

| | S = 100, L = 30 | S = 100, L = 60 | S = 200, L = 30 | S = 500, L = 30 | S = 500, L = 60 |
|---|---|---|---|---|---|
| (c1) | | | | | |
| | 0·05 (0·017) | 0·05 (0·021) | 0·05 (0·013) | 0·05 (0·010) | 0·05 (0·009) |
| | 0·44 (0·169) | 0·51 (0·063) | 0·54 (0·089) | 0·52 (0·045) | 0·50 (0·038) |
| | 0·30 (0·155) | 0·29 (0·097) | 0·25 (0·081) | 0·30 (0·051) | 0·30 (0·037) |
| | 0·21 (0·162) | 0·15 (0·092) | 0·16 (0·080) | 0·13 (0·061) | 0·15 (0·031) |
| | 0·56 (0·152) | 0·51 (0·077) | 0·52 (0·127) | 0·52 (0·049) | 0·50 (0·023) |
| | 0·11 (0·052) | 0·12 (0·027) | 0·13 (0·033) | 0·13 (0·015) | 0·12 (0·010) |
| | 5·50 (2·5) | 6·75 (2·1) | 6·03 (2·0) | 6·88 (1·5) | 6·28 (1·0) |
| | 15·3 (11·1) | 14·6 (5·9) | 12·5 (6·0) | 14·5 (3·7) | 12·2 (1·7) |
| | 1·67 (0·290) | 1·54 (0·152) | 1·52 (0·152) | 1·51 (0·095) | 1·55 (0·064) |
| (c2) | | | | | |
| | 0·05 (0·027) | 0·05 (0·024) | 0·06 (0·019) | 0·05 (0·011) | 0·05 (0·011) |
| | 0·50 (0·201) | 0·53 (0·091) | 0·51 (0·104) | 0·46 (0·083) | 0·50 (0·042) |
| | 0·29 (0·147) | 0·26 (0·103) | 0·26 (0·078) | 0·27 (0·069) | 0·30 (0·038) |
| | 0·15 (0·162) | 0·15 (0·087) | 0·16 (0·132) | 0·21 (0·102) | 0·15 (0·032) |
| | 1·00 (0·081) | 1·00 (0·066) | 0·99 (0·043) | 0·97 (0·040) | 1·00 (0·024) |
| | 0·06 (0·018) | 0·06 (0·010) | 0·06 (0·009) | 0·06 (0·009) | 0·06 (0·005) |
| | 3·15 (0·648) | 2·97 (0·280) | 3·08 (0·343) | 2·94 (0·178) | 3·00 (0·130) |
| | 12·2 (3·24) | 11·5 (1·51) | 12·2 (2·46) | 12·2 (1·47) | 12·0 (0·76) |
| | 1·55 (0·329) | 1·50 (0·141) | 1·54 (0·220) | 1·65 (0·179) | 1·54 (0·066) |
| (c3) | | | | | |
| | 0·05 (0·034) | 0·05 (0·024) | 0·05 (0·036) | 0·05 (0·021) | 0·05 (0·009) |
| | 0·55 (0·215) | 0·54 (0·118) | 0·53 (0·167) | 0·52 (0·122) | 0·52 (0·047) |
| | 0·17 (0·220) | 0·25 (0·117) | 0·23 (0·174) | 0·22 (0·128) | 0·27 (0·055) |
| | 0·23 (0·186) | 0·16 (0·114) | 0·19 (0·137) | 0·21 (0·136) | 0·16 (0·051) |
| | 4·98 (0·277) | 5·02 (0·146) | 4·97 (0·172) | 5·02 (0·114) | 5·00 (0·073) |
| | 0·013 (0·005) | 0·013 (0·003) | 0·012 (0·003) | 0·012 (0·002) | 0·013 (0·001) |
| | 0·59 (0·046) | 0·61 (0·043) | 0·61 (0·034) | 0·60 (0·022) | 0·61 (0·014) |
| | 12·2 (2·97) | 11·8 (1·28) | 11·9 (1·67) | 12·6 (2·04) | 12·1 (0·60) |
| | 1·58 (0·354) | 1·53 (0·215) | 1·56 (0·274) | 1·60 (0·251) | 1·55 (0·081) |

RMSE, root mean square error.
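
The survey-length effect described above can be illustrated with a minimal sketch. For a single individual whose detections follow a two-state MMPP started from its stationary distribution, the probability of no detections over a length L is π exp((Q − Λ)L) 1, where Q is the generator of the hidden state process and Λ is the diagonal matrix of detection rates (a standard MMPP result). The function names and the parameter values below are hypothetical; only the low-state detection rate of 0·0125 is taken from the text, and the code is not the authors' implementation.

```python
import math

def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(m, terms=20):
    """Exponential of a 2x2 matrix: Taylor series with scaling and squaring."""
    norm = max(abs(m[i][0]) + abs(m[i][1]) for i in range(2))
    squarings = max(0, math.ceil(math.log2(norm)) + 1) if norm > 0 else 0
    scale = 2.0 ** squarings
    a = [[m[i][j] / scale for j in range(2)] for i in range(2)]
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms + 1):
        term = [[x / n for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def prob_no_detection(rate_1, rate_2, switch_12, switch_21, length):
    """P(no detections of one individual over `length`), stationary start.

    rate_1, rate_2: detection rates in states 1 and 2.
    switch_12, switch_21: switching rates 1 -> 2 and 2 -> 1 (both > 0).
    """
    total = switch_12 + switch_21
    stationary = [switch_21 / total, switch_12 / total]
    # (Q - Lambda) * length, with Q the generator of the hidden state process
    c = [[-(switch_12 + rate_1) * length, switch_12 * length],
         [switch_21 * length, -(switch_21 + rate_2) * length]]
    e = mat_exp(c)
    return sum(stationary[i] * e[i][j] for i in range(2) for j in range(2))

# Hypothetical strong-clustering scenario: high rate 5, low rate 0.0125,
# long stays in the low-detection state (assumed switching rates).
for length in (30.0, 60.0):
    p0 = prob_no_detection(5.0, 0.0125, 0.5, 0.02, length)
    print(f"L = {length:g}: P(individual undetected) = {p0:.3f}")
```

Under independence among individuals, a site holding n individuals yields no detections with probability equal to this value raised to the power n, which is why a short survey relative to the time spent in the low-detection state leaves many individuals unrecorded.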

Analysing the data assuming no clustering in the detections provided a much poorer fit, with large differences among the models in terms of AIC for all simulations. Although the abundance estimators were more precise, their coverage properties were poorer, driven by the bias (Table 3). In fact, as shown before, an analysis based on this assumption would actually favour models based on abundance distributions with more support points, which provide estimators that are both biased and imprecise (Table 1).

| S, L | No clustering model | | | | | Clustering model | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 100, 30 | 0·80 | 0·85 | 0·65 | 0·65 | 0·75 | 0·85 | 0·80 | 0·95 | 0·70 | 0·80 |
| 100, 60 | 0·90 | 0·90 | 0·80 | 0·90 | 0·90 | 0·90 | 0·95 | 1·00 | 0·90 | 0·95 |
| 200, 30 | 0·90 | 1·00 | 0·40 | 0·55 | 0·95 | 0·95 | 0·95 | 1·00 | 0·90 | 0·95 |
| 500, 30 | 0·85 | 1·00 | 0·05 | 0·00 | 0·65 | 0·95 | 0·90 | 1·00 | 1·00 | 0·95 |
| 500, 60 | 0·95 | 0·55 | 0·90 | 0·20 | 0·50 | 0·95 | 0·95 | 0·95 | 0·95 | 0·95 |

#### Tiger data analysis

Looking in isolation at the results from models that assume no clustering in detections (Table 4a) would suggest that abundance-induced heterogeneity in the detection process is very relevant for this data set. The model that assumes no abundance-induced heterogeneity (NP1) was over 30 AIC units worse than the best one in this subset. However, models that allow for clustering in individuals’ detections (Table 4b,c) had substantially higher support than those that assume independent detections. Modelling clustering provided an improvement of about 30 AIC units in model fit, which indicates that clustering is a relevant feature in this data set. Estimates imply that tiger detections were about ten times more frequent in some areas compared to others. As observed in Guillera-Arroita *et al.* (2011), the switching rate estimates were very small and imprecise, suggesting that transect lengths in this survey were short compared to the clustering pattern in tiger detections. The data were informative about the probability of being in each of the detection states but could not capture well the actual rate at which transitions between states occur.

| Mod | | ΔAIC | Abundance | | | | | | | Detection | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (a) | | | | | | | | | | | |
| PP | NP1 | 61·9 | 0·82 | 0·18 | 0·82 | – | – | – | – | 0·11 | |
| PPab | NP2 | 40·2 | 1·13 | 0·16 | 0·55 | 0·29 | – | – | – | 0·08 | |
| | NP3 | 31·0 | 1·43 | 0·13 | 0·59 | 0·00 | 0·28 | – | – | 0·06 | |
| | NP4 | 31·2 | 1·79 | 0·10 | 0·51 | 0·12 | 0·00 | 0·26 | – | 0·05 | |
| | NP5 | 32·7 | 1·98 | 0·08 | 0·43 | 0·08 | 0·08 | 0·00 | 0·23 | 0·04 | |
| (b) | | | | | | | | | | | |
| 2-MMPP | NP1 | 2·3 | 0·96 | 0·04 | 0·96 | – | – | – | – | 0·33 | 0·23, 0·03 |
| MMPPab | NP2 | 2·8 | 1·00 | 0·07 | 0·86 | 0·07 | – | – | – | 0·32 | 0·22, 0·03 |
| | NP3 | 3·2 | 1·15 | 0·07 | 0·80 | 0·07 | 0·07 | – | – | 0·31 | 0·19, 0·03 |
| | NP4 | 5·2 | 1·24 | 0·05 | 0·79 | 0·05 | 0·05 | 0·05 | – | 0·29 | 0·19, 0·02 |
| | NP5 | 4·1 | 1·74 | 0·00 | 0·63 | 0·03 | 0·35 | 0·00 | 0·00 | 0·24 | 0·17, 0·02 |
| (c) | | | | | | | | | | | |
| 2-MMPP | NP1 | 2·3 | 0·96 | 0·04 | 0·96 | – | – | – | – | 0·33 | 0·23, 0·03 |
| MMPPab | NP2 | 0·0 | 1·31 | 0·02 | 0·65 | 0·33 | – | – | – | 0·39 | 0·15, 0·02 |
| | NP3 | 1·5 | 1·70 | 0·00 | 0·52 | 0·26 | 0·22 | – | – | 0·44 | 0·11, 0·01 |
| | NP4 | 3·1 | 2·28 | 0·00 | 0·26 | 0·47 | 0·00 | 0·27 | – | 0·45 | 0·08, 0·01 |
| | NP5 | 5·1 | 2·28 | 0·00 | 0·26 | 0·47 | 0·00 | 0·27 | 0·00 | 0·45 | 0·08, 0·01 |

MMPP, Markov-modulated Poisson process.

Once clustering was accounted for, the evidence for abundance-induced heterogeneity in the detection process giving rise to the tiger data was dramatically reduced. Allowing for abundance-induced heterogeneity provided no improvement in model fit when the clustering patterns in individuals’ detections were assumed to be independent. The best model in this subset (Table 4b) was the one with two support points in the abundance distribution. The models that allowed for three and four points, which had similar support as explanations for the observed data, estimated low probabilities for the additional abundance categories. This suggests that the structure for modelling abundance-induced heterogeneity in the models that assumed independent detections was, at least partially, capturing the clustering in the detection process rather than actual variations in site abundance. Under the assumption of independence, the best-fitting models (NP3 and NP4) produced relatively large estimates for the highest support point in the abundance distribution. This effect was also observed in our simulation study, when exploring the effect of unmodelled clustering on the estimation of abundance.

Analysing the data under the assumption that clustering in detections was ‘synchronized’ among individuals provided some improvement in model fit (Table 4c). The best model in this set was 2·3 AIC units better than the model that assumes no abundance-induced heterogeneity. The abundance estimates suggested that tigers are absent from very few sites, that about two-thirds of the sites hold one individual and that the remaining third hold two. The estimates obtained were, however, fairly imprecise (standard errors of 0·06, 0·21 and 0·20 for the three estimates), which is perhaps not surprising considering that the sample size in this data set was not particularly large.
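
As a small worked check of how an estimated abundance distribution translates into summary quantities, the sketch below computes the mean abundance and the implied occupancy probability. It assumes that the three probabilities reported for the best model in Table 4c (0·02, 0·65 and 0·33) correspond to 0, 1 and 2 individuals, as the text describes; the function name is hypothetical.

```python
def summarise_abundance(probs):
    """Mean abundance and occupancy implied by an abundance distribution.

    probs[k] is the estimated probability that a site holds k individuals.
    """
    mean = sum(k * p for k, p in enumerate(probs))
    occupancy = 1.0 - probs[0]  # probability that at least one individual is present
    return mean, occupancy

# Best 'synchronized' clustering model (Table 4c, NP2); column meaning assumed
mean, occupancy = summarise_abundance([0.02, 0.65, 0.33])
print(round(mean, 2), round(occupancy, 2))  # prints 1.31 0.98
```

The mean of 1·31 agrees with the value reported in the same row of Table 4c.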

### Discussion

In this paper, we have extended the models for species-detection data collected continuously along transects presented by Guillera-Arroita *et al.* (2011) to account for abundance-induced heterogeneity in the detection process. The models in the study of Guillera-Arroita *et al.* (2011) assume no differences in site abundance or that those differences do not affect the detection process. When this assumption does not hold, bias may be induced in the occupancy estimator. By linking abundance and detection rate, the models presented here allow the estimation of site abundance, following the spirit of the models proposed by Royle & Nichols (2003) and Royle (2004) for detection/non-detection and count data collected within discrete sampling protocols. Accounting for differences in site abundance also allows for a better estimation of occupancy probability (Dorazio 2007). The proposed models, although developed for data collected along transects, could also be applicable to surveys in which detections are collected over a continuous interval of time, such as camera-trap data.

We started by presenting a basic model that assumes independent detections. Apart from its utility as a model for species-detection data collected along transects, this model can also be useful as an alternative to the *N*-mixture binomial model of Royle (2004) for the analysis of replicated count data, allowing for the possibility of detecting each individual more than once per replicate. This may be important for some surveys based on direct observations (e.g. camera-trap surveys without individual identifications) and is often crucial when modelling indirect observation data (e.g. pugmarks), as each individual can leave more than one sign. In particular, the Poisson–Poisson mixture model had been suggested previously as a useful model for encounter-rate data (Stanley & Royle 2005, in discussion; Royle & Dorazio 2008, p. 413).
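
The basic model with independent detections can be sketched as a Poisson mixture: if a site holds k individuals, each leaving detections at rate λ along a transect of length L, the number of detections is Poisson with mean kλL, and the site-level probability averages over the abundance distribution. The code below is a minimal illustration under those assumptions, not the authors' implementation; the function names, parameter values and counts are hypothetical. It works with detection counts only, which for a homogeneous Poisson process differ from the full likelihood of the detection positions by a factor that does not involve the parameters.

```python
import math

def poisson_pmf(y, mean):
    """Poisson probability of y events, handling a zero mean explicitly."""
    if mean == 0.0:
        return 1.0 if y == 0 else 0.0
    return math.exp(y * math.log(mean) - mean - math.lgamma(y + 1))

def site_count_pmf(y, abundance_probs, rate, length):
    """P(y detections at a site); abundance_probs[k] = P(k individuals)."""
    return sum(p * poisson_pmf(y, k * rate * length)
               for k, p in enumerate(abundance_probs))

def log_likelihood(counts, abundance_probs, rate, length):
    """Log-likelihood of per-site detection counts, sites assumed independent."""
    return sum(math.log(site_count_pmf(y, abundance_probs, rate, length))
               for y in counts)

# Hypothetical data and parameter values, for illustration only
counts = [0, 0, 1, 3, 0, 2, 7, 0]
print(log_likelihood(counts, [0.2, 0.5, 0.3], rate=0.1, length=30.0))
```

In practice the abundance probabilities and the detection rate would be estimated by maximizing this log-likelihood numerically, subject to the probabilities summing to one.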

We have shown that modelling clustering in the detection process can be relevant when estimating site abundance. Our results demonstrate that disregarding the lack of independence within the detections of individuals can induce bias in the abundance estimator. We have proposed a model to deal with this problem, considering two model variants that account for different sources of clustering giving rise to independent or ‘synchronized’ clustering patterns. Independent clustering patterns can be expected when the species movement patterns are such that individuals independently follow the transect or remain around a camera trap for a while. ‘Synchronized’ clustering patterns can be expected if individuals move instead as a group or if, for instance, there are differences in substrate conditions. Our model is based on 2-MMPPs, which provide a simple description for clustered detections. Although other models are possible for describing varying detection rates (e.g. having more states in the underlying hidden process or even allowing detection rates to vary continuously), we believe 2-MMPPs will often provide a sufficiently flexible description for this kind of data.
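
To make the 2-MMPP description concrete, the following sketch simulates the detection positions of a single individual along a transect: a hidden two-state process alternates between exponentially distributed sojourns, and detections occur as a Poisson process whose rate depends on the current state. The function name and all parameter values are hypothetical, and the code is an illustration rather than the simulation procedure used in the paper.

```python
import random

def simulate_mmpp(rates, switch_rates, length, rng):
    """Simulate detection positions of one individual on [0, length).

    rates[s]: detection rate while in hidden state s (0 or 1).
    switch_rates[s]: rate of leaving state s (must be positive).
    """
    total = switch_rates[0] + switch_rates[1]
    # Start from the stationary distribution of the hidden state process
    state = 0 if rng.random() < switch_rates[1] / total else 1
    position, detections = 0.0, []
    while position < length:
        sojourn_end = min(length, position + rng.expovariate(switch_rates[state]))
        if rates[state] > 0.0:
            t = position + rng.expovariate(rates[state])
            while t < sojourn_end:
                detections.append(t)
                t += rng.expovariate(rates[state])
        position = sojourn_end
        state = 1 - state  # switch to the other state
    return detections

rng = random.Random(2011)
# Hypothetical values: bursts of signs (rate 5) separated by long quiet stretches
one_individual = simulate_mmpp((5.0, 0.0125), (0.6, 0.05), 30.0, rng)
# Independent clustering: superpose the processes of several individuals
site = sorted(x for _ in range(3)
              for x in simulate_mmpp((5.0, 0.0125), (0.6, 0.05), 30.0, rng))
print(len(one_individual), len(site))
```

The superposition shown corresponds to the independent clustering variant; a ‘synchronized’ pattern would instead require all individuals at a site to share the same hidden state path, which this sketch does not implement.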

Our approach represents an attractive alternative to the previously proposed two-step *ad hoc* solution (Hines *et al.* 2010), as it provides a description of the detection process that explicitly incorporates both aspects, and therefore allows the estimation of not only abundance but also parameters associated with the clustering pattern. This is, as far as we know, the first model that accounts for both clustering within the detections of individuals *and* abundance-induced heterogeneity in the species-detection process. Martin *et al.* (in press) studied the effect of correlated behaviour in the *N*-mixture binomial model used to estimate abundance from replicated counts. Although tackling an essentially different problem (i.e. dependence among individuals instead of dependence within detections from each individual), they also found that lack of independence in the detections can lead to a poor estimation of abundance. Our results also illustrate how choosing an appropriate allocation of the survey effort into number of sites and length surveyed per site leads to better estimators.

One limitation of the model proposed here to account for clustering is that it is relatively demanding in terms of computing time, owing to the matrix operations involved in the likelihood. For example, fitting the independent clustering model with five abundance support points to the tiger data set took in our implementation around 10 min compared to the few seconds required by the basic model. As computing time increases with the number of support points in the abundance distribution, this becomes more relevant when dealing with abundant species. In the presence of clustering, sample size requirements increase as the detection process becomes more complicated to describe, but our simulations indicated that precise estimates can be obtained with sample sizes that, although large, are still achievable in ecological studies. Unmodelled clustering induces bias in the abundance estimators; however, the estimators are more precise than those from the clustering model. Depending on the sample size, disregarding clustering may result in better estimators in terms of RMSE, but with poorer coverage properties. A similar phenomenon was observed by Morgan & Ridout (2008) on closed population capture–recapture models for estimating population size while accounting for heterogeneity in capture probability. Their simulations showed that, when the true model was a beta-binomial, fitting a binomial model produced a very precise but biased estimator of abundance. The beta-binomial model performed well in terms of bias but provided much poorer precision.
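
The trade-off between RMSE and coverage can be illustrated with a short calculation. Assuming an approximately normal estimator with bias b and standard deviation σ, RMSE is √(b² + σ²), and a nominal 95% Wald interval built with the correct standard error covers the true value with probability Φ(1·96 − b/σ) − Φ(−1·96 − b/σ). The numbers below are hypothetical and serve only to show that a biased but precise estimator can have the lower RMSE while its coverage is far below nominal.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def rmse(bias, sd):
    """Root mean square error from bias and standard deviation."""
    return math.sqrt(bias ** 2 + sd ** 2)

def wald_coverage(bias, sd, z=1.96):
    """Coverage of estimate +/- z*sd for a normal estimator with given bias."""
    shift = bias / sd
    return normal_cdf(z - shift) - normal_cdf(-z - shift)

# Hypothetical estimators: (bias, standard deviation)
estimators = {"biased, precise": (0.3, 0.1), "unbiased, imprecise": (0.0, 0.4)}
for label, (bias, sd) in estimators.items():
    print(f"{label}: RMSE = {rmse(bias, sd):.3f}, "
          f"coverage = {wald_coverage(bias, sd):.3f}")
```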

When applying the models proposed in this paper, it is important to remember that the appropriateness of interpreting the estimates obtained as abundance estimates is contingent on how well the model assumptions are met. First of all, the models assume that differences in site abundance are reflected as heterogeneity in the detection process and that this is the only source of such heterogeneity. Unmodelled heterogeneity coming from other sources would be interpreted by the model as abundance-induced, which can cause bias in the estimators. For instance, our simulations and analysis demonstrated how unmodelled clustering in the detection process can induce bias in the abundance estimators. An appropriate description of the detection process is critical for a reliable estimation of abundance. If some measurable factors are thought to affect detection rates appreciably, these can and should be incorporated into the models as covariates. Second, the models assume that all the individuals present at the site are detectable over the whole site (i.e. at any point in the site, the detection process is the superposition of *n*_{i} individual detection processes). Whether this assumption is satisfied depends on the choice of site size and on the characteristics of the species. Given the territorial nature of tigers, with females exhibiting little overlap in their smaller home ranges, this assumption is likely to be the one least adequately met in our application example. In this connection, we would like to emphasize that the goal of this analysis was to illustrate the impact that detection clustering has on the estimation of abundance rather than to obtain a figure for tiger numbers. Third, all individuals are assumed to exhibit similar movement patterns, so that their detections are well described by identical detection processes. While for most species, this may be a reasonable approximation, for some, there may be marked differences among groups, such as males vs. females. Such heterogeneity could potentially be addressed by modelling the detection process as a mixture of non-identical point processes, although we have not investigated this extension.

Finally, it is worth noting that other models might be devised to account for heterogeneity in detection rate. For instance, a negative-binomial model could be used, which would arise if the rate in the Poisson detection process is allowed to vary among sites according to a gamma distribution. Finite mixtures could be used to characterize a system in which sites can belong to a finite number of classes with distinct species-detection rates, as done in capture–recapture to model heterogeneous recapture probabilities (Pledger 2000). Verifying whether abundance is the source of heterogeneity in the detection process may be difficult. In occupancy models for discrete sampling protocols, different descriptions for heterogeneity in detection probability can sometimes fit the detection/non-detection data equally well and yet produce different estimates of occupancy (Royle 2006). This kind of identifiability problem can also be expected when modelling data collected along transects. This adds to our discussion above on the need to address other sources of heterogeneity in detection rate to obtain reliable estimates of abundance. This should not only be relegated to the development of advanced models with sophisticated descriptions of the detection process but should also be dealt with at the early stages of the study by carefully addressing sampling design to minimize unwanted sources of heterogeneity.
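
The statement that a negative-binomial model arises from a Poisson process whose rate follows a gamma distribution can be checked numerically. The sketch below compares the negative-binomial probabilities (parameterized by mean m and shape r, so that the variance is m + m²/r) with a midpoint-rule integral of the Poisson probability against a gamma density with the same mean and shape. The function names, the parameter values and the integration settings are assumptions made for illustration.

```python
import math

def negbin_pmf(y, mean, shape):
    """Negative-binomial probability with the given mean and shape."""
    p = shape / (shape + mean)
    log_pmf = (math.lgamma(y + shape) - math.lgamma(shape) - math.lgamma(y + 1)
               + shape * math.log(p) + y * math.log(1.0 - p))
    return math.exp(log_pmf)

def gamma_poisson_pmf(y, mean, shape, steps=20000, upper_factor=40.0):
    """Integrate Poisson(y; x) against a Gamma(shape, scale = mean/shape) density."""
    scale = mean / shape
    upper = upper_factor * max(mean, 1.0)  # truncation point far in the tail
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        log_gamma_density = ((shape - 1.0) * math.log(x) - x / scale
                             - math.lgamma(shape) - shape * math.log(scale))
        log_poisson = y * math.log(x) - x - math.lgamma(y + 1)
        total += math.exp(log_gamma_density + log_poisson) * h
    return total

# Hypothetical mean number of detections per site (3) and gamma shape (2)
for y in range(4):
    print(y, negbin_pmf(y, 3.0, 2.0), gamma_poisson_pmf(y, 3.0, 2.0))
```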

### Acknowledgements

The work of G.G.-A. has been supported by an EPSRC/NCSE grant. We thank the US Fish and Wildlife Service, 21st Century Tiger, Rufford Small Grants, and the Peoples Trust for Endangered Species for funding the tiger surveys; the Indonesian Department of Forestry and Nature Protection for their assistance in the tiger survey work; Yoan Dinata, Agung Nugroho, Iding Achmad Haidir and Maryati for their help with the data collection and entry; and José Lahoz-Monfort, Marc Kéry, Matthew Spencer and an anonymous reviewer for useful comments.

### References

- Cox & Isham (1980) Point Processes. Chapman and Hall, London.
- Dorazio (2007) On the choice of statistical models for estimating occurrence and extinction from animal surveys. Ecology, 88, 2773–2782.
- Fischer & Meier-Hellstern (1992) The Markov-modulated Poisson process (MMPP) cookbook. Performance Evaluation, 18, 149–171.
- Graham, Knuth & Patashnik (1988) Concrete Mathematics. Addison-Wesley, Reading, MA, USA.
- Guillera-Arroita, Morgan, Ridout & Linkie (2011) Species occupancy modeling for detection data collected along a transect. Journal of Agricultural, Biological, and Environmental Statistics, 16, 301–317.
- Hines, Nichols, Royle, MacKenzie, Gopalaswamy, Kumar & Karanth (2010) Tigers on trails: occupancy modeling for cluster sampling. Ecological Applications, 20, 1456–1466.
- Johnson, Kemp & Kotz (2005) Univariate Discrete Distributions, 3rd edn. John Wiley & Sons, Hoboken, NJ, USA.
- Kéry (2011) Towards the modelling of true species distributions. Journal of Biogeography, 38, 617–618.
- Kéry, Dorazio, Soldaat, van Strien, Zuiderwijk & Royle (2009) Trend estimation in populations with imperfect detection. Journal of Applied Ecology, 46, 1163–1172.
- MacKenzie, Nichols, Lachman, Droege, Royle & Langtimm (2002) Estimating site occupancy rates when detection probabilities are less than one. Ecology, 83, 2248–2255.
- MacKenzie, Nichols, Royle, Pollock, Bailey & Hines (2006) Occupancy Estimation and Modeling: Inferring Patterns and Dynamics of Species Occurrence. Academic Press, New York, USA.
- Martin, Royle, MacKenzie, Edwards, Kéry & Gardner (in press) Accounting for non-independent detection when estimating abundance of organisms with a Bayesian approach. Methods in Ecology and Evolution, doi: 10.1111/j.2041-210X.2011.00113.x.
- Morgan & Ridout (2008) A new mixture model for capture heterogeneity. Applied Statistics, 57, 433–446.
- Pledger (2000) Unified maximum likelihood estimates for closed capture–recapture models using mixtures. Biometrics, 56, 434–442.
- Ridout & Besbeas (2004) An empirical model for underdispersed count data. Statistical Modelling, 4, 77–89.
- Royle (2004) N-mixture models for estimating population size from spatially replicated counts. Biometrics, 60, 108–115.
- Royle (2006) Site occupancy models with heterogeneous detection probabilities. Biometrics, 62, 97–102.
- Royle & Dorazio (2008) Hierarchical Modeling and Inference in Ecology. Academic Press, Amsterdam.
- Royle & Nichols (2003) Estimating abundance from repeated presence–absence data or point counts. Ecology, 84, 777–790.
- Skaug (2006) Markov modulated Poisson processes for clustered line transect data. Environmental and Ecological Statistics, 13, 199–211.
- Stanley & Royle (2005) Estimating site occupancy and abundance using indirect detection indices. The Journal of Wildlife Management, 69, 874–883.
- Tyre, Tenhumberg, Field, Niejalke, Parris & Possingham (2003) Improving precision and reducing bias in biological surveys: estimating false-negative error rates. Ecological Applications, 13, 1790–1801.
- Yoccoz, Nichols & Boulinier (2001) Monitoring of biological diversity in space and time. Trends in Ecology & Evolution, 16, 446–453.

### Supporting Information

**Appendix S1.** Design recommendations and estimator performance for the Poisson mixture model with independent detections.

**Appendix S2.** Details on the likelihood construction for the model with clustering in individual detections.

| Filename | Format | Size | Description |
|---|---|---|---|
| MEE3_159_sm_AppendixS1.pdf | PDF | 163K | Supporting info item |
| MEE3_159_sm_AppendixS2.pdf | PDF | 116K | Supporting info item |
