Correspondence site: http://www.respond2articles.com/MEE/

# Spatial capture-recapture models for search-encounter data

Article first published online: 18 MAY 2011

DOI: 10.1111/j.2041-210X.2011.00116.x

© 2011 The Authors. Methods in Ecology and Evolution © 2011 British Ecological Society

Additional Information

#### How to Cite

Royle, J. A., Kéry, M. and Guélat, J. (2011), Spatial capture-recapture models for search-encounter data. Methods in Ecology and Evolution, 2: 602–611. doi: 10.1111/j.2041-210X.2011.00116.x

#### Publication History

- Issue published online: 5 DEC 2011
- Article first published online: 18 MAY 2011
- Received 5 July 2010; accepted 3 April 2011 Handling Editor: Nigel Yoccoz

### Keywords:

- Bayesian analysis;
- data augmentation;
- density estimation;
- distance sampling;
- hierarchical models;
- population size;
- search-encounter data;
- spatial capture–recapture;
- spatially explicit capture–recapture

### Summary

- Top of page
- Summary
- Introduction
- Sampling situation
- Hierarchical model
- Bayesian analysis by MCMC
- Analysis of the MHB data
- Results
- Discussion
- Acknowledgements
- References
- Supporting Information

**1.** Spatial capture–recapture models make use of auxiliary data on capture location to provide density estimates for animal populations. Previously, models have been developed primarily for fixed trap arrays which define the observable locations of individuals by a set of discrete points.

**2.** Here, we develop a class of models for ‘search-encounter’ data, i.e. for detections of recognizable individuals in continuous space, not restricted to trap locations. In our hierarchical model, detection probability is related to the average distance between individual location and the survey path. The locations are allowed to change over time owing to movements of individuals, and individual locations are related formally by a model describing individual activity or home range centre which is itself regarded as a latent variable in the model. We provide a Bayesian analysis of the model in WinBUGS, and develop a custom MCMC algorithm in the R language.

**3.** The model is applied to simulated data and to territory mapping data for the Willow Tit from the Swiss Breeding Bird Survey MHB. While the observed density was 15 territories per nominal 1 km^{2} plot of unknown effective sample area, the model produced a density estimate of 21·12 territories per square km (95% posterior interval: 17–26).

**4.** Spatial capture–recapture models are relevant to virtually all animal population studies that seek to estimate population size or density, yet existing models have been proposed mainly for conventional sampling using arrays of traps. Our model for search-encounter data, where the spatial pattern of searching can be arbitrary and may change over occasions, greatly expands the scope and utility of spatial capture–recapture models.

### Introduction

Capture–recapture models that accommodate auxiliary spatial information in the form of locations of capture are a relatively new development in ecological statistics (Efford 2004; Borchers & Efford 2008; Royle & Young 2008; Efford, Dawson & Borchers 2009; Royle *et al.* 2009; Gardner *et al.* 2010a,b; Borchers 2011; Kéry *et al.* 2011). Despite their recent development, such models show great promise in addressing a large number of inference problems from spatial arrays of capture devices including camera traps, mist nets, DNA data and other methods of obtaining spatially indexed individual encounter data. Most of the current methodology deals explicitly with the situation where encounter locations represent fixed points in space (i.e. ‘traps’), which is typical of most designed studies aimed at employing capture–recapture methods.

Nevertheless, in many studies, encounter information is obtained by what can best be described as opportunistic encounter methods, where encounters do not arise from fixed arrays of detector devices or traps but rather from surveyors searching space by vehicle or on foot along some sort of transect. Thus, detections are made in continuous space and not restricted to discrete locations determined by locations of traps or other devices. One example was considered by Royle & Young (2008). In their study, crews searched a well-defined plot and encountered individuals, noting the location where each individual was captured. Royle & Young developed a spatial capture–recapture model under the assumption of a uniform search intensity. That is, while the model did not require that detection was perfect, it was assumed that detection probability was constant within the prescribed survey region. In this study, we develop a model for a situation in which the search intensity within a study area is not uniform in space. A prototypical kind of method has a surveyor walking an arbitrary path through a study plot and locating individuals, noting their coordinates, uniquely marking them, and then releasing them. Subsequent samples, perhaps along the same path, or perhaps not, yield recaptures of individuals and new captures of individuals that are marked also. We assume that the path walked by the surveyor can be characterized by line segments, such as produced by a GPS device. We refer to these methods as ‘search-encounter’ methods, and the objective of this study is to describe a modeling and inference framework for data obtained by such methods.

Our strategy for constructing the model is to develop a hierarchical model for the observations conditional on the outcome of an underlying stochastic movement process (i.e. individual locations), which itself depends on the outcome of another stochastic process describing the distribution of individual activity (home range) centres. This concept of parameterizing the model in terms of a home range centre is the cornerstone of all existing spatial capture–recapture models (Efford 2004; Borchers & Efford 2008; Gardner *et al.* 2010a,b; Royle & Young 2008), whereas conditioning on the outcome of a movement process has previously only been considered by Royle & Young (2008). Encounter probability is parameterized to depend on an individual's location during the survey, and the configuration of the surveyed path. In particular, if an individual's location at the time of the surveys is relatively closer to the surveyed line then the individual has a higher probability of being encountered. The model we develop extends that of Royle & Young (2008) and also existing distance sampling models (e.g. Borchers, Zucchini & Fewster 1998) to accommodate novel spatial sampling designs.

One type of survey that motivated the development of the model described in this study is the use of dogs to locate scat of carnivores (C. Thompson, unpublished data). In such surveys, dog teams search an area and collect scat, from which individual identity is determined in the laboratory from DNA. Similarly, in a survey for a large forest grouse in Switzerland (P. Mollet *et al.*, unpublished data), forest fragments were surveyed for scat by crews on foot after snowfall. Search-encounter surveys are also commonly used to sample herptile (Hall, Henry & Bunck 1999; Royle & Young 2008) and bird (Schmid, Zbinden & Keller 2004; Kéry & Royle 2010) populations.

In the next section, we provide a more formal description of the sampling situation along with the approach to formulating the model. A formal development of our spatial capture–recapture model for search-encounter data is given in “Hierarchical model”. In “Bayesian analysis by MCMC”, we describe a Bayesian analysis of this model using MCMC methods. Using data augmentation (Royle, Dorazio & Link 2007), the model is easily implemented in WinBUGS (Appendix S1). In addition, we have written custom R code for carrying out the analysis (Appendix S2). In “Analysis of the MHB data”, the model is applied to data from the Swiss Breeding Bird Survey MHB.

### Sampling situation

We assume detections of uniquely recognizable individuals in continuous space, i.e. suppose that an area is searched (‘sampled’), and any individual encountered is captured and marked. The locations of each individual are recorded upon capture. We suppose that the area is sampled *T* > 1 times and subsequent recapture locations are also identified.

We use the term ‘sample unit’ for the area within which some amount of sampling activity is directed. In Royle & Young (2008), the sample unit was a polygon having well-defined boundaries (e.g. a plot) within which individuals could be encountered. In their model, the sample unit was subjected to a uniform search intensity. Often the sample unit will be precisely defined but not uniformly searched, and we are concerned with this situation here. For example, sample units (e.g. plots) will be identified and selected, but sampled inconsistently and unsystematically using an existing road or trail network, or a collection of transects. We consider the existence of a formal sample unit for consistency with Royle & Young (2008). However, in some sampling situations, formal sample units might not exist or might be poorly defined, such as when individual lines or sample routes are selected, as opposed to boundaries of a spatial unit. In either case, we suppose that the sample path taken by observers is well defined by a collection of segments such as from a GPS track file or similar set of points, say **X** = (**x**_{1},…,**x**_{J}), where every point of **X** lies within the sample unit. To be clear, the matrix **X** represents the set of points (i.e. their coordinates) defining sample line segments, and the sample unit is a polygon in some 2-D coordinate system that is delineated by the surveyors and within which the line segments **X** are situated.

One example of such a situation is that in which we draw a random sample of 1-km^{2} quadrats and then sample a set of survey lines or a route within each quadrat. In this case, the sample unit is the quadrat and **X** holds the coordinates of the surveyed lines. In some instances, we might obtain **X** without explicitly defining a sample unit, but that does not affect the development of the model which follows.

The data resulting from a search-encounter survey will be encounter data *y*_{it}, the binary encounter events for individual *i* = 1, 2,…, *n* and samples *t* = 1, 2,…,*T*. Further, we also obtain auxiliary encounter locations **u**_{it} which are the spatial coordinates of each individual during survey *t*. A key feature of the data structure (addressed subsequently) is that **u**_{it} is missing whenever *y*_{it} = 0. An idealized example of this situation is shown in Fig. 1. The bold red line represents a sequence of transects surveyed *T* = 6 times. In this case, they are adjoining transects but this would not have to be the case. The black dots represent encounter locations for simulated individuals (see next section), and multiple encounters of the same individual are connected by lines. While the survey lines are linear segments in this example, in many practical situations, regular linear sample lines are unlikely to occur. And importantly, in some situations, different sample paths will be chosen during all occasions – our model can cope with that.

The basic objective considered in the following section is to develop a model for search-encounter data that enables estimation of the density of individuals. The need for such a framework is motivated by a number of practical issues: (i) The number of individuals not captured is unknown – this is the classical imperfect detection problem that motivates most classical animal sampling methods (e.g. Borchers, Buckland & Zucchini 2002; Seber 1982; Williams, Nichols & Conroy 2002); (ii) In spite of prescribing the sample unit within which a search effort is applied, at the end of some arbitrary expenditure of effort, we typically cannot know precisely the effectively sampled area. In particular, animals may move onto and off the sample unit within and among samples; (iii) As a result, individuals will also exhibit variable exposure to encounter; finally, (iv) the sample unit is not uniformly searched (spatial coverage bias of the sampling) and this exacerbates (ii) and (iii).

To solve the problem of density estimation in this context, we imagine that the likelihood of ‘capture’ of individuals is related to their accumulated exposure to encounter along the surveyed lines (in a manner described elsewhere). Further, we imagine that repeated encounter locations of the same individual should be spatially correlated, and so we parameterize them as being generated by movements of the individual about the centre point of its territory or home range. This leads to a hierarchical model that provides an explicit, sequential linkage between the spatial location of individuals, their movements across sample periods, and whether or not they are encountered by the observer.

### Hierarchical model

Here, we develop a hierarchical model for encounter history data in continuous space obtained from search-encounter studies. In this context, hierarchical models take the form of a sequence of component models representing the outcome of an observation process, conditional on an underlying ecological process, and then additional model components that describe the ecological process (Royle & Dorazio 2008). Specifically, the description of the ecological process consists of a model for the distribution of individuals as well as the outcome of some movement process which describes the observed locations during each sampling of the population. The observation model, on the other hand, describes the process by which detections with their associated locations appear in the sample. In the present context, it will take the form of a capture–recapture type of model with a time-varying individual-covariate, that being individual location at the time of sampling.

#### Observation model

Our approach is to parameterize a model for the encounter data *y*_{it} in terms of **u**_{it}, the 2-D location of capture. We develop models for encounter probability that depend explicitly on **u**_{it}, i.e. *p*_{it} ≡ *p*(**u**_{it}) = Pr(*y*_{it} = 1|**u**_{it}). A technical problem is that we do not know **u** for *y* = 0 observations, and thus, it will not be possible to analyse the conditional-on-**u** likelihood directly. Instead, we regard **u** as random effects and assume a distribution for them, which allows us to handle the problem of missing **u** values. We adopt a Bayesian formulation in which the model is analysed conditional on **u** using MCMC methods.

Intuitively, Pr(*y*_{it} = 1|**u**_{it}) should increase as **u**_{it} comes ‘close’ to the line segments **X**. It seems reasonable to express closeness by some distance metric ||**u**_{it}−**X**|| = *dist*(**u**_{it},**X**) and then assume that *p*_{it} is a decreasing function of this distance.

For the case where **X** describes a wandering track line, some kind of average distance from the object location to the track line might be reasonable; possible alternatives include the absolute minimum distance (see Discussion) or the mean over specific segments of the line, etc.

We note the similarity to the modelling of the mortality hazard in survival analysis, a formulation adopted for distance sampling by Hayes & Buckland (1983) and Skaug & Schweder (1999) and, in the context of arrays of fixed traps by Borchers & Efford (2008). The individual is detected (analogous to mortality) if encountered at any point along **X**. Naturally, covariates are modelled as affecting the hazard rate, and we think of distance to the line as a covariate acting on the hazard. Let *h*(**u**_{it},**x**) be the hazard of individual *i* being encountered by sampling at a point **x** on occasion *t*. For example, one possible model assumes, for all points **x** ∈ **X**,

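$$\log h(\mathbf{u}_{it},\mathbf{x}) = \beta_0 + \beta_1\,\lVert \mathbf{u}_{it}-\mathbf{x}\rVert \qquad \text{(eqn 1)}$$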

The total hazard to encounter anywhere along the survey path, for an individual located at **u**_{it}, say *H*(**u**_{it}), is obtained by integrating over the surveyed line, which we will evaluate numerically by a discrete sum where the hazard is evaluated at the set of points **x**_{j} along the surveyed path:

$$H(\mathbf{u}_{it}) = \sum_{j=1}^{J} h(\mathbf{u}_{it},\mathbf{x}_j) \qquad \text{(eqn 2)}$$

where **x**_{j} is the *j*th row of **X** defining the survey path as a collection of line segments which can be arbitrarily dense, but should be regularly spaced. Then the probability of encounter is

$$p_{it} = 1 - \exp\{-H(\mathbf{u}_{it})\} \qquad \text{(eqn 3)}$$

This is a reasonably intuitive type of encounter probability model in that the probability of encounter is large when an individual's location **u**_{it} is close to the line in the average sense defined by Eqn (2), and vice versa. Note that *p*_{it} also depends on the sample path **X**, i.e. *p*(**u**_{it},**X**) which we suppress in our notation because **X** is fixed for any specific analysis. We note that we do not require that all line segments are surveyed during each sample period, as this simply affects the construction of the encounter probability for each sample. Thus, different line segments may be surveyed at different times, which results in considerable flexibility in the design of a survey. Additional covariates could be included in the hazard function. For example, in some situations observers might record weather conditions along the route, time-of-day, effort or other covariates (Kéry, Royle & Schmid 2005).

This formulation of total hazard and encounter probability assumes that encounter at each point along the line, **x**_{j}, is independent of encounter at every other point. Then, the event that an individual is encountered at all is the complement of the event that it is not encountered anywhere along the line (see also Hayes & Buckland 1983). In terms of the survival/hazard analogy, the survival function is *S*(**u**_{it},**x**_{j}) = exp(−*h*(**u**_{it},**x**_{j})) and so the probability that an individual ‘survives’ all *J* points is Π_{j} exp(−*h*(**u**_{it},**x**_{j})) and the encounter probability is therefore the complement of this, which is precisely the expression given by Eqn (3).
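The construction of encounter probability from point hazards can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the point hazard is taken to be log-linear in Euclidean distance (the Gompertz form discussed under “Alternative models”), and the function name, the straight-line path and the values of `beta0` and `beta1` are hypothetical, not taken from the appendices.

```python
import math

def encounter_probability(u, path, beta0, beta1):
    """Pr(encounter) for an individual located at u, given the path points.

    ASSUMPTION: log-linear point hazard, log h = beta0 + beta1 * distance.
    """
    total_hazard = 0.0
    for x in path:
        d = math.dist(u, x)                          # distance from location to path point
        total_hazard += math.exp(beta0 + beta1 * d)  # hazard contributed by this point
    return 1.0 - math.exp(-total_hazard)             # complement of 'surviving' every point

# Hypothetical survey path: 11 regularly spaced points along a straight transect
path = [(0.1 * j, 0.5) for j in range(11)]
p_near = encounter_probability((0.5, 0.55), path, beta0=-2.0, beta1=-8.0)
p_far = encounter_probability((0.5, 1.50), path, beta0=-2.0, beta1=-8.0)

# The same probability written as the complement of a product of survival terms
survive_all = math.prod(
    math.exp(-math.exp(-2.0 - 8.0 * math.dist((0.5, 0.55), x))) for x in path
)
```

With a negative `beta1`, an individual close to the path has a larger encounter probability than one far away, and `1 - survive_all` agrees with `p_near` up to rounding.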

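Consider the case of a single survey point, i.e. **X** ≡ **x**, which we might think of as a camera trap location. In this case note that Eqn (3) is equivalent to

$$\operatorname{cloglog}(p_{it}) = \log\{-\log(1 - p_{it})\} = \beta_0 + \beta_1\,\lVert \mathbf{u}_{it}-\mathbf{x}\rVert$$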

which is to say that distance is a covariate on detection that is linear on the complementary log–log scale, which is similar to the ‘trap-specific’ encounter probability used in Royle *et al.* (2009) and elsewhere for modelling encounter data from camera traps and related devices. The difference is that, here, the relevant distance is between the ‘trap’ (i.e. the survey lines) and the individual's present location, **u**_{it}, which is observable. On the other hand, in the context of camera traps, the distance is that between the trap and a latent variable, **s**_{i}, representing an individual's home range or activity centre which is not observed.

#### Alternative models

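We consider four distinct models for the hazard function. Model 1 is that described previously,

$$\log h(\mathbf{u}_{it},\mathbf{x}) = \beta_0 + \beta_1\,\mathrm{dist}(\mathbf{u}_{it},\mathbf{x}).$$

This corresponds to what is usually called the Gompertz hazard function. It is most often written *h*(*t*) = *a* exp(*bt*), in which case log(*h*(*t*)) = log(*a*) + *bt*. Model 2 (squared-distance) is a quadratic function of distance,

$$\log h(\mathbf{u}_{it},\mathbf{x}) = \beta_0 + \beta_1\,\mathrm{dist}(\mathbf{u}_{it},\mathbf{x})^{2}.$$

This model comes from Royle *et al.* (2009) and implies a bivariate normal hazard rate model. Model 3 is from Borchers & Efford (2008):

$$h(\mathbf{u}_{it},\mathbf{x}) = -\log\{1 - h_0 \exp(\beta_1\,\mathrm{dist}(\mathbf{u}_{it},\mathbf{x})^{2})\},$$

which produces a normal kernel model for probability of detection at the point level, i.e. Pr(*y* = 1) = 1 − exp(−*h*) = *h*_{0} exp(*β*_{1} *dist*(**u**_{it},**x**)^{2}), where *h*_{0} = expit(*β*_{0}). Model 4 is

$$\log h(\mathbf{u}_{it},\mathbf{x}) = \beta_0 + \beta_1 \log \mathrm{dist}(\mathbf{u}_{it},\mathbf{x}),$$

which is a Weibull hazard function. We used posterior deviance and DIC (Spiegelhalter *et al.* 2002) to compare these models (see Model selection).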
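For comparison, the four hazard forms can be written as small Python functions of distance. This is a sketch, not the code of the appendices: the function names are hypothetical, Models 1 to 3 follow the verbal descriptions above, and the Weibull form of Model 4 is assumed here to be a power law in distance (log-linear in log distance), which is the usual shape of a Weibull hazard.

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def hazard_gompertz(d, beta0, beta1):
    # Model 1: log h is linear in distance
    return math.exp(beta0 + beta1 * d)

def hazard_squared_distance(d, beta0, beta1):
    # Model 2: log h is linear in squared distance
    return math.exp(beta0 + beta1 * d ** 2)

def hazard_normal_kernel(d, beta0, beta1):
    # Model 3: point-level detection probability is a normal kernel in distance,
    # Pr(y = 1) = expit(beta0) * exp(beta1 * d^2); assumes beta1 <= 0 so that Pr < 1
    p_point = expit(beta0) * math.exp(beta1 * d ** 2)
    return -math.log(1.0 - p_point)

def hazard_weibull(d, beta0, beta1):
    # Model 4 (ASSUMPTION): power-law form, log h = beta0 + beta1 * log(d); needs d > 0
    return math.exp(beta0 + beta1 * math.log(d))

# Hypothetical parameter values, evaluated at increasing distances
distances = [0.1, 0.3, 0.9]
curves = {
    f.__name__: [f(d, 0.5, -3.0) for d in distances]
    for f in (hazard_gompertz, hazard_squared_distance,
              hazard_normal_kernel, hazard_weibull)
}
```

With a negative `beta1`, all four hazards decline with distance; they differ in how quickly they do so near the line and in the tail.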

#### Ecological process model

We have so far described the model for the encounter data in a manner that is conditional on the locations **u**_{it}, some of which are unobserved. That consideration alone justifies the need for a second-level model – a ‘random effects’ distribution – for the **u**_{it} variables. In addition, biologically we expect that these variables should be correlated because they correspond to repeated measures on the same individual. To develop such a model, we adopt what is now customary in spatial capture–recapture problems – we assume that individuals are characterized by a latent variable, **s**_{i}, which represents a centre of activity or territory or simply ‘home range’. This leads to a natural model for the variables **u**_{it}. In particular, we can now think of **u**_{it} as the outcomes of a movement process, conditional on **s**_{i}. Here, we make use of the bivariate normal model:

$$\mathbf{u}_{it} \mid \mathbf{s}_i \sim \mathrm{BVN}(\mathbf{s}_i, \sigma^2 \mathbf{I}),$$

where **I** is the 2 × 2 identity matrix and σ^{2} is the variance of movements about the centre in each coordinate. This is a primitive model of individual movements about their home range but, in most capture–recapture studies, we will only have one to several observations on each individual and thus very limited ability to estimate complex home range models. Therefore, we believe that the bivariate normal model will be sufficient for most real-life spatial capture–recapture problems.

Finally, to account for the fact that **s**_{i} are also unobserved random variables, we describe a further prior distribution for those variables. Following Borchers & Efford (2008), Royle & Young (2008) and others, we regard the population of *N* activity centre variables, **s**_{i}, as the outcome of a point process with state-space S, and assume that the **s**_{i} are mutually independent and uniformly distributed over S. Specifically,

$$\mathbf{s}_i \sim \mathrm{Uniform}(S), \qquad i = 1, 2, \ldots, N.$$

The state-space, S, is prescribed by the investigator based on the configuration of the survey path **X** and the biology of the species. In a Bayesian analysis, the size and configuration of the state-space require careful consideration because the model is analysed conditional on the underlying point process. That is, the point process is explicitly simulated as part of the MCMC algorithm, and thus the state-space must be described precisely. There are two approaches to choosing S. We might consider describing the geographic region containing the survey path only where habitat is suitable (e.g. Royle *et al.* 2009). Alternatively, if such information is not available, we can define a regular polygon (e.g. rectangle) containing the survey path without differentiating unsuitable habitat. While *N* is arbitrary in the sense that it necessarily increases with S, which is also arbitrary, we note that the density of points is invariant to S as long as S is sufficiently large, which can be verified by conducting a trial MCMC run.

In summary, our model as described so far can be written concisely as consisting of the following three model components:

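- (1) Ecological process 1 (distribution of activity centres): **s**_{i} ∼ Uniform(S)
- (2) Ecological process 2 (distribution of individual locations around activity centres): **u**_{it}|**s**_{i} ∼ bivariate normal with mean **s**_{i} and common variance σ^{2} in each coordinate
- (3) Observation process: *y*_{it}|**u**_{it} ∼ Bern(*p*(**u**_{it}))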

We see that this model is a type of binary (logistic) regression model with random (latent) effects, but with a complex latent variable structure. A number of independence assumptions among the random variables are implied. In particular, the observations *y*_{it} are mutually independent conditional on **u**_{it}. The locations **u**_{it} are mutually independent conditional on **s**_{i}, and the **s**_{i} are mutually independent.
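The three components can be chained into a small forward simulation, which also makes the independence assumptions concrete. The sketch below is illustrative only: it assumes a rectangular state-space, the log-linear hazard of Model 1, and hypothetical names and parameter values throughout. Locations are stored only for encountered individuals, mirroring the fact that **u**_{it} is missing whenever *y*_{it} = 0.

```python
import math
import random

def simulate_population(n_individuals, n_occasions, sigma, state_space,
                        path, beta0, beta1, rng):
    """Simulate activity centres, locations and encounters for a known N."""
    (xmin, xmax), (ymin, ymax) = state_space
    records = []
    for i in range(n_individuals):
        # Ecological process 1: activity centre uniform over the state-space
        s = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        for t in range(n_occasions):
            # Ecological process 2: bivariate normal movement about the centre
            u = (rng.gauss(s[0], sigma), rng.gauss(s[1], sigma))
            # Observation process: total hazard over path points, then Bernoulli
            total_hazard = sum(math.exp(beta0 + beta1 * math.dist(u, x)) for x in path)
            p = 1.0 - math.exp(-total_hazard)
            y = 1 if rng.random() < p else 0
            records.append({"i": i, "t": t, "s": s, "y": y,
                            "u_obs": u if y == 1 else None})
    return records

rng = random.Random(1)                       # fixed seed for reproducibility
path = [(0.1 * j, 0.5) for j in range(11)]   # hypothetical straight transect
records = simulate_population(5, 3, sigma=0.1,
                              state_space=((-0.5, 1.5), (-0.5, 1.5)),
                              path=path, beta0=-1.0, beta1=-6.0, rng=rng)
```

Each record corresponds to one individual-by-occasion cell of the encounter history; in a real analysis the activity centres `s` would of course be latent.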

#### Unknown *N*

We have specified the model ‘conditional on *N*’, where *N* is the total population of individuals residing in the state-space S. We need to account for the fact that *N* is unknown, which we do by putting a discrete uniform prior on it. Specifically, we assume that *N* ∼ *Uniform*(0,*M*) for some integer *M* chosen sufficiently large so that the resulting posterior distribution for *N* is unaffected. Then, we define the density of individuals as a derived parameter, being a transformation of *N*:

$$D = N / \lVert S \rVert,$$

where $\lVert S \rVert$ is the area of the prescribed state-space.

That *N* is unknown poses some computational challenge because the dimension of the resulting parameter space is not fixed. That is, the number of parameters in the model is itself a parameter. This is conveniently dealt with using the technique of ‘data augmentation’ (Tanner & Wong 1987), which was developed in the context of capture–recapture models with unknown *N* by Royle, Dorazio & Link (2007). See Royle & Dorazio (2008), Royle & Dorazio (2011), Kéry & Schaub (2011) for details and many examples of data augmentation.

To implement data augmentation, we augment the observed data set of size *n* (number of individuals observed) with a large number, *M* − *n*, of all-zero encounter histories. We recognize, by asserting that *M* > *N*, that some of these all-zero encounter histories are ‘structural zeros’, i.e. they do not correspond to extant individuals. Conversely, we also expect stochastic (or ‘sampling’) zeros which correspond to extant individuals that did not appear in the sample owing to detection error. Recognizing this structure as a problem of zero-inflation, it can be demonstrated that the observation model for the augmented data can be formulated as a zero-inflated version of the known-*N* model where the zero-inflation parameter corresponds to the number of excess zeros, beyond the unknown *N* (Royle, Dorazio & Link 2007). Thus, in a sense, the complement of the zero-inflation parameter is equivalent to *N*. To parameterize the zero-inflated observation model, we introduce a set of observation-level indicator variables, say *w*_{i}, where *w*_{i} are Bernoulli random variables with parameter *ψ* (the complement of the zero-inflation parameter). Then observations *y*_{it} = 0 correspond to an excess zero when *w*_{i} = 0 and to a sampling zero when *w*_{i} = 1; in the latter case, an individual is indeed a member of the population of size *N*. The known-*N* observation model is modified from *y*_{it} ∼ Bern(*p*_{it}) (as above) to *y*_{it} ∼ Bern(*w*_{i}*p*_{it}) and the latent variables *w*_{i} for *i* = *n*+1,…,*M* are updated with the remaining model parameters in the MCMC algorithm (see below). Under data augmentation, *N* is a derived parameter, computed as a function of the latent variables *w*_{i}: *N* = Σ_{i=1}^{M} *w*_{i}. This precise specification of the model for the augmented data induces a discrete uniform prior on [0,*M*] for the parameter *N*, i.e. the implied distribution of *N*, conditional on *ψ*, is *Bin*(*M*,*ψ*) which, when marginalized over the Unif(0,1) prior for *ψ*, yields the uniform [0,*M*] marginal prior (Royle, Dorazio & Link 2007).
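The claim that a Bin(*M*,*ψ*) distribution for *N*, marginalized over a Unif(0,1) prior for *ψ*, is discrete uniform on 0,…,*M* can be checked exactly with rational arithmetic. The following sketch (function name hypothetical) uses the standard Beta integral and a small *M* chosen only for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def marginal_prior_N(M):
    """Pr(N = k), k = 0..M, when N | psi ~ Bin(M, psi) and psi ~ Unif(0, 1)."""
    probs = []
    for k in range(M + 1):
        # Integral over (0, 1) of psi^k (1 - psi)^(M - k) is the Beta function
        # B(k + 1, M - k + 1) = k! (M - k)! / (M + 1)!
        beta_integral = Fraction(factorial(k) * factorial(M - k), factorial(M + 1))
        probs.append(comb(M, k) * beta_integral)
    return probs

prior = marginal_prior_N(10)   # every entry equals 1/11

# Under data augmentation, N is the sum of the indicators w_i (hypothetical draw)
w = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
N = sum(w)
```

Every value of *N* from 0 to *M* receives prior mass 1/(*M* + 1), which is why *M* only needs to be large enough not to truncate the posterior.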

Analysis of the model by data augmentation adds another layer to our hierarchical model to account for unknown *N*. The parameter-expanded model is as follows:

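- (0) Zero-inflation to account for unknown *N*: *w*_{i} ∼ Bern(*ψ*), with *ψ* ∼ Unif(0,1)
- (1) Ecological process (distribution of individuals): **s**_{i} ∼ Uniform(S)
- (2) Ecological process (movement of individuals): **u**_{it}|**s**_{i} ∼ bivariate normal with mean **s**_{i} and common variance σ^{2} in each coordinate
- (3) Observation process: *y*_{it}|**u**_{it}, *w*_{i} ∼ Bern(*w*_{i}*p*_{it})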

### Bayesian analysis by MCMC

In dealing with hierarchical models with latent variables, we adopt a Bayesian approach to analysis using Markov chain Monte Carlo methods. We developed a custom MCMC algorithm to fit this model in the free software R, which is given in Appendix S2. In practice, we do not have to worry about how to develop such algorithms because general purpose MCMC black boxes such as WinBUGS (Gilks, Thomas & Spiegelhalter 1994; Kéry 2010) exist to do this for us. Thus, to facilitate implementation and extension, we also provide the BUGS model specification for the simulated data in Appendix S1.

#### Model selection

Bayesian model selection is now routinely carried out using the Deviance Information Criterion (DIC; Spiegelhalter *et al.* 2002), although its effectiveness in hierarchical models depends very much on the manner in which it is constructed (Millar 2009). Because our model set is focused only on variations of the model describing Pr(*y*=1|*u*), we evaluated a number of deviance and DIC criteria based on the observation component of the model, as described in Appendix S3.

#### Assessment of model fit

An approach for assessing model fit is the Bayesian *P*-value (Gelman, Meng & Stern 1996) in which model fit is assessed using a measure of discrepancy between the observed data and their expected values. At this time, an omnibus goodness-of-fit discrepancy measure for hierarchical models such as ours is unavailable. Therefore, to evaluate model fit, we focused on using conventional ideas to test individual components of the model.

To evaluate the encounter component of the model we used individual encounter frequencies and, following Brooks, Catchpole & Morgan (2000), we used the Freeman–Tukey statistic (Freeman & Tukey 1950; Bishop, Fienberg & Holland 1975) which is

$$\sum_{i} \left(\sqrt{n_i} - \sqrt{e_i}\right)^{2},$$

where we defined *n*_{i} to be the encounter frequency for individual *i* conditional on **s**_{i} and *e*_{i} is the expected value under the model. The Freeman–Tukey statistic is desirable because it is unaffected by sparse data, i.e. cells with low expected values, thus avoiding the need to pool observations (Brooks *et al.* 2000).
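A minimal sketch of how such a discrepancy feeds into a Bayesian *P*-value is given below. It assumes that, for each posterior draw, the discrepancy has already been computed once for the observed data and once for data replicated from the model; the function names and the toy numbers are hypothetical.

```python
import math

def freeman_tukey(observed, expected):
    """Freeman-Tukey discrepancy: sum of squared differences of square roots."""
    return sum((math.sqrt(n) - math.sqrt(e)) ** 2 for n, e in zip(observed, expected))

def bayesian_p_value(observed_discrepancies, replicated_discrepancies):
    """Share of posterior draws in which the replicated-data discrepancy
    exceeds the observed-data discrepancy."""
    exceed = sum(r > o for o, r in zip(observed_discrepancies, replicated_discrepancies))
    return exceed / len(observed_discrepancies)

# Toy encounter frequencies and expected values for three individuals
d_example = freeman_tukey([1, 4, 0], [1, 1, 0])

# Toy discrepancies from four posterior draws
p_example = bayesian_p_value([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 5.0, 6.0])
```

Values of the *P*-value near 0 or 1 point to lack of fit, while values near 0.5 indicate that observed and replicated data are similarly discrepant from their expectations.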

#### Testing complete spatial randomness

We also evaluated the reasonableness of the ‘uniformity’ assumption for the underlying point process using a similar Bayesian *P*-value approach. Historically, especially in ecology, there has been a huge amount of interest in whether a realization of a point process indicates ‘complete spatial randomness’ (CSR), i.e. that the points are distributed uniformly and independently in space (Cressie 1991, ch. 8). In the context of animal capture–recapture studies, we expect the CSR hypothesis to be false, purely on biological grounds, because typically individuals are either clustered (e.g. denser in good habitat) or regularly distributed owing to the presence of territoriality and other behavioural processes. Nevertheless, spatial randomness may be a reasonable approximation to truth in some situations, especially for relatively sparse data. In that sense, CSR represents a reasonable null model under which to evaluate sparse capture–recapture data sets.

To evaluate the uniformity assumption (i.e. ‘complete spatial randomness’) for the activity centres, we used a standard chi-square goodness-of-fit test statistic, based on gridding the state-space of the point process into *g* = 1,2,…,*G* cells, and we tabulated *n*(*g*), the number of activity centres in each grid cell. A standard goodness-of-fit statistic for CSR is based on the ratio of the variance of *n*(*g*) to the mean (see table 8.3 in Cressie 1996). In particular, in parametric likelihood theory, the index of dispersion $I = (G-1)s^{2}/\bar{n}$, where $s^{2}$ and $\bar{n}$ are the sample variance and mean of the *n*(*g*), should have a chi-square distribution on (*G* − 1) df under the CSR hypothesis. However, we used this statistic as the basis for our Bayesian *P*-value calculation, comparing a posterior sample of *I* computed using each posterior realization of the point process with a value of *I* obtained by simulating **s** under CSR.
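The statistic itself is simple to compute from a set of activity centres. The sketch below assumes a rectangular state-space divided into equal cells and uses the index-of-dispersion form (*G* − 1) × variance / mean; the helper names and the toy points are hypothetical.

```python
def grid_counts(points, xlim, ylim, nx, ny):
    """Count points in each cell of an nx-by-ny grid over a rectangle."""
    counts = [0] * (nx * ny)
    for x, y in points:
        # Cell indices; min() keeps points on the upper boundary in the last cell
        ix = min(int((x - xlim[0]) / (xlim[1] - xlim[0]) * nx), nx - 1)
        iy = min(int((y - ylim[0]) / (ylim[1] - ylim[0]) * ny), ny - 1)
        counts[iy * nx + ix] += 1
    return counts

def dispersion_index(counts):
    """Index of dispersion, (G - 1) * s^2 / mean, with s^2 the sample variance."""
    mean = sum(counts) / len(counts)
    sum_sq = sum((c - mean) ** 2 for c in counts)
    return sum_sq / mean   # equals (G - 1) * [sum_sq / (G - 1)] / mean

# One point per cell: perfectly even counts
even = grid_counts([(0.1, 0.1), (0.9, 0.9), (0.6, 0.2), (0.2, 0.7)],
                   xlim=(0.0, 1.0), ylim=(0.0, 1.0), nx=2, ny=2)
index_even = dispersion_index(even)
index_clustered = dispersion_index([0, 4, 0, 4])
```

In the Bayesian *P*-value calculation, this index would be evaluated once per posterior draw on the sampled activity centres and once on centres simulated under CSR, and the two sets of values compared.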

### Analysis of the MHB data

The Swiss MHB survey of common breeding birds is run annually in 267 1-km^{2} plots distributed as a grid over Switzerland (Schmid, Zbinden & Keller 2004; Royle *et al.* 2005; Kéry & Royle 2010). Plots are sampled 2–3 times per breeding season by experienced volunteer observers along a plot-specific route using territory (spot) mapping (Bibby *et al.* 2000). This widely used method assumes that every detection can be unambiguously assigned to an individual territory and hence, that territory encounter histories can be constructed from detections in replicated surveys within a period of closure (i.e. one breeding season). Under these precise assumptions, the resulting data are useful for modelling encounter probability and abundance or occurrence (see series of papers including Kéry, Royle & Schmid 2005; Royle & Kéry 2007; Kéry & Schmidt 2008; Kéry & Royle 2009, 2010).

We consider data collected during 2007 for a small passerine, the willow tit *Parus montanus*, from the 7 plots surveyed in the Engadine valley region in the south-eastern Swiss Alps. In those 7 plots, a total of 105 unique territories were identified (min = 0 individuals, max = 46 individuals). Of the 105 individual territories, 10 were encountered in all 3 visits, 32 were encountered twice, and 63 were encountered only once. Estimation of *N* using the simplest non-spatial capture–recapture model (‘Model *M*_{0}’) yielded for these data, corresponding to a naive density estimate of territories per plot. However, the problem with this estimator is that we cannot be certain that the entire plot population was sampled. There is variable exposure because of the configuration of the route, and some of the individuals encountered may possibly reside outside of the sampled plot. These are issues that motivate general interest in spatial capture–recapture models (Efford 2004; Royle & Young 2008), and their application in this specific situation. To visualize the data from the MHB survey, we have overlaid the individual encounters (red dots) with a 60-m point digital representation (black dots) of the surveyed route for each of the 7 MHB plots (Fig. 2).

Before fitting the model described previously, we have to address two technical elements of the MHB data, which were not discussed explicitly earlier.

#### Hard plot boundaries

The previous development assumed that encounters can be made anywhere in space but that the encounter probability decreases with distance from the survey path. In practice, as in the MHB, we might delineate a plot which restricts where individuals might be observed [as in the situation considered by Royle & Young (2008)]. For such cases, we truncate the encounter probability function as

$$p_{it} = p(\mathbf{u}_{it})\, I(\mathbf{u}_{it} \in \mathcal{X}),$$

where $p(\mathbf{u}_{it})$ is the untruncated encounter probability described previously, $\mathcal{X}$ is the surveyed polygon and the indicator function $I(\mathbf{u}_{it} \in \mathcal{X}) = 1$ if $\mathbf{u}_{it} \in \mathcal{X}$ and 0 otherwise. That is, the probability of encounter is identically 0 if an individual is located outside the plot at sample period *t*. Given this modified encounter probability function, it is clear that the model is a modified form of Royle & Young (2008), whose ‘uniform search intensity’ model replaces the above expression with a constant encounter probability $p_{0}$ inside the plot:

$$p_{it} = p_{0}\, I(\mathbf{u}_{it} \in \mathcal{X}).$$
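As a minimal sketch of this truncation, the snippet below multiplies an arbitrary untruncated encounter probability by the plot indicator. It assumes a rectangular plot for simplicity (the surveyed polygon could be any shape), and `base_prob`, `in_plot` and `truncated_encounter_prob` are hypothetical names introduced for illustration only.

```python
def in_plot(u, plot):
    """Indicator that location u = (x, y) lies inside a rectangular plot."""
    xmin, xmax, ymin, ymax = plot
    return xmin <= u[0] <= xmax and ymin <= u[1] <= ymax

def truncated_encounter_prob(u, base_prob, plot):
    """Encounter probability that is identically 0 outside the plot."""
    return base_prob(u) if in_plot(u, plot) else 0.0

plot = (0.0, 1.0, 0.0, 1.0)           # a nominal 1-km^2 plot, coordinates in km
uniform_search = lambda u: 0.3        # constant probability (hypothetical value)

print(truncated_encounter_prob((0.5, 0.5), uniform_search, plot))  # inside the plot
print(truncated_encounter_prob((1.2, 0.5), uniform_search, plot))  # outside the plot
```

Passing a constant function for `base_prob` corresponds to the uniform search intensity case; a distance-dependent function would give the search-encounter case.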

#### Multiple survey plots

It is common in wildlife surveys to have multiple spatial sample units which need to be integrated into a single model. It is convenient if the population sizes for each plot are independent. In the case of the MHB data, the two closest plots were 10 km apart and, for this species, it is reasonable to assume independence. Moreover, the MHB plots represent (approximately) a random sample, and thus, independence is probably justified from a design-based perspective. With multiple plots, it is convenient computationally to organize the plots in some modified coordinate system that keeps them far enough apart so that individual movement outcomes cannot be located in multiple plots. This enables an implementation by data augmentation based on a single augmented data set. To construct the point process state-space, the 7 plots were embedded into a 30·8-km^{2} rectangular state-space having a minimum of 0·6 km buffer, which we judged to be sufficient given the estimate of *σ* (see below) so that individuals cannot appear in more than 1 plot during the MCMC simulation (i.e. 0·6 km is large relative to the estimate of *σ*).
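One way to set up such a modified coordinate system is to place the buffered plots side by side and shift each plot's local coordinates by a plot-specific offset. The sketch below is only an illustration under assumed conventions (square 1-km plots laid out in a single row, each with its own 0·6 km buffer); it is not the layout used for the MHB analysis, and the function names are hypothetical.

```python
def plot_offsets(n_plots, plot_size=1.0, buffer=0.6):
    """x-coordinate of each plot's lower-left corner when buffered plots form a row."""
    tile = plot_size + 2 * buffer          # width of one plot plus its buffer
    return [k * tile + buffer for k in range(n_plots)]

def to_state_space(local_xy, plot_index, offsets, buffer=0.6):
    """Shift plot-local coordinates (km) into the shared state-space."""
    x, y = local_xy
    return (offsets[plot_index] + x, buffer + y)

offsets = plot_offsets(7)
# An encounter at the centre of the third plot (index 2):
print(to_state_space((0.5, 0.5), 2, offsets))
# Gap between neighbouring plots in this layout (twice the buffer):
print(offsets[1] - (offsets[0] + 1.0))
```

Under this layout neighbouring plots are separated by twice the buffer, which is many multiples of a movement standard deviation of roughly 0·04 km.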

### Results

We performed an analysis in which the survey lines were represented by a point coverage having 60 m spacing. To evaluate the influence of the coarseness of the line, we also considered 40 and 20 m spacings in a preliminary analysis, but the results appeared insensitive. We thus produced final analyses based on the coarsest (60 m) representation of the line.

We set the upper limit of the uniform prior for *N* at *M* = 1100, corresponding to the addition of 995 all-zero encounter histories for analysis of the model by data augmentation. For all models fitted, the posterior mass of *N* was located well away from the *M* = 1100 boundary indicating that this prior choice and consequent analysis was adequate.
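The bookkeeping behind data augmentation can be sketched as follows. This is a simplified illustration that handles only the binary encounter histories (the full model also carries the observed locations); the function name `augment` is hypothetical, and the visit patterns assigned to the 105 territories are arbitrary placeholders that merely reproduce the reported counts of territories seen 3, 2 and 1 times.

```python
def augment(histories, M):
    """Pad observed encounter histories with all-zero rows up to M individuals."""
    n_obs = len(histories)
    n_visits = len(histories[0])
    n_zero = M - n_obs
    if n_zero < 0:
        raise ValueError("M must be at least the number of observed individuals")
    padded = histories + [[0] * n_visits for _ in range(n_zero)]
    return padded, n_zero

# 10 territories seen on all 3 visits, 32 seen twice, 63 seen once
# (which visits they were seen on is a placeholder, not the real data).
observed = [[1, 1, 1]] * 10 + [[1, 1, 0]] * 32 + [[1, 0, 0]] * 63

augmented, n_zero = augment(observed, M=1100)
print(len(observed), n_zero, len(augmented))
```

Checking that the posterior of *N* stays well below *M* is what justifies the chosen amount of padding.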

For each model, we produced posterior summaries based on 104 000 posterior samples after a 30 000 burn-in sample was discarded.

#### Model selection results

Deviance results suggest the Weibull hazard is the preferred model, and multiple deviance/DIC statistics support this model. Deviance and DIC results are tabulated in Appendix S3.

#### Posterior summaries

Posterior summaries for model parameters under the Weibull hazard are summarized in Table 1. Selected posterior summaries for all 4 models are presented in Table 2. We see that posterior means for *N* under models 1–3 are very similar, and the favored model 4 (Weibull hazard) indicates a total population size for the state-space that is lower than the others. In particular, the posterior mean density is *D* = 21·12 territories per km^{2}. This compares with the observed 15 territories per nominal 1 km^{2} plot and, using model M0, territories per plot (of unknown area). The results (omitted) for the 20 and 40 m approximations deviated slightly from this estimate (21·28 and 21·08 territories per km^{2}, respectively). Thus, the coarseness of the discrete approximation to the line did not have a substantial effect on estimated density in this case.

Table 1. Posterior summaries of model parameters under the Weibull hazard model (*σ* in km, *D* in territories per km^{2})

Parameter | Mean | SD | 2·5% | 50% | 97·5% |
---|---|---|---|---|---|
σ | 0·0399 | 0·0035 | 0·0338 | 0·0396 | 0·0474 |
β_{0} | −7·7557 | 1·0126 | −9·6916 | −7·7851 | −5·5157 |
β_{1} | −1·7118 | 0·5851 | −2·6523 | −1·7716 | −0·2823 |
D | 21·1162 | 2·2685 | 17·0455 | 20·9740 | 25·9416 |
N | 650·38 | 69·87 | 525 | 646 | 799 |
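The *D* and *N* rows of the Weibull summary are tied together by the area of the state-space. The short check below assumes that area is 30·8 km^{2} (our reading of the "30·8-km" state-space described in the analysis section) and shows that dividing the summaries of *N* by this constant reproduces the reported summaries of *D*. The variable names are illustrative only.

```python
AREA_KM2 = 30.8  # assumed area of the rectangular state-space, in km^2

n_summary = {"mean": 650.38, "sd": 69.87, "2.5%": 525, "50%": 646, "97.5%": 799}

# Because the area is a fixed constant, D = N / area holds draw by draw,
# so means, standard deviations and quantiles all scale by 1 / area.
d_summary = {name: value / AREA_KM2 for name, value in n_summary.items()}

for name, value in d_summary.items():
    print(f"{name}: {value:.4f}")
```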

Table 2. Posterior means and standard deviations of *N* and *σ* under the four encounter models

Model | *N* mean | *N* SD | *σ* mean | *σ* SD |
---|---|---|---|---|
Gompertz | 719·96 | 78·16 | 0·0403 | 0·0033 |
Squared-dist | 727·00 | 81·76 | 0·0398 | 0·0032 |
Normal p | 728·80 | 77·88 | 0·0398 | 0·0030 |
Weibull | 650·38 | 69·87 | 0·0399 | 0·0035 |

The posterior mean of *σ* under all models was approximately 0·04 km. Thus, the standard deviation of the movement outcomes of an individual around its activity centre is about 40 m. To put this in context, we can compute the distance, say *B*, that bounds a prescribed frequency of movement outcomes about an individual's home range centre. In particular, the scaled squared distance, *d*^{2}/*σ*^{2} = ||**u**−**s**||^{2}/*σ*^{2}, has a chi-square distribution on 2 degrees-of-freedom and so Pr(*d* ≤ *B*) = 1−*α* where $B = \sigma\sqrt{q_{2,\alpha}}$ and *q*_{2,α} is the upper-*α* critical value for a chi-square distribution on 2 df. For *α* = 0·05, *q*_{2,α} = 5·99 and therefore, with *d* in metres, Pr(*d* < 98) = 0·95. Hence, 95% of movement outcomes are within 98 m of an individual willow tit's home range centre. Clearly, individuals near the plot boundary will often go undetected because they are located off the sample plot at the time of the survey. That *σ* was not different across the four models is not surprising, because *σ* should be primarily informed by the observed values of **u**_{it}. Moreover, because the movement model parameter seems unaffected by the choice of encounter model, we believe this justifies the use of conditional deviance and DIC statistics to evaluate the relative merits of the different models.
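The 98 m figure can be checked directly. The sketch below assumes bivariate normal movement with standard deviation *σ* in each coordinate, so that *d*²/*σ*² is chi-square on 2 df, and uses the closed form of that distribution's quantile, −2 ln *α*, to avoid any external library. The function name `movement_bound` is hypothetical.

```python
import math

def movement_bound(sigma, alpha=0.05):
    """Distance B with Pr(d <= B) = 1 - alpha when d^2 / sigma^2 ~ chi-square(2)."""
    # The chi-square(2) CDF is 1 - exp(-x / 2), so its upper-alpha
    # critical value is -2 * ln(alpha).
    q = -2.0 * math.log(alpha)
    return sigma * math.sqrt(q)

sigma_km = 0.0399                     # posterior mean of sigma under the Weibull model
q_crit = -2.0 * math.log(0.05)
bound_m = 1000.0 * movement_bound(sigma_km)

print(round(q_crit, 2))   # critical value for alpha = 0.05
print(round(bound_m))     # bound in metres
```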

#### Goodness-of-fit

Goodness-of-fit was evaluated using an independent run of 25 000 MCMC iterations after a 5000 burn-in. For the four models, the Bayesian *P*-values for the observation model were 0·332 (Gompertz hazard), 0·310 (squared-distance model), 0·312 (normal detection probability model) and 0·524 for the Weibull hazard model. Thus, for all models, the *P*-value is sufficiently far from 0 or 1 to suggest reasonably fitting models.

For the goodness-of-fit of the complete spatial randomness component of the model, the *P*-value was 0·54–0·55 for each of the four models, suggesting no lack of fit. That there is no apparent difference among models is not surprising, because none of the alternative models have to do with the point process. Furthermore, models about the encounter process should not exert much influence over the point locations, as most of the information about each **s** comes from the observed locations **u**.

### Discussion

We developed a spatial capture–recapture model for search-encounter data in continuous space, by which we mean that an area is searched and individual encounter histories are obtained along with auxiliary spatial information about the locations of encounter. The situation considered is that in which an observer follows a path that can be characterized by a collection of points, such as obtained from a GPS tracking file. The surveyed path could be determined by a trail system, set of transects, or roads. To this point, there has not been any general model-based methodological framework for inference about density from such data. Royle & Young (2008) addressed a special case, uniform search intensity, and the model we developed here reduces to the Royle & Young model if the effect of distance on encounter probability is equal to 0 (i.e. *β*_{1} = 0 in Eqn. 1).

The model here is similar to that used for arrays of traps (Borchers & Efford 2008; Gardner *et al.* 2009). The key difference is that, in the models for conventional trap arrays, encounters can only occur at the trap locations, i.e. at discrete points in space. Conversely, in the situation we have described in this study, encounters occur at any point in space in the vicinity of the sample activity. This feature is shared with Royle & Young (2008) and also classical distance sampling methods (Buckland *et al.* 2001). As in Royle & Young (2008), the model also takes into consideration an explicit movement process which is not a feature of spatial capture–recapture models for trapping array data nor of distance sampling. Our model also has similarities with ‘individual covariate’ models (Royle 2009): the array containing individual locations, **u**_{it}, is an individual ‘time-varying covariate’ (Bonner & Schwarz 2006).

Our model is more closely related to classical distance sampling than is first apparent, especially to capture–recapture distance sampling (Borchers, Zucchini & Fewster 1998), in which repeated samples of transects are made and distances to objects are recorded. There are two important characteristics that distinguish among certain classes of spatially explicit capture–recapture models, including variations of distance sampling, and the search-encounter models we considered here. The first consideration is whether replicate observations of individuals are possible. In our models, we have repeated observations on individuals and this allows us to learn about the movement parameter *σ*. The problem is technically similar to the capture–recapture distance-sampling models (Borchers, Zucchini & Fewster 1998) and also distance-sampling with measurement error (Royle & Dorazio 2008, section 7.2; Borchers *et al.* 2010) with the measurement error being made on the activity centre **s**. Specifically in our model, the observed locations resulting from *movements* about **s** are analogous to *errors* in measuring **s**. The second characteristic concerns the definition of the detection model and whether or not the point on the observation line, say **x**, is observed along with the location of animals. This determines the nature of suitable detection functions. For example, if **x** is observed, then we might consider functions of radial distance to individual locations or the cumulative hazard from the start of the line to point **x**. If **x** is not observed then, obviously, we cannot consider either type of model. Whether or not **x** is observed (in our case it is not), we *can* consider the total hazard (as we have done). Alternatively (which we have not done) we could consider the shortest distance between the animal location and the line, as is commonly done in traditional distance sampling. One benefit of this is that it might improve the efficiency of the analysis by MCMC.
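The shortest-distance alternative mentioned above needs only the distance from a location to the nearest point on the survey path. A minimal sketch follows, assuming the path is stored as a list of (x, y) vertices in km joined by straight segments; the function names are hypothetical and this computation was not part of the analysis reported here.

```python
import math

def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment with end points a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)   # degenerate segment
    # Project p onto the segment and clamp the projection to its end points.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def shortest_distance_to_path(p, path):
    """Minimum distance from p to a polyline given as a list of vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(path, path[1:]))

path = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]          # hypothetical survey route
print(shortest_distance_to_path((0.5, 0.3), path))   # nearest to the first segment
print(shortest_distance_to_path((2.0, 1.0), path))   # nearest to the end vertex
```

Compared with summing a hazard over every point of a discretized path, this requires a single distance per location, which is the kind of saving that could make MCMC updates cheaper.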

An important assumption underlying both our model and related models, such as distance sampling and ordinary spatial capture–recapture models, is that individual activity centres are *independent* of the sample route or lines. This is often not the case in distance sampling applications for birds and other species where the sample unit is chosen to be road or trail segments (see Marques *et al.* 2011). We inspected sample maps used in our analysis and believe that the assumption of independence between sample path and activity centres is probably as well met as anywhere in practice. Although some of the sample paths follow footpaths, they are small features unlikely to affect the spatial distribution of the species we analysed. Moreover, the proportion of sample paths that follow other linear structures, such as paved roads or forest edges, appears only minor.

It is well known that distance sampling density estimators are fairly sensitive to errors in distance measurement (Buckland *et al.* 2001). We believe that our model is more robust in this respect owing to the movement process submodel. It is likely that any zero-mean measurement error will be absorbed into the movement process and resulting estimators of density (based on the number and location of the activity centers) will be unaffected. In other words, measurement error variance is strictly confounded with the movement variance component to no ill-effect so far as estimating density is concerned. It would be useful to evaluate this using a simulation study.
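The confounding argument can be illustrated with a toy simulation. It is not the simulation study called for above: it looks at a single coordinate of a single individual, assumes normal movement (standard deviation `sigma`) and independent zero-mean normal measurement error (standard deviation `tau`), and ignores detection entirely. All names and values are hypothetical.

```python
import math
import random
import statistics

def simulate_recorded_locations(centre, sigma, tau, n, rng):
    """Recorded coordinate = activity centre + movement + zero-mean measurement error."""
    recorded = []
    for _ in range(n):
        true_location = rng.gauss(centre, sigma)         # movement about the centre
        recorded.append(rng.gauss(true_location, tau))   # location as written down
    return recorded

rng = random.Random(42)
centre, sigma, tau = 0.5, 0.04, 0.03
recorded = simulate_recorded_locations(centre, sigma, tau, 20000, rng)

est_centre = statistics.mean(recorded)
est_sd = statistics.stdev(recorded)

print(est_centre)                                # close to the true centre
print(est_sd, math.sqrt(sigma**2 + tau**2))      # spread reflects both variances
```

The recorded locations remain centred on the activity centre while their spread behaves like a single inflated movement standard deviation, which is the sense in which the two variance components cannot be separated.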

A number of extensions of this model appear straightforward and will no doubt prove useful. Covariates on density will be important in some applications. Using a discrete state-space representation makes this feasible (Efford, Dawson & Borchers 2009). Covariates on detection (e.g. effort, time and temperature) can be introduced into the model as additional linear terms to the hazard. Finally, open population models can be achieved using the proper formulation of the state model (Gardner *et al.* 2010b). We do not see any difficulty in adapting those extensions to the specific observation modelling framework developed here.

### Acknowledgements

We thank two anonymous referees and the Associate Editor for many helpful comments on a draft of this manuscript.

### References

- Bibby, Burgess, Hill & Mustoe (2000) Bird Census Techniques. Academic Press, San Diego.
- Bishop, Fienberg & Holland (1975) Discrete Multivariate Analysis: Theory and Practice. MIT Press, Cambridge, MA.
- Bonner & Schwarz (2006) An extension of the Cormack-Jolly-Seber model for continuous covariates with application to *Microtus pennsylvanicus*. Biometrics, 62, 142–149.
- Borchers, Buckland & Zucchini (2002) Estimating Animal Abundance: Closed Populations. Springer, London.
- Borchers, Zucchini & Fewster (1998) Mark-recapture models for line transect surveys. Biometrics, 54, 1207–1220.
- Borchers & Efford (2008) Spatially explicit maximum likelihood methods for capture–recapture studies. Biometrics, 64, 377–385.
- Borchers, Marques, Gunnlaugsson & Jupp (2010) Estimating distance sampling detection functions when distances are measured with errors. Journal of Agricultural, Biological, and Environmental Statistics, 15, 346–361.
- Brooks, Catchpole & Morgan (2000) Bayesian animal survival estimation. Statistical Science, 15, 357–376.
- Borchers (2011) A non-technical overview of spatially explicit capture–recapture models. Journal of Ornithology, in press. http://www.springerlink.com/content/y09r4713l78758r2/.
- Buckland, Anderson, Burnham, Laake, Borchers & Thomas (2001) Introduction to Distance Sampling: Estimating Abundance of Biological Populations. Oxford University Press, Oxford.
- Cressie (1991) Statistics for Spatial Data. John Wiley & Sons.
- Efford (2004) Density estimation in live-trapping studies. Oikos, 106, 598–610.
- Efford, Dawson & Borchers (2009) Population density estimated from locations of individuals on a passive detector array. Ecology, 90, 2676–2682.
- Freeman & Tukey (1950) Transformations related to the angular and square root. Annals of Mathematical Statistics, 21, 607–611.
- Gardner, Royle, Wegan, Rainbolt & Curtis (2010a) Estimating black bear density using DNA data from hair snares. Journal of Wildlife Management, 74, 318–325.
- Gardner, Reppucci, Lucherini & Royle (2010b) Spatially explicit inference for open populations: estimating demographic parameters from camera-trap studies. Ecology, 91, 3376–3383.
- Gelman, Meng & Stern (1996) Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica, 6, 733–807.
- Gilks, Thomas & Spiegelhalter (1994) A language and program for complex Bayesian modelling. The Statistician, 43, 169–178.
- Hall, Henry & Bunck (1999) Fifty-year trends in a box turtle population in Maryland. Biological Conservation, 88, 165–172.
- Hayes & Buckland (1983) Radial-distance models for the line-transect method. Biometrics, 39, 29–42.
- Kéry (2010) Introduction to WinBUGS for Ecologists. Academic Press, Burlington.
- Kéry & Royle (2009) Inference about species richness and community structure using species-specific occupancy models in the national Swiss breeding bird survey MHB. Modeling Demographic Processes in Marked Populations (eds D.L. Thomson, E.G. Cooch & M.J. Conroy), pp. 639–656. Series: Environmental and Ecological Statistics, Vol. 3, Springer, New York.
- Kéry & Royle (2010) Hierarchical modelling and estimation of abundance and population trends in metapopulation designs. Journal of Animal Ecology, 79, 453–461.
- Kéry, Royle & Schmid (2005) Modeling avian abundance from replicated counts using binomial mixture models. Ecological Applications, 15, 1450–1461.
- Kéry, Gardner, Stoeckle, Weber & Royle (2011) Spatial capture–recapture density estimation using DNA-sampled data for rare and elusive animals. Conservation Biology, 25, 356–364.
- Kéry & Schmidt (2008) Imperfect detection and its consequences for monitoring for conservation. Community Ecology, 9, 207–216.
- Kéry & Schaub (2011) Bayesian Population Analysis Using WinBUGS – A Hierarchical Perspective. Academic Press, Waltham, Massachusetts, USA.
- Marques, Buckland, Borchers, Tosh & McDonald (2011) Point transect sampling along linear features. Biometrics, in press.
- Millar (2009) Comparison of hierarchical Bayesian models for overdispersed count data using DIC and Bayes’ factors. Biometrics, 65, 962–969.
- Royle (2009) Analysis of capture–recapture models with individual covariates using data augmentation. Biometrics, 65, 267–274.
- Royle & Dorazio (2008) Hierarchical Models and Inference in Ecology: The Analysis of Data from Populations, Metapopulations and Communities. Academic Press, San Diego, CA, USA.
- Royle & Dorazio (2011) Parameter-expanded data augmentation for Bayesian analysis of capture–recapture models. In press, http://www.springerlink.com/content/p014263x3121722k/.
- Royle, Dorazio & Link (2007) Analysis of multinomial models with unknown index using data augmentation. Journal of Computational and Graphical Statistics, 16, 67–85.
- Royle, Karanth, Gopalaswamy & Kumar (2009) Bayesian inference in camera trapping studies for a class of spatial capture–recapture models. Ecology, 90, 3233–3244.
- Royle & Kéry (2007) A Bayesian state-space formulation of dynamic occupancy models. Ecology, 88, 1813–1823.
- Royle, Nichols & Kéry (2005) Modeling occurrence and abundance of species with imperfect detection. Oikos, 110, 353–359.
- Royle & Young (2008) A hierarchical model for spatial capture–recapture data. Ecology, 89, 2281–2289.
- Schmid, Zbinden & Keller (2004) Überwachung der Bestandsentwicklung häufiger Brutvögel in der Schweiz. Swiss Ornithological Institute, Sempach, Switzerland.
- Seber (2002) The Estimation of Animal Abundance. Blackburn Press, London, UK.
- Skaug & Schweder (1999) Hazard models for line transect surveys with independent observers. Biometrics, 55, 29–36.
- Spiegelhalter, Best, Carlin & van der Linde (2002) Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64, 583–639.
- Tanner & Wong (1987) The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528–540.
- Williams, Nichols & Conroy (2002) Analysis and Management of Animal Populations. Academic Press, San Diego.

### Supporting Information

**Appendix S1.** Simulating Data and WinBUGS Analysis.

**Appendix S2.** Development of an MCMC Algorithm in the R Software.

**Appendix S3.** Model Selection based on DIC.

Filename | Format | Size | Description |
---|---|---|---|
MEE3_116_sm_Appendices.pdf | PDF | 175K | Supporting info item |