Hierarchical spatial capture–recapture models: modelling population density in stratified populations



  1. Capture–recapture studies are often conducted on populations that are stratified by space, time or other factors. In this paper, we develop a Bayesian spatial capture–recapture (SCR) modelling framework for stratified populations – when sampling occurs within multiple distinct spatial and temporal strata.
  2. We describe a hierarchical model that integrates distinct models for both the spatial encounter history data from capture–recapture sampling, and also for modelling variation in density among strata. We use an implementation of data augmentation to parameterize the model in terms of a latent categorical stratum or group membership variable, which provides a convenient implementation in popular BUGS software packages.
  3. We provide an example application to an experimental study involving small-mammal sampling on multiple trapping grids over multiple years, where the main interest is in modelling a treatment effect on population density among the trapping grids.
  4. Many capture–recapture studies involve some aspect of spatial or temporal replication that requires some attention to modelling variation among groups or strata. We propose a hierarchical model that allows explicit modelling of group or strata effects. Because the model is formulated for individual encounter histories and is easily implemented in the BUGS language and other free software, it also provides a general framework for modelling individual effects, such as are present in SCR models.


Capture–recapture models are widely used in ecology and wildlife management to estimate the size of animal populations (Williams, Nichols & Conroy 2002). A common situation in many ecological studies concerns the case where the population is divided into spatially or temporally referenced subpopulations (Converse & Royle 2012). This is a frequent characteristic of animal population studies because multiple units are necessary to obtain a sufficient sample of individuals. In addition, replication is often crucial to the scope of inference. For example, ecologists are interested in understanding how populations respond over time and space to variation in processes determining population dynamics, such as habitat quality; a special case is the experiment or quasi-experiment. Such studies entail replication of capture–recapture experiments in space and time. A case in point is a typical study of how small-mammal populations respond to forest management (Converse, White & Block 2006).

An important problem in interpreting estimates of N from capture–recapture studies is that, in most practical applications, the area over which individuals are exposed to trapping is not well defined (Karanth & Nichols 1998; Efford 2004). As a result, the biological meaning of estimates of N is usually unclear – does $\hat{N}$ apply to 1 ha, 5 ha or 10 ha? Lacking a mechanism to enforce formal closure on a land area, there is no rigorous method for converting estimates of N from ordinary capture–recapture models to density. In a related fashion, when trying to estimate treatment effects, if the size of animals’ home ranges varies across replicates (which may result from the treatments themselves, Converse, White & Block 2006), the effective area sampled essentially varies, and inference based on the abundance estimates is dubious. However, a number of methods have been developed that attempt to convert estimates of N to estimates of density. For example, the conventional approach (Karanth 1995; Karanth & Nichols 1998) involves using ordinary (non-spatial) closed population capture–recapture models (Otis et al. 1978) to estimate population size, N, and then dividing that by a quantity asserted to represent ‘effective sample area’ or similar. This is obtained by buffering the trap array by some prescribed function of home range radius (e.g. mean maximum distance moved; Wilson & Anderson 1985).

Recently, more formal approaches to estimating density based on spatial capture–recapture (SCR) methods have been developed (Efford 2004; Borchers & Efford 2008; Royle & Young 2008; Borchers 2012). SCR models regard individual location explicitly in the model, as a latent individual covariate, and model the encounter probability between individual activity centres and specific traps as a function of distance. In doing so, SCR models accommodate heterogeneity in encounter probability due to the juxtaposition of individuals with traps, and, by associating individuals with explicit locations, provide a framework for direct inference about density.

Despite increasing interest in SCR models, most extant applications arise from a study of a single population. However, capture–recapture studies, in many cases, are based on a number of sampling arrays spread out geographically over an experimental region and replicated in time. For example, this is typical of small-mammal trapping (Efford et al. 2005) as well as some large-scale monitoring programmes based on camera traps (Karanth & Nichols 2002; Jhala, Qureshi & Gopal 2011) and constant-effort mist-netting schemes (DeSante et al. 1995). Therefore, integrating data from stratified capture–recapture studies is often paramount to the main inference objective. Here, we provide a formal Bayesian SCR framework for experimental and other situations involving replicated grids of traps or encounter devices. This extends the hierarchical capture–recapture models (Converse & Royle 2012) to accommodate SCR models. We adopt a Bayesian estimation and inference framework and provide implementations in the freely available software R (R Core Team 2012) and JAGS (Plummer 2009). We provide an application involving a small-mammal trapping study from Converse, White & Block (2006).

Data and models

Data from a capture–recapture study are individual encounter observations, which we denote by yijk for individual i, trap j, and sample occasion k, where yijk = 1 if individual i was captured in trap j during sample occasion k and yijk = 0 otherwise. The vector of observations for individual i during occasion k (the spatial encounter history) is given by yik = (yi1k,yi2k,…,yiJk). We suppose that sampling was carried out at g = 1,2,…,G populations (groups, or strata), which were sampled by distinct trap arrays and so we have G data sets of individual encounter histories. The groups here represent distinct geographic strata or sample units or possibly geographic units crossed with temporal replication such as season or year.

Let $N_g$ be the size of the population for group g, and let $N_T = \sum_g N_g$ be the total population of individuals across all groups. Conceptually, this is a super-population of individuals comprising the union of all populations that were sampled. For each captured individual i = 1,2,…,n, let $g_i$ indicate the group to which individual i belongs, where $n = \sum_g n_g$ is the sum of the group-specific sample sizes $n_g$.

In the following sections, we describe a hierarchical model for this group-structured capture–recapture situation. The model consists of two main components: (i) a model for the observations yik conditional on the latent population size of each group, which describes the probability of encounter of individuals in traps; and (ii) a model for the latent abundance variables Ng that describes variation in abundance among groups. For group-structured data, specification of certain models for group-specific population sizes induces a specific model on the ‘group membership’ variables gi (Royle, Converse & Link 2012). In what follows, we provide explicit spatial context for the populations being sampled, in which case there will be a direct correspondence between population size and density.

Spatial capture–recapture observation model

An important problem in capture–recapture studies is that the population size of each group, Ng, is unknown and, as such, the group membership of uncaptured individuals (i.e. the value of the latent variable gi) is unobserved. We deal with this using an MCMC method known as data augmentation, as described below in 'Bayesian analysis'. For now, we describe a model for how individuals are encountered by traps that is conditional on knowing to which population every individual belongs. Later, we accommodate the fact that some of the group membership variables are unknown by placing a prior distribution on gi.

There are a number of distinct observation models that arise in SCR studies including binomial, multinomial, Poisson and models of acoustic signal strength (Dawson & Efford 2009). A standard type of model that applies to detection devices such as hair snares (Borchers & Efford 2008) is that in which the yijk are independent Bernoulli trials so that an individual can be captured in any number of the J traps during a sample occasion. In that case, the probability of encounter in trap j is modelled by some function of the distance between trap j (a two-dimensional vector xj), and the individual activity centre si, which is regarded as a latent variable (i.e. random effect) in the model (Borchers & Efford 2008; Royle & Young 2008). For example, a common model is the ‘half-normal’ model in which probability of encounter in a trap is proportional to a bivariate Gaussian density of the two-dimensional coordinate of encounter x:

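$$p_{ijk} = p_0 \exp\left(-\frac{\mathrm{dist}(x_j, s_i)^2}{2\sigma^2}\right)$$

Or, equivalently, we can express this as a linear function on a suitable scale: $\log(p_{ijk}) = \alpha_0 + \alpha_1\,\mathrm{dist}(x_j, s_i)^2$, where $\alpha_0 = \log(p_0)$ and $\alpha_1 = -1/(2\sigma^2)$.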
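The equivalence between the half-normal form and the log-linear form can be checked numerically. The following is a minimal Python sketch for illustration only (the analysis in this paper was implemented in R and JAGS); the function names and the values of p0, σ and the coordinates are assumptions chosen for the example.

```python
import math

def half_normal_p(trap, centre, p0, sigma):
    """Encounter probability p0 * exp(-d^2 / (2 * sigma^2)) for one trap."""
    d2 = (trap[0] - centre[0]) ** 2 + (trap[1] - centre[1]) ** 2
    return p0 * math.exp(-d2 / (2.0 * sigma ** 2))

def log_linear_p(trap, centre, alpha0, alpha1):
    """Same probability written as exp(alpha0 + alpha1 * d^2)."""
    d2 = (trap[0] - centre[0]) ** 2 + (trap[1] - centre[1]) ** 2
    return math.exp(alpha0 + alpha1 * d2)

# Illustrative (assumed) values: baseline probability and scale in metres.
p0, sigma = 0.3, 25.0
alpha0 = math.log(p0)
alpha1 = -1.0 / (2.0 * sigma ** 2)

trap, centre = (0.0, 0.0), (30.0, 40.0)  # 50 m apart
p_a = half_normal_p(trap, centre, p0, sigma)
p_b = log_linear_p(trap, centre, alpha0, alpha1)
print(p_a, p_b)  # the two parameterizations agree up to rounding error
```

Working on the log scale is what makes it straightforward to add further covariates as extra linear terms.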

For standard live traps, also called ‘single-catch’ traps (Efford 2004), an individual can be captured in at most one trap. Then, the vector $y_{ik} = (y_{i1k}, y_{i2k}, \ldots, y_{iJk}, y_{i,J+1,k})$, where the last element $y_{i,J+1,k}$ corresponds to ‘not captured’, contains a single 1 and the remaining values are 0. This (J + 1) × 1 vector $y_{ik}$ is a multinomial trial:

$$y_{ik} \sim \mathrm{Multinomial}(1, \pi_{ik})$$

where πik is a (J + 1) × 1 vector where each element represents the probability of being encountered in a trap (for elements 1,…,J) or not captured at all (element J + 1). For this multinomial model, we model the encounter probability vector as a function of distance between trap locations and individual activity centres, but for this case, we use the multinomial logit transform. The equivalent half-normal model is:

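$$\mathrm{mlogit}(\pi_{ij}) = \eta_{ij} = \alpha_0 + \alpha_1\,\mathrm{dist}(x_j, s_i)^2 \qquad (1)$$

where $\alpha_1 = -1/(2\sigma^2)$ and σ is the scale parameter of the half-normal encounter probability model. Then,

$$\pi_{ij} = \frac{\exp(\eta_{ij})}{1 + \sum_{j'=1}^{J} \exp(\eta_{ij'})}$$

for each j = 1,2,…,J, and the last cell corresponding to the event ‘not captured’ is:

$$\pi_{i,J+1} = \frac{1}{1 + \sum_{j=1}^{J} \exp(\eta_{ij})}$$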

This independent multinomial observation model is a misspecification of the true observation model for the single-catch trapping system. This is because competition for single-catch traps renders the independence assumption invalid. As Efford, Borchers & Byrom (2009) noted, we expect ‘bias to be small when trap saturation (the proportion of traps occupied) is low. Trap saturation will be higher when population density is high...’, relative to trap density, or when net encounter probability is high. Efford, Borchers & Byrom (2009) conducted a simulation study that indicated essentially no effective bias by the misspecification, concluding that estimators of density from the misspecified independent multinomial model are robust to the mild dependence induced when trap saturation is low. Conversely, properly specifying the likelihood for the single-catch system is challenging and, so far, has eluded formal characterization by researchers. Adequate approximation of the single-catch system by the independent multinomial model can be facilitated by placing 2 or more traps at each point so as to reduce the risk of saturation.

It is easy to build additional covariates into this model including those that vary by sample occasion. For example, to model a behavioural effect (which we do in the example below), let Cik be a covariate of previous encounter (i.e. Cik=0 before the occasion of first capture, and Cik = 1 thereafter), then

$$\mathrm{mlogit}(\pi_{ijk}) = \alpha_0 + \alpha_1\,\mathrm{dist}(x_j, s_i)^2 + \alpha_2 C_{ik}$$

We note, in this case, the multinomial probabilities depend not only on individual and trap, but also on sample occasion.
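The multinomial logit construction for single-catch traps, including an optional behavioural term, can be sketched as follows. This is an illustrative Python sketch rather than the authors' R/JAGS implementation; the function names, trap coordinates and parameter values are assumptions made for the example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def cell_probs(traps, centre, alpha0, alpha_dist, alpha_behave=0.0, prev_caught=0):
    """Return J + 1 multinomial cell probabilities.

    Elements 0..J-1 are the probabilities of capture in each trap;
    the final element is the probability of 'not captured'.
    """
    eta = [alpha0 + alpha_dist * sq_dist(t, centre) + alpha_behave * prev_caught
           for t in traps]
    weights = [math.exp(e) for e in eta]
    denom = 1.0 + sum(weights)  # the 1 is the 'not captured' reference cell
    return [w / denom for w in weights] + [1.0 / denom]

# Illustrative (assumed) values.
traps = [(0.0, 0.0), (25.0, 0.0), (50.0, 0.0)]
centre = (5.0, 0.0)
sigma = 20.0
alpha_dist = -1.0 / (2.0 * sigma ** 2)

pi_naive = cell_probs(traps, centre, alpha0=-1.0, alpha_dist=alpha_dist)
pi_recap = cell_probs(traps, centre, alpha0=-1.0, alpha_dist=alpha_dist,
                      alpha_behave=1.3, prev_caught=1)
print(pi_naive)
print(pi_recap)
```

With a positive behavioural coefficient, the ‘not captured’ probability falls once an individual has been caught before, which is the trap-happy pattern.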

Activity centre model

The latent variables si for each i = 1, 2, …, N are the home range or activity centres of each individual. They are regarded as latent variables, which we accommodate in analysis of the model using standard methods of Bayesian analysis, although we note that SCR models can also be fitted by standard likelihood methods for analysing random effects. We assume the activity centres are uniformly distributed in the vicinity of the trap array (Borchers & Efford 2008; Royle & Young 2008).

Modelling group structure and population size

To extend the activity centre model to the context of group-structured data, let $\mathcal{S}_g$ denote the two-dimensional state-space of the random variable s for an individual belonging to group g = 1, 2, …, G. Normally, this will be a polygon around trap array g. Then, conditional on knowing which group individual i belongs to, that is, the value of $g_i$, we suppose that $s_i \mid g_i \sim \mathrm{Uniform}(\mathcal{S}_{g_i})$. That is, given that individual i has group membership $g_i$, its activity centre $s_i$ is uniformly distributed over the state-space $\mathcal{S}_{g_i}$.

The state-space $\mathcal{S}_g$ for each grid should be chosen so that individuals having activity centres near the edge of the state-space have a negligible probability of encounter, which can be determined based on inspecting the magnitude of the parameter σ.
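One way to check a candidate state-space is to compute how far the half-normal encounter probability has decayed at the edge, relative to its value at distance zero. The sketch below is illustrative Python, not the authors' code. It builds a rectangular state-space by buffering an 11 × 11 grid with 25 m spacing (the layout used in the application); the 50 m buffer and σ = 20 m are assumed values. With that assumed buffer, the area works out to 12·25 ha, which coincides with the implied area noted under Table 1.

```python
import math
import random

def state_space_bounds(traps, buffer_width):
    """Rectangle enclosing the traps, expanded by buffer_width on every side."""
    xs = [x for x, _ in traps]
    ys = [y for _, y in traps]
    return (min(xs) - buffer_width, max(xs) + buffer_width,
            min(ys) - buffer_width, max(ys) + buffer_width)

def edge_encounter_ratio(buffer_width, sigma):
    """Half-normal encounter probability at the edge relative to distance zero."""
    return math.exp(-buffer_width ** 2 / (2.0 * sigma ** 2))

def sample_centre(bounds, rng):
    """Draw one activity centre uniformly over the rectangular state-space."""
    xmin, xmax, ymin, ymax = bounds
    return (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))

# 11 x 11 grid with 25 m spacing; buffer and sigma are assumed for illustration.
traps = [(25.0 * i, 25.0 * j) for i in range(11) for j in range(11)]
bounds = state_space_bounds(traps, buffer_width=50.0)
area_ha = (bounds[1] - bounds[0]) * (bounds[3] - bounds[2]) / 10000.0
ratio = edge_encounter_ratio(buffer_width=50.0, sigma=20.0)

rng = random.Random(1)
centre = sample_centre(bounds, rng)
print(area_ha, ratio, centre)
```

If the ratio is not small for the estimated σ, the buffer should be widened and the model refitted.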

It is our primary objective to develop models of variation in the group-specific population size parameters Ng (one for each of the G groups) using the individual level encounter observations yijk from each sample group. In the context of ordinary closed population models, Royle, Converse & Link (2012) showed that choice of certain models for population size parameters Ng implies a specific prior distribution for the group membership variables, gi. Therefore, SCR models can be formulated in terms of individual encounter histories and these individual effects (gi and si).

As a specific illustration, a natural model for variation among sample units is to assume $N_g \sim \mathrm{Poisson}(A_g \lambda_g)$, where $A_g$ is the area of the state-space of group g, and we model factors that influence abundance on the local density parameter $\lambda_g$. Assuming, without loss of generality, that $A_g = 1$, then the standard log-linear model is:

$$\log(\lambda_g) = \beta_0 + \beta_1 x_g$$

where xg is some group-specific covariate (e.g. a treatment effect). We could allow for excess-Poisson variation by including an additive random effect or similar (we do this in our application below). We assume here that the population sizes Ng are mutually independent of one another, which should preclude individuals belonging to >1 population. In practice, this would suppose that the trapping grids are spaced sufficiently far apart, and sampling is sufficiently short in duration, so as to preclude movement of individuals among populations.

An equivalent way to parameterize this Poisson abundance model (Converse & Royle 2012; Royle, Converse & Link 2012) is in terms of the total population size $N_T$, and then any G − 1 of the group-specific population size parameters. In other words, note that the joint distribution of the $N_g$ parameters conditional on their total is multinomial with ‘sample size’ (or multinomial index) $N_T = \sum_g N_g$ and cell probabilities $\theta_g$ where

$$\theta_g = \frac{\lambda_g}{\sum_{g'} \lambda_{g'}}$$

This representation of the model is convenient for computational reasons we discuss below.
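The equivalence between independent Poisson group sizes and a Poisson total allocated multinomially can be illustrated by simulation. The following Python sketch is an illustration under assumed values of λ, not part of the authors' R/JAGS implementation; the helper names are hypothetical, and the Poisson sampler uses Knuth's multiplication method because the Python standard library does not provide one.

```python
import math
import random

def cell_probabilities(lam):
    """theta_g = lambda_g / sum of all lambda."""
    total = sum(lam)
    return [l / total for l in lam]

def rpois(mean, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small means)."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_group_sizes(lam, rng):
    """Draw N_T ~ Poisson(sum lambda), then allocate individuals to groups."""
    theta = cell_probabilities(lam)
    n_total = rpois(sum(lam), rng)
    counts = [0] * len(lam)
    for g in rng.choices(range(len(lam)), weights=theta, k=n_total):
        counts[g] += 1
    return counts

rng = random.Random(1)
lam = [5.0, 8.0, 12.0]  # assumed group-specific expected abundances
reps = 4000
draws = [simulate_group_sizes(lam, rng) for _ in range(reps)]
means = [sum(col) / reps for col in zip(*draws)]
print(means)  # each mean should be close to the corresponding lambda_g
```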

Bayesian analysis

We use an implementation of data augmentation (Tanner & Wong 1987) for capture–recapture models (Royle, Dorazio & Link 2007; Royle & Dorazio 2012; Kéry & Schaub 2012), which provides a flexible framework for analysing capture–recapture models with individual effects. In the present context, the model contains a number of latent variables including the individual activity centres s and the categorical variable g (‘group membership’). A technical challenge faced in the Bayesian analysis of such models is that the dimension of the parameter space (i.e. total population size, NT) is also an unknown parameter. Data augmentation fixes the parameter space dimension and also provides a convenient implementation of the model in the BUGS language. As a practical matter, analysis by data augmentation allows us essentially to ‘stack up’ all of the data sets together, that is, from the G distinct trap arrays, and analyse a single, large data set.

The fact that the total population size NT is unknown is addressed using data augmentation by augmenting the data set with a large number of all-zero encounter histories, bringing the total size of the data set up to M >> NT where M is fixed by the analyst (Royle, Dorazio & Link 2007). The excess zeros are accommodated by expanding the model to include the data augmentation variables zi where zi = 1 if individual i (from the augmented list) is a member of the population of size NT or zi = 0 if the individual is not. In effect, these latent variables partition the augmented zeros into sampling zeros and fixed or structural zeros. Formally, we impose the prior: zi∼Bern(ψ), with ψ the parameter to be estimated, and ψ takes the place of NT but the two are equivalent parameters in the sense that E[NT] = ψM. In addition to the data augmentation variables z, we assume the group membership variables are categorical random variables:

$$g_i \sim \mathrm{Categorical}(\theta_1, \ldots, \theta_G)$$

where θg = λg/∑gλg. Altogether, these assumptions preserve the correct marginal prior for Ng, that is, this model assumption for gi implies that the collection of population size parameters {Ng} has a multinomial distribution, conditional on the total NT. Equivalently, we can regard the Ng as iid Poisson random variables with mean λg.

With data augmentation, we need to choose M sufficiently large so that the posterior mass of ψ is not too near 1·0 (see Kéry & Schaub 2012, Fig. 6.2). We implemented this model in JAGS (Plummer 2009) via the R programming environment and the R2jags package. We provide an R script in the appendix.
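The structure that data augmentation imposes can be sketched by padding the observed encounter histories with all-zero rows and then drawing the latent indicators z and group labels g from their priors. This is an illustrative Python sketch under assumed values of M, ψ and θ, not the JAGS model itself. The histories are simplified to one captured/not-captured entry per occasion, and in a real analysis the z and g values of captured individuals are known rather than drawn.

```python
import random

def pad_with_zeros(histories, M):
    """Append all-zero encounter histories until the data set has M rows."""
    extra = M - len(histories)
    if extra < 0:
        raise ValueError("M must be at least the number of observed individuals")
    n_occasions = len(histories[0])
    return histories + [[0] * n_occasions for _ in range(extra)]

def draw_latent_state(M, psi, theta, rng):
    """Prior draw of z_i ~ Bern(psi) and g_i ~ Categorical(theta) for M rows."""
    z = [1 if rng.random() < psi else 0 for _ in range(M)]
    g = rng.choices(range(len(theta)), weights=theta, k=M)
    n_by_group = [0] * len(theta)
    for zi, gi in zip(z, g):
        n_by_group[gi] += zi  # N_g counts only rows that are real individuals
    return z, g, n_by_group

rng = random.Random(42)
observed = [[1, 0, 1], [0, 1, 0]]  # two captured individuals, three occasions
M, psi = 2000, 0.4                 # assumed values for illustration
theta = [0.2, 0.3, 0.5]

augmented = pad_with_zeros(observed, M)
z, g, n_by_group = draw_latent_state(M, psi, theta, rng)
n_total = sum(z)
print(n_total, n_by_group)  # n_total should be near psi * M = 800
```

The group sizes are derived quantities here: each Ng is the number of rows with zi = 1 and gi = g, so they always sum to NT.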


We applied this model to SCR data on small mammals described by Converse, White & Block (2006). The data were collected using standard live traps, and thus, they represent the type of multinomial ‘single-catch’ traps described in 'Data and models', although we note that a subset of the sample locations contained 2 traps, a design feature that could be applied generally to mitigate trap saturation and resulting non-independence. The data were collected as part of an effort to understand the impacts of fuel reduction treatments on small-mammal populations at 2 replicate study sites in northern New Mexico (Fig. 1; the Jemez Mountains Study area of the National Fire and Fire Surrogate Study; McIver, Boerner & Hart 2008), with trapping over 3 years (2001–2003) in each of 4 replicate experimental units per study site (i.e. number of groups G = 24, 8 units by 3 years, in this example). The experimental design included plans for thinning, burning and thinning–burning combination treatments, as well as a control, at each study site. However, during the period when these data were collected, the thinning-only treatment was completed on a single experimental unit at the JM-B study area (see Converse, White & Block 2006: 1713), and at the JM-C study area, all four experimental units were burned in a wildfire. Both the thinning treatment and the wildfire took place between the 2002 and the 2003 study seasons.

Figure 1. Jemez Mountains Study Area from Converse et al. (2006).

Trapping was conducted over 10 occasions (2 per day, morning and evening) at each experimental unit, with half the units at each site randomly selected in the first year for trapping in trapping session 1, which lasted 5 days, and half in session 2, an additional 5 days. The assignment to session then alternated over years. In 2001, the traps in each experimental unit were configured in a 6 × 6 grid, with 50 m between each trap. After a pilot project to assess the effects of trap spacing (Converse et al. 2004), the trap density was increased such that there were 25 m between traps, and so the grid was an 11 × 11 grid with 121 total trap stations. Multiple species were captured in the trapping grids, but we base our analyses on the genus with the largest number of captures, Peromyscus spp.; this was primarily composed of the deer mouse (Peromyscus maniculatus).

The detection model is related to covariates through the multinomial logit transform in which the trap-specific encounter probabilities are given by Eq. 1. In the application, we have

$$\mathrm{mlogit}(\pi_{ijk}) = \eta_{ijk} = \alpha_{0,g_i} + \alpha_1 C_{ik} + \alpha_{2,g_i}\, d_{ij}^2 + \alpha_3 \mathrm{AM}_k$$

where $d_{ij} \equiv \mathrm{dist}(s_i, x_j)$, $\alpha_{0,1}, \ldots, \alpha_{0,G}$ are group-specific intercepts, which we treat as random effects here, $\alpha_1$ is the behavioural response parameter, $C_{ik}$ is a covariate of previous encounter (i.e. $C_{ik} = 0$ before the occasion of first capture, and $C_{ik} = 1$ thereafter), and $\alpha_{2,g}$ is a group-specific coefficient on distance (related to $\sigma_g$ by $\alpha_{2,g} = -1/(2\sigma_g^2)$), allowing for the possibility that treatments influence home range size. The covariate $\mathrm{AM}_k$ is an indicator of whether the sample occurred in the morning ($\mathrm{AM}_k = 1$) or in the evening ($\mathrm{AM}_k = 0$). Therefore, the coefficient $\alpha_3$ represents the change in encounter probability (on the multinomial logit scale) for an overnight trap session. Because deer mice are nocturnal, capture probability is expected to be much higher in the morning than in the evening.

To accommodate differences in trap array configuration (e.g. 6 × 6 vs. 11 × 11 grids), we introduce a trap-operation matrix, a, where $a_{gjk} = 1$ if, for group g, trap j is operational during period k and $a_{gjk} = 0$ otherwise. Then, we include trap availability as a multiplier of $\exp(\eta_{ijk})$ so that, in the multinomial logit transform, the cell probability is zeroed out for an inoperative trap. A similar approach could be used if, in practice, certain traps were present but not operational during certain occasions. This could occur, for example, if traps were sprung or damaged by animals.
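The effect of the trap-operation indicator on the multinomial cell probabilities can be sketched as follows. This is an illustrative Python sketch, not the authors' implementation; the function name and the linear-predictor values are assumptions made for the example.

```python
import math

def masked_cell_probs(eta, operational):
    """Multinomial logit cell probabilities with inoperative traps zeroed out.

    eta         : linear predictors, one per trap
    operational : 0/1 indicators, one per trap (1 = trap is operating)
    Returns J capture probabilities followed by the 'not captured' probability.
    """
    weights = [a * math.exp(e) for e, a in zip(eta, operational)]
    denom = 1.0 + sum(weights)
    return [w / denom for w in weights] + [1.0 / denom]

eta = [-1.0, -2.0, -1.5]  # assumed values for three traps
pi_all = masked_cell_probs(eta, [1, 1, 1])
pi_masked = masked_cell_probs(eta, [1, 0, 1])  # second trap not operating
print(pi_all)
print(pi_masked)
```

Because the mask enters both numerator and denominator, the remaining cells are renormalized and the probabilities still sum to 1.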

For the abundance model, we assume that Ng is Poisson with mean

$$\log(\lambda_g) = \beta_{0,\mathrm{site}} + \sum_{m=1}^{4} \beta_m x_{g,m}$$

where $\beta_{0,\mathrm{site}}$ is a random block effect that applies to each site and $x_g = (x_{g,1}, \ldots, x_{g,4})$ is a vector of population-specific covariates. In our analysis here, $x_g = (\mathrm{year1}_g, \mathrm{year2}_g, \mathrm{thin}_g, \mathrm{fire}_g)$, where year1 and year2 are dummy variables indicating years 2001 and 2002, that is, $\mathrm{year1}_g = 1$ if group g occurred in 2001 and $\mathrm{year2}_g = 1$ if group g occurred in 2002; thin and fire are binary treatment indicators, with $\mathrm{thin}_g = 1$ if group g was a thinned experimental unit and $\mathrm{fire}_g = 1$ if group g was a burned experimental unit.

We used proper uniform prior distributions for each of the regression coefficients: βm∼Unif(−10,10) for m = 1,2,3,4, α1∼Unif(−10,10), and α2∼Unif(−10,10). For the group-specific intercept parameters β0,site, we assumed:

$$\beta_{0,\mathrm{site}} \sim \mathrm{Normal}(0, \sigma_{\beta}^2)$$

with inline image. The mean of the normal distribution for β0,site is 0 because the intercept of the abundance model is confounded with the data augmentation parameter ψ. That is, ψ is providing the information on the total abundance, which is equivalent information to the intercept in the abundance model (Royle, Converse & Link 2012). The effect of this site-specific random effect is to induce extra-Poisson variation in the group-specific abundance parameters Ng. While it is convenient to use the normal distribution on the  log(λ) scale, as we do here, a negative-binomial model for Ng could be constructed by including log-gamma noise in the model for  log(λ) (Royle, Converse & Link 2012).

For the group-specific intercept parameters $\alpha_{0,g}$, we assumed independent normal priors

$$\alpha_{0,g} \sim \mathrm{Normal}(\mu_p, \sigma_p^2)$$

and flat priors on the hyperparameters μp and standard deviation: μp∼Unif(−10,10), inline image. We assumed a normal prior for α2,g also, having parameters inline image and standard deviation inline image.

Posterior summaries of model parameters are shown in Table 1, where we see a positive response of deer mouse population density to both thinning (β [thin]) and wildfire (β [fire]). A graphical summary of group-specific density estimates is shown in Fig. 2. There were also reasonably strong annual effects on density. Overall density of Peromyscus spp., across all groups, was estimated to be 2·5 per ha. The conclusion that both thinning and fire had a positive effect on density of deer mice supports the general findings of Converse, White & Block (2006). We also found strong trap-happy responses (i.e. animals that had been trapped previously had a higher capture probability, see α1 in Table 1).

Table 1. Posterior summaries (mode, 2·5 and 97·5 percentiles) for parameters of the single-catch spatial capture–recapture model, and the abundance model describing variation in population size among groups, for the joint estimation and modelling of density of Peromyscus spp. on experimental units at the Jemez Mountains Study Area, New Mexico. See text for explanation of parameters

| Parameter | Estimate | 95% Lower | 95% Upper |
| --- | --- | --- | --- |
| Observation Process | | | |
| inline image | −1·27 | −1·50 | −1·08 |
| inline image | 0·44 | 0·34 | 0·68 |
| α1 (behave) | 1·30 | 1·10 | 1·53 |
| α3 (AM) | 3·91 | 3·68 | 4·15 |
| Ecological Process | | | |
| β [seas 1] | −0·56 | −0·74 | −0·37 |
| β [seas 2] | −0·17 | −0·36 | 0·00 |
| β [seas 3] (a) | 0·74 | 0·53 | 0·93 |
| β [fire] | 0·49 | 0·19 | 0·87 |
| β [thin] | 0·63 | 0·14 | 1·07 |

a. Only 2 fixed season effects are separately estimable. The third effect = −1*(β1[seas1]+β1[seas2]).
b. Overall population size is the total across all 24 groups, each with an implied area = 12·25 ha.
c. Overall density is reported as individuals/m2.


Spatial capture–recapture models are a relatively new class of models for inference about animal density and other population parameters (Efford 2004; Royle & Young 2008; Borchers 2012). However, SCR data are not always collected as single isolated studies. Instead, frequently a number of replicate trap arrays are used. Often this is motivated by specific objectives, for example, the trap arrays represent experimental replicates, and oftentimes just to ensure valid estimates of density by obtaining a representative sample of space within some region. From a modelling and inference perspective, there is a need to combine data from multiple arrays or sites in a single unified model that accommodates explicit sources of variation in density among sites. This is naturally accomplished by developing an explicit model for variation in N among groups, for example, a Poisson GLM or similar.

Figure 2. Abundance estimates for Peromyscus spp. per experimental unit (with area = 5·0625 ha) for each of 24 groups composed of eight experimental units in year 1 (groups 1:8), and the same eight experimental units in year 2 (groups 9:16) and year 3 (groups 17:24) at the Jemez Mountains Study Area, New Mexico. Point estimates (filled circles) are posterior modes, and error bars reflect 95% credible intervals. Also shown are the number of individuals captured per group (open circles).

In this study, we extended SCR models to allow for modelling variation in N with explicit assumptions on N. We adopted a data augmentation strategy for structured populations (Converse & Royle 2012; Royle, Converse & Link 2012) and extended this to the SCR observation model, and applied that model to data from a study of forest disturbance effects on small-mammal populations. Thus, our analysis combines two important technical features, hierarchical modelling of replicate capture–recapture studies, and SCR, in one synthetic analysis for jointly estimating and modelling densities across replicates. Implementation in a Bayesian framework allows for modelling of individual effects (and hence makes SCR possible) but also facilitates efficient modelling of nuisance variation via hierarchical structures (e.g. on detection parameters, block effects or time effects).

Our small-mammal trapping case study comes from Converse, White & Block (2006), who used a three-step process to complete the analysis of these data: first a closed capture–recapture analysis to estimate abundance, second an analysis of mean maximum distance moved (Wilson & Anderson 1985) to allow conversion of abundance to density, and third a weighted regression analysis of the resulting density estimates. The weighted analysis was necessary to accommodate the non-zero sampling covariances resulting from the first 2 steps. The analysis shown herein is both more streamlined and also integrates the improvements that SCR methods bring to the analysis of capture–recapture data. In addition, the Bayesian analysis we present makes the use of hierarchical structures simple, such as the random effects for modelling variation in components of detection. Converse & Royle (2012) showed that the use of random effects for modelling variation in components of detection provides a good compromise between model complexity and parsimony, and can result in the lowest root mean square error in analyses of replicated capture–recapture data.

The R package secr (Efford 2011) implements an estimator for ‘multiple sessions’ that can be applied to data from multiple trap arrays or other meaningfully stratified data. The multisession model in secr arises by an explicit Poisson assumption on N, but uses a classical likelihood analysis in which the parameters Ng are removed from the likelihood by marginalizing the conditional-on-N likelihood over a Poisson prior (Borchers & Efford 2008). One advantage of our Bayesian formulation based on data augmentation is that it enables direct implementation in widely available software (WinBUGS, JAGS, OpenBUGS) and is more versatile in terms of the model specification. For example, here, we allowed for multiple group-specific random effects in the detection model, which is not accommodated in the secr package.