Animal abundance is traditionally estimated using methods based on visual observations. Therefore, methodological development has focussed on visually acquired data. Abundance and density estimation methods based on visual data build almost exclusively on one of two different inferential approaches: mark-recapture (MR) and distance sampling (DS). A recent development, spatially explicit capture-recapture (SECR), blends the two methods. Note that we use the terms ‘capture-recapture’ and ‘mark-recapture’ interchangeably; and also that the animals are often not strictly captured or recaptured (e.g. detection via camera traps, hair snares or acoustic sensors). The key for MR is the ability to recognize whether an animal has already been detected or whether the detection represents a first encounter.
(a) Census and plot sampling
Ideally, one would like to count all the animals in the target population, i.e., implement a census. However, situations in which this is possible are rare, and usually require small populations occupying restricted areas. Hence, to obtain abundance estimates, investigators must often rely on sampling.
Although a total count of the population is seldom possible, it might still be possible to perform a total count over some randomly chosen plots. This allows density estimates for the survey area to be obtained using conventional sampling methods, often referred to as strip transects or plot sampling. However, these methods are often abused, being applied in situations where the key assumption, that all animals in the survey plots are detected, is false. This leads to an underestimation of density.
Plot sampling is usually a design-based approach: sampled plots are assumed to be a random sample of a larger number of plots, and hence the density estimated over these is valid for the wider survey area. The abundance over the entire survey area needs to account for the proportion of the area surveyed (assuming a simple random sampling scheme). Alternatively, one could consider a model-based approach, where inferences over the wider survey region are based on a model which relates abundance to covariates. Hence the distinction between a design-based and model-based approach is that in the former the known properties of the random design are used to link what was observed in the sample to the rest of the study area, while in the latter a model of animal distribution is used to make this link (e.g. regression approach in which density is predicted as a function of covariates).
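The design-based plot-sampling calculation can be sketched numerically. The plot counts, plot area and survey area below are invented for illustration, and a simple random sampling scheme with equal-sized plots is assumed.

```python
# Design-based plot sampling: complete counts on randomly placed plots,
# scaled up to the wider survey area. All numbers are hypothetical.

def plot_sampling_estimates(counts, plot_area, survey_area):
    """Density from complete counts on sampled plots, and abundance scaled to the survey area."""
    covered_area = len(counts) * plot_area   # total area actually searched
    density = sum(counts) / covered_area     # animals per unit area
    abundance = density * survey_area        # scale up by the area ratio
    return density, abundance

# e.g. 5 plots of 0.5 km^2 each, in a 100 km^2 survey region
density, abundance = plot_sampling_estimates([3, 0, 5, 2, 1], 0.5, 100.0)
print(density, abundance)  # about 4.4 animals/km^2 and 440 animals in total
```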
(b) Distance sampling
The probability of detecting an animal typically depends on its distance from the observer or sensor. A statistical method called ‘distance sampling’ uses detection distances to estimate the area effectively searched, or equivalently the average probability of detection within some fixed truncation distance (Buckland et al., 2001, 2004). This is then used to correct the observed number of individuals, or groups, for those that went undetected. The methods rely on the random placement of a sufficiently large number of line or point transects over the area of interest. Typically a systematic design is used to ensure good coverage of the entire area. The distances to the detected animals are used to model a detection function. The detection function, g(y), represents the probability of detecting an animal, given that it is located at distance y from the transect. The distance y corresponds to a perpendicular or radial distance depending on whether line or point transects are used. It can be shown (see, e.g. Borchers & Burnham, 2004, pp. 16–17) that the average probability p of detecting an animal in the covered area is given by
p = ∫₀^w g(y) π(y) dy,

where w is a distance beyond which detections are ignored or assumed not to occur, usually referred to as the truncation distance, and π(y) is the distribution of distances to all animals, detected or not. Note this is an intuitive estimator, as it just represents the mean value of g(y) with respect to the available distances y. This p is then plugged into an estimator like the one presented in Equation (2).
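The average detection probability can be approximated numerically. The sketch below assumes a half-normal detection function (a common model choice, not prescribed by the text) and a uniform π(y) = 1/w, as for line transects; the values of σ and w are arbitrary and purely illustrative.

```python
import math

# Average detection probability within truncation distance w, for a
# half-normal detection function g(y) = exp(-y^2 / (2 sigma^2)) and a
# uniform pi(y) = 1/w (line transects). Parameter values are illustrative.

def average_detection_prob(sigma, w, n_steps=10000):
    """Approximate p = integral from 0 to w of g(y) pi(y) dy by the midpoint rule."""
    dy = w / n_steps
    total = 0.0
    for i in range(n_steps):
        y = (i + 0.5) * dy
        g = math.exp(-y**2 / (2 * sigma**2))  # detection probability at distance y
        total += g * (1.0 / w) * dy           # weight by uniform pi(y)
    return total

p = average_detection_prob(sigma=50.0, w=100.0)
print(round(p, 3))  # 0.598 for these illustrative values
```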
A particular type of distance sampling is cue counting (Hiby & Ward, 1986), in which instead of detecting animals, one detects cues produced by them. This was originally developed for estimating whale density from whale blows, and is useful in general when it is possible to detect and count some cue (such as whale blows) but hard to determine which individual produced which cue. Instead, the density of cues is estimated (e.g. whale blows per unit area per unit time), and this is divided by an independently derived estimate of the average rate at which an animal produces cues (e.g. number of blows per unit time). In the original implementation, data on whale blows were collected along line transects, but only a radial sector ahead of the ship was surveyed, such that the methods are more closely related to point than line transects. Cue counting has also been applied to aural surveys of birds from points, where the cue is each individual call or song detected (e.g. Buckland, 2006). Again, independent information about cue rate (i.e. average number of calls or songs per unit time) is required to convert cue density to bird density.
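The cue-counting conversion is a simple division of cue density by cue rate; the whale-blow numbers below are hypothetical.

```python
# Cue counting: convert an estimated cue density into animal density by
# dividing by an independently estimated cue production rate.
# All numbers below are invented for illustration.

def animal_density_from_cues(cue_density_per_area_time, cue_rate_per_animal):
    """Cues per unit area per unit time, divided by cues per animal per unit time."""
    return cue_density_per_area_time / cue_rate_per_animal

# e.g. 12 whale blows per km^2 per hour, each whale blowing 6 times per hour
d = animal_density_from_cues(12.0, 6.0)
print(d)  # 2.0 whales per km^2
```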
Another approach applies when animals naturally occur in clusters, and the clusters become the object of analysis. Traditionally, the approach taken has been to obtain a density of clusters and then multiply that by an estimate of the mean cluster size in the population (see, e.g. Buckland et al., 2001, pp. 71–76). Larger clusters are often easier to detect than smaller ones, leading to a potential bias when determining population mean cluster size; this is often dealt with using multiple covariate distance sampling (MCDS, see Marques & Buckland, 2003; Marques et al., 2007).
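The cluster-based calculation can be sketched as follows. The cluster density and observed sizes are invented; note the naive sample mean used here would, in practice, be replaced by a size-bias-corrected estimate of population mean cluster size (e.g. via MCDS).

```python
# Clustered populations: individual density = cluster density x mean cluster size.
# All values are hypothetical; the naive mean below ignores size-biased
# detection, which MCDS methods are designed to correct.

cluster_density = 0.8             # clusters per km^2 (assumed estimate)
observed_sizes = [2, 5, 3, 4, 6]  # sizes of detected clusters
mean_size = sum(observed_sizes) / len(observed_sizes)  # naive mean: 4.0
individual_density = cluster_density * mean_size
print(individual_density)  # 3.2 individuals per km^2
```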
Unbiased estimation from conventional distance sampling methods requires a number of assumptions, which we address in turn. Often overlooked, but strictly an assumption, the above formula is only useful because the distribution of animals π(y) is assumed known: uniform for line transects, and triangular for point transects. These distributions reflect how the area available for detection changes with distance from the samplers, and they are a direct consequence of transects being placed randomly within the study area; hence this is a safe assumption under a proper survey design. However, in some poorly designed surveys, transects are placed along existing landscape features, such as roads, rivers or shorelines. In this case, because the animals might also present a density gradient with respect to these features, the form of π(y) is unknown and might not be estimable from the conventional data. This can lead to severely biased estimates (e.g. Marques et al., 2010).
Additionally, we assume that: (i) animals on the line or at the point are detected with certainty, i.e. g(0) = 1; (ii) the animals do not move, or the observation process is conceptually a snapshot, i.e. instantaneous in time; (iii) distances are measured without error; and (iv) detections are statistically independent events. The methods are robust to violation of some of these assumptions, in particular (iv), which generally affects only variance estimates. For the other assumptions, mild violation is unlikely to lead to serious problems, but moderate or severe violation can lead to considerable bias and should be avoided. Investigators should spare no effort to fulfil these assumptions through study design and field methods, rather than dealing with their failure at the analysis stage. We address the consequences of their failure in turn below. If a fraction of animals on the line is not detected [g(0) < 1], then density estimates are biased low by the same proportion. This occurs both when observers fail to detect animals that are available for detection (perception bias) and when a fraction of the animals is unavailable for detection, say submerged or underground (availability bias). To address this assumption failure, in particular for perception bias, mark-recapture distance sampling methods have been developed that allow the estimation of g(0) (Laake & Borchers, 2004).
Although strictly a snapshot means an ‘instant’ in time, a period of time of negligible length, in practice what is required is that the period is such that animal movement is negligible within the time interval. If observers move considerably faster than the animals themselves, then bias from this source can be safely ignored. However, for highly mobile animals and in particular for point transects (in which by definition the observer stands still), even random movement can lead to considerable overestimation of density. Perhaps even more important, severe bias might result from unobserved responsive movement, typically overestimation of density if animals are attracted to the observer, and underestimation of density if animals avoid the observer. This assumption has received less attention in the literature, likely because it is difficult to obtain information about movements of unobserved animals.
The consequence of measurement error in estimated distances is very similar to that of animal movement. Random errors will typically lead to an overestimation of density (Marques, 2004), while underestimation and overestimation of distances will lead respectively to overestimation and underestimation of density. Provided the measurement error process can be modelled, this bias can be corrected (e.g. Borchers et al., 2010).
The independence assumption is required to estimate the parameters of the detection function model by maximum likelihood, but density estimates are extremely robust to its failure. While variance estimates are more likely to be affected, the recommended procedures, using an empirical estimator for the variance, are also very robust to this assumption failure (e.g. Buckland et al., 2001).
(c) Mark-recapture
A conceptually different approach to abundance estimation is mark-recapture (MR). Chapter 6 in Borchers et al. (2002) presents an overview of simple MR. This method requires the ability to recognize individuals within the population being studied. Historically this was achieved by physically marking the animals; increasingly, other means of individual recognition, such as photographic identification and genetic markers, are used. In the context of acoustic surveys, individual vocalizations sufficiently distinct to allow individual recognition would be required. The fundamental concept underlying MR is intuitive. One collects a sample of n animals and marks them, so that an unknown fraction of the population, n/N, becomes marked. A second sample is then drawn. Given random mixing of animals between samples, the proportion of marked animals in the new sample, p̂, is an estimate of the proportion of marked animals in the population, and an estimate of population size is given by N̂ = n/p̂. This is called the Lincoln-Petersen estimator, but it is rarely used nowadays because it rests on a number of unrealistic assumptions, namely that the population is closed (i.e. no deaths, births, immigration or emigration occur between capture occasions) and that all animals have the same probability of being captured (detected). When the latter is not true, estimates are biased low, a problem known as unmodelled heterogeneity in capture probabilities (see e.g. Link, 2003, for details and examples). Because not all animals have the same characteristics, some are more detectable than others. Hence, the sampled animals tend to be biased towards the more detectable animals, animal detection probability tends to be overestimated, and abundance underestimated.
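A minimal sketch of the Lincoln-Petersen calculation, with made-up sample sizes; it carries all the closed-population and equal-catchability assumptions discussed above.

```python
# Lincoln-Petersen estimator: mark n1 animals, draw a second sample of n2,
# of which m2 are found to be marked. Numbers below are hypothetical, and the
# estimator assumes a closed population with equal capture probabilities.

def lincoln_petersen(n1, n2, m2):
    """N_hat = n1 / p_hat, where p_hat = m2 / n2 is the marked proportion in sample 2."""
    p_hat = m2 / n2
    return n1 / p_hat

N_hat = lincoln_petersen(n1=100, n2=80, m2=20)
print(N_hat)  # 400.0
```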
MR methods have evolved from purely closed population models to methods capable of dealing with open populations and incorporating multiple sources of heterogeneity in detection probabilities (hence reducing, but not really solving (Link, 2003), the issue of unmodelled heterogeneity). Nowadays MR methods are perhaps more commonly, and certainly less controversially, applied to obtain other relevant ecological parameters rather than abundance, such as survival.
Population size estimates derived from MR are not easily converted to density estimates, because the population being sampled is ill defined in most settings. The problem is that there is no rigorous way to assess the area that the sampling effectively covers: we have an estimate of N, but not the area to which it corresponds (see e.g. Efford, 2004, for details). The use of conventional MR estimates for density estimation therefore tends to be a distant second choice, but it is presented here because it provides a logical building block leading to the next method, spatially explicit capture recapture.
Note the close links between MR and DS; a combination of these two approaches, mark-recapture-distance-sampling (MRDS; for details and other references see Laake & Borchers, 2004), might help to address issues that neither of them can alone, by accounting simultaneously for availability bias and heterogeneity in detection probability (due to distance, and other relevant covariates).
(d) Spatially explicit capture recapture
The recent development of spatially explicit capture recapture (SECR; Efford, 2004; Borchers & Efford, 2008; Royle & Young, 2008) was motivated by two key issues in MR: (i) unmodelled heterogeneity in detected animals (i.e. not all animals have the same probability of being detected), and (ii) an ill-defined population (i.e. the surveyed area is defined ad hoc in MR). In SECR, the available information about the spatial location of the ‘captured’ animals (at the very least, the location of the ‘traps’ in which they are captured or detected) allows one to minimize issue (i) and resolve issue (ii) above. In the acoustic context, sounds are detected in multiple devices, rather than the same animal being detected over multiple ‘traps’. SECR combines both capture recapture and distance sampling models in a unifying framework (Borchers, 2012).
SECR was originally developed in the context of trapping studies of small mammals. A central concept is that of ‘home range’ centre: the home range does not need to have a biological meaning, and its centre is typically not observed. The probability of capturing an animal is modelled in terms of the distance from the traps to this unobserved location. This model is then used to obtain the detection probability associated with any given animal location. Because the home range centre for each individual is unobserved, it must be integrated out of the process. In layman's terms, it is equivalent to calculating an average detection probability, in which the average is with respect to all the positions where the animal's home range centre could be. The standard assumption is a uniform distribution in space, and the sum of the detection probabilities over space turns out to be the effective sampling area of a given set of traps, provided this assumption holds (Borchers, 2012). Part of the unmodelled heterogeneity in conventional MR often results from some animals being more likely to be detected because their home range centres are close to traps. The explicit inclusion of the location of the traps into the estimation procedure allows one to account for that component of heterogeneity. Further, the methods are based on a model for density that allows calculation of the effective sampled area for a given array of traps, and therefore density estimates can be obtained in a rigorous framework.
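The idea of averaging detection probability over unobserved home-range centres can be sketched numerically. The code below assumes a half-normal detection function and an invented 2 × 2 trap layout with illustrative parameters; summing, over a grid of possible centres, the probability of detection by at least one trap approximates the effective sampling area of the array.

```python
import math

# Sketch of the SECR 'effective sampling area': integrate, over a grid of
# candidate home-range centres, the probability that an animal centred there
# is detected by at least one trap. Trap layout and parameters are invented.

def effective_sampling_area(traps, g0, sigma, half_width=500.0, step=10.0):
    """Sum of p.(x) * cell_area over a square grid of centre locations x."""
    area = 0.0
    n = int(2 * half_width / step)
    for i in range(n):
        for j in range(n):
            x = -half_width + (i + 0.5) * step
            y = -half_width + (j + 0.5) * step
            p_none = 1.0
            for (tx, ty) in traps:
                d2 = (x - tx) ** 2 + (y - ty) ** 2
                p_k = g0 * math.exp(-d2 / (2 * sigma ** 2))  # half-normal detection
                p_none *= (1.0 - p_k)                        # missed by every trap
            area += (1.0 - p_none) * step * step             # detected by >= 1 trap
    return area

# 2x2 trap grid with 100 m spacing (hypothetical)
traps = [(0, 0), (100, 0), (0, 100), (100, 100)]
a = effective_sampling_area(traps, g0=0.9, sigma=60.0)
print(a)  # effective sampling area in m^2
```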
Different types of trap can be considered, and SECR methods have been applied to cage traps, hair snares, camera traps and acoustic detectors. Acoustic detectors are known as ‘proximity’ detectors: capture in one detector does not preclude capture in any other detector (unlike a cage trap, in which any one animal can be caught in only one trap on each capture occasion). This opens the door to SECR estimates based on a single capture occasion (see Efford, Dawson & Borchers, 2009), which is impossible with conventional MR methods. The basic data for SECR are capture histories (i.e. vectors coding when and where each animal was captured). Hence it is required that animals, or their cues, be individually recognizable.
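Capture histories can be coded, for example, as binary occasion-by-detector matrices; with proximity detectors one individual may appear at several detectors on the same occasion. The individuals, occasions and detections below are made up for illustration.

```python
# Building SECR-style capture histories from raw detections.
# Each history records, per occasion, at which detectors the individual
# (or its cue) was identified. All data below are invented.

detections = [  # (individual_id, occasion, detector_index)
    ("A", 1, 0), ("A", 1, 2), ("B", 1, 1),
    ("A", 2, 2), ("B", 2, 1), ("B", 2, 3),
]
n_occasions, n_detectors = 2, 4

histories = {}
for ind, occ, det in detections:
    # one binary occasion-by-detector matrix per individual
    h = histories.setdefault(ind, [[0] * n_detectors for _ in range(n_occasions)])
    h[occ - 1][det] = 1

print(histories["A"])  # [[1, 0, 1, 0], [0, 0, 1, 0]]
```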
(f) Variance estimation
Often overlooked, precision measures for density estimates are as important as point estimates, because only then can one draw meaningful inferences from the reported values. The same point estimate for a given population, say 1000 individuals, will have very different meaning if the respective 95% confidence interval is (900, 1200) versus (50, 10000). Therefore, reliable estimates of precision must be obtained. A useful and often reported precision measure is the coefficient of variation (CV, the standard error of the estimate divided by the estimate), which provides a measure of precision independent of the scale of the measurement units.
There are two general approaches one can take to estimate the variance of some arbitrary estimator, which we present in turn below.
(i) Analytic variance estimation
The density and abundance estimators we consider in this section are the product of a number of random components (r_m, m = 1, 2, …, M) and constants (q_k, k = 1, 2, …, K), i.e. having the generic form

D̂ = (q_1 × q_2 × ⋯ × q_K) × (r_1 × r_2 × ⋯ × r_M)
Typical constants relate to effort (e.g. recording time, number of sensors or line length) while the most obvious random component is the detection probability. Constants are manageable, as by definition they have no variance. If the variance in each of the random components can be quantified and the random components are independent, then
var(D̂) ≈ D̂² × [var(r_1)/r_1² + var(r_2)/r_2² + ⋯ + var(r_M)/r_M²]

(this is an approximation based on the delta method; see e.g. Seber, 1982, p. 9). The key issue is estimating the required variance from each of the random components. For averages, weighted averages, or functions of maximum likelihood estimates this is usually straightforward. Note that if the random components are not independent, one needs to account for the correlation structure between these random components. Otherwise, the variance will be underestimated or overestimated, depending on the correlation structure (see Powell, 2007, equation 2, for the general case, and equation 15 for an example).
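Under the delta method with independent multiplicative components, squared coefficients of variation add. A minimal sketch with invented component CVs:

```python
import math

# Delta-method approximation for a product of independent random components:
# CV(D_hat)^2 = sum of the components' squared CVs. Values are illustrative.

def product_cv(component_cvs):
    """CV of a product of independent estimates, from the components' CVs."""
    return math.sqrt(sum(cv ** 2 for cv in component_cvs))

# e.g. encounter rate CV 0.15, detection probability CV 0.10, cue rate CV 0.05
cv_total = product_cv([0.15, 0.10, 0.05])
print(round(cv_total, 4))  # 0.1871
```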
To obtain a confidence interval for density, the density estimator is often assumed to follow a log-normal distribution (e.g. Buckland et al., 2001, p. 77), leading to a (1-α)% confidence interval given by
(D̂/C, D̂ × C), where C = exp{z_{α/2} √[ln(1 + CV(D̂)²)]},

and z_{α/2} is the upper α/2 quantile of the standard Gaussian distribution. If some of the random multipliers are based on relatively small sample sizes then a t-distribution-based method can be used (Buckland et al., 2001, pp. 77–78).
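The log-normal interval takes the multiplicative form described in Buckland et al. (2001); the point estimate and CV below are illustrative, and z = 1.96 gives an approximate 95% interval.

```python
import math

# Log-normal confidence interval for a density estimate:
# C = exp(z * sqrt(ln(1 + CV^2))), interval = (D_hat / C, D_hat * C).
# The point estimate and CV are invented for illustration.

def lognormal_ci(d_hat, cv, z=1.959964):
    """Approximate 95% interval (for the default z) under a log-normal assumption."""
    var_log = math.log(1.0 + cv ** 2)      # variance of log(D_hat)
    c = math.exp(z * math.sqrt(var_log))   # multiplicative half-width
    return d_hat / c, d_hat * c

low, high = lognormal_ci(d_hat=2.0, cv=0.2)
print(round(low, 3), round(high, 3))
```

Note the interval is asymmetric about the point estimate, reflecting the right skew expected of density estimators.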
(ii) Bootstrap variance estimation
A different approach is to use resampling strategies to estimate the variance (e.g. Manly, 2007). The non-parametric bootstrap is the approach used most often. The idea is to resample, with replacement, the independent sampling units (e.g. transects or sensors) to build a new ‘bootstrap’ dataset, and to use this to obtain a new estimate of density. Repeating this procedure many times yields a set of density estimates, and the empirical variance of those estimates approximates the variance of the original estimator. From this there are two approaches to obtain confidence intervals. Either one uses the estimated variance together with the log-normal assumption described above, or a percentile method, in which the (1-α)% confidence interval is given by the α/2 and 1 − α/2 quantiles of the bootstrap estimates. The bootstrap offers a robust alternative to analytic variance and confidence interval formulas because it makes only mild distributional assumptions, and so is often recommended in practice (e.g. Buckland et al., 2001).
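A minimal sketch of the non-parametric bootstrap over sampling units, with invented transect counts and efforts; in a real analysis the density estimator would also include the detection-probability correction, and the detection function would typically be refitted to each bootstrap dataset.

```python
import random

# Non-parametric bootstrap over independent sampling units (here, transects).
# Each unit carries its own count and effort; a percentile interval is built
# from densities computed on resampled datasets. Data are invented.

def density(units):
    """Pooled density: total count / total effort across sampling units."""
    total_count = sum(c for c, _ in units)
    total_effort = sum(e for _, e in units)
    return total_count / total_effort

def bootstrap_ci(units, n_boot=2000, alpha=0.05, seed=1):
    """Percentile (1 - alpha) interval from resampling units with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = [rng.choice(units) for _ in units]  # resample whole units
        estimates.append(density(resample))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

units = [(12, 2.0), (7, 1.5), (20, 2.5), (5, 1.0), (15, 2.0)]  # (count, effort in km)
print(density(units), bootstrap_ci(units))  # pooled density is about 6.56
```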