Response to: a new method for estimating animal abundance with two sources of data in capture–recapture studies


Summary

  1. Mark–recapture studies that rely on multiple marks to identify individuals pose modeling challenges if the marks for each individual are not always linked. If an individual with unlinked marks is encountered on two occasions and different marks are observed, then it will appear that two different individuals were captured. Failing to account for these missed matches will produce incorrect inference.
  2. Madon et al. (Methods in Ecology and Evolution 2011; 2: 390) proposes a modification of the Jolly-Seber estimator for such data computed by adjusting the observed counts of individuals first captured, recaptured or not captured but known to be alive on each occasion. The adjustment involves multiplying each of these counts by a constant factor, $\alpha$, intended to correct for double counting of individuals and constrained between 0 and 1. Results of a simulation study provided in Madon et al. (Methods in Ecology and Evolution 2011; 2: 390) show that the proposed estimator is almost unbiased, but its uncertainty is underestimated and the true coverage of confidence intervals is consistently below the nominal value.
  3. I compute separate adjustment factors for each of the counts and show (i) that a constant adjustment is not appropriate and (ii) that the theoretical adjustment factor is sometimes >1. I believe that the use of a single adjustment factor between 0 and 1 is what causes the uncertainty to be underestimated and that complete models of the observation process are required to obtain valid results.

Natural or non-invasive marks, including skin patterns and genetic markers, allow individuals to be identified in mark–recapture studies without applying man-made tags. However, data from natural marks also present novel statistical challenges. One challenge that has seen little discussion concerns inference from data that include multiple natural marks that cannot be linked unless they are observed together. This occurs if, for example, individuals are identified from skin patterns on different parts of the body or separately from both photo-identification of skin patterns and from genetic markers. Marks for one individual may not be linked, meaning that it is not always possible to determine whether, say, a photo taken on one occasion and a genetic sample collected on another occasion represent the same individual or two different individuals. Analysing the data without accounting for the possibility of missed matches will erroneously inflate the number of individuals encountered and create dependence between the histories, violating a key assumption of most mark–recapture models.

Madon et al. (2011) proposes a method for estimating abundance from such data using an adjusted version of the Jolly-Seber (JS) estimator. However, the results of the simulation study provided in Madon et al. (2011) suggest that the method does not perform well. Although the adjusted JS estimator is approximately unbiased, its sampling variance is consistently underestimated, particularly when capture probabilities are high. The average coverage probability of the 95% confidence intervals is 0·84 when data are simulated from the true model with capture probability 0·05 but only 0·45 when the capture probability is 0·80 (Madon et al. 2011, table 2). That the coverage probabilities are worse when capture probabilities are higher – that is, when the data contain more information – is disturbing, and 95% confidence intervals with coverages as low as 45% will provide misleading information for biologists trying to understand or manage a population. I believe that these results stem from errors in the derivation of the adjusted JS estimator, which I discuss below.

The specific objective of Madon et al. (2011) is to estimate the number of humpback whales breeding around the island of New Caledonia each year between 1996 and 2001. Data come from a study in which whales were identified from both photo-identification of skin patterns (mark 1) and collection of DNA samples (mark 2). Sampling occurred from July to September, and it is assumed that individuals can only be encountered once each year. Each year, one of four events may occur:

  • 0. the individual is not encountered,
  • 1. the individual is encountered and only mark 1 is seen,
  • 2. the individual is encountered and only mark 2 is seen,
  • 3. the individual is encountered and both marks are seen together.

The key assumption is that the two marks for an individual can be matched only if both are seen together in at least one year. If this occurs, then I say that the individual's marks are linked; otherwise, I say that its marks are not linked. If an individual's marks are linked (i.e. event 3 occurs in the encounter history), then its true capture history can be known with certainty. However, if an individual's marks are not linked and both of its marks were observed on separate occasions, then it will contribute two separate histories to the observed data. For example, an individual with the true history 01120 will contribute the histories 01100 and 00020 to the observed data – exactly as if two different individuals had been encountered.
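The splitting of an unlinked history can be sketched in a few lines of Python. This is only an illustration of the rule described above: the function name `observed_histories` is hypothetical, and histories are assumed to be strings of the event codes 0 to 3.

```python
from __future__ import annotations


def observed_histories(true_history: str) -> list[str]:
    """Return the histories an observer would record for one individual.

    Event codes follow the text: 0 = not encountered, 1 = only mark 1 seen,
    2 = only mark 2 seen, 3 = both marks seen together.
    """
    if "3" in true_history:
        # Both marks were seen together at least once, so the marks are
        # linked and the true history is recovered as a single record.
        return [true_history]
    # Marks are not linked: each mark is tracked as if it were its own animal.
    mark1_only = true_history.replace("2", "0")
    mark2_only = true_history.replace("1", "0")
    # A mark that was never seen produces no record at all.
    return [h for h in (mark1_only, mark2_only) if set(h) != {"0"}]


print(observed_histories("01120"))  # the example from the text
```

An individual seen by only one mark, such as 01100, still yields a single record, so the inflation arises only when both marks are seen on separate occasions without ever being seen together.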

The primary challenge with modelling such data is not to provide point estimates of the demographic parameters (e.g. the population size) but rather to compute appropriate measures of precision. Suppose that individuals can be identified by one of p marks. A simple estimator of population size combining the data from all marks is given by obtaining separate estimates for each mark, $\hat{N}_1, \ldots, \hat{N}_p$, and then computing the weighted average

$$\hat{N} = \sum_{m=1}^{p} w_m \hat{N}_m$$

for some set of positive weights such that $\sum_{m=1}^{p} w_m = 1$. The bias of this estimator will be the weighted average of the bias of the separate estimators, and $\hat{N}$ will be unbiased/consistent if each $\hat{N}_m$ is unbiased/consistent. However, the standard error of $\hat{N}$ is difficult to compute because the separate estimates $\hat{N}_1, \ldots, \hat{N}_p$ are dependent but their correlations are unknown. Wilson et al. (1999) describes an application of this method for two marks using inverse variance weights so that $w_m \propto 1/\widehat{\mathrm{Var}}(\hat{N}_m)$, m = 1,2. The variance of $\hat{N}$ is then approximated by treating $\hat{N}_1$ and $\hat{N}_2$ as independent so that $\widehat{\mathrm{Var}}(\hat{N}) = w_1^2\,\widehat{\mathrm{Var}}(\hat{N}_1) + w_2^2\,\widehat{\mathrm{Var}}(\hat{N}_2)$, but this is not appropriate and $\widehat{\mathrm{Var}}(\hat{N})$ will be biased low.
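A small numerical sketch may help show why the independence approximation understates the variance. The estimates, variances and correlation below are invented for illustration, the function names are hypothetical, and the positive correlation is an assumption (it reflects the fact that both estimates come from the same animals).

```python
import math


def inverse_variance_average(estimates, variances):
    """Combine estimates with weights proportional to 1/variance."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    weights = [p / total for p in precisions]
    combined = sum(w * e for w, e in zip(weights, estimates))
    # Variance obtained by (wrongly) treating the estimates as independent.
    naive_var = sum(w * w * v for w, v in zip(weights, variances))
    return weights, combined, naive_var


def variance_with_correlation(weights, variances, rho):
    """Variance of a two-estimate weighted average with correlation rho."""
    w1, w2 = weights
    v1, v2 = variances
    return w1 * w1 * v1 + w2 * w2 * v2 + 2 * w1 * w2 * rho * math.sqrt(v1 * v2)


# Hypothetical per-mark abundance estimates and their variances.
variances = [400.0, 100.0]
weights, n_hat, naive_var = inverse_variance_average([100.0, 120.0], variances)
actual_var = variance_with_correlation(weights, variances, rho=0.5)
print(weights, n_hat, naive_var, actual_var)
```

With these made-up inputs the naive variance is 80, whereas the variance under a correlation of 0.5 is 112, so an interval built from the naive value would be too narrow. In practice the correlation is unknown, which is exactly the difficulty described above.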

To account for the possibility that some whales contribute multiple encounter histories to the New Caledonia humpback data, Madon et al. (2011) proposes an adjustment of the JS estimator of abundance. Specifically, Madon et al. (2011) builds on the bias-corrected version of the JS estimator, assuming no losses on capture, given by

$$\hat{N}_i = \frac{u_i + m_i + 1}{m_i + 1}\left[m_i + \frac{(u_i + m_i + 1)\,z_i}{r_i + 1}\right]$$

where

  • $u_i$ is the number of individuals first encountered on occasion i,
  • $m_i$ is the number of individuals encountered both before and on occasion i,
  • $r_i$ is the number of individuals encountered both on and after occasion i, and
  • $z_i$ is the number of individuals encountered both before and after but not on occasion i.

Uncertainty in matching observed histories with different marks means that these counts cannot be known, and so Madon et al. (2011) attempts to adjust these values before computing $\hat{N}_i$.
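For readers who want to experiment, the following Python sketch computes a bias-corrected JS abundance estimate from the four counts. It assumes the standard bias-corrected form with no losses on capture, in which the number encountered and released on occasion i is the sum of the newly encountered and previously encountered individuals; the function names and the symbols u, m, r and z are labels chosen here for the four counts listed above. The second function applies one common factor to every count, which is how the adjustment is described in this response; it is not code from Madon et al. (2011).

```python
def js_abundance(u: float, m: float, r: float, z: float) -> float:
    """Bias-corrected Jolly-Seber abundance for one occasion (no losses on capture).

    u: first encountered on the occasion
    m: encountered both before and on the occasion
    r: encountered both on and after the occasion
    z: encountered before and after, but not on, the occasion
    """
    n = u + m  # all individuals encountered (and released) on the occasion
    # Bias-corrected estimate of the number of previously seen animals alive.
    marked_alive = m + (n + 1) * z / (r + 1)
    return (n + 1) * marked_alive / (m + 1)


def common_factor_abundance(u, m, r, z, alpha):
    """Apply a single multiplicative factor alpha to every count first."""
    return js_abundance(alpha * u, alpha * m, alpha * r, alpha * z)


# Hypothetical counts for a single occasion.
print(js_abundance(20, 30, 25, 12))
print(common_factor_abundance(20, 30, 25, 12, alpha=0.9))
```

Changing `alpha` for one count at a time (for example scaling only `u`) is an easy way to see how sensitive the estimate is to the assumption that all four counts share the same factor.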

Consider $m_i$. This quantity cannot be observed because it is not always possible to know whether or not an individual encountered on occasion i was previously encountered. For example, if an individual's true history was 01200, then it is truly encountered both before and on occasion 3, but this cannot be known because its marks are not linked. This individual will be mistakenly excluded from the observed count of $m_i$. To account for such errors, Madon et al. (2011, eqn 4) replaces $m_i$ with

$$m_i^{*} = \alpha\left(m_i^{(1)} + m_i^{(2)} + m_i^{(3)}\right) \tag{1}$$

where

  • $m_i^{(1)}$ represents the number of individuals for which mark 1 was seen prior to occasion i and mark 1 was seen on occasion i and whose marks are not linked,
  • $m_i^{(2)}$ represents the number of individuals for which mark 2 was seen prior to occasion i and mark 2 was seen on occasion i and whose marks are not linked, and
  • $m_i^{(3)}$ represents the number of individuals encountered before and on occasion i whose marks are linked.

The raw sum $m_i^{(1)} + m_i^{(2)} + m_i^{(3)}$ is the number of individuals known to be captured both before and on occasion i, and $\alpha$ is an adjustment factor intended to account for the difference between this value and the true number of individuals recaptured. It is described as the ‘Probability of true identity’ (Madon et al. 2011, p. 3) and assumed to be between 0 and 1. The same adjustment is also used for $u_i$, $r_i$, and $z_i$ so that (Madon et al. 2011, eqns 5, 6, and 7)

$$u_i^{*} = \alpha\left(u_i^{(1)} + u_i^{(2)} + u_i^{(3)}\right) \tag{2}$$
$$r_i^{*} = \alpha\left(r_i^{(1)} + r_i^{(2)} + r_i^{(3)}\right) \tag{3}$$
$$z_i^{*} = \alpha\left(z_i^{(1)} + z_i^{(2)} + z_i^{(3)}\right) \tag{4}$$

where $u_i^{(j)}$, $r_i^{(j)}$, and $z_i^{(j)}$ are defined analogously to $m_i^{(j)}$, j = 1,2,3. My concerns with these adjustments are (i) that all counts are adjusted by the same factor and (ii) that $\alpha$ is assumed to be <1. I show below that the correct adjustment factors for the different counts are not the same and that some of the observed counts may underestimate the true counts so that the correct adjustment factor may be >1.

To illustrate my concerns, I derive adjustment factors for $u_i$, $m_i$, $r_i$, and $z_i$ for the specific case of an experiment with three occasions. In the following discussion, I use $n(\omega_1,\omega_2,\omega_3)$ to denote the number of individuals whose true encounter history is $(\omega_1,\omega_2,\omega_3)$. Events on each occasion are represented using the codes 0 to 3 defined above. For convenience, I use −3 to represent any event except 3; so that, for example, n(−3,1,1) = n(0,1,1) + n(1,1,1) + n(2,1,1). Note that some of these counts cannot be observed. For example, n(0,1,−3) and n(2,1,−3) cannot be distinguished even though their sum is known.

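Consider $m_2$. The true number of individuals encountered both before occasion 2 and on occasion 2 is

$$m_2 = \sum_{\omega_1 \neq 0}\ \sum_{\omega_2 \neq 0}\ \sum_{\omega_3} n(\omega_1,\omega_2,\omega_3). \tag{5}$$

From the definitions above, the three observed components of this count (unlinked individuals identified by mark 1, unlinked individuals identified by mark 2, and linked individuals, in that order) are

$$m_2^{(1)} = n(1,1,-3), \qquad m_2^{(2)} = n(2,2,-3), \qquad m_2^{(3)} = m_2 - \sum_{\omega_1 \in \{1,2\}}\ \sum_{\omega_2 \in \{1,2\}} n(\omega_1,\omega_2,-3). \tag{6}$$

Substituting (5) and (6) into (1) and solving for the adjustment factor, denoted $\alpha_m$, then yields

$$\alpha_m = \frac{m_2}{m_2^{(1)} + m_2^{(2)} + m_2^{(3)}} = \frac{m_2}{m_2 - n(1,2,-3) - n(2,1,-3)}. \tag{7}$$

Note that $\alpha_m \geq 1$. In general, errors in counting the number of individuals encountered both before and on occasion i occur because different marks were seen and the encounters cannot be linked. This means that the observed value underestimates the true value of $m_i$ and should be adjusted upward.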

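Next consider the count of individuals first encountered on occasion 2, $u_2$. The true value is

$$u_2 = \sum_{\omega_2 \neq 0}\ \sum_{\omega_3} n(0,\omega_2,\omega_3) \tag{8}$$

whereas the observed counts for unlinked individuals identified by mark 1 and by mark 2 are

$$u_2^{(1)} = n(0,1,-3) + n(2,1,-3), \qquad u_2^{(2)} = n(0,2,-3) + n(1,2,-3) \tag{9}$$

and the observed count for linked individuals is

$$u_2^{(3)} = \sum_{\omega_3} n(0,3,\omega_3) + n(0,1,3) + n(0,2,3). \tag{10}$$

Substituting (8), (9) and (10) into (2) then yields

$$\alpha_u = \frac{u_2}{u_2^{(1)} + u_2^{(2)} + u_2^{(3)}} = \frac{u_2}{u_2 + n(1,2,-3) + n(2,1,-3)}. \tag{11}$$

Clearly, $\alpha_u \geq 0$ and $\alpha_u \leq 1$. In general, errors in counting the number of individuals first encountered on occasion i occur because some individuals were actually encountered before, but the marks seen before and on occasion i were not the same and are not linked. This leads to over-counting of the number of individuals first encountered on occasion i so that the observed count should be reduced.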

Theoretical correction factors for $r_2$ and $z_2$ can be computed in the same way. The adjustment factor for $r_2$ is

$$\alpha_r = \frac{r_2}{r_2 - n(-3,1,2) - n(-3,2,1)}. \tag{12}$$

Then $n(-3,1,2) \geq 0$, $n(-3,2,1) \geq 0$, and again $\alpha_r \geq 1$. Generally, the number of individuals encountered on and after occasion i is under-counted because some individuals with unlinked marks are missed. Finally, the adjustment factor for $z_2$ is

$$\alpha_z = \frac{z_2}{z_2 - n(1,0,2) - n(2,0,1) + n(1,2,1) + n(2,1,2)}. \tag{13}$$

This value may be either greater or less than 1 depending on whether the number of individuals mistakenly included in $z_2$ and doubly counted, n(1,2,1) + n(2,1,2), is bigger or smaller than the number of individuals mistakenly excluded from $z_2$, n(1,0,2) + n(2,0,1).
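The behaviour of the four factors can be checked numerically with a short Python sketch. It works directly from the verbal definitions for occasion 2 of a three-occasion study: each factor is taken as the true count divided by the count an observer would record when unlinked marks are treated as separate animals. The function name `adjustment_factors` and the history counts are hypothetical; in a real study the true histories are of course unobservable.

```python
def adjustment_factors(counts):
    """Ratio of true to observed u, m, r and z at occasion 2 of 3.

    counts maps a true history (w1, w2, w3), coded 0-3 as in the text,
    to the number of individuals with that history.
    """
    true = dict.fromkeys("umrz", 0)
    seen = dict.fromkeys("umrz", 0)
    for (w1, w2, w3), n in counts.items():
        # Membership of each count according to the true history.
        in_true = {
            "u": w1 == 0 and w2 != 0,
            "m": w1 != 0 and w2 != 0,
            "r": w2 != 0 and w3 != 0,
            "z": w1 != 0 and w2 == 0 and w3 != 0,
        }
        if 3 in (w1, w2, w3):
            # Linked marks: the observer recovers the true history.
            in_seen = in_true
        else:
            # Unlinked marks: encounters match only if the same mark was seen.
            in_seen = {
                "u": w2 != 0 and w1 != w2,
                "m": w2 != 0 and w1 == w2,
                "r": w2 != 0 and w2 == w3,
                "z": w1 != 0 and w1 == w3 and w2 != w1,
            }
        for key in "umrz":
            true[key] += n * in_true[key]
            seen[key] += n * in_seen[key]
    # Assumes every observed count is positive for the supplied data.
    return {key: true[key] / seen[key] for key in "umrz"}


# Invented numbers of individuals for a handful of true histories.
example = {
    (0, 1, 0): 10, (1, 2, 0): 4, (1, 1, 0): 6, (3, 1, 2): 5,
    (1, 0, 2): 3, (1, 2, 1): 2, (1, 0, 1): 7,
}
factors = adjustment_factors(example)
print(factors)
```

For these invented counts the factor for first encounters falls below 1 while the other three exceed 1, so no single value between 0 and 1 could correct all four counts at once.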

Equations (7), (11), (12) and (13) for a study with three occasions make it clear that the proper adjustment factors for $u_i$, $m_i$, $r_i$ and $z_i$ are not the same and are not all constrained to (0,1). The expressions for studies with more capture occasions are more complicated, but the same results hold in general. Given these results, it is not surprising that the variance of the adjusted estimator $\hat{N}_i$ is underestimated when the adjustment factors are all assumed to be equal and less than 1. Of course, the correct factors cannot be computed explicitly because they depend on counts of individuals with unlinked marks that cannot be observed, such as n(2,1,−3), n(−3,1,2), and n(1,0,2).

To compute valid adjustment factors, one would need to estimate these quantities by modelling the observation process, and so I recommend a model-based approach. Maximum likelihood estimation provides one solution but is complicated because the exact likelihood can only be computed by summing over all possible configurations of the true histories, and this set may be very complicated. One option is to approximate the likelihood function so that it is efficient to compute (R Fewster, unpublished data). An alternative is to apply a Bayesian complete-data likelihood approach treating the counts of the true histories as a latent-multinomial process, as in Link et al. (2010). Both McClintock et al. (2013) and Bonner & Holmberg (2013) examine this approach using Markov chain Monte Carlo to sample simultaneously from the posterior distribution of both the model parameters and the configuration of the true encounter histories. Advantages of this approach are that it provides valid interval estimates with nominal coverage and also produces inference about the population dynamics (e.g. survival probabilities, recruitment probabilities and population growth rates) as well as population size. Furthermore, the model of Bonner & Holmberg (2013) allows for individuals to be encountered multiple times during each capture occasion. Although this is not a problem with the method of Madon et al. (2011) if capture occasions are instantaneous, the assumption that humpback whales are encountered only once each year seems restrictive given that sampling occurs for 3 months of the year.
