Latent multinomial models for extended batch‐mark data

Abstract Batch marking is common and useful for many capture–recapture studies where individual marks cannot be applied due to various constraints such as timing, cost, or marking difficulty. When batch marks are used, observed data are not individual capture histories but a set of counts including the numbers of individuals first marked, marked individuals that are recaptured, and individuals captured but released without being marked (applicable to some studies) on each capture occasion. Fitting traditional capture–recapture models to such data requires one to identify all possible sets of capture–recapture histories that may lead to the observed data, which is computationally infeasible even for a small number of capture occasions. In this paper, we propose a latent multinomial model to deal with such data, where the observed vector of counts is a non‐invertible linear transformation of a latent vector that follows a multinomial distribution depending on model parameters. The latent multinomial model can be fitted efficiently through a saddlepoint approximation based maximum likelihood approach. The model framework is very flexible and can be applied to data collected with different study designs. Simulation studies indicate that reliable estimation results are obtained for all parameters of the proposed model. We apply the model to analysis of golden mantella data collected using batch marks in Central Madagascar.

Supporting Information for "Latent Multinomial Models for Extended Batch Mark Data" by Wei Zhang, Simon J. Bonner, and Rachel S. McCrea

A Unmarked Individuals
When unmarked individuals are considered, there are three possible events on each capture occasion: 0 if the individual is not captured; 1 if the individual is captured and marked before being released or is already marked; 2 if the individual is captured and released without being marked.
We now consider the set of all possible latent histories. If an individual is captured and first marked on some occasion, then the individual does not have event 1 before that occasion or event 2 after that occasion. The number of all such latent histories is $J_1 = T 2^{T-1}$, where $T = \sum_{k=1}^{K} T_k$ is the total number of capture occasions: there are $T$ choices for the marking occasion and two possible events on each of the other $T - 1$ occasions. If an individual is never marked, its capture history does not contain any event 1. The number of all such latent histories is $J_2 = 2^T$. Then the total number of latent capture histories is $J = J_1 + J_2 = (T + 2) 2^{T-1}$. For convenience, we index these latent histories as history $j = 1, \dots, J$.
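The counts above can be checked by brute-force enumeration for small $T$. The following Python sketch is an illustration only; the function names are hypothetical and are not part of the paper's software. It lists every sequence of events 0, 1, 2 and keeps those in which event 2 never follows the first event 1.

```python
from itertools import product


def latent_histories(T):
    """Enumerate the admissible latent histories over T capture occasions."""
    histories = []
    for events in product("012", repeat=T):
        h = "".join(events)
        first_mark = h.find("1")  # position of the marking occasion, -1 if never marked
        # Once an individual is marked, event 2 (released unmarked) cannot occur.
        if first_mark == -1 or "2" not in h[first_mark:]:
            histories.append(h)
    return histories


def count_latent_histories(T):
    """Return (J1, J2): histories with a marking event and histories without one."""
    hs = latent_histories(T)
    J1 = sum("1" in h for h in hs)
    J2 = len(hs) - J1
    return J1, J2


if __name__ == "__main__":
    for T in range(1, 7):
        J1, J2 = count_latent_histories(T)
        print(T, J1, J2, J1 + J2)
```

Enumeration is only practical for small $T$ because $J$ grows exponentially, but it provides a direct check of $J_1 = T 2^{T-1}$ and $J_2 = 2^T$.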
Then we consider how to express the probability for each latent capture history in terms of the model parameters $\theta$, which include: $p_{kt}$, the capture probability on secondary occasion $t$ of period $k$; $\phi_k$, the survival probability from period $k$ to $k + 1$; $\alpha$, the probability of not being marked before release for a captured (unmarked) individual; and $\beta_k$, the probability of entry in period $k$.
Using these parameters, the probabilities of events 0 and 2 on secondary occasion $t$ of period $k$ are $1 - p_{kt}$ and $\alpha p_{kt}$, respectively. The probability of event 1 on secondary occasion $t$ of period $k$ is $(1 - \alpha) p_{kt}$ if the individual is first captured and marked, and $p_{kt}$ if the individual is recaptured (previously marked). To be more general, the parameter $\alpha$ may also depend on the secondary occasion and/or the primary period. When specifying the formula for each $\pi_j$, we also need to consider survival between periods and the entry of the $N$ individuals in each period. Consider a simple example with $K = 3$ and $T_k = 2$ for $k = 1, 2, 3$. The probability of latent history 021010 is $\Pr(021010) = [\dots]$
In addition to the observed counts $m$ and $n$ defined for the model without unmarked individuals, we also have [...] where $u_{kt}$ denotes the number of individuals that are captured and released without being marked on occasion $t$ of period $k$. Note that [...] which indicates that a matrix $C$ can be derived such that [...] It follows that we have [...] which still falls within the latent multinomial class.
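To make the construction of the $\pi_j$ concrete, the following Python sketch computes the probability of a latent history from the event probabilities stated above. It is a sketch under stated assumptions rather than the paper's implementation: the function name and argument layout are hypothetical, and we assume that the population is closed within a primary period, that an individual enters in period $b$ with probability $\beta_b$, survives between consecutive periods with probability $\phi_k$, and can only show event 0 before entry or after death. Under these assumptions the example history 021010 has a single compatible entry period and reduces to the product $\beta_1 (1 - p_{11}) \alpha p_{12} \phi_1 (1 - \alpha) p_{21} (1 - p_{22}) \phi_2 p_{31} (1 - p_{32})$.

```python
from itertools import product


def history_probability(history, T_k, p, phi, alpha, beta):
    """Probability of one latent history (assumed entry/survival structure).

    history : string of events '0', '1', '2', one per secondary occasion
    T_k     : number of secondary occasions in each primary period
    p       : p[k][t], capture probability on occasion t of period k
    phi     : phi[k], survival probability from period k to k + 1
    alpha   : probability a captured unmarked individual is released unmarked
    beta    : beta[k], probability of entry in period k (sums to 1)
    """
    K = len(T_k)
    # Split the history into one block of events per primary period.
    blocks, start = [], 0
    for n_occ in T_k:
        blocks.append(history[start:start + n_occ])
        start += n_occ

    total = 0.0
    for b in range(K):          # assumed period of entry
        for d in range(b, K):   # assumed last period alive
            # Outside periods b..d only event 0 is possible.
            outside = [blocks[k] for k in range(K) if k < b or k > d]
            if any(set(block) - {"0"} for block in outside):
                continue
            prob = beta[b]
            for k in range(b, d):
                prob *= phi[k]          # survives from period k to k + 1
            if d < K - 1:
                prob *= 1.0 - phi[d]    # dies before the next period
            marked = False
            for k in range(b, d + 1):
                for t, event in enumerate(blocks[k]):
                    if event == "0":
                        prob *= 1.0 - p[k][t]
                    elif event == "1":
                        prob *= p[k][t] if marked else (1.0 - alpha) * p[k][t]
                        marked = True
                    else:  # event '2' is impossible once marked
                        prob *= 0.0 if marked else alpha * p[k][t]
            total += prob
    return total


if __name__ == "__main__":
    # Illustrative parameter values only, not estimates from any data set.
    T_k = [2, 2, 2]
    p = [[0.3, 0.4], [0.5, 0.2], [0.6, 0.35]]
    phi, alpha, beta = [0.8, 0.7], 0.25, [0.5, 0.3, 0.2]
    print(history_probability("021010", T_k, p, phi, alpha, beta))
    # Summing over all event sequences should give 1 under these assumptions.
    print(sum(history_probability("".join(h), T_k, p, phi, alpha, beta)
              for h in product("012", repeat=sum(T_k))))
```

Histories with leading or trailing periods of zeros are compatible with several entry and death periods, which is why the function sums over all pairs $(b, d)$.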

B Initial Values
We compute initial values for optimizing the likelihood via a Manly-Parr type approach that first estimates the capture probabilities based only on the data within each capture period and then estimates the survival probabilities conditional on the capture probabilities. Initial values are first computed for the model assuming that the capture probabilities are constant across the secondary occasions within a primary period but allowing for variation in both the capture and survival probabilities between primary periods, $(p(k), \phi(k))$. These values are then adjusted for any alternative models.
The key to this approach is that individual identities do not need to be known to estimate capture probabilities based solely on the data collected within each primary period. Assuming closure, we estimate the capture probability on the $t$th secondary occasion, $t > 1$, within the $k$th primary period by the proportion of individuals that were marked on the previous secondary occasions (plus 1) and recaptured on occasion $t$ (plus 1): [...] Adding 1 in both the numerator and denominator avoids division by 0 if $\sum_{s=1}^{t-1} m_{ks} = 0$ and sets $p_{kt} = 0.5$ in this case. We also truncate the values to the interval (0.1, 0.9) to ensure that the capture probabilities are not too close to 0 or 1. Assuming that the capture probability is constant across the secondary occasions, we obtain our initial estimate of the probability of capture within primary period $k$ by averaging these values: [...]
Our initial value for the probability of survival between primary periods $k$ and $k + 1$ is then given by [...] This represents the proportion of individuals marked in the $(k - 1)$th primary period that are recaptured on the first secondary occasion of the next primary period, inflated by the capture probability to account for non-detections.
Initial values for models with different assumptions about the capture and survival probabilities are then obtained either by averaging or repeating these values. For example, the initial value for the survival probability in a model assuming constant survival between all primary periods is set to [...] Alternatively, initial capture probabilities for a model allowing for variation across secondary occasions within a primary period would be set equal to $p_k$ for all secondary occasions within primary period $k$. We do not use the values $p_{kt}$ given above because these can be very imprecise, especially for secondary occasions early in the primary period when the number of previously marked individuals may be small.
Initial estimates of the abundance in each primary period are then computed from Horvitz-Thompson type estimates of the abundance on each secondary occasion. Let $c_{kt} = m_{kt} + u_{kt} + \sum_{l=1}^{k} n_{lkt}$ denote the total number of individuals captured on the $t$th secondary occasion within the $k$th primary period. The Horvitz-Thompson estimate of abundance on this occasion would then be [...] Assuming closure, there can only be one estimate of abundance within the primary period, and so we set [...] Here we consider the median instead of the mean because the distribution of the Horvitz-Thompson estimator is generally right-skewed and can produce very large estimates of abundance when the capture probability is small.
Finally, we compute initial values for the number of individuals entering in each primary period by estimating the number of individuals surviving from the previous period and subtracting this from the estimated abundance. That is, we set $B_1 = N_1$ and [...] The initial value for the probability of entry in primary period $k$ is then [...]
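The later steps of this procedure can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' code: all function names are hypothetical, the raw per-occasion capture estimates are taken as given inputs, the Horvitz-Thompson estimate is assumed to be $c_{kt} / p_k$, the number of entrants is assumed to be $B_k = N_k - \phi_{k-1} N_{k-1}$, and the entry probability is assumed to be $B_k / \sum_l B_l$. The floor at zero for $B_k$ is an added safeguard that is not described in the text.

```python
from statistics import mean, median


def clamp(value, low=0.1, high=0.9):
    """Truncate a raw capture-probability estimate to [low, high]."""
    return min(max(value, low), high)


def period_capture_probabilities(raw_p):
    """Average the truncated per-occasion estimates within each primary period.

    raw_p[k] holds the raw estimates for secondary occasions t = 2, ..., T_k
    of primary period k; how they are computed is left to the caller.
    """
    return [mean([clamp(v) for v in values]) for values in raw_p]


def abundance_and_entry(c, p_init, phi_init):
    """Initial abundances N, entrants B and entry probabilities beta.

    c[k][t]     : total number of individuals captured on occasion t of period k
    p_init[k]   : initial capture probability for period k
    phi_init[k] : initial survival probability from period k to k + 1
    """
    # Assumed Horvitz-Thompson form: captures divided by capture probability,
    # summarised by the median over the secondary occasions of each period.
    N = [median([c_kt / p_init[k] for c_kt in c_k]) for k, c_k in enumerate(c)]
    B = [N[0]]
    for k in range(1, len(N)):
        # Assumed form: abundance minus expected survivors from the previous
        # period; the floor at 0 is a hypothetical safeguard.
        B.append(max(N[k] - phi_init[k - 1] * N[k - 1], 0.0))
    total = sum(B)
    beta = [b / total for b in B]
    return N, B, beta


if __name__ == "__main__":
    # Illustrative inputs only, not data from the study.
    p_init = period_capture_probabilities([[0.05, 0.5], [0.95, 0.7, 0.3]])
    print(p_init)
    print(abundance_and_entry([[20, 30, 25], [40, 50, 45]], [0.5, 0.5], [0.5]))
```

For a model with capture probabilities varying across secondary occasions, the period-level value would simply be repeated, for example `[p_init[k]] * T_k` for a period with `T_k` occasions.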

C Further Simulation Results
Tables 1-5 present further simulation results for settings that differ from those in the main text. As in the main text, reliable estimation results are obtained for all model parameters in all of these simulations. Note that using the penalized likelihood helps to reduce the mean confidence interval (CI) widths for those survival parameters whose true values exceed 0.8 in the simulations. For the other parameters, penalization has only negligible effects.
Tables 6-10 present further simulation results on model selection using AIC. Compared to the results shown in Table 3 in the main text, the performance of AIC is significantly improved if either the capture probabilities or the abundance $N$ increases in the simulations while the other parameters remain unchanged; see Tables 6 and 7. This is because in these cases more individuals are captured, so the survival rates are estimated more accurately, which helps to select the right model for the survival rates.

D Assessment of Initial Parameter Estimates
Figure. Capture probability by occasion (primary, secondary). Estimates of the capture probabilities from the selected model using the prefiltered histories with penalization, starting from either the initial values computed as described in Section B of the Supporting Information (blue squares) or by first setting $p_1 = \dots = p_6 = 0.10$ (green triangles) or $p_1 = \dots = p_6 = 0.40$ (red circles) and then computing initial values for the remaining parameters as described in Section B of the Supporting Information. Vertical bars show the extents of the 95% confidence intervals. Points for each version of the model have been offset to avoid overlap. This figure appears in color in the electronic version of this article, and color refers to that version.

Table 1: Results of Supplementary Simulation 1

Table 3: Results of Supplementary Simulation 3

Table 4: Results of Supplementary Simulation 4

Table 5: Results of Supplementary Simulation 5

Table 11: Alternative Initial Values. Alternative sets of initial values considered in the assessment of convergence for the selected model of the golden mantella data. The first column presents the initial values computed by applying the methods in Section B of the Supporting Information directly. The second and third columns present values obtained by first setting $p_1 = \dots = p_6 = 0.10$ (middle) or $p_1 = \dots = p_6 = 0.40$ (right) and then computing initial values for the remaining parameters as described in Section B of the Supporting Information.