Implementing the trinomial mark–recapture–recovery model in program mark
Correspondence author. E-mail: email@example.com
- Time-varying individual covariates present a challenge in modelling data from mark–recapture–recovery (MRR) experiments of wild animals. Many values of the covariate will be unknown because they can be observed only when an individual is captured, and the missing values cannot be ignored.
- Catchpole et al. [Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70, 445–460, 2008] present one solution to this problem by constructing a conditional likelihood that depends only on the observed covariate information: the so-called trinomial model.
- This paper describes the link between the trinomial model and the mark–recapture–recovery model of Burnham (Marked Individuals in the Study of Bird Population, 199–213, 1993) and shows how the trinomial model can be implemented in the software package program mark. This provides the user with access to all of the features of program mark including the facilities for model building and model selection without having to write custom code.
- I provide details on the analysis of a simulated data set and discuss an r package developed to help users format their data and to implement the model through the existing rmark package.
Time-varying, individual covariates like body mass or fitness present a significant problem in modelling mark–recapture and mark–recapture–recovery data. Such quantities can only be observed when an individual is captured, and a large proportion of the values may be unknown, particularly when capture probabilities are low. Moreover, the unknown values are not missing at random (the probability that a value of the covariate is observed may depend on the value itself) and cannot simply be ignored. This makes it necessary to model the distribution of the missing covariate values to construct the full-likelihood function. Evaluating the likelihood then requires computing high-dimensional integrals that can only be approximated numerically, which makes classical maximum likelihood (ML) estimation based on the full likelihood impractical.
Catchpole, Morgan & Tavecchia (2008) present a solution to this problem by constructing a reduced, conditional likelihood based only on the events that depend on observed covariate information. Instead of modelling the full capture history for each individual, the likelihood considers only the events that directly follow the releases of each marked individual, each of which has three possible outcomes: the individual is captured alive on the next occasion, recovered dead before the next occasion or not observed. The authors termed this the trinomial model. The resulting likelihood depends only on the observed values of the covariate, so it can be constructed without modelling the missing covariate values. It is simple to evaluate and provides consistent ML estimates of the effect of the covariate on survival (Catchpole, Morgan & Tavecchia 2008), but the model considers only part of the data and is less efficient than methods that model the complete capture histories. Bonner, Morgan & King (2010) used simulation studies to compare estimators of the survival probabilities produced by the trinomial model and a Bayesian mark–recapture–recovery model based on the complete-data likelihood implementation of the Cormack–Jolly–Seber model developed by Bonner & Schwarz (2006). We found that inferences from the trinomial model were generally less precise, but that the trinomial model could provide more accurate estimates of the capture and survival probabilities if the distribution of the covariate imposed by the Bayesian model was far from the truth.
In Bonner, Morgan & King (2010), we noted that the trinomial model can be implemented in the existing software package program mark (White & Burnham 1999) by recasting the observed capture histories into one of the existing data types. This report describes the equivalence between the trinomial model and the existing mark–recapture–recovery model that allows the trinomial model to be implemented in program mark, and a detailed example is included in Appendix S1. In short, the trinomial model is implemented by breaking each capture history into a series of individual events that are separately entered into the mark–recapture–recovery data set with group variables modelling differences over time. I hope that implementing the model in program mark will allow users to fit the trinomial model more easily and to take advantage of program mark's existing features, including its powerful optimization routines and advanced model selection tools. To further this goal, I have created an r package, trimark, which provides functions to assist in fitting the trinomial model both in program mark and through the rmark interface (Laake et al. 2012). This package is included in Attachment S1 and updated versions will be available from the author.
Materials and methods
The method for implementing the trinomial model in program mark is based on the equivalence of the conditional likelihood function with a specific version of the mark–recapture–recovery model originally described by Burnham (1993). This section provides details on the equivalence of the likelihood functions for the two models.
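The conditional likelihood function for the trinomial model is derived by considering only those events that directly follow releases of the marked individuals. Suppose that M individuals are captured and marked on the first T−1 occasions of an experiment with T occasions. Let $\mathbf{h}_i = (h_{i1}, \ldots, h_{iT})$ and $\mathbf{z}_i = (z_{i1}, \ldots, z_{iT})$ denote the capture and covariate history for the ith marked individual, such that

$$h_{it} = \begin{cases} 0 & \text{if individual } i \text{ is not observed on occasion } t,\\ 1 & \text{if individual } i \text{ is captured alive on occasion } t,\\ 2 & \text{if individual } i \text{ is recovered dead between occasions } t-1 \text{ and } t,\end{cases}$$

and $z_{it}$ is the value of the covariate for the ith individual on occasion t. The conditional likelihood contribution for the ith individual is constructed by modelling only those events that directly follow the occasions on which the individual was captured and released. Let $t_{i1} < \cdots < t_{in_i}$ denote the $n_i$ occasions among the first T−1 on which individual i was captured and released ($h_{it_{ik}} = 1$, $k = 1, \ldots, n_i$). Given that $h_{it} = 1$, there are three possible outcomes for the following event: $h_{i,t+1} = 1$, $h_{i,t+1} = 2$, or $h_{i,t+1} = 0$. Let $\phi_{it}$ denote the apparent survival probability for individual i on occasion t (the probability that the individual survives and does not emigrate between occasions t and t + 1), $p_{it}$ the probability that the individual is captured on occasion t given that it is alive, and $\lambda_{it}$ the probability that the individual is recovered dead between occasions t and t + 1 given that it does not survive. The conditional probabilities for the three possible events following the release of individual i on occasion t are then:

$$\begin{aligned} P(h_{i,t+1} = 1 \mid h_{it} = 1) &= \phi_{it}\, p_{i,t+1},\\ P(h_{i,t+1} = 2 \mid h_{it} = 1) &= (1 - \phi_{it})\, \lambda_{it},\\ P(h_{i,t+1} = 0 \mid h_{it} = 1) &= 1 - \phi_{it}\, p_{i,t+1} - (1 - \phi_{it})\, \lambda_{it}. \end{aligned}$$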
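Under the usual assumptions of mark–recapture–recovery models, the conditional likelihood contribution for this individual is

$$L_i = \prod_{k=1}^{n_i} \left(\phi_{it_{ik}}\, p_{i,t_{ik}+1}\right)^{I(h_{i,t_{ik}+1} = 1)} \left\{(1 - \phi_{it_{ik}})\, \lambda_{it_{ik}}\right\}^{I(h_{i,t_{ik}+1} = 2)} \left\{1 - \phi_{it_{ik}}\, p_{i,t_{ik}+1} - (1 - \phi_{it_{ik}})\, \lambda_{it_{ik}}\right\}^{I(h_{i,t_{ik}+1} = 0)},$$

where $I(\cdot)$ is the indicator function, and the conditional likelihood is the product of the individual contributions

$$L = \prod_{i=1}^{M} L_i. \qquad (1)$$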
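To make eqn 1 concrete, the following Python sketch evaluates the conditional log-likelihood for a list of capture histories coded 0/1/2 as above. It is an illustration only, not part of program mark or the trimark package; the function name `trinomial_loglik` and the callable arguments `phi`, `p` and `lam` (returning $\phi_{it}$, $p_{it}$ and $\lambda_{it}$ for individual i on occasion t) are hypothetical.

```python
import math
from typing import Callable, Sequence

# History codes as in the text: 0 = not observed, 1 = captured alive,
# 2 = recovered dead since the previous occasion.
Prob = Callable[[int, int], float]  # (individual index, occasion) -> probability


def trinomial_loglik(histories: Sequence[Sequence[int]],
                     phi: Prob, p: Prob, lam: Prob) -> float:
    """Log of eqn 1: sum over every release on occasions 1..T-1 of the log
    probability of the event on the following occasion."""
    total = 0.0
    for i, h in enumerate(histories):
        for t in range(1, len(h)):        # occasions t = 1, ..., T-1 (1-based)
            if h[t - 1] != 1:             # only releases contribute
                continue
            f, q, r = phi(i, t), p(i, t + 1), lam(i, t)
            outcome = h[t]                # h_{i,t+1}
            if outcome == 1:
                prob = f * q                          # recaptured alive
            elif outcome == 2:
                prob = (1.0 - f) * r                  # recovered dead
            else:
                prob = 1.0 - f * q - (1.0 - f) * r    # not observed
            total += math.log(prob)
    return total
```

For the history 11012 with constant probabilities, the three releases contribute $\phi p$, $1 - \phi p - (1-\phi)\lambda$ and $(1-\phi)\lambda$; the event after occasion 3 is skipped because the individual was not released then.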
If $\phi_{it}$, $p_{i,t+1}$ and $\lambda_{it}$ depend only on $z_{it}$ (and the covariate is measured and recorded every time an individual is captured), then the likelihood in eqn 1 only involves known values of the covariate. For example, Catchpole, Morgan & Tavecchia (2008) modelled $\phi_{it}$ as a linear function of $z_{it}$ on the logit scale, $\operatorname{logit}(\phi_{it}) = \beta_0 + \beta_1 z_{it}$, and only allowed for time variations in the capture and recovery probabilities so that $p_{it} = p_t$ and $\lambda_{it} = \lambda_t$, independent of all values of the covariate. This is the model from which the sample data described in the following section were generated.
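The sketch below, which assumes the logit-linear survival model just described and uses hypothetical values of $\beta_0$ and $\beta_1$, shows that evaluating $\phi_{it}$ at the release occasions only ever reads covariate values recorded at captures. The function name `survival_at_releases` is illustrative, and missing covariate values are represented by `None`.

```python
import math
from typing import List, Optional, Sequence, Tuple


def survival_at_releases(history: Sequence[int],
                         covariate: Sequence[Optional[float]],
                         beta0: float, beta1: float) -> List[Tuple[int, float]]:
    """Return (t, phi_t) for each release on occasions 1..T-1, where
    logit(phi_t) = beta0 + beta1 * z_t. Covariate values at occasions
    without a release are never read."""
    out = []
    for t in range(1, len(history)):          # occasions 1..T-1 (1-based)
        if history[t - 1] != 1:
            continue
        z = covariate[t - 1]
        if z is None:                         # would violate the assumption in the text
            raise ValueError(f"covariate missing at release occasion {t}")
        eta = beta0 + beta1 * z
        out.append((t, 1.0 / (1.0 + math.exp(-eta))))   # inverse logit
    return out
```

Applied to the history 11012 from Table 1, with covariate values (1·62, −1·26, missing, −0·28, missing), the function returns survival probabilities for occasions 1, 2 and 4 only; the two missing values are never touched.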
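Now consider the likelihood function for the mark–recapture–recovery model presented in Burnham (1993). Specifically, consider the model for an experiment with only two occasions in which n marked individuals are each released only once. Each individual contributes only one event to the likelihood, denoted by $y_j$ for individual j, and there are again three possible outcomes:

$$y_j = \begin{cases} 1 & \text{if individual } j \text{ is recaptured alive on occasion 2},\\ 2 & \text{if individual } j \text{ is recovered dead between occasions 1 and 2},\\ 0 & \text{if individual } j \text{ is not observed again}.\end{cases}$$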
Burnham's (1993) primary goal in simultaneously modelling recaptures and recoveries was to distinguish true survival from emigration, and so the model separates the apparent survival probability for individual j into the product of two parameters: the probability that the individual survives to occasion 2 (true survival), $S_j$, and the probability that it remains in the capture area (fidelity), $F_j$. Denoting the capture and recovery probabilities for individual j by $p_j$ and $r_j$, the probabilities assigned to the possible events are as follows:

$$\begin{aligned} P(y_j = 1) &= S_j F_j p_j,\\ P(y_j = 2) &= (1 - S_j)\, r_j,\\ P(y_j = 0) &= 1 - S_j F_j p_j - (1 - S_j)\, r_j. \end{aligned}$$
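The full likelihood is again equal to the product of contributions from all individuals and is

$$L^{*} = \prod_{j=1}^{n} \left(S_j F_j p_j\right)^{I(y_j = 1)} \left\{(1 - S_j)\, r_j\right\}^{I(y_j = 2)} \left\{1 - S_j F_j p_j - (1 - S_j)\, r_j\right\}^{I(y_j = 0)}. \qquad (2)$$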
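A minimal sketch of the two sets of cell probabilities for a single release, with hypothetical function names. Setting $F_j = 1$, $S_j = \phi_{it}$, $p_j = p_{i,t+1}$ and $r_j = \lambda_{it}$ makes the Burnham probabilities coincide term by term with the trinomial ones.

```python
from typing import Tuple

Cells = Tuple[float, float, float]  # (recaptured alive, recovered dead, not observed)


def burnham_probs(S: float, F: float, p: float, r: float) -> Cells:
    """Cell probabilities for one individual in the two-occasion
    Burnham (1993) live/dead model."""
    alive = S * F * p        # survives, stays in the study area, is recaptured
    dead = (1.0 - S) * r     # dies and is recovered
    return alive, dead, 1.0 - alive - dead


def trinomial_probs(phi: float, p_next: float, lam: float) -> Cells:
    """Trinomial cell probabilities following a release, in the same order."""
    alive = phi * p_next
    dead = (1.0 - phi) * lam
    return alive, dead, 1.0 - alive - dead
```

With fidelity below 1 only the probability of live recapture changes; this is the extra parameter that is fixed to 1 in the implementation.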
Our implementation of the trinomial model in program mark is based on recognizing that the likelihoods in eqns 1 and 2 have the same form. Equivalence is achieved by:
- setting n equal to the total number of releases of all marked individuals over the first T−1 occasions ($n = \sum_{i=1}^{M} n_i$),
- indexing the releases in a single sequence from 1 to n instead of double-indexing by both individual and occasion so that the kth release of individual i is reindexed as the jth overall release,
- defining $y_j = h_{i,t_{ik}+1}$, $S_j = \phi_{it_{ik}}$, $p_j = p_{i,t_{ik}+1}$, and $r_j = \lambda_{it_{ik}}$ with $j = \sum_{l=1}^{i-1} n_l + k$, and
- fixing $F_j$ to 1 for all j.
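The reindexing in these four steps can be sketched in Python as follows. The names `Release`, `flatten_releases` and `burnham_row` are hypothetical; individuals are indexed from 0 in the code, while occasions keep the 1-based numbering of the text.

```python
from typing import Callable, List, NamedTuple, Sequence, Tuple

Prob = Callable[[int, int], float]  # (individual index, occasion) -> probability


class Release(NamedTuple):
    j: int  # overall release index, 1..n (single sequence)
    i: int  # individual index (0-based position in the input list)
    t: int  # release occasion, 1-based, in 1..T-1
    y: int  # outcome on occasion t+1: 1 alive, 2 recovered dead, 0 not seen


def flatten_releases(histories: Sequence[Sequence[int]]) -> List[Release]:
    """Split each capture history into its releases on occasions 1..T-1
    and number them consecutively from 1 to n."""
    releases: List[Release] = []
    for i, h in enumerate(histories):
        for t in range(1, len(h)):          # t = 1, ..., T-1
            if h[t - 1] == 1:               # captured and released on t
                releases.append(Release(len(releases) + 1, i, t, h[t]))
    return releases


def burnham_row(rel: Release, phi: Prob, p: Prob,
                lam: Prob) -> Tuple[float, float, float, float]:
    """(S_j, F_j, p_j, r_j) for one release, with fidelity fixed to 1."""
    return phi(rel.i, rel.t), 1.0, p(rel.i, rel.t + 1), lam(rel.i, rel.t)
```

For the three histories in Table 1 (11012, 11100 and 10102) this gives n = 3 + 3 + 2 = 8 releases, matching the eight input-file entries in that table.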
In practice, this is achieved by entering each release in the trinomial model as a separate line (i.e. a separate individual) in the input data to program mark, including the relevant values of $z_{it}$ as fixed individual covariates of the survival probability for each entry, and dividing the entries into T−1 groups to accommodate time effects. The following section describes the instructions for formatting data and brief results from the analysis of a simulated data set. Step-by-step details for fitting the trinomial model to the simulated data either in program mark or through the rmark interface are provided in Appendix S1.
To illustrate how the trinomial model can be implemented in program mark, I describe the analysis of a simulated data set. Complete instructions for fitting the model in program mark directly or through the rmark interface using the accompanying r package trimark are included in Appendix S1. All files required to reproduce the analysis, including the complete data and the trimark package, are included in the accompanying zip archive which is included in Attachment S1. Here I provide instructions on how to format the data and brief results.
Data for the Burnham recapture/recovery model are entered into program mark using the live/dead (LD) system of encoding. Events from each capture occasion are encoded using two binary variables: the first indicates whether or not the individual was captured on that occasion and the second indicates whether the individual was recovered dead before the next occasion (see the program mark help topic Encounter Histories File for full details). Using this method, the three possible events in the trinomial model are encoded as:
- captured alive on the next occasion: 1010;
- recovered dead before the next occasion: 1100;
- not observed: 1000.
To format the trinomial data for program mark, each event in an individual's capture history is entered as a separate line in the input file using the above LD encoding. The entries are then completed by adding T−1 group variables that indicate the occasion of release (and can be used to model time effects), followed by the observed value of the time-varying covariate. Further covariates, including additional time-varying covariates, can also be added if any were recorded.
The first three capture histories from the simulated data set, observed covariate values, and corresponding entries in the program mark input file are provided in Table 1. Consider the first history, 11012. The individual was released on three occasions and so the history contributes three separate entries to the input file. The individual was first captured on occasion 1 with a covariate value of 1·62 and was recaptured alive on occasion 2. The entry in the input file corresponding to this event is: 1010 1 0 0 0 1·62;. The individual was released on occasion 2 with covariate value −1·26 and was neither captured on nor recovered before the third occasion. The second entry in the input file is: 1000 0 1 0 0 −1·26;. Finally, the individual was captured and released on occasion 4, the observed covariate value was −0·28, and the individual was recovered dead before occasion 5. The final entry this individual contributes to the program mark input file is: 1100 0 0 0 1 −0·28;. A function that formats the data for the trinomial model and creates the mark input file, write.inp.tri, is provided in the r package in Attachment S1 and described in Appendix S1.
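The formatting rule can be sketched in Python. This is not the `write.inp.tri` function from trimark, which is written in r; `mark_entries` is a hypothetical name, and the sketch writes an ASCII minus sign and a decimal point where the typeset table uses "−" and "·". It assumes that every release has an observed covariate value and writes only the single covariate shown in Table 1.

```python
from typing import List, Optional, Sequence

# LD codes for the two-occasion histories, keyed by the outcome h_{i,t+1}.
LD_CODE = {1: "1010", 2: "1100", 0: "1000"}


def mark_entries(history: Sequence[int],
                 covariate: Sequence[Optional[float]]) -> List[str]:
    """One input-file line per release on occasions 1..T-1: LD code,
    T-1 group indicators for the release occasion, covariate, ';'."""
    T = len(history)
    lines = []
    for t in range(1, T):                        # release occasions 1..T-1
        if history[t - 1] != 1:
            continue
        groups = ["1" if g == t else "0" for g in range(1, T)]
        z = covariate[t - 1]                     # value recorded at release
        lines.append(f"{LD_CODE[history[t]]} {' '.join(groups)} {z:.2f};")
    return lines
```

For the history 11012 this produces the three lines worked through above, with the group indicator for the release on occasion 4 in the fourth position.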
Table 1. Three histories from the simulated data. The first column indicates the capture history and the second provides the values of the covariate observed when the individuals were captured alive (the symbol ‘–’ indicates a missing value). The final column lists the corresponding entries in the input file for program mark. The number of entries in the input file for each capture history is equal to the number of releases (the number of 1s) prior to the last capture occasion
|Capture history||Observed covariate values||Input file entries|
|11012||1·62, −1·26, –, −0·28, –||1010 1 0 0 0 1·62;|
| || ||1000 0 1 0 0 −1·26;|
| || ||1100 0 0 0 1 −0·28;|
|11100||0·48, −0·45, 0·00, –, –||1010 1 0 0 0 0·48;|
| || ||1010 0 1 0 0 −0·45;|
| || ||1000 0 0 1 0 0·00;|
|10102||0·34, –, −2·61, –, –||1000 1 0 0 0 0·34;|
| || ||1000 0 0 1 0 −2·61;|
Results from fitting the model of Catchpole, Morgan & Tavecchia (2008) to the simulated data are provided in Table 2. Estimates for all parameters are close to the true values, and the 95% confidence intervals cover the true values in all cases. The estimate of one parameter (marked a in Table 2) lay very close to the boundary of the parameter space, and so it was necessary to compute the profile likelihood interval for this parameter to obtain reasonable confidence limits. Comparisons of the fit of the default trinomial model with four alternative models are provided in Table 3. Not surprisingly, the default model from which the data were generated fits better than any of the alternative models. Appendix S1 provides complete details on the alternative models, the procedures for fitting these models in program mark and rmark, and the steps for performing multi-model inference.
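Program mark computes profile likelihood intervals itself; the Python sketch below only illustrates why a Wald interval breaks down at a boundary. It assumes the simplest possible case, a single binomial probability with no nuisance parameters, where the profile likelihood interval reduces to a likelihood-ratio interval: all $\theta$ with $2\{\ell(\hat\theta) - \ell(\theta)\} \le \chi^2_{1,0.95} \approx 3.84$. The counts are hypothetical and are not taken from the simulated data, and the function names are illustrative.

```python
import math
from typing import Tuple

CHI2_1_95 = 3.841458820694124  # 0.95 quantile of the chi-square distribution, 1 df


def binom_loglik(theta: float, x: int, n: int) -> float:
    """Binomial log-likelihood up to a constant; zero counts drop their term."""
    ll = 0.0
    if x > 0:
        ll += x * math.log(theta)
    if n - x > 0:
        ll += (n - x) * math.log(1.0 - theta)
    return ll


def lr_interval(x: int, n: int, crit: float = CHI2_1_95) -> Tuple[float, float]:
    """Likelihood-ratio interval for a binomial probability, found by bisection."""
    mle = x / n
    ll_max = binom_loglik(mle, x, n)

    def inside(theta: float) -> bool:
        return 2.0 * (ll_max - binom_loglik(theta, x, n)) <= crit

    def bisect(outside: float, inner: float) -> float:
        # 'outside' is excluded, 'inner' is included; shrink towards the limit.
        for _ in range(200):
            mid = 0.5 * (outside + inner)
            if inside(mid):
                inner = mid
            else:
                outside = mid
        return inner

    lower = 0.0 if x == 0 else bisect(0.0, mle)
    upper = 1.0 if x == n else bisect(1.0, mle)
    return lower, upper
```

With 20 successes in 20 trials the estimate is 1, the Wald standard error $\sqrt{\hat\theta(1-\hat\theta)/n}$ is 0 and the Wald interval collapses to a point, whereas the likelihood-ratio interval runs from $\exp(-3.84/40) \approx 0.908$ to 1.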
Table 2. Estimates of the parameters in the model of Catchpole, Morgan & Tavecchia (2008) fit to the simulated data set. For each parameter, the table presents the true value, the maximum likelihood estimate and the asymptotic 95% confidence interval
|Parameter||True value||Estimate||95% CI|
| ||1·39||1·52||(1·21, 1·83)|
| ||1·00||1·03||(0·77, 1·29)|
| ||0·80||0·86||(0·78, 0·91)|
| ||0·80||0·82||(0·72, 0·89)|
| ||0·20||0·17||(0·09, 0·28)|
| ||0·80||0·76||(0·52, 0·90)|
| ||0·20||0·19||(0·10, 0·33)|
| ||0·80||1·00||(0·75, 1·00)a|
| ||0·20||0·24||(0·08, 0·52)|
aProfile likelihood interval.
The trinomial model described by Catchpole, Morgan & Tavecchia (2008) provides a simple method to model the effect of time-dependent, individual covariates on the probability of survival using mark–recapture–recovery data. Although inference from the trinomial model is less efficient than inference from full-likelihood methods, including Bayesian methods (Bonner, Morgan & King 2010), it avoids the complication of modelling the distribution of the unobserved covariate values. Implementing the trinomial model in program mark and rmark provides easy access to the method for users who are already familiar with these software packages, and I am hopeful that this will help researchers to construct more realistic models of their data.
Table 3. Model comparison statistics for the default model and the four alternative models. The columns of the table list the number of parameters, deviance, small sample adjusted AIC, and model weight for each of the five models
|Model||Parameters||Deviance||AICc||Weight|
|Alt. Model 1||9||1290·238||1308·396||0·00|
|Alt. Model 2||7||1262·383||1276·481||0·00|
|Alt. Model 3||7||1426·465||1440·563||0·00|
|Alt. Model 4||4||1477·239||1485·274||0·00|
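Table 3 relies on the small-sample AIC and Akaike weights. The Python sketch below shows the standard formulas, $\mathrm{AICc} = -2\log L + 2K + 2K(K+1)/(n-K-1)$ and $w_m \propto \exp(-\Delta_m/2)$, where $\Delta_m$ is the difference from the smallest AICc. The inputs in the usage example are made up for illustration, and the effective sample size that program mark uses is not reproduced here; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def aicc(neg2loglik: float, k: int, n: int) -> float:
    """Small-sample AIC from -2 log L, k parameters and sample size n."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return neg2loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)


def akaike_weights(values: Sequence[float]) -> List[float]:
    """Model weights w_m = exp(-Delta_m / 2) / sum_l exp(-Delta_l / 2)."""
    best = min(values)
    rel = [math.exp(-0.5 * (v - best)) for v in values]
    total = sum(rel)
    return [r / total for r in rel]
```

Because the weights are normalised over the whole candidate set, a model whose AICc is tens of units above the best receives a weight that rounds to 0, as for the four alternative models in Table 3.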
Dr Gary White, the creator of Program MARK, has pointed out that the likelihood function for the trinomial model can also be constructed in Program MARK using the multi-state data type. The same strategy is used to construct the input file by considering multiple releases for each individual as separate entries with 2 occasions per history, but the recaptures and recoveries can be encoded as transitions to different states instead of using the live–dead encoding. The trinomial model can also be extended with more complex data types, like the Multi-state–Live and Dead Enc. data type, which would allow one to model effects of the covariate on the transition probabilities as well as on survival.
I would like to thank Dr Byron Morgan, Ben Augustine, Dr Jeff Laake and Dr Gary White for their helpful comments on drafts of the manuscript. Funding for this work was partially provided by the National Science Foundation (NSF Grant No. 0814194).