#### Missing data in CRR studies

In CRR studies, missing data on individual times of birth (*b*_{i}) and death (*d*_{i}) can occur in response to different processes. When modelling age-specific mortality through estimating the unobservable age states, these sources of ‘missingness’ need to be clearly acknowledged. Ideally, CRR data sets should include a large proportion of uncensored records (i.e. with known times of birth and death). However, as we mentioned above, both left-truncated and right-censored records can be common, while their prevalence can be tightly linked to the duration of the study. Furthermore, low recapture and/or recovery probabilities limit the number of detections and introduce uncertainty about status.

Commonly, data collection for CRR studies consists of population sampling at discrete occasions spanning an interval [*t*_{1}, …, *t*_{T}], where *t*_{1} and *t*_{T} correspond to the first and last sampling occasions, respectively. At each sampling occasion, new individuals are marked and released while previously marked ones are recaptured when they are alive or recovered when they die. The event of an individual *i* being detected (captured) or not at each occasion *t* is defined by the indicator *y*_{i,t}, which takes the value 1 if the individual is detected and 0 otherwise. The resulting capture history vector is **y**_{i}, with maximum possible length equal to the study span *T*. Thus, for a study span of *T* = *t*_{T} − *t*_{1} + 1 years, the vector **y**_{i} for an individual with unknown times of birth and death is the *T*-sequence of binary indicators that takes values on the discrete space {0,1}^{T}. We define the first time an individual is detected as *f*_{i}, the last time as *l*_{i}, and the total number of years it was observed as *o*_{i}. Thus, for the observation vector **y**_{i} = [1, 0, 1, 0, 0], we have *T* = 5, *o*_{i} = 2, *f*_{i} = 1 and *l*_{i} = 3.

However, the entire *individual history* is not always bound by the study span, because births and deaths can occur before and after the study, respectively. Therefore, each individual history combines the vector **y**_{i} with the respective times of birth (*b*_{i}) and death (*d*_{i}). For example, an individual history given by the vector [*b*_{i}, 1, 1, 0, *d*_{i}] corresponds to an individual known to be born at a time *b*_{i} = *t*, defined within the interval [*t*, *t* + Δ*x*], that died at a time *d*_{i} = *t* + 4, and was recaptured at times *t* + 1 and *t* + 2. This uncensored record contains all the information required to estimate age-specific survival probability, while the missed detection at the fourth position of the history (time *t* + 3) contributes to the data model that we describe below.
A data set consists of *n* individual histories, where *n* corresponds to the total number of individuals recorded.
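
The summary quantities defined above (*T*, *f*_{i}, *l*_{i} and *o*_{i}) can be read directly from a capture history. The following is a minimal Python sketch, not the authors' implementation; the function name `summarize_capture_history` is hypothetical and occasions are assumed to be numbered from 1.

```python
def summarize_capture_history(y):
    """Return (T, f, l, o) for a binary capture history y.

    T: study span, f: first detection, l: last detection,
    o: number of occasions with a detection. Occasions are numbered 1..T.
    """
    if not y or any(v not in (0, 1) for v in y):
        raise ValueError("y must be a non-empty sequence of 0/1 indicators")
    detected = [t for t, v in enumerate(y, start=1) if v == 1]
    if not detected:
        raise ValueError("a marked individual must be detected at least once")
    return len(y), detected[0], detected[-1], len(detected)


# The worked example from the text: y_i = [1, 0, 1, 0, 0]
T, f_i, l_i, o_i = summarize_capture_history([1, 0, 1, 0, 0])
print(T, f_i, l_i, o_i)  # 5 1 3 2
```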

#### A hierarchical framework for truncated and censored records

Analysis of age-specific survival is traditionally applied to capture–mark–recapture data sets by combining models for survival and recapture probability. Let *x* represent age and *X* be the random variable for age at death. The age at death for an individual *i* is calculated as the difference between the times of death and birth (*X*_{i} = *d*_{i}−*b*_{i}). Because observations are often annual, we discuss data in terms of discrete time increments of duration Δ*x* = 1 year.

Parametric models are commonly used for inference on lifetime survival. A model for continuous age *x* is expressed in terms of an age-specific rate of mortality, or hazard rate, *μ*(*x* | *θ*), where parameters are represented by *θ*. This mortality rate can be used to calculate the probability of survival until age *x*, or survivor function,

- $S(x \mid \theta) = \Pr(X > x) = \exp\left[-\int_{0}^{x} \mu(z \mid \theta)\,dz\right]$ (eqn 1a)

the probability that death occurs before age *x*,

- $F(x \mid \theta) = \Pr(X < x) = 1 - S(x \mid \theta)$ (eqn 1b)

and the probability density function (pdf) of ages at death,

- $f(x \mid \theta) = \frac{dF(x \mid \theta)}{dx} = \mu(x \mid \theta)\,S(x \mid \theta)$ (eqn 1c)

In standard survival analysis, individuals of both known and unknown age at death contribute to estimates, but in different ways (Cox & Oakes 1984). Individuals having known age at death are uncensored and contribute likelihood 1c, while those for which age at death could not be recorded are right-censored and contribute likelihood 1a. Individuals of unknown age (i.e. left-truncated) are typically omitted.
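
As a rough illustration of how eqns 1a to 1c enter a standard survival likelihood, the sketch below builds the survivor function, cdf and pdf from an arbitrary hazard by numerical integration, then sums log-pdf terms for uncensored records and log-survivor terms for right-censored ones. The function names and the constant hazard of 0.1 are assumptions chosen for illustration only.

```python
import math


def cumulative_hazard(mu, x, steps=2000):
    """Trapezoidal approximation of the integral of mu(z) from 0 to x."""
    if x <= 0:
        return 0.0
    h = x / steps
    total = 0.5 * (mu(0.0) + mu(x))
    for k in range(1, steps):
        total += mu(k * h)
    return total * h


def survivor(mu, x):
    """S(x): probability of surviving to age x (eqn 1a)."""
    return math.exp(-cumulative_hazard(mu, x))


def cdf(mu, x):
    """F(x): probability that death occurs before age x (eqn 1b)."""
    return 1.0 - survivor(mu, x)


def pdf(mu, x):
    """f(x): density of ages at death, hazard times survivor (eqn 1c)."""
    return mu(x) * survivor(mu, x)


def log_likelihood(mu, ages, uncensored):
    """Known ages at death contribute log f; right-censored ages contribute log S."""
    total = 0.0
    for x, delta in zip(ages, uncensored):
        total += math.log(pdf(mu, x)) if delta else math.log(survivor(mu, x))
    return total


def constant_hazard(x):
    """Hypothetical hazard used only to exercise the functions."""
    return 0.1


# One individual died at age 5; another was last seen alive at age 5.
print(log_likelihood(constant_hazard, [5.0, 5.0], [1, 0]))
```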

Our treatment extends inference to left-truncated records by imputing times of birth and death, and thus, only the pdf of ages at death (eqn 1c) is required. For right-censored and left-truncated individuals, the imputed age at death substitutes for known age at death. Let **X** = [*X*_{1}, …, *X*_{n}] be the vector of ages at death, and let **X**_{k} ⊆ **X** and **X**_{u} ⊆ **X** be the vectors containing the known (uncensored) and unknown (truncated and censored) ages, respectively. These subsets of **X** have lengths $n_k = \sum_{i=1}^{n} \delta_i$ and $n_u = n - n_k$, where the level of missingness *n*_{u} < *n* will determine our ability to estimate parameters (see below). Here, *δ*_{i} is an indicator that takes the value 1 if individual *i* was uncensored, and 0 otherwise.

The joint distribution of unknowns cannot be analysed under a maximum-likelihood framework, because it requires the integration of all the multinomial probabilities for possible recapture histories and the unknown (stochastic) ages. Our hierarchical framework needs only the conditionals for posterior simulation by Gibbs sampling, specifically Metropolis-within-Gibbs (Gelfand & Smith 1990; Clark 2007). This means that, for this particular case, the algorithm divides the posterior for the joint distribution of unknowns into three sections: (a) estimation of survival parameters, *θ*; (b) estimation of unknown ages at death, **X**_{u} and (c) estimation of recapture probability(ies), *π*.

In Section (a), survival parameters *θ* conditioned on age **X** have density

- $p(\theta \mid \mathbf{X}, \theta_p) \propto p(\mathbf{X} \mid \theta)\,p(\theta \mid \theta_p)$ (eqn 2)

The first factor on the right-hand side of eqn 2 is the likelihood for ages at death, and the second factor is the prior distribution for the mortality parameters *θ*, where *θ*_{p} are the prior parameter values. The likelihood is calculated as

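- $p(\mathbf{X} \mid \theta) = \prod_{i=1}^{n} p(X_i \mid \theta)$ (eqn 3)
- $p(X_i = x \mid \theta) = f(x \mid \theta)$ if $b_i \ge t_1$ (eqn 3a)
- $p(X_i = x \mid \theta) = \Pr(X_i = x \mid X_i > t_1 - b_i, \theta)$ if $b_i < t_1$ (eqn 3b)

where *f* is the pdf of ages at death (eqn 1c) and eqn 3b is applied to left-truncated individuals. For these records, the probability of dying (to a discrete approximation) at age *X*_{i} = *x* is conditioned on surviving to the age at truncation *t*_{1} − *b*_{i} and is derived as $\Pr(X_i = x \mid X_i > t_1 - b_i, \theta) = f(x \mid \theta) / S(t_1 - b_i \mid \theta)$, where *S* is the survivor function (eqn 1a).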

Section (b) estimates the age at death for individuals that are either censored, truncated or both, conditioned on survival and observations, and is calculated as

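- $p(X_i, b_i, d_i \mid \mathbf{y}_i, \theta, \pi, \theta_p) \propto p(\mathbf{y}_i \mid b_i, d_i, \pi)\,p(X_i \mid \theta)\,p(X_i \mid \theta_p)$ (eqn 4)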

The first factor evaluates the probability that an individual is not detected for *η*_{i} years before the first capture (for truncated records) and after the last capture (for censored records) and is calculated as

- $p(\mathbf{y}_i \mid b_i, d_i, \pi) = (1 - \pi)^{\eta_i}$ (eqn 5)

where *η*_{i} is calculated as

The second factor in eqn 4 is the density of deaths (as in eqn 1c) conditional on estimated times of birth and death. The last factor is the prior age distribution, which is constructed from the prior parameters *θ*_{p}

where life expectancy (i.e. expected age at death) is calculated as the integral of the survivor function (eqn 1a), $E[X] = \int_{0}^{\infty} S(x \mid \theta_p)\,dx$. Prior dependence introduced by the parametric model allows for imputation of times and ages at death for censored and truncated individuals using a Metropolis step.

CRR data sets consist of discrete observations separated by intervals of time where no records are taken. However, mortality is a process that happens continuously in time: individuals can die at any time between consecutive observations. To account for this discretization of mortality with age, we used a mid-point approximation of ages at death when estimating the survival and mortality functions. This means that the probability that an individual *i* dies at age *X*_{i}, that is, Pr(*x* < *X*_{i} < *x* + Δ*x*), is evaluated at the mid-point of the interval, namely at age *x* + 0·5Δ*x* (see Appendix S1 for details of the implementation).
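
To give a sense of how close the mid-point approximation is, the sketch below compares the exact interval probability, *S*(*x*) − *S*(*x* + Δ*x*), with the pdf evaluated at *x* + 0·5Δ*x* and multiplied by Δ*x*. It assumes a Gompertz hazard with the parameter values used later in the simulation study (0.02 and 0.2); the function names are hypothetical and this is not the code in Appendix S1.

```python
import math

ALPHA, BETA = 0.02, 0.2  # Gompertz values taken from the simulation study


def gompertz_survivor(x, a=ALPHA, b=BETA):
    """S(x) for the hazard a * exp(b * x)."""
    return math.exp((a / b) * (1.0 - math.exp(b * x)))


def gompertz_pdf(x, a=ALPHA, b=BETA):
    """f(x) = hazard(x) * S(x)."""
    return a * math.exp(b * x) * gompertz_survivor(x, a, b)


def prob_death_exact(x, dx=1.0):
    """Pr(x < X < x + dx) from the survivor function."""
    return gompertz_survivor(x) - gompertz_survivor(x + dx)


def prob_death_midpoint(x, dx=1.0):
    """Mid-point approximation: pdf at x + 0.5*dx times the interval width."""
    return gompertz_pdf(x + 0.5 * dx) * dx


for age in (0, 5, 10, 15):
    exact = prob_death_exact(age)
    approx = prob_death_midpoint(age)
    print(age, round(exact, 5), round(approx, 5), round(approx - exact, 6))
```

With annual intervals and these parameter values, the two quantities differ by well under 0.001 at the ages shown.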

Section (c) estimates detection probability *π* with conditional posterior constructed as the product of a binomial density, which is the likelihood for the number of years an individual *i* is observed (*o*_{i}) given the number of years it is imputed to be alive (*n*_{i}) within the study, and a Beta prior distribution for *π* with hyper-parameters *ρ*_{1} and *ρ*_{2}. The result is a conjugate Beta conditional posterior

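- $p(\pi \mid \mathbf{o}, \mathbf{n}, \rho_1, \rho_2) = \mathrm{Beta}\left(\rho_1 + \sum_{i} o_i,\; \rho_2 + \sum_{i} (n_i - o_i)\right)$ (eqn 6)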

Section (c) can be extended to estimate multiple recapture probabilities through a multivariate Beta conditional posterior (e.g. different recapture probabilities by year or age; see Appendix S1 for an implementation in the code), or by treating the recapture probability as a binomial GLM with a link function that relates it with categorical or continuous explanatory variables (e.g. year, age, random individual effects, etc.).
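
The conjugate update in Section (c) amounts to adding the total number of detections and the total number of missed detections to the two Beta hyper-parameters. The sketch below draws one value of *π* in this way using Python's standard `random.betavariate`; the function name, the argument names and the example counts are hypothetical, and the defaults of 0.1 for *ρ*_{1} and *ρ*_{2} follow the values reported in the simulation study.

```python
import random


def sample_recapture_probability(observed_years, alive_years,
                                 rho1=0.1, rho2=0.1, rng=random):
    """Draw pi from Beta(rho1 + sum(o_i), rho2 + sum(n_i - o_i)).

    observed_years: o_i, years each individual was detected.
    alive_years:    n_i, years each individual is imputed alive within the study.
    """
    if len(observed_years) != len(alive_years):
        raise ValueError("o_i and n_i must have the same length")
    if any(o > n for o, n in zip(observed_years, alive_years)):
        raise ValueError("an individual cannot be detected more often than it was alive")
    detections = sum(observed_years)
    misses = sum(n - o for o, n in zip(observed_years, alive_years))
    return rng.betavariate(rho1 + detections, rho2 + misses)


# Hypothetical counts for three individuals.
rng = random.Random(1)
print(sample_recapture_probability([4, 3, 2], [5, 4, 3], rng=rng))
```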

In summary, the three sections described above are introduced into a Gibbs sampler algorithm as follows:

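**Section a**: Metropolis sampling between two potential vectors of survival parameters, *θ* and *θ*′, given the real and proposed ages at death, with acceptance probability $\min\left\{1, \frac{p(\theta' \mid \mathbf{X}, \theta_p)}{p(\theta \mid \mathbf{X}, \theta_p)}\right\}$, that is, the ratio of the conditional posterior in eqn 2 at the proposed and current values.

**Section b**: Metropolis sampling between two potential ages at death, *X*_{i} and *X*_{i}′, and the associated times of birth and death (*b*_{i} and *d*_{i}), with acceptance probability $\min\left\{1, \frac{p(X_i', b_i', d_i' \mid \mathbf{y}_i, \theta, \pi, \theta_p)}{p(X_i, b_i, d_i \mid \mathbf{y}_i, \theta, \pi, \theta_p)}\right\}$, that is, the ratio of the conditional posterior in eqn 4 at the proposed and current values.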

**Section c**: Direct sampling for recapture probability from the Beta distribution in eqn 6. If multiple recaptures are evaluated using a link function within a GLM framework, Metropolis sampling would be required if the distributions of the link function parameters are not conjugate with the binomial recapture probabilities (Clark 2007).
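
To make the Metropolis update for the survival parameters more concrete, the following sketch performs symmetric random-walk updates for Gompertz parameters given a fixed set of ages at death. It is an illustration under several assumptions rather than the authors' algorithm: a flat prior on positive values stands in for the prior on *θ*, the proposal standard deviations are arbitrary, the ages are hypothetical, and the updates for ages at death and recapture probability are omitted.

```python
import math
import random


def gompertz_log_pdf(x, a, b):
    """Log of f(x) = a*exp(b*x) * exp[(a/b) * (1 - exp(b*x))]."""
    return math.log(a) + b * x + (a / b) * (1.0 - math.exp(b * x))


def log_posterior(theta, ages):
    """Log conditional posterior of theta = (a, b) given ages at death.

    Assumption: a flat prior on a > 0 and b > 0 replaces the paper's prior.
    """
    a, b = theta
    if a <= 0.0 or b <= 0.0:
        return -math.inf
    return sum(gompertz_log_pdf(x, a, b) for x in ages)


def metropolis_step(theta, ages, rng, step_sd=(0.005, 0.02)):
    """One symmetric random-walk Metropolis update; returns (theta, accepted)."""
    proposal = tuple(p + rng.gauss(0.0, s) for p, s in zip(theta, step_sd))
    log_ratio = log_posterior(proposal, ages) - log_posterior(theta, ages)
    accept_prob = math.exp(min(0.0, log_ratio))  # min{1, posterior ratio}
    if rng.random() < accept_prob:
        return proposal, True
    return theta, False


# Hypothetical ages at death, for illustration only.
ages = [3.1, 7.4, 9.8, 11.2, 12.5, 13.0, 14.6, 16.3, 18.9]
rng = random.Random(1)
theta, n_accepted = (0.05, 0.1), 0
for _ in range(2000):
    theta, accepted = metropolis_step(theta, ages, rng)
    n_accepted += accepted
print("acceptance rate:", n_accepted / 2000)
```

In the full sampler, a step like this would alternate with the updates for unknown ages at death (Section b) and the direct Beta draw for recapture probability (Section c).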

#### Simulation study

To test the performance of our model in estimating mortality patterns and the imputed ages at death, we stochastically simulated 10 populations. We simulated each population for 130 years, starting from an initial population of 10 individuals and sex ratio of 1 : 1. Each female could produce one offspring per year, and breeding success was randomly drawn from a Bernoulli distribution with probability 0·5. The sex of the offspring was also determined from a Bernoulli trial with probability 0·5. The age at death for each individual was randomly drawn by inverse sampling from a Gompertz mortality cdf (eqn 1b; Gompertz 1825) with hazard *μ*(*x*) = *αe*^{βx}, where *α* and *β* are the baseline mortality and shape parameters, respectively. Survival probability is then calculated as

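- $S(x \mid \alpha, \beta) = \exp\left[\frac{\alpha}{\beta}\left(1 - e^{\beta x}\right)\right]$ (eqn 7a)

and the pdf for ages at death is

- $f(x \mid \alpha, \beta) = \alpha e^{\beta x} \exp\left[\frac{\alpha}{\beta}\left(1 - e^{\beta x}\right)\right]$ (eqn 7b)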

To reproduce the mortality patterns of a long-lived population, we fixed the Gompertz parameters at *α*_{r} = 0·02 and *β*_{r} = 0·2, which incorporates strong age-dependence in the mortality rate. The resulting survival function estimates that only 1% of the initial cohort remains alive by age 20. Because we allowed for variability in reproduction and survival, the final size of each population was different, which allowed us to test the sensitivity of the model to sample size.
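
Inverse sampling from the Gompertz cdf can be sketched as follows: setting *F*(*x*) equal to a uniform draw *u* and solving for *x* gives *x* = ln[1 − (*β*/*α*) ln(1 − *u*)]/*β*. The code below is a minimal illustration with hypothetical function names and an arbitrary seed, not the simulation code used in the study.

```python
import math
import random

ALPHA_R, BETA_R = 0.02, 0.2  # Gompertz parameters stated in the text


def gompertz_survivor(x, a=ALPHA_R, b=BETA_R):
    """S(x) = exp[(a/b) * (1 - exp(b*x))]."""
    return math.exp((a / b) * (1.0 - math.exp(b * x)))


def gompertz_quantile(u, a=ALPHA_R, b=BETA_R):
    """Age x at which the cdf F(x) = 1 - S(x) equals u, for 0 <= u < 1."""
    return math.log(1.0 - (b / a) * math.log(1.0 - u)) / b


def sample_age_at_death(rng, a=ALPHA_R, b=BETA_R):
    """Inverse sampling: push a uniform draw through the quantile function."""
    return gompertz_quantile(rng.random(), a, b)


rng = random.Random(42)
ages = [sample_age_at_death(rng) for _ in range(10000)]
share_alive_at_20 = sum(x > 20 for x in ages) / len(ages)
print(round(gompertz_survivor(20.0), 4), share_alive_at_20)
```

Under these parameter values the survivor function at age 20 evaluates to roughly 0.005, the same order of magnitude as the approximately 1% quoted above, and the simulated share of individuals older than 20 should be close to that value.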

For each population, we fixed the start of the study at year 100 and generated two sets of data, one for a short study of *T* = 10 years, and one for a long study of *T* = 20 years. We also varied the proportion of individuals first captured at birth (i.e. known-age individuals), the recapture (*π*) and recovery (*λ*) probabilities by randomly drawing from a Binomial distribution with two sets of parameters: a low probability of 0·2 (0·1 for *λ*) and a high probability of 0·8. As a result, we simulated a total of 16 samples per population, which corresponded to all the possible combinations (scenarios) of short-long study span, and low-high proportion of known ages, recapture and recovery probabilities, adding up to 160 study cases.
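
The factorial design described above can be enumerated directly, which also confirms the counts of 16 scenarios and 160 study cases. The variable names in this sketch are hypothetical.

```python
from itertools import product

study_spans = (10, 20)           # short and long study, in years
known_age_props = (0.2, 0.8)     # proportion first captured at birth
recapture_probs = (0.2, 0.8)     # pi
recovery_probs = (0.1, 0.8)      # lambda (low value is 0.1)

scenarios = list(product(study_spans, known_age_props,
                         recapture_probs, recovery_probs))

n_populations = 10
study_cases = [(population, scenario)
               for population in range(1, n_populations + 1)
               for scenario in scenarios]

print(len(scenarios), len(study_cases))  # 16 160
```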

To provide a point of comparison with more traditional methods, we compared the results of our model with the CRR model proposed by Catchpole *et al.* (1998). We adapted Catchpole *et al.*’s model for inference on age-specific survival assuming no cohort effects. This model is a product of multinomial distributions for age-specific survival probabilities, *φ*_{x}, detection probability, *π*, and recovery probability, *λ*, applied only to individuals with known age (i.e. left-truncated records are not included) and only for ages up to *X* = *T* − 1. The likelihood for an individual *i*, *L*(**u**_{i}; **Φ**, *π*, *λ*), where **u**_{i} represents its individual history and **Φ** = {*φ*_{1}, …, *φ*_{T−1}}, is

- (eqn 8)

where *δ*_{i} is an indicator function for uncensored (*δ*_{i} = 1) vs. censored (*δ*_{i} = 0) individuals. *ω*_{i} is either (i) the age of a censored individual the last time it was detected (*l*_{i} − *b*_{i}) or (ii) the age at death for an uncensored individual *X*_{i}, thus $\omega_i = \delta_i X_i + (1 - \delta_i)(l_i - b_i)$, and *υ*_{i} is age at the end of the study (*υ*_{i} = *t*_{T} − *b*_{i}) applied to censored individuals. The first factor on the right-hand side of eqn 8 corresponds to survival to age of last capture (for right-censored) or to last year before death (for uncensored), the second factor (second line) calculates mortality probability at *ω*_{i} for uncensored individuals, and the last factor (third line) is the probability of either dying without recovery or surviving without detection at every age after last detected for right-censored individuals. Maximum-likelihood estimates are obtained for the vector of age-specific survival probabilities and probabilities of detection and recovery (see Catchpole *et al.* (1998) for a full description of the model and fitting method). For comparison with the estimates from our Bayesian model, we used the survivor function, which can be calculated as the cumulative product of the age-specific survival probabilities, $S_x = \prod_{j=1}^{x} \varphi_j$. Owing to the large number of samples (i.e. 160), we used a Metropolis algorithm to obtain mean and 95% confidence intervals for the parameters estimated. The acceptance criterion was simply the likelihood ratio for the previous and proposed parameters, namely $L(\mathbf{u}; \mathbf{\Phi}', \pi', \lambda') / L(\mathbf{u}; \mathbf{\Phi}, \pi, \lambda)$, where primes denote the proposed values.

To test how accurately both models predicted the real survival patterns from each of the 160 study cases, we calculated predictive loss (Gelfand & Ghosh 1998) between the range of estimated survival probabilities and the real survival probability. This method is commonly used as a measure of predictive performance for model selection. We used the converged parameter chains to calculate survival probabilities at each age and calculated a measure of goodness-of-fit (*G*_{m}) as the error sum of squares between the average predicted survival probability and the real probability, and one of model dispersion or penalty (*P*_{m}) measured as the predictive variance. A deviance value is calculated as *D*_{m} = *G*_{m} + *P*_{m}. The model with the lowest *D*_{m} is considered to have the highest predictive ability.
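
The predictive loss calculation can be sketched as below. The sketch assumes that both *G*_{m} and *P*_{m} are summed over ages and that the predictive variance is the population variance of the MCMC draws at each age; these details, the function name and the toy numbers are assumptions rather than a description of the authors' code.

```python
from statistics import fmean, pvariance


def predictive_loss(chain_survival, true_survival):
    """Return (G_m, P_m, D_m) for survival curves predicted by an MCMC chain.

    chain_survival: list of draws, each a list of survival probabilities by age.
    true_survival:  list of the real survival probabilities at the same ages.
    """
    n_ages = len(true_survival)
    if any(len(draw) != n_ages for draw in chain_survival):
        raise ValueError("every draw must cover the same ages as true_survival")
    by_age = list(zip(*chain_survival))  # one tuple of draws per age
    g_m = sum((fmean(col) - s) ** 2 for col, s in zip(by_age, true_survival))
    p_m = sum(pvariance(col) for col in by_age)
    return g_m, p_m, g_m + p_m


# Toy example: two draws of survival at two ages.
print(predictive_loss([[1.0, 0.5], [1.0, 0.7]], [1.0, 0.5]))
```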

We also tested the sensitivity of our models to the selection of priors for the mortality parameters. We ran the model on all populations with four sets of priors, each set describing a particular trend in survival. The Gompertz parameters for the priors were as follows: prior 1: *α*_{p} = 0·001, *β*_{p} = 0·001; prior 2: *α*_{p} = 0·001, *β*_{p} = 0·15; prior 3: *α*_{p} = 0·1, *β*_{p} = 0·001 and prior 4: *α*_{p} = 0·1, *β*_{p} = 0·15 (Fig. 1S in Appendix S2). The Beta priors for the detection probability *π* were *ρ*_{1} = 0·1 and *ρ*_{2} = 0·1, which yield a prior mean of *ρ*_{1}/(*ρ*_{1} + *ρ*_{2}) = 0·5.

We calculated credible intervals for all parameters for each subpopulation from the estimates in the MCMC chains after burn-in. Finally, we tested the bias and dispersion in the estimation of ages at death for truncated and censored records for all populations and scenarios as the difference between the estimated age and the real age: *e*_{x} = *x*_{est}−*x*_{real}, for each individual with unknown time of birth or of death (truncated, censored or both) at each estimated value per MCMC step after burn-in. With these results, we calculated predictive intervals for *e*_{x} for each subpopulation.
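
For a single individual, the error *e*_{x} and a 95% predictive interval can be summarised from the post-burn-in draws of its imputed age, as in the sketch below. The function name is hypothetical, and the choice of interpolated (inclusive) sample quantiles is an assumption.

```python
from statistics import fmean, quantiles


def age_error_summary(estimated_ages, real_age):
    """Mean error and 95% predictive interval of e_x = x_est - x_real.

    estimated_ages: imputed ages at death, one per MCMC step after burn-in.
    real_age:       the true age at death of the same individual.
    """
    errors = [x_est - real_age for x_est in estimated_ages]
    # n=40 gives cut points every 2.5%; first and last bound the central 95%.
    cuts = quantiles(errors, n=40, method="inclusive")
    return fmean(errors), cuts[0], cuts[-1]


# Hypothetical chain of imputed ages for an individual that died at age 50.
mean_error, lower, upper = age_error_summary(range(0, 101), 50)
print(mean_error, lower, upper)
```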

#### Application

We obtained CRR data from a free-ranging population of Soay sheep (*O. aries*) from Hirta, St Kilda’s archipelago, Scotland (57°49′N, 08°34′W) studied since 1985 (for details on data collection, see Clutton-Brock & Pemberton (2004)). The data set consists of over 7000 individual histories that include times of birth and death and capture histories. Because the aim of this analysis was to illustrate the performance of our model and not to implement an in-depth analysis of Soay sheep demography (for extensive examples, see Coulson *et al.* 2001; King *et al.* 2006; Pelletier *et al.* 2007; Coulson *et al.* 2008), we only used a subsample of individuals detected between 1995 and 2005 for a total of 2577 capture histories. We assumed that individuals born before the study were left-truncated while those dying beyond the study were right-censored, for a total of 944 records with missing data. For 648 of these records, we had the real times of birth and death, and thus, we were able to use them to calculate errors in age estimation. We conditioned our analysis on survival after the first year of life to avoid the complexities of juvenile mortality (Jones *et al.* 2005). After a visual exploration of the data, we decided to use a Siler mortality rate function (Siler 1979), which allows the exploration of ‘bathtub’ patterns in mortality. The Siler hazard rate is calculated as

- (eqn 9a)

with survival probability

- (eqn 9b)

The pdf for ages at death is calculated as depicted in eqn 1c. To illustrate how to incorporate covariates, we tested whether males and females differed in mortality rates (see Appendix S1 for model description and implementation in R). We tested the difference in parameter estimates between both sexes by calculating Kullback–Leibler (K–L) discrepancy, *D*_{fm}, (Burnham & Anderson 2001) on the resulting parameter posterior densities. This metric calculates the distance between the distribution of a given parameter *θ* for females [*P*_{f}(*θ*)] and males [*P*_{m}(*θ*)] as $D_{fm} = \int P_f(\theta) \log\left[\frac{P_f(\theta)}{P_m(\theta)}\right] d\theta$.

If both distributions are similar, *D*_{fm} should be close to 0.
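
As an illustration of the K–L calculation, the sketch below integrates *P*_{f} log(*P*_{f}/*P*_{m}) numerically on a grid. Normal densities are used as stand-ins for the posterior densities, which in practice would have to be estimated from the MCMC output; the function names, the integration limits and the example means and standard deviations are all assumptions.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used here as a stand-in posterior."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def kl_discrepancy(p_f, p_m, lower, upper, steps=20000):
    """Trapezoidal approximation of the integral of p_f * log(p_f / p_m).

    Grid points where either density evaluates to zero are skipped, which is
    a simplification that is only reasonable when both densities overlap.
    """
    h = (upper - lower) / steps
    total = 0.0
    for k in range(steps + 1):
        x = lower + k * h
        pf, pm = p_f(x), p_m(x)
        if pf > 0.0 and pm > 0.0:
            weight = 0.5 if k in (0, steps) else 1.0
            total += weight * pf * math.log(pf / pm)
    return total * h


def females(theta):
    return normal_pdf(theta, 0.0, 1.0)  # hypothetical posterior for females


def males(theta):
    return normal_pdf(theta, 1.0, 2.0)  # hypothetical posterior for males


print(kl_discrepancy(females, males, -12.0, 12.0))
print(kl_discrepancy(females, females, -12.0, 12.0))
```

For two normal densities the result can be checked against the closed form ln(*σ*_{m}/*σ*_{f}) + [*σ*_{f}² + (*μ*_{f} − *μ*_{m})²]/(2*σ*_{m}²) − 0·5, and identical densities give a discrepancy of 0.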