The recent expansion of continuous-resighting telemetry methods (e.g. acoustic receivers, PIT tag antennae) has created a class of ecological data not well suited for traditional mark–recapture statistics. Estimating survival when continuous recapture data are available poses a practical problem, because classical capture–recapture models were derived under a discrete sampling scheme that assumes sampling events are instantaneous with respect to the interval between events.
To investigate the use of continuous data in survival analysis, we conducted a model structure adequacy simulation that tested the Cormack–Jolly–Seber (CJS) and Barker joint data survival estimation models, which differ mainly in the Barker model's inclusion of secondary-period information. We simulated a population in which survival and detection occurred as a near-continuous (daily) process and collapsed detection information into monthly sampling bins for survival estimation.
While both models performed well when survival was time-independent, the CJS was substantially biased for low survival values and time-dependent conditions. Additionally, unlike the CJS, the Barker model consistently performed well over multiple sample sizes (number of marked individuals). However, the high number of parameters in the Barker model led to convergence difficulties, resulting in a need for an alternative optimization method (simulated annealing).
We recommend the use of the Barker model when using continuous data for survival analysis, because it outperformed the CJS over a biologically reasonable range of potential parameter values. However, the practical difficulty of implementing the Barker model combined with its shortcomings during two simulations leaves room for the specification of novel statistical methods tailored specifically for continuous mark–resighting data.
Reliable biological inferences about the processes driving survival of individuals in a population depend on the proper formulation of stochastic process models that are confronted with capture–recapture/resighting data. Such models translate fundamental biological questions into testable hypotheses that further our understanding of the system of interest (Cohen 2004; Gimenez et al. 2007). When such models are inappropriately formulated, bias caused by structural errors can lead to unreliable statistical inferences (Pradel & Sanz-Aguilar 2012).
For capture–recapture/resighting data, formulation of an appropriate stochastic process model requires consideration of the structure of the data collected (e.g. discrete vs. continuous sampling events), the type of data collected (e.g. recapture, resighting or dead recovery) and the biological characteristics of the study system (e.g. open vs. closed populations). For example, multiple models have been developed to estimate survival from open populations when using discrete-resighting data (Hightower, Jackson & Pollock 2001; McClintock & White 2009; Johnson et al. 2010) or discrete-recapture data (Lebreton et al. 1992). The recent expansion of continuous-resighting telemetry methods (e.g. acoustic receivers, PIT tag antennae; Heupel & Simpfendorfer 2002; Barbour & Adams 2012) has created a class of ecological data not well suited for standard statistical methods when fates are unknown (Kie et al. 2010). Without an investigation of proper model formulation, the information contained in these data will not be fully harnessed, and statistical inferences may be weak or misleading (Strong et al. 1999).
Several previous survival studies using continuous-resighting data collapsed continuous resightings into discrete-time intervals and applied existing discrete-time models. For example, Heupel & Simpfendorfer (2002) applied the discrete-time model of Hightower, Jackson & Pollock (2001) to continuous-resighting data by collapsing resightings into weekly sampling bins. Similarly, Adams et al. (2006) collapsed continuous-resighting data into weekly intervals and estimated apparent survival with the discrete Cormack–Jolly–Seber (CJS) model. During a multiyear study, Cameron et al. (1999) collapsed 4 months (November through February) of continuous resightings into a single encounter occasion labelled as January 1st each year and then estimated annual survival with a discrete multistate model. Hewitt et al. (2010) took a similar approach, but used a discrete CJS model.
The use of continuous data in discrete-time models violates the assumption that sampling occasions are instantaneous with respect to the interval between sampling occasions (e.g. a cohort is marked in a single day, a prolonged period of time elapses [e.g. a month], then a subsequent capture–recapture event occurs over a single day; Pollock et al. 1990). Some studies have recognized and accounted for this issue (Barbour, Boucek & Adams 2012a; Bowerman & Budy 2012; Ruiz-Gutiérrez et al. 2012; Mintzer et al. 2013), but it is unknown how violating this assumption biases survival estimates in studies that have not. Here, we explore this issue by simulating a population of marked individuals that are resighted on a relatively continuous (daily) basis and collapsing these ‘continuous’ resightings into discrete-time bins. We then estimate the known survival values with two survival estimation models to determine whether a model currently exists that is appropriate for estimating survival from continuous resightings.
Materials and methods
Model structure adequacy
We used a model structure adequacy (MSA) approach (Taper, Staples & Shepard 2008) to test whether two survival estimation models could be used for unbiased estimation of survival from continuous-resighting data. MSA selects models based on their ability to answer specific scientific questions given the current understanding of the relevant aspects of the real world. Under the MSA approach, a mechanistic simulation model is created to represent the underlying process of interest, and candidate models are used to estimate/predict the relevant metric from simulated data. This allows investigation of two types of error in the tested models: structural (errors of approximation) and estimation (uncertainty in parameter estimates; Taper, Staples & Shepard 2008). In addition to these error types, the MSA approach itself is subject to a third type of error. Formulation error occurs due to differences between the mechanistic simulation model and the true underlying processes.
Accordingly, we formulated a mechanistic simulation model of a marked population in which individual survival and detection occurred as a near continuous (daily) process. We then generated data sets from the simulation model using a range of parameter values that fully encompassed biologically plausible conditions. For each of the scenarios, we tested the ability of two estimation models (CJS and Barker joint data) to recover the basic properties of the survival parameter. We iterated this process for each parameter set 100 times. We evaluated structural error by calculating relative bias and per cent coverage of survival estimates from each estimation model after simulating populations from multiple known parameter values. We assessed estimation error in a second simulation by varying the number of marked individuals in the simulated population. Finally, to evaluate the robustness of the model inferences to unavoidable formulation errors, we added an additional biological process, a severe disturbance event, in a third simulation.
Survival estimation models
We employed two survival estimation models, the CJS (Lebreton et al. 1992) and the Barker joint data (Barker 1997, 1999). The CJS model assumes sampling periods that are instantaneous compared to the interval between sampling events (Pollock et al. 1990). In comparison, the Barker model is composed of both instantaneous primary periods (i and i + x) and continuous secondary periods (i, i + x), with secondary periods being the interval (x) between primary periods (Fig. 1). During primary periods, individuals are captured and recaptured in an identical fashion to the CJS approach. However, secondary periods occur between marking periods and allow marked individuals to be resighted alive or dead on a continuous basis.
The CJS model estimates two parameters: (1) survival, estimated as apparent survival (Φ; survival confounded by emigration) when emigration occurs or as true survival (s) when emigration does not occur, and (2) recapture probability (p). The Barker model estimates seven parameters because of the additional information from continuous secondary periods (Table 1). The Barker model estimates true survival (s) when secondary periods are conducted over the entire range of a marked population or when emigration does not occur, and Φ otherwise. Our simulation models did not include emigration; therefore, all survival estimates will hereafter be referred to as s.
Table 1. Barker joint data model parameter definitions in program MARK
s: The probability that an animal alive at i is alive at i + 1
p: The probability that an animal at risk of recapture at i is recaptured at i
r: The probability that an animal that dies in i, i + 1 is found dead
R: The probability that an animal that survives from i to i + 1 is resighted (alive) sometime between i and i + 1
R′: The probability that an animal that dies in i, i + 1 without being found dead is resighted alive in i, i + 1 before it died
F: The probability that an animal at risk of recapture at i is at risk of recapture at i + 1
F′: The probability that an animal not at risk of recapture at i is at risk of recapture at i + 1 (this definition differs from Barker (1997) in order to force probability-driven internal constraints; White & Burnham 1999)
Simulation 1: structural error
To simulate the use of continuous data for discrete survival estimation, we simulated a population that survived/died and was detected/not detected on a daily basis and collapsed these daily detections into monthly sampling bins. We assumed a system closed to emigration in which all individuals were marked during the first day with no tagging mortality. Static parameters (those held constant over all iterations) were the number of marked individuals (n =1000) and the number of days for the simulation (d =180). The variable parameters (those we altered between iterations) of the simulation model were limited to true monthly survival (sm) and true monthly recapture probability (pm). To fully encompass the biologically plausible range of parameter values, we created simulation models using 50 known sm (a sequence from 0·5 to 1·0) and 50 known pm (a sequence from 0·02 to 1·0) values. This resulted in 2500 variable parameter combinations.
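The variable parameter grid can be sketched in Python as follows. This is illustrative only (the authors' simulation code, Appendix S1, was written in R), and it assumes the 50 values of each parameter are evenly spaced between the stated endpoints, which the text does not specify.

```python
import numpy as np

# Assumption: the 50 values are evenly spaced (like R's seq(..., length.out = 50));
# the text gives only the endpoints and the count.
s_m_values = np.linspace(0.5, 1.0, 50)    # true monthly survival
p_m_values = np.linspace(0.02, 1.0, 50)   # true monthly recapture probability

# Every (s_m, p_m) pair is one variable parameter combination.
grid = [(s, p) for s in s_m_values for p in p_m_values]

n_marked = 1000      # static: number of marked individuals
n_days = 180         # static: number of simulated days
n_iterations = 100   # replicates per combination in simulation 1

print(len(grid))                  # 2500
print(len(grid) * n_iterations)   # 250000
```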
We constructed the mechanistic simulation model (Appendix S1) in program R (R Development Core Team 2011). For each individual, we conducted a Bernoulli trial (a binomial ‘coin flip’) each day to determine whether the individual survived or died, with a daily survival probability (sd) of (eqn 1):
$$s_d = s_m^{1/30}$$
Each day an individual survived, a second Bernoulli trial was conducted to determine whether the individual was detected. To convert monthly recapture probability to daily recapture probability (pd), we calculated the daily probability of not being recaptured and subtracted this value from one (eqn 2):
$$p_d = 1 - (1 - p_m)^{1/30}$$
The recapture probability needs to be computed this way since there are many possible combinations for an individual to be detected at least once in a given month, but there is only one possible way to not be recaptured. Subtracting the probability of nondetection from one accounted for all possible recapture combinations.
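A minimal Python sketch of the daily process described above, under stated assumptions: each month is 30 days, the marking day counts as day 0 of the 180 simulated days, and the function names are hypothetical (this is not the R code of Appendix S1).

```python
import numpy as np

def daily_probabilities(s_m, p_m, days_per_month=30):
    """Convert monthly survival/recapture probabilities to daily ones (eqns 1 and 2)."""
    s_d = s_m ** (1.0 / days_per_month)
    # A month with no detection has probability (1 - p_d)^30, which must equal 1 - p_m.
    p_d = 1.0 - (1.0 - p_m) ** (1.0 / days_per_month)
    return s_d, p_d

def simulate_daily(s_m, p_m, n=1000, n_days=180, rng=None):
    """Return an (n, n_days) 0/1 matrix of daily detections.

    Day 0 is the marking day: every individual is alive and recorded.
    On each later day, a live individual survives with probability s_d and,
    only if it survived, is detected with probability p_d.
    """
    rng = np.random.default_rng() if rng is None else rng
    s_d, p_d = daily_probabilities(s_m, p_m)
    detections = np.zeros((n, n_days), dtype=int)
    alive = np.ones(n, dtype=bool)
    detections[:, 0] = 1  # marking event, no tagging mortality
    for day in range(1, n_days):
        alive &= rng.random(n) < s_d              # survival trial (dead stay dead)
        seen = alive & (rng.random(n) < p_d)      # detection trial for live animals
        detections[:, day] = seen
    return detections
```

For sm = pm = 0·9 the conversion gives sd of roughly 0·9965 and pd of roughly 0·0739: a daily detection chance of only about 7% still yields a 90% chance of at least one detection within 30 days.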
After running the mechanistic simulation model (Appendix S1) for a given variable parameter set, we collapsed daily detections into monthly bins (m =6), in which individuals were either detected or not, to create capture histories for each individual. For the CJS, these monthly bins represented primary periods, but they were used as the secondary periods in the Barker model. For the Barker model, we set the capture history values in all primary periods, with the exception of the tagging event, to zero. We created capture histories for the CJS by two methods. In the first method, which mirrored Adams et al. (2006), we collapsed daily detections into six monthly bins as described above, meaning the tagging event was included in the first month of detections. We left all time intervals as the default length of one. In the second method, we set the marking event as an independent primary event, thereby creating a seventh bin (six intervals) in the capture history. When using the second method, we adjusted for uneven time intervals within the RMark package (Laake & Rexstad 2008) for R. Using the midpoint of each resighting month as our reference point, we set the first interval (between marking and the first resighting month) to equal a length of 0·5 months. Thus, each subsequent time interval occurred between the midpoints of the resighting months and was of length 1·0.
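The two CJS capture history constructions can be sketched as below. The helper names are hypothetical; a bin is recorded as 1 if the individual was detected on at least one day within it, and for method two we assume the marking day itself is excluded from the first resighting bin. The returned interval vectors mirror the lengths described above (all 1·0 for method one; 0·5 followed by 1·0 for method two).

```python
import numpy as np

def collapse(detections, bin_days=30):
    """Collapse daily 0/1 detections into per-bin 0/1 (detected at least once in the bin)."""
    n, n_days = detections.shape
    n_bins = n_days // bin_days
    trimmed = detections[:, : n_bins * bin_days]
    return trimmed.reshape(n, n_bins, bin_days).max(axis=2)

def cjs_method_one(detections, bin_days=30):
    """Method one: the marking day falls inside the first bin; all intervals = 1."""
    histories = collapse(detections, bin_days)
    intervals = np.ones(histories.shape[1] - 1)
    return histories, intervals

def cjs_method_two(detections, bin_days=30):
    """Method two: marking is its own occasion, followed by the resighting bins.

    The first interval runs from marking to the midpoint of the first bin (0.5 bin);
    later intervals run between bin midpoints (1.0 bin each).
    """
    resight = detections.copy()
    resight[:, 0] = 0  # assumption: drop the marking day from the first resighting bin
    bins = collapse(resight, bin_days)
    marked = np.ones((detections.shape[0], 1), dtype=int)
    histories = np.hstack([marked, bins])
    intervals = np.array([0.5] + [1.0] * (bins.shape[1] - 1))
    return histories, intervals
```

With 180 days and 30-day bins, method one yields six occasions and method two yields seven, matching the six and seven bins described in the text.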
Since shorter-term survival estimates may be of interest in certain studies, we ran a separate simulation using 10-day bins, instead of 30-day bins. We created capture histories for the CJS by the second method, treating the marking event separately from resighting information.
We estimated survival with the CJS and Barker models using program MARK (White & Burnham 1999) accessed through the RMark package (Laake & Rexstad 2008). For the Barker model, we fixed F at ‘1’ and F' at ‘0’ because no emigration occurred; we fixed p at ‘0’ because no recaptures occurred during primary periods, and r at ‘0’ because we did not simulate dead recoveries (parameter definitions in Table 1). The simulations did not include time variability in survival or recapture probability; therefore, we used time-independent estimates for s, R and R' in the Barker model, and for s and p in the CJS estimation model.
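To show what the CJS estimation step computes, here is a small Python sketch of the constant-survival, constant-recapture CJS likelihood with unequal time intervals, assuming survival over an interval of length Δ is s raised to the power Δ. It is an illustrative stand-in for program MARK, not MARK's implementation; the function names are hypothetical, and it uses SciPy's Nelder–Mead optimizer on the logit scale.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

def cjs_loglik(s, p, histories, intervals, freq=None):
    """Log-likelihood of a constant-s, constant-p CJS model.

    Survival over an interval of length dt is s**dt, so s remains a
    per-unit-time (here per-month) survival probability when intervals differ.
    Every history is assumed to start with the marking occasion (column 0).
    """
    histories = np.asarray(histories)
    intervals = np.asarray(intervals, dtype=float)
    freq = np.ones(len(histories)) if freq is None else np.asarray(freq, dtype=float)
    phi = s ** intervals                 # survival over each interval
    k = histories.shape[1]
    # chi[t]: probability an animal alive at occasion t is never seen again.
    chi = np.ones(k)
    for t in range(k - 2, -1, -1):
        chi[t] = (1 - phi[t]) + phi[t] * (1 - p) * chi[t + 1]
    ll = 0.0
    for h, f in zip(histories, freq):
        last = np.flatnonzero(h)[-1]     # last occasion the animal was seen
        log_prob = np.log(chi[last])
        for t in range(1, last + 1):     # known alive from marking to 'last'
            log_prob += np.log(phi[t - 1]) + np.log(p if h[t] else 1 - p)
        ll += f * log_prob
    return ll

def fit_cjs(histories, intervals, freq=None):
    """Maximise the log-likelihood on the logit scale; returns (s_hat, p_hat)."""
    def nll(theta):
        return -cjs_loglik(expit(theta[0]), expit(theta[1]), histories, intervals, freq)
    res = minimize(nll, x0=np.zeros(2), method="Nelder-Mead")
    return expit(res.x[0]), expit(res.x[1])
```

The logit transformation keeps both estimates inside (0, 1) during optimization; the Barker model adds the R and R′ resighting terms and is not sketched here.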
We ran the mechanistic simulation model 100 times for each of the 2500 variable parameter combinations, resulting in 250 000 total iterations. For each iteration, we computed the relative departure of the estimated survival (ŝm) from the true survival (sm) as (eqn 3):
$$\frac{\hat{s}_m - s_m}{s_m}$$
The relative bias was then estimated as the average over all iterations of these relative departures for a given variable parameter combination. Additionally, we quantified per cent coverage by counting the number of successful iterations per variable parameter set in which the true value for sm was included in an estimation model's 95% confidence interval of ŝm. For the full simulation run (all 250 000 iterations), we first used the default Newton–Raphson optimization method in program MARK, and then reran the full simulation with an alternative optimization method (simulated annealing) for the Barker model because this model failed to converge in multiple instances.
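The per-combination summaries (eqn 3 averaged over iterations, and per cent coverage) amount to the following. This is a sketch that assumes the estimates and 95% confidence limits from the successful iterations for one parameter combination have already been collected into arrays; the function name is hypothetical.

```python
import numpy as np

def relative_bias_and_coverage(s_true, s_hat, lower, upper):
    """Summarise one variable-parameter combination across its iterations.

    s_hat, lower, upper: survival estimates and 95% CI limits, one per iteration.
    Returns (relative bias in %, per cent coverage).
    """
    s_hat = np.asarray(s_hat, dtype=float)
    departures = (s_hat - s_true) / s_true          # eqn 3, one value per iteration
    rel_bias = 100.0 * departures.mean()
    covered = (np.asarray(lower) <= s_true) & (s_true <= np.asarray(upper))
    coverage = 100.0 * covered.mean()
    return rel_bias, coverage
```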
Simulation 2: estimation error
We repeated simulation one with two alterations to the mechanistic simulation model. First, we fixed the variable parameters (sm and pm) to a value of 0·9, as these values approximated initial estimates from a known study system (Barbour, Boucek & Adams 2012a; Barbour et al. 2012b). Second, we made the number of marked individuals (n) a variable parameter, with values ranging from n =50 to n =1000 by increments of 50. We then reran the simulation as previously described and iterated the simulation 1000 times for each n value while using simulated annealing for optimization with the Barker model and the second method of capture history creation for the CJS. We determined per cent coverage and estimated relative bias of survival at each n value for each estimation model in identical fashion to simulation one. Besides computing the relative bias as the average relative departure from the true survival probability, we kept track of the 2·5 and 97·5 percentiles of the distribution of these relative departures.
To address coverage issues with the CJS, we conducted a parallel simulation that used parametric bootstrapping to create confidence intervals. For n =200 and 500, we repeated the prior simulation for the CJS for 500 iterations, but used the ML estimates from each iteration as the parameter values for 1000 bootstrapped simulations. We used the 2·5 and 97·5 percentiles of maximum likelihood (ML) survival estimates from these 1000 bootstrapped iterations to construct confidence intervals for each of the 500 iterations per sample size (n). We used these parametric bootstrap confidence intervals to test coverage of the true survival value.
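The parametric bootstrap percentile interval can be sketched generically: refit the model to data simulated from the ML estimates and take the 2·5 and 97·5 percentiles of the refitted estimates. The `simulate` and `fit` callables are placeholders; in the study they would generate CJS capture histories from the fitted parameters and refit the CJS. The binomial toy below is a hypothetical stand-in chosen only to keep the example short.

```python
import numpy as np

def parametric_bootstrap_ci(theta_hat, simulate, fit, n_boot=1000, level=0.95, rng=None):
    """Percentile confidence interval from a parametric bootstrap.

    simulate(theta, rng) -> a data set generated from the fitted model at theta
    fit(data) -> the ML estimate recomputed on that data set
    """
    rng = np.random.default_rng() if rng is None else rng
    boot = np.array([fit(simulate(theta_hat, rng)) for _ in range(n_boot)])
    alpha = 1.0 - level
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Hypothetical toy: 200 marked animals, survival estimated as the surviving fraction.
N_TOY = 200

def simulate_binomial(theta, rng):
    return rng.binomial(N_TOY, theta)

def fit_binomial(k):
    return k / N_TOY

lo, hi = parametric_bootstrap_ci(0.9, simulate_binomial, fit_binomial,
                                 rng=np.random.default_rng(42))
```

Coverage of such an interval is then checked exactly as for the model-based interval: count the iterations whose (lo, hi) contains the true survival value.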
Simulation 3: formulation error
To determine the ability of the estimation models to account for additional biological complexity in the form of a disturbance event, we altered the mechanistic simulation model to include a month of low survival. First, we fixed sm to 0·90 and pm to 0·90 and maintained n =1000. Then, for the third month of the simulation (days 61–90), we lowered sm to 0·30 to represent a severe disturbance event.
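The disturbance scenario changes only the daily survival schedule. A short sketch (hypothetical helper name; days numbered 1 to 180 and 30-day months assumed) shows that applying eqn 1 day by day reproduces monthly survival of 0·90 in every month except month three, where it drops to 0·30.

```python
import numpy as np

def disturbance_schedule(s_m=0.90, s_disturb=0.30, n_days=180,
                         disturb_days=(61, 90), days_per_month=30):
    """Daily survival probability for days 1..n_days (1-based day numbers)."""
    days = np.arange(1, n_days + 1)
    in_disturbance = (days >= disturb_days[0]) & (days <= disturb_days[1])
    monthly = np.where(in_disturbance, s_disturb, s_m)
    return monthly ** (1.0 / days_per_month)   # eqn 1 applied day by day

sched = disturbance_schedule()
# Product of daily survival within each 30-day month recovers the monthly values.
monthly_survival = sched.reshape(6, 30).prod(axis=1)
```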
We created two model structures to account for the disturbance event, and we used them to estimate survival using the CJS and Barker models. The first model structure allowed for full time dependence with respect to survival, sm(t). Our second model structure, sm(d), represented the truth-generating process, the mechanistic simulation model. In this model, sm for the disturbance month was estimated separately from the other, time-independent sm periods. For all estimation models, the other parameters were calculated in identical fashion to simulation one.
We iterated this simulation 1000 times and used simulated annealing for optimization with the Barker model and the second method of capture history creation for the CJS. After each iteration, we selected the most parsimonious model structure for each estimation model by identifying the model with the minimum Akaike's Information Criterion (AIC; Akaike 1973) score. Generally, models with ΔAIC values < 2 have substantial support, and models with ΔAIC > 10 have no support (Burnham & Anderson 2004). We then summarized and plotted the simulated distribution of the ML survival estimates. Finally, we determined the per cent coverage as the number of successful iterations during which the 95% confidence interval of ŝm for the given iteration included the true value of sm.
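Model selection by minimum AIC reduces to a few lines. The log-likelihoods and parameter counts in the usage example are made-up numbers for illustration, not values from Table 2.

```python
def aic(log_lik, k):
    """Akaike's Information Criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def delta_aic(models):
    """models: dict name -> (log-likelihood, number of parameters).

    Returns dict name -> ΔAIC relative to the minimum-AIC model (which gets 0).
    """
    scores = {name: aic(ll, k) for name, (ll, k) in models.items()}
    best = min(scores.values())
    return {name: score - best for name, score in scores.items()}

# Hypothetical values: the time-dependent structure fits slightly better
# but pays for four extra parameters.
print(delta_aic({"s(t)": (-1500.0, 8), "s(d)": (-1502.5, 4)}))
# {'s(t)': 3.0, 's(d)': 0.0}
```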
The Barker model estimated survival from continuous-resighting data with minimal structural error, while the CJS model only performed well under time-independent conditions with high survival. Unlike the CJS, the Barker model performed well across multiple sample sizes of marked individuals (n). Additionally, the Barker model reliably estimated survival when we added biological complexity to the mechanistic simulation model. However, the Barker model's optimization failed to converge for some combinations of parameter values using Newton–Raphson's method, necessitating the use of simulated annealing. We summarize the simulation results in a series of contour plots (Figs 2-4) in which we plotted the relative bias (subfigures ‘a’ and ‘b’) and per cent coverage (subfigures ‘c’ and ‘d’) at each of the 2500 parameter combinations for a given simulation run.
Simulation 1: structural error
When constructing capture histories under method one, the CJS model moderately underestimated sm (Fig. 2a) and rarely demonstrated an acceptable level of coverage (Fig. 2c). Creating capture histories under method two, which separated marking from resighting information, resulted in relatively unbiased estimates (Fig. 2b) with proper coverage except when survival was low and especially when combined with high recapture probability (Fig. 2d). Moving from monthly to 10-day bins while using method two of capture history creation did not substantially affect results for the CJS (Fig. 3a,c).
In comparison, the Barker model estimated sm with a consistent, minor positive bias (Fig. 4a), but failed to converge multiple times when using Newton–Raphson optimization (Figs 4c and 5). This was likely due to the high number of estimated parameters leading to local minima during numerical optimization. However, 100% of model runs converged when using simulated annealing, and coverage estimates consistently ranged from 90% to 98% (Fig. 4b,d). When using 10-day instead of monthly bins, the performance of the Barker model was reduced at low recapture probabilities, with coverage approaching 0% and relative bias more negative than −10·0% (Fig. 3b,d).
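Why simulated annealing can succeed where a Newton–Raphson search stalls can be illustrated with a generic, textbook version of the algorithm. This is not program MARK's implementation (whose settings are not reproduced here), and the one-dimensional test function is hypothetical. Uphill moves are sometimes accepted, so the search can leave a local minimum.

```python
import numpy as np

def simulated_annealing(f, x0, step=1.0, t0=5.0, cooling=0.999, n_iter=5000, rng=None):
    """Minimise f by simulated annealing with Gaussian proposals.

    Downhill moves are always accepted; uphill moves are accepted with
    probability exp(-increase / temperature). The temperature shrinks
    geometrically, so early in the run the search can escape local minima.
    Returns the best point visited and its function value.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    temp = t0
    for _ in range(n_iter):
        candidate = x + rng.normal(scale=step, size=x.shape)
        fc = f(candidate)
        if fc <= fx or rng.random() < np.exp(-(fc - fx) / temp):
            x, fx = candidate, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
        temp *= cooling
    return best_x, best_f

def two_wells(x):
    """Hypothetical objective: shallow well near x = 2, deeper well near x = -2."""
    return float((x[0] ** 2 - 4) ** 2 + x[0])

best_x, best_f = simulated_annealing(two_wells, [2.0], step=1.5,
                                     rng=np.random.default_rng(0))
```

Started at x = 2, next to the shallower well, a gradient-based method would stay in that basin, whereas the annealing search also visits the deeper well near x = −2.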
Simulation 2: estimation error
When altering the number of marked individuals (n), the Barker model reliably covered the true value of sm in c. 95% of the iterations for every n tested (Fig. 6). Relative bias for the Barker model was near 0·0%, with variability in the departure from the truth decreasing with increasing n. The CJS model covered the true value of sm in 95% of the iterations when n was low (n =50, 100), but as n increased to 1000, coverage fell below 86% (Fig. 6). This occurred because confidence intervals became narrower as n increased, while relative bias was maintained at 0·005% (Fig. 6). Thus, the probability of covering sm with the CJS decreased with increasing n. When using parametric bootstrapping to address the CJS's poor coverage, coverage decreased from 92·4% to 88·4% at n =200 and from 88·8% to 81·4% at n =500.
Simulation 3: formulation error
When a 30-day disturbance was incorporated into the simulation model, AIC supported different model structures for the CJS and Barker estimation models (Table 2). For the CJS, AIC selected the fully time-dependent model for survival [sm(t)] in every iteration, with no support given to the truth-generating model, sm(d) (Table 2). For the Barker estimation model, the structural model representing the truth-generating process, time-independent estimates for all periods except for the disturbance month, sm(d), was the minimum AIC model in 89·9% of the 1000 iterations. However, the time-dependent model received considerable AIC support (Table 2).
Table 2. Akaike's Information Criterion (AIC) table results from simulation three, which included a 1-month disturbance event
Monthly survival (sm) estimates were either time-dependent, sm(t), or time independent except for the disturbance period, sm(d). The number of estimated parameters (k), the mean AIC score over 1000 simulated iterations and the per cent of iterations giving AIC support to a model are given. Results are shown for (a) the Cormack–Jolly–Seber and (b) the Barker joint data models.
For the truth-generating and time-dependent model structures, we compared the estimated relative bias in survival obtained by the CJS and the Barker model. When the structural model was the sm(d) model, the CJS model covered the true value of sm for the disturbance in 0·1% of the iterations with an estimated relative bias of 33·3%. Coverage for nondisturbance sm was 0·0%, with a relative bias of −6·3%. In comparison, the Barker model covered the true value of sm during the disturbance in 93·8% of the iterations, with a relative bias of 2·5%. Coverage for nondisturbance sm was 83·7%, with a relative bias of 1·0%. When we compared results from the sm(t) model, the Barker model outperformed the CJS model (Fig. 7). The CJS model provided relatively unbiased estimates for all months except two and three, attributing a substantial proportion of the sm decline in month three to month two (Fig. 7). For the disturbance month, the CJS estimation model covered the true value of sm in 0% of the iterations with a mean relative bias of 44·2%. The Barker model covered the true value of sm during the disturbance in 76·4% of the iterations with a mean relative bias of 5·6%.
We present the first assessment of the statistical properties of survival estimators when continuous-time data are available yet a discrete-time sampling model is used for estimation. Using a model structure adequacy approach (Taper, Staples & Shepard 2008), we demonstrated that substantial bias exists when continuous capture–recapture information is discretized for survival estimation. The extent of the bias depends upon the estimation model used, with the Barker joint data model outperforming the CJS.
The first method of capture history creation for the CJS introduced substantial bias because we coded all individuals as alive in month one despite there being 29 days in which they could succumb to mortality. Using the second method of capture history creation, the CJS failed to reliably estimate survival in most simulations, producing bias at low s that worsened at high p. The complexities of how the survival and detection processes operate jointly make it difficult to unequivocally ascertain why bias increases at low s. One possibility is that the sample information at such values is low enough to generate parameter identifiability problems associated with problematic joint profile likelihoods (Ponciano et al. 2012). Finding an approximation of the bias in a very simple case for which the likelihood function allows an analytical treatment of the problem may shed light on this issue.
Despite an expectation that increasing sample size (n) would improve CJS model performance, increasing n left the bias unchanged and decreased coverage because of overly narrow confidence intervals. We therefore created parametric bootstrap confidence intervals, since they have better coverage properties when the ML estimate is unbiased (Efron & Tibshirani 1993). However, our implementation of parametric bootstrap confidence intervals exacerbated the coverage problem. The constant bias in parameter estimates across sample sizes suggests that a parametric bootstrap constant bias correction (constant across different values of the true parameters) of the estimate may improve coverage properties and thus warrants a detailed simulation study exploring this issue.
During the disturbance event simulation, we used an arbitrary method of capture history creation for the CJS, in which we had the ability to perfectly bracket the disturbance event within a single sampling period. Even with this prescient knowledge, the CJS returned biased estimates, making it unlikely to perform well under field conditions where such knowledge does not exist. The CJS's poor coverage and difficulty in dealing with biological complexity seem to make this model a poor choice for use with continuous data in real-life applications that require time-dependent estimates. However, while estimates were biased in the time-dependent simulation, the CJS successfully approximated the overall survival during the simulation. Thus, if a study is designed to measure overall survival, the CJS may be an appropriate choice.
The success of the Barker joint data model is not surprising, since the model was formulated for a situation in which continuous resightings occurred between discrete sampling intervals (Barker 1997). Although the Barker model performed well, it did not reach 95% coverage of the true survival value in all simulations. Additionally, the practicalities of implementing the Barker model with continuous data were not without difficulties. In our simulation, we did not conduct discrete sampling events, which allowed us to compute the likelihood by fixing the values of four parameters. In real situations, however, all seven parameters in the model may need to be estimated. Because the parameters may vary by time or group, or be associated with covariates, the Barker model requires substantial experience to properly formulate an a priori model set. If the likelihood surfaces were problematic with only three parameters, we would expect nontrivial maximization problems when the full model is implemented, particularly when the number of unknown parameters is large relative to the data set at hand. The practical difficulty of implementing the Barker model combined with its shortcomings during two simulations leaves room for the specification of novel statistical methods tailored specifically for continuous mark–resighting data. A starting point to achieve such a goal could be working with continuous-time survival stochastic process models whose transition probability matrix corresponds exactly to the transition matrix of a family of discrete-time stochastic processes (Allen 2010).
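A minimal sketch of the embedding idea in the last sentence, under the simplest possible assumption: a two-state (alive, dead) continuous-time Markov chain with a constant daily mortality hazard. The discrete-time transition matrix for any interval t is the matrix exponential of the generator Q multiplied by t, so the one-day and one-month matrices belong to one consistent family, and the alive-to-alive entry of the one-day matrix recovers eqn 1. This illustrates the concept only; it is not a proposed estimator, and the helper name is hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def survival_generator(mortality_rate):
    """Generator matrix Q for a two-state (alive, dead) continuous-time chain.

    Dead is absorbing; alive -> dead occurs at the given hazard per day.
    """
    mu = mortality_rate
    return np.array([[-mu, mu],
                     [0.0, 0.0]])

s_m = 0.9
mu = -np.log(s_m) / 30.0          # constant daily hazard giving monthly survival s_m
Q = survival_generator(mu)
P_day = expm(Q * 1.0)             # one-day transition matrix; P_day[0, 0] = s_m ** (1/30)
P_month = expm(Q * 30.0)          # one-month transition matrix; P_month[0, 0] = s_m
```

Because P_month equals P_day raised to the 30th power, daily and monthly discrete-time descriptions are mutually consistent by construction.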
Our simulation represents the first step towards understanding how to best use continuous data in survival estimation. With the exception of the disturbance event, we only simulated time-independent survival and detection probability while explicitly ignoring emigration, which is not likely to be reflective of biological reality. Although the Barker model is designed to account for random emigration (Barker 1997) and has been shown to effectively handle such movement (Horton & Letcher 2008), the model's robustness to emigration when the parameters designed to deal with emigration (F and F′) are fixed is unknown. While our study ignored the issue, we are currently using empirical data from our field research site to determine realistic rates of emigration, which will be used to extend this work (A. B. Barbour, unpublished data).
Although we focused solely on the use of continuous data in the estimation of discrete survival, this problem may have parallels in other contexts. For instance, data may be grouped along a spatial rather than a temporal axis, despite the acknowledged importance of spatial heterogeneity (Van Kirk & Lewis 1997; Neubert & Caswell 2000). It is unknown, for example, whether the discretization of modern, large-scale GIS (Geographic Information Systems) data of spatial abundance distributions may lead to biased abundance estimators (Kleiber & Hampton 1994; Sibert et al. 1999; Adam & Sibert 2002). The mathematical intricacies of finding the correct time-scale representation for modelling, estimation and testing of the biological process of interest in each case are not trivial. In the context of mark–recapture models, it is necessary to investigate when the underlying discrete-time Markovian structure in the Barker model can be approximated with a continuous-time Markov process (e.g. see Karlin & Taylor 1981, chap. 15, section 2.F).
Reliable understanding and prediction of complex ecological data hinges on the formulation of proper statistical models to quantify biological processes while accounting for the sampling scheme used. However, the ecological literature is filled with examples where off-the-shelf statistical models have proven to be an insufficient tool to generate understanding of the biological processes of interest simply because they are not tailored to the application at hand and as such, are unable to harness the information in the data effectively (e.g. Strong et al. 1999). Here, we focused on informing theoreticians and practitioners alike about the inferential problems associated with temporal grouping practices in survival estimation. This work should be taken as a positive first step towards seeking a model-centred solution to such difficulties.
We thank J. Nichols, J. Hines and M. Conner for comments helpful in the design of the simulation and M. Allen, A. Adams, D. Behringer, D. Koons and one anonymous reviewer for their valuable insights. A.B.B. was supported by a National Science Foundation Graduate Research Fellowship under Grant No. DGE-0802270. K.L. acknowledges funding from the Florida Fish and Wildlife Conservation Commission, Project No. 11409.