Accounting for missing data when assessing availability in animal population surveys: an application to ice-associated seals in the Bering Sea


Correspondence author. E-mail:


  1. Population ecologists often use telemetry to estimate the probability that an animal is available for sampling (availability probability). Such estimates are paramount for generating reliable estimates of animal abundance and are sometimes of biological interest in their own right (e.g. for characterizing pinniped haul-out behaviour or landbird singing frequency).
  2. We consider the estimation of availability probability from telemetry data when there are missing records. When records are missing-completely-at-random (i.e. not influenced by whether an animal is available for sampling or other covariates), approaches that censor missing records result in unbiased estimates of availability probability. However, censoring such records can result in bias when data are not missing-completely-at-random.
  3. We present a novel Bayesian temporal availability model that can be used to explicitly account for the process by which missing data arise. Our approach couples an underlying model for the partially observed availability process together with an observation model that allows state-dependent probabilities of missingness. This approach provides relatively unbiased estimates of availability probability even in the face of violations of the missing-at-random assumption. A new R package, TempOcc, is introduced to automate estimation.
  4. We demonstrate the utility of our approach by analysing both simulated data and hourly satellite telemetry records for 157 ice-associated seals in the Bering Sea, for which 27% of records were missing. In this case, we were interested in generating unbiased estimates of haul-out probabilities. Such probabilities are necessary when estimating abundance from aerial survey counts and assessing possible changes in seal phenology.
  5. Our analysis indicates that the missing-at-random assumption is indeed violated in telemetry studies of phocid seals in the Bering Sea. However, estimates of availability probability do not depend critically on this assumption.
  6. We recommend that ecologists routinely consider the implications of violating the missing-at-random assumption when estimating availability probabilities. Where possible, the modelling framework articulated in this study can be used to diagnose and correct for violations of the missing-at-random assumption.


Estimates of animal abundance are often obtained by dividing a raw survey count ($C$) by an estimated inclusion probability ($\hat{p}$) that reflects the probability that an animal, selected at random, is encountered during the survey. For many populations, $p$ is the product of the probability that an animal is available to be sampled ($p_a$) and the probability that an animal is detected, given that it is available ($p_d$) (Marsh & Sinclair 1989; Pollock et al. 2006; Diefenbach et al. 2007; Nichols, Thomas & Conn 2009). Separate estimates of $p_a$ and $p_d$ are frequently necessary for the calculation of reliable abundance metrics.
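As a worked illustration of this correction, the short Python sketch below computes $\hat{N} = C/(\hat{p}_a\hat{p}_d)$, where $p_a$ is the availability probability and $p_d$ the detection probability given availability. The function name and the survey numbers are hypothetical and serve only to show the arithmetic.

```python
def abundance_estimate(count, p_avail, p_detect):
    """Estimate abundance as N_hat = C / (p_a * p_d).

    count    -- raw survey count C
    p_avail  -- estimated availability probability p_a
    p_detect -- estimated detection probability given availability, p_d
    """
    if not (0 < p_avail <= 1 and 0 < p_detect <= 1):
        raise ValueError("probabilities must lie in (0, 1]")
    inclusion = p_avail * p_detect  # overall inclusion probability p = p_a * p_d
    return count / inclusion


# Hypothetical survey: 120 animals counted, p_a = 0.4, p_d = 0.8
print(round(abundance_estimate(120, 0.4, 0.8), 1))  # 375.0
# Ignoring availability (treating p_a as 1) gives an estimate that is biased low
print(round(abundance_estimate(120, 1.0, 0.8), 1))  # 150.0
```

Because $p_a$ enters as a divisor, an availability estimate that is biased high translates directly into an abundance estimate that is biased low, and vice versa.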

The detection component ($p_d$) can be estimated using classical distance sampling (Buckland et al. 2001), capture–recapture (Williams, Nichols & Conroy 2002), or sightability survey estimators (e.g. Steinhorst & Samuel 1989; Giudice, Fieberg & Lenarz 2012) and has received much attention in the literature. Marine mammal researchers have long realized the need to also adjust survey counts by $p_a$ (e.g. McLaren 1961). For instance, if some individuals are under water while a survey is conducted, they will go undetected and resultant abundance estimates will be biased low. Recently, practitioners have realized that this same issue applies to other taxa. For instance, in auditory surveys of landbirds, individuals must sing at some point while the survey is being conducted to be available for sampling. As with cetaceans, abundance estimators that do not account for singing probability will be biased (Alldredge et al. 2007; Diefenbach et al. 2007). Similarly, if an animal (irrespective of taxon) is in dense vegetation or temporarily outside of a predefined study area, it may be impossible to count even if protocols such as multiple observers or distance sampling are employed (these protocols are typically only sufficient for estimating $p_d$). In such cases, it is critical to use auxiliary data to estimate and adjust for $p_a$ when producing abundance estimates.

Satellite-linked data recorders provide one possible avenue for studying animal availability and producing estimates of $p_a$ to use in conjunction with animal population surveys. Our own interest in availability stems from studies of phocid seals, which are captured and released with satellite tags that provide data, derived from onboard conductivity sensors, on the proportion of time an individual is in the water. As aerial surveys of seals only count animals that are out of the water, these data provide the auxiliary information necessary to account for $p_a$ when estimating abundance. Similarly, time series of geographic locations can be used to infer the proportion of time terrestrial animals spend in surveyable habitats. Such records would ideally be summarized in the form of binary responses, with a '1' indicating that an animal is available. Inference about availability could then be conducted using generalized linear mixed models (as an extension of logistic regression; Bengtson et al. 2005), or with adaptations that permit temporal autocorrelation in responses (cf. Ver Hoef, London & Boveng 2010). These modelling approaches allow one to express availability probability as a function of measured covariates (e.g. time of day, season, sex and age) and to account for individual variation in availability probability. Unfortunately, availability data are rarely as clean as one would hope. For instance, our seal data often include long strings of missing values that are ostensibly a result of data corruption or of tags being unable to relay data to passing satellites.

Statisticians have developed a taxonomy of missing data mechanisms, as well as several inferential tools for dealing with missing data problems; the appropriate tool depends on the characteristics of the missing data mechanism. Suppose that $z_{ij}=1$ if animal i is available for census sampling at time j (e.g. hauled out or singing) and 0 otherwise. Now suppose we conduct an auxiliary study (i.e. independent of the census sampling effort) to estimate $p_a$, but that some of these observations go missing. In particular, define $y_{ij}=1$ if $z_{ij}$ is observed and 0 otherwise (so that $y_{ij}$ serves as an indicator of missingness). If the missingness mechanism is independent of both observed and unobserved measurements (that is, $[\mathbf{y} \mid \mathbf{z}_{obs}, \mathbf{z}_{mis}, \mathbf{X}_{obs}, \mathbf{X}_{mis}] = [\mathbf{y}]$, where $\mathbf{z}_{obs}$ and $\mathbf{z}_{mis}$ denote observed and unobserved responses and $\mathbf{X}_{obs}$ and $\mathbf{X}_{mis}$ denote observed and unobserved covariates, respectively), missing data are termed 'missing-completely-at-random' (MCAR), and the analyst may simply censor missing records prior to analysis without biasing resultant inference (Heitjan & Basu 1996). Missing data may also be legitimately ignored if data are 'missing-at-random' (MAR; Rubin 1976), a slightly less rigid condition in which the missingness mechanism is independent of unobserved data after conditioning on observed responses and covariates (i.e. $[\mathbf{y} \mid \mathbf{z}_{obs}, \mathbf{z}_{mis}, \mathbf{X}_{obs}, \mathbf{X}_{mis}] = [\mathbf{y} \mid \mathbf{z}_{obs}, \mathbf{X}_{obs}]$). In this case, likelihood-based methods controlling for covariates will yield unbiased inferences, even if basic data summaries (moments) are biased for the parameters of interest. Finally, if the mechanism by which data go missing depends on some property of the unobserved data $\mathbf{z}_{mis}$, the data are not missing-at-random (NMAR), and naive application of conventional analysis may yield biased inferences (Little & Rubin 1987). Under this final condition, the missingness mechanism must be explicitly modelled as part of the overall analysis.
A more comprehensive description of missing data concepts, formatted for ecologists, is provided by Nakagawa & Freckleton (2008).

With regard to availability data, we suspect that the censoring process will often depend on the underlying state of the animal (available/not available), so that missing data are NMAR. For instance, researchers may be more or less likely to obtain data from animals that are available (e.g. out of the water, above-ground, in a study area, in surveyable habitat) than from animals that are unavailable. In such cases, the missingness mechanism should be accounted for in modelling efforts. At a minimum, the relative support of hypotheses regarding the ignorability of the missingness mechanism should be examined prior to interpretation of parameter estimates.
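The direction of this bias can be made concrete with a short derivation under simplifying assumptions of our own: states are independent across occasions (no autocorrelation), $\Pr(z_{ij}=1)=p$, and records are obtained with probability $\phi_1$ for available animals and $\phi_0$ for unavailable animals ($\phi_1$ and $\phi_0$ are hypothetical notation). Censoring missing records then estimates $p\phi_1/\{p\phi_1 + (1-p)\phi_0\}$ rather than $p$; the two agree only when $\phi_1 = \phi_0$, i.e. when missingness does not depend on availability. The Python sketch below checks this formula by simulation.

```python
import random


def naive_limit(p, phi1, phi0):
    """Large-sample value of the censored ('naive') availability estimate.

    p    -- true availability probability Pr(z = 1)
    phi1 -- Pr(record obtained | available)
    phi0 -- Pr(record obtained | unavailable)
    """
    obs_available = p * phi1
    return obs_available / (obs_available + (1 - p) * phi0)


def simulate_naive(p, phi1, phi0, n=200_000, seed=1):
    """Simulate independent states with state-dependent missingness; return the censored proportion."""
    rng = random.Random(seed)
    n_obs = n_avail = 0
    for _ in range(n):
        z = rng.random() < p                      # true state (True = available)
        if rng.random() < (phi1 if z else phi0):  # was the record obtained?
            n_obs += 1
            n_avail += z
    return n_avail / n_obs


print(naive_limit(0.3, 0.7, 0.7))               # 0.3 up to rounding: MCAR, no bias
print(round(naive_limit(0.3, 0.5, 0.8), 3))     # 0.211: NMAR, biased low
print(round(simulate_naive(0.3, 0.5, 0.8), 3))  # close to the analytic value
```

In the NMAR case, available animals are recorded less often than unavailable ones, so the censored data over-represent unavailable states and the naive estimate falls well below the true value of 0.3.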

In this study, we address inference about availability probability in the face of missing data. Our analysis of availability combines several paradigms for parameter estimation under imperfect detection, including occupancy estimation (MacKenzie et al. 2002), hidden Markov (or multievent) modelling (Pradel 2005) and state-space modelling (Royle & Kery 2007). Occupancy models posit that the underlying state of a 'site' is either occupied or not occupied, but that some visits to sites will result in false negatives (that is, absences may still be recorded even though the site was truly occupied; Fig. 1). Conceptualizing an individual animal as a 'site' (as has been done in disease prevalence applications; Thompson 2007; Gomez-Diaz et al. 2010), availability is equivalent to occupancy. However, the process by which missing data arise in availability studies leads to a slight departure from the observation model assumed in classical occupancy problems (Fig. 1), instead resembling that of a partially observed multistate system (Pradel 2005; Conn & Cooch 2009). Models for multistate systems typically focus estimation on parameters such as survival and state transition probabilities; further development is needed to use partial observation concepts when inference focuses on availability. Our approach relies on a complete data representation of the state space, in which the true state of an individual is modelled as a latent variable whenever it is unobserved (similar to state-space models for occupancy data; Royle & Kery 2007).

Figure 1.

Observation processes for classical occupancy estimation (a) and partially observed availability data (b). True states are represented by filled boxes, while dashed boxes represent observations. In (a), every site is assigned an observed state, but some occupied sites are mistakenly recorded as unoccupied with probability (1 − p). For partially observed availability (b), availability is either determined definitively (with state-dependent probabilities $\Pr(y_{ij}=1 \mid z_{ij}=1)$ and $\Pr(y_{ij}=1 \mid z_{ij}=0)$ for available and unavailable animals, respectively) or is recorded as missing ('NA').

In most occupancy studies, researchers replicate sampling to estimate detection probabilities and correct for biases caused by imperfect detection. In contrast, availability data are time series, so true replication is not possible. However, as noted by Johnson et al. (in press), explicitly incorporating spatial or temporal dependence in occupancy models may induce enough autocorrelation to permit unbiased estimation of detection and occupancy probabilities, even without true replication. We adopt this strategy, using temporal autocorrelation as represented by intrinsic conditionally autoregressive processes (ICAR; Besag, York & Mollie 1991; Besag & Kooperberg 1995) to induce enough dependence among successive observations to separate the availability process from the detection process.

We next describe a modelling framework that permits violations of the MAR assumption when estimating availability. This framework treats availability estimation as a partially observed occupancy problem. After introducing a new R package, TempOcc, to perform the requisite Bayesian computation for this model, we illustrate our methods with simulated data and with an analysis of hourly haul-out records for phocid seals in the Bering Sea. In this case, unbiased availability estimates are needed both for abundance estimation and for describing behavioural responses to changing sea ice conditions.

Materials and methods

Temporal Availability Model

We suppose that, for each individual i in a population of telemetered animals (i ∈ {1,2,…,n}), we obtain $T_i$ consecutive observations. We represent the time series of successive states of animal i with a state vector, $\mathbf{z}_i = (z_{i1}, \ldots, z_{iT_i})$, where $z_{ij}=1$ whenever individual i is available for sampling at occasion j and $z_{ij}=0$ otherwise. Next, we define an observation vector $\mathbf{y}_i = (y_{i1}, \ldots, y_{iT_i})$, where $y_{ij}=0$ whenever data are missing ($y_{ij}=1$ whenever data are successfully obtained).

Temporarily indicating unknown states with a ‘U’, consider the following state and observation vectors:

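$$\mathbf{z}_i = (1, 1, 1, 1, \mathrm{U}, \mathrm{U}, \mathrm{U}, \mathrm{U}, \mathrm{U}, 1, 1, 0, 0, 0, 0)$$
$$\mathbf{y}_i = (1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1)$$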

In this case, animal i is available in the first four occasions, has a string of missing data from occasions 5–9 where its true availability status is unknown, is available on occasions 10 and 11 and becomes unavailable in periods 12–15. Thus, our primary source of uncertainty is what happened in periods 5–9. We do not know, for instance, whether the animal remained available during this whole period, or whether something else occurred. Of course, if temporal autocorrelation in the state process were known to be high, we might suspect that the animal remained available for the missing period, but we would not know for sure. Similarly, if covariates $\mathbf{X}_i$ thought to be related to availability are collected, these may indicate whether animal i is more or less likely to be available during the intervening period, giving us more information on which to base our guess. Our goal in subsequent modelling efforts is to formalize these intuitive notions into probabilistic inference.
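The example vectors can be encoded directly. In this hypothetical Python sketch, `None` stands in for the unknown states 'U' (the encoding is ours, not necessarily TempOcc's). Censoring the missing occasions leaves 10 records, 6 of which indicate availability; this is the naive estimate that the model developed below aims to improve on.

```python
# Example state vector from the text (occasions 1-15).
# None marks an unknown state 'U'; these are the latent values to be estimated.
z = [1, 1, 1, 1, None, None, None, None, None, 1, 1, 0, 0, 0, 0]

# y_ij = 1 when the state was recorded, 0 when the record is missing
y = [0 if state is None else 1 for state in z]

missing_occasions = [j + 1 for j, obs in enumerate(y) if obs == 0]
print(y)                  # [1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
print(missing_occasions)  # [5, 6, 7, 8, 9]

# Censoring the missing records gives the naive proportion available: 6 of 10
observed = [state for state in z if state is not None]
print(sum(observed) / len(observed))  # 0.6
```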

In practice, we treat all states $z_{ij}$ for which $y_{ij}=0$ as parameters to be estimated (each taking the value 0 or 1). In statistical terms, this approach bases inference on a complete data likelihood (CDL), as advocated in several recent statistical texts (cf. Royle & Dorazio 2008; Link & Barker 2010). Practically speaking, inference with the CDL is often greatly simplified by adopting a Bayesian approach to parameter estimation.

Having decomposed our data into state and observation vectors, we now turn our attention to modelling the state and observation processes (we will often denote parameters and associated statistics of each with superscripts s and o, respectively). We suppose that the state process, Z, is influenced by observable covariates and is potentially subject to temporal autocorrelation. One way of accounting for temporal autocorrelation is through use of temporally structured random effects, represented here by $\eta_{ij}$. For animal i, we assume that availability at time j is governed by a Bernoulli process with success probability $\pi_{ij}$, i.e. $z_{ij} \sim \text{Bernoulli}(\pi_{ij})$. Further structure is provided on the $\pi_{ij}$ by specifying that

$$\Phi^{-1}(\boldsymbol{\pi}_i) = \mathbf{X}^s_i\boldsymbol{\beta}^s + \boldsymbol{\eta}_i.$$

Here, $\boldsymbol{\pi}_i$ and $\boldsymbol{\eta}_i$ collect the $\pi_{ij}$ and $\eta_{ij}$ for animal i, $\mathbf{X}^s_i$ denotes a design matrix for the state process and $\boldsymbol{\beta}^s$ denotes a vector of regression coefficients. The probit link function, $\Phi^{-1}$ (the inverse of the standard normal cumulative distribution function), constrains availability probability to the range (0,1) and was chosen as a computationally efficient alternative to the logit link function. With careful choice of prior distributions, the probit link yields full conditional distributions of recognizable form that can be sampled directly via Gibbs sampling (Albert & Chib 1993; Hooten, Larsen & Wikle 2003; Johnson et al., in press).
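To illustrate why the probit link is convenient, the Python sketch below (hypothetical helper names, standard library only) evaluates $\pi_{ij} = \Phi(\mathbf{x}_{ij}^{\prime}\boldsymbol{\beta}^s + \eta_{ij})$ and performs the latent-variable step of Albert & Chib (1993): each binary state is paired with a normal variable $u_{ij} \sim \text{N}(\mathbf{x}_{ij}^{\prime}\boldsymbol{\beta}^s + \eta_{ij}, 1)$ truncated to be positive when $z_{ij}=1$ and non-positive otherwise. It is a generic sketch of that technique, not the TempOcc implementation.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal; its cdf is the inverse probit link


def availability_prob(x, beta, eta):
    """pi_ij = Phi(x_ij' beta + eta_ij)."""
    linear = sum(xk * bk for xk, bk in zip(x, beta)) + eta
    return STD_NORMAL.cdf(linear)


def draw_latent(mu, z, rng):
    """Draw u ~ N(mu, 1) truncated to u > 0 if z == 1, u <= 0 if z == 0 (inverse-CDF method)."""
    cut = STD_NORMAL.cdf(-mu)                # Pr(u <= 0) when u ~ N(mu, 1)
    lo, hi = (cut, 1.0) if z == 1 else (0.0, cut)
    v = lo + (hi - lo) * rng.random()        # uniform on the allowed CDF range
    v = min(max(v, 1e-12), 1.0 - 1e-12)      # inv_cdf requires 0 < v < 1
    return mu + STD_NORMAL.inv_cdf(v)


rng = random.Random(1)
mu = -0.2 * 1.0 + 0.4 * 0.5                  # hypothetical x'beta with eta = 0
print(availability_prob([1.0, 0.5], [-0.2, 0.4], 0.0))            # 0.5
print(draw_latent(mu, 1, rng) > 0, draw_latent(mu, 0, rng) <= 0)  # True True
```

Given the latent $u_{ij}$, the regression coefficients and random effects have Gaussian full conditionals under Gaussian priors, which is what removes the need for accept/reject steps.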

We include autocorrelation on successive state observations by imposing temporal correlation on the random effects, $\eta_{ij}$. For ease of implementation (and numerical speed), we chose an ICAR formulation, which is itself an instance of a Gaussian Markov random field (Rue & Held 2005). According to this formulation, the values of $\eta_{ij}$ for animal i (here denoted as the vector $\boldsymbol{\eta}_i$) are distributed according to

$$[\boldsymbol{\eta}_i \mid \tau] \propto \exp\left(-\tfrac{1}{2}\boldsymbol{\eta}_i^{\prime}\mathbf{Q}_i\boldsymbol{\eta}_i\right),$$

where $\mathbf{Q}_i$ is a $T_i \times T_i$ matrix, whose elements (a,b) are determined by

$$Q_i(a,b) = \begin{cases} \tau\sum_{c \neq a}\omega_{ac} & \text{if } a = b, \\ -\tau\,\omega_{ab} & \text{if } a \neq b, \end{cases}$$

where τ is a precision parameter to be estimated, and $\omega_{ab}=1$ if observation a is a 'neighbour' of observation b and is zero otherwise (Rue & Held 2005). The choice of a neighbourhood is subjective, but is analogous to neighbourhood choice in areal models for spatial data (Banerjee, Carlin & Gelfand 2004). Popular choices for temporal data include RW1 (first-order Markovian) and RW2 (second-order Markovian) neighbourhoods (Rue & Held 2005). In general, a greater degree of smoothing will occur if a larger neighbourhood is chosen. We employed an RW1 autocorrelation structure in our application.
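For an RW1 neighbourhood, occasions a and b are neighbours when |a − b| = 1, so the formula for $\mathbf{Q}_i$ gives a tridiagonal matrix whose rows sum to zero. The ICAR density is therefore improper, and $\boldsymbol{\eta}_i^{\prime}\mathbf{Q}_i\boldsymbol{\eta}_i = \tau\sum_j(\eta_{i,j+1}-\eta_{ij})^2$ penalizes only differences between successive random effects. The dense nested-list Python sketch below is illustrative only (hypothetical function names); a practical implementation would use sparse matrices.

```python
def rw1_precision(n_times, tau):
    """ICAR precision matrix Q (as nested lists) for an RW1 neighbourhood.

    Occasions a and b are neighbours (omega_ab = 1) when |a - b| == 1.
    Diagonal: tau * (number of neighbours); off-diagonal: -tau * omega_ab.
    """
    Q = [[0.0] * n_times for _ in range(n_times)]
    for a in range(n_times):
        for b in (a - 1, a + 1):
            if 0 <= b < n_times:
                Q[a][b] = -tau
                Q[a][a] += tau
    return Q


def quadratic_form(Q, eta):
    """eta' Q eta; the ICAR log density is -0.5 * eta' Q eta up to terms not involving eta."""
    n = len(eta)
    return sum(eta[a] * Q[a][b] * eta[b] for a in range(n) for b in range(n))


Q = rw1_precision(4, 2.0)
print(Q)  # [[2.0, -2.0, 0.0, 0.0], [-2.0, 4.0, -2.0, 0.0], [0.0, -2.0, 4.0, -2.0], [0.0, 0.0, -2.0, 2.0]]
print(quadratic_form(Q, [0, 1, 3, 2]))  # 12.0 = tau * (1**2 + 2**2 + 1**2)
print(quadratic_form(Q, [5, 6, 8, 7]))  # 12.0 again: adding a constant leaves it unchanged
```

The invariance to adding a constant is why an ICAR effect is usually paired with a sum-to-zero constraint or an intercept-free interpretation.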

Conditional on the value of the state process Z, we suppose that an availability datum is obtained according to $y_{ij} \sim \text{Bernoulli}(\phi_{ij})$, where

$$\Phi^{-1}(\boldsymbol{\phi}_i) = \mathbf{X}^o_i\boldsymbol{\beta}^o,$$

where $\boldsymbol{\phi}_i$ denotes the vector of probabilities of obtaining data for animal i, $\mathbf{X}^o_i$ gives a design matrix for the observation process for animal i, and $\boldsymbol{\beta}^o$ denotes a vector of regression coefficients. The inclusion of the state vector $\mathbf{z}_i$ as a covariate in $\mathbf{X}^o_i$ allows one to formally address the MAR assumption.

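Given this model, the full conditional distribution for each missing state value $z_{ij}$ (i.e. each state with $y_{ij}=0$) is given by

$$[z_{ij} \mid \cdot] = \text{Bernoulli}(\tilde{\pi}_{ij}),$$

where $\tilde{\pi}_{ij}$ is obtained via Bayes' rule:

$$\tilde{\pi}_{ij} = \frac{\pi_{ij}\,[y_{ij}=0 \mid z_{ij}=1, \boldsymbol{\theta}]}{\pi_{ij}\,[y_{ij}=0 \mid z_{ij}=1, \boldsymbol{\theta}] + (1-\pi_{ij})\,[y_{ij}=0 \mid z_{ij}=0, \boldsymbol{\theta}]}$$

(temporarily letting $\boldsymbol{\theta}$ denote a vector of all parameters). Here, the notation [A|B] denotes the conditional distribution of A given B. Missing data can be simulated directly from their full conditional distributions during Gibbs sampling.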
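A minimal Python sketch of this Gibbs step, assuming (as in the seal application) a probit observation model with an intercept, an effect of the state $z_{ij}$ and an effect of whether the previous record was obtained; the coefficient values and function names are hypothetical. When the state effect is zero (MAR), $\tilde{\pi}_{ij}$ collapses to $\pi_{ij}$, so missing states are imputed from the state model alone; when available animals are less likely to be recorded, a missing record raises the probability that the animal was available.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # inverse probit link


def prob_observed(z, prev_obs, beta_o):
    """Pr(y_ij = 1 | z_ij, y_i,j-1) under a hypothetical probit observation model."""
    beta0, beta_z, beta_lag = beta_o
    return PHI(beta0 + beta_z * z + beta_lag * prev_obs)


def missing_state_prob(pi, prev_obs, beta_o):
    """Bayes' rule: Pr(z_ij = 1 | y_ij = 0, parameters) for a missing record."""
    miss_if_avail = 1.0 - prob_observed(1, prev_obs, beta_o)    # [y = 0 | z = 1]
    miss_if_unavail = 1.0 - prob_observed(0, prev_obs, beta_o)  # [y = 0 | z = 0]
    num = pi * miss_if_avail
    return num / (num + (1.0 - pi) * miss_if_unavail)


def draw_missing_state(pi, prev_obs, beta_o, rng):
    """One Gibbs update: sample z_ij from its Bernoulli full conditional."""
    return int(rng.random() < missing_state_prob(pi, prev_obs, beta_o))


print(missing_state_prob(0.4, 1, (0.0, 0.0, 2.0)))         # MAR: 0.4 up to rounding
print(missing_state_prob(0.4, 1, (0.0, -0.5, 2.0)) > 0.4)  # NMAR: True
```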

Computing and Bayesian Inference

We developed a software package, TempOcc, to automate Markov chain Monte Carlo (MCMC) parameter estimation in the R statistical programming environment (R Development Core Team 2007). Our software allows users to specify detection and process models using formula objects, which is a major advantage over other platforms such as WinBUGS (Lunn et al. 2000). We provide TempOcc as Appendix S1.

We followed the same strategy used by Johnson et al. (in press) for updating model parameters in spatial occupancy models. In particular, the choice of a probit link function allowed us to update the majority of parameters via Gibbs sampling (i.e. there was no need for accept/reject steps). We also dramatically improved computation speed by employing sparse matrix routines from the R package Matrix (Bates & Maechler 2011). A complete description of the sampling algorithm is provided in Appendix S1.

Simulation Study

We conducted a small simulation study to demonstrate our analysis approach and to confirm that parameters of interest could be reliably estimated. For each simulation replicate, we simulated a set of 50 availability time series (each corresponding to a different hypothetical individual). Time series were generated according to an ICAR process with an RW1 neighbourhood and precision parameter $\tau$, such that $z_{ij} \sim \text{Bernoulli}(\pi_{ij})$, where $\Phi^{-1}(\pi_{ij}) = \beta^s_0 + \eta_{ij}$. For each time series, we imposed an observation model of the form $y_{ij} \sim \text{Bernoulli}(\phi_{ij})$, where

$$\Phi^{-1}(\phi_{ij}) = \beta^o_0 + \beta^o_z z_{ij} + \beta^o_y y_{i,j-1}.$$

We used an AR1 autoregressive formulation for the observation model to mirror our example seal application, in which missing records often occur in long strings. The regression coefficient for this effect, $\beta^o_y$, was set to 2·0 for all simulations.

We generated a total of 500 availability datasets that differed by (i) the level of temporal autocorrelation in availability ($\tau$), (ii) the intercept of the availability model ($\beta^s_0$), (iii) the intercept of the missingness model ($\beta^o_0$) and (iv) the effect of availability on the probability of missingness ($\beta^o_z$). Values of these four inputs were drawn independently for each simulation replicate. For each replicate, we fitted an NMAR model with the same functional form as used to simulate the data, and used 1000 posterior predictions to summarize availability probability (see Appendix S1 for details on posterior prediction computations). We also computed 'naive' estimates of availability probability for each data set, which were simply the proportion of observations for which animals were observed to be available (i.e. censoring missing records).
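The simulation design can be sketched in Python as follows. This is an illustrative reconstruction under our own assumptions, not the authors' code: the RW1 process is simulated as a Gaussian random walk with increment precision $\tau$ and then centred to sum to zero, the first record of each series is treated as if the preceding record had been obtained, and all parameter values in the usage example are hypothetical. The naive estimate censors missing records, and relative bias is taken here as (estimate − truth)/truth.

```python
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # inverse probit link


def simulate_series(n_times, tau, beta0_s, beta0_o, beta_z, beta_lag, rng):
    """Simulate one availability series z and its observation indicators y."""
    sd = (1.0 / tau) ** 0.5
    eta = [0.0]
    for _ in range(n_times - 1):
        eta.append(eta[-1] + rng.gauss(0.0, sd))  # RW1 increments with precision tau
    centre = sum(eta) / n_times
    eta = [e - centre for e in eta]               # sum-to-zero constraint

    z = [int(rng.random() < PHI(beta0_s + e)) for e in eta]

    y, prev = [], 1                               # assume the record before j = 1 was obtained
    for zj in z:
        phi = PHI(beta0_o + beta_z * zj + beta_lag * prev)
        prev = int(rng.random() < phi)
        y.append(prev)
    return z, y


def naive_and_true(series):
    """Naive (censored) estimate and true availability, pooled over all series."""
    all_z = [zj for z, _ in series for zj in z]
    obs_z = [zj for z, y in series for zj, yj in zip(z, y) if yj == 1]
    return sum(obs_z) / len(obs_z), sum(all_z) / len(all_z)


rng = random.Random(2012)
series = [simulate_series(500, 1.0, 0.0, -1.0, -0.5, 2.0, rng) for _ in range(50)]
naive, truth = naive_and_true(series)
print(round(naive, 3), round(truth, 3), round((naive - truth) / truth, 3))
```

With a negative state effect on the probability of obtaining data, the naive estimate falls below the true proportion, matching the direction of bias discussed in the Introduction.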

Accounting for observations being NMAR resulted in a marked decrease in bias compared with the naive approach of censoring missing records whenever availability status influenced the probability of missingness (i.e. whenever $\beta^o_z \neq 0$; Fig. 2). There was some indication of a mild positive bias (2·5% over all simulations) for the approach accounting for NMAR, compared with a large bias (24% over all simulations) when missing records were censored. Bias for both analysis approaches appeared to increase as (i) autocorrelation of the availability process increased, (ii) the amount of missing data increased, (iii) the effect of availability status on the probability of missingness increased and (iv) the mean availability probability tended to 0·5 (Fig. 2).

Figure 2.

Proportion relative bias of estimated availability as a function of simulation inputs. Blue circles give estimated availability for the naive approach that censors missing data, while pink triangles give posterior mean predictions for our model that accounts for NMAR data. Also displayed are loess fits to proportion bias as a function of each simulation input (±1 SE). Each panel represents a different simulation input: 'rho', which measures lag 1 autocorrelation in the availability process (a function of $\tau$); 'y-intercept', which measures the quantity of missing data ($\beta^o_0$; a lower value corresponds to more missing data); 'z.effect', which measures the effect of availability status on the probability of obtaining data ($\beta^o_z$); and 'z-intercept', which helps determine overall availability probability ($\beta^s_0$).

Example Application: Ice Seal Availability

We collected and analysed haul-out records for three species of phocid seals in the Bering Sea (spotted seals, Phoca largha; bearded seals, Erignathus barbatus; and ribbon seals, Histriophoca fasciata). All three species are of conservation concern given recent warming-associated changes in the distribution and seasonal availability of Arctic and sub-Arctic sea ice. These seal species haul out on ice floes for a number of reasons, including moulting, reproduction and rest. The haul-out behaviour of ice-associated seals is of interest for at least two reasons. First, population estimates of seals made from aerial surveys require an estimate of the proportion of seals that are hauled out to adjust for availability bias (i.e. to account for seals that were in the water and thus unavailable for detection during survey counts). Second, with receding polar sea ice, researchers are interested in how haul-out behaviour may be changing over time.

We obtained haul-out data from tags on 173 phocid seals between 2004 and 2011 in the Bering Sea. ARGOS-linked satellite data recorders (models SPLASH or SPOT, Wildlife Computers, Redmond, WA) were attached to each animal and provided hourly reports of the proportion of time spent hauled out. We considered seals to be available ($z_{ij}=1$) if their tags were dry for >50% of the time during hour j and unavailable otherwise. However, tags were not always successful in transmitting their data to passing satellites, resulting in missing data ($y_{ij}=0$). We limited analysis to records obtained between 1 March and 30 June, a period that includes the bulk of ice-obligate moulting and reproductive activity, and deleted records occurring within 48 h of tag deployment to eliminate possible handling effects. Using these criteria, our analysis consisted of records from 157 seals, 212 individual-year combinations (some SPOT tags remained attached to the same animal in successive years) and 178 248 hourly observations, of which 47 617 were missing. There was considerable autocorrelation in the data set; seals often spent long periods hauled out or in the water before changing states.
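The record-screening rules above can be expressed as two small functions. This Python sketch is hypothetical: the function names, the use of `None` for a missing transmission, and the treatment of a record exactly 48 h after deployment are our assumptions. It simply encodes the >50% dry rule, the 1 March to 30 June window and the 48-h exclusion.

```python
from datetime import datetime, timedelta


def classify_hour(dry_fraction):
    """Return (z_ij, y_ij) for one hourly record.

    dry_fraction -- proportion of the hour the tag was dry, or None if no data were received.
    """
    if dry_fraction is None:
        return None, 0                 # missing record: state unknown, y_ij = 0
    return int(dry_fraction > 0.5), 1  # hauled out if dry for more than 50% of the hour


def keep_record(timestamp, deployed):
    """Keep records from 1 March to 30 June, at least 48 h after tag deployment."""
    in_season = (3, 1) <= (timestamp.month, timestamp.day) <= (6, 30)
    return in_season and timestamp - deployed >= timedelta(hours=48)


deployed = datetime(2009, 4, 10, 12)  # hypothetical deployment time
print(classify_hour(0.75), classify_hour(0.5), classify_hour(None))  # (1, 1) (0, 1) (None, 0)
print(keep_record(datetime(2009, 4, 11, 12), deployed))  # False: only 24 h after deployment
print(keep_record(datetime(2009, 4, 13, 12), deployed))  # True
print(round(47_617 / 178_248, 3))  # 0.267, the fraction of hourly records that were missing
```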

We assumed that data from different animals, and data from the same animal in different years, were independent. This was accomplished by specifying a block-diagonal precision matrix $\mathbf{Q}$. Initial runs indicated model instability when $\eta_{ij}$ varied on the same time-scale as the data (i.e. when a separate $\eta_{ij}$ was estimated for each hour). To compensate, we decreased the temporal resolution of the temporal random effects model to 2-h blocks (i.e. by setting $\eta_{i1}=\eta_{i2}$, $\eta_{i3}=\eta_{i4}$, etc.).
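The 2-h blocking and the block-diagonal structure amount to simple index bookkeeping. In this hypothetical Python sketch, hour j of a series is mapped to random-effect block ⌈j/2⌉, and each individual-year series is given its own contiguous range of random effects, so that its precision block sits on the diagonal of $\mathbf{Q}$ with zeros elsewhere.

```python
def block_index(j, block_hours=2):
    """Map 1-based hour j to its 1-based random-effect block (hours 1-2 -> 1, 3-4 -> 2, ...)."""
    return (j - 1) // block_hours + 1


def random_effect_offsets(series_lengths, block_hours=2):
    """Start position of each series' random effects in one stacked vector.

    Series (individual-years) are independent, so each gets its own contiguous
    block of rows/columns in Q and all cross-series entries of Q are zero.
    Returns (offsets, total number of random effects).
    """
    offsets, start = [], 0
    for n_hours in series_lengths:
        offsets.append(start)
        start += block_index(n_hours, block_hours)  # number of 2-h blocks in this series
    return offsets, start


print([block_index(j) for j in range(1, 7)])  # [1, 1, 2, 2, 3, 3]
print(random_effect_offsets([5, 4, 7]))       # ([0, 3, 5], 9)
```

Halving the number of random effects in this way also halves the dimension of each ICAR block.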

We accounted for several covariates known to be related to seal haul-out probabilities (see, e.g. Ver Hoef, London & Boveng 2010). In particular, we fitted a model where haul-out probability varied by hour of day (HR; categorical, 24 levels), sex (SEX; categorical, two levels), age class (AGE; categorical, two levels corresponding to young-of-year and older), species (SPECIES; categorical, two levels) and day of year. Owing to the low number of bearded seals tagged (13 individual-year combinations), bearded and spotted seals were given a common species effect (the natural histories of these two species are more similar to each other than to that of ribbon seals). Day of year was modelled using the number of days that had elapsed since 1 March. We expected that haul-out probability would be a nonlinear function of DATE, so we included both linear (DATE) and quadratic (DATE2) effects of day of year on the probit scale. As our main goal was to examine the appropriateness of the missing-at-random assumption, all fitted models used the same state process model for haul-out behaviour. In particular, we used a model with additive effects of all predictor variables, together with all two-way interactions among SEX, AGE, SPECIES, DATE and DATE2 (with the exception of a DATE by DATE2 interaction). The formulation of this process model was guided by previous research (Ver Hoef, London & Boveng 2010).

To examine the importance of the missing-at-random assumption on estimated haul-out probabilities, we fitted several models for the missingness indicator, $y_{ij}$. All models included an effect of whether or not an observation had been obtained for the animal in the previous hour (LAG; categorical, two levels). Missing data often occurred in long strings, so we expected such an autocovariate to be important a priori. Models could also include effects of the underlying state of the animal (Z; categorical, two levels corresponding to hauled out/not hauled out) or tag type (TAG; categorical, two levels corresponding to SPOT or SPLASH). Tag type was included because of differences in deployment (SPOT tags are flipper mounted, while SPLASH tags are head mounted) and because of differences in the length of the data buffer (SPOT tags had a 14-day buffer, while SPLASH tags had an 8-day buffer). In total, we fitted five different detection models to the haul-out data (Table 1). We used a posterior predictive loss criterion (Gelfand & Ghosh 1998; Appendix S1) to compare the performance of alternative models, because commonly used metrics such as the deviance information criterion (Spiegelhalter et al. 2002) have been shown to perform poorly in missing data applications (Celeux et al. 2006).

Table 1. Detection models fitted to ice seal haul-out data, together with the number of fixed effect parameters (k) and model selection criteria. Here, $G$ is the error sum of squares for predicted data, $P$ is a penalty based on posterior variance, and $D = G + P$ is an overall model selection criterion (Gelfand & Ghosh 1998). Models with lower $D$ values have greater support given the data. Covariates are defined in the text.

Model            k    G         P         D
LAG + TAG * Z    4    66 765    10 720    77 485
LAG + TAG + Z    3    66 780    10 733    77 513
LAG + TAG        2    66 946    10 732    77 678
LAG + Z          2    66 747    10 755    77 502
LAG              1    66 942    10 719    77 661
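A sketch of the posterior predictive loss computation, assuming the $k \to \infty$ form of the Gelfand & Ghosh (1998) criterion, in which $G$ sums squared differences between each observation and its posterior predictive mean and $P$ sums the posterior predictive variances. The toy data and function name are hypothetical. The last lines check that each row of Table 1 satisfies $D = G + P$ and that the LAG + TAG * Z model has the smallest $D$.

```python
from statistics import fmean, pvariance


def posterior_predictive_loss(y_obs, y_rep):
    """Gelfand & Ghosh (1998) criterion in the form D = G + P.

    y_obs -- observed data, length n
    y_rep -- posterior predictive replicates; each replicate is a length-n list
    """
    G = P = 0.0
    for i, y in enumerate(y_obs):
        draws = [rep[i] for rep in y_rep]
        mu = fmean(draws)
        G += (y - mu) ** 2          # squared error of the predictive mean
        P += pvariance(draws, mu)   # predictive variance (penalty)
    return G, P, G + P


y_obs = [1, 0, 1]
y_rep = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1]]
print(posterior_predictive_loss(y_obs, y_rep))  # (0.1875, 0.5625, 0.75)

# Values from Table 1: (G, P, D) for each detection model
table1 = {"LAG + TAG * Z": (66765, 10720, 77485), "LAG + TAG + Z": (66780, 10733, 77513),
          "LAG + TAG": (66946, 10732, 77678), "LAG + Z": (66747, 10755, 77502),
          "LAG": (66942, 10719, 77661)}
print(all(g + p == d for g, p, d in table1.values()))  # True
print(min(table1, key=lambda m: table1[m][2]))          # LAG + TAG * Z
```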

We fitted each model via MCMC as implemented in TempOcc. Two Markov chains of 280 000 iterations with overdispersed starting values were simulated for each model. Standard MCMC diagnostics (cf. Gelman et al. 2004) indicated convergence after ≈80 000 iterations. We thus discarded the first 80 000 samples from each chain as a burn-in and combined the remaining samples from both chains. Serial autocorrelation was further reduced by only storing values for every 200th iteration, resulting in 2000 posterior samples for each model.
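The sample-size arithmetic and a basic convergence check can be sketched as follows. The text does not say which diagnostics were used; the potential scale reduction factor shown here is one standard choice (Gelman et al. 2004), computed without chain splitting, and the simulated chains are hypothetical.

```python
import random
from statistics import fmean, variance


def burn_and_thin(chain, burn_in, thin):
    """Drop the first burn_in draws, then keep every thin-th draw."""
    return chain[burn_in::thin]


def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for equal-length chains."""
    n = len(chains[0])
    W = fmean([variance(c) for c in chains])      # mean within-chain variance
    B = n * variance([fmean(c) for c in chains])  # between-chain variance
    var_plus = (n - 1) / n * W + B / n            # pooled estimate of posterior variance
    return (var_plus / W) ** 0.5


kept = burn_and_thin(list(range(280_000)), 80_000, 200)
print(2 * len(kept))  # 2000 stored draws from two chains

rng = random.Random(7)
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(2)]
stuck = [[rng.gauss(0, 1) for _ in range(1000)], [rng.gauss(3, 1) for _ in range(1000)]]
print(round(gelman_rubin(mixed), 3), round(gelman_rubin(stuck), 3))  # near 1 vs clearly above 1
```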

Although the parameters describing the missing data process could provide some indication as to whether the MAR assumption was statistically justified, we were also interested in whether or not missing data could cause biologically meaningful differences in estimates of haul-out probability. For instance, it is possible that data are NMAR but that estimates of haul-out probability are relatively insensitive to violations of the MAR assumption. This would be useful to know, because analysis approaches discarding missing data are typically much more straightforward to conduct. As such, we also computed and compared posterior predictions for haul-out probability, conditional on covariates X.

Models with an effect of haul-out status on the probability of missingness received uniformly better support from our posterior predictive loss criterion, with the model including a TAG * Z interaction being best supported by the data (Table 1). In general, data were more likely to be obtained when (i) data had been successfully obtained during the previous hour (posterior mean for the LAG coefficient 4·82, 95% credible interval: 4·78, 4·86) and (ii) the animal was not hauled out (posterior mean for the Z coefficient −0·50, 95% CI: −0·58, −0·41). The latter effect was somewhat reduced for SPOT tags, which had a positive interaction with Z (posterior mean 0·34, 95% CI: 0·23, 0·44). The additive TAG effect was statistically indistinguishable from zero (posterior mean −0·02, 95% CI: −0·08, 0·04). Posterior predictive distributions for the probability of obtaining data (i.e. the probability that a record was not missing), obtained via the inverse probit link function, further illustrated that data were not MAR (Fig. 3).
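Because the observation model uses a probit link, the reported posterior means can be combined on the probit scale. Assuming SPLASH tags are the reference level (the text implies this but does not state it), the net effect of being hauled out is −0·50 for SPLASH tags and −0·50 + 0·34 = −0·16 for SPOT tags. The Python sketch below does this arithmetic and converts to probabilities for a purely hypothetical baseline linear predictor, because the intercept is not reported here; the resulting probabilities are illustrative, not estimates.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # inverse probit link

# Posterior means from the LAG + TAG * Z model (probit scale), as reported in the text
BETA_Z = -0.50      # hauled out (Z = 1)
BETA_TAG = -0.02    # SPOT tag, additive effect
BETA_TAG_Z = 0.34   # SPOT tag x Z interaction


def linear_effect(tag, z):
    """Tag and state contribution to the probit linear predictor (SPLASH assumed as reference)."""
    spot = 1 if tag == "SPOT" else 0
    return BETA_TAG * spot + BETA_Z * z + BETA_TAG_Z * spot * z


def prob_obtained(baseline, tag, z):
    """Pr(data obtained) for a HYPOTHETICAL baseline (intercept plus LAG term), not a reported value."""
    return PHI(baseline + linear_effect(tag, z))


for tag in ("SPLASH", "SPOT"):
    net_z = linear_effect(tag, 1) - linear_effect(tag, 0)
    print(tag, round(net_z, 2),
          round(prob_obtained(1.0, tag, 0), 3), round(prob_obtained(1.0, tag, 1), 3))
```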

Figure 3.

Violin plots giving the posterior predictive distribution for the probability of obtaining data (i.e. the probability that data are not missing) for (a) the case where data were not missing the previous hour, and (b) the case where data were missing the previous hour. Results are presented for tag type (SPOT or SPLASH), as well as for cases when animals are hauled out (Z = 1) or are in the water (Z = 0). Violin plots combine a standard box-and-whisker plot with a kernel density estimate, and illustrate a higher probability of missingness when animals are hauled out (see 'Discussion').

Although model selection and parameter estimates indicated that the MAR assumption was indeed violated, posterior predictions of haul-out probabilities indicated that this violation may not be biologically meaningful for our study system. For instance, predicted availability of a spotted or bearded adult female at 9:00 AM on 1 May of a generic year (a typical date and time for aerial surveys) was inline image for the highest ranked model permitting NMAR (i.e. the LAG + TAG * Z model). By contrast, the LAG + TAG detection model (which assumed MAR) produced a posterior prediction of inline image. The Monte Carlo standard error was 0·02 for both point estimates.


Discussion

We have described a modelling framework capable of assessing the missing-at-random (MAR) assumption in animal availability studies. The MAR assumption is often made when estimating availability using telemetry data, despite the clear potential for bias in availability when the MAR assumption is not met. Such biases have the potential to affect inferences about habitat use and abundance, especially when availability estimates are used to correct survey counts. In our seal example, violation of the MAR assumption did not result in large differences in estimates of availability; however, we caution that the differences in estimated missingness probability between hauled-out and in-water animals were small in our case (Fig. 3), and this outcome need not hold in other availability studies. For instance, our simulation study illustrated the potential for extreme bias (often 20–30%) in estimates of availability in scenarios where availability status affects the probability of missingness and missing data are censored prior to analysis.

Our modelling framework extends classical ecological occupancy models with a partial observation process that accounts for missing data. Autocorrelation in the availability process model can be induced by specifying an ICAR process on the probit scale, and environmental and individual covariates can be examined for their ability to explain variation in both availability and the probability of missingness. Alternative models can be fitted using flexible, fast software capable of processing hundreds of thousands of records, and relative model performance can be judged with a posterior predictive loss criterion.
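As an illustration of how a posterior predictive loss criterion can rank models, the sketch below implements the squared-error form commonly attributed to Gelfand and Ghosh, D = P + k/(k + 1)·G, where G sums squared differences between observations and posterior predictive means and P sums posterior predictive variances. We assume this form for illustration only; the exact criterion and weighting used in our analysis may differ, and the function name is hypothetical. Smaller D indicates better expected predictive performance.

```python
from statistics import fmean, pvariance

def posterior_predictive_loss(y_obs, y_rep, k=float("inf")):
    """Squared-error posterior predictive loss (Gelfand & Ghosh-style form).

    y_obs: observed values, length n (missing records excluded)
    y_rep: S posterior predictive replicate vectors, each of length n
    k:     weight on goodness of fit; k = inf gives D = G + P
    Returns (D, G, P); smaller D is better.
    """
    G = P = 0.0
    for i in range(len(y_obs)):
        draws = [float(rep[i]) for rep in y_rep]
        mu = fmean(draws)                  # posterior predictive mean
        G += (mu - y_obs[i]) ** 2          # goodness-of-fit term
        P += pvariance(draws, mu)          # predictive variance (penalty) term
    w = 1.0 if k == float("inf") else k / (k + 1.0)
    return w * G + P, G, P

# Toy example: two binary records and four posterior predictive replicates
y = [1, 0]
reps = [[1, 0], [1, 1], [1, 0], [1, 1]]
print(posterior_predictive_loss(y, reps))  # (0.5, 0.25, 0.25)
```

In a model comparison, the same observed records would be scored against replicates drawn from each candidate model, and the model with the smallest D preferred.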

Although we expect this framework to be most useful for estimating availability in marine mammal populations (i.e. the proportion of time pinnipeds and cetaceans are above water and therefore detectable by census surveys), our approach should be applicable to other taxa as well (e.g. accounting for the proportion of time that individuals spend on a survey area, are above ground, are singing, etc.). More broadly, our work has practical significance beyond availability studies and applies to the analysis of any recurrent binary process (Hamel, Yoccoz & Gaillard 2012). For example, behavioural ecologists are often interested in factors influencing breeding/nonbreeding status, and disease surveillance programmes are interested in disease prevalence. Our modelling framework should be useful for analysing such data sets, particularly when the state variable of interest influences the probability of missingness.

Our approach of using temporal autocorrelation in place of true replication may also be useful in more typical occupancy modelling situations. As with Johnson et al. (in press), who used spatial contagion to partially replace true replication, this approach may reduce the sampling effort that must be expended in occupancy studies, particularly long-term studies with high levels of temporal autocorrelation. It must be noted, however, that our models for temporal random effects led to some bias when temporal autocorrelation in availability was high. Our solution in the seal example was to use a coarser scale for temporal random effects (i.e. 2-h blocks instead of a separate random effect for each hour). Such a modification may be necessary when applying this approach to traditional occupancy data with extremely high autocorrelation in occupancy status, although we note that models without any true replication or autocorrelation are estimable in some circumstances (see, e.g. Lele, Moreno & Bayne 2012). Indeed, our models satisfactorily estimated availability even when temporal autocorrelation was low (Fig. 2).
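The coarser random-effect scale can be sketched as follows. We assume, for illustration, a first-order random-walk (RW1) ICAR prior in which each 2-h block neighbours the blocks immediately before and after it; its precision matrix is τ(D − W), whose rows sum to zero, so the prior is improper and is usually paired with a sum-to-zero constraint. Every hour in a block then shares that block's effect on the probit scale. The function names, τ and the block length are hypothetical, and this is not the TempOcc implementation.

```python
import random

def rw1_precision(n_blocks, tau=1.0):
    """Precision matrix tau * (D - W) of an RW1 ICAR prior whose neighbours are
    adjacent time blocks. Rows sum to zero, so the prior is improper."""
    Q = [[0.0] * n_blocks for _ in range(n_blocks)]
    for j in range(n_blocks - 1):
        Q[j][j] += tau
        Q[j + 1][j + 1] += tau
        Q[j][j + 1] -= tau
        Q[j + 1][j] -= tau
    return Q

def sample_rw1(n_blocks, tau, rng):
    """Draw from the RW1 prior under a sum-to-zero constraint: cumulate iid
    N(0, 1/tau) increments, then centre the path."""
    x = [0.0]
    for _ in range(n_blocks - 1):
        x.append(x[-1] + rng.gauss(0.0, tau ** -0.5))
    mean = sum(x) / n_blocks
    return [v - mean for v in x]

def hourly_effects(block_effects, n_hours, block_len=2):
    """Expand block-level effects so all hours in one block share an effect."""
    return [block_effects[t // block_len] for t in range(n_hours)]

rng = random.Random(42)
n_hours, block_len = 48, 2
n_blocks = (n_hours + block_len - 1) // block_len   # 24 blocks of 2 h
eta_block = sample_rw1(n_blocks, tau=4.0, rng=rng)
eta_hour = hourly_effects(eta_block, n_hours, block_len)  # probit-scale effects
```

Halving the number of temporal random effects in this way trades some temporal resolution for a smoother, better-identified autocorrelation structure.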

With respect to the ice seal data, we were initially surprised by the negative coefficient for Z (i.e. hauling out apparently increases the probability of missingness), given that we expected tags to transmit data to passing satellites more successfully when animals are hauled out. However, data are not actually transmitted in real time. Instead, our tags were programmed to store data for the most recent 8 days (SPLASH tags) or the most recent 14 days (SPOT tags) and to transmit these data whenever they could communicate with passing satellites. Thus, transmitted records always date from the time of transmission or earlier. When an animal emerges from the water after a long time at sea, the most recent records in its tag's queue are from the period it just spent in the water. Moreover, animals that spend long intervals in the water may disproportionately lose records from days on which they were hauled out, since those older records can drop out of storage before the tag next has a good opportunity to transmit. This dynamic emphasizes the complex nature of the missing data process in real telemetry applications.
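A toy simulation can illustrate how a finite storage window makes missingness depend on state history. We assume, purely for illustration, that the tag keeps the most recent 8 days of hourly records in a first-in, first-out buffer, that each successful transmission sends the whole buffer, and that a record evicted before any successful transmission is lost. The haul-out history and transmission schedule below are hypothetical, as are the function names; actual tag firmware and satellite uplink behaviour are more complex.

```python
from collections import deque

def recovered_hours(states, transmit_hours, buffer_hours):
    """Return the set of hours whose records are ever transmitted.

    The tag keeps only the most recent `buffer_hours` records (FIFO); at each
    hour in `transmit_hours` it sends its entire buffer. Records evicted
    before any transmission are lost.
    """
    buffer = deque(maxlen=buffer_hours)   # oldest record drops out when full
    recovered, tx = set(), set(transmit_hours)
    for t in range(len(states)):
        buffer.append(t)                  # store this hour's record
        if t in tx:
            recovered.update(buffer)      # send everything still in memory
    return recovered

def missing_fraction_by_state(states, recovered):
    """Fraction of records lost, split by state (1 = hauled out, 0 = in water)."""
    totals, missing = {0: 0, 1: 0}, {0: 0, 1: 0}
    for t, z in enumerate(states):
        totals[z] += 1
        missing[z] += t not in recovered
    return {z: missing[z] / totals[z] for z in totals if totals[z]}

# Hypothetical history: 12 h hauled out, 10 days at sea, 12 h hauled out,
# with a single successful transmission in the final hour.
states = [1] * 12 + [0] * 240 + [1] * 12
rec = recovered_hours(states, transmit_hours=[263], buffer_hours=8 * 24)
print(missing_fraction_by_state(states, rec))  # {0: 0.25, 1: 0.5}
```

In this example the first haul-out falls outside the 8-day window by the time the tag transmits, so hauled-out hours are lost at twice the rate of in-water hours even though transmission itself only occurs while the animal is hauled out.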

The modelling approach we developed shows promise for modelling temporal autocorrelation in pinniped haul-out behaviour. Ver Hoef, London & Boveng (2010) previously described a mixed modelling approach for estimating haul-out probability when responses were correlated; however, MAR had to be assumed. Our approach removes this requirement, allowing one to estimate (and adjust for) state-dependent probabilities of missingness. For ice-associated seals in the Bering Sea, there did appear to be an effect of the underlying state (in the water or hauled out) on the probability of missingness. However, this effect did not translate into meaningful differences in predicted haul-out probabilities. Future research should investigate whether these patterns hold for other species and populations of interest, particularly those for which availability probabilities are used to adjust abundance estimates.


We thank J. Fieberg, O. Gimenez, J. Ver Hoef and two anonymous referees for comments on an earlier version of this manuscript. Views expressed are those of the authors and do not necessarily represent findings or policy of any government agency.