Bayesian shared frailty models for regional inference about wildlife survival

Correspondence

Brian J. Halstead. U.S. Geological Survey, Western Ecological Research Center, Dixon Field Station, 6924 Tremont Road, Dixon, CA 95620, USA. Tel: 1 530 669 5076

Email: bhalstead@usgs.gov

Abstract

The estimation of survival is an essential but difficult task important for developing rigorous conservation programs. Radio telemetry studies of wildlife survival are often characterized by small sample sizes and high rates of censoring. In cases where multiple radio telemetry studies of a species exist, shared frailty models of survival offer the ability to combine data from multiple studies and improve the precision of survival estimates. We used Bayesian analysis of shared frailty models to examine survival of adult females of the giant gartersnake (Thamnophis gigas) in the Sacramento Valley, California, USA, and to examine the effects of individual and habitat characteristics on daily risk of mortality. Posterior mean annual survival probability of adult females was 0.61 [95% credible interval (CI) = 0.41–0.79]. The daily risk of mortality for adult female giant gartersnakes while in terrestrial habitats was 0.38 (0.09–0.89) times as great as when they inhabited aquatic habitats. Although 95% CIs for hazard ratios of other covariates included one, sites varied substantially in the effect of linear habitats, which appear to have context-dependent effects on survival. Assessing survival with shared frailty models allows the prediction of survival probabilities at novel sites and identifies regional and context-specific mortality risks that can be targeted for conservation action.

Introduction

Many species of conservation concern exhibit characteristics that cause difficulties for estimating survival. For most species, it is impossible to follow individuals continuously from birth until death, so a number of methods accounting for imperfect detectability or unknown time of death have been developed to reliably estimate survival. Event time (survival) analysis of radio telemetry data (Winterstein, Pollock & Bunck, 2001; Williams, Nichols & Conroy, 2002) is one of these methods. Radio telemetry usually obviates the need to model detection probabilities, but the exact time of death is often still unknown. Additional problems with radio telemetry as a method of estimating survival are caused by small sample sizes, the inability to follow small animals or all life stages of some animals, and limitations on the battery life or range of transmitters, which result in the censoring of survival times for a large proportion of sampled individuals. These limitations often result in great uncertainty in survival estimates derived from radio telemetry studies.

Because survival is an individual process, the effects of individual characteristics or environments to which an individual is exposed are often the primary interest of wildlife survival studies. Indeed, failing to account for heterogeneity in the survival process can lead to biased estimates of the underlying hazard function (the instantaneous risk of mortality at time t conditional on surviving until t or longer) and therefore, survival (Murray, 2006). It is usually the case, however, that not all variables affecting survival can be measured or even observed (McGilchrist & Aisbett, 1991). An individual-level random effect (called frailty in the medical literature) would therefore be desirable for estimating the baseline hazard function of a population. Although theoretically appealing, it is exceedingly difficult to avoid issues of left truncation caused by the delayed entry of individuals in wildlife studies (individuals generally do not enter the study at birth or hatching), which can result in individuals with a greater hazard being eliminated from the population prior to entering the study. The resulting baseline hazard and degree of heterogeneity in individual frailties will therefore be biased low (Williams et al., 2002; Heisey, 2009). However, if a logical grouping structure can be applied to the population of interest, a random effect can be applied to the groups of individuals, with a reduced probability that an entire group would remain unrepresented in the sample. A natural grouping structure for many well-studied species is that caused by similar radio telemetry studies at multiple sites. These studies can be combined in a shared frailty survival model to improve the precision of estimates of average survival. Shared frailty models are particularly useful when individual studies suffer from small sample sizes or low frequency of mortality events. 
By using a random effects structure, sites are properly weighted by the amount of information they provide for estimating average survival for the region of interest, and estimates of survival for individual sites borrow strength from the entire dataset (Link & Sauer, 1996; Sauer & Link, 2002; Gelman et al., 2004).

The giant gartersnake (Thamnophis gigas) is a large (up to 1.6 m) semi-aquatic snake precinctive to wetlands in the Central Valley, California, USA (Fitch, 1940; Hansen & Brode, 1980). Because of the loss of approximately 95% of its habitat (Frayer, Peters & Pywell, 1989), the giant gartersnake has been listed as threatened by the US government (U.S. Fish and Wildlife Service, 1993) and the State of California (California Code of Regulations, 1971). Most extant populations of the giant gartersnake occur in rice-growing regions of the Sacramento Valley (northern portion of the Central Valley), though the species once ranged to the southern San Joaquin Valley (southern portion of the Central Valley; Rossman, Ford & Seigel, 1996). The giant gartersnake exhibits sexual size dimorphism, with females the larger sex (Wylie et al., 2010). The active season of the giant gartersnake typically lasts from mid-March to late October (Wylie et al., 2009).

The objective of our study was to demonstrate the utility of shared frailty survival models for combining inference across a region from multiple radio telemetry studies. We exploit the benefits of shared frailty survival models to estimate the average survival function of adult females of the giant gartersnake throughout the northern portion of its range. We also evaluate the effects of individual characteristics and habitat on survival in the Sacramento Valley. We illustrate the utility of the Bayesian paradigm for implementing hierarchical structures, such as shared frailty, for both regional and site-specific inference for baseline survival and the effects of covariates.

Methods

We conducted our radio telemetry studies of giant gartersnake survival at seven sites in the Sacramento Valley from 1995 through 2009 (Table 1). Sites ranged from natural marshes to seasonal managed marshes to intensive rice agriculture. Two sites, Gilsizer Slough and Colusa National Wildlife Refuge, consisted of pre- and post-restoration periods, which were treated as separate sites for analysis. At all sites, we captured individual snakes opportunistically by hand and in modified floating minnow traps (Casazza, Wylie & Gregory, 2000). We retained individuals greater than 180 g for intracoelomic implantation of a radio transmitter [Model SI-2T (mass = 9 g or 11 g), Holohil Systems Ltd, Carp, ON, Canada] at the University of California – Davis or the Sacramento Zoo following a standard procedure (Reinert & Cundall, 1982). We selected transmitters to be less than 5% of each individual's preoperative mass. Although our sample is limited to adult female snakes, the timing of surveys and multiple methods of capture at each site likely resulted in a random subset of females greater than 180 g. We released individuals at their location of capture after 1–2 weeks of recovery in the laboratory, and located them once or twice per day for 5–7 days per week during the active season and once or twice per week during brumation. We used portable receivers (Model R4000, Advanced Telemetry Systems, Inc., Isanti, MN, USA) with three-element Yagi antennas for most of our telemetry, and visually located individuals when possible. Entry into the study was generally greatest during the first month, with a few additional individuals added as the active season progressed. Pulses of new individuals into the study often occurred in the spring of each study year. Except for mating, the giant gartersnake is solitary (G. Wylie, pers. obs.), so we assumed that mortality risk was independent among individuals. 
For the vast majority of individuals, we did not observe obvious effects of radio transmitters on behavior. Rates of censoring at sites with more than one radio-tracked individual ranged from 54 to 75%. Most individuals were censored because of impending battery failure, at which time the individual was captured, the transmitter was surgically removed, and the individual was returned to its location of capture after a 1–2-week recovery period. We made extensive efforts to search for individuals whose radio signal was lost prematurely to minimize censoring caused by emigration or removal by predators.

Table 1. Study sites monitored to evaluate survival of adult females of the giant gartersnake (Thamnophis gigas) in the Sacramento Valley of California, USA, from 1995 through 2009

Site                              Habitat                                 Years monitored       Number of individuals
Badger Creek                      Natural marsh                           1996–1998             12
Colusa Drain                      Regional canal, rice                    2004–2005, 2006–2007  26
Colusa NWR pre-restoration        Seasonal wetland, rice                  1996–1998             20
Colusa NWR post-restoration       Restored marsh, seasonal wetland, rice  2000–2002             16
Gilsizer Slough pre-restoration   Natural marsh, rice, other agriculture  1995–1997             31
Gilsizer Slough post-restoration  Natural and restored marsh, rice        2007–2009             27
Natomas Basin                     Rice                                    1998–1999             12
Road Z                            Rice                                    2008–2009             1
Sacramento NWR                    Seasonal wetland, rice                  1997                  1

NWR, National Wildlife Refuge.

We modeled survival as a continuous process observed at discrete intervals. We based inference on a constant hazard model, for which the probability of mortality was the same for every day of the study. We also attempted to fit a first-order autoregressive hazard model, but the standard deviation estimating the degree of serial correlation demonstrated poor mixing even after 110 000 iterations on four chains, which took over 2 months to run. The resulting posterior mean of the standard deviation was 0.008 [95% symmetric credible interval (CI) = 0.001–0.040], indicating a high degree of serial autocorrelation in daily hazards and that the assumption of a constant hazard was probably reasonable for these data. The survival function under the constant hazard model was estimated as $S_{iT} = \exp(-CH_{iT})$, where $CH_{iT} = \sum_{j=1}^{T} UH_{ijkl}$ and $UH_{ijkl} = \exp(\gamma_0 + \eta_k + \kappa_l + \beta_{size,k}\,size_i + \beta_{cond,k}\,cond_i + \beta_{terr,k}\,terrestrial_{ij} + \beta_{lin,k}\,linear_{ij})$; the covariates size, cond, terrestrial and linear are described below. Definitions of parameters and their prior specifications are listed in Table 2. Subscripts i, j, k and l reference individual snakes, day, site and year, respectively, and T is the maximum number of days a population was monitored. We treated study site as a random effect because we were not interested in site differences per se, but wanted a large-scale assessment of the average survival function of the giant gartersnake. We also modeled year of study as an additive random effect to account for annual variation in survival. To account for different study start dates (which ranged from 01 May to 10 July), we set a uniform start date of 01 May and transformed dates to days since 01 May.
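As a rough numerical illustration of the constant hazard model, the standard relationship between daily hazards and survival, S(t) = exp(−CH(t)) with CH(t) the running sum of daily hazards, can be sketched in Python. The function names are hypothetical, and the daily hazard of 0.0014 is the posterior mean from Table 3 used as a plug-in value. Because survival is a nonlinear function of the hazard, the plug-in result is close to, but not the same as, the posterior mean of annual survival.

```python
import math


def survival_curve(daily_hazards):
    """Return S(t) for t = 1..T from a sequence of daily (unit) hazards."""
    curve = []
    cumulative_hazard = 0.0
    for unit_hazard in daily_hazards:
        cumulative_hazard += unit_hazard           # CH(t) = sum of UH up to day t
        curve.append(math.exp(-cumulative_hazard))  # S(t) = exp(-CH(t))
    return curve


def constant_hazard_survival(unit_hazard, days):
    """Closed form for a constant hazard: S(days) = exp(-UH * days)."""
    return math.exp(-unit_hazard * days)


# Plug-in value: posterior mean daily hazard from Table 3 (illustration only).
daily_hazard = 0.0014
curve = survival_curve([daily_hazard] * 365)
annual = constant_hazard_survival(daily_hazard, 365)
print(round(annual, 3))
```

With a constant hazard the day-by-day sum and the closed form agree; the loop version is the one that generalizes to hazards that change from day to day, for example through habitat covariates.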

Table 2. Description of model parameters and their priors

Symbol      Description                                           Prior distribution
S           Survival function                                     Deterministic node
CH          Cumulative hazard                                     Deterministic node
UH          Unit (daily) hazard                                   Deterministic node
γ_0         Baseline (constant) log hazard                        U(−15, 0)
η_k         Random site effect                                    N(0, σ_site)
σ_site      Site standard deviation                               U(0, 10)
κ_l         Random year effect                                    N(0, σ_year)
σ_year      Year standard deviation                               U(0, 10)
β_size,k    Ln(hazard ratio) for snout-vent length at site k      N(μ_β.size, σ_β.size)
μ_β.size    Mean ln(hazard ratio) for snout-vent length           N(0, 10)
σ_β.size    Standard deviation for site variation in β_size       U(0, 10)
β_cond,k    Ln(hazard ratio) for body condition index at site k   N(μ_β.cond, σ_β.cond)
μ_β.cond    Mean ln(hazard ratio) for body condition index        N(0, 10)
σ_β.cond    Standard deviation for site variation in β_cond       U(0, 10)
β_terr,k    Ln(hazard ratio) for terrestrial habitat at site k    N(μ_β.terr, σ_β.terr)
μ_β.terr    Mean ln(hazard ratio) for terrestrial habitat         N(0, 10)
σ_β.terr    Standard deviation for site variation in β_terr       U(0, 10)
β_lin,k     Ln(hazard ratio) for linear habitat at site k         N(μ_β.lin, σ_β.lin)
μ_β.lin     Mean ln(hazard ratio) for linear habitat              N(0, 10)
σ_β.lin     Standard deviation for site variation in β_lin        U(0, 10)

We also modeled the effect of covariates to examine the influence of individual characteristics and habitat on survival of the giant gartersnake. We used a hierarchical approach that allowed the effects of covariates on survival to vary among sites (Table 2). In particular, we examined the effects of size (snout–vent length) and body condition (defined as the residuals from a log–log regression of mass on snout–vent length at the time of capture) on survival. These continuous variables were centered and standardized prior to analysis. We also examined the influence of habitat by classifying each location of each individual snake with two binary habitat indicators: terrestrial (vs. aquatic) and linear (vs. areal). We could only assess the habitat of individuals on the days on which we located them; therefore, we specified a first-order Markov process to impute habitat for days when the individual was not observed. In this model, missing values were imputed by $terrestrial_{ij} \sim \mathrm{Bern}(P_{terr,ij})$, where $P_{terr,ij}$ depends on the habitat of individual i on day j − 1, and likewise for linear habitats. The coefficients of the survival model are log hazard ratios, from which we calculated the posterior distribution of each hazard ratio as $\exp(\beta)$. We similarly calculated posterior distributions for mean hazard ratios as $\exp(\mu_\beta)$. Priors for all parameters were selected to be uninformative (Table 2).
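The idea behind imputing an unobserved binary habitat indicator with a first-order Markov process can be sketched in Python. This is a simplified forward simulation, not the model fitted here: the function name, the two transition probabilities and the 50:50 starting rule are all assumptions made for illustration, and in the Bayesian model the transition probabilities are estimated and missing days are informed by the whole likelihood rather than by the previous day alone.

```python
import random


def impute_habitat(observed, p_stay_terrestrial, p_enter_terrestrial, rng):
    """Fill None entries of a 0/1 sequence using a first-order Markov chain.

    observed: list of 1 (terrestrial), 0 (aquatic) or None (not located).
    p_stay_terrestrial: hypothetical P(terrestrial today | terrestrial yesterday).
    p_enter_terrestrial: hypothetical P(terrestrial today | aquatic yesterday).
    """
    filled = []
    previous = None
    for value in observed:
        if value is None:
            if previous is None:
                p_terr = 0.5  # assumed starting probability when no earlier day exists
            elif previous == 1:
                p_terr = p_stay_terrestrial
            else:
                p_terr = p_enter_terrestrial
            value = 1 if rng.random() < p_terr else 0  # Bernoulli(p_terr) draw
        filled.append(value)
        previous = value
    return filled


# Hypothetical record: located on days 1, 4 and 6 only.
record = [1, None, None, 0, None, 1]
print(impute_habitat(record, 0.8, 0.3, random.Random(1)))
```

Observed days are never overwritten; only the gaps are drawn, each conditional on the (observed or imputed) state of the preceding day.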

We ran the model on three chains of 10 000 iterations each, after a burn-in period of 1000 iterations and with thinning by a factor of 3, by calling WinBUGS 1.4.3 (Spiegelhalter et al., 2003) from R version 2.12.2 (R Development Core Team, 2010) using the package R2WinBUGS (Sturtz, Ligges & Gelman, 2005; Supporting Information Appendix S1). We thinned the chains because of the storage required for the large number of monitored parameters (1902). The model took approximately 120 h to run on a 2.33 GHz quad core PC running Windows 7 with 4 GB RAM. An example of the data structure and code to run the example are included in the Supporting Information Appendix S1. Three chains of 100 000 iterations following a burn-in of 10 000 iterations with the example dataset took approximately 7 min to run on the same PC. Convergence was assessed visually with history plots and with the R-hat statistic (Gelman et al., 2004) as calculated by R2WinBUGS. We found no evidence of lack of convergence (maximum R-hat = 1.1). Unless otherwise indicated, we report the posterior mean and 95% symmetrical CI.
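For readers unfamiliar with the R-hat statistic, its basic form (the potential scale reduction factor, which compares between-chain and within-chain variance) can be sketched in Python as follows. This is the textbook version for equal-length chains; the function name is hypothetical, and the value reported by R2WinBUGS may differ in detail from this simple calculation.

```python
import math
from statistics import mean, variance


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])                                # draws per chain
    chain_means = [mean(chain) for chain in chains]
    within = mean([variance(chain) for chain in chains])  # W: mean within-chain variance
    between = n * variance(chain_means)               # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n       # pooled estimate of posterior variance
    return math.sqrt(pooled / within)


# Toy chains: the first pair overlaps completely, the second pair does not.
agreeing = [[0, 1, 2, 3], [0, 1, 2, 3]]
disagreeing = [[0, 1, 2, 3], [10, 11, 12, 13]]
print(gelman_rubin(agreeing), gelman_rubin(disagreeing))
```

Values near 1 indicate that the chains are sampling the same distribution; values well above 1 indicate that the chains have not yet mixed.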

Results

Daily mortality risk for the giant gartersnake was 1.4 × 10⁻³ (95% credible interval = 6.5 × 10⁻⁴ – 2.4 × 10⁻³; Table 3). Posterior mean annual survival probability of adult female giant gartersnakes was 0.61 (0.41–0.79). Of the sites we monitored, Gilsizer Slough post-restoration had the highest survival probability and Gilsizer Slough pre-restoration had the lowest (Table 4; Fig. 1), but 95% CIs for annual survival probability widely overlapped for all sites (Table 4). The posterior mean of the standard deviation of the random year effect was 0.33 (0.02–0.96), resulting in substantial among-year variation in survival of the giant gartersnake (Fig. 2).

Figure 1.

Site-specific baseline survival functions of adult female giant gartersnakes (Thamnophis gigas) of average size and condition inhabiting areal aquatic habitats at specific study sites in the Sacramento Valley, California, USA, from 1995 through 2009. The Sacramento Valley posterior mean is indicated by the bold solid line, and its 95% symmetric credible interval is indicated by the light solid lines. Posterior means for individual sites (without the additive random effect of year) are represented by dotted and dashed lines as follows: dashed black, Badger Creek; dotted black, Colusa Drain; dash-dot black, Colusa National Wildlife Refuge (NWR) prior to restoration; long dash black, Colusa NWR after restoration; long dash-short dash black, Gilsizer Slough prior to restoration; dashed gray, Gilsizer Slough after restoration; dotted gray, Road Z; dash-dot gray, Natomas Basin; long dash gray, Sacramento NWR. Credible intervals for individual sites have been omitted for clarity.

Figure 2.

Year-specific baseline survival functions of adult female giant gartersnakes (Thamnophis gigas) of average size and condition inhabiting areal aquatic habitats in the Sacramento Valley, California, USA, from 1995 through 2009. The overall posterior mean is indicated by the bold solid line; its 95% credible interval is indicated by the light solid lines. Each dashed gray line represents a single year.

Table 3. Posterior means and 95% symmetric credible intervals for the daily hazard, standard deviations of log-normal random effects, mean hazard ratios, and standard deviations of log(hazard ratios) for adult female giant gartersnakes (Thamnophis gigas) in the Sacramento Valley, California, USA, from 1995 through 2009

Parameter       Mean (95% credible interval)
UH              0.0014 (0.0007–0.0024)
σ_site          0.32 (0.018–1.08)
σ_year          0.33 (0.017–0.96)
exp(μ_β.size)   1.46 (0.85–2.39)
σ_β.size        0.28 (0.022–0.88)
exp(μ_β.cond)   0.96 (0.52–1.82)
σ_β.cond        0.37 (0.020–1.28)
exp(μ_β.terr)   0.38 (0.088–0.89)
σ_β.terr        0.66 (0.010–2.60)
exp(μ_β.lin)    1.11 (0.094–2.96)
σ_β.lin         1.38 (0.18–4.51)

Table 4. Posterior mean annual survival estimates for adult female giant gartersnakes (Thamnophis gigas) of average size and condition inhabiting areal aquatic habitats at individual sites and in the Sacramento Valley, California, USA, from 1995 through 2009

Site                              First year monitored   Annual survival probability (95% CI)^a
Badger Creek                      1996                   0.606 (0.361–0.816)
Colusa Drain                      2004                   0.581 (0.334–0.776)
Colusa NWR pre-restoration        1996                   0.613 (0.370–0.820)
Colusa NWR post-restoration       2000                   0.613 (0.361–0.831)
Gilsizer Slough pre-restoration   1995                   0.575 (0.296–0.783)
Gilsizer Slough post-restoration  2007                   0.635 (0.420–0.838)
Natomas Basin                     1998                   0.583 (0.299–0.796)
Road Z                            2008                   0.612 (0.307–0.866)
Sacramento NWR                    1997                   0.754 (0.510–0.921)
Sacramento Valley                 NA                     0.609 (0.410–0.788)

a Survival estimates for Sacramento NWR were truncated at 201 days because of the limited duration of sampling at this site.
Values represent survival probabilities without year effects (i.e., assuming an ‘average’ year).
NWR, National Wildlife Refuge.

Most of the variables we examined had little effect on daily risk of mortality for adult female giant gartersnakes. Only occurrence in terrestrial habitat affected giant gartersnake survival at the scale of the Sacramento Valley (Table 3; Fig. 3). Daily risk of mortality in terrestrial habitats was 0.38 (0.088–0.89) times as great as that in aquatic habitats (Table 3). Posterior distributions of mean hazard ratios and site-specific hazard ratios for all other examined covariates contained one (Table 3; Fig. 3). As expected, posterior distributions of hazard ratios for the Sacramento Valley were narrower than those for individual sites (Fig. 3). The variable with the greatest variation among sites in its effect on daily risk of mortality was linear habitat (posterior mean standard deviation = 1.38; 95% CI = 0.18–4.51).
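To make the hazard ratio summaries concrete, the following Python sketch shows how a posterior mean and symmetric 95% interval for a hazard ratio can be obtained by exponentiating MCMC draws of a log hazard ratio. The draws here are simulated from an arbitrary normal distribution purely as stand-ins; they are not output from the fitted model, and the function names are hypothetical.

```python
import math
import random
from statistics import mean


def quantile(sorted_values, q):
    """Linear-interpolation quantile of an already sorted list."""
    position = q * (len(sorted_values) - 1)
    lower = int(math.floor(position))
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def summarize_hazard_ratio(log_hr_draws):
    """Posterior mean, symmetric 95% interval, and P(HR < 1) for exp(beta)."""
    hazard_ratios = sorted(math.exp(b) for b in log_hr_draws)
    prob_below_one = sum(hr < 1.0 for hr in hazard_ratios) / len(hazard_ratios)
    return {
        "mean": mean(hazard_ratios),
        "lower": quantile(hazard_ratios, 0.025),
        "upper": quantile(hazard_ratios, 0.975),
        "prob_below_one": prob_below_one,
    }


# Stand-in draws of a log hazard ratio (hypothetical, for illustration only).
rng = random.Random(1)
fake_draws = [rng.gauss(-1.0, 0.5) for _ in range(10000)]
print(summarize_hazard_ratio(fake_draws))
```

Transforming each draw before summarizing matters: the mean of exp(β) is not exp of the mean of β, which is why hazard ratios are summarized on their own scale.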

Figure 3.

Posterior distributions of hazard ratios for (a) size (snout-vent length); (b) body condition index; (c) terrestrial habitat; and (d) linear habitat for adult females of the giant gartersnake (Thamnophis gigas) in the Sacramento Valley, California, USA, from 1995 through 2009. Solid lines indicate the Sacramento Valley mean hazard ratio; dashed lines represent site-specific hazard ratios. The dotted vertical line indicates a hazard ratio of one, which is interpretable as no difference in hazard from the mean or reference condition.

Discussion

The Bayesian shared frailty model successfully increased the precision of hazard ratio and survival estimates. Although many studies of survival focus on site-specific differences in mortality risk, the ability to examine regional patterns in hazards and establish a baseline condition for a species is very useful. For example, the baseline hazard for the average site in the Sacramento Valley could serve as a reasonable prior distribution for a new study at a novel site. Such a study could also be compared with our estimates post hoc to determine if it deviates substantially from other sites in the Sacramento Valley. We therefore suggest that the primary utility of shared frailty models of survival for wildlife populations is to estimate survival for those species for which multiple studies with small sample sizes exist.
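A minimal Python sketch of predicting annual survival at a novel site: a new site effect is drawn from the among-site distribution and added to the regional baseline log hazard. The function name is hypothetical, and the inputs are plug-in posterior means from Table 3 (daily hazard 0.0014, site standard deviation 0.32). A full treatment would instead draw the baseline and the site standard deviation jointly from the posterior, so the spread shown here understates the real predictive uncertainty.

```python
import math
import random


def predict_new_site_survival(log_baseline_hazard, site_sd, days, n_draws, rng):
    """Simulate survival to `days` at a new site under a constant daily hazard."""
    draws = []
    for _ in range(n_draws):
        site_effect = rng.gauss(0.0, site_sd)                  # new site's random effect
        daily_hazard = math.exp(log_baseline_hazard + site_effect)
        draws.append(math.exp(-daily_hazard * days))           # S = exp(-cumulative hazard)
    return draws


# Plug-in values taken from Table 3 (illustration only).
rng = random.Random(42)
draws = sorted(predict_new_site_survival(math.log(0.0014), 0.32, 365, 5000, rng))
median = draws[len(draws) // 2]
interval = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
print(median, interval)
```

The simulated distribution could serve as a starting point for a prior at the new site, or as a reference against which a new estimate is compared.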

An additional benefit of the shared frailty model is the estimation of site-specific survival functions from a single model. As in other hierarchical models (Gelman et al., 2004), sites are naturally weighted, so that survival estimates at those sites with fewer data are pulled more strongly toward the regional mean. The shared frailty model thus allowed us to use data from sites with few radio-marked individuals and no mortality events. Under most circumstances, this monitoring effort would largely be wasted.
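The weighting described here can be illustrated with the simplest hierarchical case, a normal model with known variances, in which a site estimate is a precision-weighted average of the site's own mean and the regional mean. This Python sketch is an analogy rather than the survival model itself; the function name and all numeric inputs are hypothetical, with the sample sizes 1 and 31 chosen only to echo the smallest and largest sites in Table 1.

```python
def shrunken_estimate(site_mean, n_obs, within_var, regional_mean, between_var):
    """Partial-pooling estimate for one site in a normal-normal model.

    Returns the estimate and the weight placed on the site's own data.
    """
    data_precision = n_obs / within_var     # information from the site's own data
    prior_precision = 1.0 / between_var     # information from the regional distribution
    weight = data_precision / (data_precision + prior_precision)
    estimate = weight * site_mean + (1.0 - weight) * regional_mean
    return estimate, weight


# Same raw site mean, very different sample sizes (hypothetical values).
small_site = shrunken_estimate(2.0, 1, 1.0, 0.0, 1.0)
large_site = shrunken_estimate(2.0, 31, 1.0, 0.0, 1.0)
print(small_site, large_site)
```

With one observation, half of the weight goes to the regional mean; with 31 observations, the site's own data dominate. The same logic explains why sites with a single radio-marked individual yield estimates close to the regional value.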

The use of hierarchical (or multilevel) models in demographic studies of wildlife is a relatively recent phenomenon with tremendous potential. Survival was estimated for a multi-population system of bats using hierarchical mark-recapture models (Papadatou et al., 2011a), a situation analogous to our analysis of survival of the giant gartersnake. The same authors have extended the use of multilevel mark-recapture models for among-species comparisons of survival (Papadatou et al., 2011b). Benefits associated with using multilevel models (including shared frailty models) include more reliable inference than that obtained from single populations, a proper accounting for uncertainty in among-population comparisons, decreasing the number of parameters that must be estimated in among-species or among-population comparisons, avoiding bias in parameter estimates when heterogeneity is ignored, and a reliable representative value of survival for the species as a whole (Papadatou et al., 2011a,b). In addition to these benefits, adding complexity and covariates to these models is relatively simple when using Bayesian analysis by Markov chain Monte Carlo methods (Papadatou et al., 2011b). The benefits of multilevel models are not limited to studies of survival, and are applicable to many parameters relevant for conservation, including recruitment, fecundity, population growth rate, probability of occurrence and species richness (Royle & Dorazio, 2008; Zipkin, DeWan & Royle, 2009).

The only measured variable to have an effect on survival of adult female giant gartersnakes throughout the Sacramento Valley was terrestrial habitat. The risk of mortality for an individual while in terrestrial habitat was approximately one-third that of an individual occupying aquatic habitats. This result might seem counter-intuitive for an aquatic snake that is seldom located more than a few meters from water (US Geological Survey, unpubl. data), but the behavior of the giant gartersnake is very different in these two habitats. While in aquatic habitat, the giant gartersnake is commonly basking in vegetation overhanging water or actively pursuing prey, and these activities can expose the giant gartersnake to predators. In contrast, most terrestrial locations of the giant gartersnake are of snakes sheltering in burrows or under other refugia, where they are at a reduced risk of predation and insulated from environmental stressors. Although aquatic habitats are essential foraging habitat for the giant gartersnake, this species appears to be at greater risk of mortality in these habitats than while sheltering in terrestrial habitats.

The model was also informative about variables that did not have effects in the Sacramento Valley as a whole. The greater variation in the effect of linear habitats than other variables suggests that the effects of linear habitats are context specific. Although all 95% CIs for site-specific posterior hazard ratios included one, an independent analysis of Gilsizer Slough post-restoration resulted in a posterior distribution of the hazard ratio for linear habitats entirely greater than one (US Geological Survey, unpubl. data). Shared frailty models can therefore mask site-specific relationships between measured variables and the risk of mortality. Alternatively, the use of shared frailty models can be viewed as protection against overextending inference based upon conclusions drawn from post hoc analysis of individual observational studies (Link & Sauer, 1996). Shared frailty models are therefore best suited to improving inference across similar studies; if interest is in a particular hypothesis, then carefully designed experiments will provide the strongest evidence for posited effects. We suggest that variables that exhibit high among-site variation are potentially fruitful avenues for future research on the mechanisms affecting survival and the contexts in which these mechanisms are important.

Bayesian analysis of shared frailty models offers an effective solution to the problem of estimating survival for many species. These models borrow strength from multiple studies, and produce results interpretable as the baseline survival function and hazard ratios at an average site. Although context-specific effects at a given site can potentially be masked by this approach, it is nonetheless valuable for evaluating the effect of variables on survival at regional scales. Indeed, the safeguard against overextending inference from post hoc analysis of observational studies by means of shrinkage should be embraced as an advantage, rather than a detriment, of shared frailty models. Shared frailty models improve precision of survival estimates and promote inference at large spatial scales to provide valuable information for the conservation of wildlife populations.

Acknowledgments

This paper would not have been possible without the excellent workshop on Bayesian survival analysis given by D. Heisey and C. Bunck at the 2009 Wildlife Society conference. We thank the numerous biological technicians who have contributed their time and talent to our research. The staff at the Sacramento National Wildlife Refuge, including M. Carpenter, M. Wolder and J. Isola, was especially helpful in our efforts. P. Gore provided administrative assistance, and J. Yee provided statistical guidance. M. Herzog, C. Overton and J. Yee provided thoughtful reviews of an earlier version of this paper, and two anonymous reviewers further improved the paper. Our work was funded by CALFED, the US Army Corps of Engineers and the US Fish and Wildlife Service. Snakes were handled in accordance with the University of California, Davis, Animal Care and Use Protocol 9699 and as stipulated in US Fish and Wildlife Service Recovery Permit TE-020548–5. Any use of trade, product, or firm names in this publication is for descriptive purposes only and does not imply endorsement by the US government.
