Survival probability is a key parameter whose variation may have a substantial influence on the asymptotic and realized population growth rates (Caswell, 2001; Nichols & Hines, 2002). Estimating survival in wild vertebrate populations has long been a challenge and has stimulated collaborations between biologists and statisticians (Williams, Nichols & Conroy, 2002), mostly because observed proportions of survivors must be corrected when not all the individuals alive and present in the study area are detected by investigators (i.e. detection probability is <1; Williams et al., 2002). Halstead et al. (2011) estimated daily mortality risk in the giant gartersnake (Thamnophis gigas) and addressed spatial variation in survival. In their study, detection probability was not the difficulty: individuals were equipped with radio transmitters, and detection probability approaches 1 in many telemetry studies (Williams et al., 2002). Halstead et al. used an approach that is standard in human demography, based on hazard models (Hosmer, Lemeshow & May, 2011), in which the hazard function describes the instantaneous rate of occurrence of the death event. They used a Bayesian approach to estimate a mixed version of their model, that is, a model with both fixed effects (e.g. habitat type) and random effects (year, site).
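For reference, the quantities mentioned above can be written compactly. The notation below is an assumption of this sketch, a generic proportional-hazards form and not necessarily the exact parameterization used by Halstead et al.: T is the time to death, h(t) the hazard, S(t) the survival function, h_0(t) a baseline hazard, x_ij fixed-effect covariates (e.g. habitat type) for individual i at site j with coefficients β, and u_j a site random effect (year effects are omitted for brevity). The factor exp(u_j) is then the frailty shared by all individuals at site j.

```latex
\begin{aligned}
h(t) &= \lim_{\Delta t \to 0} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
S(t) = \Pr(T > t) = \exp\!\Big(-\int_0^t h(s)\,\mathrm{d}s\Big),\\
h_{ij}(t) &= h_0(t)\,\exp\!\big(\mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_j\big),
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2).
\end{aligned}
```

With a constant daily hazard h, for instance, the probability of surviving one day is exp(-h).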
The model is a ‘shared frailty’ model. Frailty models were developed to account for heterogeneity in populations that consist of a mixture of individuals with different hazards. Biologists have long identified factors that co-vary with survival in wild populations (age, year, habitat quality, etc.). However, human demographers have also pointed out that hazard estimates may be biased if relevant sources of heterogeneity are ignored (Vaupel, Manton & Stallard, 1979; Aalen, Borgan & Gjessing, 2008), which can happen when investigators are unaware that some sources of variation in mortality risk matter, when defining the variables to measure is conceptually challenging (e.g. ‘individual quality’; Wilson & Nussey, 2010), or when measuring them is technically difficult. Frailty models are individual random effects models that assume a distribution of individual hazards; this distribution accounts for the heterogeneity among individuals that remains once measured covariates have been taken into account, and its characteristics have to be estimated. An individual hazard cannot be estimated from a single individual's data, because death occurs only once in an individual's life history, but the distribution of individual hazards in the population can be assessed. Using frailty models requires conceptual decisions whose relevance for a particular dataset cannot always be assessed, given current limitations in statistical theory (e.g. which distribution to choose for individual hazards; Yashin et al., 2001). Moreover, how to compare models with different parameterizations of the random effects is still debated (Gelman, Meng & Stern, 1996; Gelfand & Ghosh, 1998; Spiegelhalter et al., 2002; Cai & Dunson, 2006; Plummer, 2008), and investigators have to choose among several methods, none of which dominates. Under some assumptions and constraints on model structure, frailty models can nonetheless be estimated (Yashin et al., 2001).
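The bias from ignored heterogeneity can be illustrated with a small simulation. This is a minimal sketch under assumptions that are not taken from the commentary: every individual has a constant hazard z·lam, and the unobserved frailty z follows a gamma distribution with mean 1 and variance theta (a common textbook choice, not necessarily the distribution used by Halstead et al.). Under these assumptions the population-level survival is (1 + theta·lam·t)^(-1/theta) and the population-level hazard is lam / (1 + theta·lam·t). The function names and parameter values are hypothetical.

```python
import random

# Assumed setting (illustrative only): individual i has a constant hazard z_i * lam,
# with frailty z_i ~ Gamma(shape=1/theta, scale=theta), so E[z] = 1 and Var[z] = theta.


def marginal_survival(t: float, lam: float, theta: float) -> float:
    """Population-level survival E[exp(-z * lam * t)] = (1 + theta*lam*t)**(-1/theta)."""
    return (1.0 + theta * lam * t) ** (-1.0 / theta)


def marginal_hazard(t: float, lam: float, theta: float) -> float:
    """Population-level hazard; it declines with t although no individual hazard does."""
    return lam / (1.0 + theta * lam * t)


def simulate_death_times(n: int, lam: float, theta: float, seed: int = 1) -> list[float]:
    """Draw a frailty for each individual, then an exponential death time with rate z*lam."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        z = rng.gammavariate(1.0 / theta, theta)  # individual frailty (shape, scale)
        times.append(rng.expovariate(z * lam))    # constant individual hazard z*lam
    return times


def empirical_survival(times: list[float], t: float) -> float:
    """Proportion of simulated individuals still alive at time t."""
    return sum(1 for x in times if x > t) / len(times)


if __name__ == "__main__":
    lam, theta = 0.1, 0.5  # hypothetical baseline hazard and frailty variance
    times = simulate_death_times(20000, lam, theta)
    for t in (0.0, 10.0, 30.0):
        print(t,
              round(marginal_hazard(t, lam, theta), 4),
              round(marginal_survival(t, lam, theta), 3),
              round(empirical_survival(times, t), 3))
```

A model without frailty fitted to such data would suggest that mortality risk declines over time, whereas it is constant for every individual: the frailest individuals simply die first. This is the kind of selection effect discussed by Vaupel, Manton & Stallard (1979).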
Random effects models (including frailty models) have been used in a very large number of studies of (longitudinal) human data (Banerjee, Carlin & Gelfand, 2003; Banerjee, Wall & Carlin, 2003; Gelman & Hill, 2007; Lawson, 2009).
Halstead et al. (2011) explained why they treated the variable ‘Site’ as a random effect: ‘we were not interested in site differences per se, but wanted a large-scale assessment and the average survival function of the giant gartersnake’. This is indeed a common reason for treating a variable as a random rather than a fixed effect: the study sites are considered a sample from a larger population of sites, and the goal is to draw inferences about the population-averaged response and the variance among sites. Treating ‘Site’ as a random effect had a crucial consequence: the site-specific estimates of survival were more precise than if ‘Site’ had been treated as a fixed effect (Halstead et al., 2011). When ‘Site’ is treated as a fixed effect, sites are considered independent, and only the data from each site are used to estimate that site's mean hazard (k site-specific mean hazards for k sites). If the number of marked snakes per site is small, the estimated site-specific hazard rate is therefore likely to be imprecise (wide credible interval). Treating ‘Site’ as a random effect means using data from all the snakes at all the sites to draw inferences about individual sites, which is sometimes described as ‘borrowing strength’ (Sauer & Link, 2002; Clark et al., 2005).
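‘Borrowing strength’ can be seen in a deliberately simple case. The sketch below is not the model fitted by Halstead et al.; it assumes a constant daily hazard per site, Poisson-distributed death counts given the snake-days at risk, and site hazards drawn from a shared gamma distribution with shape a and rate b. In a full hierarchical analysis a and b would be estimated from all the sites together; here they are fixed at hypothetical values so that the conjugate posterior can be written in closed form. The site data and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    deaths: int      # observed deaths at the site
    exposure: float  # total snake-days at risk at the site


def fixed_effect_hazard(site: Site) -> tuple[float, float]:
    """Site-only estimate of a constant daily hazard and its approximate sampling variance."""
    rate = site.deaths / site.exposure
    return rate, site.deaths / site.exposure ** 2


def pooled_hazard(site: Site, a: float, b: float) -> tuple[float, float]:
    """Posterior mean and variance of the site hazard when site hazards share a
    Gamma(shape=a, rate=b) distribution; the gamma is conjugate to Poisson counts."""
    shape, rate = a + site.deaths, b + site.exposure
    return shape / rate, shape / rate ** 2


if __name__ == "__main__":
    a, b = 2.0, 1000.0  # hypothetical among-site distribution, mean a/b = 0.002 per day
    sites = [Site("A", 1, 400.0), Site("B", 12, 3000.0), Site("C", 0, 150.0)]
    for s in sites:
        fe, fe_var = fixed_effect_hazard(s)
        re, re_var = pooled_hazard(s, a, b)
        print(f"{s.name}: fixed {fe:.5f} (sd {fe_var ** 0.5:.5f})  "
              f"pooled {re:.5f} (sd {re_var ** 0.5:.5f})")
```

Site C, with no death in 150 snake-days, gets a site-only estimate of 0 with a meaningless zero standard error, whereas its pooled estimate is pulled toward the among-site mean a/b. Sites with little exposure are pulled most strongly, and their posterior variances are smaller than the site-only variances.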
Shared frailty models
The distinguishing feature of the model used by Halstead et al. is that it is a shared frailty model: the individuals at each site share the same unobserved frailty. Shared frailty models are used when the number of subjects in each group (cluster) is small, when there are good reasons to hypothesize that groups are homogeneous in terms of hazard (e.g. when data from several studies are combined, each study can be treated as a cluster), or to account for non-independence of observations from subjects in the same cluster. Halstead et al. chose not to use a model with individual frailty because the age of the snakes was unknown, unlike in, for example, studies of birds marked as chicks (Marzolin, Charmantier & Gimenez, 2011). Frailty is assumed to reflect an individual deviation of the mortality risk from the baseline risk; to assess this deviation, it is important to use data from individuals that had survived the same length of time before entering the study. The genuine distribution of individual frailties in the population cannot be accessed if the individuals with the highest mortality risks die before being captured, marked and released. Halstead et al. were concerned about heterogeneity among snakes arising from heterogeneity in age at capture. For this reason, they focused on the variation in hazard among sites and assumed homogeneity in frailty within sites. The estimated among-site variation would underestimate the dispersion of genuine hazards among sites if there were still heterogeneity in frailty within sites and if the age at marking and the proportion of individuals missed before being marked differed among sites. Addressing these hypotheses is virtually impossible when the number of individuals per site is small.
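Why age at capture matters can be sketched under simple assumptions that are not from the commentary: frailty at birth follows a gamma distribution with mean 1 and variance theta, and the baseline hazard lam is constant. Conditioning on survival to the age of marking then gives another gamma distribution, Gamma(shape = 1/theta, rate = 1/theta + lam·age), so the marked individuals have a lower mean frailty and a smaller frailty variance than the population at birth. The two ‘sites’ and all parameter values are hypothetical.

```python
import random

# Assumed setting (illustrative only): frailty z ~ Gamma(shape=1/theta, rate=1/theta)
# at birth (mean 1, variance theta) and a constant baseline hazard lam, so an individual
# with frailty z survives to `age` with probability exp(-z * lam * age).


def frailty_among_survivors(age: float, lam: float, theta: float) -> tuple[float, float]:
    """Mean and variance of frailty among individuals still alive at `age`.
    Conditioning on survival yields Gamma(shape=1/theta, rate=1/theta + lam*age)."""
    shape = 1.0 / theta
    rate = 1.0 / theta + lam * age
    return shape / rate, shape / rate ** 2


def simulate_marked_frailties(n: int, age: float, lam: float, theta: float,
                              seed: int = 2) -> list[float]:
    """Frailties of the individuals that live long enough to be captured and marked."""
    rng = random.Random(seed)
    marked = []
    for _ in range(n):
        z = rng.gammavariate(1.0 / theta, theta)  # Python uses (shape, scale); scale = 1/rate
        if rng.expovariate(z * lam) > age:        # still alive at the age of marking
            marked.append(z)
    return marked


if __name__ == "__main__":
    lam, theta = 0.2, 0.5
    # Two hypothetical sites where snakes are, on average, first captured at different ages.
    for site, age in (("early-capture site", 1.0), ("late-capture site", 10.0)):
        mean, var = frailty_among_survivors(age, lam, theta)
        print(f"{site}: mean frailty {mean:.3f}, variance {var:.3f}")
```

If the age at marking differs among sites, the frailty distributions of the marked snakes differ among sites even when the populations are identical at birth, which is the confounding that Halstead et al. sought to avoid.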
Random effects models in demographic studies of wild vertebrates
Individual random effects models have been used to estimate survival in animal demography studies (e.g. Cam et al., 2002; Clark et al., 2005; Royle, 2008; Gimenez & Choquet, 2010; Hawkes, 2010; Aubry et al., 2011). Modelling approaches developed in other areas of research are increasingly being used and adapted to address hypotheses in wildlife ecology, for several reasons. First, thanks to cooperation with modellers, the criteria used to assess the quality of wildlife ecology research are changing. For example, non-independence of responses in adjacent spatial areas has long been considered in human health studies (Waller & Gotway, 2004) and is now handled via spatially structured random effects in ecology (e.g. Ogle et al., 2006). Second, random effects can be of direct interest in evolutionary demography: a random effect structured according to the degree of relatedness of individuals in a pedigree has been used to estimate the additive genetic variance in survival and its heritability (Papaïx et al., 2010; Buoro, Gimenez & Prévot, 2012). Shared and correlated frailty models have also been used in human demography to address ‘resemblance’ in mortality risk among twins or family members, and the possible genetic determinism of this risk (Yashin et al., 2001). Third, as emphasized by Halstead et al., data from several studies, populations or species can be combined in a joint analysis that partitions the variance in demographic parameters; such models can be used to address co-variation in time series among species or populations (Lahoz-Montfort et al., 2010; Papadatou et al., 2011). Finally, the development of free software designed to estimate mixed models has been a decisive factor (e.g. BUGS; Lunn et al., 2009; R Development Core Team, 2011).
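A pedigree-structured random effect is typically specified as a ~ N(0, sigma_a² A), where A is the additive relationship matrix computed from the pedigree. The sketch below assumes a tiny hypothetical pedigree, builds A with the standard recursive (tabular) method, and draws correlated additive effects through a Cholesky factor of A. In a survival model such effects would enter the linear predictor of the hazard or of survival, and sigma_a² would be estimated rather than fixed as here. The function names and values are illustrative assumptions, not the code of the cited studies.

```python
import math
import random


def relationship_matrix(sires: list, dams: list) -> list[list[float]]:
    """Additive relationship matrix by the tabular method.
    Individuals must be ordered so that parents precede offspring; None = unknown parent."""
    n = len(sires)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        s, d = sires[i], dams[i]
        for j in range(i):
            # Relationship of i with an earlier individual j: half the sum over i's known parents.
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
        # Diagonal: 1 plus the inbreeding coefficient (half the parents' relationship).
        A[i][i] = 1.0 + (0.5 * A[s][d] if s is not None and d is not None else 0.0)
    return A


def cholesky(M: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with L L^T = M, for a symmetric positive-definite M."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - acc)
            else:
                L[i][j] = (M[i][j] - acc) / L[j][j]
    return L


def sample_additive_effects(A: list[list[float]], sigma_a: float, seed: int = 3) -> list[float]:
    """Draw a ~ N(0, sigma_a^2 * A) as sigma_a * L u with u iid standard normal."""
    rng = random.Random(seed)
    L = cholesky(A)
    u = [rng.gauss(0.0, 1.0) for _ in A]
    return [sigma_a * sum(L[i][k] * u[k] for k in range(i + 1)) for i in range(len(A))]


if __name__ == "__main__":
    # Hypothetical pedigree: 0 and 1 are founders, 2 and 3 are their full sibs,
    # and 4 is the (inbred) offspring of 2 and 3.
    sires = [None, None, 0, 0, 2]
    dams = [None, None, 1, 1, 3]
    A = relationship_matrix(sires, dams)
    for row in A:
        print(row)
    print(sample_additive_effects(A, sigma_a=0.3))
```

Relatives share larger off-diagonal entries of A and therefore have correlated random effects, which is what allows the additive genetic variance to be separated from other sources of variation.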
These pieces of software are very flexible, and investigators can specify user-defined structures for the variance-covariance matrix of random effects according to the levels of variation in the response that are relevant to the biological questions of interest. However, flexibility requires investigators to make many decisions: how to parameterize models, estimate their coefficients, assess model fit and compare models. This highlights the need for advanced courses in statistical modelling in university wildlife science curricula.