Consequences of heterogeneity in survival probability in a population of Florida scrub-jays



    1. Department of Biology (SCA 110), University of South Florida, Tampa, FL 33620, USA; Donald Bren School of Environmental Science and Management, University of California Santa Barbara, Santa Barbara, CA 93106-5131, USA;


    1. Cornell Laboratory of Ornithology, Cornell University, 159 Sapsucker Woods Road, Ithaca, NY 14850, USA; and

    1. Archbold Biological Station, PO Box 2057, Lake Placid, Florida 33862, USA

Gordon A. Fox, Department of Biology (SCA 110) and Department of Environmental Science & Policy, University of South Florida, Tampa, FL 33620, USA. Fax: +1 813 974 3263. E-mail:


  • 1. Using data on breeding birds from a 35-year study of Florida scrub-jays Aphelocoma coerulescens (Bosc 1795), we show that survival probabilities are structured by age, birth cohort, and maternal family, but not by sex. Using both accelerated failure time (AFT) and Cox proportional hazard models, the data are best described by models incorporating variation among birth cohorts and greater mortality hazard with increasing age. AFT models using Weibull distributions with the shape parameter > 1 were always the best-fitting models.
  • 2. Shared frailty models allowing for family structure greatly reduce model deviance. The best-fitting models included a term for frailty shared by maternal families.
  • 3. To ask how long a data set must be to reach qualitatively the same conclusions, we repeated the analyses for all possible truncated data sets of 2 years in length or greater. Length of the data set affects the parameter estimates, but not the qualitative conclusions. In all but three of 337 truncated data sets the best-fitting models pointed to the same conclusions as the full data set. Shared frailty models appear to be quite robust.
  • 4. The data are not adequate for testing hypotheses as to whether variation in frailty is heritable.
  • 5. Substantial structured heterogeneity for survival exists in this population. Such structured heterogeneity has been shown to have substantial effects in reducing demographic stochasticity.


Heterogeneity in demographic traits in a population can affect the population's dynamics and extinction risk, and can severely bias estimates of vital rates. Vertebrate ecologists – especially those working with birds – have long been concerned with heterogeneity in survival and recapture probabilities (e.g. Burnham & Rexstad 1993; Sandercock et al. 2000; Bearhop, Ward & Evans 2003). Recently, a number of studies have proposed heterogeneity as an explanation for the observation that, in some populations, reproductive success is positively correlated with survival (Bérubé, Festa-Bianchet & Jorgenson 1999; Cam & Monnat 2000; Mauck, Huntington & Grubb 2004; Barbraud & Weimerskirch 2005). The effect of heterogeneity on population dynamics and extinction risk has received much less attention, although its importance was pointed out over two decades ago (Johnson, Burnham & Nichols 1986).

Over the last several years we have developed theory for understanding the demographic importance of within-population heterogeneity (Fox & Kendall 2002; Kendall & Fox 2002, 2003; Fox 2005). The major results are that within-population heterogeneity can have major effects on demographic stochasticity and extinction risk, and that measuring the magnitude and direction of these effects requires estimating the amount of heterogeneity and how much of that heterogeneity is structured (statistically predictable based on measurable features of individuals). In practice, the best one can do is to estimate the minimum portion that is structured. Additional unknown structuring factors can always occur.

Within-population demographic heterogeneity refers to heterogeneity in the phenotypic states underlying survival and reproduction – that is, some types of individuals having greater or lesser chances of surviving or reproducing. This is distinct from heterogeneity in demographic fates; for example, two individuals with identical phenotypes could, by chance, differ in life span or number or timing of offspring (Kendall & Fox 2002).

Florida scrub-jays Aphelocoma coerulescens (Bosc 1795) are long-lived (up to 15 years) cooperative-breeding birds (Woolfenden & Fitzpatrick 1984, 1996). Endemic to Florida, this species requires oak-scrub habitat, much of which has been lost to agriculture and suburban development in recent decades. Fire suppression has severely degraded large tracts of the remaining scrub, and usable habitat is increasingly fragmented. As a result, this species is listed as threatened under the US Endangered Species Act.

A population of A. coerulescens at Archbold Biological Station in Highlands County, Florida has been under intensive study for more than 35 years. This study has produced a substantial longitudinal data set on individual survivorship, including a large sample of birds with known parentage. In this paper we examine data on survival of breeding scrub-jays, with a view to identifying and quantifying sources of demographic heterogeneity. In addition to sex and birth cohort, we study the effect of several variables associated with family (specifically, identity of mother, father and natal territory). Finally, we use subsets of the data to ask how length of the data set affects the conclusions. We interpret the results in terms of the effects of structured heterogeneity in survival on demographic stochasticity.

Materials and methods

Data collection methods for the field study were described in detail in Woolfenden & Fitzpatrick (1984, 1996) and McDonald, Fitzpatrick & Woolfenden (1996). Each year since 1970, all breeding territories were mapped, nests of all family groups located and monitored, and surviving nestlings marked with coloured leg-bands at day 11 post-hatch. Each month since April 1971, all family groups in the study area were censused, and the social status of each individual (e.g. as breeder or nonbreeding helper) was noted. Most natal dispersal is local. Since 1985 well over half the breeders in the population were of known parentage. Once it becomes a breeder, a Florida scrub-jay remains faithful to the immediate vicinity of its initial breeding territory until death. Therefore, our study provides exact measurements of individual breeding life spans. When a breeder no longer can be located during the monthly censuses, we note its date-last-seen and the bird is logged as having died mid-way between that date and the next monthly census. Thus the only data censorship in these data is for birds still alive at present, and there is no heterogeneity in capture probability.

We fit accelerated failure time (AFT) models and Cox proportional hazard models to the data, using birth cohort and sex as predictors. A priori we expected that hazards would be approximately proportional, because it is likely that all individuals in the population undergo increased or decreased hazard at similar times, because of variation in weather, predator populations or disease. AFT models using the Weibull distribution for the error term are also proportional hazards models, and AFT models are more powerful than Cox models (Fox 2001). An additional advantage of AFT models is that one can compare the fit of models using different error distributions, and sometimes make inferences about the underlying survival process as a result (Lindsey 2004). Both types of model readily accommodate censored data by using the observation until the interval in which censorship occurs.
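The statement that Weibull AFT models are also proportional hazards models can be checked numerically. The following is a minimal sketch, not the analysis code used in this study; the function names and parameter values are hypothetical and chosen only for illustration. It assumes the standard Weibull hazard h(t) = (k/λ)(t/λ)^(k−1) and an AFT covariate effect β that multiplies the scale λ by exp(β).

```python
import math

def weibull_hazard(t, shape, scale):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

def hazard_ratio(t, shape, scale, beta):
    """Hazard of a group whose time scale is stretched by exp(beta)
    (the AFT effect), divided by the baseline hazard, at time t."""
    stretched = weibull_hazard(t, shape, scale * math.exp(beta))
    return stretched / weibull_hazard(t, shape, scale)

# Illustrative values only (assumptions, not estimates from the scrub-jay data).
shape, scale, beta = 1.5, 6.0, 0.3

# The ratio is evaluated at several ages; under a Weibull model it
# should not depend on t and should equal exp(-shape * beta).
ratios = [hazard_ratio(t, shape, scale, beta) for t in (0.5, 2.0, 8.0, 15.0)]
expected_ratio = math.exp(-shape * beta)
```

Because the ratio is the same at every age, stretching time by a constant factor is equivalent to multiplying the hazard by a constant, which is the proportional hazards property.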

To compare AFT models, we used the corrected Akaike Information Criterion (AIC); all references to AIC herein are to the corrected AIC. To compare the best AFT model with the best Cox model, we compared the difference between the model deviance and the null deviance.
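For readers unfamiliar with the corrected criterion, the sketch below assumes the usual small-sample form, AICc = −2 log L + 2k + 2k(k + 1)/(n − k − 1), where k is the number of estimated parameters and n the number of observations. The function name and the example numbers are hypothetical.

```python
def aicc(log_likelihood, n_params, n_obs):
    """Small-sample corrected AIC (assumed standard form).

    log_likelihood: maximised log-likelihood of the fitted model
    n_params:       number of estimated parameters (k)
    n_obs:          number of observations (n)
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("AICc is undefined when n_obs <= n_params + 1")
    aic = -2.0 * log_likelihood + 2.0 * n_params
    correction = 2.0 * n_params * (n_params + 1) / (n_obs - n_params - 1)
    return aic + correction

# Illustrative comparison of two hypothetical models fitted to the same data:
# the model with the smaller AICc is preferred.
simple_model = aicc(log_likelihood=-120.0, n_params=2, n_obs=50)
richer_model = aicc(log_likelihood=-118.5, n_params=6, n_obs=50)
```

The correction term grows quickly as k approaches n, which penalizes models with many cohort or family terms when the sample is small.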

Having identified the best basic regression model, we asked how much the AIC changed by adding ‘frailty’ terms for identity of the mother, father, or natal territory. The idea behind frailty models is simple: some individuals (or groupings such as sibships or sites) are inherently weaker than others, and therefore likely to die sooner. If we ignore this variability, we are unable to estimate the average rate of mortality accurately (Zens & Peart 2003). Frailty is defined to be a variable with a mean of 1. The frailty of an individual (or group) modifies its hazard function, so that its hazard of mortality is $h_i(t) = h_0(t)\,z_i$, where $z_i$ is the frailty for the $i$th group and $h_0(t)$ is the baseline hazard function at time $t$ for the entire population. Thus models using frailty terms allow for heterogeneous or overdispersed data; the frailties correspond to unmeasured covariates.
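To make the shared-frailty idea concrete, the following sketch simulates life spans for groups (for example, maternal families) whose members share one frailty value. It is an illustration under stated assumptions, not the fitting procedure used here: the baseline hazard is assumed to be Weibull, frailties are assumed gamma-distributed with mean 1, and all names and parameter values are hypothetical.

```python
import random

def simulate_shared_frailty(n_groups, group_size, shape, scale, frailty_var, seed=0):
    """Simulate life spans under a hazard h_i(t) = h_0(t) * z_i.

    h_0 is a Weibull baseline with the given shape and scale, and z_i is a
    gamma frailty (mean 1, variance frailty_var) shared by a whole group.
    Returns (frailties, lifetimes), where lifetimes[i] lists group i's members.
    """
    rng = random.Random(seed)
    frailties, lifetimes = [], []
    for _ in range(n_groups):
        # gammavariate(alpha, beta) has mean alpha*beta and variance alpha*beta**2,
        # so alpha = 1/var and beta = var give mean 1 and the requested variance.
        z = rng.gammavariate(1.0 / frailty_var, frailty_var)
        frailties.append(z)
        # Invert the survival function S(t) = exp(-z * (t/scale)**shape)
        # using a unit-rate exponential draw.
        group = [scale * (rng.expovariate(1.0) / z) ** (1.0 / shape)
                 for _ in range(group_size)]
        lifetimes.append(group)
    return frailties, lifetimes

# Illustrative values only (assumptions, not estimates from the scrub-jay data).
frailties, lifetimes = simulate_shared_frailty(
    n_groups=2000, group_size=3, shape=1.5, scale=6.0, frailty_var=0.5)
mean_frailty = sum(frailties) / len(frailties)
```

Groups that draw a frailty above 1 tend to have shorter life spans across all their members, which is the intragroup correlation that a shared frailty term is meant to capture.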

Frailty models are partly analogous to mixed-model ANOVA. At present, their theoretical underpinnings allow for only one frailty term in a model, although this is an area of active statistical research. Our models amount to treating offspring of each group (that is, from each father, mother or natal territory) as a random group with intragroup correlation in hazard. It should be clear that these are random groups only from a modelling viewpoint: if knowing the identity of an individual's parent or territory improves our ability to predict its life span, then differences among parents or territories contribute to structured heterogeneity. Thus, a quantity of key interest here is the variance of the random effect (estimated as the sum of the squares of the estimated frailties, divided by the frailty d.f.).

We used the R statistical package to analyse the data; frailty models can be analysed with many other standard statistical packages, including S-plus and SAS. We fit models using both gamma- and Gaussian-distributed frailty.

Finally, we used subsets of the full data set to ask how much data are necessary to get qualitatively similar results. We fit AFT models of the same form as our best-fitting models, using all possible sequential study years giving pseudo-data sets of 2 years, 3 years, and so on. We calculated the means (across these estimates) and standard errors for model estimates of the shape parameter, intercept, and variance of the random effect.
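The construction of the truncated data sets can be sketched as follows. This is a hypothetical helper, not the code used in the study; it simply enumerates every run of consecutive study years of length 2 or more, and the example years are placeholders.

```python
def sequential_windows(first_year, last_year, min_length=2):
    """Return every (start, end) pair of consecutive study years,
    inclusive at both ends, spanning at least min_length years."""
    windows = []
    for start in range(first_year, last_year + 1):
        for end in range(start + min_length - 1, last_year + 1):
            windows.append((start, end))
    return windows

# Placeholder years for illustration: a 4-year study yields 6 windows.
example = sequential_windows(2000, 2003)
```

For a study of n years and a minimum length of 2 there are n(n − 1)/2 such windows; in practice only those for which the models converge can be used.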


Results

Among AFT models, the data were best fit by a Weibull model. The Weibull shape parameters were consistently estimated to be > 1, indicating that mortality rates increase with age. We restrict our discussion of AFT models to those with a Weibull distribution. The fact that Weibull models fit much better than others – especially exponential models – means that survival probabilities are structured by age.

The AFT models proved to be much better fits to the data than the Cox proportional hazard models; the difference between the residual and the null deviances was larger for the AFT models. The null deviances differ between models because the AFT models are models of time-to-failure, while the Cox models are models of the hazard of failure.

Qualitative results were similar for AFT and Cox models when we studied models using cohort and sex only. Both models showed that considerable among-cohort heterogeneity in survival exists. The cohort-specific term reduced the deviance from 2535 to 2370, on 31 d.f. for the Cox model, and from 1555 to 1499 for the AFT model. Sex reduced the deviance by less than 1 unit under both models. The analysis of deviance for the best-fitting AFT model is shown in Table 1. The range of effects for cohorts in the best-fitting AFT model is shown in Fig. 1.
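The P-value for sex in Table 1 can be reproduced approximately if one assumes it comes from a likelihood-ratio test, in which the reduction in deviance (0·70) is compared with a chi-square distribution on 1 d.f. That assumption, and the function name below, are ours for illustration.

```python
import math

def chi2_upper_tail_1df(x):
    """P(X > x) for a chi-square variable with 1 d.f.

    With 1 d.f., X is the square of a standard normal deviate, so the
    upper tail equals erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

# Reduction in deviance attributed to sex in Table 1.
p_sex = chi2_upper_tail_1df(0.70)
```

Under that assumption the result agrees with the tabulated P of 0·40 to two decimal places.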

Table 1.  Analysis of deviance for the best-fitting AFT model using only sex and cohort. Null deviance = 1555 on 360 d.f.

Term    d.f.    Deviance    Residual d.f.    −2*LL     P
Sex     1       0·70        359              1554·3    0·40

Figure 1.

Estimated effects of birth cohort. (a) Survival curves under the best-fitting AFT model for cohorts born in the best (1986 – dotted), worst (1992 – dashed), and average (solid) years. (b) Estimated median life span of birth cohorts. Bars give 95% confidence intervals. Coefficient of variation of the cohort regression coefficients is 0·146; CV for the best-fitting homogeneous model is 0·114.

The residual deviances are still large after accounting for sex and cohort (Table 1). Adding frailty terms resulted in large reductions in deviance. Models with gamma-distributed frailty fit much better than those with Gaussian-distributed frailty; we report only on the former here, but the Gaussian models were qualitatively similar. Table 2 shows a summary of the results.

Table 2.  Summary statistics from including frailty terms in models including cohort and sex as predictors. Each line gives the estimated variance of the random effect and its effect on model deviance, for a survival model including effect of year-class and sex. Variance of random effect is (sum of squares of estimated frailties)/(frailty d.f.). The null deviance = 2535 for the Cox models and 1555 for the AFT models.

                   Cox proportional hazards                                                AFT (full data set)
                   Variance of random effect   Reduction in deviance   Residual deviance   Variance of random effect   Reduction in deviance   Residual deviance
Mother             0·01                        3                       2367                0·51                        435                     1064
Natal territory    0·02                        12                      2358                0·33                        192                     1307

The best-fitting model is clearly the AFT model with a term for frailty(mother). Under the AFT models, the frailty term led to a reduction in deviance by as much as 400 units. Variance of the random effect is fairly large, on the order of 0·33–0·51. Figure 2 shows the distribution of log(frailty(mother)) for the best-fitting model. The Cox models fit more poorly, and they estimate somewhat smaller variances of the random effect. Under the Cox models, the frailty terms reduced the deviance by 3–57 units. The model with frailty(father) was the best-fitting of the Cox models.

Figure 2.

Distribution of log(frailty(mother)) for the best-fitting AFT model.

Under the frailty models, the coefficients for birth cohort effects vary more among years than they do under homogeneous models (Fig. 1). The standard deviation of cohort coefficients (including zero for the 1969 cohort) is 0·22 for the homogeneous model but 0·27 for the frailty model. This conclusion is robust: the models using frailty(father) and frailty(natal territory) result in standard deviations of 0·247 and 0·235, respectively. No indications of temporal trends occur in any of these models.

We were able to fit both frailty and homogeneous models to 337 sequential subsets of the data. The frailty model was a better description of the data in all but three of these cases. Typically, the AIC of the frailty model was several hundred units smaller than that of the homogeneous model (Fig. 3).

Figure 3.

Difference between the AICs of frailty models and of homogeneous models, for all sequential subsets of years in the data set. Frailty models were better in 334 of 337 cases for which both models converged to solutions.

Length of the data set had a substantial effect on the estimate of the Weibull shape parameter for frailty models, but a smaller effect for homogeneous models. The shape parameter determines the asymptotic nature of the hazard function: the hazard is monotonically declining for shape parameters < 1 and monotonically increasing for shape parameters > 1. Longer data sets always meant a smaller estimated shape parameter. The frailty models always had larger shape parameters than homogeneous models. The difference between them stabilizes for data sets longer than about 9 years (Fig. 4).
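The dependence of the hazard on the shape parameter can be illustrated with a few evaluations of the Weibull hazard. This is a sketch with hypothetical shape values and ages, using a unit scale for simplicity.

```python
def weibull_hazard(t, shape, scale=1.0):
    """Weibull hazard h(t) = (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1.0)

# Ages (in arbitrary time units) at which the hazard is evaluated.
ages = [1.0, 2.0, 4.0, 8.0, 15.0]

declining = [weibull_hazard(t, shape=0.7) for t in ages]   # shape < 1
constant = [weibull_hazard(t, shape=1.0) for t in ages]    # exponential case
increasing = [weibull_hazard(t, shape=1.5) for t in ages]  # shape > 1
```

A shape of exactly 1 recovers the exponential model with constant hazard, which is why a Weibull fit with shape > 1 points to mortality that rises with age.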

Figure 4.

Effect of the length of data set on estimates. Weibull shape parameters under homogeneous (a) and frailty (b) models. (c) Variance of the random effect. Error bars give 95% confidence intervals.

A similar effect can be seen in the estimates of the variance of the random effect (Fig. 4): before 9 or 10 years, it has very large variance and the mean estimate declines substantially with length of the data set. Data sets longer than the maximum life spans of individuals (15 years) have estimates with very small variance.


Discussion

Working with a shorter data set, McDonald et al. (1996) found evidence that Florida scrub-jays undergo actuarial senescence – that is, the hazard of mortality increases with age in adults. Our results reinforce that conclusion: Weibull models with shape parameters > 1 were always the best-fitting models, indicating an increase in the risk of mortality with age.

Cohort affected survival (Fig. 1), but it is striking how much of the variation still was unexplained after including terms for cohort. While drought, fire and hurricanes occur in our study area fairly often, simple environmental stochasticity turns out not to be the most important explanation of variation in survival. Variation among families reduces the model deviance dramatically, even after accounting for cohort. Recent studies in other organisms (e.g. Coulson et al. 2001; Bjørnstad et al. 2002) also point to the conclusion that a substantial portion of observed variation in animal population dynamics can be explained.

In general, the effect of heterogeneity in demographic rates on demographic stochasticity depends on the second derivative of the variance in that trait, with respect to the trait value (Kendall & Fox 2003). However, things are greatly simplified for survival, because we can use the binomial model. Kendall & Fox (2003) showed that unstructured heterogeneity has no effect while structured heterogeneity in survival decreases the effect of demographic stochasticity.
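The binomial argument can be illustrated with a small simulation. The sketch below reflects our reading of the structured and unstructured cases and uses hypothetical numbers: in the structured case each individual keeps a fixed survival probability (50 individuals at 0.5 and 50 at 0.9), whereas in the unstructured case each individual's probability is drawn at random from the same two values in every replicate.

```python
import random
import statistics

def survivors_structured(probs, rng):
    """Number of survivors when each individual has a fixed, known probability."""
    return sum(rng.random() < p for p in probs)

def survivors_unstructured(prob_values, n, rng):
    """Number of survivors when each individual's probability is a fresh
    random draw from prob_values in every replicate."""
    return sum(rng.random() < rng.choice(prob_values) for _ in range(n))

rng = random.Random(1)
n, reps = 100, 5000
probs = [0.5] * 50 + [0.9] * 50          # mean survival probability 0.7

var_structured = statistics.pvariance(
    [survivors_structured(probs, rng) for _ in range(reps)])
var_unstructured = statistics.pvariance(
    [survivors_unstructured([0.5, 0.9], n, rng) for _ in range(reps)])

# Analytical values for comparison.
var_homogeneous = n * 0.7 * 0.3                       # 21.0
var_structured_exact = 50 * 0.5 * 0.5 + 50 * 0.9 * 0.1  # 17.0
```

In the unstructured case each individual is marginally a Bernoulli trial with the mean probability, so the variance in the number of survivors matches the homogeneous value; fixing group membership removes the among-group component and lowers the variance.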

Using the full data set, including a frailty term led to a reduction in model deviance from 1499 to 1064. Almost every subset of our data pointed to the same conclusion – including a frailty term for maternal family nearly always led to better fits than homogeneous models (Fig. 3). In most cases the improvement was several hundred AIC units. The variance of the random effect provides us with an estimate of variance in a structured factor: knowing the family an individual comes from, we have an improved prediction of its survival probability. The frailty terms for maternal families suggest a ‘demographic covariance’ sensu Engen, Bakke & Islam (1998) – individual demographic properties are not independent of one another.

Including a term for parentage improves our ability to predict survival probability. Can this heterogeneity among maternal families be further explained? Strong evidence exists for one part of the explanation: territories vary in quality. Mumme et al. (2000) showed that breeding adults on territories along a highway had an average annual mortality rate of 38%, as compared with an average annual mortality rate of 23% in other territories. Many other environmental factors (such as acorn production and fires) that vary spatially may contribute to heterogeneity among families as well.

It is not possible to evaluate hypotheses about the potential contribution of genetic variance to the among-family heterogeneity. One reason is simple: in this observational study parents are confounded with territories, and mating is nonrandom, rendering invalid any estimates of quantitative genetic parameters (Lynch & Walsh 1998). However, deeper issues exist in estimating heritability for survival-related traits. Recall the distinction between demographic traits and fates (Kendall & Fox 2003): having a particular death date (or death age) is a fate, while the underlying propensity to survive is a trait. Substantial variation exists in demographic fates, even if little variation exists in the underlying traits: some individuals live a long time in spite of their genes, for example. Vaupel (1988) showed that because of this, even if frailties are perfectly heritable and the variance of random effects is large, one should expect the heritability of observed longevities to be nearly zero. While mindful of our violations of quantitative genetic theory, we conducted parent–offspring regressions for longevity, and no slopes were significantly different from zero. As Vaupel (1988) showed, this does not inform us about the heritability of frailty, however, and our data set is too small to ask whether the frailties of females’ maternal sibships are correlated with the frailties of their progeny.
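The distinction between trait and fate can be illustrated with a toy simulation. It is not Vaupel's derivation and not an analysis of our data: it assumes that each offspring inherits its parent's frailty exactly, that frailty is gamma-distributed with mean 1, and that the baseline hazard is Weibull. All names and parameter values are hypothetical.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def simulate_parent_offspring(n_pairs, shape, frailty_var, seed=0):
    """Simulate life spans of parent-offspring pairs sharing an identical frailty.

    Frailty z is gamma (mean 1, variance frailty_var); given z, each life span
    follows a unit-scale Weibull with hazard multiplied by z.
    """
    rng = random.Random(seed)
    parents, offspring = [], []
    for _ in range(n_pairs):
        z = rng.gammavariate(1.0 / frailty_var, frailty_var)
        parents.append((rng.expovariate(1.0) / z) ** (1.0 / shape))
        offspring.append((rng.expovariate(1.0) / z) ** (1.0 / shape))
    return parents, offspring

# Illustrative values only (assumptions, not estimates from the scrub-jay data).
parents, offspring = simulate_parent_offspring(
    n_pairs=50000, shape=2.0, frailty_var=0.25)
longevity_correlation = pearson(parents, offspring)
```

Even though frailty is transmitted perfectly in this toy model, the parent–offspring correlation in life span comes out far below 1 (on the order of 0.2 for these assumed values), because most of the variation in life span is chance variation in fate rather than variation in the underlying trait.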

Interest in the evolution of longevity has existed for many years (Medawar 1952; Williams 1957; Hamilton 1966; Rose 1991; Charlesworth 1994; Grant & Grant 2000; Nisbet 2001; Partridge 2001; Reznick et al. 2001; Carey 2003). It is evident from both comparative and experimental data that longevity has evolved under the influence of natural selection. Most heritability estimates for longevity are on the low side. This has been generally interpreted as reflecting the erosion of genetic variance by selection. Vaupel's (1988) result suggests a different explanation: even with substantial heritability of underlying traits, heritability of longevity should be expected to be low.

In our study, a frailty model led to a larger estimate for the strength of environmental stochasticity than a homogeneous model. By accounting for parentage, the model can better separate among-cohort heterogeneity from residual error. It often may be the case that accounting for within-population heterogeneity improves our ability to estimate environmental stochasticity. This suggests that heterogeneity may be important to estimate even when populations are large and demographic stochasticity is of little concern.

We have identified several factors – especially parentage – as sources of structured heterogeneity in scrub-jay demography. Estimates of this magnitude for the amount of structured heterogeneity lead us to conclude that the risk of local extinction from demographic stochasticity is less than we might otherwise have estimated (Kendall & Fox 2002, 2003); subsequent papers will consider data on heterogeneity in reproduction.

Our estimates are minimum estimates of the amount of structured heterogeneity. Unfortunately no way exists, short of further field research, to improve on this. While our model (deviance = 1064) is a large improvement over the null model (deviance = 1555), no simple way exists to gauge the meaning of this residual deviance. It is not analogous to an unexplained variance. Little research on the expected deviance of AFT models has occurred, but it seems clear that the expected deviances will be large, inasmuch as the likelihood function is the product of the hazard function for all noncensored cases and the survival function for censored cases. We simulated 100 exponentially distributed death times with rate parameter 0·1; fitting this simulated data set with an exponential model gives a deviance of about 600. In other ecological applications with hundreds of study individuals, model deviances are typically on the order of several hundred to a few thousand (see Fox 2001; Crawley 2005; and references therein). Thus, while we are certain that more structure exists in the Archbold scrub-jay population than we have identified and modelled, the size of the deviance for our model need not imply that these other factors are of large magnitude.
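A simulation of this kind can be sketched as follows. The sketch assumes that the deviance is −2 times the maximised log-likelihood and that there is no censoring; the seed and function name are arbitrary, and the exact value will differ from run to run and from the figure quoted above.

```python
import math
import random

def exponential_deviance(times):
    """-2 * maximised log-likelihood of an exponential model, no censoring.

    The maximum-likelihood rate is n / sum(times); the log-likelihood is
    n * log(rate) - rate * sum(times).
    """
    n = len(times)
    total = sum(times)
    rate_hat = n / total
    log_lik = n * math.log(rate_hat) - rate_hat * total
    return -2.0 * log_lik

rng = random.Random(42)
# 100 exponentially distributed death times with rate parameter 0.1.
death_times = [rng.expovariate(0.1) for _ in range(100)]
deviance = exponential_deviance(death_times)
```

With a true rate of 0.1 the expected value is near −2 × 100 × (ln 0.1 − 1), that is, several hundred, even though the fitted model is exactly the model that generated the data.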

Interest in demographic heterogeneity has existed for some time among ecologists, mainly motivated by two issues. First, scientists using mark–recapture methods have been concerned with heterogeneity in survival and recapture probabilities (e.g. Burnham & Rexstad 1993; Sandercock et al. 2000; Bearhop et al. 2003), as it can severely bias estimates of vital rates. Second, researchers working with long-term demographic data sets have observed an apparent increase in reproductive output with age, and have considered demographic heterogeneity as a possible explanation of this seeming paradox (Bérubé et al. 1999; Cam & Monnat 2000; Barbraud & Weimerskirch 2005).

We emphasize that an entirely different reason to investigate heterogeneity exists: its potential for strong impact on the growth rates and extinction risks of populations. Moreover, the approach we have employed – comparing frailty models with homogeneous models – is much more powerful statistically, and lends itself to deeper biological insight than looking for positive correlations between reproduction and survival (van Noordwijk & de Jong 1986; Bérubé et al. 1999; Cam & Monnat 2000; Mauck et al. 2004; Barbraud & Weimerskirch 2005). Few other studies have used this powerful approach. Cam et al. (2002) found a similar method useful in a study of black-legged kittiwake Rissa tridactyla survival. Their model was of random effects at the individual level, in contrast to our ‘shared frailty’ (Therneau, Grambsch & Pankratz 2000) model for families.

Our results suggest that, at least in some bird populations, other important sources of heterogeneity in survival exist – in our scrub-jay population, both age-structured and family-structured heterogeneity occur. While this suggests a need for additional care in estimating vital rates of bird populations, cause exists here for optimism: as Kendall & Fox (2002) pointed out, in many cases, the appropriate data are already in ecologists’ field notebooks. No reason exists to think that this conclusion applies uniquely to bird populations; we expect that considerable unrecognized demographic heterogeneity exists in populations of many nonavian taxa.

Even a data set of 2 years’ length leads to the conclusion that substantial structured heterogeneity exists in survival because of family structure: frailty models almost always are better fits than homogeneous models. Parameter estimates for the Weibull shape parameter and for the variance in the random effect are quite insensitive to length of the data set once the latter becomes longer than about the median life span.

Our results demonstrate that it is possible to identify some of the sources of structured heterogeneity in survival in natural populations – and to estimate the magnitude of their contribution. We are optimistic about the prospects for further understanding and quantifying the sources of heterogeneity in demographic performance – and thereby improving our understanding of demography and estimation of extinction risks.


Acknowledgements

Support was provided by the US Environmental Research Program, STAR grant no. 1282908851 to Kendall and Fox. Principal funding for Fitzpatrick and Woolfenden over the 35 years has been provided by the National Science Foundation (BSR8705443, BSR896276, BSR9021902 and DEB9707622), Archbold Biological Station, and the National Geographic Society.