Below-threshold mortality: implications for studies in evolution, ecology and demography


Daniel E. L. Promislow Department of Genetics, University of Georgia, Athens, GA 30602–7223, USA. Tel: (706) 542 1715; fax: (706) 542 3910; e-mail:


Evolutionary biologists, ecologists and experimental gerontologists have increasingly used estimates of age-specific mortality as a critical component in studies of a range of important biological processes. However, the analysis of age-specific mortality rates is plagued by specific statistical challenges caused by sampling error. Here we discuss the nature of this ‘demographic sampling error’, and the way in which it can bias our estimates of (1) rates of ageing, (2) age at onset of senescence, (3) costs of reproduction and (4) demographic tests of evolutionary models of ageing. We conducted simulations which suggest that using standard statistical techniques, we would need sample sizes on the order of tens of thousands in most experiments to effectively remove any bias due to sampling error. We argue that biologists should use much larger sample sizes than have previously been used. However, we also present simple maximum likelihood models that effectively remove biases due to demographic sampling error even at relatively small sample sizes.


Age-specific mortality schedules lie at the heart of models that explore an enormous range of biological phenomena, from basic calculations of Darwinian fitness (Fisher, 1930) to the evolution of ageing (Hamilton, 1966; Charlesworth, 1994), from explorations of population dynamics (Caswell, 1989) to predictive analyses for conservation strategies (Lande, 1988a,b). But the estimation of age-specific mortality rates brings with it a variety of statistical challenges, and unless we address these challenges in our experiments, we are likely to draw biased conclusions.

In the following work, we describe how studies of mortality rates that use insufficient sample sizes or inappropriate statistical models may bias conclusions about many contemporary issues in evolutionary research. First, studies of ageing in laboratory animals have claimed that genetic factors extend life span by slowing the rate of senescence. We will argue here that when one considers the effect of sample size, evidence for changes in baseline mortality is strong, but evidence for changes in rates of ageing may turn out to be weaker than previously thought. Second, evolutionary theory predicts that the onset of senescence should coincide with the onset of reproduction. Here we discuss how sample size affects our ability to determine when senescence begins. Third, both field and laboratory studies provide evidence for a trade-off between reproduction and survival. We would like to know whether increases in reproduction actually affect the rate of ageing. Again, insufficient sample sizes can bias our estimates of cost of reproduction. Fourth, Medawar's widely accepted ‘mutation accumulation’ model for the evolution of senescence (Medawar, 1952) makes certain predictions about variation in age-specific mortality rates among genotypes. Only recently have biologists begun to test this model by asking how genetic variance for mortality rates changes with age. Results from these studies are critical to our understanding of evolutionary models of ageing. But sampling error may bias our results in favour of the model we set out to test.

In each case, we focus in particular on the influence of sample size on mortality estimates at the youngest adult ages. At early ages, mortality rates are low and so are relatively prone to sampling error. However, models by Hamilton and others (Lewontin, 1965; Hamilton, 1966; Abrams, 1991; Charlesworth, 1994) show that it is at these early ages that fitness is most influenced by variation in life history characters. From an evolutionary perspective, early-age mortality rates are extremely important, but are also those most likely to be estimated incorrectly, leading to biased estimates of mortality patterns over the life span of a cohort. Early-age fecundity is also a critical component of lifetime fitness. However, the statistical natures of age-specific mortality and fecundity are qualitatively different. We confine our discussion here to estimates of mortality rates, which pose particular statistical challenges.

Before examining the specific problems outlined above, we will provide some conceptual background, including an explanation of mortality rates and how they are estimated, and we introduce the statistical challenges of below-threshold mortality. This brief statistical discussion will then motivate our subsequent discussion of the effect of below-threshold mortality on specific biological problems.

The force of mortality

Mortality rates measure the fraction of individuals in a population dying in a given time interval. Formally, age-specific mortality, mx, is the probability of dying in the discrete time interval from age x to age x + Δx, conditional on surviving to age x (Chiang, 1984). See Table 1 for a list of variables used throughout this manuscript. It is usually defined as

Table 1. Variables used in this paper.

\[ q_x = \frac{l_x - l_{x+\Delta x}}{l_x} \tag{1} \]

where lx is the proportion of individuals surviving to age x, Δx ≥ 1, and qx is the estimator of the parameter mx. Similarly, we can define age-specific mortality as qx = dx/Nx, where Nx is the number of individuals alive at age x, and dx is the number of individuals dying between age x and x + Δx (i.e. dx = Nx − Nx+Δx). This estimate of mortality is limited, because its value depends on the size of the census interval Δx, and qx has an upper bound of 1. Both of these factors limit the power of qx as a true and unbiased estimate of mx. In place of qx, demographers estimate the force of mortality, hx (also known as the instantaneous mortality rate or hazard rate) (Lee, 1992). The force of mortality is the instantaneous probability of death on a continuous time-scale, and is estimated by μx, the limit of the probability of dying in a given time interval as that interval becomes infinitesimally small:

\[ h_x = \lim_{\Delta x \to 0} \frac{l_x - l_{x+\Delta x}}{l_x \, \Delta x} \tag{2} \]

(Lee, 1992). This formulation cannot be evaluated directly with the discrete data typically available, but is well approximated by

\[ \mu_x = \frac{-\ln(1 - q_x)}{\Delta x} = \frac{-\ln(N_{x+\Delta x}/N_x)}{\Delta x} \tag{3} \]

(Elandt-Johnson & Johnson, 1980).

Other formulations have also been proposed (reviewed in Gavrilov & Gavrilova, 1991) to address possible bias in this approximation. However, when hx is small (as is generally the case at early adult ages), or the number of census intervals is large (Mueller et al., 1995), the approximation itself does not bias mortality estimates (see also Elandt-Johnson & Johnson, 1980).
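The discrete estimators above are straightforward to compute from census counts. The following is a minimal Python sketch (ours, not from the paper), assuming a census interval of Δx = 1 and a hypothetical list of the numbers alive at successive ages:

```python
import math

def q_hat(N):
    """Interval mortality q_x = d_x / N_x for each census interval (Delta x = 1)."""
    return [(N[i] - N[i + 1]) / N[i] for i in range(len(N) - 1)]

def mu_hat(N):
    """Force of mortality via the discrete approximation, mu_x = -ln(N_{x+1}/N_x)."""
    return [-math.log(N[i + 1] / N[i]) for i in range(len(N) - 1)]

counts = [1000, 990, 975, 950, 900]  # hypothetical numbers alive at ages 0-4
print(q_hat(counts)[0])              # 0.01
print(round(mu_hat(counts)[0], 5))   # 0.01005: slightly above q_x, as expected for small q
```

For small q the two estimators nearly coincide; they diverge as mortality rises, which is why demographers prefer μx.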

In addition to calculating mortality rates for each age interval, one can fit parametric models to describe the trajectory of mortality across all ages. Age-specific mortality rates often increase as a function of age (e.g. Comfort, 1979; Finch, 1990; Promislow, 1991) and approximate an exponential trajectory, such that

\[ h_x = \alpha e^{\beta x} \tag{4} \]

where α is the initial, age-independent rate of mortality, and e^β is the factor by which mortality increases per unit of age. This equation can be linearized by taking the logarithm of both sides, such that

\[ \ln(h_x) = \ln(\alpha) + \beta x \tag{5} \]

This is the well-known ‘Gompertz’ mortality equation (Gompertz, 1825), with intercept ln(α) and slope β. The Gompertz model may not adequately describe mortality at very early ages if natural mortality rates are high (Abrams, 1991), or at very late ages, as recent studies based on sufficiently large sample sizes have shown (Carey et al., 1992; Curtsinger et al., 1992; Kannisto et al., 1994; see also Abrams & Ludwig, 1995). However, for the sake of the analysis we present here, and unless stated otherwise, we assume that mortality curves are Gompertz across all ages. This model provides a useful heuristic tool to illustrate the effect of sample size (Nx) on mortality parameters, especially under the common assumption that adult mortality rates are continuous and increase exponentially at least through young and middle ages. Although more complex parametric models may provide a better fit to mortality data than the Gompertz, the arguments we make here apply in any case.

Statistical implications of below-threshold mortality

We defined age-specific mortality as qx = dx/Nx. Note that both dx and Nx can take on discrete values only. An individual is either dead or alive (dx ∈ {0, 1, 2, …}). Thus, at its minimum measurable value, qx is equal either to 0 if no individuals die, or to 1/Nx if a single individual dies. Similarly, the lowest observable nonzero bound for μx is −ln{(N − 1)/N} ≈ 1/Nx for large N.
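This lower bound is easy to verify numerically; a short Python check (ours):

```python
import math

# Smallest nonzero force of mortality observable in a cohort of N: exactly one death.
N = 50
mu_min = -math.log((N - 1) / N)
print(round(mu_min, 5))  # 0.0202, very close to 1/N = 0.02
```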

Although measures of mortality are discrete, we generally assume that the senescent change in mortality with age is a continuous, monotonically increasing process, and that for all ages there is a finite risk of mortality even when no deaths are observed. We recognize that there will exist some true force of mortality hx > 0 even if the observed force of mortality μx=0. But what happens if the true force of mortality is below the observable threshold (i.e. 0 < hx < 1/Nx)? In this case, we lack a robust way to estimate true mortality in the cohort because a fraction of an individual cannot die. We refer to this region of the parameter space as below-threshold mortality (Fig. 1). In this region, since the force of mortality is small relative to sample size, sampling error will lead to biases in estimates of the mean, the variance and the long-term trajectory (e.g. the Gompertz parameters) of the force of mortality.

Figure 1.

The solid line represents a Gompertz mortality curve (see text, eqns 4, 5) for a cohort of N=50 individuals. The dotted line represents the threshold at which mortality μ=1/50, and below which mortality cannot be observed, and the shaded region shows that part of the Gompertz curve that we will accordingly not be able to estimate, due to insufficient numbers of deaths.

Point estimates of mortality

At the early-age and late-age boundaries of mortality trajectories, when either mortality or survival (h or e^{−h}, respectively) is close to zero, mortality behaves as a threshold character and becomes difficult to estimate accurately (Gaines & Denny, 1993). Consider a cohort with initial sample size N0 = 50 individuals, with a true hazard rate in the youngest age class of h0 = 0.001. What is the probability of observing one or more deaths in this cohort at a given age x? Assuming that each individual in a cohort is equally likely to die in a given time interval, since h < 0.1 and Nh < 5, the distribution of deaths at age 0 is approximately Poisson (Sokal & Rohlf, 1995). Thus, the probability of observing no deaths,

\[ P[\mu = 0] = e^{-Nh} \tag{6} \]

and the probability of observing one or more deaths,

\[ P[\mu > 0] = 1 - e^{-Nh}. \]

For h = 0.001 and N = 50, P[μ > 0] = 0.049. There is a less than 5% chance of observing any mortality (i.e. of estimating mortality as nonzero) in the first age-class, even though h > 0. In fact, if h = 0.001, to determine accurately that the true mortality rate is significantly greater than zero at the 95% confidence level with power 0.9 (type II error rate β = 0.1), we would need a sample size N almost 10 times larger than 1/h (N ≈ 10 000) (Zar, 1984). Thus, point estimates of below-threshold mortality will be highly inaccurate.
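These detection probabilities are simple to tabulate. A minimal Python sketch (ours; note that `n_required` gives the sample size needed merely to observe at least one death with a given probability, a weaker criterion than the power calculation cited above):

```python
import math

def p_detect(h, N):
    """P[observe at least one death], Poisson approximation with mean N*h."""
    return 1.0 - math.exp(-N * h)

def n_required(h, p=0.95):
    """Smallest N for which the detection probability reaches p."""
    return math.ceil(-math.log(1.0 - p) / h)

print(round(p_detect(0.001, 50), 3))  # 0.049, matching the worked example in the text
print(n_required(0.001))              # 2996: reliably seeing even one death requires thousands
```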

Variance in mortality

We have noted that the observed force of mortality necessarily takes on discrete values, because a fraction of an individual cannot die. When mortality is low relative to the inverse of sample size, we tend to either over- or under-estimate the true force of mortality. When mortality rates are much lower than 1/N (e.g. at young adult ages), we will observe mortality rates of either zero or 1/N, but not intermediate values. This, in turn, influences our estimates of the variance in the force of mortality among cohorts. Within a cohort of genetically identical individuals, if we assume that all individuals in that cohort have the same probability of mortality, then the expected variance of mortality rate, σq2 is approximately binomial (an individual is either alive or dead), and is given by

\[ \sigma_{q,i}^2 = \frac{q_i (1 - q_i)}{N_i} \tag{7} \]

for the ith cohort. In this simplest case, the estimate of expected variance for q takes into account the effect of sample size. Note also that for small q and large N, the expected number of deaths, d, is Poisson distributed, the expected variance in number of deaths is d, and the expected variance in mortality rate is σq2 = d/N2.
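In Python, a sketch of ours following eqn 7 and the Poisson approximation just described:

```python
def var_q_binomial(q, N):
    """Within-cohort sampling variance of the mortality estimate q (eqn 7)."""
    return q * (1.0 - q) / N

def var_q_poisson(d, N):
    """Poisson approximation for small q: var(q) = d / N^2, with d deaths observed."""
    return d / N ** 2

# With q = 0.01 and N = 1000 (so d = 10), the two agree closely:
print(var_q_binomial(0.01, 1000))  # 9.9e-06
print(var_q_poisson(10, 1000))     # 1e-05
```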

When we try to estimate variance in mortality among cohorts, the situation is more complicated. Individuals in different cohorts will not have the same underlying probability of dying, so the variance is no longer given by eqn 7. Furthermore, when mortality rates are low, we will systematically underestimate variance due to below-threshold mortality. Consider an extreme but nonetheless illustrative example. Assume that hx,i ≪ 1/Nx,i, where hx,i is the true force of mortality and Nx,i is the population size at age x for the ith cohort. Given sufficiently low mortality rates and small sample sizes in each cohort, it is likely that no deaths will be observed in the age interval x to x + Δx and the mortality rate will be estimated as zero (μx,i = 0 for all cohorts). Thus, even if σh2 > 0, the observed variance will be equal to zero. We elaborate on this point in the section ‘Variation in mortality and the evolution of senescence’ below.

A further complication arises from the fact that whereas mortality is binomially distributed within cohorts, it appears to be log-normally distributed among a sample of cohorts drawn from a single population (Promislow et al., 1996). A logarithmic transformation is therefore needed to normalize the variance. But given that observed mortality rates are often equal to zero, the transformed data (ln[μx=0]) will be undefined. One way to overcome this problem is to add a constant to each value (e.g. ln(μx) is replaced by ln(μx + 1), Hughes & Charlesworth, 1994). This gives rise to a systematic underestimate of the variance when μx is small (i.e. at early ages) relative to larger values of μx (i.e. at older ages), giving rise in turn to an apparent, though potentially spurious, increase in age-specific variance. The only way to truly circumvent this problem is to use sufficiently large cohort sample sizes, such that μx > 0 for all cohorts.
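The distortion introduced by the ln(μ + 1) transformation is easy to see numerically. In the sketch below (ours, with arbitrary illustrative values), two pairs of cohorts each differ 10-fold in mortality, once at a typical young-age level and once at an old-age level; the transformation compresses the young-age difference almost to nothing:

```python
import math

young = (0.001, 0.01)  # hypothetical 10-fold mortality difference at early ages
old = (0.1, 1.0)       # the same 10-fold difference at late ages

def log_range(pair):
    """Spread of ln(mu + 1) across a pair of cohorts."""
    lo, hi = pair
    return math.log(hi + 1.0) - math.log(lo + 1.0)

print(round(log_range(young), 4))  # 0.009: the young-age difference is nearly erased
print(round(log_range(old), 4))    # 0.5978: the old-age difference is largely preserved
```

Because the transformed spread grows with age even when the underlying proportional variation does not, age-specific variance appears to increase spuriously.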

Although we assume that genetically identical individuals have identical risk of mortality, this is a conservative assumption. If the force of mortality, h, varied within cohorts, we would need even greater sample sizes to obtain accurate estimates of the variance.

Gompertz parameters

There are many circumstances where we wish to describe the overall trajectory of mortality schedules. We referred to one common model, the Gompertz, which assumes that mortality rates increase exponentially with age (eqns 4 and 5). Often, the parameters α and β are estimated by fitting a least-squares linear regression to eqn 5 (Finch et al., 1990; Johnson, 1990; Austad, 1993).

However, this procedure is affected by the problem of below-threshold mortality (Fig. 2). At early age-classes, when hx < 1/Nx, we may observe occasional deaths. In most age-classes, however, no deaths will occur, and log(μx) for these zero-mortality age classes is undefined. Thus, when we fit a regression line to age-specific mortality, we typically count zero-values as missing (Tatar & Carey, 1995). If we discard these classes from the regression, the best fit line will include only the early observations with values equal to or greater than 1/Nx, which will bias the intercept toward larger than real values and the slopes toward smaller than real values (see Fig. 2a). The magnitude of the bias is a function not only of sample size, but also of the value of the true Gompertz intercept parameter α. If we compare two cohorts with identical slope β but different intercept ln(α), the one with lower intercept will appear to have a lower slope. With least-squares linear regression, we cannot independently or accurately estimate α and β when cohort sizes are small.

Figure 2.

Cohorts of between 50 and 5000 individuals. Each cohort was sampled from the same underlying distribution, for which β=0.05 and α=0.001. The true mortality is shown by the dashed line. A least-squares linear regression line was then fitted to the data, using each age-specific estimate of log(mortality) as an independent point. If μx=0, the point is considered as missing (see text). Note that for small sample size, we underestimate the true slope and overestimate the true intercept of the data.

To demonstrate the strong influence of sample size on estimates of the Gompertz parameters, we conducted a simulation based on standard resampling techniques, as did Shouman & Witten (1995). However, where Shouman & Witten (1995) focus on the high variance in estimates of the Gompertz parameters due to small sample size, we focus on the consistent directional biases in the estimates that result from small sample size. We created simulated cohorts of 30 000 individuals in which mortality followed a Gompertz trajectory. With these data, we could assign a day of death to each individual within a cohort. From this population of 30 000 individuals, we then sampled individuals at random with replacement to create smaller subpopulations. This was done for each of six subpopulation sample sizes. For each sample size, we estimated the slope and intercept of the Gompertz equation using ordinary least-squares regression. We created 1000 replicate subpopulations for each sample size, from which we were able to estimate the mean and standard deviation for α and β. For the rate parameter, β, we present results from five distinct populations characterized by α=0.001 and β={0.05, 0.08, 0.11, 0.14 or 0.17}. In addition, we analyse four separate populations with respect to the intercept parameter (α) in which β = 0.05 and α={0.0001, 0.001, 0.01 or 0.1} (−9.21, −6.91, −4.61 and −2.30, respectively, on a natural log scale).

This simulation demonstrates that at small sample sizes and using standard regression techniques, we consistently underestimate the slope β and overestimate the intercept ln(α) (Fig. 3). For a fixed value of α, the greater the true value of β, the more we underestimate it. Similarly, for a fixed value of β, the lower the true value of α, the more we overestimate it.
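A stripped-down version of this resampling experiment can be reproduced in a few lines of Python. This sketch (ours, with the Fig. 2 parameter values; it simulates death days directly rather than resampling a parent population of 30 000) shows the same directional bias, with OLS fits to small cohorts returning slope estimates below the true β:

```python
import math
import random

def sample_death_day(alpha, beta, rng):
    """Draw an integer age at death from a Gompertz hazard via daily Bernoulli trials."""
    x = 0
    while True:
        if rng.random() < 1.0 - math.exp(-alpha * math.exp(beta * x)):
            return x
        x += 1

def ols_gompertz_slope(deaths):
    """OLS slope of ln(mu_x) on x, treating zero-mortality ages as missing."""
    alive, x, pts = len(deaths), 0, []
    while alive > 1:
        d = deaths.count(x)
        if 0 < d < alive:  # ages with no deaths are dropped (ln(0) undefined)
            pts.append((x, math.log(-math.log((alive - d) / alive))))
        alive -= d
        x += 1
    mx = sum(p[0] for p in pts) / len(pts)
    my = sum(p[1] for p in pts) / len(pts)
    return sum((p[0] - mx) * (p[1] - my) for p in pts) / sum((p[0] - mx) ** 2 for p in pts)

rng = random.Random(0)
true_beta = 0.05
betas = [ols_gompertz_slope([sample_death_day(0.001, true_beta, rng) for _ in range(50)])
         for _ in range(30)]
mean_beta = sum(betas) / len(betas)
print(round(mean_beta, 3), "vs true slope", true_beta)
```

The early single-death ages sit at the 1/N floor rather than on the true trajectory, which flattens the fitted line, exactly the mechanism described above.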

Figure 3.

Effects of sample size on estimates of slope and intercept for Gompertz-like mortality. Points and bars represent observed mean and standard error for parameter estimates for a given sample size. Actual values for slope and intercept are given by dotted lines. As sample size decreases, the maximum likelihood estimate (MLE, upper graphs) for both slope and intercept shows greater error variance, but only weak bias. In contrast, a regression model gives strongly biased estimates of both slope and intercept for all but the largest sample size (>1000 individuals). For more details, see text (section ‘Gompertz parameters’).

These biased estimates have two implications for hypothesis testing. (1) At small sample sizes, we lose power and are more likely not to reject a null hypothesis of no difference between two rate or intercept parameters even when they differ substantially (type II error). (2) Biases in the estimates of α and β are not independent. Cohorts with low true values of α relative to N will yield erroneously low estimates of β, and we will observe differences in the slope terms among cohorts even if they only vary in their intercept parameters. Examples of such type I errors are illustrated below.

Biological implications of demographic sampling error

The importance of below-threshold mortality goes beyond that of a simple illustration of the pitfalls of sampling error. The effects of below-threshold mortality appear in studies of a variety of important biological phenomena. In the following section, we examine some specific implications of below-threshold mortality.

Estimating parameters of the Gompertz equation

In the preceding section, we explained how sampling error could lead to a systematic bias in the estimate of the Gompertz parameters. This problem is particularly germane to studies on the evolution of ageing. Scientists interested in the evolution of senescence have used the slope from the Gompertz equation as a simple measure of the rate of ageing, and then asked how this rate is influenced by different phenotypic or genetic factors. Factors that may affect rates of ageing include variation in reproductive effort (Tatar et al., 1993; Tatar & Carey, 1995), expression of novel genes in mutant strains (Johnson, 1990) or transgenic lines (Orr & Sohal, 1994), and the effects of long-term selection on life span (Mueller et al., 1995). Comparative studies have used the Gompertz equation to evaluate differences among species in rates of ageing, and to test whether other factors can explain this variation (Finch et al., 1990; Promislow, 1991).

Here we consider one particular study, on the role of superoxide dismutase (SOD) and catalase levels on Gompertz parameters, to illustrate the confounding effects of demographic sampling error. We should note at the outset, however, that we are not questioning the general point of this particular study, which is that increased expression of SOD and catalase in Drosophila melanogaster leads to an increase in life span. Rather, we will show that the claim that SOD and catalase change the rate of ageing is an erroneous one.

One widely held theory of ageing is that it occurs because of cumulative oxidative damage by free radicals. In a direct test of this theory, Orr & Sohal (1994) created transgenic lines of Drosophila melanogaster with increased expression of the SOD and catalase genes. They then fitted the Gompertz equation to mortality rates in one transgenic and three control lines to determine the effect of SOD and catalase on the rate of ageing. Estimates were based on 64–85 flies per line. Orr & Sohal (1994) found that the transgenic lines lived longer, and attributed this change in life span to a decrease in the rate of ageing (i.e. a decrease in the slope β of the Gompertz model).

Inspection of figure 1 in Orr & Sohal (1994) suggests that while the transgenic line clearly has a longer life span, the original interpretation of a change in the rate of ageing may have been biased by the effects of demographic sampling error. To determine if this was the case, we used image analysis to obtain the life table data from the original figure 1 of Orr & Sohal (1994) and then reanalysed these data. In Fig. 4, we replot the mortality rates for the control cohorts and the longest lived transgenic cohort. Our figure illustrates the Gompertz mortality trajectories for each cohort based on both the least-squares regression (as used by Orr and Sohal) and a maximum likelihood estimate of the trajectories.

Figure 4.

Gompertz plots as determined from linear regression (dotted lines) and maximum likelihood (solid lines), for data in Orr & Sohal (1994). Triangles represent a control cohort, and crosses represent a transgenically manipulated cohort with elevated levels of SOD/catalase. Note that regression analysis suggests that the treated group has a lower rate of ageing, β. However, this result is not supported by the more robust maximum likelihood estimate, although the baseline mortality (α in eqn 4) is lower for the treatment group.

Note that initial mortality in their data appears flat, increasing only once it exceeds ≈0.011 (ln(0.011) = −4.5), which, not unexpectedly, coincides with 1/N, where N = 85 is the initial cohort size in these particular trials. As a result of these flat early mortality trajectories, straight-line regressions fitted through the nonlinear mortality trajectories diverge, and the transgenic cohort appears to have a lower rate of increase in the mortality trajectory. However, when mortality trajectories are fitted only over the ages where mortality is above threshold, it is evident that the slopes are essentially identical. Our reanalysis suggests that in the SOD/catalase line, life span is extended through a change in the initial mortality rate α, rather than through a change in the demographic rate of senescence, β. We will return to this example later, and use it to illustrate a robust, maximum likelihood modelling approach that minimizes the impact of demographic sampling error.

Age at onset for senescence

Several theoretical treatments of senescence have made the prediction that the onset of a senescent increase in mortality will coincide with the onset of reproduction (Williams, 1957; Charlesworth, 1994; but see Abrams, 1991, who points out that this is conditional on the specific assumptions underlying these models). To test the claim that the onset of senescence should be coincident with the onset of reproduction, Promislow (1991) gathered data on age-specific mortality, age at onset of senescence and age of first reproduction in natural populations of 49 mammalian species. He defined the age at onset of senescence as the age at which mortality rates began to increase progressively with age. Contrary to the prediction of Williams (1957), Promislow (1991) found that the age at onset of senescence was considerably later than the age at first reproduction in almost all species. We now recognize, however, that when cohort sample sizes are small, below-threshold mortality can bias estimates of the age at onset of senescence beyond the true age.

Why this bias occurs can be explained using the logic already outlined above. Consider the extreme case, where the age at onset of senescence is expected to be at adult age 0 (this may be relevant for many species of Drosophila, in which age at first reproduction occurs shortly after eclosion (Markow, 1996)). Let us assume that mortality follows a Gompertz trajectory from age 0, with baseline mortality α and rate of ageing β. If N is small, such that α ≪ 1/N, then we will not observe a continual increase in mortality until some later age when hx ≥ 1/N.

We can determine analytically how much we will overestimate the true age at onset of senescence due to the effects of sampling error. Consider the mortality patterns in Fig. 2(a,b). We sampled these cohorts from hypothetical populations in which mortality is Gompertz throughout the life span, with onset of senescence at age 0. But in Fig. 2(a,b), senescence does not appear until half the life span or more is completed. The lowest observable nonzero mortality, μlb (the lower bound mortality), corresponds to a single death in a cohort of size N, i.e. μlb = 1/N. If we start with N0 individuals at age 0, given α and β, when will mortality rates appear to increase? Mortality rates will reach this threshold, or lower bound, when

\[ \alpha e^{\beta x_{lb}} = \mu_{lb} = \frac{1}{N_0} \tag{8} \]

where xlb is the age at which μx appears to increase. Rearranging the equation, we obtain the lower bound age, or the apparent age of onset of senescence:

\[ x_{lb} = \frac{-\ln(\alpha N_0)}{\beta} \tag{9} \]

The lower the value of either N0 or α, the greater the overestimation of the age at which senescent mortality begins. This argument also suggests that where two same-sized cohorts have identical ages at onset and rates of senescence, the cohort with the smaller baseline mortality α will appear to have a later age at onset (assuming α < 1/N0).
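The lower-bound age is easy to compute. A small Python sketch (ours), using the parameter values from Fig. 2 (α = 0.001, β = 0.05) and a cohort of 50:

```python
import math

def apparent_onset(alpha, beta, n0):
    """Apparent age at onset of senescence: the age at which alpha*e^(beta*x) first reaches 1/n0."""
    return -math.log(alpha * n0) / beta

print(round(apparent_onset(0.001, 0.05, 50), 1))  # 59.9: senescence appears to begin near age 60
```

A negative result (e.g. for very large n0) means the true mortality curve is observable from age 0 onward, so no artefactual delay is expected.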

In Promislow's (1991) study of mammalian age at onset of senescence, cohort sizes ranged from fewer than 40 to over 5000. Over one-half of the cohorts had fewer than 200 individuals in them. We now appreciate that, at least for relatively low mortality rates, sample sizes of 200 or less can give rise to substantial bias. Given the sample sizes of many of the studies that Promislow (1991) used, the pattern of delayed onset of senescence could be an artefact of the relatively small sample sizes typical of demographic studies in mammals. Senescent increases in mortality may begin at an earlier age than detected, but remain obscured by the sampling threshold.

A similar bias may have affected the interpretation of sex-mortality differentials in the laboratory study of the beetle Callosobruchus maculatus (Tatar & Carey, 1994). Tatar and Carey found that most of the sex difference in life expectancy was due to a shift toward older ages in the mortality curve of females relative to males. Mortality began to increase with age from a minimum value of about 0.001 when females were 11 or 12 days old. For males, this increase occurred at age 5 days. From these data, Tatar and Carey suggested that females experienced a delayed onset of senescence relative to males. However, since cohorts comprised no more than 900 individuals, Tatar and Carey could not observe mortality rates of less than 0.0011. We can now see that both sexes may begin to senesce at eclosion, that is at age 0 days, and that males may simply have a greater baseline mortality, α, at this age. Due to the effects of below-threshold mortality and the initial sample sizes, we will not see the manifestation of this difference until the true mortality rate of each sex surpasses 1/N, and the apparent age of onset of senescence will be earlier for males than for females, even if both sexes actually begin senescing at the same age.

Costs of reproduction

One explanation for the evolution of senescence is that it occurs because of a cost of reproduction, be it genetic (Williams, 1957) or physiological (Kirkwood, 1977), that is paid for in the currency of an increase in age-specific mortality rates. In a series of landmark studies on costs of reproduction by Partridge and colleagues (e.g. Partridge & Andrews, 1985; Partridge et al., 1987), it was shown that both within and among populations of Drosophila melanogaster, high levels of early reproduction are associated with relatively short life spans (see also Harvey & Zammuto, 1985). This decrease in life expectancy may be due to an acute and temporary increase in mortality at the time of reproduction (‘risky reproduction’, Partridge et al., 1987), or a long-term, permanent increase in mortality (‘accelerated senescence’, Partridge, 1987). To determine which of these is the case, studies have manipulated levels of reproduction and then observed the resulting changes in age-specific mortality. Not surprisingly, our ability to estimate accurately changes in mortality rate due to reproduction will be influenced by the sampling error that results from below-threshold mortality.

In practice, to assess the effect of reproduction on rates of senescence, we determine whether a relatively high level of reproduction is associated with a relatively high rate of increase in age-specific mortality (high β). We distinguish accelerated senescence from both an increase in the initial mortality rate α (which also represents a permanent effect of reproduction on life span), and from acute, temporary effects of reproduction on mortality, which reduce life expectancy but do not affect ageing (e.g. male reproduction in Drosophila; Partridge & Andrews, 1985). However, in both phenotypic (Partridge & Harvey, 1985; Reznick, 1985) and genetic studies (Rose & Charlesworth, 1981; Luckinbill et al., 1984), the variable examined after the manipulation has been longevity, rather than mortality rate.

The first explicit analysis of how reproduction affects rate of change in age-specific mortality came from work with the bean beetle, Callosobruchus maculatus (Tatar et al., 1993; Tatar & Carey, 1995). In these studies, egg-laying rates were manipulated during adult ages 0–5 days. Under optimal nutritional conditions, relatively high levels of egg production at ages 0–5 days produced a 1.7-fold increase in age-specific mortality, which was first observed at about age 20 days (Tatar & Carey, 1995). Contrary to the expectation that high reproduction increases the rate of change in age-specific mortality, the mortality curves for cohorts with different levels of reproductive effort did not diverge. This result itself is not likely to be biased by the effects of below-threshold mortality because the mortality rate parameters (Gompertz and other models) were estimated by maximum likelihood methods (we discuss this estimation approach later). The interpretation of these results, however, can be affected by below-threshold mortality.

Based on the 2-week difference between the time of egg production and the response in mortality, Tatar & Carey (1995) suggested that a reduction in egg production delayed the onset of senescence. But the size of the cohorts in this study was between 150 and 300 females, which means that hx < 1/300 could not be observed. At the age when egg production was manipulated, hx may have been considerably less than this threshold, and did not exceed it until cohorts were about 20 days old. It was at this age that Tatar and Carey detected the first differences in mortality rate. Clearly, with these data we cannot distinguish the possibility that reproduction led to immediate changes in hx from the possibility that changes in reproduction delayed the onset of senescence.
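The arithmetic of this detection threshold is worth making explicit. A mortality rate can only be observed once at least one individual dies, so the smallest nonzero rate observable in a cohort of N individuals is 1/N; below that, a cohort of modest size will often record no deaths at all. A short stdlib-only Python illustration (the specific hazard value h = 0.002 is ours, chosen simply to sit below the 1/300 threshold):

```python
import math

# Smallest nonzero mortality rate observable in a cohort of N individuals
for N in (50, 300, 10_000):
    print(f"N = {N:>6}: detection threshold 1/N = {1 / N:.4f}")

# Probability that a cohort of N = 300 with true hazard h = 0.002
# (well below 1/300 = 0.0033) records no deaths at all in one interval
h, N = 0.002, 300
m = 1 - math.exp(-h)            # per-individual death probability
p_no_deaths = (1 - m) ** N      # equals exp(-h * N)
```

Here p_no_deaths is roughly 0.55: more than half the time such a cohort shows a mortality rate of exactly zero even though the true hazard is positive.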

A similar pattern may exist in other studies, such as work by Partridge et al. that analyses the timing of egg production costs in D. melanogaster (Partridge, 1987). Partridge et al. (1987) noted that egg production in female flies is characterized by an increase in mortality rates, but not until ≈ 40 days after the differences in egg laying rate were induced experimentally. With cohort sizes of 50 females, one would not expect to detect any differences in mortality among the cohorts until the age when hx exceeds 0.02, which coincides with age 39–42 days in this experiment (based on recalculated life tables from figure 2 in Partridge, 1987).

We see that below-threshold mortality can obscure at least one aspect of the problem of how reproduction affects longevity. Tatar & Carey (1995) were able to show that egg production affects the long-term trajectory of mortality rate by changing the age-independent component of mortality but not the rate of increase in mortality. Whether egg production involves immediate or only long-term costs to females, however, remains confounded by below-threshold mortality.

Variation in mortality and the evolution of senescence

Models for the evolution of ageing suggest that to fully understand the genetic forces that shape senescence, we need to compare not only mortality trajectories among cohorts, but also the variation in mortality rates at specific ages within and among populations (Charlesworth, 1990; Hughes & Charlesworth, 1994). However, estimates of variance that have been made to test evolutionary theories may be strongly biased by sampling error. Here, we will first present a brief simulation result that demonstrates how sample size affects estimates of variance in mortality rates. We then go on to discuss some specific experimental results that may be confounded by sampling error, and discuss some approaches that have been used to correct for at least some of this bias.

Variances of mortality – simulation results

We have briefly alluded to the way in which sampling error can lead us to underestimate the variance in mortality rates among cohorts. Here we illustrate this point in more detail with a simple simulation study. In the simulation, we compare the true variance with estimates over a range of values for the force of mortality and sample sizes.

To begin with, we created a series of 50 computer cohorts whose underlying mortality rates were drawn from a lognormal distribution, with ln(h) normally distributed with variance σ²ln(h) = 0.5 and with mean mortality h̄. Each cohort had N individuals, and within the ith cohort, each of the N individuals had a random probability mi (= 1 − e^(−hi)) of dying. We calculated an observed mortality for each cohort, defined as qi = −ln(Pi), where Pi is the observed fraction of the N individuals that did not die. We compared the variance of the true mortality rates (σ²ln(h) = 0.5) with that of the observed mortality rates (σ²ln(q)) among the 50 cohorts, for each of 36 average mortality values (ranging from approximately h̄ = 0.0001 to h̄ = 10), and for each of four sample sizes (N = 20, 100, 500 and 50 000). Note that the true variance of ln(h) in this model is independent of the mean mortality for all values of h̄.
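The simulation just described can be sketched in a few lines of stdlib-only Python (a minimal version; the function name sim_log_q_var and the default cohort counts are ours, and cohorts with zero deaths or zero survivors are simply dropped as missing values, as in the text):

```python
import math
import random

def sim_log_q_var(mean_h, sigma2, n_cohorts=50, N=500, rng=None):
    """Variance of ln(observed mortality) among cohorts whose true
    hazards h_i are lognormal: ln(h) ~ Normal(ln(mean_h), sigma2).
    Cohorts in which q_i = 0 (no deaths) or P_i = 0 (no survivors)
    are treated as missing values."""
    rng = rng or random.Random(0)
    log_qs = []
    for _ in range(n_cohorts):
        h = math.exp(rng.gauss(math.log(mean_h), math.sqrt(sigma2)))
        m = 1.0 - math.exp(-h)                 # probability of dying
        survivors = sum(rng.random() >= m for _ in range(N))
        if 0 < survivors < N:
            q = -math.log(survivors / N)       # observed mortality q_i
            log_qs.append(math.log(q))
    if len(log_qs) < 2:
        return float("nan")                    # variance inestimable
    mean = sum(log_qs) / len(log_qs)
    return sum((x - mean) ** 2 for x in log_qs) / (len(log_qs) - 1)

# Moderate mortality: observed variance is close to the true 0.5.
v_mid = sim_log_q_var(0.1, 0.5, rng=random.Random(2))
# Mortality near the 1/N threshold: variance badly underestimated.
v_low = sim_log_q_var(0.0005, 0.5, rng=random.Random(3))
```

Subtracting the value this returns when sigma2 = 0 gives the sampling-variance correction described in the next paragraph.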

The observed variance among cohorts comes from two sources. One component is the true variance in mortality rates among the cohorts, σ²ln(h) = 0.5. The second component arises from sampling variance, which would appear even if all cohorts had the same underlying mortality rate. Thus, in measuring variance for the logarithm of observed mortality rates, we correct by subtracting the sampling variance that is observed when we set σ²ln(h) = 0.

The results are shown in Fig. 5. Note that even relatively large sample sizes per cohort (N = 500) give quite serious underestimates for all but a rather narrow range of values of h̄. At both high and low values of h̄, variance is underestimated. At low values of h̄, estimated variance is low because mortality rates are below the observable threshold: in many cohorts, qi = 0, log(qi) is undefined and the cohort is treated as a missing value. Similarly, at very high values of h̄, most cohorts are above the observable threshold; most cohorts will have an observed survival Px = 0, and the existing variance will be hidden. To provide accurate estimates of variance components for mortality over a broad range, one would need many thousands of individuals.

Figure 5.

Observed variance in mortality rates among 50 cohorts of sample sizes 20, 100, 500 or 50 000. True variance among cohorts is 0.5 in all cases. Variance in mortality rates was corrected for expected variance due to sampling error (whence the negative values). When mortality rates are low relative to the inverse of sample size, variance is consistently underestimated. See text for details of the simulation.

Testing models for the evolution of senescence

The simulation shows that sampling error leads us to underestimate variance in mortality. It turns out that our ability to estimate variances for the force of mortality is critical to our ability to test theories for the evolution of senescence. To illustrate the connection between variances and senescence, we first give some background on theories for the evolution of senescence.

Senescence, defined as the persistent decline in age-specific reproduction or survival due to internal physiological decline (Rose, 1991), is found throughout the animal kingdom (Finch et al., 1990; Promislow, 1991; Rose, 1991). Evolutionary studies of senescence usually explore one of two genetic theories, mutation accumulation (Medawar, 1952) and antagonistic pleiotropy (Williams, 1957), as potential explanations for the origin and maintenance of senescence. Both theories are predicated on the fundamental idea that the strength of selection on fitness components declines with age (Medawar, 1952; Hamilton, 1966). A deleterious allele whose effects are not manifest until late in life is much less likely to be removed by natural selection than an allele with equally deleterious effects early in life.

Under mutation accumulation, the age-specific decline in the force of selection will give rise to a higher frequency of late-acting versus early acting deleterious alleles. This high mutation load at late ages will, in turn, lead to a decline in fitness components with age. Antagonistic pleiotropy differs from mutation accumulation in that these same late-acting deleterious alleles are assumed to be favoured by natural selection when they have early acting beneficial effects.

Both theories give rise to specific, testable genetic predictions. If ageing occurs because of antagonistic pleiotropic alleles, we expect to find negative genetic correlations between early-age and late-age fitness traits, and high dominance genetic variance for fitness traits (Rose, 1982, 1985; Charlesworth, 1994; Curtsinger et al., 1994). In contrast, models of mutation accumulation (Charlesworth, 1990; Charlesworth, 1994) suggest that additive genetic variance for fitness traits should increase with age.

We now have over 15 years' worth of experiments that show substantial (though not universal) support for the antagonistic pleiotropy theory of ageing (reviewed in Rose, 1991). But only recently have scientists begun to estimate directly the age-specific variance components of mortality rates to test predictions from the mutation accumulation theory of ageing. In a study of mortality rates in D. melanogaster, Hughes & Charlesworth (1994) estimated additive genetic variance for mortality rates at three different ages. They found a consistent increase with age, as predicted by the mutation accumulation model.

But as the simulations above demonstrate, even if the variance of hx among cohorts is constant with age, it may appear to increase with age because we systematically underestimate the variance at early ages, when hx is small relative to Nx. Thus, the increase in variance could be an artefact of sampling error.

To overcome this bias in estimates of variance, Promislow et al. (1996) used a similar experimental design as that of Hughes & Charlesworth (1994), but with substantially larger sample sizes. They also found that genetic variance components for mortality initially increased with age.

However, as Fig. 5 shows, even with quite large sample sizes we will still underestimate variance if hx is small. In light of this, Shaw et al. (unpubl. obs.) constructed a maximum likelihood model that estimates genetic variance components for age-specific mortality. The model accounts not only for effects of sample size, but also incorporates the unusual nature of the error variance for mortality rate, which is binomial within cohorts but log-normal among cohorts. Shaw et al. (unpubl. obs.) have reanalysed data from both Promislow et al. (1996) and Hughes & Charlesworth (1994). Their results indicate that both sets of data show a late-age decline in additive variance for mortality rates, contrary to expectation.

Sampling error, senescence and inbreeding load

There are clearly enormous challenges in obtaining accurate estimates for variances of mortality rates among genotypes. In a recent paper, Charlesworth and Hughes provide a possible alternative prediction for these models that may allow us, at least on the face of it, to avoid having to estimate variances. Charlesworth & Hughes (1996) pointed out that under certain circumstances, antagonistic pleiotropy can lead to an age-related increase in variance for fitness components similar to that expected with mutation accumulation. Accordingly, they derived novel genetic predictions that distinguish between mutation accumulation and antagonistic pleiotropy. They show that a comparison of survival rates in inbred versus outbred lines may provide a mutually exclusive test of the two models. According to their models, mutation accumulation should give rise to an age-related increase in the effect of inbreeding on survival, whereas antagonistic pleiotropy should not. They predict that the inbreeding load, which is equivalent to the log of the ratio of survival in outbred to inbred flies (eqn 12a,b), should increase with age.

To test this model, they compared age-specific survival, Px, in inbred and outbred lines; survival can be expressed in terms of the mortality rate as Px = e^(−μx). The genetic load, L, for a given age, x, is given by

L = ln(Px,outbred / Px,inbred)

(after Charlesworth & Hughes, 1996), and can also be restated in terms of mortality, μx. In this case, the inbreeding load

L = μx,inbred − μx,outbred.

We can see from this that even if the difference between inbred and outbred strains does not change, we are less likely to detect any difference at all at early ages, when hx is small. Thus, sampling error may give rise to an apparent increase in inbreeding load even if it is actually constant with age. A fuller treatment of this model is presented elsewhere (Promislow & Tatar, 1998).


Human demographers generally do not have to worry about the issues that we have raised here. Sample sizes are typically on the order of hundreds of thousands, if not millions. Of course, for studies among the ‘oldest old’ this is not the case. For example, Thatcher's (1992) study of mortality among elderly Britons is based on sample sizes of tens or hundreds.

The challenges that we have enumerated here arise as field or laboratory scientists attempt to use demographic techniques to study real organisms under constraints of time, space and money. Despite the litany of problems we have introduced, there are a variety of solutions, some more problematic than others, that can at least move us in the right direction, if not fully resolve our difficulties. Before outlining solutions, note that in some cases there may simply be no way to obtain numbers sufficient to calculate mortality rates. In such cases, it is best to use a summary statistic such as life expectancy at birth (e0).

Solutions to the problems described above take three forms. First, we need to use the largest sample size possible. Second, we need to use nonparametric analyses as well as parametric approaches when we analyse age-specific mortality curves. And finally, we need to use robust statistical methods to evaluate mortality rates and their variances.

The easiest way to reduce or remove the effects of below-threshold mortality is to lower the threshold by increasing sample size (see, for example, Fig. 2). The effects of below-threshold mortality are most apparent when hx is close to or less than 1/N. If N is sufficiently large, the problem of sampling error is resolved, at least biologically. Of course, sampling error can never be completely eliminated.

It is usually not possible to gather sample sizes large enough to remove the effects of below-threshold mortality entirely. Even with fairly small sample sizes, however, visual inspection of the data can sometimes be more informative than an oversimplified statistical analysis. In our reanalysis of Orr & Sohal's (1994) data, we were first able to determine that the slope of the Gompertz line had not changed substantially by simply connecting the points in their figure (see Fig. 4). In general, if we use nonparametric methods we are less likely to gloss over the nonlinear trends that often appear in mortality rates as a result of below-threshold mortality. However, visual inspection can also be misleading. For example, recent work by Horiuchi & Wilmoth (1997) illustrates cases where departures of the Gompertz plot from linearity are not detectable by eye.

Biological data are often rather messy, and neither large sample size nor careful visual inspection of the data will resolve the problems we have outlined. At least for the case of estimating the Gompertz parameters, a number of statistical solutions have been proposed. We can fit a least-squares, linear regression to age-specific mortality, but this technique does not account for the biasing effects of small sample size and below-threshold mortality. Here we discuss three previous statistical attempts to resolve this problem, each of which has advantages and disadvantages, and then propose a novel alternative that we believe may alleviate much of the bias arising from small sample size.

First, some workers have advocated using weighted least-squares regression, where weights are an increasing function of the number of individuals alive (Gaillard et al., 1994). Simulation results (S. Pletcher, unpublished) suggest that a weighted regression will increase error rates even further than unweighted regression, by placing greater emphasis on early mortality observations, which are least reliable if early age mortality rates are low.

Second, we can smooth mortality data using the technique of nonparametric kernel smoothing (e.g. Zelterman et al., 1994). These techniques provide estimates of the instantaneous probability of death when mortality rates are less than 1/N. However, the effect of using smoothed values in statistical estimation and inference is not yet fully understood.

A third approach uses nonlinear regression to fit the survivorship curve predicted by the Gompertz equation (Gavrilov & Gavrilova, 1991; Hughes, 1995). Although much improved over linear regression (Mueller et al., 1995), the bias is still substantial (over 50%) in small samples (Pletcher, unpublished).

The most powerful approach, used commonly in medical survival analysis but infrequently by population biologists, is maximum likelihood estimation (MLE). We can use maximum likelihood to estimate the Gompertz parameters based not on the age-specific force of mortality, but rather on the age distribution of deaths (Lee, 1992). This approach avoids the threshold problem altogether by eliminating the need to calculate mortality rates, and it provides asymptotically unbiased estimates of the slope β and intercept ln(α).

Maximum likelihood estimates of mortality parameters

The MLE approach allows us to determine accurately the parameters of any parametric mortality model. To illustrate the power of MLE, we use this approach to re-analyse some previously published results, and focus specifically on the Gompertz model. However, as noted previously, Gompertz models may not apply in many populations, or over some portion of the life span of a cohort. Our discussion of MLE approaches applies equally well to any parameterization of mortality. In this light, we show how MLE can be used to choose the most appropriate model from among a family of models. (Software packages to run the statistical models described in the following section have been developed by S.P. and are available at no charge on the World Wide Web at

Let us first assume that mortality rates follow the Gompertz trajectory. Thus,

μx = αe^(βx)  (eqn 13)

as in eqn 4. Unlike linear and nonlinear regression, MLE estimates parameter values by fitting the distribution of ages at death rather than age-specific mortality rates. For eqn 13, the density function describing the probability that an individual dies between ages t and t + dt is

f(t) = αe^(βt) exp[−(α/β)(e^(βt) − 1)]  (eqn 14)

where f(t) is the expected proportion of individuals dying in the tth age-class.

Since mortality rates are not calculated, there is no threshold problem. Also, parameter estimates have desirable statistical properties – they are asymptotically unbiased and normally distributed. The maximization procedure includes estimates of standard errors, which makes hypothesis tests concerning the estimates straightforward. Many statistical packages provide methods for maximum likelihood estimation, including BMDP (module LE) and S-Plus (the nlmin function).

For a sample of size N, the log-likelihood is calculated by

ln L = Σi ln f(ti)  (eqn 15)

where ti is the age of death for the ith individual. Maximizing this function with respect to its parameters (α, β for the Gompertz) produces the maximum likelihood estimates.
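As a concrete sketch, the following stdlib-only Python fits the Gompertz parameters by maximizing eqn 15 over simulated ages at death. The crude grid-refinement optimizer and the function names are ours, standing in for a proper numerical routine such as those in the packages mentioned above; the generating parameters (α = 0.002, β = 0.06) are assumed purely for illustration:

```python
import math
import random

def gompertz_loglik(ts, ln_a, b):
    """Log-likelihood of eqn 15: sum of ln f(t_i) under the Gompertz
    density f(t) = a*exp(b*t) * exp(-(a/b)*(exp(b*t) - 1))."""
    a = math.exp(ln_a)
    return sum(ln_a + b * t - (a / b) * (math.exp(b * t) - 1.0)
               for t in ts)

def fit_gompertz(ts):
    """Crude MLE by repeatedly refining a 21x21 grid over (ln a, b)."""
    best = (-math.inf, None, None)
    lo_a, hi_a, lo_b, hi_b = -12.0, 0.0, 0.001, 0.5
    for _ in range(6):
        for i in range(21):
            ln_a = lo_a + (hi_a - lo_a) * i / 20
            for j in range(21):
                b = lo_b + (hi_b - lo_b) * j / 20
                ll = gompertz_loglik(ts, ln_a, b)
                if ll > best[0]:
                    best = (ll, ln_a, b)
        # shrink the search box around the current optimum
        sa, sb = (hi_a - lo_a) / 4, (hi_b - lo_b) / 4
        lo_a, hi_a = best[1] - sa, best[1] + sa
        lo_b, hi_b = max(1e-4, best[2] - sb), best[2] + sb
    return best

def sample_gompertz(n, alpha, beta, rng):
    """Ages at death drawn by inverting the Gompertz survivorship
    S(t) = exp(-(alpha/beta)*(exp(beta*t) - 1))."""
    return [math.log(1.0 - (beta / alpha) * math.log(1.0 - rng.random()))
            / beta for _ in range(n)]

rng = random.Random(42)
ages = sample_gompertz(1000, alpha=0.002, beta=0.06, rng=rng)
ll_max, ln_a_hat, b_hat = fit_gompertz(ages)
```

With 1000 simulated deaths the recovered slope and intercept land close to the generating values, and no age-specific mortality rate is ever computed along the way.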

We can use the likelihood value to compare parameters between two sets of data. For example, if we wish to test the null hypothesis that two data sets have the same slope, β, we can compare the difference in log likelihoods for a model where we constrain the slopes to be identical (β1=β2=β), versus a model where we allow β1 and β2 to differ (with β1 and β2 unconstrained in both cases). Twice the difference of the log likelihoods for the two models is distributed as χ2 with degrees of freedom equal to the number of additional parameters in the larger model. We use the χ2 value to test if the additional parameter, βi, results in a significantly better fit to the data.
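Numerically, the test reduces to a χ² comparison of the two maximized log-likelihoods. A minimal sketch (the two log-likelihood values here are invented purely for illustration):

```python
import math

# Hypothetical maximized log-likelihoods (illustrative values only)
ll_constrained = -812.4   # beta1 = beta2 = beta: 3 free parameters
ll_full = -811.9          # beta1 and beta2 free: 4 free parameters

# Twice the difference is chi-square distributed; d.f. = 4 - 3 = 1
lr_stat = 2.0 * (ll_full - ll_constrained)

# Upper-tail probability for chi-square with 1 d.f., via erfc:
# P(X > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lr_stat / 2.0))
```

Here lr_stat = 1.0 and p ≈ 0.32, so the extra slope parameter would not be justified; the same machinery applies to any pair of nested mortality models.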

This approach is illustrated in Table 2. Continuing from the example in ‘Estimating parameters of the Gompertz equation’, we used MLE to fit Gompertz equations to Orr and Sohal's Drosophila data (Orr & Sohal, 1994) and, for each model, obtained the value of the log-likelihood at its maximum. We want to determine whether the treatment (altered SOD/catalase expression) has changed the actuarial rate of ageing, the slope β. As we see in Table 2, there is no evidence that rates of ageing differ among treatments (Table 2, test 3).

Table 2.  Likelihood ratio testing procedure used to determine treatment differences in Gompertz parameters for the data of Orr & Sohal (1994). We would like to know whether the data from the two treatments are best explained by a single Gompertz equation, or whether we need to include differences in slope, intercept or both. By fitting mortality models using maximum likelihood, we can use hypothesis testing procedures to determine the best model for the data (see ‘Maximum likelihood estimates of mortality parameters’). Test 1 asks if a model with separate parameters for slope and intercept fits significantly better than a single equation. The P value for this test is <0.00001, providing strong evidence against the null hypothesis of identical parameters for both treatments. Test 2 asks if the separate equations model fits significantly better than a model of different slopes but equal intercepts. Again, the P value suggests we reject the null hypothesis (P = 0.002). The final test asks if the separate equations model fits better than a model with different intercepts but identical slopes. In this case, P = 0.75 and there is no evidence against the null hypothesis of equal slopes. Thus, the smallest model consistent with the data is one which assumes different base mortality rates for each treatment but equal rates of increase of mortality with age.

Mortality rates may not always follow a Gompertz trajectory. Fortunately, we can also use MLE to determine which model best describes the data. There is increasing experimental evidence that mortality rates deviate substantially from Gompertz dynamics, and these deviations emphasize the need to examine multiple models when analysing mortality. In both Drosophila (Curtsinger et al., 1992) and Mediterranean fruit flies (Carey et al., 1992), the rate of increase in mortality has been shown to decelerate at older ages. Also, in some cases where sample size is large enough, mortality rates do not begin to increase coincident with the onset of maturity, as the Gompertz model would predict (Pletcher, unpublished observation). If we simply examine the data by eye, we will not always detect this trend, because when sample sizes are small and mortality is below threshold, it is not possible to distinguish visually between constant and increasing mortality (Horiuchi & Wilmoth, 1997). If rates are constant for a number of early ages, we need to account for this trend in a mortality model.

Here we illustrate how MLE can be used to differentiate truly constant mortality rates early in life from apparent constancy due to occasional deaths when true mortality is below threshold (e.g. Fig. 2). The approach is based on hierarchical modelling procedures and likelihood ratio tests. Although we present only two models for comparison, the technique is easily extended to more complicated hierarchies which examine additional mortality models (Vaupel, 1990; Fukui et al., 1993). Consider the following extension of the Gompertz model:

μx = αe^(βx)  (eqn 13)

μx = γ + αe^(βx)  (eqn 16)

Equation 13 is the standard Gompertz equation, while eqn 16 is called the Gompertz–Makeham and includes an additional term, γ, which represents an age-independent mortality rate. In the case of the Gompertz–Makeham, early in life μx ≈ γ; but as the cohort ages, the exponential term dominates and the population exhibits Gompertz dynamics (Fig. 6). The density function describing the probability that an individual dies between ages t and t + dt for eqn 16 is as follows:

Figure 6.

Age-specific trajectories for Gompertz and Gompertz–Makeham mortality. The two curves represent mortality following eqns 13 and 16, where α=0.002, β=0.06 and γ=0 (Gompertz) or γ=0.02 (Gompertz–Makeham).

f(t) = (γ + αe^(βt)) exp[−γt − (α/β)(e^(βt) − 1)]  (eqn 17)

Note that the two models (eqns 13 and 16) are nested – the Gompertz model is a special case of the Gompertz–Makeham, with γ=0. Maximizing eqns 14 or 17 with respect to their parameters (α, β for the Gompertz; α, β, γ for the Gompertz–Makeham) produces the maximum likelihood estimates. Because the two models are nested, twice the difference of their log likelihoods is distributed as χ2 with degrees of freedom equal to the number of additional parameters in the larger model (d.f.=1, in this case). This allows us to test if the additional parameters result in a significantly better fit to the data. The χ2 value tests the null hypothesis H0: γ=0. Thus, we can objectively choose the mortality model which best describes the observed data.
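The two density functions can be checked directly. The sketch below (stdlib Python, with the parameter values of Fig. 6) encodes the Gompertz–Makeham density of eqn 17, recovers the Gompertz density of eqn 14 as the γ = 0 special case, and confirms by crude numerical integration that the density integrates to one, as any distribution of ages at death must:

```python
import math

def gm_density(t, a, b, g):
    """Gompertz-Makeham density (eqn 17): hazard g + a*e^(b*t) times
    survivorship exp(-g*t - (a/b)*(e^(b*t) - 1))."""
    return ((g + a * math.exp(b * t))
            * math.exp(-g * t - (a / b) * (math.exp(b * t) - 1.0)))

def gompertz_density(t, a, b):
    """Eqn 14 is the nested special case of eqn 17 with gamma = 0."""
    return gm_density(t, a, b, 0.0)

a, b, g = 0.002, 0.06, 0.02   # parameter values from Fig. 6

# Trapezoid-rule integral of the density over ages 0-400; survivorship
# at age 400 is effectively zero, so the integral should be ~1.
dt = 0.05
steps = int(400 / dt)
total = 0.5 * dt * (gm_density(0.0, a, b, g) + gm_density(400.0, a, b, g))
total += dt * sum(gm_density(i * dt, a, b, g) for i in range(1, steps))
```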

In addition to providing an objective framework for choosing a mortality model, MLE produces parameter estimates with the smallest error of any method we have examined to date. Standard statistical theory guarantees that the parameter estimates are asymptotically efficient (i.e. they have the smallest allowable variance), and numerical simulations clearly show that, for a variety of mortality models, the estimates are only slightly biased (section ‘Gompertz parameters’ and Fig. 3, above). Since the standard error of an estimator is a simple function of its variance and bias, there is little room for improvement by other methods.

For simplicity, we have focused here on the Gompertz model. However, the effect of sampling error on estimates of mortality curve parameters applies equally to a variety of different mortality curves. For example, many have argued that the Weibull model provides a better fit to the data than the Gompertz curve. The general shape, however (monotonic increasing) is such that, like the Gompertz curve, estimates of the Weibull function will be biased by the small numbers of individuals dying at early and late ages.


Previous work has shown that insufficient sample sizes can lead to elevated rates of both type I and type II error in studies of mortality (Shouman & Witten, 1995; Promislow et al., 1996). If mortality follows a Gompertz-like or similar trajectory, the problems we allude to here will be most apparent very early and very late in life. Why not, then, focus on mortality at middle ages, when our statistical power is greatest? There are certainly some cases where average adult mortality provides an appropriate focus for study (Sæther, 1988; Promislow & Harvey, 1991; Charnov, 1993). However, from the experimental gerontologist's perspective, late-life mortality is of greatest interest. From an evolutionary biologist's point of view, in many ways the most important events in the life cycle are those that occur very early in life, when reproduction can have the highest fitness consequences. And in terms of the fitness cost of ageing itself, increases in mortality due to senescence have their greatest impact on fitness when they occur early in life (Abrams, 1991).

In this work, we have deliberately focused on early age mortality, when risk of bias due to sampling error is greatest, and the variation in mortality rates is most significant to questions of ecology and evolution. We have addressed four particular areas of research in which answers may be biased by below-threshold mortality, including estimating Gompertz parameters, determining the age at onset of senescence, measuring costs of reproduction, and testing for age-specific changes in variance of mortality rates.

No analytical solution will completely resolve these sampling problems. However, they can be ameliorated. At this early juncture, we have offered two general solutions.

First, and most obviously, in any study of mortality rates, it is crucial that one use as large a sample as is experimentally and financially feasible. The greater the sample, the greater the resolution, particularly at very early and late ages. Of course, if one wants to detect only very large differences in mortality rates, colossal sample sizes may be a wasteful expense (Cantor, 1992).

Just how large is large enough? A useful approach is to conduct a power analysis before the actual experiment begins (Kraemer, 1987). A power analysis allows us to determine what sample size will be necessary to detect a statistically significant difference among treatments. For example, if we wish to compare mortality rates between two cohorts, and we can anticipate the average mortality rates in the cohorts, we can determine what sample size would be necessary to distinguish them statistically (Casagrande et al., 1978).
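For two cohorts compared at a single age, the standard normal-approximation formula for comparing two proportions gives a rough sample-size target. A sketch (stdlib Python; the example death probabilities 0.02 and 0.04 are ours, and the formula is the usual two-proportion calculation without the continuity correction of Casagrande et al., so it slightly understates the required N):

```python
import math

def n_per_cohort(p1, p2, z_alpha=1.9600, z_beta=0.8416):
    """Individuals needed per cohort to detect a difference between
    two per-interval death probabilities p1 and p2, using a two-sided
    5% test (z_alpha) with 80% power (z_beta)."""
    p_bar = (p1 + p2) / 2.0
    num = (z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Doubling a death probability of 0.02 to 0.04 sounds dramatic, but
# detecting it at a single age takes over a thousand individuals per
# cohort; smaller baseline rates push the requirement into the tens
# of thousands.
n = n_per_cohort(0.02, 0.04)
```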

Second, we have described one maximum likelihood approach to minimize the confounding effects of threshold mortality rates. Assuming that there are enough individuals for the MLE to converge on a result, the effects of below-threshold mortality will not bias the result. One can use these models not only to obtain powerful estimates of Gompertz parameters (Curtsinger et al., 1995; Mueller et al., 1995; Tatar & Carey, 1995; Fukui et al., 1996; Promislow et al., 1996), but also to infer the nature of the mortality trajectory at ages when age-specific mortality is below the threshold of detectability. Work is currently underway (S. Pletcher, in preparation) to develop more refined statistical techniques for such analysis.

One might argue that we could circumvent all of these difficulties by dispensing with mortality rates altogether, and using in their place summary statistics such as life expectancy at birth (equal to the average age at death in a stationary population), time to 50% survival, or even the Gompertz parameters themselves. Some workers have argued that these summary statistics serve as a useful alternative to mortality rates (e.g. Mueller et al., 1995), and are not subject to the challenges that arise from small sample size. There are many good reasons not to use such summary statistics. Our discussion on the biological implications of demographic sampling error provides one illustration of how sample size can lead to biases in the summary statistics (in this case, the Gompertz parameters). In addition, these summary statistics do not allow us to discern differences in mortality patterns among different age classes. These are the very differences that we need to consider if we are to determine the fitness consequences of behavioural or physiological differences among cohorts (e.g. Tatar & Promislow, 1997).

Finally, our aim here has been to highlight the need for careful experimental design and interpretation of experimental results. But even more importantly, we hope that this work might encourage others to develop analytical solutions for the problem of demographic sampling error.


For valuable discussion or comments on earlier drafts of this manuscript, we thank Peter Abrams, Jim Curtsinger, Shiro Horiuchi, Tad Kawecki, Locke Rowe, Phil Service, Steve Stearns and Frank Shaw. Financial support during the course of this work was provided by NIH grant AG08761 to J.R.C., NIH grant AG14027 to D.P. and the Center for Population Biology at the University of California, Davis.