### Abstract

- Top of page
- Abstract
- Introduction
- The force of mortality
- Statistical implications of below-threshold mortality
- Biological implications of demographic sampling error
- Solutions
- Maximum likelihood estimates of mortality parameters
- Conclusions
- Acknowledgments
- References

Evolutionary biologists, ecologists and experimental gerontologists have increasingly used estimates of age-specific mortality as a critical component in studies of a range of important biological processes. However, the analysis of age-specific mortality rates is plagued by specific statistical challenges caused by sampling error. Here we discuss the nature of this ‘demographic sampling error’, and the way in which it can bias our estimates of (1) rates of ageing, (2) age at onset of senescence, (3) costs of reproduction and (4) demographic tests of evolutionary models of ageing. We conducted simulations which suggest that using standard statistical techniques, we would need sample sizes on the order of tens of thousands in most experiments to effectively remove any bias due to sampling error. We argue that biologists should use much larger sample sizes than have previously been used. However, we also present simple maximum likelihood models that effectively remove biases due to demographic sampling error even at relatively small sample sizes.

### Introduction


Age-specific mortality schedules lie at the heart of models that explore an enormous range of biological phenomena, from basic calculations of Darwinian fitness (Fisher, 1930) to the evolution of ageing (Hamilton, 1966; Charlesworth, 1994), from explorations of population dynamics (Caswell, 1989) to predictive analyses for conservation strategies (Lande, 1988a,b). But the estimation of age-specific mortality rates brings with it a variety of statistical challenges, and unless we address these challenges in our experiments, we are likely to draw biased conclusions.

In the following work, we describe how studies of mortality rates that use insufficient sample sizes or inappropriate statistical models may bias conclusions about many contemporary issues in evolutionary research. First, studies of ageing in laboratory animals have claimed that genetic factors extend life span by slowing the rate of senescence. We will argue here that when one considers the effect of sample size, evidence for changes in baseline mortality is strong, but evidence for changes in rates of ageing may turn out to be weaker than previously thought. Second, evolutionary theory predicts that the onset of senescence should coincide with the onset of reproduction. Here we discuss how sample size affects our ability to determine when senescence begins. Third, both field and laboratory studies provide evidence for a trade-off between reproduction and survival. We would like to know whether increases in reproduction actually affect the rate of ageing. Again, insufficient sample sizes can bias our estimates of cost of reproduction. Fourth, Medawar's widely accepted ‘mutation accumulation’ model for the evolution of senescence (Medawar, 1952) makes certain predictions about variation in age-specific mortality rates among genotypes. Only recently have biologists begun to test this model by asking how genetic variance for mortality rates changes with age. Results from these studies are critical to our understanding of evolutionary models of ageing. But sampling error may bias our results in favour of the model we set out to test.

In each case, we focus in particular on the influence of sample size on mortality estimates at the youngest adult ages. At early ages, mortality rates are low and so are especially prone to sampling error. However, models by Hamilton and others (Lewontin, 1965; Hamilton, 1966; Abrams, 1991; Charlesworth, 1994) show that it is at these early ages that fitness is most influenced by variation in life history characters. From an evolutionary perspective, early-age mortality rates are extremely important, but they are also those most likely to be estimated incorrectly, leading to biased estimates of mortality patterns over the life-span of a cohort. Early-age fecundity is also a critical component of lifetime fitness. However, the statistical natures of age-specific mortality and fecundity are qualitatively different, and we confine our discussion here to estimates of mortality rates, which pose particular statistical challenges.

Before examining the specific problems outlined above, we will provide some conceptual background, including an explanation of mortality rates and how they are estimated, and we introduce the statistical challenges of below-threshold mortality. This brief statistical discussion will then motivate our subsequent discussion of the effect of below-threshold mortality on specific biological problems.

### The force of mortality


Mortality rates measure the fraction of individuals in a population dying in a given time interval. Formally, age-specific mortality, *m*_{x}, is the probability of dying in the discrete time interval from age *x* to age *x* + Δ*x*, conditional on surviving to age *x* (Chiang, 1984). See Table 1 for a list of variables used throughout this manuscript. It is usually defined as

*q*_{x} = (*l*_{x} − *l*_{x+Δx})/*l*_{x} (eqn 1)

Table 1. Variables used in this paper.

where *l*_{x} is the proportion of individuals surviving to age *x*, Δ*x* ≥ 1, and *q*_{x} is the estimator of the parameter *m*_{x}. Similarly, we can define age-specific mortality as *q*_{x}=*d*_{x}/*N*_{x}, where *N*_{x} is the number of individuals alive at age *x*, and *d*_{x} is the number of individuals dying between ages *x* and *x* + Δ*x* (i.e. *N*_{x} − *N*_{x+Δx}). This estimate of mortality is limited, because its value depends on the size of the census interval Δ*x*, and *q*_{x} has an upper bound of 1. Both of these factors limit the usefulness of *q*_{x} as an unbiased estimate of *m*_{x}. In place of *q*_{x}, demographers estimate the force of mortality, *h*_{x} (also known as the instantaneous mortality rate or hazard rate) (Lee, 1992). The force of mortality is the instantaneous probability of death on a continuous time-scale, and is estimated by μ_{x}, the limit of the probability of dying in a given time interval as that interval becomes infinitesimally small:

*h*_{x} = lim_{Δ*x*→0} (*l*_{x} − *l*_{x+Δx})/(Δ*x* *l*_{x}) = −d ln(*l*_{x})/d*x* (eqn 2)

(Lee, 1992). This formulation is unmeasurable with the discrete data typically available, but is well approximated by

μ_{x} = −[ln(*N*_{x+Δx}) − ln(*N*_{x})]/Δ*x* (eqn 3)
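These discrete estimators are simple to compute from census counts. A minimal sketch in Python (the survivor counts are invented for illustration):

```python
import numpy as np

# Hypothetical daily census counts N_x of survivors in a single cohort.
N = np.array([1000, 998, 995, 989, 978, 960, 930])

q = (N[:-1] - N[1:]) / N[:-1]   # q_x = d_x / N_x, the discrete estimator
mu = -np.diff(np.log(N))        # mu_x ~ -[ln(N_{x+dx}) - ln(N_x)], with dx = 1

# For small q_x the two estimators nearly coincide, but unlike q_x,
# mu_x is not bounded above by 1.
```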

In addition to calculating mortality rates for each age interval, one can fit parametric models to describe the trajectory of mortality across all ages. Age-specific mortality rates often increase as a function of age (e.g. Comfort, 1979; Finch, 1990; Promislow, 1991) and approximate an exponential trajectory, such that

μ_{x} = *α*e^{*β*x} (eqn 4)

where *α* is the initial, age-independent rate of mortality, and *e*^{*β*} is the factor by which mortality increases per unit of age. This equation can be linearized by taking the logarithm of both sides, such that

ln(μ_{x}) = ln(*α*) + *β**x* (eqn 5)

This is the well-known ‘Gompertz’ mortality equation (Gompertz, 1825), with intercept ln(*α*) and slope *β*. The Gompertz model may not adequately describe mortality at very early ages if natural mortality rates are high (Abrams, 1991), or at very late ages, as recent studies based on sufficiently large sample sizes have shown (Carey *et al*., 1992; Curtsinger *et al*., 1992; Kannisto *et al*., 1994; see also Abrams & Ludwig, 1995). However, for the sake of the analysis we present here, and unless stated otherwise, we assume that mortality curves are Gompertz across all ages. This model provides a useful heuristic tool to illustrate the effect of sample size (*N*_{x}) on mortality parameters, especially under the common assumption that adult mortality rates are continuous and exponentially increasing at least through young and middle ages. Although more complex parametric models may provide a better fit to mortality data than the Gompertz, the arguments we make here apply in any case.

### Statistical implications of below-threshold mortality


We defined age-specific mortality as *q*_{x}=*d*_{x}/*N*_{x}. Note that both *d*_{x} and *N*_{x} can take on discrete values only. An individual is either dead or alive (*d*_{x}={0, 1, 2…}). Thus, at its minimum measurable value, *q*_{x} is equal either to 0 if no individuals die, or to 1/*N*_{x} if a single individual dies. Similarly, the lower observable nonzero bound for the force of mortality is μ_{x} = −ln{(*N*_{x} − 1)/*N*_{x}} ≈ 1/*N*_{x} for large *N*_{x}.
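This observable floor is easy to verify numerically; a one-line check for a cohort of 50:

```python
import math

N = 50
# Force of mortality implied by exactly one observed death out of N:
mu_min = -math.log((N - 1) / N)
# mu_min ~ 0.0202, within about 1% of the approximation 1/N = 0.02.
```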

Although measures of mortality are discrete, we generally assume that the senescent change in mortality with age is a continuous, monotonically increasing process, and that at every age there is a finite risk of mortality even when no deaths are observed. We recognize that there will exist some *true* force of mortality *h*_{x} > 0 even if the *observed* force of mortality μ_{x}=0. But what happens if the true force of mortality is below the observable threshold (i.e. 0 < *h*_{x} < 1/*N*_{x})? In this case, we lack a robust way to estimate true mortality in the cohort, because a fraction of an individual cannot die. We refer to this region of the parameter space as below-threshold mortality (Fig. 1). In this region, where the force of mortality is small relative to the inverse of the sample size, sampling error will lead to biases in estimates of the mean, the variance and the long-term trajectory (e.g. the Gompertz parameters) of the force of mortality.

#### Point estimates of mortality

At the early-age and late-age boundaries of mortality trajectories, when either mortality or survival (*h* or *e*^{−h}, respectively) is close to zero, mortality behaves as a threshold character and becomes difficult to estimate accurately (Gaines & Denny, 1993). Consider a cohort with initial sample size *N*_{0}=50 individuals, with a true hazard rate in the youngest age class, *h*_{0}=0.001. What is the probability of observing one or more deaths in this cohort at a given age *x*? Assuming that each individual in a cohort is equally likely to die in a given time interval, since *h* < 0.1 and *Nh* < 5, the distribution of deaths at age 0 is approximately Poisson (Sokal & Rohlf, 1995). Thus, the probability of observing no deaths,

*P*[μ = 0] = e^{−*Nh*} (eqn 6)

and the probability of observing one or more deaths,

*P*[μ > 0] = 1 − e^{−*Nh*}

For *h*=0.001 and *N*=50, *P*[μ > 0]=0.049. There is a less than 5% chance of observing any mortality (i.e. of estimating mortality as nonzero) in the first age-class, even though *h* > 0. In fact, if *h*=0.001, to determine accurately that the true mortality rate is significantly greater than zero at the 95% confidence level, with a type II error rate *β*=0.1 (i.e. power 0.9), we would need a sample size *N* almost 10 times larger than 1/*h* (*N* ≈ 10 000) (Zar, 1984). Thus, point estimates of below-threshold mortality will be highly inaccurate.
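The arithmetic above follows directly from the Poisson approximation. The sketch below also computes the (much weaker) sample size needed merely to observe at least one death with 90% probability; the full power calculation cited from Zar (1984) requires more machinery:

```python
import math

h, N = 0.001, 50
p_any_death = 1.0 - math.exp(-N * h)   # P[mu > 0] ~ 0.049, as in the text

# Sample size needed just to *observe* one or more deaths with probability 0.9
# (an illustrative lower bound, not the significance-test sample size):
N_observe = math.ceil(-math.log(1 - 0.9) / h)
```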

#### Variance in mortality

We have noted that the observed force of mortality necessarily takes on discrete values, because a fraction of an individual cannot die. When mortality is low relative to the inverse of sample size, we tend to either over- or under-estimate the true force of mortality. When mortality rates are much lower than 1/*N* (e.g. at young adult ages), we will observe mortality rates of either zero or 1/*N*, but not intermediate values. This, in turn, influences our estimates of the *variance* in the force of mortality among cohorts. *Within* a cohort of genetically identical individuals, if we assume that all individuals in that cohort have the same probability of mortality, then the expected variance of the mortality rate, σ^{2}_{q,i}, is approximately binomial (an individual is either alive or dead), and is given by

σ^{2}_{q,i} = *q*_{i}(1 − *q*_{i})/*N*_{i} (eqn 7)

for the *i*th cohort. In this simplest case, the estimate of expected variance for *q* takes into account the effect of sample size. Note also that for small *q* and large *N*, the number of deaths, *d*, is approximately Poisson distributed, the expected variance in the number of deaths is *d*, and the expected variance in the mortality rate is σ^{2}_{q} = *d*/*N*^{2}.
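A quick numerical comparison of eqn 7 with its Poisson limit, using illustrative values:

```python
q_i, N_i = 0.01, 100   # illustrative within-cohort mortality rate and cohort size

var_binomial = q_i * (1 - q_i) / N_i   # eqn 7: within-cohort variance of q
d = q_i * N_i                          # expected number of deaths
var_poisson = d / N_i**2               # Poisson approximation: d / N^2

# For small q the two differ only by the factor (1 - q_i).
```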

When we try to estimate variance in mortality *among* cohorts, the situation is more complicated. Individuals in different cohorts will not have the same underlying probability of dying, so the variance is no longer given by eqn 7. Furthermore, when mortality rates are low, we will systematically underestimate variance due to below-threshold mortality. Consider an extreme but nonetheless illustrative example. Assume that *h*_{x,i} ≪ 1/*N*_{x,i}, where *h*_{x,i} is the true force of mortality and *N*_{x,i} is the population size at age *x* for the *i*th cohort. Given sufficiently low mortality rates and small sample sizes in each cohort, it is likely that no deaths will be observed in the age interval *x* to *x* + Δ*x*, and the mortality rate will be estimated as zero (μ_{x,i}=0 for all cohorts). Thus, even if σ_{h}^{2} > 0, the observed variance will be equal to zero. We elaborate on this point in the section ‘Variation in mortality and the evolution of senescence’ below.

A further complication arises from the fact that whereas mortality is binomially distributed within cohorts, it appears to be log-normally distributed among a sample of cohorts drawn from a single population (Promislow *et al*., 1996). A logarithmic transformation is therefore needed to normalize the variance. But given that observed mortality rates are often equal to zero, the transformed data (ln[μ_{x}] when μ_{x}=0) will be undefined. One way to overcome this problem is to add a constant to each value (e.g. ln(μ_{x}) is replaced by ln(μ_{x} + 1); Hughes & Charlesworth, 1994). This gives rise to a systematic underestimate of the variance when μ_{x} is small (i.e. at early ages) relative to larger values of μ_{x} (i.e. at older ages), producing an apparent, though potentially spurious, increase in age-specific variance. The only way truly to circumvent this problem is to use cohort sample sizes large enough that μ_{x} > 0 for all cohorts.

Although we assume that genetically identical individuals have identical risk of mortality, this is a conservative assumption. If the force of mortality, *h*, varied within cohorts, we would need even greater sample sizes to obtain accurate estimates of the variance.

#### Gompertz parameters

There are many circumstances where we wish to describe the overall trajectory of mortality schedules. We referred to one common model, the Gompertz, which assumes that mortality rates increase exponentially with age (eqns 4 and 5). Often, the parameters *α* and *β* are estimated by fitting a least-squares linear regression to eqn 5 (Finch *et al*., 1990; Johnson, 1990; Austad, 1993).

However, this procedure is affected by the problem of below-threshold mortality (Fig. 2). At early age-classes, when *h*_{x} < 1/*N*_{x}, we may observe occasional deaths. In most of these age-classes, however, no deaths will occur, and ln(μ_{x}) for these zero-mortality age classes is undefined. Thus, when we fit a regression line to age-specific mortality, we typically count zero-values as missing (Tatar & Carey, 1995). If we discard these classes from the regression, the best-fit line will include only those early observations with values equal to or greater than 1/*N*_{x}, which biases the intercept upward and the slope downward from their true values (see Fig. 2a). The magnitude of the bias is a function not only of sample size, but also of the value of the true Gompertz intercept parameter *α*. If we compare two cohorts with identical slope *β* but different intercept ln(*α*), the one with the lower intercept will appear to have the lower slope. With least-squares linear regression, we cannot independently or accurately estimate *α* and *β* when cohort sizes are small.

To demonstrate the strong influence of sample size on estimates of the Gompertz parameters, we conducted a simulation based on standard resampling techniques, as did Shouman & Witten (1995). However, where Shouman & Witten (1995) focus on the high variance in estimates of the Gompertz parameters due to small sample size, we focus on the consistent directional biases in the estimates that result from small sample size. We created simulated cohorts of 30 000 individuals in which mortality followed a Gompertz trajectory. With these data, we could assign a day of death to each individual within a cohort. From this population of 30 000 individuals, we then sampled individuals at random with replacement to create smaller subpopulations. This was done for each of six subpopulation sample sizes. For each sample size, we estimated the slope and intercept of the Gompertz equation using ordinary least-squares regression. We created 1000 replicate subpopulations for each sample size, from which we were able to estimate the mean and standard deviation for α and β. For the rate parameter, *β*, we present results from five distinct populations characterized by *α*=0.001 and *β*={0.05, 0.08, 0.11, 0.14 or 0.17}. In addition, we analyse four separate populations with respect to the intercept parameter (*α*) in which *β *= 0.05 and *α*={0.0001, 0.001, 0.01 or 0.1} (−9.21, −6.91, −4.61 and −2.30, respectively, on a natural log scale).

This simulation demonstrates that at small sample sizes, standard regression techniques consistently underestimate the slope *β* and overestimate the intercept ln(*α*) (Fig. 3). For a fixed value of *α*, the greater the true value of *β*, the more we underestimate it. Similarly, for a fixed value of *β*, the lower the true value of *α*, the more we overestimate it.
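A miniature version of this resampling experiment can be written in a few lines. The sketch below is our own simplified re-implementation (not the authors' original code): it draws death days from a Gompertz with *α* = 0.001 and *β* = 0.11, estimates the slope by ordinary least squares with zero-mortality days treated as missing, and compares small-sample and large-sample estimates:

```python
import numpy as np

rng = np.random.default_rng(1)
ALPHA, BETA = 0.001, 0.11   # one parameter set from the simulations described above

def death_days(n):
    # Invert S(t) = exp(-(a/b)(e^{bt} - 1)) and round down to whole days,
    # mimicking a daily census.
    u = rng.uniform(size=n)
    return np.floor(np.log(1.0 - (BETA / ALPHA) * np.log(u)) / BETA).astype(int)

def ols_slope(days):
    # Survivors N_x at the start of each day; mu_x = -(ln N_{x+1} - ln N_x).
    counts = np.bincount(days)
    N = days.size - np.concatenate(([0], np.cumsum(counts)[:-1]))
    # Days with zero observed mortality (or no survivors left) are dropped
    # as missing, as described in the text.
    keep = (counts[:-1] > 0) & (N[1:] > 0)
    x = np.arange(counts.size - 1)[keep]
    mu = -np.log(N[1:][keep] / N[:-1][keep])
    return np.polyfit(x, np.log(mu), 1)[0]

small = np.mean([ols_slope(death_days(50)) for _ in range(200)])
large = ols_slope(death_days(30_000))
```

With cohorts of 50, the mean estimated slope falls below the true *β*; with 30 000 individuals the estimate is close to the true value.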

These biased estimates have two implications for hypothesis testing. (1) At small sample sizes, we lose power and are more likely not to reject a null hypothesis of no difference between two rate or intercept parameters even when they differ substantially (type II error). (2) Biases in the estimates of *α* and *β* are not independent. Cohorts with low true values of *α* relative to *N* will yield erroneously low estimates of *β*, and we will observe differences in the slope terms among cohorts even if they only vary in their intercept parameters. Examples of such type I errors are illustrated below.

### Solutions


Human demographers generally do not have to worry about the issues that we have raised here. Sample sizes are typically on the order of hundreds of thousands, if not millions. Of course, for studies of the ‘oldest old’ this is not the case. For example, Thatcher's (1992) study of mortality among the elderly in Britain is based on sample sizes of tens or hundreds.

The challenges that we have enumerated here arise as field or laboratory scientists attempt to use demographic techniques to study real organisms under constraints of time, space and money. Despite this litany of problems, there are a variety of solutions, some more problematic than others, that can at least move us in the right direction, if not fully resolve our difficulties. Before outlining solutions, note that in some cases there may simply be no way to obtain numbers sufficient to calculate mortality rates. In such cases, it is best to use a summary statistic such as life expectancy at birth (*e*_{0}).

Solutions to the problems described above take three forms. First, we need to use the largest sample size possible. Second, we need to use nonparametric analyses as well as parametric approaches when we analyse age-specific mortality curves. And finally, we need to use robust statistical methods to evaluate mortality rates and their variances.

The easiest way to reduce or remove the effects of below-threshold mortality is to lower the threshold by increasing sample size (see, for example, Fig. 2). The effects of below-threshold mortality are most apparent when *h*_{x} is close to or less than 1/*N*. If *N* is sufficiently large, the problem of sampling error is resolved, at least biologically. Of course, sampling error can never be completely eliminated.

It is usually not possible to gather sample sizes large enough to remove the effects of below-threshold mortality entirely. Even with fairly small sample sizes, however, visual inspection of the data can be more informative than an oversimplified statistical analysis. In our reanalysis of Orr & Sohal's (1994) data, we were first able to determine that the slope of the Gompertz line had not changed substantially simply by connecting the points in their figure (see Fig. 4). In general, if we use nonparametric methods we are less likely to gloss over the nonlinear trends that often appear in mortality rates as a result of below-threshold mortality. However, visual inspection can also be misleading. For example, recent work by Horiuchi & Wilmoth (1997) illustrates cases where departures of the Gompertz plot from linearity are not detectable by eye.

Biological data are often rather messy, and neither large sample size nor careful visual inspection of the data will resolve the problems we have outlined. At least for the case of estimating the Gompertz parameters, a number of statistical solutions have been proposed. We can fit a least-squares linear regression to age-specific mortality, but this technique does not account for the biasing effects of small sample size and below-threshold mortality. Here we discuss three previous statistical attempts to resolve this problem, each of which has advantages and disadvantages, and then propose a novel alternative that we believe may alleviate much of the bias arising from small sample size.

First, some workers have advocated using weighted least-squares regression, where weights are an increasing function of the number of individuals alive (Gaillard *et al*., 1994). Simulation results (S. Pletcher, unpublished) suggest that a weighted regression will increase error rates even further than unweighted regression, by placing greater emphasis on early mortality observations, which are least reliable if early age mortality rates are low.

Second, we can smooth mortality data using the technique of nonparametric kernel smoothing (e.g. Zelterman *et al*., 1994). These techniques provide estimates of the instantaneous probability of death when mortality rates are less than 1/*N*. However, the effect of using smoothed values in statistical estimation and inference is not yet fully understood.

A third approach uses nonlinear regression to fit the survivorship curve predicted by the Gompertz equation (Gavrilov & Gavrilova, 1991; Hughes, 1995). Although much improved over linear regression (Mueller *et al*., 1995), the bias is still substantial (over 50%) in small samples (Pletcher, unpublished).

The most powerful approach, used commonly in medical survival analysis, but infrequently by population biologists, is that of maximum likelihood estimation (MLE). We can use maximum likelihood to estimate the Gompertz parameters, based not on the age-specific force of mortality, but rather on the age distribution of deaths (Lee, 1992). This approach avoids the threshold problem altogether by eliminating the need to calculate mortality rates, and it provides asymptotically unbiased estimates of the slope *β* and intercept ln(*α*).

### Maximum likelihood estimates of mortality parameters


The MLE approach allows us to determine accurately the parameters of any parametric mortality model. To illustrate the power of MLE, we use this approach to re-analyse some previously published results, and focus specifically on the Gompertz model. However, as noted previously, Gompertz models may not apply in many populations, or over some portion of the life span of a cohort. Our discussion of MLE approaches applies equally well to any parameterization of mortality. In this light, we show how MLE can be used to choose the most appropriate model from among a family of models. (Software packages to run the statistical models described in the following section have been developed by S.P. and are available at no charge on the World Wide Web at http://134.84.74.8/)

Let us first assume that mortality rates follow the Gompertz trajectory. Thus,

μ_{x} = *α*e^{*β*x} (eqn 13)

as in eqn 4. Unlike linear and nonlinear regression, MLE estimates parameter values by fitting the distribution of ages at death rather than the age-specific mortality rates themselves. The density function describing the probability that an individual dies between ages *t* and *t* + *dt* for eqn 13 is

*f*(*t*) = *α*e^{*β*t} exp[−(*α*/*β*)(e^{*β*t} − 1)] (eqn 14)

where *f*(*t*) is the expected proportion of individuals dying in the *t*th age-class.

Since mortality rates are not calculated, there is no threshold problem. Also, the parameter estimates have desirable statistical properties – they are asymptotically unbiased and normally distributed. The maximization procedure includes estimates of standard errors, which makes hypothesis tests concerning the estimates straightforward. Many statistical packages provide methods for maximum likelihood estimation, including BMDP (module LE) and S-Plus (the *nlmin* function).

For a sample of size *N*, the log-likelihood is calculated as

ln *L*(*α*, *β*) = Σ_{i=1}^{N} ln *f*(*t*_{i}) (eqn 15)

where *t*_{i} is the age at death of the *i*th individual. Maximizing this function with respect to its parameters (*α*, *β* for the Gompertz) produces the maximum likelihood estimates.
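Any general-purpose optimizer can perform this maximization. A sketch in Python with SciPy (our own illustrative alternative to the BMDP and S-Plus routines mentioned above), fitting simulated Gompertz data:

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(params, t):
    # ln f(t) = ln(a) + b*t - (a/b)(e^{bt} - 1); optimize on the log scale
    # so that a and b stay positive.
    a, b = np.exp(params)
    return -np.sum(np.log(a) + b * t - (a / b) * np.expm1(b * t))

# Simulated ages at death from a Gompertz with a = 0.001, b = 0.1.
rng = np.random.default_rng(2)
u = rng.uniform(size=2_000)
t = np.log(1.0 - (0.1 / 0.001) * np.log(u)) / 0.1

res = minimize(neg_log_lik, x0=np.log([0.01, 0.05]), args=(t,),
               method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)   # maximum likelihood estimates of alpha and beta
```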

We can use the likelihood value to compare parameters between two sets of data. For example, if we wish to test the null hypothesis that two data sets have the same slope, *β*, we can compare the log likelihood for a model in which we constrain the slopes to be identical (*β*_{1}=*β*_{2}=*β*) with that for a model in which we allow *β*_{1} and *β*_{2} to differ (with the intercept parameters unconstrained in both models). Twice the difference of the log likelihoods for the two models is distributed as χ^{2} with degrees of freedom equal to the number of additional parameters in the larger model. We use the χ^{2} value to test whether the additional slope parameter results in a significantly better fit to the data.

This approach is illustrated in Table 2. Continuing from the example in ‘Estimating parameters of the Gompertz equation’, we used MLE to fit Gompertz equations to Orr and Sohal's *Drosophila* data (Orr & Sohal, 1994) and, for each model, obtained the value of the log-likelihood at its maximum. We want to determine whether the treatment (altered SOD/catalase expression) has changed the actuarial rate of ageing, the slope *β*. As we see in Table 2, there is no evidence that rates of ageing differ among treatments (Table 2, test 3).

Table 2. Likelihood ratio testing procedure used to determine treatment differences in Gompertz parameters for the data of Orr & Sohal (1994). We would like to know whether the data from the two treatments are best explained by a single Gompertz equation, or whether we need to include differences in slope, intercept or both. By fitting mortality models using maximum likelihood, we can use hypothesis testing procedures to determine the best model for the data (see ‘Maximum likelihood estimates of mortality parameters’). Test 1 asks if a model with separate parameters for slope and intercept fits significantly better than a single equation. The *P* value for this test is <0.00001, providing strong evidence against the null hypothesis of identical parameters for both treatments. Test 2 asks if the separate-equations model fits significantly better than a model of different slopes but equal intercepts. Again, the *P* value suggests we reject the null hypothesis (*P* = 0.002). The final test asks if the separate-equations model fits better than a model with different intercepts but identical slopes. In this case, *P* = 0.75 and there is no evidence against the null hypothesis of equal slopes. Thus, the smallest model consistent with the data is one which assumes different base mortality rates for each treatment but equal rates of increase of mortality with age.

Mortality rates may not always follow a Gompertz trajectory. Fortunately, we can also use MLE to determine which model best describes the data. There is increasing experimental evidence that mortality rates deviate substantially from Gompertz dynamics; these deviations emphasize the need to examine multiple models when analysing mortality. In both *Drosophila* (Curtsinger *et al*., 1992) and Mediterranean fruit flies (Carey *et al*., 1992), the rate of increase in mortality has been shown to decelerate at older ages. Also, in some cases where sample size is large enough, mortality rates do not increase coincident with the onset of maturity, as the Gompertz model would predict (Pletcher, unpublished observation). If we simply examine the data by eye, we will not always detect this trend, because when sample sizes are small and mortality is below threshold, it is not possible to distinguish visually between constant and increasing mortality (Horiuchi & Wilmoth, 1997). If rates are constant over a number of early ages, we need to account for this trend in the mortality model.

Here we illustrate how MLE can be used to differentiate truly constant mortality rates early in life from apparent constancy due to occasional deaths when true mortality is below threshold (e.g. Fig. 2). The approach is based on hierarchical modelling procedures and likelihood ratio tests. Although we present only two models for comparison, the technique is easily extended to more complicated hierarchies which examine additional mortality models (Vaupel, 1990; Fukui *et al*., 1993). Consider the following extension of the Gompertz model:

μ_{x} = *α*e^{*β*x} (eqn 13)

μ_{x} = γ + *α*e^{*β*x} (eqn 16)

Equation 13 is the standard Gompertz equation, while eqn 16 is called the Gompertz–Makeham and includes an additional term, γ, which represents an age-independent mortality rate. In the case of the Gompertz–Makeham, early in life μ_{x} ≈ γ; but as the cohort ages, the exponential term dominates and the population exhibits Gompertz dynamics (Fig. 6). The density function describing the probability that an individual dies between ages *t* and *t* + *dt* for eqn 16 is

*f*(*t*) = (γ + *α*e^{*β*t}) exp[−γ*t* − (*α*/*β*)(e^{*β*t} − 1)] (eqn 17)

Note that the two models (eqns 13 and 16) are nested – the Gompertz model is a special case of the Gompertz–Makeham, with γ=0. Maximizing eqns 14 or 17 with respect to their parameters (*α*, *β* for the Gompertz; *α*, *β*, γ for the Gompertz–Makeham) produces the maximum likelihood estimates. Because the two models are nested, twice the difference of their log likelihoods is distributed as χ^{2} with degrees of freedom equal to the number of additional parameters in the larger model (d.f.=1, in this case). This allows us to test if the additional parameter results in a significantly better fit to the data. The χ^{2} value tests the null hypothesis *H*_{0}: γ=0. Thus, we can objectively choose the mortality model which best describes the observed data.
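The full nested comparison can be sketched end to end. Below, our own illustrative implementation simulates from a Gompertz–Makeham (exploiting the fact that a sum of hazards corresponds to the minimum of independent death times), fits both nested models by maximum likelihood and forms the likelihood ratio test; all parameter values are invented:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def nll_gompertz(p, t):
    # Negative log-likelihood under eqn 14, parameters on the log scale.
    a, b = np.exp(p)
    return -np.sum(np.log(a) + b * t - (a / b) * np.expm1(b * t))

def nll_makeham(p, t):
    # Negative log-likelihood under eqn 17: ln f = ln(g + a e^{bt}) - g t - (a/b)(e^{bt} - 1)
    g, a, b = np.exp(p)
    return -np.sum(np.log(g + a * np.exp(b * t)) - g * t - (a / b) * np.expm1(b * t))

# Simulate: the hazard g + a e^{bt} is that of min(exponential, Gompertz) draws.
rng = np.random.default_rng(3)
g0, a0, b0 = 0.01, 0.001, 0.1
u1, u2 = rng.uniform(size=(2, 3_000))
t = np.minimum(np.log(1.0 - (b0 / a0) * np.log(u1)) / b0, -np.log(u2) / g0)

fit_g = minimize(nll_gompertz, np.log([0.01, 0.05]), args=(t,),
                 method="Nelder-Mead", options={"maxiter": 5000})
fit_gm = minimize(nll_makeham, np.log([0.005, 0.01, 0.05]), args=(t,),
                  method="Nelder-Mead", options={"maxiter": 5000})

lrt = 2.0 * (fit_g.fun - fit_gm.fun)   # twice the log-likelihood difference
p_value = chi2.sf(lrt, df=1)           # tests H0: gamma = 0
```

With these simulated data the Makeham term is real, so the test should reject the plain Gompertz.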

In addition to providing an objective framework for choosing a mortality model, MLE produces parameter estimates with the smallest error of any method we have examined to date. Standard statistical theory guarantees that the parameter estimates are asymptotically efficient (i.e. they have the smallest allowable variance), and numerical simulations clearly show that, for a variety of mortality models, the estimates are only slightly biased (section ‘Gompertz parameters’ and Fig. 3, above). Since the overall error of an estimator is a simple function of its variance and bias, there is little room for improvement by other methods.
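The error decomposition behind this claim is the standard bias–variance identity for an estimator θ̂ of a parameter θ:

```latex
\mathrm{MSE}(\hat\theta)
  = \mathrm{E}\!\left[(\hat\theta - \theta)^2\right]
  = \mathrm{Var}(\hat\theta) + \left[\mathrm{Bias}(\hat\theta)\right]^2
```

Once both the variance and the bias are close to their minima, as they are for the maximum likelihood estimates, no alternative estimator can reduce the mean squared error appreciably.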

For simplicity, we have focused here on the Gompertz model. However, the effect of sampling error on estimates of mortality curve parameters applies equally to a variety of different mortality curves. For example, many have argued that the Weibull model provides a better fit to the data than the Gompertz curve. The Weibull hazard, however, is also monotonically increasing, so, as with the Gompertz curve, estimates of its parameters will be biased by the small numbers of individuals dying at early and late ages.

### Conclusions


Previous work has shown that insufficient sample sizes can lead to elevated rates of both type I and type II error in studies of mortality (Shouman & Witten, 1995; Promislow *et al*., 1996). If mortality follows a Gompertz-like or similar trajectory, the problems we allude to here will be most apparent very early and very late in life. Why not, then, focus on mortality at middle ages, when our statistical power is greatest? There are certainly some cases where average adult mortality provides an appropriate focus for study (Sæther, 1988; Promislow & Harvey, 1991; Charnov, 1993). However, from the experimental gerontologist's perspective, late-life mortality is of greatest interest. From an evolutionary biologist's point of view, in many ways the most important events in the life cycle are those that occur very early in life, when reproduction can have the highest fitness consequences. And in terms of the fitness cost of ageing itself, increases in mortality due to senescence have their greatest impact on fitness when they occur early in life (Abrams, 1991).

In this work, we have deliberately focused on early age mortality, when the risk of bias due to sampling error is greatest and variation in mortality rates matters most to questions of ecology and evolution. We have addressed four areas of research in which answers may be biased by below-threshold mortality: estimating Gompertz parameters, determining the age at onset of senescence, measuring costs of reproduction, and testing for age-specific changes in the variance of mortality rates.

No analytical solution will completely resolve these sampling problems. However, they can be ameliorated. At this early juncture, we have offered two general solutions.

First, and most obviously, in any study of mortality rates, it is crucial that one use as large a sample as is experimentally and financially feasible. The greater the sample, the greater the resolution, particularly at very early and late ages. Of course, if one wants to detect only very large differences in mortality rates, colossal sample sizes may be a wasteful expense (Cantor, 1992).

Just how large is large enough? A useful approach is to conduct a power analysis before the experiment itself (Kraemer, 1987). A power analysis determines the sample size necessary to detect a statistically significant difference among treatments. For example, if we wish to compare mortality rates between two cohorts, and we know the average mortality rate in each, we can determine the sample size necessary to distinguish them statistically (Casagrande *et al*., 1978).
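As a sketch of such a calculation (framing the comparison as the proportion of each cohort dying over a fixed interval; the rates, significance level and power below are illustrative assumptions), the classical normal-approximation formula for comparing two proportions gives the required cohort size:

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_per_cohort(p1, p2, alpha=0.05, power=0.80):
    """Sample size per cohort for a two-sided test of two proportions."""
    z_a = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # critical value for alpha
    z_b = NormalDist().inv_cdf(power)              # quantile giving the power
    p_bar = (p1 + p2) / 2.0
    num = (z_a * sqrt(2.0 * p_bar * (1.0 - p_bar))
           + z_b * sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))) ** 2
    return ceil(num / (p1 - p2) ** 2)

# e.g. 10% vs. 15% mortality over the study interval, 80% power:
print(n_per_cohort(0.10, 0.15))  # → 686 per cohort
```

Note how quickly the requirement grows as the difference shrinks: detecting 10% vs. 20% needs only a few hundred individuals per cohort, while subtler differences push the totals toward the tens of thousands discussed above.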

Second, we have described a maximum likelihood approach that minimizes the confounding effects of threshold mortality rates. Provided there are enough individuals for the MLE to converge, below-threshold mortality will not bias the estimates. One can use these models not only to obtain powerful estimates of Gompertz parameters (Curtsinger *et al*., 1995; Mueller *et al*., 1995; Tatar & Carey, 1995; Fukui *et al*., 1996; Promislow *et al*., 1996), but also to infer the nature of the mortality trajectory at ages when age-specific mortality is below the threshold of detectability. Work is currently underway (S. Pletcher, in preparation) to develop more refined statistical techniques for such analysis.

One might argue that we could circumvent all of these difficulties by dispensing with mortality rates altogether, and using in their place summary statistics such as life expectancy at birth (equal to the average age at death in a stationary population), time to 50% survival, or even the Gompertz parameters themselves. Some workers have argued that these summary statistics serve as a useful alternative to mortality rates (e.g. Mueller *et al*., 1995), and are not subject to the challenges that arise from small sample size. There are many good reasons not to use such summary statistics. Our discussion on the biological implications of demographic sampling error provides one illustration of how sample size can lead to biases in the summary statistics (in this case, the Gompertz parameters). In addition, these summary statistics do not allow us to discern differences in mortality patterns among different age classes. These are the very differences that we need to consider if we are to determine the fitness consequences of behavioural or physiological differences among cohorts (e.g. Tatar & Promislow, 1997).

Finally, our aim here has been to highlight the need for careful experimental design and interpretation of experimental results. But even more importantly, we hope that this work might encourage others to develop analytical solutions for the problem of demographic sampling error.