Dr James R. Carey, Department of Entomology, University of California, Davis, One Shields Ave., Davis, CA 95616, USA. Tel.: +1 530 7526217; fax: +1 530 7521537; e-mail: email@example.com
We address the problem of establishing a survival schedule for wild populations. A demographic key identity is established, leading to a method whereby age-specific survival and mortality can be deduced from a marked cohort life table established for individuals that are randomly sampled at unknown age and marked, with subsequent recording of time-to-death. This identity permits the construction of life tables from data where the birth date of subjects is unknown. An analogous key identity is established for the continuous case in which the survival schedule of the wild population is related to the density of the survival distribution in the marked cohort. These identities are explored for both life tables and continuous lifetime data. For the continuous case, they are implemented with statistical methods using non-parametric density estimation methods to obtain flexible estimates for the unknown survival distribution of the wild population. The analytical model provided here serves as a starting point to develop more complex models for residual demography, i.e. models for estimating survival of wild populations in which age-at-entry is unknown and using remaining information in randomly encountered individuals. This is a first step towards a broad new concept of ‘expressed demographic information content of marked or captured individuals’.
The life table is one of the most important tools in demographic and gerontological research because it is used to characterize the mortality and survival properties of cohorts and to quantify the actuarial rate of aging. The historical application of classical life table methods in aging science has been largely restricted to the use of mortality data from either humans or experimental animals maintained in the laboratory, or to life tables based on capture–recapture methods to assess aging in wild populations (Udevitz & Ballachey, 1998). In both applications, age-at-entry must be known. This has limited the use of life tables, because in the analysis of field populations one often encounters and marks individuals of unknown age, whereas capture–recapture and other current field methods generally require capturing and marking young individuals, or alternatively individuals of known age, and monitoring them throughout their lives until they die.
The predominance of capture–recapture methods has limited the use of flexible non-parametric statistical methods, which make minimal assumptions on survival schemes and do not presume parametric survival models. Because non-parametric modelling does not specify the functional form of hazard or survival functions, these methods require the exact recording of lifetimes and therefore are not applicable to the usual capture–recapture designs, which typically yield coarsely graded life tables (Lebreton et al., 1992; Williams et al., 2002).
Because of the importance of the life table in aging research and the growing interest in understanding aging in the wild (Austad, 1993; Congdon et al., 1994; Finch, 2001; Reznick et al., 2001; Tatar & Yin, 2001), the case of life table analysis with unknown age at entry and the analogous situation for continuous lifetimes is clearly of great interest. We describe a life table identity that, by making certain key assumptions, enables us to estimate the age-specific life table rates from data based on the mark, release and monitoring of randomly captured individuals of unknown age from the time of their entry into the study (i.e. marking) to their death. We also discuss an identity for the continuous case in which marked animals are continuously monitored until their death. Continuous monitoring, when feasible, enables the continuous version of the analysis, which provides us with substantially more detailed information about the behaviour of survival functions and hazard rates (force of mortality). Our approach can also be used in conjunction with life tables that are obtained from capture–recapture experiments in those situations in which age-at-entry is unknown. Current designs of capture–recapture experiments, however, are not amenable to continuous lifetime analysis with the preferred flexible non-parametric methods, allowing for the construction of hazard rate estimates.
Consider a population that is assumed to be stable, stationary and closed. Individuals are randomly captured at an unknown age and marked, and their time-to-death recorded. The question we address is this: Can the information on time-to-death for this randomly captured marked subgroup provide the necessary information to construct a life table for the population at large? We will demonstrate that the answer to this question is yes because of a life table identity that reveals a mathematical relationship between the distribution of deaths in the marked cohort and the age structure of the original population. Individuals in the captured and marked sample are assumed to have remaining lifetimes similar to those in the wild. This model may be particularly adequate for some human populations.
The problem of constructing a survival schedule from incomplete data has been studied in anthropology (Müller et al., 2002) and has applications to human populations such as the !Kung and the Ache for which only incomplete demographic data are available (Howell, 1979; Hill & Hurtado, 1996; Hawkes et al., 1998; Jones et al., 2002). An anthropologist may encounter a group of people whose ages are unknown but whose remaining lifetime can be recorded. The key identity, on which the reconstruction of the survival schedule that we propose is based, asserts that for such situations a life table for the population can be obtained, under certain assumptions. Application of the key identity then establishes a new way to construct life tables and estimate survival functions.
We derive this key identity for both discrete life tables and situations that are modelled by continuous survival times. In the continuous case, this identity is a consequence of a close relationship between the density of the remaining lifetimes in a cohort of randomly sampled subjects and the survival schedule of the population from which the subjects were sampled. We provide statistical implementations of this identity by applying suitably adapted non-parametric density estimation methods. The proposed model is developed for a stable, stationary and closed population but possesses sufficient flexibility to allow for modifications of these assumptions.
The concept and techniques that we describe in this paper will help to advance understanding of senescence in the wild in several areas that were outlined by Gaillard et al. (1994). First, refinements of this concept have the potential to improve the reliability of survival data because, unlike the approach used in virtually all long-term field studies in which only newborn are marked and their survival monitored throughout their lives, this approach estimates survival using information from individuals first marked at any age. Therefore, for many species it may be possible to mark many more individuals than are available from only a single (newborn) age group and therefore to increase sample size. Second, this approach introduces new biological concepts for measuring senescence in the wild that differ from Nesse's (1988) intensity of selection, Finch's (1990) mortality rate doubling time, Promislow's (1991) log slope mortality and Abrams’ (1993) fitness cost of senescence. Additionally, our method is a useful addition to capture–recapture studies with unknown age-at-entry.
The method we outline in this paper focuses on the information content of wild-caught, living individuals and will ultimately not only include information on survival that can be used to estimate actuarial aging as in conventional approaches, but also information on fertility, behaviour, mating and other life history categories that can be used to shed new light on senescence in the wild. The methods we introduce will provide new techniques for expanding the taxonomic horizons of senescence studies in the wild beyond mammals, to include other vertebrates such as birds, reptiles, fishes and amphibians as well as invertebrates ranging from nematodes to insects. These methods will be especially important for studying aging of invertebrates in the wild, such as nematodes, that cannot be marked and released into the wild for later recapture.
A key demographic identity
The data on remaining lifetime after capture and marking, obtained from the marked sample, are assembled in a ‘marked sample life table’. Assuming that the process of capture and marking does not alter an individual's remaining lifespan, the corresponding ‘marked sample’ and ‘wild’ life tables are compared for a hypothetical situation in Table 1.
Table 1. Illustration of the relationship between hypothetical ‘wild’ and ‘marked sample’ life tables in the stationary case (J. R. Carey, unpublished data). The ‘wild’ cohort consists of Nx individuals at each age x with corresponding schedules of survival lx and age structure cx = lx/Σly, with life table in the leftmost subtable. The ‘marked sample’ cohort consists initially of 20 ‘marked’ individuals with the same age structure as the ‘wild’ cohort, all simultaneously entering the marked sample cohort at the age of capture and marking x* = 0. Remaining lifetimes are recorded for the marked sample, Nx* is the number of animals that remain alive at age x* after marking, and lx* is the survival schedule of the marked sample cohort, with death rates dx* = lx* − lx*+1, as listed in the rightmost subtable. The survival schedules given separately for age cohorts x = 0, x = 1, x = 2 and x = 3, as functions of marked sample cohort age x*, are listed in the corresponding columns of the subtable in the middle. In this hypothetical example, the initial marked sample cohort at marked sample cohort age x* = 0 has an age structure identical to cx (row in bold type in the middle subtable is identical to the bold type cx column of the leftmost subtable). The key identity is revealed by the equality of the bold type columns cx and dx* in the leftmost and rightmost subtables. This key relationship allows us to deduce the wild survival schedule from the marked sample survival schedule.
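[Table 1 body not recoverable from the source. The middle subtable, headed ‘Age distribution in marked sample cohort’, has one column for each age cohort x = 0, x = 1, x = 2 and x = 3.]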
That it is possible to obtain the survival schedule in the wild, as summarized by the wild life table, from the marked sample life table, is due to a basic relationship between these two life tables. Assuming that the population is stable, stationary and closed, i.e. is neither increasing nor decreasing, and without immigration or emigration, the proportion of subjects of age x is $c_x = l_x/\sum_y l_y = c_0 l_x$ (Caswell, 2001; see Table 1 for definitions). The death rates in the marked sample life table at age x′ are by definition $d^*_{x'} = l^*_{x'} - l^*_{x'+1}$. These death rates are generated by subjects that enter the marked sample life table at various (unknown) ages, survive to ‘marked age’ x′ (i.e. age counted in days after capture and marking) and do not survive to ‘marked age’ x′ + 1. For all subjects that enter the marked sample cohort at age z, the contribution to $d^*_{x'}$ is therefore
\[ c_z \, \frac{l_{z+x'} - l_{z+x'+1}}{l_z} = c_0 \left( l_{z+x'} - l_{z+x'+1} \right), \]
where lz refers to the survival function or survival schedule of the wild population at age z.
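The contributions of subjects entering the marked sample life table at various ages are additive. Therefore, adding the contributions over all ages of entry z, the sum telescopes:
\[ d^*_{x'} = \sum_{z \ge 0} c_0 \left( l_{z+x'} - l_{z+x'+1} \right) = c_0\, l_{x'} = c_{x'}, \]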
and this relationship implies that the columns $c_x$ indicating the age distribution in the wild life table and $d^*_x$ indicating the distribution of deaths in the marked sample life table are identical. We can see from Table 1 that this is indeed the case for the hypothetical case considered there. As $l_x = c_x/c_0$, this relationship between the two life tables leads to
\[ l_x = \frac{d^*_x}{d^*_0}, \]
thus enabling the reconstruction of the survival schedule lx in the wild life table from the survival schedule of the marked sample life table.
Statistical estimates implementing this probabilistic relationship can be easily found, for example by plugging in empirical observed frequencies for $d^*_x$ and $d^*_0$, thus replacing expected population values as they appear in Table 1 with their corresponding sample estimates. Based on the binomial distribution of these observed frequencies, we can derive large-sample confidence intervals for the resulting estimates of lx. Details and formulas are provided in the Appendix.
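The discrete identity can be checked numerically. The following is a minimal sketch in Python, assuming a hypothetical four-age survival schedule (these values are illustrative and are not taken from Table 1); the function names are likewise hypothetical. It computes the stationary age structure, accumulates the contributions of each unknown entry age z to the marked sample death distribution, and recovers the wild survival schedule as the ratio of death frequencies.

```python
def wild_age_structure(l):
    """Stationary age structure c_x = l_x / sum_y l_y."""
    total = sum(l)
    return [lx / total for lx in l]


def marked_death_distribution(l):
    """Distribution d*_x of deaths by time since marking, for a sample
    drawn with the wild age structure (stationary, closed population)."""
    c = wild_age_structure(l)
    l_ext = list(l) + [0.0]           # survival is 0 beyond the last age class
    n = len(l)
    d_star = []
    for xp in range(n):               # xp = time since marking
        total = 0.0
        for z in range(n - xp):       # z = unknown age at marking
            # P(age z) * P(death in [z + xp, z + xp + 1) | alive at age z)
            total += c[z] * (l_ext[z + xp] - l_ext[z + xp + 1]) / l[z]
        d_star.append(total)
    return d_star


def reconstruct_survival(d_star):
    """Wild survival schedule l_x = d*_x / d*_0."""
    return [d / d_star[0] for d in d_star]


# Hypothetical wild survival schedule, chosen for illustration only.
l_wild = [1.0, 0.5, 0.3, 0.2]
c = wild_age_structure(l_wild)
d_star = marked_death_distribution(l_wild)
l_rec = reconstruct_survival(d_star)
```

In this sketch the lists c and d_star agree up to rounding, and l_rec reproduces l_wild, which is the content of the key identity in the stationary case.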
These considerations can be extended to the case in which age-at-death is considered to be measured on a continuous scale and the smooth nature of the underlying survival distributions can be discerned. The power of analysing hazard functions from continuous lifetimes has been illustrated in Müller & Wang (1994). Information loss and recovery of features related to smoothness and derivatives such as hazard rates from aggregated survival data as encountered in life tables are well known (Müller et al., 1997; Wang et al., 1998). For these reasons, it is therefore clearly preferable to work with continuous lifetime data rather than life tables whenever feasible. Therein lies one of the promises of the proposed methodology – the continuous case is supported without the need to specify a parametric model for the survival distribution as is usually required. The downside of parametric modelling is lack of flexibility because these models are tied to the correctness of the assumed parametric model, and such an assumption cannot be easily verified. The continuous model can be implemented whenever the marked cohorts can be continuously monitored.
In the following, we discuss the continuous version of the key identity. This identity enables us to estimate hazard rates and other continuous features of survival distributions by means of flexible non-parametric curve estimation methods. Denoting by X the age-at-death (lifetime) for an individual in the wild, by $\bar F(x) = P(X > x)$ the survival function in the wild, where x is a continuous age variable and P denotes probability, we find for the density of the age-distribution in the wild $c(x) = \bar F(x) / \int_0^\infty \bar F(y)\,dy$, and consequently, as $\bar F(0) = 1$,
\[ \bar F(x) = \frac{c(x)}{c(0)}. \]
The unknown age A at the time of capture and marking and the unknown age-at-death X are related to the known remaining lifetime X* of an individual by X* = X − A. Denote the densities of the distributions of X, X* by $f_X$, $f_{X^*}$ and consider the conditional density $f_X(\cdot \mid X \ge x)$ of lifetime conditional on the event that the individual survives to age x. Then one obtains for the density $f_{X^*}(a)$ of X*, evaluated at the remaining lifetime a,
\[ f_{X^*}(a) = \int_0^\infty f_X(x + a \mid X \ge x)\, c(x)\, dx = \int_0^\infty \frac{f_X(x + a)}{\bar F(x)}\, c(0)\, \bar F(x)\, dx = c(0)\, \bar F(a) = c(a). \]
This relationship implies the key identity for the continuous case,
\[ \bar F(x) = \frac{f_{X^*}(x)}{f_{X^*}(0)}, \]
providing the relationship between the marked cohort mortality and survival in the wild. This type of relationship has been noted previously in the literature on renewal processes (Doob, 1948; Feller, 1968; Winter, 1989). Statistical estimation and inference based on this continuous version of the key identity is discussed in the next section.
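As an illustrative check of the continuous identity, consider the special case of an exponential lifetime with rate λ. This example is an assumption introduced here for illustration and is not part of the original analysis; it is convenient because the remaining lifetime of a memoryless lifetime has the same distribution as the lifetime itself.

```latex
\begin{align*}
\bar F(x) &= e^{-\lambda x}, \qquad
c(x) = \frac{\bar F(x)}{\int_0^\infty \bar F(y)\,dy} = \lambda e^{-\lambda x}, \\
f_{X^*}(a) &= \int_0^\infty \frac{f_X(x+a)}{\bar F(x)}\, c(x)\, dx
            = \int_0^\infty \lambda e^{-\lambda a}\, \lambda e^{-\lambda x}\, dx
            = \lambda e^{-\lambda a}, \\
\frac{f_{X^*}(x)}{f_{X^*}(0)} &= \frac{\lambda e^{-\lambda x}}{\lambda}
            = e^{-\lambda x} = \bar F(x).
\end{align*}
```

The ratio of the marked cohort density to its value at zero therefore returns the wild survival function, as the identity asserts.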
Estimating the survival schedule of the wild population
Given a sample of continuous lifetimes $X^*_1, \ldots, X^*_n$ that are observed in the marked sample cohort and measured in terms of relative age counted from the time of marking, we may substitute non-parametric kernel density estimators (compare, for example, Müller, 1997) for $f_{X^*}(z)$, given by
\[ \hat f_{X^*}(z) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left( \frac{z - X^*_i}{h} \right). \]
Here h = h(n) is a sequence of bandwidths and K is a kernel function. Specific kernel functions are listed in the Appendix.
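A kernel density estimator of this form can be sketched in a few lines. The following Python fragment is a minimal illustration, assuming the quadratic kernel K(u) = 0.75(1 − u²) on [−1, 1] listed in the Appendix, a fixed bandwidth, and a small hypothetical sample of remaining lifetimes; the function names are hypothetical.

```python
def epanechnikov(u):
    """Kernel K(u) = 0.75(1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if -1.0 <= u <= 1.0 else 0.0


def kernel_density(z, sample, h, kernel=epanechnikov):
    """Kernel density estimate (1 / (n h)) * sum_i K((z - X_i) / h)."""
    n = len(sample)
    return sum(kernel((z - xi) / h) for xi in sample) / (n * h)


# Hypothetical remaining lifetimes (illustration only).
remaining = [1.0, 2.0, 2.5, 4.0]
estimate = kernel_density(2.0, remaining, h=1.0)
```

Only observations within one bandwidth of z contribute, so the choice of h governs how local the estimate is.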
We implement a flexible non-parametric smoothing approach that does not depend on a likelihood-based model for the survival schedule, which Bayesian methods would require when dealing with continuous lifetimes. Given the enormous plasticity of mortality schedules in biological populations, such model-based methods are very limited in their applicability, whereas non-parametric methods do not make any assumptions on the underlying survival distributions except for some basic smoothness. In return, a bandwidth or smoothing parameter h in the above kernel density estimator needs to be specified to control the trade-off between variance and bias of the resulting non-parametric estimates. Methods for data-adaptive specification of bandwidths and also for efficient numerical implementations of the above estimator are described in Müller (1997).
We then obtain asymptotically consistent estimates of the survival function of the wild population,
\[ \hat{\bar F}(x) = \frac{\hat f_{X^*}(x)}{\hat f_{X^*}(0)}. \]
The implementation of this estimate is less straightforward than it may seem. One difficulty is that the estimates f̂X*(0) that appear in the denominator are density estimates at a boundary point of the support of the data and therefore are subject to higher variability than density estimates in the interior of the support (Müller & Wang, 1994). We replace the kernel K in the definition of the kernel density estimator above by a boundary kernel K0 when estimating the density of X* at the boundary point x = 0 (see end of Appendix). A second difficulty is that the above estimate is not necessarily a survival function, which by definition is monotone declining from 1 to 0. This can be ensured by adding a monotonization step through the pool adjacent violators algorithm (PAVA; Robertson et al., 1988).
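The two adjustments can be sketched as follows. This Python fragment is a simplified illustration under stated assumptions, not the implementation used for the results reported here: it uses the interior and boundary kernels listed in the Appendix, a fixed bandwidth, an equal-weight version of the pool adjacent violators algorithm, and hypothetical data (quantiles of a unit exponential distribution). For grid points strictly between 0 and h the interior kernel is used without boundary correction, which a fuller implementation would refine.

```python
import math


def epanechnikov(u):
    """Interior kernel K(u) = 0.75(1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if -1.0 <= u <= 1.0 else 0.0


def boundary_kernel(u):
    """Boundary kernel K0(u) = 12(u + 1)(u + 1/2) on [-1, 0]."""
    return 12.0 * (u + 1.0) * (u + 0.5) if -1.0 <= u <= 0.0 else 0.0


def kde(z, sample, h, kernel):
    return sum(kernel((z - xi) / h) for xi in sample) / (len(sample) * h)


def pava_decreasing(y):
    """Equal-weight least-squares non-increasing fit via pooling."""
    blocks = []                                  # each block is [sum, count]
    for v in y:
        blocks.append([v, 1])
        # Pool while the previous block mean is below the current block mean.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] < blocks[-1][0] / blocks[-1][1]):
            s, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += k
    fit = []
    for s, k in blocks:
        fit.extend([s / k] * k)
    return fit


def wild_survival(grid, sample, h):
    """Monotonized ratio estimate f*(x) / f*(0), clipped to [0, 1]."""
    f0 = kde(0.0, sample, h, boundary_kernel)
    if f0 <= 0.0:
        raise ValueError("boundary density estimate is not positive")
    raw = [1.0 if x == 0.0 else kde(x, sample, h, epanechnikov) / f0
           for x in grid]
    return [min(1.0, max(0.0, v)) for v in pava_decreasing(raw)]


# Hypothetical remaining lifetimes: quantiles of a unit exponential.
sample = [-math.log(1.0 - (i + 0.5) / 200) for i in range(200)]
grid = [0.25 * i for i in range(13)]
surv = wild_survival(grid, sample, h=0.5)
```

Because the boundary kernel takes negative values on part of its support, the denominator is checked for positivity before the ratio is formed.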
Using analogous kernel density estimators for the derivative of $f_{X^*}$, we may obtain estimates for the density f of the survival schedule of the wild cohort, $\hat f(x) = -\hat f'_{X^*}(x) / \hat f_{X^*}(0)$. Analogously, estimates for the hazard rate $h(x) = f(x)/\bar F(x)$ are obtained as $\hat h(x) = -\hat f'_{X^*}(x) / \hat f_{X^*}(x)$. To obtain the density derivative estimates that appear in these formulas we replace the kernel K in the kernel density estimator above by a derivative kernel $K_1$ (often chosen as the first derivative of K, $K_1 = K^{(1)}$, see Appendix) and the scaling factor $1/(nh)$ by $1/(nh^2)$. The construction of confidence intervals and thus inference for these non-parametric estimates can be obtained through asymptotic methods. The asymptotic arguments, corresponding variance estimates and resulting formulas for confidence intervals are summarized in the Appendix.
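The derivative-based estimates can be sketched in the same style. The following Python fragment is a minimal illustration, assuming the kernels K(u) = 0.75(1 − u²) and K1(u) = −(3/2)u from the Appendix, a fixed bandwidth, and hypothetical data (quantiles of a unit exponential distribution, for which the hazard rate equals 1); the function names are hypothetical.

```python
import math


def epanechnikov(u):
    """Kernel K(u) = 0.75(1 - u^2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if -1.0 <= u <= 1.0 else 0.0


def epanechnikov_derivative(u):
    """Derivative kernel K1(u) = K'(u) = -(3/2)u on [-1, 1]."""
    return -1.5 * u if -1.0 <= u <= 1.0 else 0.0


def density(x, sample, h):
    """Estimate of f*(x) with scaling 1 / (n h)."""
    return sum(epanechnikov((x - xi) / h) for xi in sample) / (len(sample) * h)


def density_derivative(x, sample, h):
    """Estimate of the derivative of f* at x with scaling 1 / (n h^2)."""
    total = sum(epanechnikov_derivative((x - xi) / h) for xi in sample)
    return total / (len(sample) * h * h)


def wild_density(x, sample, h, f0):
    """Wild lifetime density estimate -f*'(x) / f*(0), with f0 given."""
    return -density_derivative(x, sample, h) / f0


def wild_hazard(x, sample, h):
    """Wild hazard rate estimate -f*'(x) / f*(x)."""
    return -density_derivative(x, sample, h) / density(x, sample, h)


# Hypothetical remaining lifetimes: quantiles of a unit exponential.
sample = [-math.log(1.0 - (i + 0.5) / 2000) for i in range(2000)]
hazard_at_1 = wild_hazard(1.0, sample, h=0.5)
```

The hazard estimate does not involve the boundary value at zero, so it avoids the additional variability of the boundary estimate.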
To assess the age at capture for a subject for which an additional lifetime x was observed in the marked cohort life table, we may use the conditional density $f_{A \mid X^*}(a \mid x) = f_X(x + a)/\bar F(x)$ to infer the conditional expectation
\[ E(A \mid X^* = x) = \frac{1}{\bar F(x)} \int_0^\infty a\, f_X(x + a)\, da. \]
Plugging the above estimates into the right-hand side of this equation then leads to consistent estimates of conditional mean age at capture. Monotonized density estimates similar to those above were proposed by Watelet & Winter (1991) in a reliability setting.
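The conditional mean age at capture can be computed from a survival curve on a grid. The sketch below is an illustration under stated assumptions: it uses the fact that, by integration by parts and for a finite mean lifetime, the integral of a·fX(x + a) over a ≥ 0 equals the integral of the wild survival function from x onwards, so that only survival estimates are needed. The trapezoid rule, the truncation at the end of the grid, the function name and the exponential test curve are all assumptions of this example.

```python
import math


def mean_age_at_capture(x_index, grid, surv):
    """E(A | X* = x) as (integral of the survival function from x to the
    end of the grid) divided by the survival function at x, where
    x = grid[x_index]; the integral uses the trapezoid rule."""
    if surv[x_index] <= 0.0:
        raise ValueError("survival estimate at x must be positive")
    area = 0.0
    for i in range(x_index, len(grid) - 1):
        area += 0.5 * (surv[i] + surv[i + 1]) * (grid[i + 1] - grid[i])
    return area / surv[x_index]


# Hypothetical survival curve: unit exponential on a fine grid.
grid = [0.01 * i for i in range(4001)]
surv = [math.exp(-x) for x in grid]
mean_age = mean_age_at_capture(100, grid, surv)   # x = grid[100] = 1.0
```

For the exponential test curve the result is close to 1 at every x, reflecting the absence of aging in that special case.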
We illustrate the reconstruction of the survival schedule of the wild population from the observations made on the marked sample in a simulation study. The underlying survival schedule of the wild population is modelled as the survival function of a real cohort. The starting point is a cohort consisting of 1000 female Mediterranean fruit flies, Ceratitis capitata, commonly known as the medfly, whose survival has been described and analysed in Carey et al. (1998).
Using acceptance–rejection sampling based on the graph of the survival function for these 1000 flies, we randomly sample N flies (with replacement) to create one simulated marked sample. Each of the flies selected for the marked sample has a random age, following the age distribution of the flies in the entire ‘wild population’, and also an associated remaining lifetime that is recorded as ‘marked lifespan’. Kernel density estimation as described above is implemented by local linear smoothing after an initial prebinning step (see Müller, 1997) and combined with the PAVA method.
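The sampling step can be sketched as follows. This Python fragment is a simplified illustration, not the code used for the simulation: it assumes a uniform proposal for the age at marking, accepts a proposed age a with probability equal to the empirical survival function at a (which yields ages distributed according to the stationary age structure), and then draws a lifetime from the cohort members still alive at that age. The lifetimes, seed and function name are hypothetical.

```python
import random


def draw_marked_sample(lifetimes, n_marked, rng):
    """Draw remaining lifetimes of individuals marked at a random age.

    Ages are generated by acceptance-rejection: a uniform proposal a on
    [0, max lifetime] is accepted with probability equal to the fraction
    of the cohort still alive at age a."""
    max_age = max(lifetimes)
    n = len(lifetimes)
    remaining = []
    while len(remaining) < n_marked:
        a = rng.uniform(0.0, max_age)                 # proposed age at marking
        alive = [x for x in lifetimes if x > a]       # cohort members alive at a
        if rng.random() < len(alive) / n:             # accept with prob. survival(a)
            x = rng.choice(alive)                     # lifetime given survival to a
            remaining.append(x - a)                   # recorded 'marked lifespan'
    return remaining


# Hypothetical cohort lifetimes in days (illustration only).
rng = random.Random(1)
cohort = [10.0, 20.0, 30.0, 40.0]
marked = draw_marked_sample(cohort, 50, rng)
```

Sampling the lifetime with replacement from the survivors at age a corresponds to drawing from the conditional lifetime distribution given survival to the age at marking.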
The resulting survival function estimates, along with the target survival function for six generated marked sample cohorts of sizes N = 1000 and N = 50, can be seen in Fig. 1. We find that the method of reconstructing the survival schedule of the wild population works very well for the larger sample and reasonably well for the smaller sample. The infant survival estimates show a higher degree of variability than the survival estimates for the mid-age period because not very many early deaths will be recorded in the marked cohort.
Discussion: window on aging in the wild and a generalization
In this paper we demonstrated that age-specific life tables can be constructed from mortality data derived from randomly captured individuals of unknown age in stable, stationary and closed populations. The importance of our model is that it provides a starting point to develop more complex models whose purpose is to estimate the life table properties of populations based on more realistic assumptions (non-stable, non-stationary populations). However, we believe that the significance of the general approach extends beyond the life table and applies to the concept of expressed information content of marked (or captured) individuals. For the current case the expressed information is the remaining post-capture lifespan of marked individuals that is used to estimate the life table of the population at large.
The idea of expressed information content generalizes if it is assumed that: (1) the experiences of individuals early in life influence the expression and pattern of their life history traits (mortality, reproduction, behaviour) later in life; and (2) these patterns expressed in later life can be traced to early life experience. The concept of extracting knowledge of both an individual's age and its early life experience to gain insights into the demographic and gerontological characteristics of the field population can then be used as the conceptual foundation for a new sampling concept for understanding aging in the wild. Examples of the types of information that can be extracted from wild-caught (or marked) individuals at the individual level include remaining lifespan, age-specific reproduction (relative to time of capture), details of reproduction including birth interval, clutch size, post-reproductive period, overall patterns of individual reproduction, total reproduction and time from capture to first egg, timing and magnitude of peak reproduction (Carey et al., 1998; Müller et al., 2001), mating status and frequency of mating, behavioural measures such as supine behaviour (Papadopoulos et al., 2002) or calling (males, see Papadopoulos et al., 2004), mating, oviposition and overall activity, and physiological measures such as metabolic rate.
We believe that this new concept for extracting information about aging in the wild is important for several reasons. First, life course analysis will both encourage and require a deep understanding of the interdependencies of various components of an individual's life course, including reproduction, behaviour and death. In particular, the approach will require an understanding of the relationship between reproduction at young ages and mortality risk at older ages, the age patterns of reproduction that are unique to different stages in the adult life course, and the linkages between different behavioural patterns and death. Second, the approach will encourage a greater integration of laboratory and field studies. Specifically, the method will require the creation of reference ‘libraries’ consisting of the life history patterns of individuals maintained under different conditions in the laboratory. These ‘libraries’ will be used for comparing the observed life history patterns (birth and death) of wild-caught flies maintained in the laboratory. Third, the results of studies using the methods we propose to develop will shed new light on both aging and the age structure of wild populations. This includes aging data on populations of invertebrate species such as C. elegans that are difficult to study under natural conditions in the wild but that are extraordinarily important model organisms in aging science (Reznick, 1993; Gershon & Gershon, 2002). The combination of laboratory and field studies will provide the means for testing various theories about aging in the wild and also for testing models used in both forecasting and back-casting.
This research was supported by NIH grant P01-AG08761 and NSF grant DMS-02–04869. We thank J. Cardenas for technical assistance, L. Harshman, L. Partridge and A. Yashin for discussion, and J. Vaupel and K. Wachter for comments on a previous draft.
Appendix: asymptotic confidence intervals and variances
Based on the estimation of the survival schedule of the wild population, one can derive asymptotic confidence intervals for important characteristics of the survival schedule of the wild population. This includes confidence intervals and associated inference for the survival function F̄(x) for discrete and continuous lifetimes, and the density fX*(x) and hazard rate hX*(x) for continuous lifetimes. Another option is to employ a suitable bootstrap.
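We first investigate confidence intervals for the survival function F̄(x) for discrete lifetimes, i.e. the survival schedule at age x, given by $l_x = d^*_x / d^*_0$, where x is an arbitrary non-negative integer. Let $\hat l_x = \hat d^*_x / \hat d^*_0$ denote the estimates obtained by plugging in empirical observed frequencies for $d^*_x$ and $d^*_0$. Let $W_n(x)$ denote the number of deaths in (x, x + 1). It is easily seen that $\hat d^*_x = W_n(x)/n$, for x = 0, 1, … , and $W_n(x) \sim B(n, d^*_x)$, where $B(n, d^*_x)$ denotes the binomial distribution with n trials and probability of success $d^*_x$, and n is the total number of subjects. Then from the central limit theorem, one can obtain the asymptotic joint distribution of the multinomial random variable $(W_n(x), W_n(0))^T$, which for x ≥ 1 is $\sqrt{n}\,[(W_n(x)/n, W_n(0)/n)^T - (d^*_x, d^*_0)^T] \to N_2(0, \Sigma)$, where $N_2$ denotes the bivariate normal distribution, and Σ is a 2 × 2 matrix with $\sigma_{11} = d^*_x(1 - d^*_x)$, $\sigma_{22} = d^*_0(1 - d^*_0)$, and with $\sigma_{12} = \sigma_{21} = -d^*_x d^*_0$. Applying the delta method leads to the asymptotic normal distribution of $\hat l_x$,
\[ \sqrt{n}\,\left( \hat l_x - l_x \right) \to N\!\left( 0, \frac{l_x (1 + l_x)}{d^*_0} \right). \]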
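Then the 100(1 − α)% confidence interval of $l_x$ is obtained by substituting the empirical estimates of $l_x$ and $d^*_0$ and applying Slutsky's theorem:
\[ \hat l_x \pm \Phi^{-1}(1 - \alpha/2)\, \sqrt{ \frac{\hat l_x (1 + \hat l_x)}{n\, \hat d^*_0} }, \]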
where Φ(·) is the cumulative distribution function of the standard normal random variable.
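A large-sample interval of this kind can be computed directly from the observed death counts. The Python sketch below is an illustration under stated assumptions: it uses the delta-method variance l(1 + l)/(n d0), which follows from the multinomial covariance of the death frequencies at marked ages x ≥ 1 and 0, and the death counts and function name are hypothetical.

```python
import math
from statistics import NormalDist


def discrete_survival_ci(deaths, x, alpha=0.05):
    """Estimate l_x = d*_x / d*_0 with a large-sample confidence interval.

    deaths[k] is the number of marked individuals dying at marked age k;
    x must be at least 1 (the estimate at x = 0 is 1 by construction)."""
    n = sum(deaths)
    d0 = deaths[0] / n
    dx = deaths[x] / n
    l_hat = dx / d0
    se = math.sqrt(l_hat * (1.0 + l_hat) / (n * d0))   # delta-method standard error
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)        # standard normal quantile
    return l_hat, l_hat - z * se, l_hat + z * se


# Hypothetical death counts by marked age (illustration only).
estimate, lower, upper = discrete_survival_ci([10, 5, 3, 2], x=1)
```

With only 20 marked individuals the interval is wide, which illustrates why marking individuals of all ages, rather than newborns only, is valuable for sample size.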
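For the case of continuous lifetimes, the survival function is estimated by $\hat{\bar F}(x) = \hat f_{X^*}(x) / \hat f_{X^*}(0)$. Assume that a kernel K supported on [−1, 1] is used for $\hat f_{X^*}(x)$ and the boundary kernel $K_0$ supported on [−1, 0] for $\hat f_{X^*}(0)$. The bandwidth h for the kernel density estimates $\hat f_{X^*}(x)$ and $\hat f_{X^*}(0)$ is assumed to satisfy h → 0 and nh → ∞, as n → ∞. For any fixed x > 0, when n is sufficiently large, one has h < x − h, i.e. no observations $X^*_i$ are included in both [0, h] and [x − h, x + h], whence the estimates $\hat f_{X^*}(x)$ and $\hat f_{X^*}(0)$ are asymptotically independent. From standard results for kernel density estimation (see Müller, 1997, for references), one can easily obtain the asymptotic joint distribution of $[\hat f_{X^*}(x), \hat f_{X^*}(0)]$ as follows,
\[ \sqrt{nh} \begin{pmatrix} \hat f_{X^*}(x) - f_{X^*}(x) \\ \hat f_{X^*}(0) - f_{X^*}(0) \end{pmatrix} \to N_2\!\left( 0, \begin{pmatrix} f_{X^*}(x)\, \|K\|^2 & 0 \\ 0 & f_{X^*}(0)\, \|K_0\|^2 \end{pmatrix} \right), \]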
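which is bivariate normal with mean vector 0. Here $\|K\|^2 = \int K^2(u)\,du$. Because the bias $E(\hat f_{X^*}(x)) - f_{X^*}(x) = O(h^2)$ for both x = 0 and x > 0, we can ignore biases for small values of h. Assuming this is the case and applying the delta method, we obtain the asymptotic normal approximation to the distribution of $\hat{\bar F}(x)$:
\[ \sqrt{nh}\, \left( \hat{\bar F}(x) - \bar F(x) \right) \to N\!\left( 0, \frac{f_{X^*}(x)}{f_{X^*}(0)^2}\, \|K\|^2 + \frac{f_{X^*}(x)^2}{f_{X^*}(0)^3}\, \|K_0\|^2 \right). \]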
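Then the 100(1 − α)% confidence interval for F̄(x) is obtained by substituting the consistent kernel estimates $\hat f_{X^*}(x)$ and $\hat f_{X^*}(0)$ for $f_{X^*}(x)$ and $f_{X^*}(0)$ in the formula, applying Slutsky's theorem, i.e. the 100(1 − α)% confidence interval for F̄(x) is
\[ \hat{\bar F}(x) \pm \Phi^{-1}(1 - \alpha/2)\, \sqrt{ \frac{1}{nh} \left( \frac{\hat f_{X^*}(x)}{\hat f_{X^*}(0)^2}\, \|K\|^2 + \frac{\hat f_{X^*}(x)^2}{\hat f_{X^*}(0)^3}\, \|K_0\|^2 \right) }. \]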
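The interval for the continuous case can be evaluated once the two density estimates are available. The Python sketch below is an illustration under stated assumptions: it uses the delta-method variance for the ratio of two asymptotically independent kernel estimates, with the squared kernel norms 3/5 for K(u) = 0.75(1 − u²) and 24/5 for K0(u) = 12(u + 1)(u + 1/2); the numerical inputs and the function name are hypothetical.

```python
import math
from statistics import NormalDist

K_NORM_SQ = 0.6     # integral of K^2 for K(u) = 0.75(1 - u^2) on [-1, 1]
K0_NORM_SQ = 4.8    # integral of K0^2 for K0(u) = 12(u + 1)(u + 1/2) on [-1, 0]


def survival_ci(f_x, f_0, n, h, alpha=0.05):
    """Confidence interval for the wild survival function at x, given the
    kernel estimates f_x = f*(x) and f_0 = f*(0), sample size n and
    bandwidth h (biases are ignored, as in the asymptotic argument)."""
    s_hat = f_x / f_0
    variance = (f_x * K_NORM_SQ / f_0 ** 2
                + f_x ** 2 * K0_NORM_SQ / f_0 ** 3) / (n * h)
    half_width = NormalDist().inv_cdf(1.0 - alpha / 2.0) * math.sqrt(variance)
    return s_hat - half_width, s_hat + half_width


# Hypothetical inputs (illustration only).
lower, upper = survival_ci(f_x=0.5, f_0=1.0, n=1000, h=0.5)
```

In this example the boundary term dominates the variance, consistent with the higher variability of density estimates at the boundary point.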
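To construct the confidence interval for the density estimate $\hat f(x) = -\hat f'_{X^*}(x) / \hat f_{X^*}(0)$, we note that the derivative estimate $\hat f'_{X^*}(x)$ has slower convergence rate than $\hat f_{X^*}(0)$. Slutsky's theorem implies that $\hat f(x)$ is asymptotically equivalent to $-\hat f'_{X^*}(x) / f_{X^*}(0)$. From the asymptotic distribution of the kernel estimator for the derivative $\hat f'_{X^*}(x)$, and ignoring the bias terms as argued earlier, one has
\[ \sqrt{nh^3}\, \left( \hat f'_{X^*}(x) - f'_{X^*}(x) \right) \to N\!\left( 0, f_{X^*}(x)\, \|K_1\|^2 \right), \]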
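where $K_1$ is the kernel function used in $\hat f'_{X^*}(x)$. Thus the asymptotic distribution of the density estimate $\hat f(x)$ is approximately $N\!\left( f(x),\, f_{X^*}(x)\, \|K_1\|^2 / (nh^3 f_{X^*}(0)^2) \right)$, and the 100(1 − α)% confidence intervals can be obtained by substituting the kernel estimates for $f_{X^*}(x)$ and $f_{X^*}(0)$, whence one obtains the intervals
\[ \hat f(x) \pm \Phi^{-1}(1 - \alpha/2)\, \frac{\|K_1\|\, \sqrt{\hat f_{X^*}(x)}}{\sqrt{nh^3}\; \hat f_{X^*}(0)}. \]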
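Similarly, the 100(1 − α)% confidence interval for the hazard rate h(x), estimated by $\hat h(x) = -\hat f'_{X^*}(x) / \hat f_{X^*}(x)$, is obtained by
\[ \hat h(x) \pm \Phi^{-1}(1 - \alpha/2)\, \frac{\|K_1\|}{\sqrt{nh^3\, \hat f_{X^*}(x)}}. \]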
We note that common choices for kernels for interior, boundary and derivative estimation $K$, $K_0$ and $K_1$ are $K(x) = 0.75(1 - x^2)$ on [−1, 1], $K_0(x) = 12(x + 1)(x + 1/2)$ on [−1, 0] and $K_1(x) = -(3/2)x$ on [−1, 1].
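The basic properties of these kernels can be verified numerically. The Python sketch below is an illustration, assuming a simple midpoint rule (the integration helper and variable names are hypothetical): it checks that K and K0 integrate to one, that K0 has vanishing first moment on [−1, 0], and it evaluates the squared norms that enter the variance formulas.

```python
def integrate(f, a, b, steps=20000):
    """Midpoint rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width


def K(u):
    return 0.75 * (1.0 - u * u)          # interior kernel on [-1, 1]


def K0(u):
    return 12.0 * (u + 1.0) * (u + 0.5)  # boundary kernel on [-1, 0]


def K1(u):
    return -1.5 * u                      # derivative kernel on [-1, 1], K1 = K'


mass_K = integrate(K, -1.0, 1.0)                                 # analytic value 1
mass_K0 = integrate(K0, -1.0, 0.0)                               # analytic value 1
first_moment_K0 = integrate(lambda u: u * K0(u), -1.0, 0.0)      # analytic value 0
norm_K_sq = integrate(lambda u: K(u) ** 2, -1.0, 1.0)            # analytic value 3/5
norm_K0_sq = integrate(lambda u: K0(u) ** 2, -1.0, 0.0)          # analytic value 24/5
norm_K1_sq = integrate(lambda u: K1(u) ** 2, -1.0, 1.0)          # analytic value 3/2
```

The much larger squared norm of the boundary kernel quantifies the higher variance of the density estimate at the boundary point.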