Estimation of individual fitness – i.e. description of the extent to which an individual's genes are represented in future generations – is a feature central to most evolutionary studies. Lifetime reproductive success (LRS) is a commonly used estimate of individual fitness, but because it is rate-insensitive (i.e. timing of reproductive events is not incorporated), it may give a biased estimate of fitness when reproductive timing is an important component of fitness. A review of all empirical studies which have used a recently derived, rate-sensitive estimate of individual fitness, λind, revealed that λind ranks the fitness of phenotypes differently from LRS, and that this difference may lead to different conclusions about the strength of selection acting on phenotypic traits. However, although λind may be a better estimate of individual fitness than LRS in certain situations (e.g. in growing populations), its application is not always unproblematic. For instance, in contrast to rate-insensitive estimates of individual fitness, λind is sensitive to the age at which offspring are censused, and there is little consensus among published studies on when offspring should be counted. Further, rate-sensitivity does not necessarily improve a fitness estimate in spatio-temporally variable environments. We suggest that the ultimate test of the applicability of λind vs. LRS as practical measures of individual fitness in quantifying selection should come from studies which correlate these estimates with the actual number of descendants left more than one generation into the future.
Fitness, although a fundamental concept in biology, is not straightforward to define (Stearns 1989; Murray 1990; Metz et al. 1992; Benton & Grant 2000). The fitness concept in life-history theory has – through the years – evolved considerably (Brommer 2000) and currently hinges on invasibility, the possibility for a rare mutant strategy to replace the strategy played predominantly in the population (Metz et al. 1992; Rand et al. 1994; Geritz et al. 1998; Benton & Grant 2000). However, invasibility is not readily measured in natural populations. Measuring selection – and the response to it – in empirical studies primarily concerns short-term evolutionary change among extant genotypes and requires a robust and quantifiable measure of evolutionary success (Roff 1997; Merilä et al. 2001). Although fitness is most accurately described by the representation of an individual's genes – or descendants – far in the future (e.g. Leimar 1996; Murray 1997; Houston & McNamara 1999), many estimates of fitness consider fewer generations, typically one. Lifetime reproductive success (LRS) is a commonly used proxy for individual fitness and can be viewed as the individual-based analogue of the population-wide measure R0, the net reproductive ratio (e.g. Clutton-Brock 1988a; Newton 1989). Estimating LRS is an attempt at a complete representation of the lifetime performance of an individual, and it is therefore a better measure of fitness than any single component of fitness such as survival in a particular life-history stage (Endler 1986). Single components can lead to biased estimates of fitness whenever components of fitness face trade-off situations (e.g. early fecundity and longevity, Stearns 1992). Such trade-offs mean that interpreting one component, such as reproductive output in a given season, as a surrogate of overall fitness is too short-sighted.
Caswell (1989, 2001) has strongly argued for a more demographic approach in measuring fitness, which would encompass as many aspects of an organism's life history as possible. LRS is, in fact, but one aspect of the performance of a given life history. LRS describes how the expected number of offspring (R0) was realized in a particular sample of individuals. A major shortcoming of LRS is that timing of reproduction within the life cycle of an organism is not taken into account, although timing is a major component of fitness (Stearns 1992). In a growing population, reproducing early in life greatly increases the number of descendants left in the future (Houston & McNamara 1999). In addition, the use of a rate-insensitive estimate of individual fitness, such as LRS, contrasts strongly with the majority of theoretical life-history studies, which typically use a rate-sensitive measure of fitness, the population's intrinsic rate of increase λpop (Stearns 1992). Hence, there is an important gap between theoretical and empirical studies in the quantification of evolutionary forces.
In order to close this gap, McGraw & Caswell (1996) advocated a rate-sensitive estimate of individual fitness, λind (see also Lenski & Service 1982). This estimate of individual fitness can be considered analogous to the population-wide intrinsic rate of increase as derived from the Euler–Lotka equation using projection matrices (e.g. Caswell 1989; Stearns 1992). For an age-structured population, where the maximum age is ω and the average survival of individuals from age x to x + 1 is denoted as Px and the average (same-sex) production of offspring at age x by Mx, the population growth rate is given by the dominant eigenvalue of the matrices
The former of these matrices is derived from a so-called pre-breeding census, whereas the latter is derived from a post-breeding census (Caswell 1989). Mathematically, these matrices merely represent two different ways of describing the population growth of the same life history, either counting individuals of age ‘1’ as offspring (pre-breeding census) or individuals of age ‘0’ (post-breeding census). The asymptotic population growth rate λpop is then given by the dominant eigenvalue of either matrix in eqn 1 (Caswell 1989, 2001).
Equivalently, one can consider age-specific individual survival px and reproduction fx (Lenski & Service 1982; McGraw & Caswell 1996). The propensity ‘growth rate’ of the individual life history is then given as the dominant eigenvalue of the square matrices
for the pre- and post-breeding census, respectively (analogous to eqn 1). Here, fx denotes an individual's production of same-sex zygotes at age x, which can also be interpreted – in a diploid species – as half of the total number of zygotes produced in order to incorporate the genetic contribution of parent to offspring. If and only if the survival px for each age x, during which the parent has survived, is set at ‘1’, the matrices in eqn 2 reduce to
McGraw & Caswell's (1996) fitness measure λind for an individual that lived k years is the dominant eigenvalue of the matrix in eqn 3. This fitness estimate's main attraction lies in the combination of two fitness elements – total reproduction (LRS) and timing of reproduction – into a single measure. Each observed individual life history is considered separately and its λind is the maximum likelihood estimate of the individual's propensity fitness (McGraw & Caswell 1996). The individual life history with the highest growth rate is the most fit. Because this fitness estimate takes into account when an offspring was produced, it requires data on an individual's life span and age-specific production of offspring.
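As a minimal sketch of how such an estimate could be computed from an observed life history: assuming the individual matrix has the age-specific fecundities fx in its first row and ones on the subdiagonal, its dominant eigenvalue is the positive root of Σ fx λ^(−x) = 1. The function name `lambda_ind`, the bisection solver and the two example life histories are illustrative assumptions, not taken from any of the reviewed studies.

```python
def lambda_ind(fecundity, iterations=200):
    """Positive root of sum_x f_x * lam**(-x) = 1.

    fecundity[0] is f_1 (same-sex offspring produced at age 1),
    fecundity[1] is f_2, and so on up to the last age lived.
    """
    total = sum(fecundity)
    if total <= 0:
        return 0.0  # no offspring: all eigenvalues of the matrix are zero

    def excess(lam):
        # Left-hand side of the characteristic equation minus one;
        # strictly decreasing in lam for lam > 0.
        return sum(f * lam ** (-x) for x, f in enumerate(fecundity, start=1)) - 1.0

    low, high = 0.0, 1.0 + total  # excess(high) < 0 is guaranteed
    for _ in range(iterations):   # plain bisection on the bracket
        mid = 0.5 * (low + high)
        if excess(mid) > 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Two hypothetical individuals with the same LRS (2) but different timing.
early_breeder = [2, 0]  # both offspring at age 1
late_breeder = [0, 2]   # both offspring at age 2
print(lambda_ind(early_breeder), lambda_ind(late_breeder))
```

Both individuals have the same LRS, yet the early breeder receives the higher value, which is the rate-sensitivity the text describes.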
In this paper, we review how λind has been used to study a variety of ecological and evolutionary questions. We then point out the importance of deciding when to count offspring when calculating λind and show that this aspect has not been fully realized by those employing λind. We conclude by illustrating that LRS and λind provide fundamentally different estimates of individual fitness and sketch some important avenues for future research.
Table 1. An overview of studies which have used McGraw & Caswell's (1996) estimate of individual fitness λind. This estimate has been used to evaluate the fitness consequences of a variety of traits and factors in a wide range of taxa. In cases where several traits were studied, we report only the trait that explained most variation in λind. Factors incorporate both natural and experimentally designed factors. If no trait or factor was considered explicitly, none is given in the table. Census refers to the life-history stage which was counted as offspring fx (eqn 3). The table is organism based; some authors have performed their study on two organisms at the same time and are therefore referred to twice.
Conclusions in examining the fitness consequences of a differential onset of reproduction are thus crucially dependent on the choice of the individual fitness estimate. The incorporation of reproductive timing may also lead to different conclusions in other ways. For example, the main food supply of northern forest owls is voles, which show a 3-year cycle of low, increase and peak phases. Tengmalm's owls and Ural owls that initiated their breeding career in the increase phase of the vole cycle produced up to twice as many fledglings during their lives as females that started to breed in other phases of this cycle (Korpimäki 1992; Brommer et al. 1998). This implies strong selection on the decision in which phase to start reproducing. Nevertheless, the λind of Ural owls suggested that there was no such selection, because the vole cycle not only affected the total number of offspring produced in a female's lifetime, but also her age at first reproduction. Females which started to breed in an increase phase were older than females which started to breed in other phases, which evened out the λind across phases (Brommer et al. 1998).
The incorporation of age-specificity in reproduction and survival is the strength of λind, but may also be problematic – or even impossible – to measure in many species. Nevertheless, uncertainty in fitness estimates can to some extent be incorporated in the analyses. For example, Krüger & Lindström (2001) showed that λind can be a useful fitness estimate even if exact knowledge of age at first reproduction is lacking. The robustness of conclusions can be checked using a randomization approach to estimate λind for a variety of breeding scenarios. In their analysis of individual fitness in wood ducks, Oli et al. (2002) used bootstrapping to incorporate the uncertainty in parentage caused by this species' intraspecific brood parasitism. An altogether different approach was taken by Twombly et al. (1998) who rephrased McGraw & Caswell's (1996) age-structured matrix (eqn 3) into a stage-structured individual matrix for the copepod Boeckella triarticulata. Individuals were categorized by six sub-adult stages and one adult (reproductive) stage. This approach highlights the importance of aspects of timing other than the timing of reproduction. A stage-structured individual matrix emphasizes the fitness consequences of the time it has taken the individual to complete each life-history stage considered, which may be a more important fitness component than the timing of reproductive bouts in certain organisms.
Although individual fitness is pivotal in evolutionary studies, the link between individual performance and population dynamics has received remarkably little attention. In a scaling-up approach, Krüger & Lindström (2001) showed that in a buzzard population with light, intermediate and dark morphs, the individual's morph was the main determinant of LRS and λind. Because an individual's morph is genetically determined, an individual of a certain morph has the propensity to produce other morphs too, including those with lower or higher fitness. The quality (i.e. morph) of offspring then needs to be incorporated as well in a full analysis of fitness (Houston & McNamara 1992). Krüger & Lindström (2001) accomplished this by constructing a morph- and age-based population-dynamical model, based on the propensities of morphs to produce other morphs. They could then use standard demographic methods to verify that morph was indeed also a major determinant of fitness at the population level. In general, if details of the population dynamics – including density dependence – are known, it is possible to estimate fitness directly by applying the population dynamic model to estimate the number of descendants left far into the future. Unfortunately, such details are most often unknown, but the scaling approach of Krüger & Lindström (2001) illustrates that shorter-term estimates of individual fitness (such as LRS and λind) may perform satisfactorily.
When to count offspring?
Strictly speaking, the age-specific reproduction fx (eqn 3) refers to the number of zygotes produced, because eqn 3 is only a valid simplification of the pre- and post-breeding matrices in eqn 2 in the case of age-specific survival probability px = 1 for all ages x. There is, however, little consensus among studies in the interpretation of what defines an offspring. Some studies have indeed interpreted fx as the number of zygote offspring. For example, Sadeghi & Gilbert (1999, 2000) dissected two species of aphidophagous flies to count their ovarioles. Others – especially in studies on vertebrates – have systematically considered only individuals that reached a relatively advanced age in the organism's life history as offspring. In studies on birds, the number of eggs or hatchlings produced will correlate better with the number of zygotes than the number of chicks, which hatched and survived to fledging. Yet, λind has, in all studies in birds, been quantified using the number of fledglings (McGraw & Caswell 1996; Brommer et al. 1998; Krüger & Lindström 2001; Oli et al. 2002). Gaillard et al. (2000) censused the number of weaned roe deer offspring, although they noted that substantial mortality actually occurred before weaning. In another extreme case, Käär & Jokela (1998) and Korpelainen (2000) studied pre-industrial human populations and considered only children who survived to the age of 18 as offspring.
That different authors census offspring at different ages is, in itself, nothing new. LRS is often counted at different stages in the life cycle of the organism. In birds, for example, certain authors count LRS as the sum of offspring that reached the age of fledging, whereas others include only offspring that recruited into the breeding population later in life (Clutton-Brock 1988a; Newton 1989; Merilä & Sheldon 2000). Nevertheless, in terms of LRS, there is theoretically no objection to counting offspring at different life-history stages, as long as there are no systematic differences between offspring survival probability to census. This, however, is not the case for λind. Taking into account when an offspring was produced in the life cycle of the parent will differentially weight reproduction at different ages, which makes the relationship between the λind of different individual life histories sensitive to census time.
To illustrate this crucial difference between the two individual fitness estimates, consider a case where the probability of surviving to the census stage equals p for all offspring, irrespective of the parent's age, phenotype or environment. This can be incorporated in eqns 2 and 3 by defining the net contribution measured by the researcher as fxp instead of fx. The ranking of individuals by lifetime reproductive success (LRS) will be insensitive to p, because all offspring produced are equally valuable and all will be scaled by the same factor p (solid line in Fig. 1). In the case of λind, however, offspring produced at different ages are weighted differently. The survival to census, p, will thus bias the final fitness estimate. Consider, for example, an individual that lives for 2 years; fitness is then calculated from the transition matrix
In general, the dominant eigenvalue of these matrices is found by solving the equation $\sum_x f_x p\,\lambda_{ind}^{-x} = 1$, giving for eqn 4
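The 2-year case can be written out explicitly. What follows is a sketch reconstructed from the characteristic equation above, assuming the transition matrix has the form of eqn 3 with each fx replaced by the net contribution fxp. The matrix label A is introduced here for illustration only.

```latex
% Assumed 2-year transition matrix: eqn 3 with f_x replaced by f_x p
\[
\mathbf{A} =
\begin{pmatrix}
  f_1 p & f_2 p \\
  1     & 0
\end{pmatrix}
\]

% Characteristic equation f_1 p \lambda^{-1} + f_2 p \lambda^{-2} = 1,
% multiplied through by \lambda^2 and solved for the positive root
\[
\lambda_{ind}^{2} - f_1 p\,\lambda_{ind} - f_2 p = 0
\quad\Longrightarrow\quad
\lambda_{ind} = \frac{f_1 p + \sqrt{(f_1 p)^{2} + 4 f_2 p}}{2}
\]
```

In this form, p multiplies f1 both outside and inside the square root, whereas it reaches f2 only under the square root, so the two age-specific fecundities are not rescaled in the same way.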
It is clear from eqn 5 that p weights the age-specific fecundities f1 and f2 differently and that λind increases non-linearly with probability p (Fig. 1). If the census is taken late in the life cycle of the organism, many offspring may have suffered mortality (small p in Fig. 1) and the relative fitness difference between two phenotypes will lessen, or – in the worst case – flip around if p is sufficiently low (Fig. 1).
This convergence of fitness estimates as offspring are censused later in the life history relates to the mathematical property of λind to attain unity when an individual produces a total of one net offspring pfx in its lifetime. Any two fitness estimates will thus converge when offspring are censused later (and thus have a lower survival probability to census). This convergence does not require the survival to census (p) to be dependent on either phenotype or environment. Instead, it is purely a consequence of the time-point in the organism's life cycle that is chosen to define offspring. In most species, the survival to census will not be so low as to make conclusions dependent on the census time. However, in species with low (close to unity) lifetime production of offspring, it is worthwhile to consider the robustness of the results to alternative census scenarios, by calculating λind at different census times or, in case p is unknown, by simulation.
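The robustness check suggested above can be sketched as follows, under the same assumption as in the text that every offspring survives to census with the same probability p. The two life histories, the values of p and the helper names are hypothetical, chosen only to show the LRS ratio staying constant while the λind ranking changes.

```python
def lambda_ind(net_fecundity, iterations=200):
    """Positive root of sum_x f_x * lam**(-x) = 1, with net_fecundity[0] = f_1."""
    total = sum(net_fecundity)
    if total <= 0:
        return 0.0
    low, high = 0.0, 1.0 + total
    for _ in range(iterations):  # bisection; the left-hand side decreases in lam
        mid = 0.5 * (low + high)
        lhs = sum(f * mid ** (-x) for x, f in enumerate(net_fecundity, start=1))
        if lhs > 1.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def net_fitness(fecundity, p):
    """Return (net LRS, lambda_ind) when offspring survive to census with probability p."""
    net = [f * p for f in fecundity]
    return sum(net), lambda_ind(net)


early = [3, 0]  # hypothetical: 3 offspring at age 1
late = [0, 4]   # hypothetical: 4 offspring at age 2

for p in (1.0, 0.75, 0.5, 0.25):
    lrs_early, lam_early = net_fitness(early, p)
    lrs_late, lam_late = net_fitness(late, p)
    print(f"p={p:.2f}  LRS late/early={lrs_late / lrs_early:.3f}  "
          f"lambda early={lam_early:.3f}  lambda late={lam_late:.3f}")
```

For these two life histories λind equals 3p and 2√p, respectively, so the early breeder ranks higher for p above 4/9 and lower below it, while the LRS ratio stays at 4/3 throughout.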
Some authors (e.g. Gaillard et al. 2000) have, in fact, deliberately chosen to census offspring later in the life cycle, because they view the survival to census as a reflection of the parent's phenotype (as opposed to environmental factors), which therefore needs to be considered in her fitness. Consider, for example, two individuals which both produce the same number of offspring at the same time in their lives, but one individual only manages to raise half of them to independence. If the loss of half her offspring is due to a deficiency in the second female's phenotype, assigning the same fitness estimate to both feels intuitively wrong. Likewise, some authors have only considered offspring recruited into the breeding population, because they considered these a more accurate measure of the propagation of genes (Käär & Jokela 1998; Korpelainen 2000). These arguments illustrate that an interpretation of census time, although mathematically strict (eqns 1–3), is susceptible to biological arguments. From an empirical perspective, it would therefore be worthwhile to compare λind for different census times, especially in extreme cases, when the census is restricted to offspring recruited to breeding age or recruited into the breeding population.
The relationship between LRS and λind
All studies that considered the relationship between LRS and λind found that LRS is a major determinant of λind (McGraw & Caswell 1996; Doums et al. 1998; Käär & Jokela 1998), which strengthens the notion that these estimates reflect the same individual propensity. Nevertheless, the use of λind or LRS as an estimate of individual fitness can lead to different conclusions, as discussed above. In fact, the relationship of λind to LRS is pronouncedly curvilinear, approaching some asymptotic value of λind for large values of LRS, as illustrated by data on individual fitness in the Ural owl (Fig. 2). This curvilinearity is independent of the life history considered, because of diminishing contributions of reproduction at later ages to the dominant eigenvalue of a Leslie matrix (Caswell 1989, 2001). Consequently, λind increases with diminishing returns for larger values of LRS whenever variation in LRS is mainly due to differences in lifespan as opposed to differences in fertility. Clearly, this makes LRS and λind rank phenotypes differently (Käär & Jokela 1998). In the example considered (Fig. 2), individuals may differ five-fold in their LRS, but still have about the same λind. In fact, this curvilinearity conforms to the stabilizing selection gradient on λind for LRS described for several species (McGraw & Caswell 1996).
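The diminishing returns can be illustrated with a deliberately simple sketch: hypothetical individuals that all produce the same number of same-sex offspring each year and differ only in lifespan, so that all variation in LRS is due to lifespan. The fecundity of one per year and the function names are assumptions made for illustration, not values from the Ural owl data.

```python
def lambda_ind(fecundity, iterations=200):
    """Positive root of sum_x f_x * lam**(-x) = 1, with fecundity[0] = f_1."""
    total = sum(fecundity)
    if total <= 0:
        return 0.0
    low, high = 0.0, 1.0 + total
    for _ in range(iterations):  # bisection on a strictly decreasing function
        mid = 0.5 * (low + high)
        lhs = sum(f * mid ** (-x) for x, f in enumerate(fecundity, start=1))
        if lhs > 1.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def constant_fecundity_curve(per_year, max_lifespan):
    """(LRS, lambda_ind) pairs for lifespans 1..max_lifespan at fixed yearly fecundity."""
    return [(per_year * k, lambda_ind([per_year] * k))
            for k in range(1, max_lifespan + 1)]


curve = constant_fecundity_curve(per_year=1, max_lifespan=10)
for lrs, lam in curve:
    print(lrs, round(lam, 4))
```

With one offspring per year, each additional year adds one to LRS but progressively less to λind, which stays below 2 however long the individual lives, because Σ λ^(−x) over all ages x ≥ 1 equals 1 only at λ = 2.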
The non-linear relationship between λind and LRS implies that a group of individuals will tend to have a smaller variance in λind than in LRS, because large values of LRS do not translate directly into large values of λind (e.g. Fig. 2). This property of λind may have implications in the empirical measurement of the heritability of fitness. Heritability measures the degree of resemblance between parent and offspring and is defined as the ratio of additive genetic variance over the total phenotypic variance of a trait. Several studies have shown heritability of LRS to be low (Mousseau & Roff 1987). This may be because (1) selection has eroded most additive genetic variance in fitness (Roff 1997), or (2), because fitness is a high-level trait with a huge amount of phenotypic variance, which can mask even considerable amounts of additive genetic variance (Merilä & Sheldon 2000). Testing for heritability of λind would thus be interesting, not only because it would address a long-standing controversy in evolutionary genetics with a different, rate-sensitive estimate of individual fitness, but mostly because λind substantially lowers the phenotypic variance and may therefore provide different insights from LRS in this question.
Reproductive timing, population dynamics and individual fitness
Theoretical ecologists have recently managed to establish invasibility as the master fitness concept (Metz et al. 1992; Rand et al. 1994; Mylius & Diekmann 1995; Geritz et al. 1998) and empirical workers now face the challenge of translating this concept into an applicable estimate of individual fitness. We have in this paper reviewed the introduction of the rate-sensitive estimate of individual fitness λind, next to the traditional rate-insensitive estimate LRS. This introduction has, on one hand, brought empirical fitness estimates in line with their analogues used in theoretical studies, but has also led to the realization that these two fitness estimates often lead to different conclusions on selection. This realization is, in itself, a familiar scenario from theoretical life history, where structurally similar models based either on R0 or λ had arrived at different conclusions (Roff 1992; Stearns 1992; Brommer 2000). Life-history theory has shown that rate-sensitive and rate-insensitive measures of fitness are applicable only in certain scenarios of density dependence (Mylius & Diekmann 1995). Thus, neither LRS nor λind is a universally applicable individual fitness estimate. Instead, the choice of fitness estimate depends on the (population dynamical) context. However, knowledge of the mode of population regulation is usually lacking. Typically, populations are assumed not to grow or decline in size and rate-insensitive measures of fitness are deemed adequate (e.g. Stearns 1992; Kozlowski 1993). Clearly, the rate-sensitive λind ranks individual performance differently from the rate-insensitive estimate LRS and therefore often leads to different conclusions, which raises the question of which estimate of individual fitness to use.
Most fundamental in a comparison between LRS and λind is the actual importance of reproductive timing for fitness. One might feel that λind, because it incorporates both amount and timing of reproduction, is the fitness estimate of choice. However, the timing of reproduction is an important aspect of fitness only in non-equilibrium populations (e.g. Stearns 1992). The lack of a clear fitness advantage of early breeding in terms of LRS in avian systems (e.g. McGraw & Caswell 1996; Brommer et al. 1998; Oli et al. 2002) may therefore simply reflect a dynamic equilibrium, instead of a fundamental shortcoming of this fitness estimate. Ideally, the population's net reproductive ratio R0 can be estimated by the mean LRS of a cohort of individuals. If mean LRS exceeds one, the population is growing and reproductive timing needs to be incorporated in a proper estimate of fitness. Hence, λind would present the better estimate in such cases. However, LRS is often measured only for individuals that survived to breed (e.g. Newton 1989), thereby ignoring the ‘invisible fraction’ of individuals that died before they expressed the trait of interest (Grafen 1988; Bennington & McGraw 1995). The exclusion of the invisible fraction seriously affects both LRS and λind (although not necessarily in the same way), but in most studies on individually marked organisms there is an unavoidable period of ‘invisibility’, usually between independence and first reproduction.
Fitness is a population measure, as it describes the performance of a group of individuals with the same life history (e.g. mutants in a resident population). All estimates of fitness derived from data on individuals are hampered by the limitations of estimating a population parameter based on samples of size one (Link et al. 2002). Thus, probabilistic events create a substantial difference between propensity (i.e. latent) and realized individual fitness (Lenski & Service 1982; Link et al. 2002). Sampling variance is especially aggravated by maternal and environmental effects on the performance of individuals. Such environmental effects appear to be abundant in nature (Clutton-Brock 1988b; Lindström 1999) and may mask the genetic component of an individual's fitness. Consequently, many of the difficulties in assigning fitness estimates to individual life histories stem from spatio-temporal variability and (resulting) variation in quality across individuals. In this last section, we briefly discuss these aspects and suggest approaches to investigate them further.
Wild populations do not show strict dynamic equilibrium, but typically fluctuate. Selection pressures may be very different in temporally fluctuating environments (Benton & Grant 1996). Clearly, timing of reproduction will be an important fitness component in such an environment, but only at certain times and in variable directions. In a population that stochastically fluctuates around its carrying capacity, late reproduction will be favoured if the population is shrinking and early reproduction when the population is increasing again. In a detailed simulation study, Benton & Grant (2000) compared the performance of several proxy measures of fitness at the population level with a full analysis based on the invasibility concept of fitness. They showed that the expected lifetime reproductive success predicts the optimal strategy in fluctuating environments fairly accurately, but that the intrinsic rate of increase λpop was a much poorer predictor. A similar exploration at the individual level, testing the predictive power of λind and LRS, is so far lacking but would certainly be instructive.
Environmental stochasticity may have a relatively mild impact in populations of iteroparous organisms. For example, the incorporation of environmental stochasticity changed the estimated population growth rate in red deer only by 1% (Benton et al. 1995). However, the selective advantages of timing could be more important in short-lived organisms living in a stochastic environment. For example, LRS of great tits – a bird that rarely breeds more than once in a lifetime – depended greatly on the quality of the year in which they started to reproduce (van Balen et al. 1987). As more studies are compiled on LRS and λind in natural populations, a comparative analysis could investigate the relationship between life history and individual fitness across environments. Environmental fluctuations will also create large variation in the long-term performance of offspring produced at different times (cohort effect, e.g. Lindström 1999). Estimates of individual fitness are usually based on the assumption that all offspring are equal (although λind distinguishes between offspring produced at different times). This assumption is clearly violated in systems where certain cohorts or morphs gain strong fitness advantages (e.g. Korpimäki 1992; Brommer et al. 1998; Krüger & Lindström 2001). As outlined above, λind is even more sensitive to census time than LRS and its technical formulation does not, in fact, allow for incorporating quality differences across offspring. Clearly, λind will be more positively misleading than LRS if early produced offspring are of poorer quality than later produced offspring (e.g. due to parental effects or environmental stochasticity). A demographic approach, where the whole population is modelled (Krüger & Lindström 2001), is an indirect way to incorporate quality differences (see also Murray (1997) for a genetic example).
The use of any estimate of individual fitness (λind, LRS or a component of these) is mainly determined by the quality of the data used to quantify it. This point (although trivial) especially concerns the question of when offspring are censused in the organism's life history. For example, lifetime recruitment of reproducing offspring is typically considered a better estimate of individual fitness than the lifetime production of offspring (Clutton-Brock 1988a; Newton 1989). This view reflects the notion that a longer time to census will effectively separate the genetic quality from the environmental noise (e.g. Clutton-Brock 1988b; Gaillard et al. 2000). However, recruitment may also be non-random because of external or methodological reasons. In a study of blue tits, local recruitment was dramatically biased to certain plots, because tits dispersed away from their natal study plot to other, surrounding study plots (Lambrechts et al. 1999). In addition, frequent movements of adults may inflate mortality estimates and introduce uncertainty in the quantification of age-specific reproduction, a spatially explicit interpretation of the ‘invisible fraction’ problem. More fundamentally, even if survival of offspring is random and not influenced by the parental phenotype, counting recruits instead of younger offspring will greatly aggravate problems of sampling variation, as explained above. Again, λind will be more sensitive than LRS to such sampling variation. Benton & Grant (2000) showed that fitness estimates based on LRS performed best if estimated in the absence of density dependence. Density dependence is likely to be strongest in the youngest age classes and environmental effects could therefore abolish the signal of genetic quality. This, in turn, implies that offspring should be censused as early as possible in the organism's life history. Increased exploration of the spatial and temporal limitations of evolutionary field studies is thus needed. Emphasis should be placed both on the spatial scale on which the population operates in relation to the species' dispersal propensity, and on how these aspects affect different estimates of individual fitness.
In conclusion, rate sensitivity does not necessarily confer an advantage to an individual fitness estimate in the case of spatio-temporal variability. To some extent, the rate-insensitive LRS may be buffered against environmental variability. Because fitness refers to the genetic contribution further than one generation in the future, calculation of the future number of descendants offers an opportunity to examine the value of any individual fitness estimate based on one generation (like LRS and λind). Long-term studies on individually marked organisms allow the calculation of the number of future descendants by tracing lineages. The future number of descendants as a measure of individual fitness has the advantage that it holds in almost all spatial-temporal scenarios and incorporates variation in individual quality (Houston & McNamara 1999; Benton & Grant 2000), but is difficult to gather in wild populations. The value of an individual fitness estimate can be evaluated by how well this individual fitness estimate correlates with the number of descendants left further than one generation in the future. Therefore, the relative merit of using the rate-sensitive λind or the rate-insensitive LRS for quantifying evolutionary forces remains an empirical question.
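One way such an evaluation could be set up is with a rank correlation between each one-generation estimate and the later descendant count, since the estimates matter chiefly through how they rank individuals. The sketch below assumes Spearman's coefficient as the measure of agreement, which is a choice made here rather than one prescribed by the reviewed studies. The function names and all numbers are hypothetical placeholders, not data.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1  # extend the block of tied values
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks.

    Undefined (division by zero) if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical placeholder values for five individuals (not real data).
lrs = [2, 5, 3, 8, 1]
lam = [1.2, 1.5, 1.6, 1.7, 1.0]
descendants = [4, 9, 12, 20, 1]  # counted more than one generation later

print("LRS vs descendants:", spearman(lrs, descendants))
print("lambda_ind vs descendants:", spearman(lam, descendants))
```

Whichever estimate shows the higher coefficient in a real long-term data set would be the better predictor of longer-term genetic contribution for that population. A formal comparison would also need confidence intervals, which this sketch omits.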
We thank Andre deRoos for discussion and Hal Caswell for comments on an early draft of this manuscript. Insights from Wolf Blanckenhorn and an anonymous referee greatly improved this manuscript. Hannu Pietiäinen kindly permitted us to use his excellent Ural owl data set for illustration.