Family level inbreeding depression and the evolution of plant mating systems


  • John K. Kelly

    Corresponding author
    1. Department of Ecology & Evolutionary Biology, University of Kansas, 1200 Sunnyside Ave. Lawrence, KS 66045-7534, USA

Author for correspondence: John K. Kelly Tel: (785) 864-3706 Email:


Variation in the magnitude of inbreeding depression (ID) among families may have important consequences for mating system evolution. Experimental studies have shown that such variation is a common feature of natural plant populations. Unfortunately, the genetic and evolutionary significance of family level estimates remains obscure. Almost any kind of genetic variation will generate differences in ID among families, and as a consequence, a non-zero variance in family level ID is not sufficient to distinguish genetic architectures with wholly different implications for mating system evolution. Quantitative genetic methods provide a means to extract more information from ID experiments. Estimates of quantitative genetic variance components directly inform questions about the genetic basis of ID and should ultimately allow tests of alternative theories of mating system evolution.


Inbreeding depression is the decline of fitness related traits associated with inbreeding. The estimation of inbreeding depression (hereafter abbreviated ID) constitutes a large fraction of empirical research in plant genetics and population biology. Husband & Schemske (1996) reviewed ID estimates from 54 natural species spanning 23 plant families, and this collection of studies has expanded greatly since 1996 (Byers & Waller, 1999; Crnokrak & Barrett, 2002). The extent of this effort reflects the general perception that the magnitude and character of ID is a critical factor in the evolution of plant mating systems (Darwin, 1876; Charlesworth & Charlesworth, 1987). It also has important implications for agriculture, population ecology, and conservation (Lynch et al., 1995; Keller & Waller, 2002).

One of the primary objectives in the study of ID has been to determine its genetic basis. Attention has focused on two theories: the overdominance model and the deleterious mutation model (Crow, 1993). The former model posits that ID results from heterozygote superiority at genetic loci that influence fitness. Inbred offspring are less likely to be heterozygous at these loci and thus have reduced fitness. The second theory posits that ID is caused by rare, deleterious mutations that are at least partially recessive. These deleterious alleles are maintained by recurrent mutation and their recessivity shields them from selection in a randomly mating population. Inbreeding increases homozygosity at these loci and this reduces fitness. At present, the large body of experimental evidence seems to indicate that the bulk of ID is attributable to deleterious mutations (Charlesworth & Charlesworth, 1987; Carr & Dudash, 2003).

A second major objective has been to explain variability in ID at different levels of biological organization: among species, among populations within a species, and among families within a population. At each of these levels, researchers have attempted to determine whether ID is associated with the rate of self-fertilization. Theoretical models predict that outbreeding populations should exhibit greater ID than habitually selfing populations (Lande & Schemske, 1985). Comparisons among species are roughly consistent with this prediction: there is a negative correlation between ID and the estimated rate of self-fertilization for a species (Husband & Schemske, 1996). There is some indication of a similar trend in comparisons among populations within a species (e.g. Holtsford & Ellstrand, 1990), although a number of surveys have failed to detect a correlation (Byers & Waller, 1999).

Family level ID is estimated from the difference in mean fitness between inbred and outbred individuals within the same family, oftentimes the progeny of a single maternal plant. Variation In Family Level Inbreeding Depression (hereafter VIFLID) is the variance in this difference among families of an experimental population. Significant VIFLID has been demonstrated within a variety of natural plant populations (e.g. Kalisz, 1989; Agren & Schemske, 1993; Norman et al., 1995; Parker et al., 1995; Mutikainen & Delph, 1998; Koelewijn et al., 1999; Jarne et al., 2000; Carr et al., 2003). Several studies have further documented a correlation between family level ID and traits related to the rate of self-fertilization (Chang & Rausher, 1999; Vogler et al., 1999; Takebayashi & Delph, 2000; Stone & Motten, 2002). However, a comparable number of experiments have failed to detect any such association (Carr et al., 1997; Fishman, 2001; Rao et al., 2002).
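As a minimal sketch of the calculation just described, the snippet below computes family level ID as the difference between outbred and inbred subfamily means, and then the variance of that difference among families. The family labels and fitness values are hypothetical. Note that the raw variance of observed differences also contains sampling error, so it is not by itself an estimate of the genetic VIFLID.

```python
from statistics import mean, variance

# Hypothetical fitness measurements (e.g. flower number) for three families.
families = {
    "fam1": {"outbred": [10.0, 12.0], "inbred": [8.0, 6.0]},
    "fam2": {"outbred": [9.0, 11.0], "inbred": [9.0, 9.0]},
    "fam3": {"outbred": [14.0, 12.0], "inbred": [7.0, 9.0]},
}

def family_level_id(data):
    """Outbred subfamily mean minus inbred subfamily mean, per family."""
    return {name: mean(sub["outbred"]) - mean(sub["inbred"])
            for name, sub in data.items()}

def observed_viflid(data):
    """Sample variance among families of the observed family level ID."""
    return variance(list(family_level_id(data).values()))

print(family_level_id(families))
print(observed_viflid(families))
```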

The estimation of VIFLID is directly motivated by theoretical models of mating system evolution (reviewed by Uyenoyama et al., 1993). If ID is treated as a fixed characteristic of a population, theory predicts that a mutation increasing the rate of self-fertilization will be favoured only if the population-level ID is < 0.5 (Kimura, 1959; Lloyd, 1979). However, models explicitly characterizing the ‘coevolution’ of mating system with ID suggest that selfing may evolve even in populations with very high ID. This occurs when selfing alleles become fixed within lineages (families) that have relatively low ID (Campbell, 1986; Holsinger, 1988). More generally, these coevolutionary models exhibit a diverse array of outcomes contingent on the specific nature of genetic variation in fitness (the genetic basis of ID, among other factors). Selection and inbreeding combine to generate complicated associations among loci affecting fitness, and also between fitness loci and loci that affect the mating system. These inter-locus associations can substantially impact whether a population evolves to reproduce by outcrossing, by selfing, or by a mixture of the two (Uyenoyama et al., 1993).
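The 0.5 threshold can be recovered from a standard gene-counting heuristic, sketched here under assumptions that are not spelled out in the text: a rare variant that selfs all of its ovules, no reduction in its pollen export, and selfed progeny with relative fitness 1 − δ, where δ denotes population-level ID. An outcrossing plant transmits one haploid genome through each of its own seeds and, on average, one through pollen; the selfing variant transmits two through each selfed seed and still one through pollen.

```latex
\underbrace{2(1-\delta)}_{\text{selfed seed}} + \underbrace{1}_{\text{pollen}}
\;>\;
\underbrace{1}_{\text{outcrossed seed}} + \underbrace{1}_{\text{pollen}}
\quad\Longleftrightarrow\quad
\delta < \tfrac{1}{2}
```

This inequality treats δ as a fixed property of the population, which is exactly the assumption that the coevolutionary models relax.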

Unfortunately, there is no simple relationship between the magnitude of VIFLID and the various predictions of mating system theories. Almost any kind of genetic variation will generate VIFLID. A single locus with additive effects on fitness will not contribute to population-level ID, but will contribute to the difference between inbred and outbred mean values within families and thus to VIFLID (Shultz & Willis, 1995). The same is true of a single locus with variation that corresponds to the deleterious mutation model. With overdominance, it is possible to have zero VIFLID, although only under very limited conditions. Substantial VIFLID is obtained if these single-locus models are extended to multiple loci, even if these loci are in linkage equilibrium. The key point is that VIFLID does not require any of the inter-locus associations predicted by mating system models. Where such associations are likely to exist, we require a more sophisticated characterization of variation to discern their contribution.

The purpose of this essay is to consider the issue of family level ID from a quantitative genetic perspective. Quantitative genetics provides a description of variability based directly on Mendelian transmission of genetic materials and a means to disentangle the various factors contributing to VIFLID. The experiments typically used to estimate family level ID are really just an unusual sort of breeding design, not unlike those employed by quantitative geneticists to estimate trait heritabilities and genetic correlations (Lynch & Walsh, 1998). In the following sections, the family structured ID experiment and the standard method of data analysis are described. While the standard method addresses the basic question of whether a population exhibits VIFLID, it does not effectively characterize the pattern of variation in a family structured experiment. Quantitative genetic statistics, i.e. variance components, yield a more accurate description of genetic variation in fitness components. The variance component approach also provides a necessary ‘basis of comparison’ for evaluating the contribution of the various inter-locus associations predicted by mating system models.

The family structured inbreeding depression experiment

Inbreeding depression is typically estimated from the difference in mean fitness between two populations of plants, one group outbred and the other inbred. Each of these populations is created by experimentally crossing and self-fertilizing a collection of parental plants, with these parental plants randomly selected from the natural population (or from an experimental population constituted from the natural population). A typical experiment is depicted in Fig. 1, where 40 parental plants are selected and randomly paired. The first parent in each pair is self-fertilized to produce one set of progeny (denoted Inbred subfamily in Fig. 1) and crossed to the second parent to produce an additional set of progeny (denoted Outbred subfamily). Progeny from both types of subfamily are grown simultaneously and measured for a range of traits related to fitness (e.g. germination, survival, number of flowers produced, pollen produced per flower, ovules per flower). The full set of progeny from each pair of parental plants (both outbred and inbred) constitutes a family in this design.

Figure 1.

A diagram of the family structured experiment (filled circles, individual plants; ovals, subfamilies; rectangles, replicated families. See text for additional details).

A two-way (factorial) Analysis of Variance (anova) is generally employed for the statistical analysis of these data (Johnston & Schoen, 1994). It is a ‘mixed’ model with family as a random factor and pollination treatment (outcrossed vs. selfed) as a fixed effect. Let Yijk denote the fitness estimate for the k'th individual within family i. The subscript j denotes whether the individual is outbred (j = 0) or inbred (j = 1). Individual fitness measurements are fitted to the following model:

$$Y_{ijk} = \mu + A_i + \beta_j + C_{ij} + e_{ijk} \qquad \text{(Eqn 1)}$$

where µ is the grand mean, Ai is the effect of family, βj is the effect of inbreeding (pollination treatment), Cij is the interaction of family and pollination treatment, and eijk is the residual error. It is generally assumed that each of the random effects (Ai, Cij, and eijk) is normally distributed with mean zero and a specified variance, i.e. Ai ∼ Normal[0, $\sigma^2_A$], Cij ∼ Normal[0, $\sigma^2_C$], eijk ∼ Normal[0, $\sigma^2_e$]. For each type of effect (family, interaction, or error), it is assumed that individual values are independent and identically distributed.

A significant effect of the pollination treatment (in the hypothesis testing component of anova) indicates that the overall mean of outbred plants is different from the overall mean of inbred plants, when appropriately averaged across families. Hence, this is a test for non-zero population-level ID. A significant effect of family (family variance $\sigma^2_A$ > 0) indicates that mean fitness differs among families (when appropriately averaged over inbred and outbred individuals). Finally, the anova quantity that summarizes VIFLID is the ‘interaction variance’, $\sigma^2_C$ (Sokal & Rohlf, 2000). A significant interaction between family and pollination treatment ($\sigma^2_C$ > 0) indicates that the difference in mean fitness between inbred and outbred subfamilies varies among families.

Since we expect the null hypothesis of zero interaction variance ($\sigma^2_C$ = 0) to be false whenever there is genetic variation, one might hope that the estimated value of $\sigma^2_C$ would inform questions about the genetic basis of VIFLID. Can we conclude that inter-locus associations contribute to variation in fitness if $\sigma^2_C$ exceeds some critical threshold value? Unfortunately, this does not seem to be the case. The standard two-way anova does not provide a sufficiently detailed description of variation to allow inferences about the genetic basis of VIFLID. Inbreeding affects not only the mean values of families, but also the distribution of variation within and among families (Harris, 1964; Gallais, 1977). As discussed in greater detail below, it is very likely that the variance among inbred subfamilies will be different from the variance among outbred subfamilies. The anova model (Eqn 1) implicitly assumes that these variances are the same.

To illustrate, consider the kinds of comparisons that can be made in the experiment of Fig. 1. There are three different comparisons among relatives: COO is the covariance between individuals within outbred subfamilies, CSS is the covariance between individuals within inbred subfamilies, and COS is the covariance between outbred and inbred individuals within the same family. Here, it may be useful to note that covariances within groups are often equivalent to variances among group means. In this case, COO is equal to the variance among the true means of outbred subfamilies and CSS is equal to the variance among true means of inbred subfamilies.

The anova model characterizes this pattern of variation in terms of only two parameters, the family variance $\sigma^2_A$ and the interaction variance $\sigma^2_C$. Eqn 1 implies that COO = CSS = $\sigma^2_A + \sigma^2_C$ and COS = $\sigma^2_A$. Thus, variability is accurately characterized by the anova model only if CSS = COO.
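The covariances implied by Eqn 1 can be written out explicitly. The derivation below is a sketch that relies only on the independence assumptions stated for Eqn 1; here $\sigma^2_A$ and $\sigma^2_C$ denote the variances of the family effect Ai and the interaction effect Cij, and k ≠ k′ index different individuals.

```latex
\begin{aligned}
C_{OO} &= \operatorname{Cov}(Y_{i0k}, Y_{i0k'}) = \operatorname{Var}(A_i) + \operatorname{Var}(C_{i0}) = \sigma^2_A + \sigma^2_C \\
C_{SS} &= \operatorname{Cov}(Y_{i1k}, Y_{i1k'}) = \operatorname{Var}(A_i) + \operatorname{Var}(C_{i1}) = \sigma^2_A + \sigma^2_C \\
C_{OS} &= \operatorname{Cov}(Y_{i0k}, Y_{i1k'}) = \operatorname{Var}(A_i) = \sigma^2_A \\
\operatorname{Var}(\text{true outbred mean} - \text{true inbred mean}) &= C_{OO} + C_{SS} - 2\,C_{OS} = 2\sigma^2_C
\end{aligned}
```

The expression COO + CSS − 2COS in the last line holds for any values of the three covariances; only the final equality depends on Eqn 1.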

A couple of points in the preceding argument merit clarification. First, the fact that CSS may not equal COO in practice does not invalidate the interaction test of anova. If the null hypothesis is correct (and VIFLID is absent), then CSS must equal COO. Distinguishing these components becomes important for understanding the nature of VIFLID when it is present. Second, the assumption that CSS = COO is distinct from the assumption of ‘equal variance within groups’, i.e. homoscedasticity. Homoscedasticity is a standard assumption of anova and investigators generally test the validity of this assumption before proceeding with the full analysis (Sokal & Rohlf, 2000). However, even when variances within subfamilies (both outbred and inbred) are approximately equal, this does not ensure that CSS = COO. The latter holds only if the variance among outbred subfamily means is equal to the variance among inbred subfamily means.
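The distinction between homoscedasticity and CSS = COO can be made concrete with a small simulation. The sketch below assumes hypothetical values for COO, CSS and COS, gives every individual the same residual variance regardless of pollination treatment, and then estimates the three covariances from pairs of relatives. NumPy is assumed to be available, and all parameter values are illustrative rather than estimates from any study.

```python
import numpy as np

def simulate_covariances(c_oo=1.0, c_ss=3.0, c_os=0.5, resid_var=1.0,
                         n_families=20000, seed=1):
    """Simulate the design of Fig. 1 and estimate C_OO, C_SS and C_OS."""
    rng = np.random.default_rng(seed)
    # True (outbred, inbred) subfamily means for each family.
    cov = np.array([[c_oo, c_os], [c_os, c_ss]])
    means = rng.multivariate_normal([0.0, 0.0], cov, size=n_families)
    # Two individuals per subfamily; residual variance is the same in both
    # treatments, so variances within subfamilies are homoscedastic.
    noise = rng.normal(0.0, np.sqrt(resid_var), size=(n_families, 4))
    out1, out2 = means[:, 0] + noise[:, 0], means[:, 0] + noise[:, 1]
    inb1, inb2 = means[:, 1] + noise[:, 2], means[:, 1] + noise[:, 3]
    return {
        "C_OO": np.cov(out1, out2)[0, 1],  # outbred sibs
        "C_SS": np.cov(inb1, inb2)[0, 1],  # inbred sibs
        "C_OS": np.cov(out1, inb1)[0, 1],  # outbred vs. inbred, same family
    }

estimates = simulate_covariances()
print(estimates)
```

In this sketch the within-subfamily variance is identical for outbred and inbred progeny, yet the estimated CSS is roughly three times COO, because the two covariances are set by the variances among subfamily means.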

VIFLID and the nature of genetic variation in fitness

Why is it important to distinguish the covariances of Fig. 1? One reason is that different models of ID have distinctly different implications for CSS, COO, and COS (Figs 2 and 3). The quantitative values of these covariances depend on the number of loci contributing to variation, on the frequencies, dominance coefficients, and relative fitness effects of alternative alleles at these loci, on how different loci combine to determine fitness, and on whether there are genetic associations among loci (Lynch & Walsh, 1998). They also depend on whether the parents used to found experimental families are inbred or outbred. However, over a broad range of conditions, CSS is likely to be substantially greater than COO if ID is caused by rare, deleterious alleles (Kelly, 1999, 2003; Charlesworth & Hughes, 2000). This is simply because deleterious alleles will generally be rare. Rare alleles occur almost exclusively in heterozygotes within outbred subfamilies, but frequently in homozygotes within inbred subfamilies. Because deleterious alleles should be at least partially recessive, we expect them to make a far more pronounced contribution to variation among inbred families (as homozygotes) than among outbred families (as heterozygotes).

Figure 2.

The value of CSS/COO is given as a function of the dominance coefficient of the deleterious allele (note the logarithmic scale of the y-axis). Separate functions are given for outbred parents (inbreeding coefficient F = 0, open squares) and fully inbred parents (F = 1, closed circles). The calculations assume that ID is caused by an arbitrary number of loci, each polymorphic for a ‘wild-type’ allele with a population frequency of 0.99 and a deleterious allele with a population frequency of 0.01.

Figure 3.

The value of CSS/COO is given as a function of the frequency of allele A2 for a locus exhibiting overdominance. Separate functions are given for outbred parents (F = 0, open symbols), fully inbred parents (F = 1, filled symbols), symmetric overdominance (round symbols), and asymmetric overdominance (square symbols). The symmetric case assumes that the fitnesses of the genotypes A1A1, A1A2 and A2A2 are 0.8, 1.0 and 0.8, respectively. The asymmetric case assumes that the fitnesses are 1.0, 1.2 and 0.8, respectively.

Figure 2 illustrates a typical case of ID caused by deleterious mutations. Using formulae in Cockerham & Weir (1984), a distinct set of values for CSS and COO were calculated for experiments in which parents are outbred (e.g. Koelewijn et al., 1999) and when they are fully inbred (e.g. Agren & Schemske, 1993). The ratio of CSS to COO is given as a function of h, the dominance coefficient of deleterious alleles (Hartl & Clark, 1989). The dominance coefficient seems to be quite low for strongly deleterious alleles, with estimates of h in the range of 0.01–0.02 (Crow, 1993). Mildly detrimental alleles typically exhibit greater (relative) expression in heterozygotes and estimates of h fall mostly in the range 0.1–0.3 (Willis, 1999). If h = 0.2, CSS is expected to be about 5- to 10-fold greater than COO, depending on whether parents are inbred or outbred (Fig. 2).
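A calculation of this kind can be sketched by direct enumeration for a single locus. The code below does not use the Cockerham & Weir (1984) formulae; it assumes genotypic fitnesses of 1, 1 − hs and 1 − s, parents drawn from a population with inbreeding coefficient F, and the crossing scheme of Fig. 1. It treats CSS and COO as the variances among the true means of inbred and outbred subfamilies. If effects are additive across loci in linkage equilibrium, the ratio for one locus carries over to any number of identical loci. All function names are hypothetical.

```python
from itertools import product

# Probability that a parent of genotype AA, Aa, aa transmits the deleterious allele a.
TRANSMIT = (0.0, 0.5, 1.0)

def parent_freqs(q, F):
    """Frequencies of AA, Aa, aa among parents with inbreeding coefficient F."""
    p = 1.0 - q
    return (p * p + F * p * q, 2.0 * p * q * (1.0 - F), q * q + F * p * q)

def progeny_mean(t1, t2, w):
    """Expected progeny fitness when the two gametes carry a with prob. t1, t2."""
    probs = ((1 - t1) * (1 - t2), t1 * (1 - t2) + (1 - t1) * t2, t1 * t2)
    return sum(pr * wi for pr, wi in zip(probs, w))

def weighted_variance(values, weights):
    m = sum(v * k for v, k in zip(values, weights))
    return sum(k * (v - m) ** 2 for v, k in zip(values, weights))

def css_over_coo(h, q=0.01, s=1.0, F=0.0):
    w = (1.0, 1.0 - h * s, 1.0 - s)
    f = parent_freqs(q, F)
    # Inbred subfamilies: each parent is selfed.
    css = weighted_variance([progeny_mean(t, t, w) for t in TRANSMIT], f)
    # Outbred subfamilies: two parents sampled independently and crossed.
    pairs = list(product(range(3), repeat=2))
    coo = weighted_variance(
        [progeny_mean(TRANSMIT[i], TRANSMIT[j], w) for i, j in pairs],
        [f[i] * f[j] for i, j in pairs],
    )
    return css / coo

for h in (0.02, 0.1, 0.2, 0.3):
    print(h, css_over_coo(h, F=0.0), css_over_coo(h, F=1.0))
```

Under these assumptions the ratio does not depend on s, and with h = 0.2 and q = 0.01 the enumeration gives a ratio of roughly 6 for outbred parents and roughly 11 for fully inbred parents.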

The ratio of CSS to COO is substantially different if ID is caused by overdominance. Consider a locus with two alleles, A1 and A2, with population frequencies p and q. In contrast to the deleterious mutation model, here we expect the frequencies of alternative alleles to be intermediate. If overdominance is symmetric, such that the fitnesses of the alternative homozygotes are equal but lower than that of the heterozygote, COO will be greater than or equal to CSS (round symbols in Fig. 3). When overdominance is asymmetric, such that the fitness of one homozygote is greater than the other but still lower than the heterozygote, COO may be greater than or less than CSS (square symbols in Fig. 3). However, we do not expect the large difference between CSS and COO predicted by the deleterious mutation model (Fig. 2).
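The overdominance case can be explored with the same kind of single-locus enumeration. The sketch below uses the genotypic fitnesses given in the legend of Fig. 3 and treats CSS and COO as the variances among true subfamily means in the design of Fig. 1. NumPy is assumed to be available, the function name is hypothetical, and the code is an illustration under these assumptions rather than the procedure used to draw the figure.

```python
import numpy as np

def css_coo(w, q, F=0.0):
    """Return (C_SS, C_OO) for one locus.

    w : fitnesses of A1A1, A1A2, A2A2
    q : frequency of allele A2
    F : inbreeding coefficient of the parents
    """
    w = np.asarray(w, dtype=float)
    p = 1.0 - q
    f = np.array([p * p + F * p * q, 2 * p * q * (1 - F), q * q + F * p * q])
    t = np.array([0.0, 0.5, 1.0])  # prob. each parental genotype transmits A2

    def mean_fitness(t1, t2):
        return ((1 - t1) * (1 - t2) * w[0]
                + (t1 * (1 - t2) + (1 - t1) * t2) * w[1]
                + t1 * t2 * w[2])

    def wvar(values, weights):
        m = np.sum(weights * values)
        return np.sum(weights * (values - m) ** 2)

    css = wvar(mean_fitness(t, t), f)                                    # selfed parents
    coo = wvar(mean_fitness(t[:, None], t[None, :]), np.outer(f, f))    # crossed pairs
    return css, coo

symmetric = (0.8, 1.0, 0.8)
asymmetric = (1.0, 1.2, 0.8)
for q in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(q, css_coo(symmetric, q), css_coo(asymmetric, q))
```

With symmetric overdominance and outbred parents, the two covariances coincide at q = 0.5 and COO exceeds CSS at other frequencies in this sketch.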

Observational components and causal components

The preceding discussion indicates that the two leading ID models yield distinct predictions regarding the relative values for COO and CSS. This is a compelling reason to distinguish these covariances in empirical studies. Not only is the genetic basis of inbreeding depression interesting in and of itself, but it also has important implications for mating system evolution. Evolutionary models predict wholly different outcomes for the coevolution between mating system modifiers and ID depending on whether the latter is caused by deleterious mutations or overdominance (Holsinger, 1988; Charlesworth & Charlesworth, 1990; Uyenoyama & Waller, 1991; Uyenoyama et al., 1993).

Quantities like COO, COS and CSS that summarize the phenotypic resemblance of relatives are known as ‘observational components’ in quantitative genetics (Falconer & Mackay, 1996). In simple family structured experiments, e.g. Fig. 1, observational components provide an accurate and succinct description of variation. However, experimentalists routinely employ crossing designs that are substantially more complicated (e.g. Carr et al., 1997; Shaw et al., 1998; Vogler et al., 1999; Jarne et al., 2000). The number of distinct comparisons among different types of relatives increases very rapidly with the complexity of the mating scheme. For example, Shaw et al. (1998) used a multigenerational breeding design to investigate inbreeding depression and genetic variation in the annual plant Nemophila menziesii. The final generation of their experiment involved nine distinct subfamilies within each extended family. Comparing individuals within and among subfamilies yields 45 observational components (nine comparisons within subfamilies and 36 among subfamilies). While some of these comparisons are redundant from a genetic point of view, this calculation ignores comparisons with previous generations of their design and is thus an underestimate for the number of distinct comparisons! Estimating, and then interpreting estimates, for over 50 observational components within a single experiment certainly seems a daunting task.
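The count of observational components follows from simple combinatorics: one comparison within each subfamily plus one for each unordered pair of subfamilies. A short sketch (the function name is hypothetical):

```python
from math import comb

def n_observational_components(n_subfamilies):
    """Return (within, among, total) comparisons for a set of subfamilies."""
    within = n_subfamilies
    among = comb(n_subfamilies, 2)
    return within, among, within + among

print(n_observational_components(2))  # design of Fig. 1: COO, CSS and COS
print(n_observational_components(9))  # nine subfamilies per extended family
```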

The ‘causal components’ of quantitative genetics (Falconer & Mackay, 1996) provide a means to synthesize the data contained within complex breeding designs. Causal components summarize variation in phenotype attributable to both environmental and genetic causes. The environmental components include the familiar environmental variance, as well as the variance in maternal effects. In a random mating population, the standard genetic components are the additive variance (Va) and dominance variance (Vd), although other terms can be included to characterize epistasis and genotype-by-environment interaction (Lynch & Walsh, 1998; Falconer & Mackay, 1996). With inbreeding, the genetic variance depends not only on Va and Vd, but also on several ‘inbreeding components’ (Cockerham, 1983; Shaw et al., 1998). These include the covariance of additive and dominance effects (denoted Cad), the inbreeding dominance variance (Vdi), and the sum of squared ID at individual loci (H*). If there are only two alleles per locus, the number of terms is reduced by one because H* = Vd (Cockerham & Weir, 1984).

There are at least four important reasons why future ID studies should endeavour to estimate causal components of variation in fitness related traits. First, the myriad of distinct comparisons contained within complex breeding designs can be succinctly summarized with these components. If we ignore epistasis, the genetic covariances among outbred relatives can be expressed as a function of only two components, Va and Vd (Lynch & Walsh, 1998). With inbreeding, three other terms can contribute (Cad, Vdi, and H*), but this still represents a tremendous reduction in the number of parameters to estimate in complex designs (relative to the number of observational components). In the Shaw et al. (1998) study of Nemophila menziesii noted earlier, only 2–3 genetic causal components proved sufficient to describe variation, despite the enormous number of distinct comparisons.
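For outbred relatives, the reduction to two components can be illustrated with the standard coefficients on Va and Vd for a few common relationships, ignoring epistasis and environmental sources of resemblance. The sketch below is illustrative only; the numerical values of Va and Vd are hypothetical, and relatives produced by inbreeding would require the additional components (Cad, Vdi, H*), which are not modelled here.

```python
# (coefficient on Va, coefficient on Vd) for non-inbred relatives, no epistasis.
COEFFICIENTS = {
    "parent-offspring": (0.5, 0.0),
    "half sibs": (0.25, 0.0),
    "full sibs": (0.5, 0.25),
}

def genetic_covariance(relationship, va, vd):
    """Expected genetic covariance between outbred relatives."""
    a, d = COEFFICIENTS[relationship]
    return a * va + d * vd

va, vd = 2.0, 4.0  # hypothetical variance components
for rel in COEFFICIENTS:
    print(rel, genetic_covariance(rel, va, vd))
```

With outbred parents, the members of an outbred subfamily in Fig. 1 are full sibs, so the genetic part of COO would correspond to the last entry.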

A second advantage of the quantitative genetic perspective concerns environmental sources of resemblance among relatives. Maternal effects are an important source of variation in many plant species (Roach & Wulff, 1987; Vogler et al., 1999) and can cause a phenotypic covariance among relatives even when there is no genetic variation. For this reason, quantitative genetic breeding designs often include specific features to separate maternal effects from genetic causes of resemblance. These features include reciprocal crosses and relationships that are limited to male ancestry (Lynch & Walsh, 1998). This is relevant to studies of VIFLID because maternal effects are absorbed into the anova parameters (the family variance $\sigma^2_A$ and the interaction variance $\sigma^2_C$) that characterize variation within and among families and subfamilies. This is another factor hindering genetic interpretation of these quantities.

A third advantage of the causal components is that estimates are comparable across studies. Genetic components like Va and Vdi depend on allele frequencies and genotypic effects. In contrast, the observational components depend not only on these two factors, but also on whether the parents are inbred (note the differences in Figs 2 and 3). As a consequence, estimates for quantities like CSS are likely to differ between experiments using different crossing designs, even if the genetic architecture of fitness traits is the same. Differences in experimental design are accounted for in the procedures used to estimate causal components.

Finally, estimation of causal components (Va, Cad, and Vdi) should allow more sophisticated tests regarding the nature of genetic variation in fitness (Kelly, 1999). The different predicted values for COO and CSS in Figs 2 and 3 reflect the differing contributions of these causal components. In addition, while most attention has focused on the idea that ID is caused either by deleterious mutations or by overdominance or by some combination of the two, this dichotomy need not fully characterize genetic variation in fitness. A number of selective mechanisms, including frequency-dependence, spatial or temporal variation, antagonistic pleiotropy, and genotype-by-environment interaction, might act to maintain substantial genetic variation in fitness (Clausen et al., 1940; Rose, 1982; Gillespie & Turelli, 1989; Subramaniam & Rausher, 2000). A contribution from these other mechanisms might be discerned from estimates of the causal components, particularly the magnitude of Va relative to the inbreeding components. Recent studies of both Drosophila melanogaster and Mimulus guttatus suggest that neither standard ID model (deleterious mutations or overdominance) is sufficient to explain variation in some fitness related traits (Charlesworth & Hughes, 2000; Kelly, 2003).

Parameter estimation and testing for genetic associations

The discussion thus far has focused on the different sorts of models that can be applied to data from family structured studies of inbreeding depression. A distinct issue is how a particular model can be fitted to the data, i.e. the algorithm used for parameter estimation. Least squares methods are typically used in fitting the anova model (Eqn 1), at least in botanical studies. However, maximum likelihood has a number of favourable features when estimating multiple variance components, particularly when sample sizes vary substantially among families (Searle et al., 1992). Likelihood algorithms suitable to breeding designs (or pedigrees) that contain inbred individuals have recently been developed (Shaw et al., 1998; Abney et al., 2000; Kelly & Arathi, 2003).

Likelihood methods also lend themselves to hypothesis testing. Considering the design of Fig. 1, a simple test would determine whether the standard anova model, in which COO = CSS, is sufficient to explain the data. First, one would obtain the maximum likelihood value for the data when fitting a model in which COO = CSS. This number is then compared to the maximum likelihood without that constraint (where COO is not held equal to CSS). The likelihood for the unconstrained model must be greater than or equal to the likelihood for the constrained model since the latter is a special case of the former. However, if the null hypothesis that COO = CSS is untrue, we expect the likelihood value for the unconstrained model to be substantially greater. This comparison of likelihood values is formalized as a likelihood ratio test (Kendall & Stuart, 1973; see Burnham & Anderson, 2002 for an alternative treatment of likelihood values).
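The final step of such a test can be sketched as follows. The code assumes that the two maximized log-likelihoods have already been obtained from some model-fitting routine (not shown), that the constraint COO = CSS removes exactly one free parameter, and that the usual large-sample chi-square approximation with one degree of freedom applies. The log-likelihood values in the example are hypothetical.

```python
from math import erfc, sqrt

def likelihood_ratio_test(loglik_constrained, loglik_unconstrained):
    """Return (test statistic, approximate P-value) for one constrained parameter."""
    stat = 2.0 * (loglik_unconstrained - loglik_constrained)
    stat = max(stat, 0.0)  # guard against small negative values from numerical error
    # Upper tail probability of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical maximized log-likelihoods for the two models.
stat, p_value = likelihood_ratio_test(-100.0, -98.0)
print(stat, p_value)
```

A small P-value would indicate that allowing COO and CSS to differ improves the fit by more than expected by chance under the constrained model.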

A second interesting hypothesis test concerns the magnitude of the inter-locus associations predicted by mating system models. The trajectories of Figs 2 and 3 are essentially single locus results that can be extended to an arbitrary number of loci in linkage equilibrium. The assumption of linkage equilibrium is noteworthy given that inbreeding allows associations of two different sorts to develop among loci. The first type, linkage disequilibrium, characterizes associations between alleles at different loci (Hartl & Clark, 1989). Variable histories of inbreeding might reduce the deleterious mutation load within some families relative to others and thus yield positive linkage disequilibria among loci harbouring deleterious alleles. The second sort of association, identity disequilibrium, refers to the probability that different loci are jointly heterozygous (Hartl & Clark, 1989). Positive identity disequilibrium, in which heterozygosity is positively correlated across loci, is inevitable in a mixed mating population because individuals vary in the extent to which they are inbred. Outbred individuals are more likely to be heterozygous at loci across the genome than inbred individuals.

How can we determine the contribution of these inter-locus associations to ID and the genetic variance in fitness? This is a difficult question given the range of possibilities regarding the amount and nature of VIFLID without any association among loci (Figs 2 and 3). One approach follows from the fact that both linkage and identity disequilibrium can be greatly reduced by experimental crosses. In contrast, the genotypic effects and population allele frequencies that define the genetic components can be maintained largely unaltered in an appropriately designed experiment. As a consequence, it should be possible to experimentally synthesize populations with and without the hypothesized inter-locus associations. The classic studies demonstrating the contribution of associative overdominance, which is a consequence of linkage disequilibrium, to heterosis in corn provide a model for comparable studies in plant mating system evolution (Gardner, 1963; Moll et al., 1964; see Conner, 2002 for a recent application of this approach).

A second method is to consider the linkage equilibrium model as the null hypothesis and then determine if this model is sufficient to explain variation. The linkage equilibrium model is ‘nested within’ the more general model that allows disequilibria. Thus, a likelihood ratio test is possible if the number of distinct comparisons among relatives within the experiment exceeds the number of parameters to be estimated (e.g. Kelly, 2003). However, in order for such a test to have any real power, linkage and/or identity disequilibria need to have a clear signature in the pattern of resemblance among relatives, inflating some covariances more than others. Weir & Cockerham (1977) provide a detailed partitioning of genetic variation that allows linkage disequilibrium, but this partitioning has not been translated into empirical predictions. An important objective for future theoretical studies is to provide a clearer set of testable predictions for the various models of mating system evolution.

In conclusion, the VIFLID observed in experimental studies can result from a number of distinct genetic causes, and in real populations, probably often reflects a mixture of different causes. The standard two-way anova provides a valid test for the existence of VIFLID, but does not effectively characterize variation when VIFLID is present. Quantitative genetic methods provide a means to extract more information from ID experiments. Observational components such as COO and CSS accurately characterize resemblances within simple experimental designs (e.g. Fig. 1) and provide preliminary information regarding the genetic basis of ID (Figs 2 and 3). Designs with greater internal complexity allow the estimation of environmental and genetic causal components. Causal components provide a more informative description of variation and should ultimately allow tests for the inter-locus associations predicted by mating system models.


I would first like to apologize to the many botanists whose studies of inbreeding depression I did not reference due to citation limits. I would like to thank B. Obbo, J. Ward, L. Holeski, L. Villafuerte, M. Rausher, and two anonymous referees for careful review of the manuscript. I gratefully acknowledge the support of NIH grant 1 R01G60792-01A1, the Murphy Scholarship fund, and an NSF Epscor grant to the University of Kansas.