How to use molecular marker data to measure evolutionary parameters in wild populations


Dany Garant, Fax: (+44) 1865 271168; E-mail:


Estimating the genetic basis of phenotypic traits and the selection pressures acting on them are central to our understanding of the evolution and conservation of wild populations. However, obtaining such evolutionary-related parameters is not an easy task as it requires accurate information on both relatedness among individuals and their breeding success. Polymorphic molecular markers are very useful in estimating relatedness between individuals and parentage analyses are now extensively used in most taxa. The next step in the application of molecular data to wild populations is to use them to derive estimates of evolutionary-related parameters for quantitative traits, such as quantitative genetic parameters (e.g. heritability, genetic correlations) and measures of selection (e.g. selection gradients). Despite their great appeal and potential, the optimal use of molecular tools is still debated and it remains unclear how they should best be used to obtain reliable estimates of evolutionary parameters in the wild. Here, we review the methods available for estimating quantitative genetic and selection parameters and discuss their merits and shortcomings, to provide a tool that summarizes the potential uses of molecular data to obtain such parameters in wild populations.


Our understanding of the evolutionary dynamics and conservation of a population relies on the determination of quantitative genetic and selection parameters. Estimates of the selection intensity and the quantitative genetic architecture of phenotypic traits together indicate the ability of a population to respond to selection and thus to evolve (Lande 1982; Falconer & Mackay 1996). At the same time, comparison of quantitative genetic parameters such as the variance components of a trait among populations can prove useful in conservation when attempting to estimate levels of genetic variability (Storfer 1996). However, obtaining such evolutionary parameters from wild populations is not an easy task. One of the main problems is that the traditional techniques available to estimate variance and covariance components require accurate information about the relationships among individuals (we use relationships in the sense of pedigree information; Falconer & Mackay 1996; Lynch & Walsh 1998). In special cases, pedigrees can be determined from observations of mating activities, but this usually requires long-term intensive sampling (see Merilä et al. 2001; Kruuk et al. 2002; Garant et al. 2004) and still may not be entirely reliable. In most studies of natural populations, relatedness information is generally absent.

In the past decade, the extensive development and application of highly polymorphic molecular markers (especially microsatellites) to wild populations has proven highly valuable, particularly in the fields of population and conservation genetics (Haig 1998). Such markers have also proved to be efficient in kinship and parentage analyses given their power to infer relationships between individuals when part, or all, of the relatedness information between individuals is missing (Hughes 1998; Avise et al. 2002). With such information and techniques now being routinely applied and parentage analyses being extensively conducted in many species (see Jones & Ardren 2003 for a review of the methods), the next step in the application of molecular data to wild populations is to provide estimates of evolutionary-related parameters including quantitative genetic features (e.g. heritability, genetic correlation among traits) and measures of selection (e.g. gradients).

First, relatedness information obtained from parentage analyses can be used to derive quantitative genetic estimates. Second, estimates of individual fitness, usually defined as an individual's contribution to the next generation, can be used to estimate selection. However, for both estimation procedures there are a number of ways in which the molecular data can be used, distinguished by the extent to which they require an explicit reconstruction of a pedigree for the population. In addition, the reliability of some of the methods is debatable and comparative studies of the performance of different approaches are rare (but see Thomas et al. 2002; Wilson et al. 2003b). The most suitable means of exploiting molecular markers to obtain reliable estimates of evolutionary-related parameters remains unclear.

Our goal here is to review the available methods for estimating quantitative genetic and selection parameters given varying amounts of pedigree information available a priori. In doing so we aim to provide a tool that summarizes the potential applicability of molecular data (microsatellites) to evolutionary studies of wild populations. As phenotypic traits of interest are likely to be influenced by multiple genes and environmental factors, we focus on methods that deal with quantitative traits. Therefore, approaches using marker data to infer changes in allele frequencies through selection are not discussed here (see Ford 2002 for a review). We provide examples of the application of each method, with a special emphasis on those studies that have compared alternative methods. Our aim therefore is not to review the different methods available to estimate relatedness or infer parentage (for recent reviews in this area see Blouin 2003; Jones & Ardren 2003), but rather to cover the subsequent investigation of the evolutionary dynamics of a population in which molecular data may be used to examine the quantitative genetics and selection on phenotypic traits.

Methods available to estimate quantitative genetic parameters in the wild


Causal components of variance and covariance in quantitative phenotypic traits, and other key quantitative genetic parameters (see Box 1) can be derived by comparing the phenotypic similarity between individuals with their relatedness. The latter may be assessed either directly from their similarity at marker loci, or via pedigree information. The potential use of molecular markers will depend to a large extent on the situation encountered in the wild and in particular, on the extent and reliability of pedigree information already available via other means. Here we briefly summarize the different methods available to estimate quantitative genetic (QG) parameters from molecular marker data, separating them into four broad categories (see also Fig. 1): (i) direct estimation of QG parameters with no assumptions about relatedness; (ii) direct estimation of QG parameters assuming known classes of relatedness; (iii) explicit reconstruction of sibling groups; and (iv) explicit reconstruction of full pedigree involving all types of relatedness.

Figure 1.

Flow chart representing how the information gathered could be used to estimate quantitative genetics parameters, with potential benefits (+) and shortcomings (–) of methods.

Direct estimation of QG parameters with no assumptions about relatedness: the ‘Ritland’ method (regression method)

Overview.  The first approach, proposed by Ritland (1996, 2000), uses method of moments estimators based on the quantification of pairwise relatedness between individuals (see Queller & Goodnight 1989; Lynch & Ritland 1999). At first sight, this method appears widely applicable and appealing as it is based on the determination of the simplest relationship between phenotype and genotype. Indeed, trait heritability is estimated from the covariance between pairwise phenotypic similarity and pairwise relatedness (Box 2). The approach therefore relies heavily on the efficiency of estimators of pairwise relatedness (see Van de Casteele et al. 2001 for a review), but does not require specification of an explicit pedigree or prior knowledge of population structure, other than maximizing the chance of getting significant actual variation in relatedness in the sample (see Box 2; see also Ritland 1996, 2000 for a detailed description).
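The regression idea described above can be illustrated with a minimal Python sketch. It is not Ritland's full estimator: the function names are hypothetical, relatedness is assumed to be known without error and expressed as the coefficient of relatedness (0.5 for full-sibs), so that under a purely additive model the slope of phenotypic similarity on relatedness equals the heritability. With marker-based estimates, the denominator would instead have to be the actual variance of relatedness, corrected for estimation error.

```python
from itertools import combinations


def phenotypic_similarity(y):
    """Return Z_ij = (y_i - mean)(y_j - mean) / var for every pair i < j."""
    n = len(y)
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / n  # population variance (assumption)
    return {(i, j): (y[i] - mean) * (y[j] - mean) / var
            for i, j in combinations(range(n), 2)}


def regression_heritability(z, r):
    """Slope of pairwise similarity on pairwise relatedness: Cov(Z, r) / Var(r).

    z and r are dicts keyed by the same pairs (i, j); r is assumed to be the
    coefficient of relatedness, known without error (illustrative assumption).
    """
    pairs = sorted(z)
    zs = [z[p] for p in pairs]
    rs = [r[p] for p in pairs]
    m = len(pairs)
    z_bar = sum(zs) / m
    r_bar = sum(rs) / m
    cov = sum((a - z_bar) * (b - r_bar) for a, b in zip(zs, rs)) / m
    var_r = sum((b - r_bar) ** 2 for b in rs) / m
    if var_r == 0:
        # Without variance in relatedness the parameter cannot be estimated.
        raise ValueError("no variance in relatedness: heritability is not estimable")
    return cov / var_r
```

The explicit error raised when the variance in relatedness is zero mirrors the practical limitation of the method: the estimate is a ratio whose denominator vanishes when sampled individuals are all equally (un)related.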

Examples of application.  The Ritland method was suggested to be well suited to studies of plants given that they are sedentary and show passive dispersal and it is also where it was first applied, with modest success (Ritland & Ritland 1996). In brief, a wild population of yellow monkeyflowers (Mimulus guttatus) for which data were available on 10 quantitative characters and 10 polymorphic allozyme loci, was studied by Ritland & Ritland (1996), who found substantial differences between heritability values obtained from the marker-based method and those obtained from laboratory estimates. Specifically, for many characters, the marker-based estimates of heritability in the field were higher than in the laboratory, and sometimes outside the true parameter space (> 1; Ritland & Ritland 1996). In contrast, a recent study on the same species failed to find significant heritability for similar characters (Van Kleunen & Ritland 2004). Furthermore, another study applying the same technique to estimate the heritability of phenolic compounds in a population of Turkey oak (Quercus laevis) showed that the method failed to estimate heritability as there was very low relatedness between individuals and insignificant variance in relatedness in the population (Klaper et al. 2001; see Box 2).

Methodological issues.  When further surveying the literature it is surprising to realize that, although the Ritland method was published almost 10 years ago (Ritland 1996), there are very few other examples in either plant or animal populations, perhaps indicating difficulties in its application (Table 1; see Postma et al. 2003). First, there seems to be a high sampling variance for all pairwise relatedness estimators (Van de Casteele et al. 2001; Belkhir et al. 2002). Moreover, the relative performance of different relatedness estimators may vary in such a way that there is no single best-performing estimator (Van de Casteele et al. 2001). Probably a more important problem, however, is the dependence of the estimates on the variance of relatedness in a population (see Box 2). On this point, it is unclear if one should use all potential pairwise comparisons available in the sample or if one should try to restrict such comparisons to the level where variance in relatedness has a better chance of being detected. Ritland & Ritland (1996), for example, used various distance cut-offs to choose the distance that maximized the variation in relatedness in their plant sample and thus increased their power to detect heritability. Furthermore, Klaper et al. (2001) suggested that an appropriate sampling design should increase the chance of getting a larger range of relatedness and more variation in this parameter, but this has not yet been demonstrated (see Postma et al. 2003). In any case, a significant actual variance in relatedness is required in the sample analysed, and for this to occur one needs to reliably sample a fairly high proportion of related individuals, which may be problematic in natural populations (see Klaper et al. 2001). Thus, the limited number of applications of this method is probably largely attributable to the low actual variance of relatedness found in most sampled populations. Therefore, despite its appeal, the utility of the Ritland method may be minimal.

Table 1.  Summary of studies comparing different methods applied to obtain quantitative genetic parameters in the wild with benchmark reference estimates

| Species | Environment | Parameters estimated | Ritland method | Pairwise likelihood | MCMC sibships reconstruction | Benchmark reference estimate used for comparison | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Soay sheep (Ovis aries) | Natural | Heritability | Unreliable | Unreliable | Similar but only when used in combination with known maternal-offspring relationships | Maternal information + reconstructed paternity and sibships | Thomas et al. (2002) |
| Rainbow trout (Oncorhynchus mykiss) | Controlled (aquaculture strain) | Heritability, genetic correlations | h2: unreliable; rG: similar | | h2: similar; rG: similar (underestimated) | Parentage exclusion | Wilson et al. (2003b) |
| Capricorn silvereye (Zosterops lateralis chlorocephalus) | Natural | Heritability, genetic correlations | h2: unreliable; rG: unreliable, but G-matrix comparison indicated similarity | | | Full-sib analysis from cross-fostering | Frentiu (2004) |
| Bighorn sheep (Ovis canadensis) | Natural | Heritability | Unreliable* | | | Maternal information + reconstructed paternity and sibships (as in Coltman et al. 2003) | D. W. Coltman (unpublished) |

* Standard errors were not available but values differed markedly (see Fig. 2).

Direct estimation of QG parameters assuming known classes of relatedness: maximum-likelihood approach

Overview.  If some prior information is available on population structure (a priori knowledge of the distribution of relatedness, e.g., all individuals are either full-sibs or unrelated), it is possible to use likelihood-based procedures to estimate quantitative genetics parameters (Thompson 1975). With this approach, pairs of individuals are placed into a predetermined population structure according to the probability of observing their genotype and phenotype (Herbinger et al. 1997; Painter 1997; Mousseau et al. 1998; Thomas et al. 2000). This is done by using the joint probability of observed phenotypic and genotypic data to determine likelihoods for assigning pairs to either full-sibs or unrelated groups; there is thus no explicit reconstruction of any form of pedigree and the method relies on assumptions about the distribution of relatedness in the sample.

In brief, individuals are first genotyped for molecular marker loci and scored for quantitative traits. Then, a maximum-likelihood procedure infers the relatedness between pairs of individuals (based on molecular marker data), but with the assumption of a mixture of unrelated and full-sib pairs only. Finally, the estimates of relatedness and data on quantitative traits are combined in a mixture model (as there are only two types of relatives) to infer quantitative genetic parameters (see Mousseau et al. 1998 for further statistical details).
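The marker-based part of this procedure can be sketched as follows. This is an illustration only, not the joint phenotype and genotype likelihood of Mousseau et al. (1998): it computes, for codominant markers, the probability of a pair of genotypes under the two assumed classes (full-sib or unrelated) and the resulting posterior probability of a full-sib relationship. All function names are hypothetical, and the sketch assumes Hardy-Weinberg proportions, unlinked loci, known allele frequencies and no genotyping error.

```python
def genotype_prob(g, freqs):
    """Hardy-Weinberg probability of an unordered genotype g = (a, b)."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def prob_given_one_ibd(g1, g2, freqs):
    """P(g2 | g1) when the pair shares exactly one allele identical by descent."""
    c, d = g2
    total = 0.0
    for shared in g1:  # each allele of g1 is the shared one with probability 1/2
        if c == d:
            total += 0.5 * (freqs[c] if shared == c else 0.0)
        else:
            total += 0.5 * ((freqs[d] if shared == c else 0.0)
                            + (freqs[c] if shared == d else 0.0))
    return total


def pair_likelihood(g1, g2, freqs, ibd):
    """Probability of a genotype pair given IBD-sharing probabilities (k0, k1, k2)."""
    k0, k1, k2 = ibd
    same = sorted(g1) == sorted(g2)
    return genotype_prob(g1, freqs) * (k0 * genotype_prob(g2, freqs)
                                       + k1 * prob_given_one_ibd(g1, g2, freqs)
                                       + k2 * (1.0 if same else 0.0))


UNRELATED = (1.0, 0.0, 0.0)   # never share alleles identical by descent
FULL_SIB = (0.25, 0.5, 0.25)  # share 0, 1 or 2 alleles identical by descent


def full_sib_posterior(pair_genotypes, locus_freqs, prior_full_sib):
    """Posterior probability that a pair is full-sib under the two-class mixture.

    pair_genotypes: one (g1, g2) tuple per locus; locus_freqs: one allele
    frequency dict per locus; prior_full_sib: assumed proportion of full-sib pairs.
    """
    lik_fs = lik_un = 1.0
    for (g1, g2), freqs in zip(pair_genotypes, locus_freqs):
        lik_fs *= pair_likelihood(g1, g2, freqs, FULL_SIB)
        lik_un *= pair_likelihood(g1, g2, freqs, UNRELATED)
    num = prior_full_sib * lik_fs
    return num / (num + (1.0 - prior_full_sib) * lik_un)
```

In the full method the phenotypic data enter the same mixture, so that pairs contribute to the estimate of genetic variance in proportion to their probability of being full-sibs rather than being assigned to a class outright.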

Examples of application.  Mousseau et al. (1998) specifically presented and tested this method using a captive population of Chinook salmon (Oncorhynchus tshawytscha). The heritability values they obtained for male early sexual maturity and colour were significant and within ranges commonly observed for these traits in salmonids (e.g. Gjerde & Gjedrem 1984; Heath et al. 1994; Wilson et al. 2003a). However, estimates obtained for body weight and length were not significant. Genetic correlations among traits were usually positive, but they did not differ significantly from zero. Despite this, the relative magnitude of the correlations was in agreement with theoretical expectations as size traits were highly positively correlated and other traits showed only intermediate levels of correlation (Mousseau et al. 1998).

Methodological issues.  The main difference between this procedure and the approach of Ritland (1996) is that in the present case there is an explicit assumption made about the distribution of relatedness (either full-sibs or unrelated), allowing maximum-likelihood methods to be used. The maximum-likelihood method therefore allows QG parameters to be estimated without a separate first stage of constructing a pedigree.

Simulation work has shown that the maximum-likelihood method returns lower errors around estimates of heritability than the regression method; however, they were still 50% higher than those obtained with reconstructed sibships (Thomas & Hill 2000; see next section). Apparently, only in cases of balanced populations containing two classes of relationship, with families weighted equally, does the method give estimates similar to those obtained from known pedigree information (Thomas et al. 2000). In wild populations such a scenario is very unlikely, and in most cases the prior relatedness distribution will also be unknown, which may explain why this method has not been broadly applied. Furthermore, because this approach has also received less empirical attention than others (see next section), it is unclear how deviations from the ideal conditions of two classes of relatedness might impact QG estimates.

Explicit reconstruction of sibling groups

Overview.  The third approach involves an explicit reconstruction of groups of a certain relatedness, which can then be used as pedigree information in a standard quantitative genetics analysis. In this case, the construction of a pedigree from genetic data without parental information is usually performed using a Markov chain Monte Carlo (MCMC) procedure (Hastings 1970) to reconstruct sibships within a single generation (see Thomas & Hill 2000; Smith et al. 2001; see Blouin 2003 for a review). The MCMC method allows improved parameter estimation through the weighting of families and uses more information than the regression method (Thomas & Hill 2000). In brief, the sibship reconstruction procedure generates probable sibships from the population sample using the marker data, aiming to reconstruct a number of groups with specific relationships rather than determining likely distances between each member in the sample (Thomas & Hill 2000). The method uses allele frequencies estimated from a given sample to provide likelihoods, or the allele frequencies can be iteratively re-estimated at each step as families are constructed (as in Thomas & Hill 2000). The number of possible partitions increases rapidly with the number of individuals included in the analysis; an MCMC procedure is thus used to sample from the distribution of likelihoods in order to identify the most likely partitions. Alternatively, other methods can partition individuals into full sibships without the need for information about population allele frequencies. Almudevar & Field (1999), for example, presented an algorithm that searches for all possible full-sibling groups that could have been produced by a single pair of parents and that assigns a score to each possible sibship that is a function of its probability given the parental genotypes; these scores are then used to find the most likely partition (based on likelihood principles; see Blouin 2003).

Whatever the method adopted to reconstruct sibships, the next step is then to build a relationship matrix suitable for use in either traditional quantitative genetics analyses such as a full-sibling analysis of variance (Falconer & Mackay 1996) or more complex analyses such as an animal model (Lynch & Walsh 1998; Kruuk 2004; see next section for details).
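As an illustration of the simplest of these analyses, the following Python sketch estimates heritability from a balanced one-way analysis of variance among reconstructed full-sibships. The function name and the data layout are hypothetical. The sketch assumes equal family sizes and takes heritability as twice the intraclass correlation, which holds only if dominance and common-environment effects are negligible; otherwise the value is an overestimate.

```python
def full_sib_heritability(families):
    """Estimate h2 = 2t from a balanced one-way ANOVA among full-sib families.

    families: list of lists of trait values, one list per reconstructed
    sibship, all of the same size n (balanced design assumed).
    """
    k = len(families)
    n = len(families[0]) if k else 0
    if k < 2 or n < 2 or any(len(f) != n for f in families):
        raise ValueError("need at least two balanced families of size >= 2")
    grand = sum(sum(f) for f in families) / (k * n)
    means = [sum(f) / n for f in families]
    # Mean squares among and within families.
    ms_between = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    ms_within = sum((v - m) ** 2
                    for f, m in zip(families, means) for v in f) / (k * (n - 1))
    # Among-family variance component and intraclass correlation t.
    var_between = (ms_between - ms_within) / n
    t = var_between / (var_between + ms_within)
    return 2.0 * t
```

Because this is a moment estimator, nothing constrains the result to lie between 0 and 1; small or strongly differentiated samples can return values outside the parameter space.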

Comparisons of algorithms.  Recent work using both simulated and real data sets assessed the existing algorithms for sibship reconstruction in terms of accuracy and computer efficiency under a range of sample sizes (number of individuals), DNA marker information (number of loci, number of alleles per locus and type of allelic frequency distribution) and family structure (ranging from highly unrelated to highly related sets). Specifically, Butler et al. (2004) examined (i) the Almudevar & Field method (Almudevar & Field 1999), (ii) the MCMC pairwise score method (Smith et al. 2001), and (iii) the MCMC full joint likelihood method (Painter 1997; Smith et al. 2001). They also tested a new algorithm called the ‘Simpson method’ that uses an ascent method to maximize Simpson's index of concentration (the probability that two randomly chosen individuals are classified in the same group; the larger the index, the more concentrated the individuals into a smaller number of groups; see Butler et al. 2004), while imposing a full-sib genotype constraint similar to that implemented in the pairwise score algorithm (Butler et al. 2004). Overall, and not surprisingly, the accuracy in full-sibship reconstruction improved with the level of information included in the data set from four loci with four alleles each to eight loci with eight alleles each. Also, increasing the number of alleles per locus (from four to eight) while keeping the number of loci constant at four was usually better than the reverse tactic. In general, it seemed that using at least six to eight loci with six to eight alleles per locus should be a minimum for accurate full sibship reconstruction. All four algorithms were robust to deviations from uniform allelic distribution and performed as well with resampled real data sets as with simulated data sets.
In contrast, there were marked differences among the algorithms in their ability to deal with different types of family distribution and thus, no single approach performed well over all conditions tested (see Butler et al. 2004 for more details). In brief, the Almudevar & Field and Simpson approaches were efficient at reconstructing the pedigree of highly related sets, but the former had computational problems with large data sets. The pairwise score method operated well with low to medium relatedness and was very fast, but had major problems with highly related sets. The full likelihood approach performed best with small data sets of medium relatedness but had computational issues in the presence of large families. Butler et al. (2004) thus suggested that a combination of approaches is probably the best strategy and that it might be useful to use the predicted partition from one approach as the starting point for another. This still remains to be tested.
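The index maximized by the Simpson method can be computed for any candidate partition with a few lines of Python. This is a sketch of the index only, not of the ascent algorithm or the full-sib genotype constraint of Butler et al. (2004); the function name is hypothetical, and drawing the two individuals without replacement is an assumption, since the exact form of the index is not specified here.

```python
from collections import Counter


def simpson_index(labels):
    """Probability that two individuals drawn without replacement share a group.

    labels: one sibship label per individual in the sample.
    """
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two individuals")
    sizes = Counter(labels).values()
    return sum(s * (s - 1) for s in sizes) / (n * (n - 1))
```

A partition that places every individual in its own group scores 0 and a single all-inclusive group scores 1, which is why the genotype constraint is needed to stop the maximization from simply merging everyone.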

Explicit reconstruction of full pedigree involving all types of relatedness

Overview.  If molecular data are used to reconstruct different types of pairwise relationships, these can be integrated to form a complex pedigree potentially spanning a number of generations. Whilst this approach could theoretically incorporate the sibship-reconstruction methods outlined in the previous section, it is more typically used with parentage analysis, such that a multigenerational pedigree is constructed from a combination of maternal and paternal identity information. To this end, it will involve using one of the many possible approaches to parentage assignment (Blouin 2003; Jones & Ardren 2003). Prior information about relatedness or pedigree structure is not a prerequisite, although construction of a multigenerational pedigree will require temporal information, for example, to be able to distinguish parental vs. sibling relationships.

In many situations, the molecular data may only be used to provide paternal identities where maternal identities are already known from field observations, such as in many mammalian studies (e.g. Milner et al. 2000). In other situations, all maternal and some paternal links may be known, and the marker data used to provide additional paternities (e.g. Kruuk et al. 2000). In many passerine bird studies, identities of both parents may be based on field observations, with molecular data used to check the reliability of those links, particularly when extra-pair paternity is suspected (e.g. Merilä et al. 1998). It is also feasible to supplement the information from these parentage analyses with additional groupings of individuals. For example, reconstructed sibships of individuals with unresolved paternities (made using methods discussed in the previous section) could be added so as to include as many individuals as possible in the pedigree. Although this is a potentially fruitful approach that would maximize the marker data available, it has been rarely applied to date (Thomas et al. 2002; Coltman et al. 2003; see below).

It is worth noting however, that pedigree reconstructions using genetic data are potentially imperfect. For example, some studies used a confidence level of only 80% in their assessment of paternity (Kruuk et al. 2000; Coltman et al. 2001) while others had a potential error rate of around 5% (Milner et al. 2000; Coltman et al. 2003). Such variation will, without doubt, influence the heritability estimates: Milner et al. (2000) present heritabilities estimated using a pedigree constructed from paternities assigned with 95% confidence, but also mention that estimates obtained using 80% confidence, for which more paternities could be assigned, were lower. In such cases, there is a trade-off between assigning useful numbers of paternities to provide an interconnected pedigree vs. their reliability.

Quantitative genetic analysis of reconstructed pedigrees.  Having derived maximal pedigree information from the marker data, quantitative genetic parameters for given phenotypic traits can then be estimated in different ways. The simplest of these is to consider the phenotypic covariance between pairs of a given relationship: for example, through a linear regression of parent–offspring values, or an analysis of variance amongst sibling groups (Falconer & Mackay 1996; Lynch & Walsh 1998). However, with a complex pedigree in which information on more than one type of relationship is available (e.g. on identities of both parents and siblings of an individual, or parents, offspring and grand-offspring), this is inevitably an inefficient use of data. An alternative approach is to use a method that exploits the entire relatedness matrix for a pedigree, so that for each individual its phenotypic covariance with all its relatives is considered. This is the approach taken by a form of mixed model now routinely used in applied situations, the ‘animal model’ (see Box 3). The animal model method can also be applied to the partial pedigrees reconstructed only from sibships as in the previous section (see Thomas et al. 2002), or to pedigrees derived entirely independently of molecular data, such as from behavioural observations (see Réale et al. 1999).
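The relatedness matrix that such a model exploits can be derived from a pedigree with the standard tabular method, sketched below in Python. The function name and the pedigree layout are hypothetical, and the sketch covers only the construction of the additive relationship matrix; fitting the mixed model itself (typically by restricted maximum likelihood in specialized software) is not shown. Parents are assumed to be listed before their offspring, and unknown parents are treated as unrelated, non-inbred founders.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A built by the tabular method.

    pedigree: list of (id, sire, dam) tuples with parents listed before
    their offspring; an unknown parent is given as None.
    """
    index = {ind: i for i, (ind, _, _) in enumerate(pedigree)}
    n = len(pedigree)
    a = [[0.0] * n for _ in range(n)]
    for i, (_, sire, dam) in enumerate(pedigree):
        s = index.get(sire)  # None when the sire is unknown
        d = index.get(dam)   # None when the dam is unknown
        for j in range(i):
            # Relationship with an earlier individual is the average of the
            # relationships between that individual and i's known parents.
            val = 0.0
            if s is not None:
                val += 0.5 * a[j][s]
            if d is not None:
                val += 0.5 * a[j][d]
            a[i][j] = a[j][i] = val
        # Diagonal is 1 plus the inbreeding coefficient of individual i.
        inbreeding = 0.5 * a[s][d] if s is not None and d is not None else 0.0
        a[i][i] = 1.0 + inbreeding
    return a
```

For a pedigree with two founders, two full-sib offspring and a paternal half-sib, the matrix contains 0.5 between parent and offspring and between full-sibs, and 0.25 between half-sibs, so every class of relative contributes to the analysis in a single step.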

Animal models are extensively used in animal breeding but, for no obvious reason, their application within evolutionary biology has only been relatively recent (see Kruuk 2004 for discussion). In addition to their efficient exploitation of all available data, they possess several key advantages over traditional techniques used to estimate heritabilities from pedigree information. Specifically, they can incorporate unbalanced data sets with missing observations or missing pedigree links (both of which are common in studies of natural populations). Furthermore, the estimates of variance components they provide are unbiased by the occurrence of selection, inbreeding or nonrandom mating in a study population, because models correct for the flow of genetic information across generations. A further key advantage is their flexibility for the explicit modelling of additional random effects such as maternal or common environment effects which may otherwise be confounded with additive genetic effects, biasing estimates of heritability. Another benefit of this approach is that no assumptions are made about the types of relatedness present in the population.

Applications.  Animal models have been applied in combination with molecular data pedigree construction in systems in which it is not possible to assign parentage from observations alone (e.g. wild fish studies, see Garant et al. 2003). Several mammalian studies have also used multigenerational pedigrees comprised of known mother–offspring links complemented by paternities assigned using molecular data to fit animal models to estimate heritabilities and genetic correlations (e.g. red deer: Kruuk et al. 2000; Soay sheep: Milner et al. 2000).

Specifically, Coltman et al. (2003) used what we regard as a very useful approach to the problem of reconstructing missing pedigree links. As they already knew mother identities from field observations, they first used a paternity analysis to assign as many offspring to fathers as possible and then performed sibship reconstruction to complement their pedigree. Finally they used this pedigree with an animal model in an analysis of the quantitative genetics of horn size and body weight in bighorn sheep (Ovis canadensis). In more detail, maternity was known for around 80% of marked sheep since 1971. However, blood, hair and tissue sampling in the later part of the study allowed genetic analyses to be conducted at 20 microsatellite loci to assess paternity and sibship links. Paternity of 241 individuals was thus assigned using a likelihood-based approach at a 95% confidence level (Marshall et al. 1998). Thirty-one clusters of 104 paternal half-sibs were then built among the unassigned offspring, where a paternal half-sibship consisted of all pairs of individuals of unassigned paternity that were identified as having a significant likelihood ratio for a paternal half-sib relationship vs. being unrelated (Goodnight & Queller 1999; see also Coltman et al. 2003). Members of reconstructed paternal half-sibships were then assigned a common unknown paternal identity, which increased the proportion of individuals with some form of paternal identity by more than 40% (345 paternity identities in the final pedigree). An animal model was then applied to the reconstructed pedigree for the estimation of variance components and heritabilities.
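The step in which members of a reconstructed half-sibship receive a common unknown paternal identity can be sketched as follows. This is an illustration of the bookkeeping only, under assumed data structures; the function name, the placeholder prefix and the dictionary layout are hypothetical and do not reproduce the software used by Coltman et al. (2003).

```python
def add_dummy_sires(sire_of, half_sib_clusters, prefix="unknown_sire_"):
    """Return a copy of sire_of in which each cluster of unassigned paternal
    half-sibs shares one placeholder sire identity.

    sire_of: dict mapping offspring id to sire id, or None if unassigned.
    half_sib_clusters: list of lists of offspring ids grouped by sibship
    reconstruction.
    """
    completed = dict(sire_of)
    for k, cluster in enumerate(half_sib_clusters, start=1):
        dummy = f"{prefix}{k}"
        for offspring in cluster:
            if completed.get(offspring) is None:  # never overwrite an assigned sire
                completed[offspring] = dummy
    return completed
```

The completed mapping can then be combined with known maternities to build the pedigree, so that half-sibs with unsampled fathers still contribute paternal links to the relatedness matrix.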

Comparisons of methods

Given the flexibility, the power and the robustness of its method, the animal model provides a true benchmark to which other estimates can be compared to assess their efficiency in obtaining quantitative genetics estimates. However, only a few published studies have explicitly compared the quantitative genetic results obtained with either the regression approach, the likelihood method or the reconstructed sibships, to the estimates obtained from a pedigree or from known relationships (see Thomas et al. 2002; Wilson et al. 2003b). Here we briefly summarize the main conclusions of these studies (see also Table 1).

Thomas et al. (2002) compared estimates of heritability for body weight in a feral population of Soay sheep (Ovis aries) obtained from the pedigree-free approaches (regression and likelihood methods) to those from pedigree-based methods (from reconstructed sibships alone, from a combination of known maternities and molecular-based paternities, and from a combination of both). Using 12 microsatellite loci on a subset of the 759 measurements obtained from the Soay sheep database (see Milner et al. 2000), they found that the regression approach gave unreliable results that were highly sensitive to the fixed effects included in the analysis of body weight (see Fig. 2). In general, low amounts of marker data and low numbers of relatives in the sample resulted in poor estimates of the actual variance of relatedness, which was greatly underestimated. In addition, the likelihood approach gave negative estimates of the heritability and so estimates were fixed at the boundary of the parameter space (Table 1). Again this was because of insufficient amounts of marker data available to generate useful relationship information, and low numbers of relatives in the sample with which to partition the phenotypic variance. Interestingly, the MCMC approach using only reconstructed sibships also failed for similar reasons. Thomas et al. (2002) thus suggested that for these techniques to be successful in a natural situation, a greater number of relatives are required in the sample as well as a greater amount of marker information. Finally, they also clearly showed that the incorporation of known relationship information (such as maternal identities) into the likelihood, combined with the MCMC approach, allowed more reliable estimates of the genetic variance to be determined. They concluded that in their situation, reconstructing the most complete pedigree possible and then using an animal model was preferable to the ‘pedigree-free’ approaches.

Figure 2.

Heritability values obtained from different studies that compared the Ritland method (grey bars) to values considered to be benchmark estimates (black bars; see Table 1 and text). Bar charts represent heritability values for each trait analysed in each study (with their standard errors when available). Dotted lines indicate the limits of the parameter space. A. Thomas et al. (2002) where (d,t) (y,d) (y,t) are the different fixed effects included in models: year (y), day of measurement (d) and twin status (t); B. Wilson et al. (2003b); C. Frentiu (2004); D. D. W. Coltman, unpublished.

In another comparative study, Wilson et al. (2003b) used rainbow trout (Oncorhynchus mykiss) strains to estimate heritability and genetic correlations of weight and spawning time. They used 71 parental fish to obtain a progeny generation containing 595 individuals originating from both intra- and interstrain crosses that were genotyped with at least eight microsatellite loci. They compared the regression approach and the MCMC sibship reconstruction procedure to values derived from their true pedigree, which was obtained from full parentage analysis (parentage exclusion approach, where 97% of offspring were assigned to a single parental pair). They found that both the regression and MCMC methods were able to detect significant components of genetic variance and covariance for the traits analysed. However, while the genetic correlations were fairly close to the values estimated with the pedigree, the regression model provided estimates of heritability that were quantitatively unreliable (Table 1, Fig. 2). Indeed, the regression method had both a significant bias and a low precision, apparently due to the poor performance of the estimator of pairwise relatedness. In fact, estimates of heritability were mostly outside the true parameter space (i.e. h2 < 0 or h2 > 1). In contrast, genetic parameters estimated from the reconstructed pedigree showed close agreement with ideal values obtained from the true pedigree (Table 1). However, the parameters based on the reconstructed pedigree were underestimated due to the complex structure of the true pedigree. The true pedigree consisted of a high number of half-sibling relationships, causing the partitioning of full-sibships to be inaccurate and reducing the recognition of relatedness between families (see Wilson et al. 2003b).

Finally, using data from a population of Capricorn silvereyes (Zosterops lateralis chlorocephalus), Frentiu (2004) presented a comparison of genetic variances and covariances (and corresponding heritabilities and correlations) for six morphological traits, estimated from an analysis of variance amongst full-siblings vs. estimates from regressions using the Ritland method. The data set included 214 individuals genotyped at 11 microsatellite loci, cross-fostered as chicks so as to minimize common environment effects. Several conclusions arise from the comparison. Figure 2 shows the estimates of heritability for the only three traits for which values could be estimated (remaining traits had negative additive variance). First, there is little similarity between the estimates of heritability from the two methods (see Fig. 2). Similarly, estimates of the 15 additive genetic covariances between the six traits showed little obvious correspondence between the two methods, in either relative or absolute magnitude (not shown). This is notable given that the estimates of genetic covariances or correlations do not incorporate estimates of the variance in relatedness (see Box 2), so the apparent lack of accuracy is not due simply to problems with the latter. Thus, it seems that the results here differ from those of Wilson et al. (2003b), where genetic correlations were fairly accurate.

However, Frentiu also reports a comparison of the structure of the G-matrix, using Krzanowski's (1979) method based on the alignment of the principal components of the subspaces defined by the two variance–covariance matrices. This approach indicated significantly greater similarity between the two subspaces than would be expected by chance, implying a high similarity between the principal components of the two matrix subspaces. This result is more difficult to interpret, given the apparent lack of similarity of the explicit values, but it suggests that estimates made using the Ritland approach can detect similar patterns in the genetic architecture, as can those from the conventional analyses.
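The subspace comparison can be sketched as follows. This is a minimal illustration that assumes the usual formulation of Krzanowski's (1979) method, in which the first k eigenvectors of each variance–covariance matrix form the columns of matrices A and B, and similarity is summarized by the sum of the eigenvalues of S = AᵀBBᵀA (0 for orthogonal subspaces, k for identical ones). The function name, the choice of k and any input matrices are hypothetical and are not taken from Frentiu (2004); the randomization test of significance is not shown.

```python
import numpy as np

def krzanowski_similarity(g1, g2, k):
    """Sum of eigenvalues of S = A'BB'A for the leading k-dimensional subspaces.

    g1, g2: symmetric variance-covariance matrices of equal size (assumed inputs).
    k: number of leading principal components retained from each matrix.
    """
    def leading_vectors(g):
        vals, vecs = np.linalg.eigh(np.asarray(g, dtype=float))
        order = np.argsort(vals)[::-1][:k]   # indices of the k largest eigenvalues
        return vecs[:, order]

    a = leading_vectors(g1)
    b = leading_vectors(g2)
    s = a.T @ b @ b.T @ a                    # k x k symmetric matrix
    return float(np.sum(np.linalg.eigvalsh(s)))
```

A value close to k indicates that the two matrices share their main axes of variation even when individual variances and covariances differ in magnitude, which is the situation described above.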

Thus, it seems that estimates of variance components based on pedigree reconstruction are consistently more accurate than the pedigree-free methods. In some cases, such as the Capricorn silvereye study (Frentiu 2004), estimates from the two methods differ by an order of magnitude (Fig. 2). However, despite the scarcity of comparative evidence, there are some indications that pedigree-free methods could still provide useful indications of the genetic architecture underlying multiple phenotypic traits (see also unpublished data from D. W. Coltman in Table 1 and Fig. 2). These few studies also suggest that the regression method could still be used to some extent for comparison of relative amounts of additive variance/h2 for different traits within a population (see Fig. 2).

Methods available to estimate selection gradients in the wild


Secondly, we consider an alternative use of molecular marker data: estimating selection on phenotypic traits. Evolutionary change is usually driven by natural selection, and evolutionary biologists have devoted considerable energy to classifying and quantifying patterns of selection (see Lande & Arnold 1983; Kingsolver et al. 2001). As discussed in the previous section, relatedness information derived from marker data can be used in various ways to generate estimates of quantitative genetic parameters. However, it can also be used to assess individual breeding success or fitness and — from the ensuing relationship between fitness and phenotype — to quantify selection on phenotypic characteristics. To this end, molecular data are most frequently used to categorically determine the parentage of individual offspring (Jones & Ardren 2003), and an estimate of the fitness of individuals is typically defined as the total number of offspring assigned to each. These fitness measures can then be regressed on measures of phenotype to indicate selection on particular traits (Endler 1986).
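As a rough illustration of the categorical route just described, the sketch below regresses relative fitness on standardized trait values, in the spirit of Lande & Arnold (1983). The function name and inputs are hypothetical; offspring counts are assumed to come from a categorical parentage assignment, and no standard errors or significance tests are computed.

```python
import numpy as np

def selection_gradients(offspring_counts, traits):
    """Linear selection gradients from categorical fitness (illustrative only).

    offspring_counts: number of offspring assigned to each individual.
    traits: array of shape (n_individuals,) or (n_individuals, n_traits).
    """
    counts = np.asarray(offspring_counts, dtype=float)
    z = np.asarray(traits, dtype=float)
    if z.ndim == 1:
        z = z[:, None]
    w = counts / counts.mean()                      # relative fitness (mean of 1)
    z_std = (z - z.mean(axis=0)) / z.std(axis=0)    # zero mean, unit variance
    design = np.column_stack([np.ones(len(w)), z_std])
    coef, *_ = np.linalg.lstsq(design, w, rcond=None)
    return coef[1:]                                 # drop the intercept
```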

However, just as parentage can be assigned fractionally rather than categorically (Jones & Ardren 2003), methodology has now been derived for exploiting genetic markers to infer selection using a fractional rather than categorical approach to the estimation of fitness (see Box 4). As with Ritland's marker-based regression method, these techniques were originally developed for the analysis of data from plant systems, and despite their intuitive appeal, applications to date have not to our knowledge included animal studies. Here, we outline their methodology, briefly review the published literature exploiting these techniques, and then evaluate their merits relative to more traditional techniques. We focus on the maximum-likelihood methodology presented by Smouse et al. (1999), which builds on earlier work by Meagher and Adams and colleagues (e.g. Meagher 1991; Adams et al. 1992; Smouse & Meagher 1994). See also Morgan & Conner (2001) for an outline of techniques.


As with traditional selection analyses (Lande & Arnold 1983; Arnold & Wade 1984), marker-based estimates of selection on a phenotypic trait quantify the statistical association between an individual's fitness and its phenotypic value. However, fitness is now determined by an individual's probability of parentage of each offspring in the population, rather than by categorical assignment of parentage of a set number of offspring (see Jones & Ardren 2003 for discussion of the distinction). The approach is based on maximum-likelihood estimation: the likelihood of a given parent–offspring pair is defined by the product of (i) that parent's phenotypic fitness, as determined by the selection on its phenotypic traits, and (ii) the genetic probability of it producing the particular offspring, given their respective genotypes (see Box 4).
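The structure of this likelihood can be sketched in a few lines. The example assumes a log-linear fertility model (each male's fertility is the exponential of a weighted sum of his trait values, rescaled to sum to one) and assumes that the likelihood for each offspring sums the product of fertility and genetic probability over all candidate males, with offspring treated as independent. All names are hypothetical, and the matrix of genetic probabilities is taken as given rather than computed from genotypes.

```python
import numpy as np

def relative_fertility(beta, traits):
    """Relative fertilities under an assumed log-linear model; they sum to 1."""
    lin = np.asarray(traits, dtype=float) @ np.asarray(beta, dtype=float)
    w = np.exp(lin - lin.max())          # subtract the maximum for numerical stability
    return w / w.sum()

def log_likelihood(beta, traits, genetic_prob):
    """Log-likelihood of the selection gradients beta.

    traits: (n_males, n_traits) phenotypic values of candidate males.
    genetic_prob: (n_offspring, n_males) Mendelian probability of each
        offspring genotype given the known mother and each candidate male.
    """
    lam = relative_fertility(beta, traits)
    per_offspring = np.asarray(genetic_prob, dtype=float) @ lam
    return float(np.sum(np.log(per_offspring)))
```

Maximizing this function over beta (for example with a general-purpose optimizer) would give the maximum-likelihood selection gradients; the optimizer itself is left out of the sketch.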

Because of the flexibility of the maximum-likelihood approach, the framework can be easily extended to consider quadratic (e.g. Morgan & Conner 2001) or other selection terms. Smouse et al. (1999) demonstrate the inclusion of measures of distance between males and prospective female parents (a key predictor of mating success in several plant systems), so that the fitness of a given male is determined by its distance from all possible mates. In theory, the technique could also be usefully extended to test for temporal and spatial heterogeneity in selection estimates.

Methodological issues

The statistical significance of each parameter is assessed by comparing the superiority of a model in which it is set to its maximum-likelihood value to one in which it is constrained to be zero. As in any such analysis, different models are easily compared within the maximum-likelihood framework under the assumption of an asymptotic χ2 distribution for the likelihood ratio of nested models. In practice, however, unease with this assumption has led most authors to report significance based on randomization tests such as bootstrapping (Smouse et al. 1999; Elle & Meagher 2000; Morgan & Conner 2001). Even here, the choice of exactly which model to bootstrap becomes critical. For example, when considering multiple correlated traits, tests of the significance of a particular βi (selection gradient on trait i; see Box 4) in a model in which all other βs are set to zero (analogous to tests of type I errors in a linear model) are inevitably far less conservative than those in a model in which all other parameters are set to their maximum-likelihood estimates. Interpretation of very different P values based on χ2 statistics or bootstrapping of different models therefore becomes complicated (Elle & Meagher 2000). As with any such analysis, one would ideally hope to arrive at the most parsimonious model containing only significant selection gradients.
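For the asymptotic test mentioned above, a minimal sketch of the likelihood-ratio comparison for a single constrained parameter is given below. It assumes one degree of freedom (one selection gradient set to zero) and uses the identity that the upper tail of a χ2 distribution with one degree of freedom equals erfc(√(x/2)). The function name is hypothetical, and the bootstrap alternatives preferred by most authors are not shown.

```python
import math

def likelihood_ratio_test(loglik_full, loglik_null):
    """Likelihood-ratio statistic and asymptotic P value for one constrained parameter.

    loglik_full: maximized log-likelihood with the parameter free.
    loglik_null: maximized log-likelihood with the parameter fixed at zero.
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_null))
    p_value = math.erfc(math.sqrt(stat / 2.0))   # upper tail of chi-square, 1 d.f.
    return stat, p_value
```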

Examples of applications

Use of the marker-based selection estimates to date appears to have been restricted to plant systems. In their application of the techniques they develop, Smouse et al. (1999) found no significant effect of floral morphology on reproductive success in the dioecious lily Chamaelirium luteum, in accordance with previous studies of the same system. In contrast, Morgan & Conner (2001) overturned previous conclusions regarding selection on the wild radish Raphanus raphanistrum, finding evidence of both directional and stabilizing selection on male floral characters. Elle & Meagher (2000) also estimated contributions of floral phenotype to variation in male mating success in the andromonoecious perennial Solanum carolinense, and reported significant selection via male fertility on the proportion of male flowers on a plant. In an analysis of mating success in the dioecious plant Silene latifolia, Wright & Meagher (2004) found no evidence for directional selection, but did find stabilizing selection on calyx diameter in one of the two years. Finally, despite sampling only 10% of the potential paternal plants, Van Kleunen & Ritland (2004) still found a significant negative paternal selection gradient for anther–stigma separation on fertilization success in Mimulus guttatus.

Comparisons with other methods

Given the paucity of applications to date of the selection methodology, direct comparisons of marker-based estimates of selection against those made from discrete paternity assignment are unfortunately as scarce as quantitative genetic parameter estimates. Morgan & Conner (2001) report analysis of the same data set analysed in Conner et al. (1996). Selection estimates are markedly concordant (r = 0.82), but the marker-based tests appear considerably more powerful, detecting significant selection in 56% of traits compared, whereas only 25% were significant using traditional selection gradients (see Fig. 3).

Figure 3.

Comparison of standardized selection gradients for floral traits in Raphanus raphanistrum through male fitness. Estimates for each of 1991, 1992 and 1993 for: flower size, anther exsertion (plus quadratic term for each), flower production, and pollen number per flower (1993 only). x-axis: estimates from traditional selection gradients, redrawn from Table 2 of Conner et al. (1996). y-axis: estimates from marker-based selection gradients, redrawn from Table 1, Morgan & Conner (2001). Selection estimates are concordant (r = 0.82).

In a second case, two different studies (Meagher 1991; Smouse et al. 1999) estimated selection on the same characters in the same system (Chamaelirium luteum) but using slightly different data sets and a principal components analysis in one but not the other. Despite the increased power of an extended data set and the apparently more powerful techniques used for the marker-based selection estimates (Smouse et al. 1999), both studies reached the same, albeit somewhat surprising, conclusion of no evidence of significant selection on plant size through male fertility.

Discussion points

It has been suggested that marker-based methods offer a ‘much more refined’ means of quantifying selection than those based on categorical assignment (Morgan & Conner 2001). All the arguments relevant to the categorical vs. fractional assignment of paternity (see Jones & Ardren 2003 for a clear summary) become relevant to the use of fitness measures in estimating selection: for example, categorical assignment will overestimate variance in reproductive success (and hence potentially bias selection estimates) relative to fractional assignment. Certainly, they circumvent one stage of the traditional analyses of using the marker data to first derive individual male fertilities, and then, ignoring the error variance in these fertility measures (Devlin et al. 1988), using them to derive selection gradients. The number of parameters estimated (and hence ultimately degrees of freedom used) is only the number of phenotypic traits potentially under selection, rather than the total number of males (Morgan & Conner 2001), generating what should be a more powerful analysis.
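The difference between the two fitness measures can be illustrated with a toy calculation. The sketch assumes a matrix of paternity probabilities (offspring by candidate males) is already available; the values used in any example are invented, and the categorical rule here simply assigns each offspring to its most likely father, which is a simplification of real assignment procedures that apply a confidence threshold.

```python
import numpy as np

def categorical_fitness(paternity_prob):
    """Offspring counts when each offspring goes to its most likely father."""
    p = np.asarray(paternity_prob, dtype=float)
    winners = p.argmax(axis=1)
    return np.bincount(winners, minlength=p.shape[1]).astype(float)

def fractional_fitness(paternity_prob):
    """Expected offspring numbers: each offspring is shared by probability."""
    return np.asarray(paternity_prob, dtype=float).sum(axis=0)
```

With three offspring that each have probabilities 0.6 and 0.4 for two males, the categorical rule credits the first male with all three offspring, whereas the fractional rule credits the males with 1.8 and 1.2 offspring, so the variance in reproductive success is much larger under categorical assignment.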

Furthermore, the marker-based approaches should also be more powerful in exploiting all genetic information available on all individuals in a population, in comparison to categorical paternity assignment, which will necessarily be left with missing paternities in cases where there is insufficient evidence to identify the father amongst candidate males. Simulation results indicate that with moderate sample sizes (< 200 males), marker-based selection estimates have sufficient power to detect weak selection pressures even with relatively uninformative marker data (e.g. an exclusion probability of approximately 80%, as might be typical of allozyme data) (Morgan & Conner 2001), and studies with access to more polymorphic loci such as microsatellites will have even greater power. However, both categorical and fractional approaches will be affected by incomplete sampling of candidate males. With marker-based selection, this will hopefully be represented in low genetic probabilities for the sampled males, such that the offspring concerned makes little contribution to the overall likelihood; with categorical paternity assignment, the offspring will hopefully simply not be counted. In both situations, male reproductive success can only be assessed relative to sampled males, and so there is an implicit assumption that this is representative of relative fitness across all males.

Conceptually, marker-based selection estimates require more of a leap of faith than the estimation methods for quantitative genetic parameters considered in previous sections. The genetic and the phenotypic similarities between two individuals are both continuous variables, and we are ultimately only using pedigree-based relatedness as a measure of the expected similarity at the loci determining the phenotypic trait. Therefore, their correlation is biologically meaningful. However parentage in reality is discrete, and so representing it by its likelihood, or by the probability of paternity, will only ever be an artificial approximation.

In the two systems where comparisons have been made, marker-based estimates of selection seem generally similar in magnitude to estimates based on categorical paternity assignment, but potentially with greater statistical power (Fig. 3). However there is no doubt that further comparisons from a range of systems are necessary. Ultimately, an invaluable assessment of the marker-based selection estimates’ potential would comprise a three-way comparison of selection estimates for a single system derived from the following: (i) estimates of fitness when parentage is known from means other than genetic markers (for example through reliable behavioural estimates, or using realistic simulated data); (ii) estimates of fitness when parentage is assigned categorically to individuals using molecular markers, equivalent to the pedigree-reconstruction estimates above; and (iii) the marker-based estimates of selection. Conclusions may vary depending on parameters such as the assignment power of the markers and in particular the genetic similarities between candidate males, and so forth. However such comparisons would address the as yet unanswered question facing researchers with molecular marker data on their hands: whether to measure selection using a categorical or fractional approach. In summary, marker-based selection methods offer an intuitively appealing use of molecular data, and may well turn out to be of considerable value for the field ecologist faced with no other means of assessing fitness in a population, but at present we know too little about their performance and properties to conclude that they are superior to traditional techniques.

General concerns

Genotyping errors

Genotyping error and mutation problems have been highlighted in DNA marker-based studies of relatedness (see Blouin et al. 1996; O’Reilly et al. 1998) and solutions to deal with them have been integrated into parentage analyses (see Sancristobal & Chevalet 1997; Marshall et al. 1998; Duchesne et al. 2002; see Jones & Ardren 2003 for a review). It is therefore surprising to note that none of the reviewed approaches for estimating either QG parameters or selection gradients explicitly takes into account genotyping errors and mutations. Marker-based measures of relatedness will be downwardly biased by genotyping errors, which will in turn lead to underestimation of both the covariance between relatedness and phenotypic similarity and the variance in relatedness. These genotyping errors will also obviously affect the parentage assignment used either for animal models or for selection analyses.

Recent simulation work on sibship reconstruction underlines the fact that genotyping errors and mutations are worrying issues. Painter (1997) showed that the presence of mutations could alter the assignment results, increasing the number of potential relationships by up to six times. Recent simulations conducted by Butler et al. (2004) explored the performance of different algorithms (see section 3) given the occurrence of different types of errors (null alleles, genotyping errors and mutations). They concluded that while all methods were quite robust to the presence of null alleles, they were very sensitive to both genotyping errors (allele-designation errors and single-locus inversion between individuals) and mutations in the data set. Depending on the type of errors (excluding null alleles) and algorithm used, on average between 70% and 98% of individuals (initially correctly classified and randomly affected by a given type of error) were classified incorrectly (Butler et al. 2004).

However, solutions may be emerging, as recent algorithms are aimed at tackling the typing error problem. Specifically, Wang (2004) recently presented the first algorithm performing sibship reconstruction from genetic data using a likelihood method that allows for common microsatellite typing errors. This method can be used to infer full- and half-sibships accurately from marker data with a high error rate and to identify typing errors at each locus in each reconstructed sib family. Wang (2004) suggests that this algorithm can also be modified to be used with dominant markers such as amplified fragment length polymorphism (AFLP), separately or with codominant markers in sibship inference.

Environmental effects

All the QG parameter estimates in the studies described above are derived from analysing the phenotypic covariance between individuals of differing degrees of relatedness. These estimates will therefore be upwardly biased by any nongenetic source of similarity between relatives, such as maternal effects or common environment effects. The problem applies to all approaches, and the different methods have their respective ways of dealing with it. Typically these are based on the inclusion of additional terms in the model to account for the environmental effects. For example, to quantify the additional covariance that may result between groups of full-siblings sharing a common environment, an additional term of group identity can be added to Ritland's regression equations (see Frentiu 2004), or an additional random effect of group identity can be added to an animal model (Kruuk 2004).

Similarly, estimates of selection based on either explicit parentage assignment or on the likelihood methods may be equally biased by environmental covariance with fitness: if some aspect of environmental conditions affects both the phenotypic traits of interest and also, independently, affects breeding success, this generates a statistical association between phenotype and fitness that overestimates the potential of the trait to respond to selection (note that traditional estimates of selection based on fitness determined without the use of molecular tools are also prone to such bias; see Rausher 1992; Stinchcombe et al. 2002).

Conclusions and recommendations

The current review underlines the fact that pedigree-free methods might not be as useful as previously thought for the estimation of QG parameters, despite the development and availability of sufficiently powerful genetic markers in many systems. Indeed, given the comparative unreliability of marker-based direct assessment methods for estimating such parameters (see Thomas et al. 2002; Wilson et al. 2003b), the determination of a reliable pedigree is likely to be more useful in a typical population characterized by low levels of average relatedness.

MCMC procedures to reconstruct sibships seem to provide an improved way of estimating variance components compared to pedigree-free techniques. However, it is surprising that all of the techniques to reconstruct groups of related individuals reviewed here (see section 3) only attempt to identify groups of siblings, rather than parent–offspring relationships, or more distant relatedness (but see Thomas & Hill 2002 who have extended their first algorithm to detect full-sib families nested within half-sib families). Potential problems could arise, for example, if a pair of individuals are identified as full-sibs, but could instead be parent and offspring. Maximum-likelihood approaches should be readily extendable to incorporate prior information that may be relevant to assigning relatedness such as the date of birth, the sex of individuals, or information about mating behaviour. Other algorithms designed for specific situations are available which could, given the appropriate circumstances, be used for a pedigree reconstruction prior to quantitative genetics analyses. For example, Nürnberger et al. (2005) present a method for identifying parents and full-sib groups amongst samples of adults and offspring in hybrid populations characterized by high linkage disequilibrium; their motivation was to quantify assortative mating amongst the adults, but the methods could equally be used for estimating QG parameters for phenotypic traits measured in the populations.

In cases with sufficient pedigree knowledge or efficient sibship reconstruction, the animal model provides a powerful approach for the estimation of quantitative genetic parameters. In this respect, important advances in assessing the confidence of parentage assignments have recently been made (see Jones & Ardren 2003). However, there is still a need to develop a user-friendly program that includes both parentage and sibship assignments. Recent work on the available sibship reconstruction algorithms also suggests that a combination of some approaches could provide the best method for the future, especially if methods to incorporate genotyping errors can be implemented into new or existing software.

Recommendations about which methodology to use are therefore highly dependent on what kind of complementary information exists. In any case, we recommend making maximum use of independent non-marker-based pedigree information, for example by including known maternal identities in pedigree reconstruction. It should also be noted that to date, most studies have opted for either parentage assignment or sibship reconstruction to estimate quantitative parameters. However, recent works reviewed here (see Thomas et al. 2002; Coltman et al. 2003) suggest that a hierarchical application of different techniques in pedigree reconstruction could be profitable: (i) use parentage analysis and other available information to reconstruct pedigree links as much as possible, and then (ii) apply sibship reconstruction to complement this information. There are, however, practical issues that will still need to be addressed with such an approach, such as how to integrate potentially conflicting results obtained from the parentage analysis and the sibship reconstruction procedures. For example, what should be done if a group of full-sibs is identified among individuals with unresolved paternity, but the parentage analysis suggests that some members of that group have different mothers? Such an issue should be addressed by a program that would include both parentage and sibship assignment procedures.

The methods discussed for estimating selection raise interesting opportunities, even if their evaluation would greatly benefit from attempts to apply these techniques in systems other than plants (though these would at the same time provide necessary comparative results). It is interesting to note that there is a similarity between choices for estimating quantitative genetic parameters and selection gradients. In both cases one has the choice between assigning either sibships or parentage to provide explicit pedigree information or a discrete measure of fitness, or alternatively using an approach with no reliance on explicit assignment of relationships or pedigree reconstruction. While the explicit pedigree reconstruction seems to be a better approach for estimating quantitative genetic parameters, it is too early to judge the relative merits of these two options for the estimation of natural selection.


  • Box 1

    Definitions of quantitative genetic parameters of general interest

    Additive (genetic) variance: the component of phenotypic variance among individuals in a trait that can be attributed to additive genetic differences, those associated with the additive effects of alleles that are independent of other alleles or loci.

    Environmental variance: the component of phenotypic variance in a trait that is observed among genetically identical individuals. This variation might be due to different environmental conditions experienced by individuals or to random factors.

    Genetic covariance: the covariance between two traits that is due to additive genetic effects.

    Genetic correlation: a standardized measure of genetic covariance that takes values between −1 and 1, calculated as the ratio of the genetic covariance between two traits to the square root of the product of their respective additive variances.

    Heritability (narrow-sense): expresses the extent to which a given phenotype is determined by the genes transmitted from its parents and will thus determine the degree of resemblance between relatives (in a population). It is defined as the ratio of additive genetic variance to the total phenotypic variance.

    Phenotypic variance: describes the total amount of variance (sum of all components) observed for a given character in a given population. It is generally decomposed into its genetic and environmental components.
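The two ratio definitions in Box 1 can be written out directly. The sketch below is only a restatement of those definitions in code; the function and argument names are hypothetical, and the variance components are assumed to have been estimated already by one of the methods reviewed here.

```python
import math

def narrow_sense_heritability(additive_variance, phenotypic_variance):
    """Ratio of additive genetic variance to total phenotypic variance."""
    return additive_variance / phenotypic_variance

def genetic_correlation(genetic_covariance, additive_variance_1, additive_variance_2):
    """Genetic covariance scaled by the square root of the product of additive variances."""
    return genetic_covariance / math.sqrt(additive_variance_1 * additive_variance_2)
```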

  • Box 2

    Ritland's regression method (Ritland 1996, 2000)

    To assess the phenotypic similarity among pairs of individuals one calculates the value of a given quantitative trait Y for two individuals i and j such that Yi is the phenotypic value for the first individual and Yj for the second. Their shared phenotype is measured by:

    $$Z_{ij} = \frac{(Y_i - U)(Y_j - U)}{V} \qquad \text{(eqn 1)}$$

    where U and V are the mean and variance of the phenotypic trait Y in the population. Then among all pairs, the average Zij equals the phenotypic correlation. As shared phenotypes are most likely determined by the sharing of both genes and environments, then

    $$Z_{ij} = 2 r_{ij} h^2 + r_e + e_{ij} \qquad \text{(eqn 2)}$$

    where rij is the relatedness coefficient (see Van de Casteele et al. 2001 for a review), re is a correlation due to sharing of environments, and eij is the error. As this is a linear regression equation, one can estimate heritability over several pairs of individuals as:

    $$h^2 = \frac{\mathrm{cov}(Z_{ij}, r_{ij})}{2\,\mathrm{var}(r_{ij})} \qquad \text{(eqn 3)}$$

    where cov(Zij, rij) is the covariance between the phenotypic similarity and the estimated relatedness, and var(rij) is the actual variance of relatedness (see Appendix 4 in Ritland 2000 for details of calculations). As noted by Ritland (2000), to apply this method, the need to measure the actual variance of relatedness is critical. Actual variance of relatedness occurs when there is a mixture of different relatives, such as full-sibs and unrelated individuals. However, if the actual variance of relatedness is not statistically significant, one could at least verify the presence of genetic variation by testing for positive cov(Zij, rij) (Ritland 2000). Genetic correlations between two traits of interest could also be estimated by considering Yi as being the first trait measured in individual i, Yj the second trait measured in individual j, and V the sample covariance between traits, using equations 1–3. Furthermore, the sign of any potential genetic correlation among traits i and j could be estimated as being the sign of this cov(Zij, rij). It is important to note here that the estimation of genetic correlations does not require estimating the actual variance of relatedness and thus will not be affected by this potential source of error (see Lynch 1999; Ritland 2000). Significance of estimates is usually obtained by bootstrapping over individuals, but it is unclear if this is the best level where it should be performed (see Thomas et al. 2002).
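The regression in Box 2 can be sketched as below. The example assumes that heritability is estimated as cov(Zij, rij) divided by twice the variance of relatedness, with rij scaled accordingly, and it treats the supplied pairwise relatedness values as error-free. That is a strong simplification: with marker-based estimates, the raw variance of the estimates includes sampling error and is not the actual variance of relatedness. Function names and inputs are hypothetical.

```python
import numpy as np

def pairwise_similarity(y):
    """Z_ij = (Y_i - U)(Y_j - U) / V for every distinct pair i < j."""
    y = np.asarray(y, dtype=float)
    d = y - y.mean()                       # deviations from the mean U
    z = np.outer(d, d) / y.var()           # divide by the variance V
    return z[np.triu_indices(len(y), k=1)]

def ritland_h2(y, relatedness):
    """Regression estimate of heritability (illustrative only).

    y: phenotypic values, one per individual.
    relatedness: symmetric matrix of pairwise relatedness (assumed known).
    """
    z = pairwise_similarity(y)
    r = np.asarray(relatedness, dtype=float)
    r_pairs = r[np.triu_indices(r.shape[0], k=1)]
    cov_zr = np.mean((z - z.mean()) * (r_pairs - r_pairs.mean()))
    var_r = np.mean((r_pairs - r_pairs.mean()) ** 2)
    return cov_zr / (2.0 * var_r)
```

Nothing in this estimator constrains the result to lie between 0 and 1, so small samples with few relatives can easily return values outside the parameter space, as reported in the comparative studies.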

  • Box 3

    The ‘animal model’

    The animal model is a form of mixed model, the term used to describe linear regressions in which the explanatory terms are a mixture of both ‘fixed’ and ‘random’ effects. Each individual animal's (or plant's) phenotype for a given trait is partitioned into a linear sum of different effects, necessarily including as a random effect the additive genetic merit, or breeding value, of that individual. In matrix form, for the population, this is represented as:

    $$y = X\beta + \sum_i Z_i u_i + e \qquad \text{(eqn 1)}$$

    where y is the vector of observations of the given trait across all individuals. Broadly speaking, equation 1 represents the decomposition of the phenotype into, respectively, fixed and random effects and then an error term. Specifically, X is a design matrix of 0s and 1s relating each observation to corresponding fixed effects (such as the population mean) given in the vector β; each ui is a vector of random effects, one of which will be a vector of individual additive genetic effects a; each Zi is a design matrix for a corresponding vector of random terms; and e is a vector of random error terms.

    The pedigree information is then used to specify a variance–covariance structure for the vector of additive genetic effects a. These are not independent of each other, because relatives will have correlated effects. For any pair of individuals i, j, the additive genetic covariance between them is $2\Theta_{ij}\sigma_A^2$, where Θij is the coefficient of coancestry (the probability that an allele drawn at random from individual i will be identical by descent to an allele drawn at random from individual j) and $\sigma_A^2$ is the additive genetic variance for the trait. The variance–covariance matrix G for the vector a is therefore given by G = A$\sigma_A^2$, where A is the additive genetic relationship matrix with individual elements Aij = 2Θij. Assuming that the errors are independent of each other, the variance–covariance matrix R for the vector e is simply R = I$\sigma_E^2$, where I is the identity matrix and $\sigma_E^2$ is the error variance. For a simple model in which no other random effects are fitted, and for which there is only a single phenotypic observation for each individual (so that the design matrix for the additive genetic effects is just I):

    $$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{a} + \mathbf{e} \qquad \text{(eqn 2)}$$

    the variance–covariance matrix for $\mathbf{y}$ is given by $\mathbf{V} = \mathbf{G} + \mathbf{R} = \mathbf{A}\sigma_A^2 + \mathbf{I}\sigma_E^2$.

    Estimation of the unknown parameters in the model can then proceed by maximum likelihood. Under an assumption of multivariate normality for the random effects (in practice, the techniques are relatively robust to violation of this assumption), the likelihood of the model in equation 1 is defined as a function of $\boldsymbol{\beta}$ and $\mathbf{V}$, given that $\mathbf{X}$, the $\mathbf{Z}_i$ and $\mathbf{A}$ are all known, and is then maximized to generate maximum-likelihood estimators (MLEs). In the simple model in equation 2, these will be MLEs of $\boldsymbol{\beta}$, $\sigma_A^2$ and $\sigma_E^2$. More complex models may include other random effects, such as maternal effects or common environment effects, for which corresponding variance components will be estimated. In practice, a restricted version of the likelihood is generally used, in which the variance components are estimated after correcting for the fixed effects; this avoids the bias in the variance estimates that would otherwise be introduced by estimating the fixed effects. This method of parameter estimation is known as restricted maximum likelihood, or REML. For details see Lynch & Walsh (1998) and references therein.
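As a minimal sketch of how the pedigree enters the animal model of Box 3, the Python below builds the additive relationship matrix A (with elements equal to twice the coefficient of coancestry) using the standard tabular method, and then assembles the phenotypic variance–covariance matrix V for the simple model of equation 2. The toy pedigree, the function names and the two variance values are assumptions made purely for illustration. The REML step itself is not shown; in practice it would be left to dedicated mixed-model software.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A (A[i][j] = 2 * coancestry), tabular method.

    `pedigree` is a list of (sire, dam) index pairs, one per individual, ordered
    so that parents precede offspring; None marks an unknown (founder) parent.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (sire, dam) in enumerate(pedigree):
        # Diagonal: 1 + inbreeding coefficient, i.e. 1 + half the parents' relationship.
        both_known = sire is not None and dam is not None
        A[i][i] = 1.0 + (0.5 * A[sire][dam] if both_known else 0.0)
        for j in range(i):
            # Off-diagonal: half the sum of j's relationships to i's known parents.
            a = 0.0
            if sire is not None:
                a += 0.5 * A[j][sire]
            if dam is not None:
                a += 0.5 * A[j][dam]
            A[i][j] = A[j][i] = a
    return A


def phenotypic_covariance(A, var_a, var_e):
    """V = A * var_a + I * var_e for one record per individual (equation 2)."""
    n = len(A)
    return [[A[i][j] * var_a + (var_e if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# Hypothetical pedigree: 0 = sire, 1 and 2 = dams (all founders);
# 3 and 4 are full sibs (0 x 1); 5 is their paternal half sib (0 x 2);
# 6 is the inbred offspring of a full-sib mating (3 x 4).
PEDIGREE = [(None, None), (None, None), (None, None),
            (0, 1), (0, 1), (0, 2), (3, 4)]

A = relationship_matrix(PEDIGREE)
V = phenotypic_covariance(A, var_a=2.0, var_e=3.0)  # assumed variance components

print("parent-offspring relationship:", A[0][3])
print("full-sib relationship:", A[3][4])
print("half-sib relationship:", A[3][5])
print("phenotypic covariance between full sibs:", V[3][4])
```

The off-diagonal elements of V depend only on A and the additive genetic variance, which is why the resemblance among relatives of different degrees carries the information needed to separate the additive genetic variance from the residual variance.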

  • Box 4

    Statistical approach behind the estimation of selection gradients

    Consider a set of phenotypic traits, $y_1, \ldots, y_n$, such that the value for the l-th trait measured on the j-th individual is $y_{l,j}$. The relative fertility of individual j, $\lambda_j$ (or his proportion of the total offspring sired), can be modelled by defining his fertility $w_j$ as a function of his phenotypic traits relative to the total fertility in the population:

    $$\lambda_j = \frac{w_j}{\sum_k w_k}, \qquad \log w_j = \sum_{l=1}^{n} \beta_l\, y_{l,j} \qquad \text{(eqn 1)}$$

    where each $\beta_l$ represents the selection gradient on trait l, and the $\lambda_k$ sum to unity (Smouse et al. 1999). (Note that this is not exactly analogous to the Lande–Arnold (1983) approach, which assumes a linear, rather than log-linear, relationship between the fitness measure and trait values.) Following the source references, we discuss the methods as applied to a system where maternal identities are known but the marker data provide the only information available on paternity. In practice they could equally be used to consider assignment of maternity and hence selection through female fitness (or even, if required, more distant relatives such as grandparents).

    For a particular offspring, the probability that a particular male is the father is then defined as the product of his relative fertility (based on his phenotypic values) and his genetic probability of paternity of that offspring, based on their respective genotypes at marker loci. So, assuming genotypes $O_i$, $M_i$ and $F_j$ for an offspring i, for the known mother of i and for the candidate father j, respectively, from the rules of Mendelian inheritance we can calculate


    $$x_{ij} = \Pr(O_i \mid M_i, F_j) \qquad \text{(eqn 2)}$$

    The probability that male j is the father of offspring i is therefore $x_{ij}\lambda_j$. The likelihood for each offspring i as a function of the selection parameters is then the sum of the probabilities across all fathers, giving $L_i = \sum_j x_{ij}\lambda_j$. From this the likelihood of the entire data set is

    $$L = \prod_i L_i = \prod_i \sum_j x_{ij}\lambda_j \qquad \text{(eqn 3)}$$

    Maximum-likelihood estimates of selection on each trait, the respective $\beta$ parameters, can then be found by maximizing equation 3. Smouse et al. (1999) used an iterative technique to find a solution, starting by setting all $\beta_l$ except $\beta_1$ to zero, finding an MLE for $\beta_1$, then setting $\beta_1$ to its MLE and all other $\beta_l$ except $\beta_2$ to zero, finding an MLE for $\beta_2$, and so forth. Alternatively, and probably more efficiently, Morgan & Conner (2001) used a Newton–Raphson iteration.
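The likelihood in Box 4 can be sketched in a few lines of Python. The example below is an illustration under stated assumptions, not the implementation of Smouse et al. (1999) or Morgan & Conner (2001): it is restricted to a single trait, so that one selection gradient is estimated by a plain Newton–Raphson iteration; the trait values, the matrix of genetic likelihoods and all function names are hypothetical. The first and second derivatives used are those that follow from differentiating the log of equation 3 under the log-linear fertility model.

```python
import math


def transition_prob(offspring, mother, father):
    """P(offspring genotype | mother, father) at one codominant locus.

    Genotypes are 2-tuples of allele labels; each parent transmits either of
    its alleles with probability 1/2 (Mendelian segregation).
    """
    target = sorted(offspring)
    hits = sum(1 for m in mother for f in father if sorted((m, f)) == target)
    return hits / 4.0


def multilocus_x(offspring, mother, father):
    """x_ij: product of single-locus probabilities, assuming unlinked loci."""
    x = 1.0
    for o, m, f in zip(offspring, mother, father):
        x *= transition_prob(o, m, f)
    return x


def relative_fertility(beta, y):
    """lambda_j = exp(beta * y_j) / sum_k exp(beta * y_k) for a single trait."""
    w = [math.exp(beta * yj) for yj in y]
    total = sum(w)
    return [wj / total for wj in w]


def log_likelihood(beta, y, x):
    """Log of equation 3: sum over offspring of log(sum_j x_ij * lambda_j)."""
    lam = relative_fertility(beta, y)
    return sum(math.log(sum(xij * lj for xij, lj in zip(row, lam))) for row in x)


def score_and_hessian(beta, y, x):
    """First and second derivatives of the log-likelihood with respect to beta."""
    lam = relative_fertility(beta, y)
    mean_l = sum(l * yj for l, yj in zip(lam, y))
    var_l = sum(l * yj ** 2 for l, yj in zip(lam, y)) - mean_l ** 2
    g = h = 0.0
    for row in x:
        joint = [xij * lj for xij, lj in zip(row, lam)]
        total = sum(joint)
        post = [p / total for p in joint]  # paternity weights for this offspring
        mean_p = sum(p * yj for p, yj in zip(post, y))
        var_p = sum(p * yj ** 2 for p, yj in zip(post, y)) - mean_p ** 2
        g += mean_p - mean_l
        h += var_p - var_l
    return g, h


def fit_beta(y, x, beta=0.0, tol=1e-10, max_iter=50):
    """Newton-Raphson iteration for the single selection gradient beta."""
    for _ in range(max_iter):
        g, h = score_and_hessian(beta, y, x)
        step = g / h
        beta -= step
        if abs(step) < tol:
            break
    return beta


# Hypothetical data: standardized trait values for three candidate fathers, and
# genetic likelihoods x_ij (rows = offspring, columns = candidate fathers).
Y = [-1.0, 0.0, 1.0]
X = [[0.5, 0.0, 0.0], [0.0, 0.5, 0.25], [0.0, 0.0, 0.5],
     [0.0, 0.25, 0.5], [0.0, 0.0, 1.0], [0.25, 0.5, 0.0]]

beta_hat = fit_beta(Y, X)
print("estimated selection gradient:", beta_hat)
print("log-likelihood at estimate:", log_likelihood(beta_hat, Y, X))
```

In a real analysis each row of X would be computed from marker genotypes, for instance with a function such as `multilocus_x` above, rather than typed in by hand. With several traits the score and Hessian become a vector and a matrix, and the Newton–Raphson step requires solving a linear system at each iteration.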


The authors would like to thank L. Bernatchez, A. Charmantier, D.W. Coltman, F. Frentiu, A.J. Wilson, T.A. Wilkin, M.J. Wood and two anonymous reviewers for helpful comments on the manuscript. We would also like to thank D.W. Coltman and F. Frentiu for providing unpublished data. D.G. was financially supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Postdoctoral Research Fellowship, and by a Biotechnology and Biological Sciences Research Council (BBSRC) grant to Ben C. Sheldon and L.E.B.K. L.E.B.K. is a Royal Society University Research Fellow.

The authors are both generally interested in the study of evolution, and more specifically in determining quantitative genetics and selection parameters in wild populations.