Marker-inferred relatedness as a tool for detecting heritability in nature
K. Ritland. Fax: +604–822–9102; E-mail: email@example.com
This paper presents a perspective on how inferred relatedness, based on genetic marker data such as microsatellites or amplified fragment length polymorphisms (AFLPs), can be used to demonstrate quantitative genetic variation in natural populations. Variation at two levels is considered: among pairs of individuals within populations, and among pairs of subpopulations within a population. In the former, inferred pairwise relatedness, combined with trait measures, allows estimates of heritability ‘in the wild’. In the latter, estimates of QST are obtained, in the absence of known heritabilities, via estimates of pairwise FST. Estimators of relatedness based on the ‘Kronecker operator’ are given. Both methods require actual variation of relationship, a rarely studied aspect of population structure that is not necessarily present. Some conditions for appropriate population structures in the wild are identified, in part through a review of recent studies.
Developments in molecular techniques are making genetic markers more effective for measurement of relatedness and population structure, and are widening the scope of their application in ecology (Cruzan 1998; Snow & Parker 1998). Likewise, genetic markers have been of great impact in the field of quantitative genetics, mainly for identifying chromosomal segments or individual genes underlying a quantitative trait (Lynch & Walsh 1998). This paper reviews another approach for using markers to study quantitative traits; through joint analysis of natural relatedness and quantitative trait variation, we can make inferences about heritabilities and evolution of quantitative traits. Nature does the experiment, and we reconstruct it.
The advantages of this approach are, first, that the breeding and raising of progeny are not necessary, and second, that estimates are ‘natural’, or relatively unaffected by human artefacts. A current issue in field-orientated quantitative genetics is the discrepancy between field-measured vs. laboratory-measured heritability. Several factors confound estimates of heritability in experimental conditions, including lowered competition, lack of stress, and introduction of maladapted genotypes in transplants. There are methods that minimize these factors, such as cross-fostering in birds (Dhondt 1982) or regressing laboratory-raised progeny on wild-caught parents in Drosophila [Coyne & Beecham 1987; preferably using estimates of additive genetic variation in the laboratory, (Riska et al. 1989)]. However, these methods are restricted to species with the appropriate life histories, and even then, bias might remain.
On the other hand, natural variation and inferred relatedness introduce new problems. In the wild, the expression of quantitative traits is complex; both the environment and the pattern of relationship are more variable, and consequently the phenotypic similarity between relatives is often not just a function of relatedness and heritability. Dominance (broad-sense heritability), the sharing of environments, and the sharing of inbreeding levels complicate these inferences. Even if we make the correct assumptions, we still require pronounced variation of pedigree structure or population structure, which will place restrictions on the species we may study. Finally, the effort and expense of marker assays replace the effort of breeding and raising progeny.
In this paper, I review some approaches for using genetic markers to infer the genetic basis of natural phenotypic variation. Variation at two levels is considered: among pairs of individuals within populations, and among pairs of subpopulations within a population. In the former, we estimate heritability in the wild, and in the latter, Spitze’s (1993) QST (in the absence of known heritability). The estimation of pairwise relatedness and its actual variance, both integral to these inferences, is also described. The QST estimator and Kronecker-operator relatedness estimators have not been previously described in the literature. A review of some published work reveals some correlates of the appropriate population structures needed for these inferences.
Marker-based estimation of heritabilities
Variation within populations
Heritability is a fundamental quantity in quantitative genetics, being the proportion of variation determined by genes that act in an additive manner. The covariance between relatives for a quantitative trait is the basis for estimating heritability (see Lynch & Walsh 1998). In classical quantitative genetics, the degree of relatedness has always been regarded as a known quantity.
Genetic markers provide information about relatedness between individuals of unknown pedigree (Thompson 1975; Lynch 1988; Queller & Goodnight 1989; Blouin et al. 1996; Ritland 1996a; Goodnight & Queller 1999; Lynch & Ritland 1999). It follows that the joint distribution of markers and quantitative traits among individuals of unknown relationship should provide information about heritability. At the simplest, a positive association between allele sharing and phenotypic sharing among pairs of individuals indicates a genetic component to a quantitative trait (Ritland 1989).
More precisely, one might utilize pairwise ‘relatedness’, which is the average frequency that two homologous alleles, one sampled from each individual, are identical-by-descent. For estimating heritability ‘in the wild’, estimates of pairwise relatedness can then be combined with a pairwise measure of phenotypic information. Appendix I describes a regression-based method to estimate heritabilities using inferred pairwise relatedness proposed by Ritland (1996b). In this method, the regression of phenotypic similarity on estimated relationship gives the heritability estimate, but with the qualification that unlike normal regression, the predictor variable (relatedness) has error. One must use ‘actual’ variance of relatedness in the denominator (Appendix IV) of the regression coefficient.
Pairwise approaches such as the above were first proposed by Grimes & Harvey (1980) in their Symmetric Differences Squared method where, among all pairs of individuals, known relatedness is regressed onto the squared differences of quantitative trait measurements. Likelihood methods, such as that described by Shaw (1987) for the case of known relatedness, have not been developed for the marker-based estimation of heritability.
Instead of using an estimate of pairwise relatedness, which can take on a continuum of values, one can assume a specific structure of relationships, such as a mixture of full-sibs and unrelated individuals. ‘Relationship’ (type of relative) is considered instead of ‘relatedness’ (probability of gene-identity). This assumption of a structure of relationship takes advantage of the multilocus structure of pedigree data. Examples of such structure may occur when: (1) several full-sib families of fish are raised in a tank, but parentage of these fish is unknown; and (2) progenies from open-pollinated trees are grown in field trials; while maternal parentage is known, families consist of unknown proportions of full- vs. half-sibs.
Dealing with such a fish-tank, Mousseau et al. (1998) developed a likelihood-based technique for a captive population of Pacific Chinook salmon. Assuming a full-sib structure, two simple sequence repeat (SSR) loci were used to infer relationship. Estimated heritabilities for weight, jacking and flesh colour were in good agreement with estimates for salmonids generated using classical quantitative genetic methods. More recently, Thomas et al. (2000) presented a modified form of Mousseau’s likelihood technique, which required fewer initial assumptions about population parameters. They also compared their method with the continuum-of-relationship regression method (Appendix I), and found the likelihood method to exhibit larger biases, but smaller variances, of estimates.
The reconstruction of entire sibships (as opposed to pairwise relationships) from genetic data, in the absence of parental information, has received recent attention (Blouin et al. 1996; Herbinger et al. 1997; Painter 1997). In a more sophisticated approach, Smith et al. (2000) proposed a Markov Chain Monte Carlo algorithm to partition individuals into full-sib groups, and found very good accuracy with four loci of eight alleles apiece. This approach has yet to link quantitative traits, and hence heritability estimation, to these inferred sibships.
Variation among populations
During the process of population differentiation for a quantitative trait, within-population variation is converted to among-population variation by genetic drift. Wright (1951) mathematically described this repartitioning as a function of FST at neutral loci (see Appendix II). Felsenstein (1973) described how under this process, quantitative genetic variation provides information about phylogeny. Rogers & Harpending (1983) and Lewontin (1984) discuss the prediction of among-population variation for a quantitative trait from FST at neutral loci.
Recognizing that FST can be estimated solely from within- and between-population variances for a quantitative trait (as opposed to markers), and that the evolution of this quantity might be influenced by natural selection, Spitze (1993) named the estimate of that FST based upon quantitative trait variation as ‘QST’. With additive effects of genes and linkage equilibrium for loci underlying the quantitative trait, QST should equal the FST estimated from neutral markers. As an indirect method for detection of natural selection, one can compare QST with FST. Diversifying selection causes QST to be larger than that expected on the basis of marker variation, and normalizing selection causes the opposite.
By making this comparison, several studies have inferred the nature of selection responsible for quantitative trait differentiation. Spitze (1993), in his pioneering study of among-population variation in the microcrustacean Daphnia obtusa, inferred that divergent selection affected body size (QST/FST > 1), that convergent selection affected relative fitness (QST/FST < 1), and that several other traits (clutch size, age at reproduction and growth rate) appeared to have evolved in a neutral manner, by genetic drift alone (QST/FST = 1). Lynch et al. (1999) summarized several recent studies that compared QST with FST. Interestingly, they found QST/FST to be generally larger in cases where FST was lower, as one would expect when FST values are more influenced by small population sizes, compared to time since initial population divergence (wherein selection would have greater opportunity). Just as FST can be extended to a hierarchy, the QST/FST comparison can also be extended to a hierarchy. In a study of the selfing annual Medicago truncatula, Bonnin et al. (1996) found QST/FST > 1 among-populations, but QST/FST ≅ 1 among-subpopulations within populations, indicating that divergent selection acts among populations, but not within populations.
However, studies of marker variation are usually not accompanied by parallel studies of genetic variation for quantitative traits. This introduces a bias towards easily reared organisms with short generation times, such as annual plants or fruit flies (but see Yang et al. 1996).
Using variation between populations (a new marker-based method for estimating QST)
A major obstacle in QST studies is the requirement for estimates of genetic variances within populations. This is usually done through an experiment designed to estimate heritability, and this can be laborious or impractical in many species. Furthermore, artificial environments can bias these estimates of heritability.
Appendix II shows how to use markers to estimate the heritability that explains genetic divergence between populations, and more importantly, to estimate QST, without prior estimates of heritabilities. This method requires estimates of FST between pairs of populations. Reynolds et al. (1983) derived an estimator of genetic distance, which assumed populations diverge solely by genetic drift (Appendix III).
Estimation of relatedness
Pairwise relatedness and relationship
Coefficients of relationship play a central role in several fields, including quantitative genetics, conservation genetics, and social behaviour (Lynch & Ritland 1999). The above methods for estimating heritabilities motivate further interest in methods of estimating these coefficients from marker data. Ideally, estimators of relationship seek to estimate the probability that genes share ancestry, or are ‘identical-by-descent’. Estimators for pairwise relatedness were first seriously considered for the rather specialized data provided by DNA-fingerprint profiles (Lynch 1988). However, the indeterminate homology of bands inherent in these data makes band-sharing by chance difficult to separate from band-sharing by descent.
Methods for estimating pairwise relationship fall into two major classes: (1) maximum likelihood (ML) estimators; and (2) methods-of-moments (MM) estimators. ML estimators are best suited for discriminating among hypothesized relationships (for example, full-sib vs. half-sib), while MM estimators are designed for estimating relatedness (probability of identity-by-descent). While the derivation of ML estimators is straightforward once the probability model is described, the derivation of MM estimators is somewhat ad hoc. Thompson (1975) pioneered ML estimators for pairwise relationships. Queller & Goodnight (1989) were the first to develop MM estimators for pairwise relatedness, and more recently, Ritland (1996a) and Lynch & Ritland (1999) developed more statistically efficient MM estimators. These estimators are described in Appendix III.
Moran’s I, an analogue of the pairwise relatedness coefficient, has been used extensively in spatial autocorrelation studies to characterize genetic variation within populations (Barbujani 1987; Epperson & Li 1996). Autocorrelation analyses are concerned with patterns for individual alleles and not with relatedness (gene-identity). This presents two problems in the context of the analyses presented in this paper: (1) their connection to probability of gene identity is indirect or at least inefficient; and (2) they do not naturally extend to multilocus and multiallelic measures, although recent papers have attempted multiallelic (Epperson et al. 1999) and multivariate (Smouse & Peakall 1999) extensions.
At highly heterozygous loci such as microsatellites, the observed fraction of band sharing can approximate the true gene identity. However, band-sharing will always overestimate gene identity, with a bias approximately equal to the homozygosity, which can be considerable with isozymes, random amplified polymorphic DNAs (RAPDs) and amplified fragment length polymorphisms (AFLPs). With dominant markers, other biases also enter —estimators of pairwise relatedness for loci exhibiting dominance need to be developed. Band sharing measures also do not incorporate variation among alleles in their information content, for example, sharing of rarer alleles is more likely to represent gene identity than the sharing of common alleles (Ritland 1996a).
Actual variance of relatedness
A critical feature of the marker-based heritability methods (Appendices I and II) is the need to measure ‘actual’ variance of relatedness. Actual variance of relatedness occurs when there is some mixture of relatives, such as full-sibs vs. unrelated individuals. Variation of relatedness between subpopulations or populations can also occur via differential rates of genetic drift due to differences of effective sizes. This is an aspect of population structure rarely measured.
Appendix IV shows how the actual variance of pairwise relatedness can be estimated using a weighted anova on data from at least two loci. The method relies on the statistical independence of estimates from different marker loci (e.g. the cross-product of two independent estimates is an unbiased estimate of r², since errors are independent). Note that this is not a pairwise quantity, but a population quantity, so it has much less statistical error than pairwise estimates of relatedness.
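The cross-product idea can be illustrated with a short Python sketch. It is a simplified, unweighted two-locus version and not the weighted anova of Appendix IV: it assumes that each pair of individuals has two relatedness estimates, obtained from two independent loci (or two disjoint sets of loci), and that their errors are independent. The function name and arguments are hypothetical.

```python
import numpy as np

def actual_variance_of_relatedness(r_locus_a, r_locus_b):
    """Unweighted estimate of the actual variance of relatedness.

    r_locus_a, r_locus_b: relatedness estimates for the same pairs of
    individuals, obtained from two independent loci (assumed inputs).
    Because estimation errors are independent between loci, the mean
    cross-product estimates E[r^2]; subtracting the product of the means
    leaves the variance of actual relatedness among pairs.
    """
    a = np.asarray(r_locus_a, dtype=float)
    b = np.asarray(r_locus_b, dtype=float)
    return float(np.mean(a * b) - a.mean() * b.mean())
```

By contrast, the ordinary variance of the estimates from a single locus includes the (usually much larger) estimation error, which is why it cannot be used in its place.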
The most conspicuous property of pairwise relatedness estimates is their large statistical errors. In fact, estimates may commonly lie outside the space of ‘allowable’ values, and be negative, or greater than one. One might ‘restrict’ estimates to allowable values, but the effect of this is to introduce statistical bias, usually in the positive direction. Another peculiar property of pairwise relatedness estimates is that, when there is no a priori grouping of individuals into groups of relatives (all possible pairwise comparisons are made), the relativity of relatedness causes an expected r estimate of approximately zero, regardless of the true level of relatedness (Ritland 1996a). This is not a problem for estimating heritabilities (Appendix I) since variances and covariances of relatedness are employed, rather than mean relatedness.
Loci provide information about relatedness roughly in proportion to the number of alleles at that locus in the population (Appendix III). This places a premium on using highly variable markers such as microsatellites; for example, one additional allele provides the added information of an entire diallelic locus. However, this is an approximation for low actual relatedness, and because actual relationship can vary among loci (particularly for full-sibs), it is certainly wise to use a minimum number of marker loci (≈ 5–10) to average out this variation. For good statistical power to discriminate among full-sibs, half-sibs and unrelated individuals, upwards of 20 loci are needed (Blouin et al. 1996).
While the expected information about relatedness is the same among pairs for the Ritland (1996a) estimator, the information varies among pairs for the Lynch & Ritland (1999) estimator. Rarer alleles provide more information about relatedness (e.g. shared rare alleles are more likely to be identical-by-descent than are shared common alleles). In other words, we know better about relatedness for those pairs possessing rarer alleles. Interestingly, there has been no demonstration of how marker loci can differ for information about pairwise FST; current estimators combine estimates among loci in proportion to their heterozygosities (cf. Reynolds et al. 1983), implying that information is proportional to heterozygosity, and not to allele number.
There is also the question of how one can optimally allocate experimental resources between numbers of markers and numbers of phenotypes. Generally, beyond a few (6–12) marker loci, the major constraint on statistical power is the number of phenotypes (Ritland 1996a). Likewise, in linkage-disequilibrium mapping, Long & Langley (1999) found that greater power was achieved by increasing the number of individuals phenotyped rather than by increasing the number of marker polymorphisms assayed. While it is not usually difficult to locate greater numbers of individuals for pairwise comparisons, it may be difficult to find more populations for the QST analysis.
Microsatellites have become the ‘marker of choice’ for population-level studies, but their utility is reduced for longer-term comparisons, which might occur in some QST/FST studies. As recently noted by Hedrick (1999), FST values may be reduced at microsatellite loci, relative to less polymorphic loci such as isozymes, due to the homogenizing effect of high mutation rate. This may bias the QST/FST estimate upwards. For mutation to be of a lesser effect than drift, we require u < 1/(2N), so for example, with mutation rate u = 0.001, effective population size should be less than 500, suggesting that less mutable loci should be used for QST–FST comparisons.
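The threshold quoted above follows from rearranging the inequality for N. The second mutation rate below is an illustrative value chosen only for contrast; it is an assumption and not a figure from the studies cited.

```latex
u < \frac{1}{2N} \;\Longleftrightarrow\; N < \frac{1}{2u},
\qquad
u = 10^{-3} \;\Rightarrow\; N < \frac{1}{2 \times 10^{-3}} = 500,
\qquad
u = 10^{-6} \;\Rightarrow\; N < \frac{1}{2 \times 10^{-6}} = 500\,000 .
```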
Where might we find appropriate population genetic structures?
While marker-based techniques might obviate the need for controlled experiments, a new constraint is imposed; we are restricted to species that show detectable variation of relatedness. Two processes may promote variation of relatedness: (1) philopatry in stable environments; and (2) founder events in a stochastic environment. Due to developments in marker technology, studies documenting these processes in natural populations have become increasingly frequent.
Philopatry and social grouping
Chesser (1991) predicted that there would be significant differentiation among lineages or breeding groups when there was one polygynous male breeding within a lineage of philopatric females, a common breeding tactic in mammalian social systems. Subsequent empirical studies have generally borne out this prediction. In an allozyme study, Dobson et al. (1998) found prairie-dog breeding groups to exhibit substantial genetic differentiation, with 15–20% of the variation occurring among groups. Using 10 microsatellite loci, Surridge et al. (1999) found that European wild rabbits (Oryctolagus cuniculus) exhibit stable, genetically differentiated breeding groups. In a review of published studies of mammalian populations in which genetic structuring was assessed at the level of social groups, Storz (1999) found low to moderately high levels of genetic differentiation among social groups (FST = 0.006–0.227), coupled with consistently high levels of within-group heterozygosity with often negative FIS values. Higher levels of genetic structuring were observed in populations in which polygynous mating was reinforced by female philopatry.
In plants, local dispersal of seed offspring is analogous to animal philopatry. Progeny of a single mother would be predominantly half-sibs. In studies of several transects of monkeyflowers and some British Columbia conifers, levels of fine-scale structuring are highly variable among populations of the same species (Ritland & Ritland 1996; Ritland, unpublished; Travis and Ritland, in preparation). In approximately half of these transects, the pairwise relatedness within a radius of three plant heights (50 cm in monkeyflowers, 100 m in conifers) on average corresponded to between first-cousins (r = 0.063) and half-sibs (r = 0.125). Most other transects showed no structure, which emphasizes that population structures are difficult to predict, being often the outcomes of accidents of population founding and local patterns of dispersal. Such variation of local structure suggests that optimal surveys should first ‘prescreen’ populations with small (≈ 100 individuals) surveys of crude (isozyme) variation, then proceed with intensive sampling of those populations showing significant structure. This assumes that heritabilities (and other features perhaps estimated, such as the correlation of environments) are the same among populations that vary in local structure. It is conceivable that these quantitative genetic features may also vary in accordance with local structure, but this seems unlikely.
The smaller the number of founders and the higher their relatedness, the greater the genetic differentiation among founding groups (Husband & Barrett 1991). These initial differences can be subsequently blurred by added immigration. In a demonstration of the magnitude of differentiation possible via founding events, Ingvarsson & Giles (1999) investigated the genetic structure of a single island population of the dioecious plant Silene dioica in the Skeppsvik Archipelago, Umea, Sweden. Over an area of 200 m², levels of genetic differentiation among contiguous patches were greater than or comparable to what is observed over larger scales in the archipelago. They suggested that this structuring occurred during population expansion, soon after island colonization, and that founding patches were essentially family groups, an outcome of ‘kin-structured’ dispersal.
Philopatry and kin-structured dispersal can both contribute to subpopulation differentiation. Interestingly, in the subsocial spider Stegodyphus lineatus, Johannesen & Lubin (1999) were able to distinguish between the genetic structure caused by random mating and philopatry in old breeding groups vs. that caused by newly founded groups consisting of sibs. It is likely that these alternative genetic structures contribute different types of information when used for the marker-aided quantitative genetic analyses of Appendices I and II.
The approaches outlined in this paper are just one way to integrate molecular markers with quantitative genetics. Relatively few marker loci are needed for these approaches, and inferences are made using natural variation. However, they make a number of simplifying assumptions including additivity of effects and lack of inbreeding. There is much room for alternative approaches for interfacing marker variation with natural quantitative trait variation.
Marker-aided analysis can also be fruitful for studying inbreeding in natural populations. Changes of the average inbreeding coefficients allow inferences about selection against self-progeny (Ritland 1990). As well, estimates of individual inbreeding coefficients (Sweigart et al. 1999) might remove the bias of phenotypic selection measures in partially selfing populations (Willis 1996). Inbreeding can also play a part in population differentiation for quantitative traits, and may be a component that should be included in QST analyses. For example, Lynch et al. (1999) found that among 17 populations of Daphnia pulex, mean phenotypes of the individual populations are strongly correlated with local levels of homozygosity, suggesting that variation in local inbreeding plays a role in population differentiation.
In the future, it may also be possible to adapt quantitative trait locus (QTL) mapping techniques to naturally occurring variation. Analogous to the identity disequilibrium needed here (Appendices I and II), QTL mapping requires linkage disequilibrium between markers and QTLs. Associations between markers and traits can be strong in hybrid zones (Rieseberg et al. 1999) and in expanding populations (Long et al. 1998). Directional selection can also create associations, so that conversely, one can detect certain patterns of selection based upon associations, such as runs of ‘plus’ QTLs along a chromosome (Orr 1998) or variation of FST among marker loci in selectively diverging populations (Enjalbert et al. 1999). However, an impediment to using QTL variation was noted by Latta (1998), who showed that while QST for a trait might be high, QST for individual QTLs may be low (perhaps undetectable) because local adaptation causes positive associations among ‘plus’ QTLs. In these cases, multilocus estimates of FST might be used, but the correct statistic is not clearly apparent (Latta 1998).
I thank the reviewers for providing insightful comments. The Natural Sciences and Engineering Research Council (NSERC) of Canada supported this research.
Kermit Ritland conducts research in both basic and applied aspects of population and quantitative genetics, mainly in plants, and is best known for his work on estimation of plant mating systems. For information about current projects and his laboratory, visit http://genetics.forestry.ubc.ca. For information about the availability of computer programs for the analyses described in this paper, contact Kermit at: firstname.lastname@example.org.
Estimation of heritability in the field, and the level of shared environments
Let the value of a quantitative trait Y for two individuals i and j be Yi for the first and Yj for the second. Their shared phenotypes are measured by the cross-product
$$Z_{ij} = \frac{(Y_i - U)(Y_j - U)}{V} \qquad (1.1)$$
where U and V are the sample mean and variance of Y, respectively, in the population. Among all pairs, the average Zij equals the phenotypic correlation. If shared phenotypes are determined by the sharing of both genes and environments, then
$$Z_{ij} = 2 r_{ij} h^2 + r_e + e_{ij} \qquad (1.2)$$
where h2 is the heritability, rij is the relatedness coefficient, re is a correlation due to sharing of environments (assumed constant for all pairs), and eij is random error. This is a linear regression equation, so over several pairs of individuals, we can estimate both heritability and shared environments as
$$\hat{h}^2 = \frac{\mathrm{Cov}(Z_{ij}, r_{ij})}{2\,\mathrm{Var}(r_{ij})}, \qquad \hat{r}_e = Z - 2R\,\hat{h}^2 \qquad (1.3)$$
where Z and R are the means of the Zij and rij, Cov(Zij, rij) is the covariance between estimated relatedness and phenotypic similarity, and Var(rij) is the actual variance of relatedness (see Appendix IV). More complex models of trait-sharing, which incorporate isolation-by-distance, dominance, and inbreeding are described in Ritland (1996a). If actual variance of relatedness is statistically not significant, then at least, the mere presence of genetic variation can be ascertained by testing for positive Cov(Zij, rij).
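The regression estimator of this appendix can be sketched in code. The following is a minimal Python illustration under stated assumptions: the model is taken to be Zij = 2 rij h² + re + eij, with rij the probability of identity-by-descent between alleles sampled from the two individuals, and the actual variance of relatedness is assumed to have been estimated separately (Appendix IV) and is passed in as a number. The function and argument names are hypothetical and are not taken from any published program.

```python
import numpy as np

def pairwise_heritability(y, r_hat, var_r_actual):
    """Regression estimate of heritability and shared environment.

    y            : trait values, one per individual
    r_hat        : symmetric matrix of estimated pairwise relatedness
    var_r_actual : actual variance of relatedness (estimated elsewhere)
    """
    y = np.asarray(y, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    u, v = y.mean(), y.var(ddof=1)           # sample mean and variance
    i, j = np.triu_indices(len(y), k=1)      # every pair once, i < j
    z = (y[i] - u) * (y[j] - u) / v          # phenotypic cross-products
    r = r_hat[i, j]
    cov_zr = np.mean((z - z.mean()) * (r - r.mean()))
    h2 = cov_zr / (2.0 * var_r_actual)       # slope uses ACTUAL variance
    r_e = z.mean() - 2.0 * r.mean() * h2     # intercept: shared environment
    return float(h2), float(r_e)
```

The denominator is deliberately not the variance of the estimated rij, since that variance is inflated by estimation error and would bias the heritability estimate towards zero.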
The genetic correlation between two traits can be estimated by formulae similar to Eqns 1.1–1.3 (where Yi would be the first trait measured in individual i, Yj would be the second trait measured in individual j, and V would be the sample covariance between traits). Interestingly, the sign of the genetic correlation can be estimated simply as the sign of this Cov(Zij, rij); this does not require measuring the actual variance of relatedness.
Variances of estimates can be found by bootstrapping individuals to create replicate datasets, wherein identical comparisons are omitted. This gives a confidence interval about the point estimate. Alternatively, randomization of marker genotypes with respect to traits can generate replicate datasets conforming to the null hypothesis of zero heritability. The distribution of estimates among these replicate datasets is then used in the statistical tests. Because each individual is typically involved in many pairwise comparisons, datapoints used in the regression estimators (Eqn 1.3) are not independent and alternative methods such as the above must be used to gauge statistical significance.
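The randomization test can be sketched as follows. This is a minimal Python illustration, assuming that the test statistic is Cov(Zij, rij) and that shuffling trait values among individuals (while holding the marker-based relatedness matrix fixed) is an acceptable way to randomize genotypes with respect to traits. The function names, the one-sided form of the test and the default number of permutations are assumptions made for illustration.

```python
import numpy as np

def cov_z_r(y, r_hat):
    """Covariance between phenotypic cross-products and relatedness."""
    y = np.asarray(y, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    i, j = np.triu_indices(len(y), k=1)
    z = (y[i] - y.mean()) * (y[j] - y.mean()) / y.var(ddof=1)
    r = r_hat[i, j]
    return float(np.mean((z - z.mean()) * (r - r.mean())))

def permutation_test(y, r_hat, n_perm=999, seed=0):
    """One-sided test of Cov(Z, r) > 0 under the null of zero heritability."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    observed = cov_z_r(y, r_hat)
    # Each replicate breaks any trait-marker association by shuffling traits.
    null = np.array([cov_z_r(rng.permutation(y), r_hat) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

Permuting whole individuals, rather than pairs, preserves the dependence among the many pairwise comparisons that share an individual, which is the reason a conventional regression test is not appropriate here.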
Estimation of realized heritability in the field, and QST, using pairwise population divergence
If additive variation for quantitative trait differences among populations is due to the neutral processes of migration and drift, then
$$\sigma_b^2 = \frac{2 F_{ST}\,\sigma_w^2}{1 - F_{ST}} \qquad (2.1)$$
where σb² and σw² are the genetic variances between and within populations (Wright 1951). Solving for FST gives the among-population share of genetic variance for the quantitative trait, termed ‘QST’ by Spitze (1993): $Q_{ST} = \sigma_b^2/(\sigma_b^2 + 2\sigma_w^2)$. Among-population genetic variances are found simply as the among-population variance component in a two-factor anova of population and individuals within populations. Within-population genetic variances require a breeding design that estimates heritabilities (see Lynch & Walsh 1998).
In the absence of knowledge about heritability, the joint pattern of marker and quantitative trait variation can be used to estimate within-population genetic variances. First, we might assume that natural selection has not affected population differentiation for the quantitative trait. We simply solve for σw² in Eqn 2.1 to obtain an estimator for it:
$$\hat{\sigma}_w^2 = \frac{(1 - F_{ST})\,\sigma_b^2}{2 F_{ST}} \qquad (2.2)$$
(note that computing QST from this estimate will simply give FST).
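A few lines of Python make this circularity concrete. The sketch assumes that the neutral relationship takes the form σb² = 2 FST σw² / (1 − FST), which is what the QST expression given above implies when solved back; the function names are hypothetical.

```python
def qst(var_between, var_within):
    """Q_ST from between- and within-population genetic variances."""
    return var_between / (var_between + 2.0 * var_within)

def neutral_within_variance(var_between, fst):
    """Within-population genetic variance implied by neutrality."""
    return (1.0 - fst) * var_between / (2.0 * fst)

# Under neutrality, plugging the implied variance back in returns F_ST,
# so this route alone cannot reveal selection.
var_b, fst = 3.0, 0.1
var_w = neutral_within_variance(var_b, fst)
print(qst(var_b, var_w))
```

An independent estimate of the within-population variance is therefore needed before QST can be compared with FST in any informative way.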
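If we suspect $Q_{ST}$ to differ from $F_{ST}$ because of the past action of natural selection, this relationship will not hold and we need to jointly estimate $Q_{ST}$ and $\sigma_w^2$. This involves the use of variation in the differentiation of pairs of populations, which gives us sufficient degrees of freedom. Between two populations i and j, with pairwise $F_{ST}$ denoted $F_{ij}$, the trait divergence $Q_{ij}$ (half the squared difference of trait means, so that it is on the scale of a between-population variance) has the following expected relationship
$$Q_{ij} = s + (\sigma_b^2 + 2\sigma_w^2)\,F_{ij} + e_{ij} \tag{2.3}$$
where s is a deviation from the expected between-population variance, caused by natural selection (s is positive with disruptive selection, and negative with normalizing selection), and $e_{ij}$ is random error. We assume s is independent of $F_{ij}$. Eqn 2.3 is in the form of the regression equation $Y_i = a + (b + c)X_i + e_i$, with c known (here $c = \sigma_b^2$, obtained from the ANOVA) and b the parameter to estimate (here $b = 2\sigma_w^2$). If we first define
$$b_{QF} = \frac{\mathrm{Cov}(Q_{ij}, F_{ij})}{\mathrm{Var}(F_{ij})} \tag{2.4}$$
as the regression of pairwise trait divergence on pairwise $F_{ST}$, the estimate of within-population genetic variation is
$$\hat{\sigma}_w^2 = \frac{b_{QF} - \sigma_b^2}{2} \tag{2.5}$$
More importantly, since $b_{QF} = \sigma_b^2 + 2\sigma_w^2$, we can estimate $Q_{ST}$ as simply
$$\hat{Q}_{ST} = \frac{\sigma_b^2}{b_{QF}} \tag{2.6}$$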
As in the within-population heritability analyses, the critical parameter is the regression of quantitative trait variation on marker parameters (r or F).
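A minimal Python sketch of the regression approach follows. It assumes that pairwise trait divergence is measured as half the squared difference of population means, that the among-population variance $\sigma_b^2$ is supplied from a separate ANOVA, and that the slope is obtained by ordinary least squares. All function names are hypothetical and the data are invented.

```python
def regression_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("pairwise F_ST shows no variation; slope undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def qst_from_pairs(pairwise_fst, pairwise_divergence, var_between):
    """Return (b_QF, within-population variance, Q_ST).

    Uses b_QF = sb2 + 2 * sw2, so sw2 = (b_QF - sb2) / 2 and
    Q_ST = sb2 / b_QF.
    """
    b_qf = regression_slope(pairwise_fst, pairwise_divergence)
    var_within = (b_qf - var_between) / 2.0
    return b_qf, var_within, var_between / b_qf


# Invented data: four population pairs lying exactly on a line with
# intercept s = 0.3 and slope 4.0
f_pairs = [0.05, 0.10, 0.20, 0.25]
q_pairs = [0.3 + 4.0 * f for f in f_pairs]

b_qf, sw2, q_st = qst_from_pairs(f_pairs, q_pairs, var_between=1.0)
print(b_qf, sw2, q_st)
```

The slope is undefined when all pairs of populations have the same $F_{ST}$, which is the computational face of the requirement that there be actual variation of relationship among pairs.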
Inference of pairwise relatedness and pairwise FST
Compact expressions for estimators of pairwise relatedness can be developed using the ‘Kronecker operator’ $\delta$. These are particularly useful in writing programming code. Suppose in the assay of marker genes that we have genotyped two diploid individuals, for a total of four alleles, denoted $A_i$ and $A_j$ from the first individual, and $A_k$ and $A_l$ from the second. Now, if alleles $A_i$ and $A_j$ are the same (e.g. the same band or sequence), then $\delta_{ij} = 1$, while if different, $\delta_{ij} = 0$. Among the four sampled alleles, there are six $\delta$’s, one for each pairwise comparison of alleles, both within and between individuals. The estimator of pairwise relatedness of Ritland (1996a) can then be written as
where n is the number of alleles at the locus, and $p_i$ is the frequency of allele i in the population (estimated from a larger sample of at least 30 individuals). As the variance of this estimate is $1/[4(n-1)]$, an efficient multilocus estimate is the sum of locus-specific estimates, each weighted by $(n-1)$, divided by the sum of the weights. Lynch & Ritland’s (1999) estimator of pairwise relatedness is
and for finding multilocus estimates, the locus-specific weight is the inverse of the statistical variance of the locus-specific estimate. Note that Eqn 3.2, being based on a regression, is an asymmetrical measure of relatedness; one should compute relatedness in both directions and then take their simple average. Eqn 3.1 is more appropriate for loci with fewer (< 6) alleles, while Eqn 3.2 behaves better for highly polymorphic loci (Lynch & Ritland 1999). Queller & Goodnight’s (1989) estimator can also be written in this notation as
Their estimator is not defined when the reference genotype is heterozygous at a diallelic locus (the denominator is zero). For multilocus estimates, Queller & Goodnight (1989) advocate summing the numerator and denominator terms separately across loci, then dividing one by the other.
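The six $\delta$’s and the Queller & Goodnight multilocus rule can be sketched in Python as follows. The per-locus numerator and denominator used here are the commonly cited form of their estimator, with the first individual as the reference; that form is an assumption of this sketch, as are all function names, genotypes and allele frequencies.

```python
def deltas(geno_x, geno_y):
    """Six Kronecker deltas for alleles (i, j) of x and (k, l) of y."""
    i, j = geno_x
    k, l = geno_y

    def same(a, b):
        return 1 if a == b else 0

    return {
        "ij": same(i, j), "kl": same(k, l),   # within individuals
        "ik": same(i, k), "il": same(i, l),   # between individuals
        "jk": same(j, k), "jl": same(j, l),
    }


def qg_terms(geno_x, geno_y, freqs):
    """Per-locus numerator and denominator (assumed Q&G form, x as reference)."""
    d = deltas(geno_x, geno_y)
    p_i = freqs[geno_x[0]]
    p_j = freqs[geno_x[1]]
    num = 0.5 * (d["ik"] + d["il"] + d["jk"] + d["jl"]) - p_i - p_j
    den = 1 + d["ij"] - p_i - p_j
    return num, den


def qg_multilocus(loci):
    """Sum numerators and denominators separately across loci, then divide."""
    num_sum = 0.0
    den_sum = 0.0
    for geno_x, geno_y, freqs in loci:
        num, den = qg_terms(geno_x, geno_y, freqs)
        num_sum += num
        den_sum += den
    if den_sum == 0:
        raise ZeroDivisionError("summed denominator is zero; estimate undefined")
    return num_sum / den_sum


# Invented two-locus example: (genotype of x, genotype of y, allele frequencies)
loci = [
    (("A", "B"), ("A", "B"), {"A": 0.3, "B": 0.2, "C": 0.5}),
    (("D", "D"), ("D", "E"), {"D": 0.6, "E": 0.4}),
]
print(qg_multilocus(loci))
```

With a heterozygous reference genotype at a diallelic locus, `qg_terms` returns a zero denominator, which is the undefined case noted above; summing across loci before dividing sidesteps it as long as at least one other locus contributes a non-zero denominator.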
Kronecker notation also efficiently gives the probability of a pairwise relationship (Ritland 2000). Given marker data of two individuals, the likelihood of a given relationship is (modified after Jacquard 1974):
where $A_iA_j$ and $A_kA_l$ are the genotypes of the two individuals at a single locus. The triplet of relationship coefficients $(\Delta_7, \Delta_8, \Delta_9)$ gives the probabilities of identity-by-descent for, respectively, (a) both pairs of genes, (b) one pair of genes, and (c) no genes; it takes the values (1, 0, 0) for identical twins, (0, 1, 0) for parent-offspring, (1/4, 1/2, 1/4) for full-sibs, (0, 1/2, 1/2) for half-sibs, and (0, 1/4, 3/4) for first-cousins (see Jacquard 1974).
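For reference, the relationship triplets listed above can be tabulated and checked in a few lines of Python. The dictionary and function names are hypothetical, and the conversion to a relatedness coefficient, $r = \Delta_7 + \Delta_8/2$, is the standard result for non-inbred pairs rather than something stated in this section.

```python
from fractions import Fraction as Fr

# (Delta7, Delta8, Delta9): probability that both pairs of genes, one pair
# of genes, or no genes are identical by descent
RELATIONSHIPS = {
    "identical twins": (Fr(1), Fr(0), Fr(0)),
    "parent-offspring": (Fr(0), Fr(1), Fr(0)),
    "full-sibs": (Fr(1, 4), Fr(1, 2), Fr(1, 4)),
    "half-sibs": (Fr(0), Fr(1, 2), Fr(1, 2)),
    "first-cousins": (Fr(0), Fr(1, 4), Fr(3, 4)),
}


def relatedness(d7, d8, d9):
    """Relatedness implied by a triplet, assuming non-inbred individuals."""
    if d7 + d8 + d9 != 1:
        raise ValueError("the three probabilities must sum to 1")
    return d7 + d8 / 2


for name, triplet in RELATIONSHIPS.items():
    print(name, relatedness(*triplet))
```

Parent-offspring and full-sib pairs share the same relatedness but differ in their triplets, which is why a likelihood over the full triplet can separate relationships that a single relatedness estimate cannot.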
For estimating $F_{ST}$ between a pair of populations i and j diverging solely by genetic drift, Reynolds et al. (1983) derived an estimator, whose single-locus version for large sample sizes is
$$\hat{F}_{ij} = \frac{\sum_k (p_{ik} - p_{jk})^2}{2\left(1 - \sum_k p_{ik}\,p_{jk}\right)} \tag{3.5}$$
where $p_{ik}$ is the frequency of allele k in population i, and the summation is over all alleles k present at the locus (see their paper for the formula that accounts for sample size). Again, multilocus estimates are obtained by summing the numerator and denominator terms separately, then dividing.
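A minimal Python sketch of the large-sample pairwise estimator and its multilocus form follows. It assumes the per-locus numerator $\sum_k (p_{ik} - p_{jk})^2$ and denominator $2(1 - \sum_k p_{ik}p_{jk})$, ignores the sample-size correction given by Reynolds et al. (1983), and uses hypothetical function names and invented allele frequencies.

```python
def reynolds_terms(freqs_i, freqs_j):
    """Per-locus numerator and denominator for two populations.

    freqs_i and freqs_j map allele labels to frequencies; alleles absent
    from a population are treated as having frequency 0.
    """
    alleles = set(freqs_i) | set(freqs_j)
    num = sum((freqs_i.get(a, 0.0) - freqs_j.get(a, 0.0)) ** 2 for a in alleles)
    shared = sum(freqs_i.get(a, 0.0) * freqs_j.get(a, 0.0) for a in alleles)
    den = 2.0 * (1.0 - shared)
    return num, den


def reynolds_fst(loci):
    """Multilocus estimate: sum numerators and denominators, then divide."""
    num_sum = 0.0
    den_sum = 0.0
    for freqs_i, freqs_j in loci:
        num, den = reynolds_terms(freqs_i, freqs_j)
        num_sum += num
        den_sum += den
    if den_sum == 0:
        raise ZeroDivisionError("all loci fixed for the same allele")
    return num_sum / den_sum


# Invented frequencies for two populations at two loci
loci = [
    ({"A": 0.5, "B": 0.5}, {"A": 0.9, "B": 0.1}),
    ({"C": 0.2, "D": 0.8}, {"C": 0.2, "D": 0.8}),
]
print(reynolds_fst(loci))
```

Under these assumptions the estimate is 0 when the two populations have identical frequencies and 1 when they are fixed for different alleles.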
Estimation of actual variance of pairwise relatedness and pairwise FST
Denote the relatedness estimate for pair ij at locus k as $r_{ij,k}$. Let $w_k$ be the locus-specific weight used for the multilocus estimate, such that relatedness for pair ij is estimated as $\hat{r}_{ij} = \sum_k w_k r_{ij,k} / \sum_k w_k$. The variance of actual relatedness is estimated as
(Ritland 1996b), where ‘Ave’ is the average over all pairs ij, and r is the average pairwise relatedness in the population.
This formula can be extended to estimating the variance of pairwise $F_{ST}$. Assuming that each locus provides information about $F_{ST}$ in proportion to its heterozygosity, the variance of pairwise $F_{ST}$ can be estimated from Eqn 4.1 by replacing $r_{ij,k}$ with $F_{ij,k}$ and letting the weights $w_k$ equal the expected heterozygosity for locus k.
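The weighting step can be sketched in Python as below. Only the heterozygosity weights and the weighted multilocus mean are illustrated; the variance estimator itself is not implemented here. The sketch assumes that expected heterozygosity is computed as one minus the sum of squared allele frequencies (from frequencies pooled over the populations compared), and all names and values are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity, 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def weighted_mean(values, weights):
    """Weighted mean of locus-specific estimates."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have equal length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(v * w for v, w in zip(values, weights)) / total


# Invented pooled allele frequencies at three loci
pooled_freqs = [
    {"A": 0.5, "B": 0.5},
    {"C": 0.9, "D": 0.1},
    {"E": 0.25, "F": 0.25, "G": 0.5},
]
# Invented locus-specific pairwise F_ST values for one pair of populations
f_by_locus = [0.10, 0.02, 0.15]

weights = [expected_heterozygosity(f) for f in pooled_freqs]
print(weights)
print(weighted_mean(f_by_locus, weights))
```

A locus that is nearly monomorphic receives a weight close to zero, so it contributes little to the multilocus value.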