Inference of pairwise relatedness and pairwise FST
Compact expressions for estimators of pairwise relatedness can be developed using the ‘Kronecker operator’ δ. These are particularly useful when writing program code. Suppose that, in an assay of marker genes, we have genotyped two diploid individuals, for a total of four alleles, denoted Ai and Aj from the first individual and Ak and Al from the second. If alleles Ai and Aj are the same (e.g. the same band or sequence), then δij = 1; if they differ, δij = 0. Among the four sampled alleles there are six δ’s, one for each pairwise comparison of alleles, both within and between individuals. The estimator of pairwise relatedness of Ritland (1996a) can then be written as
where n is the number of alleles at the locus, and pi is the frequency of allele i in the population (estimated from a larger sample of at least 30 individuals). Because the variance of this estimate is 1/(4(n−1)), an efficient multilocus estimate is the sum of the locus-specific estimates, each weighted by (n−1), divided by the sum of the weights. Lynch & Ritland’s (1999) estimator of pairwise relatedness is
and, for multilocus estimates, the locus-specific weight is the inverse of the statistical variance. Note that Eqn 3.2, being based on a regression, is an asymmetrical measure of relatedness; one should compute relatedness in both directions and then take their simple average. Eqn 3.1 is more appropriate for loci with fewer (< 6) alleles, while Eqn 3.2 behaves better for highly polymorphic loci (Lynch & Ritland 1999). Queller & Goodnight’s (1989) estimator can also be written in this notation as
Their estimator is not defined when the reference genotype is heterozygous at a diallelic locus (the denominator is zero). For multilocus estimates, Queller & Goodnight (1989) advocate summing the numerator and denominator terms separately across loci, then dividing one by the other.
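The Kronecker notation maps directly onto code. The sketch below is illustrative only: the display equations are not reproduced in this text, so it assumes the forms of the Lynch & Ritland (1999) and Queller & Goodnight (1989) estimators as they are commonly cited, and it should be checked against those papers before use. All function and type names (delta, six_deltas, lynch_ritland, queller_goodnight, weighted_mean) are hypothetical, and allele frequencies are assumed to be supplied as a dictionary per locus.

```python
from typing import Dict, Iterable, Sequence, Tuple

Genotype = Tuple[str, str]   # the two alleles of one diploid individual
Freqs = Dict[str, float]     # allele -> population frequency at one locus


def delta(a: str, b: str) -> int:
    """Kronecker delta: 1 if the two alleles are the same, else 0."""
    return 1 if a == b else 0


def six_deltas(g1: Genotype, g2: Genotype) -> Dict[str, int]:
    """The six pairwise comparisons among alleles (i, j) and (k, l)."""
    (i, j), (k, l) = g1, g2
    return {
        "ij": delta(i, j), "kl": delta(k, l),   # within individuals
        "ik": delta(i, k), "il": delta(i, l),   # between individuals
        "jk": delta(j, k), "jl": delta(j, l),
    }


def lynch_ritland_one_way(ref: Genotype, other: Genotype, p: Freqs) -> float:
    """One-directional regression estimate with `ref` as reference (assumed form).

    Raises ZeroDivisionError where the denominator vanishes.
    """
    i, j = ref
    d = six_deltas(ref, other)
    num = p[i] * (d["jk"] + d["jl"]) + p[j] * (d["ik"] + d["il"]) - 4 * p[i] * p[j]
    den = (1 + d["ij"]) * (p[i] + p[j]) - 4 * p[i] * p[j]
    return num / den


def lynch_ritland(g1: Genotype, g2: Genotype, p: Freqs) -> float:
    """Simple average of the two directions, as the text recommends."""
    return 0.5 * (lynch_ritland_one_way(g1, g2, p) + lynch_ritland_one_way(g2, g1, p))


def queller_goodnight_terms(ref: Genotype, other: Genotype, p: Freqs) -> Tuple[float, float]:
    """Numerator and denominator of the single-locus estimate (assumed form)."""
    i, j = ref
    d = six_deltas(ref, other)
    num = 0.5 * (d["ik"] + d["il"] + d["jk"] + d["jl"]) - p[i] - p[j]
    den = 1 + d["ij"] - p[i] - p[j]
    return num, den


def queller_goodnight(loci: Iterable[Tuple[Genotype, Genotype, Freqs]]) -> float:
    """Multilocus estimate: sum numerators and denominators separately, then divide."""
    num_sum = den_sum = 0.0
    for ref, other, p in loci:
        num, den = queller_goodnight_terms(ref, other, p)
        num_sum += num
        den_sum += den
    return num_sum / den_sum


def weighted_mean(estimates: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted multilocus average, e.g. with weights (n - 1) per locus."""
    return sum(e * w for e, w in zip(estimates, weights)) / sum(weights)
```

Summing numerator and denominator terms before dividing means that a locus with a zero denominator (a heterozygous reference genotype at a diallelic locus) simply contributes nothing, rather than making the whole estimate undefined.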
Kronecker notation also efficiently gives the probability of a pairwise relationship (Ritland 2000). Given marker data of two individuals, the likelihood of a given relationship is (modified after Jacquard 1974):
where AiAj and AkAl are the genotypes of the two individuals at a single locus. The triplet of relationship coefficients (Δ7, Δ8, Δ9) gives the probabilities of identity-by-descent for, respectively, (a) both pairs of genes, (b) one pair of genes, and (c) no genes. It takes the values (1, 0, 0) for identical twins, (0, 1, 0) for parent-offspring, (1/4, 1/2, 1/4) for full-sibs, (0, 1/2, 1/2) for half-sibs, and (0, 1/4, 3/4) for first-cousins (see Jacquard 1974).
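As an illustration of how such a likelihood can be computed, the sketch below writes it as a mixture over the three identity-by-descent states, weighted by (Δ7, Δ8, Δ9). It is not necessarily the exact Kronecker expression of Ritland (2000), which is not reproduced in this text. It assumes non-inbred individuals, unordered genotypes, and Hardy-Weinberg proportions in the population; the names RELATIONSHIPS, genotype_prob and pair_likelihood are hypothetical.

```python
from typing import Dict, Tuple

Genotype = Tuple[str, str]   # unordered pair of alleles
Freqs = Dict[str, float]     # allele -> population frequency

# (Delta7, Delta8, Delta9) triplets listed in the text
RELATIONSHIPS: Dict[str, Tuple[float, float, float]] = {
    "identical twins": (1.0, 0.0, 0.0),
    "parent-offspring": (0.0, 1.0, 0.0),
    "full sibs": (0.25, 0.5, 0.25),
    "half sibs": (0.0, 0.5, 0.5),
    "first cousins": (0.0, 0.25, 0.75),
}


def delta(a: str, b: str) -> int:
    """Kronecker delta: 1 if the two alleles are the same, else 0."""
    return 1 if a == b else 0


def genotype_prob(g: Genotype, p: Freqs) -> float:
    """Hardy-Weinberg probability of an unordered genotype."""
    i, j = g
    return (2 - delta(i, j)) * p[i] * p[j]


def pair_likelihood(g1: Genotype, g2: Genotype, p: Freqs,
                    coeffs: Tuple[float, float, float]) -> float:
    """Single-locus likelihood of a relationship given two genotypes."""
    d7, d8, d9 = coeffs
    (i, j), (k, l) = g1, g2
    # Both gene pairs IBD: the second genotype must equal the first.
    both = max(delta(i, k) * delta(j, l), delta(i, l) * delta(j, k))
    # One gene pair IBD: one allele of g1 (chosen with probability 1/2) is
    # shared; the other allele of g2 is drawn from the population.
    one = 0.5 * sum(delta(a, k) * p[l] + delta(a, l) * p[k]
                    for a in (i, j)) / (1 + delta(k, l))
    # No genes IBD: the second genotype is an independent draw.
    none = genotype_prob(g2, p)
    return genotype_prob(g1, p) * (d7 * both + d8 * one + d9 * none)
```

Under the usual assumption of independent loci, a multilocus log-likelihood would be the sum of the logarithms of these single-locus values, and candidate relationships can then be compared by their log-likelihoods.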
For estimating FST between a pair of populations diverging solely by genetic drift, Reynolds et al. (1983) derived an estimator, whose single-locus version for larger sample sizes is
where the summation is over all alleles k present at the locus (see their paper for the formula that accounts for sample size). Again, multilocus estimates are obtained by summing the numerator and denominator terms separately, then dividing.
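The multilocus procedure can be sketched as follows. The display equation is not reproduced in this text, so the code assumes the commonly cited large-sample form of the Reynolds et al. (1983) estimator: the sum over alleles of the squared frequency differences, divided by twice (one minus the sum of frequency products). Verify this against the original paper before relying on it. The names reynolds_terms and reynolds_fst are hypothetical, and each population's allele frequencies are assumed to be given as a dictionary per locus.

```python
from typing import Dict, Iterable, Tuple

Freqs = Dict[str, float]   # allele -> frequency in one population at one locus


def reynolds_terms(p1: Freqs, p2: Freqs) -> Tuple[float, float]:
    """Numerator and denominator of the single-locus estimate (assumed form)."""
    alleles = set(p1) | set(p2)   # all alleles k present at the locus
    num = sum((p1.get(k, 0.0) - p2.get(k, 0.0)) ** 2 for k in alleles)
    den = 2.0 * (1.0 - sum(p1.get(k, 0.0) * p2.get(k, 0.0) for k in alleles))
    return num, den


def reynolds_fst(loci: Iterable[Tuple[Freqs, Freqs]]) -> float:
    """Multilocus estimate: sum numerators and denominators, then divide."""
    num_sum = den_sum = 0.0
    for p1, p2 in loci:
        num, den = reynolds_terms(p1, p2)
        num_sum += num
        den_sum += den
    if den_sum == 0.0:
        # Happens only if both populations are fixed for the same allele
        # at every locus, in which case the estimate is undefined.
        raise ValueError("FST is undefined: no locus is polymorphic")
    return num_sum / den_sum
```

Because the terms are summed before dividing, a locus at which both populations are fixed for the same allele contributes zero to both sums and does not affect the multilocus estimate.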