The estimation of population differentiation with microsatellite markers


  • François Balloux,

    1. Zoologisches Institut, Universität Bern, CH-3032 Hinterkappelen-Bern, Switzerland,
      ‡Present address; I.C.A.P.B. (Institute for Cell, Animal and Population Biology), University of Edinburgh, King’s Buildings, West Mains Road, Edinburgh EH9 3JT, UK. Fax: (0131) 650 6564; E-mail:
  • Nicolas Lugon-Moulin

    1. Institut d’Ecologie, Laboratoire de Zoologie et d’Ecologie Animale, Bâtiment de Biologie, Université de Lausanne, CH-1015 Lausanne, Switzerland



Microsatellite markers are routinely used to investigate the genetic structuring of natural populations. The knowledge of how genetic variation is partitioned among populations may have important implications not only in evolutionary biology and ecology, but also in conservation biology. Hence, reliable estimates of population differentiation are crucial to understand the connectivity among populations and represent important tools to develop conservation strategies. The estimation of differentiation is commonly derived from Wright’s FST and/or Slatkin’s RST, an FST-analogue assuming a stepwise mutation model. Both these statistics have their drawbacks. Furthermore, there is no clear consensus over their relative accuracy. In this review, we first discuss the consequences of different temporal and spatial sampling strategies on differentiation estimation. Then, we move to statistical problems directly associated with the estimation of population structuring itself, with particular emphasis on the effects of high mutation rates and mutation patterns of microsatellite loci. Finally, we discuss the biological interpretation of population structuring estimates.


The populations of most, if not all, species show some levels of genetic structuring, which may be due to a variety of nonmutually exclusive agents. Even the European eel (Anguilla anguilla), often considered as the classical example of a random mating population because all individuals are thought to migrate to the Sargasso Sea for reproduction, has recently been shown to be geographically structured (Wirth & Bernatchez 2001). Environmental barriers, historical processes and life histories (e.g. mating system) may all, to some extent, shape the genetic structure of populations (e.g. Donnelly & Townson 2000; Gerlach & Musolf 2000; Palsson 2000; Tiedemann et al. 2000). In addition, as species’ geographical distributions are typically more extended than an individual’s dispersal capacity, populations are often genetically differentiated through isolation by distance (i.e. populations in close proximity are genetically more similar than more distant populations).

Because genetic structuring reflects the number of alleles exchanged between populations, it has major consequences on the genetic composition of individuals themselves. Understanding gene flow and its effects is central to many fields of research including population genetics, population ecology, conservation biology and epidemiology. The exchange of genes between populations homogenizes allele frequencies between populations and determines the relative effects of selection and genetic drift. High gene flow precludes local adaptation (i.e. the fixation of alleles, which are favoured under local conditions), and will therefore also impede the process of speciation (Barton & Hewitt 1985). On the other hand, gene flow generates new polymorphism in the populations, and increases local effective population size (the ability to resist random changes in allele frequencies), thereby opposing random genetic drift, generating new gene combinations on which selection can potentially act. Reliable estimates of population differentiation are also crucial in conservation biology, where it is often necessary to understand whether populations are genetically isolated from each other, and if so, to what extent. Small isolated populations are subject to genetic drift, which will affect their evolutionary potential, through fixation of deleterious mutations (Wright 1977; Madsen et al. 1996; Frankham & Ralls 1998; Saccheri et al. 1998; Eldridge et al. 1999; Higgins & Lynch 2001). The knowledge of population structuring may therefore provide valuable guidelines for conservation strategies and management (e.g. Rossiter et al. 2000; Eizirik et al. 2001).

Since the advent of the polymerase chain reaction (PCR; Saiki et al. 1985) in the 1980s, the use of microsatellites has become extremely widespread in biology. Today, a large number of studies rely on these codominant genetic markers to investigate the genetic structuring of populations, addressing specific questions in evolutionary and conservation biology. To estimate the connectivity and patterns of gene flow among populations, many studies rely on Wright’s (1951) FST and/or on Slatkin’s (1995) RST, which is an FST-analogue assuming a stepwise mutation model (SMM; Box 1), thought to reflect more accurately the mutation pattern of microsatellites. While FST can provide the basis for a measure of genetic distance when divergence is caused by drift (Reynolds et al. 1983), other genetic distance measures have been specifically developed for microsatellites (e.g. Goldstein et al. 1995; Goldstein & Pollock 1997; Zhivotovsky 1999). Nevertheless, we concentrate here on FST and RST, which are the most commonly reported statistics for the estimation of population structure.

In an ideal population (island model of migration; Wright 1931), assuming that mutation follows the infinite allele model (IAM; Box 1), FST is a decreasing function of N (m + µ), the product of local population size and the sum of migration and mutation (e.g. Hartl & Clark 1997). FST becomes a simple function of the number of migrants when mutation is negligible, although this is often not the case for microsatellites. An additional difficulty arises when the mutation model cannot be assumed to be an IAM. Under mutation models generating homoplasy, such as the SMM (Box 1), the relation between FST and the number of migrants + mutants no longer holds (Rousset 1996). Conversely, RST is independent of the mutation rate under a SMM (Kimmel et al. 1996). The drawback of RST is its high variance. Even under the strictest SMM, FST estimates may outperform their RST counterparts (Gaggiotti et al. 1999).
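The dependence of FST on N (m + µ) can be illustrated numerically. The sketch below assumes the standard large-number-of-demes approximation FST ≈ 1/(1 + 4N(m + µ)) for the island model under the IAM; the function name and the parameter values are hypothetical and chosen only for illustration.

```python
def fst_island_iam(n, m, mu=0.0):
    """Approximate expected FST under the island model with IAM mutation.

    Assumes the usual approximation FST ~ 1 / (1 + 4N(m + mu)),
    which ignores the finite number of demes.
    """
    return 1.0 / (1.0 + 4.0 * n * (m + mu))


if __name__ == "__main__":
    n = 1000  # hypothetical local population size
    for m in (0.01, 0.001, 0.0001):
        without_mutation = fst_island_iam(n, m)
        with_mutation = fst_island_iam(n, m, mu=1e-3)
        print(f"m={m:g}: FST (mu=0) = {without_mutation:.3f}, "
              f"FST (mu=1e-3) = {with_mutation:.3f}")
```

Under these assumptions, a microsatellite-like mutation rate barely changes the expectation when migration is high, but lowers it markedly once m falls to the order of µ or below.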

As both measures (FST and RST) have their drawbacks and should be interpreted cautiously, we present here a cursory review of both the theoretical and empirical literature on the estimation of genetic structuring with microsatellites. Our aim is to particularly emphasize the potential caveats of these statistics when computed from microsatellite markers and give hints for their biological interpretation. We build our review in the same chronological order as a classical population structuring study. First, we discuss the sampling designs, both spatially and temporally. We then present Wright’s (1951) FST and Slatkin’s (1995) RST and compare their relative qualities and drawbacks when inferred from microsatellite markers. Finally, we discuss the biological interpretation of these structuring estimates.

Sampling strategies

Spatial sampling

One weakness of the traditional population genetics approach is that the global population must be a priori subdivided into smaller entities called subpopulations. In the population genetics view, a subpopulation is generally considered as the smallest level of population structure, also called a deme. In some organisms, which are distributed discretely, a subpopulation may correspond to an existing physical structure, such as a pond or a small island. For other more continuously distributed species, this subdivision can be rather arbitrary. This a priori subdivision may have important consequences on the estimation of population structuring. Ideally, each sample should represent a deme. Indeed, when a sample consists of several distinct demes, structuring within samples (‘subpopulation’) will lead to an underestimation of between sample structuring, which is often the focus of a study. Alternatively, when several samples belong to the same deme, no structuring will be evident within and between these samples.

As it is difficult for many species to know a priori where the boundaries lie between demes in the field, and hence, to clearly define a subpopulation, the samples determined by the experimenter will typically be treated as the subpopulations. One way to potentially verify this assumption is to estimate the inbreeding coefficient FIS (where ‘I’ stands for ‘individual’ and ‘S’ for ‘subpopulations’). FIS measures the correlation of genes within individuals belonging to the same subpopulation (Wright 1921). That is, FIS estimated from empirical data will assess whether there is random mating within the samples and hence, give indications of whether we have sampled one or several distinct demes (here, we disregard both the effect of mixed mating systems, as can be commonly found in plants or snails, and possible social structure; Sugg et al. 1996; Ross 2001). Because several samples may belong to the same deme, Goudet et al. (1994) suggested performing a cumulative pooling of samples to estimate the size of a random breeding unit. As long as samples from the same deme are pooled together, no significant change in FIS is expected. However, when a sample from a different breeding unit is incorporated in the pooling strategy, a significant increase in FIS should occur. This approach has been used in several empirical studies (Goudet 1993; Goudet et al. 1994; Raybould et al. 1996; David et al. 1997).
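The effect of pooling samples from different breeding units on FIS can be sketched with a toy, single-locus example. The code below uses the simple ratio FIS = 1 − Ho/He without any sample-size correction, which is an assumption made for brevity and not the estimator used in the studies cited above; the genotype counts are invented.

```python
from collections import Counter


def f_is(genotypes):
    """Simple FIS = 1 - Ho/He at one locus (no sample-size correction).

    genotypes: list of (allele, allele) tuples for diploid individuals.
    """
    n = len(genotypes)
    observed_het = sum(a != b for a, b in genotypes) / n
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    expected_het = 1.0 - sum((c / (2 * n)) ** 2 for c in counts.values())
    return 1.0 - observed_het / expected_het


# Two hypothetical demes, each in Hardy-Weinberg proportions,
# but with very different allele frequencies (0.9 vs 0.1 for allele "A").
deme_1 = [("A", "A")] * 81 + [("A", "B")] * 18 + [("B", "B")] * 1
deme_2 = [("A", "A")] * 1 + [("A", "B")] * 18 + [("B", "B")] * 81

if __name__ == "__main__":
    print("FIS deme 1:", round(f_is(deme_1), 3))
    print("FIS deme 2:", round(f_is(deme_2), 3))
    print("FIS pooled:", round(f_is(deme_1 + deme_2), 3))
```

Each deme taken alone shows no heterozygote deficit, whereas the pooled sample does, which is the signal the cumulative pooling strategy looks for.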

Temporal sampling

Ideally, individuals sampled for the estimation of population genetic structuring should belong to the same generation (or to the same cohort for organisms with overlapping generations), because allele frequencies vary not only over space, but also over time as populations are of finite sizes (Waples 1989a,b). This can be particularly important after founder events or bottlenecks (e.g. Hansson et al. 2000). However, it can be quite difficult to get reasonable sample sizes for some organisms within the same breeding season. Thus, population genetics data sets often comprise individuals from several generations. This generally is also the case for long-lived organisms with overlapping generations. When the same site is sampled repeatedly over time, a simple way to test for the absence of temporal genetic structuring is to test for differentiation between those samples (i.e. over generations; Viard et al. 1997; Lugon-Moulin et al. 1999a). The absence of significant structuring over time allows samples obtained at the same locality to be pooled. A potential problem arises when the samples are very small, as the power of this approach directly depends on sample size.

Other noteworthy points concern temporal sampling within generations. Sampling can indeed be performed before or after natal dispersal (i.e. sampling adults or offspring), which can affect the level of population structuring. Basset et al. (2001) found that FST values were always higher when sampling was carried out before rather than after dispersal, this difference being more pronounced with high migration rates and under polygynous mating systems.

Sampling adults allows the bias in sex-specific dispersal rates to be estimated by comparing the structuring estimates obtained for each sex independently. If the confidence intervals of the sex-specific structuring estimates do not overlap, dispersal can be shown to be significantly sex-biased. Alternatively, sex-biases in dispersal can be estimated with the distribution of assignment indices (Favre et al. 1997). However, in species with juvenile dispersal, these methods can only be applied to adults. In offspring, the sex-specific dispersal signature is lost, as allele frequencies are equally randomized again in males and females.

Measuring population structuring

In the present paper, the estimators θ̂ and ρ̂ of the parameters FST and RST are assumed to be estimated following Weir & Cockerham (1984) and Rousset (1996), respectively, using a conventional analysis of variance framework. To avoid using various notations, we will simply refer to the parameters as FST or RST and to the estimators as ‘FST and RST estimates’ (θ̂ and ρ̂). It is not our purpose to review the important bulk of literature on the various ways of estimating these statistics (see e.g. Cockerham 1969, 1973; Nei 1973, 1977; Nei & Chesser 1983; Weir & Cockerham 1984; Cockerham & Weir 1986, 1993; Goudet 1993; Rousset 1996; Nagylaki 1998).

An important consequence of the extremely high mutation rate of microsatellite loci is that their underlying mutation pattern cannot be neglected. In order to elaborate statistics that better reflect migration or time of divergence, it is crucial to understand the way microsatellites mutate. If the patterns of microsatellite mutations were perfectly known (i.e. the probability to mutate from a given allelic state to another), it should be possible to define differentiation statistics based on allelic distances that are a function of migration or of the time of divergence between populations. Several specific mutation models have been proposed by population geneticists (Box 1). For example, the SMM, thought to reflect the way microsatellites mutate, is an assumption of R-statistics. However, none of the models at hand appears to fit all microsatellite loci perfectly. Consequently, both the traditional differentiation estimators (F-statistics) and the microsatellite-specific differentiation estimators (R-statistics) are commonly reported in studies using microsatellite markers. However, FST and RST estimates often differ in a pronounced manner (cursory review in Lugon-Moulin et al. 1999b).

Sensitivity of fixation indices to mutation rates

The high mutation rate of microsatellites (Weber & Wong 1993; Jarne & Lagoda 1996) has an important consequence for FST, which can be defined as 1 – (Hs/Ht) (Box 2). Indeed, with high mutation rates, the probability of identity of two genes decreases (Rousset 1996). Hence, this statistic will be deflated with high mutation rates irrespective of the mutation model (see Box 1; Wright 1978; Nagylaki 1998; Hedrick 1999; Balloux et al. 2000a). Subpopulation genetic diversities (Hs) of 90% or even 95% are commonly reported in studies using microsatellites. These values correspond, respectively, to 10 and 20 equifrequent alleles. Balloux et al. (2000b) presented a simple example to illustrate the consequence of such high genetic diversities. Suppose that we have two subpopulations, each with 10 equifrequent alleles, but that none of them is shared between the two subpopulations. It is clear in this example that there is no gene exchange between these subpopulations and that genetic differentiation is as high as it can be. However, this situation corresponds to an FST of 0.053. Further note that in this example, we stated that because the two populations did not exchange any migrants, they had strictly no allele in common. This situation is, however, unlikely for microsatellite markers, which are characterized by high levels of size homoplasy (Estoup et al. 1995; Ortí et al. 1997; van Oppen et al. 2000). Thus, in this example, differentiation estimates are unlikely to reach this theoretical maximal value of 0.053. On the other hand, RST is independent of the mutation rate, be it high or not, under a SMM. Unfortunately, this feature no longer holds when deviations from the SMM occur (Slatkin 1995; Balloux et al. 2000a). Because this is likely to be the case for most microsatellite loci, RST will also be an unknown function of both migration and mutation in many situations.
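The arithmetic behind the value of 0.053 can be checked directly from the definition FST = 1 – (Hs/Ht). The sketch below computes the parametric ratio from known allele frequencies, assuming equal weights for the subpopulations; it is not the Weir & Cockerham estimator, and the function names and allele labels are hypothetical.

```python
def gene_diversity(frequencies):
    """Expected heterozygosity, 1 - sum(p_i^2), from allele frequencies."""
    return 1.0 - sum(p * p for p in frequencies)


def fst_from_frequencies(subpops):
    """FST = 1 - Hs/Ht from per-subpopulation allele frequencies.

    subpops: list of dicts mapping allele -> frequency (equal weights assumed).
    """
    hs = sum(gene_diversity(s.values()) for s in subpops) / len(subpops)
    alleles = set().union(*subpops)
    mean_freqs = [sum(s.get(a, 0.0) for s in subpops) / len(subpops)
                  for a in alleles]
    ht = gene_diversity(mean_freqs)
    return 1.0 - hs / ht


# Two subpopulations, 10 equifrequent alleles each, none shared.
subpop_1 = {f"a{i}": 0.1 for i in range(10)}
subpop_2 = {f"b{i}": 0.1 for i in range(10)}

if __name__ == "__main__":
    print(round(fst_from_frequencies([subpop_1, subpop_2]), 3))
```

Here Hs = 0.90 and Ht = 0.95, so the ratio gives 1 – 0.90/0.95 ≈ 0.053 even though the two subpopulations share no allele.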

The important factor is not the mutation rate per se, but the magnitude of the ratio of mutation over migration. When gene flow is reduced, as can be expected across hybrid zones or when known barriers to dispersal exist, the effect of mutation may become important relative to migration, and has to be accounted for. On the other hand, mutation is unlikely to matter when levels of gene flow are high. This situation can be encountered at restricted geographical scales, as well as over much wider areas for species with high dispersal abilities, such as many marine organisms (Waples 1998) or flying animals like bats (Petit et al. 2001).

Empirical evidence of microsatellite mutations

The patterns of microsatellite mutations appear extremely complex (Primmer & Ellegren 1998; Anderson et al. 2000; Brohede & Ellegren 1999). Mutation rates vary considerably between microsatellite loci. Similarly, the mutation patterns governing the evolution of individual microsatellite loci are diverse. We briefly present here empirical evidence of the variation and complexity of both mutation rates and patterns.

Mutation rates.

The mutation rate of a particular microsatellite locus is usually unknown. Typically, it is assumed that most microsatellites have high mutation rates of approximately 10⁻³ (Weber & Wong 1993; Jarne & Lagoda 1996). However, mutation rates may not only vary among repeat types (di-, tri- and tetranucleotide), base composition of the repeat (Bachtrog et al. 2000) and microsatellite types (perfect, compound or interrupted), but also among taxonomic groups. Other potential factors such as the nature of the flanking sequences or the position on a chromosome may also influence the mutation rate of a particular microsatellite. Furthermore, even at a given locus, mutation rate may vary among alleles, with long alleles being generally more mutation prone than shorter ones (Jin et al. 1996; Wierdl et al. 1997; Schlötterer et al. 1998).

Mutation patterns.

As mutations are by definition rare events, even for microsatellites, there are few empirical data on the type of mutations. It appears that most mutations involve the addition or deletion of a single repeat, with fewer mutations involving two to several repeats (e.g. Weber & Wong 1993; Di Rienzo et al. 1994; Primmer et al. 1998). In a recent study, 236 mutations, where the ancestral and derived states were known, were reported at tetranucleotide microsatellites (Xu et al. 2000). A total of 85% of mutations involved a single repeat and 95% involved fewer than three repeats. The largest mutation was a five-repeat expansion. It is, however, unclear whether this result would hold for other microsatellite repeat motifs (di- and trinucleotides; Ellegren 2000a). Additionally, the frequency of nonstepwise mutation seems to vary considerably between taxonomic groups, with estimates ranging from 4 to 74% (reviewed in Ellegren 2000b). Finally, there is a strong body of evidence that the maximal possible size of microsatellite alleles is constrained. For instance, perfect dinucleotide alleles rarely exceed 30 repeats. A restricted number of possible allelic states will of course lead to additional size homoplasy.

Summarizing the available data, it seems rather difficult to reconcile empirical data to any of the existing models. Neither of the two extreme mutation models proposed by population geneticists (IAM and SMM; Box 1) appears to perfectly account for the observed patterns of microsatellite mutations. Their mutation pattern probably lies somewhere in between these two extreme models. Furthermore, neither these extreme models nor their offshoots [K-allele model (KAM), two-phase model (TPM); Box 1] can account for asymmetries in the mutation patterns or constraint on allele size (except, for the latter, the KAM with a low enough number of allelic states K). More realistic mutation models could be developed, but they would probably be intractable analytically. It is further questionable how useful these models would be given the variation in the mutation pattern both between loci and taxa.

Consequences of deviations from the extreme mutation models

Although the mutation pattern of microsatellites is still not fully understood, it is often assumed that they mutate under the SMM (Box 1), which is the mutation model underlying R-statistics. The expectation is higher for RST than FST under a strict SMM. As discussed above, it is however, unlikely that microsatellite loci strictly follow this model. Under a strict SMM, alleles before and after mutation display the highest correlation. The correlation in size will decrease with the inclusion of nonstepwise mutations and eventually, if all mutations are at random as is the case in the IAM (Box 1), the correlation in size between alleles before and after mutation no longer exists. A likely consequence of a departure from a SMM is that the expectations of both statistics will converge.

Both the IAM and the SMM consider that there are an infinite number of possible allelic states. However, it appears clear that the number of possible allelic states is finite, constraining the size of alleles within a given range. If we consider K possible alleles at a given locus, then it appears that constraints on allele size will make stepwise mutation patterns become more similar to those expected under the KAM (Box 1). An important consequence of constraint on allele size on RST is that size differences among alleles can no longer be used to accurately reflect distances among alleles. Therefore, RST, which is precisely based on the variance of allele size, will be deflated (Nauta & Weissing 1996). It has to be stressed that this effect is much less important on FST (Balloux et al. 2000a).

Relative performance of FST and RST

When estimating FST or RST, two distinct aspects have to be accounted for: (i) the bias; and (ii) the variance of the estimators. Indeed, even with a perfect fit of the estimated FST or RST to their own analytical expectations (or to their equivalent in terms of effective number of migrants), these estimates may not be reliable if their associated variance is high, for example when working on small samples with a limited number of loci.

The main problem affecting F-statistics when working with microsatellites, is their sensitivity to the mutation rate when migration is low. Conversely under a strict SMM, RST is independent of the mutation rate. However, even under the strictest SMM assumption, RST can be less accurate at reflecting population differentiation than FST due to its high associated variance. Under a SMM, RST will therefore benefit more than FST from reducing the sampling variance, for instance through increasing the number of populations sampled, the number of individuals per population or the number of loci scored (Gaggiotti et al. 1999; F. Balloux and J. Goudet 2001).

RST will be deflated when the mutation pattern includes mutations involving more than one repeat or when the number of possible allelic states is finite (Slatkin 1995; Balloux et al. 2000a). RST is nevertheless expected to give, on average, more accurate differentiation estimates than FST as long as there is some memory in the mutation process (i.e. a mutation process where a new allele obtained by mutation is more similar in size to its previous state than to randomly chosen alleles). While deviations from a strict SMM will make expectations of both statistics converge, the relative performance of RST over FST degrades because its variance is larger. Under any mutation model with some memory in allele size, the relative performance of RST over FST is also expected to improve with the level of population differentiation because the effect of mutation will become more important than migration (Balloux et al. 2000a). This general trend has been observed in several empirical studies where RST seems to better reflect true differentiation in highly structured populations (cursory review in Lugon-Moulin et al. 1999b).

Therefore, the estimation and comparison of both F- and R-statistics is relevant particularly when important differences in levels of differentiation are expected among sets of subpopulations. An example is provided by a study including domestic sheep (Ovis aries) and wild Rocky Mountain bighorn sheep (O. canadensis) (Forbes et al. 1995). On the one hand, RST was a better predictor of interspecific divergence, that is, it better detected longer historical separations than FST. On the other hand, the latter appeared to be more sensitive to detect intraspecific differentiation. A similar finding was reported in a study of a hybrid zone between two distinct chromosome races of the common shrew (Sorex araneus) (Lugon-Moulin et al. 1999b). FST appeared to better estimate differentiation within chromosome races while RST better reflected structuring between the two races, which are thought to have diverged during the last Pleistocene glaciations.

Biological interpretation

The main reason for the popularity of F- and R-statistics probably stems from their direct link to the effective number of migrants (Nm) under the assumptions of the island model. FST and RST are very commonly used to describe population differentiation at various levels of genetic structuring, either directly as differentiation estimators or through their link with the effective number of migrants. At the smallest level of differentiation, inferences regarding mating systems have been made (e.g. Balloux et al. 1998; Petit et al. 2001; Ross 2001). For more isolated populations, barriers to dispersal have been inferred (e.g. Lehmann et al. 1999). Even for highly isolated populations, differentiation estimators have been used to give insights into the evolutionary history of a species or group of species (e.g. Castella et al. 2000; Danley et al. 2000). In the following sections, we will discuss the interpretation of FST and RST values, their statistical significance and their translation into effective number of migrants, when using microsatellites.

Interpreting FST and RST values per se

Interpreting FST and RST values per se can be a dangerous task. For example, identical FST values can be estimated from different patterns of allele frequencies (Wright 1978). The interpretation of the two theoretical extremes for FST (0 and 1) is however, straightforward. A value of zero means that we sampled within a panmictic unit. At the other extreme, a value of one means that there is no diversity within subpopulations and that at least two of the sampled subpopulations are fixed for different alleles. Values between these two extremes will then be interpreted as depicting various levels of structuring. However, it can be difficult and misleading to give a biological meaning for these values.

For the interpretation of FST, it has been suggested that a value lying in the range 0–0.05 indicates little genetic differentiation; a value between 0.05 and 0.15, moderate differentiation; a value between 0.15 and 0.25, great differentiation; and values above 0.25, very great genetic differentiation (Wright 1978; Hartl & Clark 1997). Indeed, a FST of 0.05 will generally be considered as reasonably low and investigators may interpret that structuring between subpopulations is weak. While such an interpretation may turn out to be correct, it may also not be representative at all of the real population differentiation. One has to recall that the expectation of FST, under complete differentiation will not always be one. In fact, in the great majority of cases, it will not be one, because the effect of polymorphism (due to mutations) drastically deflates FST expectations (Wright 1978; Charlesworth 1998; Nagylaki 1998; Hedrick 1999). Hence, a seemingly low FST of 0.05 may in fact indicate very important genetic differentiation. This point was already stressed by Wright (1978), who wrote that differentiation is by no means negligible if FST is as small as 0.05 or even less.

An empirical example is given by the use of a polymorphic Y-chromosome microsatellite (Balloux et al. 2000a). Fifteen alleles were scored at this locus, which showed strictly nonoverlapping allele distributions between two very distinct chromosome races of the common shrew (Sorex araneus). While these disjunct distributions translated into a very high RST of 0.98 that correctly reflected the total absence of male-mediated gene flow between these races, the FST value was only 0.19. This example illustrates well the sensitivity of FST to high polymorphism when migration is low.
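The contrast between the two statistics for disjunct allele-size distributions can be reproduced with invented data. The sketch below is not the shrew data set and does not use the Weir & Cockerham or Rousset estimators; it assumes haploid samples of equal size, takes FST as 1 – Hs/Ht on allele identity, and takes an RST-like ratio as one minus the within-sample variance in allele size over the total variance. All names and allele sizes are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def gene_diversity(sample):
    """1 - sum(p_i^2) computed from a list of allele states."""
    n = len(sample)
    return 1.0 - sum((c / n) ** 2 for c in Counter(sample).values())


def fst_like(pops):
    """1 - Hs/Ht on allele identity (equal sample sizes assumed)."""
    hs = sum(gene_diversity(p) for p in pops) / len(pops)
    ht = gene_diversity([a for p in pops for a in p])
    return 1.0 - hs / ht


def rst_like(pops):
    """1 - Vw/Vt on allele size in repeat units (equal sample sizes assumed)."""
    v_within = sum(pvariance(p) for p in pops) / len(pops)
    v_total = pvariance([a for p in pops for a in p])
    return 1.0 - v_within / v_total


# Hypothetical haploid samples: 15 alleles in total, sizes strictly disjunct.
race_1 = [size for size in range(10, 17) for _ in range(8)]   # 7 alleles
race_2 = [size for size in range(30, 38) for _ in range(7)]   # 8 alleles

if __name__ == "__main__":
    print("FST-like:", round(fst_like([race_1, race_2]), 3))
    print("RST-like:", round(rst_like([race_1, race_2]), 3))
```

With these toy values the identity-based ratio stays below 0.1 while the size-based ratio exceeds 0.9, mirroring the qualitative pattern described for the Y-chromosome locus.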

Testing FST and RST values

While interpreting FST and RST values per se may lead to erroneous conclusions, population geneticists are often interested in assessing whether structuring is significant, that is, whether the estimated FST or RST value significantly differs from zero, the situation where all subpopulations belong to a single random breeding population. The use of nonparametric tests provides us with a means to assess the significance of FST/RST estimates. An exact FST/RST-estimator test will be based on a permutation procedure, in which genotypes are shuffled among subpopulations a great number of times, say, 10 000 times. From each of these data sets, FST and/or RST is estimated and the proportion of values larger than or equal to the one estimated from the real data set will yield the unbiased P-value of the test. These tests are very powerful even for reasonably sized data sets. For example, the FST estimated for noctule bat populations all over Europe is only 0.006, but its value is significant (P < 0.0001; Petit & Mayer 1999). Simulations further indicate that this result is not artefactual (Petit et al. 2001).
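The permutation procedure described above can be sketched in a few lines. The code below is a minimal illustration at a single locus: it uses the simple ratio 1 – Hs/Ht (weighted by sample size) as a stand-in for the FST estimator, which is an assumption made to keep the example short, and the function names, the default number of permutations and the seed are hypothetical.

```python
import random
from collections import Counter


def gene_diversity(alleles):
    n = len(alleles)
    return 1.0 - sum((c / n) ** 2 for c in Counter(alleles).values())


def simple_fst(samples):
    """1 - Hs/Ht at one locus; samples is a list of lists of (allele, allele)."""
    per_sample = [[a for g in s for a in g] for s in samples]
    pooled = [a for alleles in per_sample for a in alleles]
    ht = gene_diversity(pooled)
    if ht == 0.0:
        return 0.0  # monomorphic locus: no differentiation can be measured
    hs = sum(len(al) * gene_diversity(al) for al in per_sample) / len(pooled)
    return 1.0 - hs / ht


def permutation_p_value(samples, n_perm=1000, seed=0):
    """Shuffle genotypes among samples and compare permuted to observed FST."""
    rng = random.Random(seed)
    observed = simple_fst(samples)
    pooled = [g for s in samples for g in s]
    sizes = [len(s) for s in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:  # keep the original sample sizes
            shuffled.append(pooled[start:start + size])
            start += size
        # small tolerance guards against floating-point ties
        if simple_fst(shuffled) >= observed - 1e-12:
            hits += 1
    return observed, hits / n_perm
```

Whole genotypes, not single alleles, are shuffled, so that any departure from random mating within samples is carried over into the permuted data sets.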

While permutation procedures allow testing as to whether FST (or RST) estimates depart from zero, other exact tests of genetic differentiation that are independent of the way population structuring is inferred are available. Thanks to the important polymorphism of microsatellites, such tests can have a tremendous power. An example involves the populations of the European eel sampled from Iceland to North Africa (Wirth & Bernatchez 2001). Here the FST is very low (0.0017), but highly significant (P = 0.0014, Fisher exact test). Goudet et al. (1996) compared the efficiency of several such tests (including exact FST-estimator tests) for diploid populations and concluded that overall, the exact G-test is the most powerful, particularly when samples are unbalanced, as is common in biological studies. This test is therefore more powerful than other exact FST-estimator tests, e.g. the exact FST(θ)-test (Goudet et al. 1996; Petit et al. 2001). To carry out the exact G-test, the log likelihood ratio statistic G is first calculated from contingency tables of alleles in columns vs. samples in rows (Goudet et al. 1996). Individual genotypes are then randomly shuffled among samples. This permutation procedure is repeated many times and each time a G-statistic is calculated from the allelic counts. The G-statistic obtained from the original data set is compared to the G-statistics obtained from the permuted data sets. The proportion of G-statistics larger than or equal to the observed one will give the exact P-value of the test.
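The steps of the exact G-test listed above can be sketched as follows for a single locus. The code assumes the usual log likelihood ratio form G = 2 Σ O ln(O/E) over the samples-by-alleles contingency table; the function names, the default number of permutations and the seed are hypothetical, and combining loci is left out.

```python
import math
import random
from collections import Counter


def g_statistic(samples):
    """G = 2 * sum(O * ln(O / E)) over the samples x alleles table.

    samples: list of lists of diploid genotypes, each an (allele, allele) tuple.
    """
    rows = [Counter(a for g in s for a in g) for s in samples]
    column_totals = Counter()
    for row in rows:
        column_totals.update(row)
    grand_total = sum(column_totals.values())
    g = 0.0
    for row in rows:
        row_total = sum(row.values())
        for allele, observed in row.items():  # empty cells contribute zero
            expected = row_total * column_totals[allele] / grand_total
            g += observed * math.log(observed / expected)
    return 2.0 * g


def exact_g_test(samples, n_perm=1000, seed=0):
    """Permute genotypes among samples; return (observed G, P-value)."""
    rng = random.Random(seed)
    observed = g_statistic(samples)
    pooled = [g for s in samples for g in s]
    sizes = [len(s) for s in samples]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        # small tolerance guards against floating-point ties
        if g_statistic(shuffled) >= observed - 1e-9:
            hits += 1
    return observed, hits / n_perm
```

Because the statistic is computed from allelic counts while the shuffling is done on genotypes, the sketch follows the order of operations given in the text.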

If this approach is statistically strictly correct, what is the biological meaning of significant structuring inferred from such powerful tests? These tests are able to detect very fine differences of allele frequencies among subpopulations. Consequently, it is not surprising to find significant genetic differences among a set of subpopulations, even if these differences may not necessarily be biologically meaningful (Waples 1998; Hedrick 1999). In many population studies, at least some departure from complete panmixia will occur and translate into significant tests. For example, Lugon-Moulin et al. (1999b) used the exact G-test to study the differentiation among subpopulations of the Cordon chromosome race of the common shrew. The exact G-test over all loci was highly significant. It turned out that this result was due to a single, significant locus that may show null alleles, and which was further found to be monomorphic in one of the subpopulations. When the exact G-test was performed either with or without this locus, but omitting this subpopulation, the exact G-test over all loci was no longer significant. This example shows the high power of such tests of differentiation and illustrates that careful examination of the data may be necessary to avoid possible biological misinterpretation of significant results.

Estimating the effective number of migrants

While population structuring may provide important information, biologists are generally interested in more than only estimating the differentiation between populations. Under the assumptions of the island model of migration (e.g. no mutation, same Ne in every subpopulation; Wright 1931), the degree of population subdivision is related to the number of effective migrants via the simple relationship FST = 1/(1 + 4Neme), where Ne is the effective population size and me the effective migration rate. It is important to note that it is not the census size (N) nor the migration rate (m) that are relevant, but their effective counterparts. Variance of reproductive success in excess of a binomial distribution, due for example to an uneven sex-ratio, will reduce the ratio of Ne over N. Effective migration can also deviate from the actual migration rate, depending on the relative reproductive success of immigrants. For instance, if there is a positive relationship between heterozygosity and fitness, migration is expected to be more efficient at homogenizing allele frequencies because offspring having an immigrant parent are expected to be more heterozygous on average (Ingvarsson & Whitlock 2000). Further, even if the genetic markers under study are strictly neutral, parents of the subsequent generation are not necessarily a random sample from the juvenile genotypes. For instance, average heterozygosity of adults can be correlated with survival (Bierne et al. 1998; Coltman et al. 1999). In this case, differentiation will be underestimated. However, the effective number of migrants itself is also a rather abstract quantity, as it is not possible to disentangle migration from the effective population size. To be interpretable in terms of mating systems, an independent estimation of Ne or me must be obtained. Although estimates of effective migration are generally out of reach, it is possible in certain cases to get independent estimates of the effective population size. This parameter can for instance be estimated through demographic models, which take into account the census size and the variances in reproductive success (e.g. Bouteiller & Perrin 1999). Alternatively, the effective number of migrants can also be disentangled into mating system parameters by using the additional information provided by sex-specific markers such as mitochondrial DNA or Y-chromosome markers (Petit et al. 2001).

These mating system parameter estimates should, however, be interpreted with caution. Indeed, the simple relation between fixation indices and effective migration generally does not hold because the underlying island model of migration makes several assumptions that are likely to be violated in real populations (extensive review in Whitlock & McCauley 1999). It is not our purpose to review these hypotheses once more, but we would like to stress again that FST and RST are nonlinear functions of Nm (Waples 1998; Whitlock & McCauley 1999). As a consequence, small FST (RST) values translate into estimates of the effective number of migrants with unduly large confidence intervals unless a prohibitively large number of loci are used (Waples 1998; Whitlock & McCauley 1999).
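The nonlinearity can be made concrete by inverting the island-model relationship between FST and the product of Ne and me. The sketch below is only an arithmetic illustration: the function names and the error of ±0.005 on FST are hypothetical, and all island-model assumptions are taken at face value.

```python
def fst_from_migrants(ne_me):
    """Expected FST under the island model for a given product Ne * me."""
    return 1.0 / (1.0 + 4.0 * ne_me)


def migrants_from_fst(fst):
    """Invert the island-model relationship to obtain Ne * me from FST."""
    if not 0.0 < fst <= 1.0:
        raise ValueError("FST must lie in (0, 1] for the inversion to be defined")
    return (1.0 / fst - 1.0) / 4.0


# The same absolute error on FST (here +/- 0.005, a hypothetical value)
# spans a much wider range of Ne * me when FST is small.
for fst in (0.01, 0.20):
    low = migrants_from_fst(fst + 0.005)
    high = migrants_from_fst(fst - 0.005)
    print(f"FST = {fst:.2f}: Ne*me between {low:.2f} and {high:.2f}")
```

With these numbers, an FST of 0.01 is compatible with roughly 16 to 50 effective migrants, whereas an FST of 0.20 corresponds to a range of about 0.97 to 1.03.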


Despite the development of alternative approaches such as methods assigning individuals to populations (Paetkau et al. 1995; Pritchard et al. 2000), differentiation estimators remain the most commonly used tools to describe population structuring. The main reason behind this popularity stems from their direct link to the biologically relevant number of effective migrants (4Neme). This parameter provides an interface linking theoretical and empirical work, even if this relation relies on a series of assumptions that are unlikely to be met in natural populations (Whitlock & McCauley 1999). The development of highly polymorphic markers, and in particular microsatellites characterized by extreme mutation rates and largely unknown mutation patterns, provides additional challenges to the use of differentiation statistics. The high mutation rate itself is actually not a problem, especially as high heterozygosities reduce the stochastic variation between loci (Beaumont & Nichols 1996). However, as mutation cannot be disentangled from migration, FST will seriously underestimate differentiation in highly structured populations. This limitation was recognized long before highly polymorphic loci were available, by Wright (1978), who wrote ‘FST can be interpreted as a measure of the amount of differentiation among subpopulations, relative to the limiting amount under complete fixation …’. Differentiation statistics, as estimated from microsatellite allele frequencies, are still expected to be one of the most valuable tools for studying moderately structured populations. It is, however, more questionable how informative FST can be for highly divergent populations when using microsatellites. In the latter situation, R-statistics are better suited to provide relevant biological information, although care should be taken because of the high variance associated with RST. In addition, their performance will depend on how well the microsatellite markers under study fit a SMM because it is only under a strict SMM that RST is independent of mutation. In summary, as the relative performance of these two statistics depends on many factors that cannot generally be quantified, it is the use, critical comparison and careful interpretation of both statistics which may give the most valuable information about the genetic structure of populations.


This paper is dedicated to Jérôme ‘Fstat’ Goudet, who invested a lot of time and energy to share with us his encyclopedic knowledge of differentiation statistics. We also thank David Hosken, Eric Petit and an anonymous reviewer for useful comments and suggestions on previous versions of this paper.

François Balloux is presently a postdoctoral fellow at the University of Edinburgh. His current research focuses on how population subdivision affects the evolutionary outcome in classical biological problems such as the result of competition between sexually reproducing populations and asexual lineages or the evolution of virulence. Nicolas Lugon-Moulin is a researcher at the University of Lausanne and his prime research interests are population genetics, with emphasis on the use of microsatellites, and the evolutionary history of soricine shrews.

Box 1 Mutation models

Understanding the mutation model underlying microsatellite evolution is of great importance for the development of statistics accurately reflecting genetic structuring. Two extreme mutation models have been developed by population geneticists: the infinite alleles model (IAM; Kimura & Crow 1964) and the stepwise mutation model (SMM; Kimura & Ohta 1978).

In the IAM, each mutation creates a novel allele at a given rate, u. Consequently, this model does not allow for homoplasy. Identical alleles share the same ancestry and are identical-by-descent (IBD), unlike in other models (see below). In the K-allele model (KAM), the number of possible alleles is K. The probability for any allele to mutate to any other (K – 1) allelic state is identical. Hence, a given allele will mutate to any of the remaining alleles at a rate u/(K – 1). This model allows for homoplasy, that is, alleles that are identical-in-state (IIS), but not IBD. Note that the IAM is a special case of the KAM, with K = ∞ (hence lacking homoplasy).

The second extreme model is the SMM (Kimura & Ohta 1978). Under this scenario, each mutation creates a novel allele either by adding or deleting a single repeated unit of the microsatellite, with an equal probability u/2 in both directions. Consequently, alleles of very different sizes will be more distantly related than alleles of similar sizes. Therefore, unlike the two above models, the SMM has a memory of allele size. The two-phase model (TPM; Valdès et al. 1993; Di Rienzo et al. 1994) is an offshoot of the SMM, developed to account for a proportion of larger mutation events (that is, addition or deletion of several units). In this model, mutations increase or decrease allele size by one repeat with probability p, and increase or decrease allele size by k repeats with probability (1 – p), k following some probability distribution (Di Rienzo et al. 1994).
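The four models can be contrasted by writing down what a single mutation event does to an allele. The following Python sketch is illustrative only. Each function describes the outcome of one mutation, while the rate u would govern how often it is called. The integer coding of alleles, the absence of any bound on allele size, and the geometric distribution of k in the TPM (with hypothetical parameter `q`) are assumptions made for this example.

```python
import itertools
import random

# Labels for IAM alleles; assumed never to collide with existing alleles.
_fresh_alleles = itertools.count(start=1_000_000)


def mutate_iam(allele):
    """IAM: every mutation yields an allele never seen before (no homoplasy)."""
    return next(_fresh_alleles)


def mutate_kam(allele, k, rng):
    """KAM: jump to any of the other K - 1 states with equal probability."""
    return rng.choice([state for state in range(k) if state != allele])


def mutate_smm(allele, rng):
    """SMM: gain or lose exactly one repeat unit, each with probability 1/2."""
    return allele + rng.choice((-1, 1))


def mutate_tpm(allele, p, rng, q=0.5):
    """TPM: one-repeat step with probability p, else a step of k repeats."""
    if rng.random() < p:
        return mutate_smm(allele, rng)
    # Assumption: k follows a geometric distribution with parameter q.
    k = 1
    while rng.random() > q:
        k += 1
    return allele + rng.choice((-1, 1)) * k
```

Only the SMM and TPM functions use the current allele size to build the new one, which is the 'memory' of allele size mentioned above.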

Box 2  F-statistics and R-statistics

Several definitions can be given for FST. Originally, a fixation index was developed by Wright (1921) to account for the effect of inbreeding within samples. He defined this quantity in terms of a correlation coefficient. Later, Wright (1951) expanded this concept to a population subdivided into a set of subpopulations, leading to the traditional hierarchical F-statistics, FIS, FST and FIT (where I stands for individuals, S for subpopulations and T for the total population; note that additional hierarchical levels can be defined). He defined FST, in which we are interested in this paper, as the correlation between two alleles chosen at random within subpopulations relative to alleles sampled at random from the total population (Wright 1951, 1965). Therefore, FST measures inbreeding due to the correlation among alleles because they are found in the same subpopulation. When considering two subpopulations and a two-allele locus, this quantity will reach a value of one when the two subpopulations are totally homozygous and fixed for alternative alleles (hence explaining the term of fixation index) and a value of zero when the frequencies in the two subpopulations are identical (under the original correlation definition by Wright (1921) extended to FST, negative values are allowed because correlations vary from –1 to +1). Therefore, FST represents a measure of the Wahlund principle (Wahlund 1928), that is, a heterozygote deficiency due to population subdivision (note that subpopulations have finite sizes). Hence, FST measures the heterozygote deficit relative to its expectation under Hardy–Weinberg equilibrium (Hartl & Clark 1997). The Wahlund principle can be stated in terms of variance in allele frequency (Wright 1943, 1951, 1965; Hartl & Clark 1997):

FST = Vp/[p(1 − p)], (1)

where p and Vp are the mean and the variance of the allele frequency among subpopulations, considering a two-allele locus. This positive quantity is the ratio of the observed variance to the maximum possible variance (reached when alleles are fixed in subpopulations).
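Eqn 1 can be evaluated directly from the frequency of one allele in each subpopulation. The short Python sketch below assumes that subpopulations are weighted equally, that the variance is divided by the number of subpopulations, and that the frequencies are known without sampling error. The function name is hypothetical.

```python
def fst_from_variance(freqs):
    """Eqn 1 for one allele of a two-allele locus.

    `freqs` holds the frequency of that allele in each subpopulation.
    """
    n = len(freqs)
    mean_p = sum(freqs) / n
    if mean_p in (0.0, 1.0):
        raise ValueError("locus is monomorphic overall; FST is undefined")
    # Assumption: parametric variance (divided by n), equal weights.
    var_p = sum((p - mean_p) ** 2 for p in freqs) / n
    return var_p / (mean_p * (1.0 - mean_p))
```

Two subpopulations fixed for alternative alleles (frequencies 0 and 1) give a value of one, identical frequencies give zero, and frequencies of 0.2 and 0.8 give 0.09/0.25 = 0.36.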

Nei (1977) later redefined the fixation indices for multiple alleles as:

FST = (Ht − Hs)/Ht, (2)

Note that the quantities vary from 0 to 1, since Ht ≥ Hs.

Cockerham & Weir (1987) defined an FST related to probabilities of identities:

FST = (f0 − f1)/(1 − f1), (3)

where f0 is the probability of identity-in-state (IIS) for pairs of genes between individuals within subpopulations and f1, between subpopulations. Note that under this definition, FST can possibly be negative in particular situations when f0 < f1.

Because microsatellites appear to follow a stepwise mutation model (SMM), Slatkin (1995) devised a statistic explicitly based on this mutation model. Slatkin (1995) showed that RST can be defined as follows:

RST = (S − Sw)/S, (4)

where S is the average squared difference in allele size between all pairs of alleles, and Sw, the average sum of squares of the differences in allele size within each subpopulation. These two quantities (S and Sw), and hence RST, can be calculated from the variances of allele sizes, whereas FST will typically be derived from the variances of allele frequencies. Slatkin (1995) showed that the relationship in eqn 4 has the same properties for microsatellites that follow a generalized SMM as does FST in the absence of mutation. In addition, RST being the fraction of the total variance in allele size between subpopulations, an RST parameter (ρ) and an estimator (ρ̂) can be defined using an analysis of variance framework (Michalakis & Excoffier 1996; Rousset 1996), by analogy to Weir & Cockerham’s (1984) θ parameter and its estimator, θ̂.
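As an illustration of eqn 4, S and Sw can be computed by brute force from lists of allele sizes. This sketch reads 'all pairs of alleles' as all pairs of distinct gene copies and weights subpopulations equally; both choices are assumptions made here. It is not the variance-component estimator ρ̂, and it requires at least two gene copies per subpopulation.

```python
from itertools import combinations


def mean_squared_difference(sizes):
    """Average squared size difference over all pairs of distinct gene copies."""
    pairs = list(combinations(sizes, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def rst(subpops):
    """Eqn 4; `subpops` is a list of lists of allele sizes (in repeat units)."""
    pooled = [size for sub in subpops for size in sub]
    s_total = mean_squared_difference(pooled)
    # Assumption: subpopulations are weighted equally when averaging Sw.
    s_within = sum(mean_squared_difference(sub) for sub in subpops) / len(subpops)
    return (s_total - s_within) / s_total
```

Two subpopulations fixed for alleles of different sizes give a value of one. With very small samples the value can be negative even when the subpopulations are identical: `rst([[10, 12], [10, 12]])` returns -0.5.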

It is worth mentioning that Nei (1973) defined a multiallelic analogue of FST among a finite number of subpopulations, called the coefficient of gene differentiation (Nei 1973), as being the ratio:

GST = DST/Ht = (Ht − Hs)/Ht, (5)

where DST is the average gene diversity between subpopulations, including the comparisons of subpopulations with themselves, with DST = (Ht − Hs). GST is an extension of Nei’s (1972) genetic distance between a pair of populations to the case of hierarchical structure of populations (Nei 1973). Hence, the Hs in eqn 5 were defined in terms of gene diversities. However, for random mating subpopulations, gene diversities can be defined as expected heterozygosities under Hardy–Weinberg equilibrium averaged among subpopulations (Hs) and of the total population (Ht).
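Under the random-mating reading given above, eqn 5 only requires allele frequencies. The following sketch computes Hs, Ht and GST for equally weighted subpopulations and ignores sampling corrections; the function names and these simplifications are assumptions made for this example.

```python
def gene_diversity(freqs):
    """Expected heterozygosity under Hardy-Weinberg: 1 - sum of squared frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def gst(subpop_freqs):
    """Eqn 5; each inner list gives the frequencies of the same alleles, in the same order."""
    n = len(subpop_freqs)
    # Hs: gene diversity averaged among subpopulations.
    hs = sum(gene_diversity(f) for f in subpop_freqs) / n
    # Ht: gene diversity of the total population (mean allele frequencies).
    mean_freqs = [sum(column) / n for column in zip(*subpop_freqs)]
    ht = gene_diversity(mean_freqs)
    dst = ht - hs
    return dst / ht
```

For two subpopulations with frequencies (0.2, 0.8) and (0.8, 0.2), Hs = 0.32, Ht = 0.5 and GST = 0.36. With four alleles, two subpopulations that share no allele but each carry two equally frequent alleles give Hs = 0.5, Ht = 0.75 and GST = 1/3, because the ratio cannot exceed 1 − Hs.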

The main difference with the FST defined in eqn 2 is that the estimation of the heterozygosities in GST relies on allele frequencies only (Nei 1987), whereas to estimate the Hs in eqn 2, the individual genotypes have to be known (J. Goudet, personal communication). Crow & Aoki (1984) redefined GST in terms of probabilities (we denote this quantity GCA instead of GST, following Cockerham & Weir 1993) as:

GCA = (f0 − f̄)/(1 − f̄), (6)

where f0 is as defined in eqn 3. GCA is related to the FST definition (eqn 3) given by Cockerham & Weir (1987), since f̄ is a weighted mean of f0 and f1 (e.g. Cockerham & Weir 1993):

f̄ = [f0 + (n − 1)f1]/n. (7)
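The link between eqns 3, 6 and 7 can be checked numerically. In the sketch below, the weighted mean of f0 and f1 from eqn 7 is written `f_bar`, and n is taken to be the number of subpopulations, which is an assumption of this example. The identity probabilities are computed directly from allele frequencies, ignoring sampling and the distinction between genes within and between individuals.

```python
from itertools import combinations


def identity_probabilities(subpop_freqs):
    """Return (f0, f1) computed from per-subpopulation allele frequencies.

    f0: identity in state of two genes drawn from the same subpopulation,
    f1: identity in state of two genes drawn from different subpopulations.
    """
    n = len(subpop_freqs)
    f0 = sum(sum(p * p for p in f) for f in subpop_freqs) / n
    pairs = list(combinations(subpop_freqs, 2))
    f1 = sum(sum(p * q for p, q in zip(a, b)) for a, b in pairs) / len(pairs)
    return f0, f1


def fst_identity(f0, f1):
    """Eqn 3."""
    return (f0 - f1) / (1.0 - f1)


def g_ca(f0, f1, n):
    """Eqns 6 and 7; n is assumed to be the number of subpopulations."""
    f_bar = (f0 + (n - 1) * f1) / n
    return (f0 - f_bar) / (1.0 - f_bar)
```

For two subpopulations with frequencies (0.2, 0.8) and (0.8, 0.2), f0 = 0.68 and f1 = 0.32, so eqn 3 gives 0.36/0.68 ≈ 0.53 whereas eqn 6 gives 0.18/0.5 = 0.36. The two quantities converge as n becomes large, since the weighted mean then approaches f1.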