Comparative studies of quantitative trait and neutral marker divergence: a meta-analysis


T. Leinonen, Ecological Genetics Research Unit, Department of Biological and Environmental Sciences, PO Box 65, FI-00014 University of Helsinki, Finland.
Tel.: +358 9 191 57801; fax: +358 9 191 57694;


Comparative studies of quantitative genetic and neutral marker differentiation have provided means for assessing the relative roles of natural selection and random genetic drift in explaining among-population divergence. This information can be useful for our fundamental understanding of population differentiation, as well as for identifying management units in conservation biology. Here, we provide a comprehensive review and meta-analysis of the empirical studies that have compared quantitative genetic (QST) and neutral marker (FST) differentiation among natural populations. Our analyses, based on ca. 100% more data, confirm the conclusion from previous reviews that QST values are on average higher than FST values [mean difference 0.12 (SD 0.27)], suggesting a predominant role for natural selection as a cause of differentiation in quantitative traits. However, although the influence of trait type (life history, morphological and behavioural) and marker type (e.g. microsatellites and allozymes) on the variance of the difference between QST and FST is small, there is much heterogeneity in the data attributable to variation between specific studies and traits. The latter is understandable, as there is no reason to expect that natural selection would act in a similar fashion on all populations and traits (except for fitness itself). We also found evidence to suggest that QST and FST values across studies are positively correlated, but the significance of this finding remains unclear. We discuss these results in the context of the utility of QST–FST comparisons as a tool for inferring natural selection, as well as the methodological and interpretational problems involved in individual and meta-analytic studies.


The relative importance of natural selection vs. genetic drift in explaining evolutionary changes both at molecular (e.g. Kimura, 1983; Mitton, 1994; Ohta, 2002) and phenotypic (e.g. Lande, 1976; Lynch, 1994; Whitlock & Phillips, 2000; O’Hara, 2005) levels constitutes a long-standing source of debate in evolutionary biology. Although it is clear that both processes occur and most phenotypic evolution is likely to be driven by natural selection, there are also ample opportunities for genetic drift to be an important component of population differentiation. This is true especially in populations with small effective sizes (Ne). In other words, small Ne increases the rate of genetic drift, providing more scope for nonadaptive differentiation, and the efficiency of natural selection is positively related to Ne (e.g. Jones et al., 1968; Frankham et al., 2002; England et al., 2003). Therefore, small populations can respond to selection more slowly than large populations (Robertson, 1960; Frankham & Weber, 2000). Similarly, although there is ample evidence for rapid evolutionary differentiation apparently driven by selection (Hendry & Kinnison, 1999; Kinnison & Hendry, 2001), evolutionary rates are also often so low, especially over longer time scales (Gingerich, 2001; Kinnison & Hendry, 2001), that in many cases they could, at least in principle, have been achieved with very little selection or even by genetic drift alone (e.g. Lande, 1976; Lynch, 1988; Kinnison & Hendry, 2001; but see Estes & Arnold, 2007).

Today, as our understanding of the genetic basis of quantitative trait variability and differentiation is increasing at an accelerating pace (e.g. Mackay, 2001, 2004; Anholt & Mackay, 2004; Colosimo et al., 2005; Chenoweth & Blows, 2006; Hoekstra et al., 2006), new insights into many of the long-standing controversies about the relative roles of selection and drift in evolution can be expected. Although there are already a few examples where enough is known about both the genetic details and the selective factors to firmly infer a selective cause for population differentiation in particular genes (e.g. Colosimo et al., 2005; Hoekstra et al., 2006), it will probably take a good while before these technologies and possibilities become widely applicable to nonmodel organisms and more complex traits (e.g. Chenoweth & Blows, 2006). Therefore, comparisons of quantitative genetic differentiation (as measured by QST) and neutral marker gene differentiation (as measured by FST) still provide one of the most accessible and universal tools for inferring the role of natural selection in population differentiation for quantitative traits (see Merilä & Crnokrak, 2001; McKay & Latta, 2002 for reviews).

The logic of FST and QST comparisons is based on the realization that for loci subject to the effects of genetic drift and migration only (assuming a negligible contribution from mutation), FST for neutral markers provides a null expectation for the degree of population differentiation attainable without selection (e.g. Merilä & Crnokrak, 2001). When comparing FST with the analogous index for quantitative traits (QST), three outcomes are possible (Merilä & Crnokrak, 2001). First, if QST > FST, the degree of differentiation in quantitative traits exceeds that attainable by genetic drift alone, and directional natural selection favouring different phenotypes in different populations is the likely cause of this differentiation. Secondly, if the QST and FST estimates are almost equal, the observed degree of differentiation in quantitative traits could have been reached by genetic drift alone. However, this does not prove that the observed degree of differentiation was caused by genetic drift, only that the relative contributions of drift and selection are unknown. Thirdly, if QST < FST, the observed degree of differentiation is actually less than expected on the basis of genetic drift alone, the most likely cause being stabilizing selection.
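The three outcomes can be illustrated with a minimal Python sketch. It assumes the commonly used definition of QST for a diploid, outcrossing species, QST = V_between / (V_between + 2 V_within), where both terms are additive genetic variance components. The function names `qst` and `compare_to_fst`, and the rule of comparing an interval for QST against a point estimate of FST, are hypothetical choices made for illustration and are not taken from the studies reviewed here.

```python
def qst(var_between: float, var_within: float) -> float:
    """QST from additive genetic variance components (diploid, outcrossing case)."""
    if var_between < 0 or var_within < 0:
        raise ValueError("variance components must be non-negative")
    total = var_between + 2.0 * var_within
    if total == 0:
        raise ValueError("at least one variance component must be positive")
    return var_between / total


def compare_to_fst(qst_low: float, qst_high: float, fst: float) -> str:
    """Classify an interval estimate of QST against a point estimate of FST.

    The interval rule is an illustrative assumption, not a published criterion.
    """
    if qst_low > fst:
        return "QST > FST: consistent with divergent selection"
    if qst_high < fst:
        return "QST < FST: consistent with stabilizing selection"
    return "QST ~ FST: drift alone cannot be ruled out"


if __name__ == "__main__":
    # Hypothetical variance components: 0.5 / (0.5 + 2 * 1.0)
    print(round(qst(0.5, 1.0), 3))
    # Hypothetical interval for QST compared with a hypothetical FST
    print(compare_to_fst(0.35, 0.60, 0.20))
```

Note that the middle outcome is deliberately worded as an absence of evidence: overlap between QST and FST does not show that drift caused the differentiation.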

Comparative studies of differentiation in quantitative traits and in presumed neutral marker genes have become increasingly popular since the last comprehensive review and meta-analysis was published about 5 years ago (Merilä & Crnokrak, 2001; see also McKay & Latta, 2002). The number of studies providing comparative data on both QST and FST from the same populations has more than doubled since then, and interest seems to continue, as reflected in the steady increase in the number of publications on this topic (Fig. 1).

Figure 1. Number of studies dealing with comparison of quantitative trait and neutral marker gene differentiation as a function of time. Empirical studies are represented by black bars, commentaries, reviews and theoretical studies by white. Numbers within the bars indicate number of studies in each bar. Note that studies for 2007 depict the situation until 15 March 2007.

Although the foundation for comparing marker genes and quantitative traits was laid down by Sewall Wright as early as 1951 (Wright, 1951), it took about 30 years before the approach was put to practical use by Rogers & Harpending (1983; Fig. 1). Soon after, Lewontin (1984) published a commentary that provided the earliest criticism directed against comparisons of marker and quantitative genetic measures of population differentiation. In his criticism, Lewontin (1984) was concerned about the actual information content of marker vs. quantitative trait comparisons and concluded that such comparisons would serve little useful purpose. Lewontin’s (1984) article was followed by Felsenstein’s (1986) commentary and Rogers’ (1986) response, which concluded that Lewontin’s (1984) criticism was not entirely valid, a point that Lewontin (1986) later commented on. After this rather dramatic dawn of comparative studies of marker genes and quantitative traits, it took about another 10 years before the next empirical contribution to the field was made by Prout & Barker (1993).

The work by Prout & Barker (1993) was followed by Spitze’s (1993) study, which also introduced and established the ‘QST’ notation in the terminology of comparative studies of quantitative trait and neutral marker differentiation. Several other studies followed, but the number of publications on this topic remained low until about the year 2000, after which the interest in QST–FST comparisons increased noticeably (Fig. 1). The markedly delayed surge of these studies, relative to Wright’s (1951) original paper, is understandable, as access to polymorphic markers and interest in evolutionary quantitative genetic studies were both quite limited until the 1980s. Hence, most empirical studies on QST–FST comparisons have been published in the 21st century (Fig. 1).

Theoretical studies focussing on the assumptions and properties of QST have been rather slow to appear (Fig. 1), starting with Whitlock’s (1999) simulations of effects of nonadditive genetic effects on the difference between FST and QST. This work has been recently extended by several authors (Le Corre & Kremer, 2003; Lopez-Fanjul et al., 2003; Goudet & Buchi, 2006), the main practical conclusion being that nonadditive genetic effects are unlikely to bias QST estimates considerably. An important concern in comparative studies of quantitative trait and marker gene differentiation is also the distinction between QST for a quantitative trait and individual quantitative trait loci (Latta, 1998; McKay & Latta, 2002). As it turns out, one should not expect to see the signature of selection (QST > FST) in individual loci in the same way as for traditional estimates of QST based on quantitative traits (Latta, 1998).

During the past 5 years or so, a number of practical and conceptual problems with comparative studies of quantitative trait and marker differentiation have emerged (e.g. Hendry, 2002; O’Hara & Merilä, 2005). One of these concerns the difficulty of obtaining accurate estimates of QST with a small number of populations, as well as the related and more practical problem of correctly estimating its standard errors (O’Hara & Merilä, 2005). Another important concern is the effect of mutation rates on the difference between FST and QST (Hendry, 2002). For instance, if the mutation rates for neutral marker traits are larger than those for quantitative traits, then the comparisons might be predisposed to find QST > FST. However, it is as yet unclear whether mutations in quantitative traits will affect QST in an analogous fashion or how this influences QST–FST comparisons.

Some controversy has also arisen around the issue of whether molecular and quantitative genetic estimates of population differentiation are correlated or not (Merilä & Crnokrak, 2001; Crnokrak & Merilä, 2002; Latta & McKay, 2002; McKay & Latta, 2002). This controversy was mainly due to the fact that two reviews (Merilä & Crnokrak, 2001; McKay & Latta, 2002) used different criteria for selecting data, and there were some minor errors in numerical values of the estimates in one of the data sets (Crnokrak & Merilä, 2002). Irrespective of these issues, the amount of data in both of the reviews was so low that no strong conclusions about the existence of a relationship, or the lack thereof, were possible.

The main aim of this review was to provide an update of the earlier meta-analysis of comparative studies of QST and FST estimates, with about 100% more data. In particular, we are interested to see whether the earlier main findings, that is, overwhelming evidence for the selective nature of quantitative trait differentiation (QST > FST) and a weak correlation between QST and FST across different studies, still hold with a larger data set. At the same time, we extended the meta-analyses to the individual trait level (as opposed to averages over traits) while also accounting for nonindependence of individual QST and FST estimates. We accomplished this by extending the meta-analysis to a hierarchical modelling framework. Our analytical approach also allowed us to evaluate the relative influence of marker type, trait type, population number and other factors related to study design on the outcome of QST–FST comparisons. In addition to the results of meta-analyses, we evaluate and discuss the results in the light of some potentially critical assumptions and problems in QST and FST estimation. While doing so, we will focus on ideas and findings that have emerged since the previous meta-analysis (Merilä & Crnokrak, 2001) to avoid extensive repetition of basic assumptions, background and potential pitfalls.


To allow direct comparison of results with the earlier analyses (Merilä & Crnokrak, 2001) and to make the data transparent, we present the patterns of FST and QST variability in published studies in two different ways. First, we will adhere to the format of the earlier review and present the mean QST (and FST) estimates for each specific study. Secondly, we present the data for individual traits as averaging over traits hides a significant proportion of among-trait variability in the QST estimates.

The data

By searching Web of Science (up to 15 March 2007) and reference lists of articles reporting QST–FST comparisons, we found 62 studies that have compared estimates of QST and FST indices between different populations (Table 1). The only criterion we employed in our selection of studies was that the articles to be included in the meta-analysis should include QST and FST estimates from the same set of populations (cf. Crnokrak & Merilä, 2002). We also excluded the two unpublished studies in Table 2 of Merilä & Crnokrak (2001; refs 16 & 17) as these estimates have since been published with some corrections (Palo et al., 2003; Knopp et al., 2006). As the purpose of this review was to evaluate the evidence for natural selection in empirical studies on wild populations, we excluded two studies that involved populations under artificial selection. However, these studies (Morgan et al., 2005; Porcher et al., 2006) are included in Fig. 1, Table 1, and the reference list.

Table 1.   Synopsis of comparative studies of marker and quantitative genetic population structure.
  1. FST = divergence in marker genes, QST = divergence in quantitative traits (mean value of traits in given study), Design = QST based either upon genetic data from a full-sib design (FS), purely additive genetic (G) or phenotypic (P) data, ntraits = number of quantitative traits upon which QST estimate is based, nloci = number of loci upon which FST estimate is based, npop = number of populations included in the study, trait = type of traits upon which QST estimates are based (M = morphological, LH = life history, MIX = mixture of the previous, B = behavioural), marker = marker system upon which FST estimates are based (A = allozymes, M = microsatellites, RAPD = randomly amplified polymorphic DNA, RFLP = restriction fragment length polymorphism, rDNA = ribosomal DNA, ESTP = expressed sequence tag polymorphism, CAPS = cleaved amplified polymorphic sequences, AFLP = amplified fragment length polymorphism). References: 1. Podolsky & Holtsford (1995), 2. Bonnin et al. (1996), 3. Yang et al. (1996), 4. Karhu et al. (1996), 5. Lascoux et al. (1996), 6. Kuittinen et al. (1997), 7. Kremer et al. (1997), 8. Waldmann & Andersson (1998), 9. Waldmann & Andersson (1999), 10. Andersson et al. (2000), 11. McKay et al. (2001), 12. Petit et al. (2001), 13. Jaramillo-Correa et al. (2001), 14. Widen et al. (2003), 15. Gonzalez-Martinez et al. (2002), 16. Steinger et al. (2002), 17. Bekessy et al. (2003), 18. Baruch et al. (2004), 19. Pressoir & Berthaud (2004), 20. Le Corre (2005), 21. Stenøien et al. (2005), 22. Navarro et al. (2005), 23. Volis et al. (2005), 24. Gravuer et al. (2005), 25. Waldmann et al. (2005), 26. Sanou et al. (2005), 27. Porcher et al. (2006), 28. Jorgensen et al. (2006), 29. Steane et al. (2006), 30. Jimenez-Ambriz et al. (2007), 31. Willi et al. (2007), 32. Spitze (1993), 33. Prout & Barker (1993), 34. Lynch & Spitze (1994), 35. Long & Singh (1995), 36. Lynch et al. (1999), 37. Morgan et al. (2001), 38. Lee & Frost (2002), 39. Luttikhuizen et al. (2003), 40. Edmands & Harrison (2003), 41. Evanno et al. (2006), 42. P. Armbruster & J. Hard (unpublished data) (in Lynch et al., 1999), 43. Conde-Padín et al. (2006), 44. Merilä (1997), 45. Rogers et al. (2002), 46. Storz (2002), 47. Koskinen et al. (2002), 48. Saint-Laurent et al. (2003), 49. Palo et al. (2003), 50. Gomez-Mestre & Tejedo (2004), 51. Bernatchez (2004), 52. Cano et al. (2004), 53. Østbye et al. (2005), 54. Morgan et al. (2005), 55. Perry et al. (2005), 56. Leinonen et al. (2006), 57. Turan et al. (2006), 58. Knopp et al. (2006), 59. Wojcik et al. (2006), 60. Raeymaekers et al. (2007), 61. Zhan et al. (2005), 62. Zhan et al. (2006). *, †, ‡, §, ¶ denote the studies excluded from the analyses.

  2. *QST and FST estimates from different populations.

  3. †Only average QST given.

  4. ‡Populations subjected to artificial selection.

  5. §Information on design and traits lacking.

  6. ¶No neutral genetic data provided.

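Group | Species | FST | QST | Design | ntraits,nloci,npop | Trait | Marker | Ref.
P | Arabidopsis thaliana | 1.00 | 0.89 | FS | 654 | LH | A | 6
P | Arabidopsis thaliana | 0.64 | 0.66 | FS | 553 | MIX | M | 6
P | Arabidopsis thaliana | 0.43 | 0.38 | G | 955 | MIX | M | 27‡
P | Arabidopsis thaliana | 0.60 | 0.63 | FS | 121212 | MIX | M | 20
P | Arabidopsis thaliana | 0.60 |  |  | 1912 |  | CAPS | 20
P | Arabidopsis thaliana | 0.89 | 0.82 | FS | 9259 | MIX | M | 21
P | Arabis fecunda | 0.20 | 0.94 | FS | 618 | M | A | 11
P | Brassica cretica | 0.63 | 0.62 | G | 787 | MIX | A | 14
P | Brassica insularis | 0.21 | 0.08 | G | 1054 | MIX | A | 12
P | Centaurea corymbosa | 0.36 | 0.20 | G | 854 | MIX | A | 12
P | Clarkia dudleyana | 0.66 | 0.35 | G | 18811 | M | A | 1
P | Hordeum spontaneum | 0.21 | 0.67 | P | 115420 | MIX | RAPDs | 23
P | Hordeum spontaneum | 0.11 |  |  | 1720 |  | A | 23
P | Liatris scariosa | 0.21 | 0.30 | FS | 132412 | MIX | A | 24
P | Medicago truncatula | 0.14 | 0.58 | FS | 24222 | LH | RAPDs | 2
P | Nigella degenii | 0.05 | 0.06 | FS | 814920 | LH | AFLP | 28
P | Ranunculus reptans | 0.09 | 0.22 | HS | 7813 | MIX | A | 31†
P | Scabiosa canescens | 0.16 | 0.10 | FS | 8217 | MIX | A | 8
P | Scabiosa canescens | 0.12 | 0.18 | FS | 826 | MIX | A | 9†
P | Scabiosa columbaria | 0.12 | 0.45 | FS | 8517 | MIX | A | 8
P | Scabiosa columbaria | 0.16 | 0.25 | FS | 856 | MIX | A | 9†
P | Senecio vulgaris | 0.49 | 0.51 | FS | 818310 | MIX | AFLP | 16
P | Silene diclinis | 0.05 | 0.12 | FS | 985 | MIX | A | 10
P | Thlaspi caerulescens | 0.15 | 0.14 | FS | 1456 | LH | M | 30
P | Trachypogon plumosus | 0.28 | 0.16 | FS | 8109 | MIX | A | 18
P | Zea mays | 0.01 | 0.21 | FS | 151131 | MIX | M | 19
P | Araucaria araucana | 0.19 | 0.26 | FS | 2659 | LH | RAPDs | 17
P | Cedrela odorata | 0.67 | 0.29 | FS | 8-30 | M | RAPDs | 22
P | Eucalyptus globulus | 0.09 | 0.13 | G | 15810 | MIX | M | 29
P | Picea glauca | 0.01 | 0.10 | G | 10136 | LH | A | 13
P | Picea glauca | 0.02 |  |  | 116 |  | ESTP | 13
P | Pinus contorta | 0.02 | 0.12 | G | 6195 | M | A | 3
P | Pinus pinaster | 0.05 | 0.83 | G | 31419 | MIX | A | 15
P | Pinus sylvestris | 0.03 | 0.77 | G | 165 | LH | M | 25
P | Pinus sylvestris | 0.02 | 0.36 | FS | 1104 | LH | A | 4*
P | Pinus sylvestris | 0.02 |  |  | 34 |  | RFLP | 4*
P | Pinus sylvestris | 0.01 |  |  | 24 |  | M | 4*
P | Pinus sylvestris | 0.14 |  |  | 84 |  | rDNA | 4*
P | Pinus sylvestris | - |  |  | ?4 |  | RAPDs | 4*
P | Quercus petraea | 0.02 | 0.31 | G | 2881 | LH | A | 7
P | Salix viminalis | 0.04 | 0.07 | FS | 5914 | LH | A | 5
P | Vitellaria paradoxa | 0.05 | 0.19 | FS | 9811 | M | M | 26
I | Daphnia obtusa | 0.28 | 0.23 | FS | 578 | MIX | A | 32
I | Daphnia obtusa | 0.29 | 0.29 | FS | 798 | LH | A | 34
I | Daphnia pulex | 0.31 | 0.35 | FS | 181317 | LH | A | 36
I | Daphnia pulex | 0.29 |  |  | 617 |  | M | 36
I | Daphnia pulex | 0.52 |  |  | 717 |  | RFLP | 36
I | Daphnia pulicaria | 0.39 | 0.53 | FS | 81414 | LH | M | 37
I | Daphnia pulicaria | 0.27 |  |  | 1414 |  | A | 37
I | Drosophila buzzatii | 0.03 | 0.15 | FS | 1613 | M | A | 33
I | Drosophila melanogaster | 0.28 | 0.44 | FS | 978 | M | A | 35
I | Eurytemora affinis | 0.62 | 0.16 | FS | 854 | M | A | 38
I | Littorina saxatilis | 0.10 | 0.10 | P | 553 | M | A | 42§
I | Macoma balthica | 0.01 | 0.42 | G | 1511 | M | A | 39
I | Radix balthica | 0.18 | 0.05 | G | 41904 | LH | AFLP | 41
I | Tigriopus californicus | 0.80 | 0.30 | G | 2256 | MIX | M | 40†
I | Wyeomyia smithii | 0.37 | 0.65 | - | --- | LH | - | 43
V | Apodemus flavicollis | 0.04 | 0.14 | P | 511 | M | A | 59
V | Carduelis chloris | 0.03 | 0.30 | P | 162212 | M | A | 44
V | Bufo calamita | 0.03 | 0.26 | FS | 185 | LH | M | 50
V | Rana arvalis | 0.34 | 0.34 | G | 282 | MIX | M | 58
V | Rana temporaria | 0.24 | 0.81 | G | 386 | LH | M | 49
V | Rana temporaria | 0.21 | 0.29 | G | 1082 | MIX | M | 52
V | Coregonus clupeaformis | 0.24 | 0.73 | FS | 362 | B | M | 45
V | Coregonus clupeaformis | 0.10 | 0.13 | P | 2666 | M | M | 51
V | Coregonus lavaretus | 0.12 | 0.25 | P | 20611 | M | M | 53
V | Gasterosteus aculeatus | 0.19 | 0.54 | P | 21810 | M | M | 56
V | Gasterosteus aculeatus | 0.08 | 0.14 | P | 1492 | M | M | 60
V | Osmerus mordax | 0.02 | 0.42 | P | 1564 | M | M | 48
V | Pomatomus saltatrix | - | 0.95 | P | 425- | M | - | 57
V | Salvelinus fontinalis | 0.15 | 0.23 | G | 552 | LH | M | 55
V | Thymallus thymallus | 0.10 | 0.35 | G | 7173 | LH | M | 47
V | Mus domesticus | 0.15 | 0.01 | FS | 168 | M | M | 54‡
V | Mus domesticus | 0.35 | 0.12 | FS | 118 | B | A | 54‡
V | Cynopterus sphinx | 0.03 | 0.03 | P | 867 | M | M | 46
F | Mycosphaerella graminicola | 0.10 | 0.17 | G | 785 | MIX | RFLP | 61
F | Mycosphaerella graminicola | 0.11 | 0.54 | G | 11415 | LH | RFLP | 62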

The studies used in the analyses included 50 species, 54% of which were plants, 24% vertebrates and 20% invertebrates, whereas two studies used a species of fungus (Table 1). On average, nine traits were used in each study, the number of traits per study varying from 1 to 26. Only five of the studies examined a single trait, whereas 41 of the FST estimates corresponded each with only a single QST estimate. Although studies that explicitly used morphological traits were in the minority (Table 1), the number of traits measured in those studies was larger than in the rest of the studies, so that overall the majority of traits measured were morphological (61%), whilst growth-related and life-history traits were slightly less frequent (38%). Only two of the studies measured (five different) behavioural traits (Table 1).

In almost all of the studies, FST estimates were based on allozyme markers (25 studies) or microsatellites (24 studies). However, microsatellites were used more often than allozymes in the studies published after 2001 (Table 1). Throughout the analyses, allozymes and isozymes were pooled together. Other types of nuclear markers were used in 12 studies altogether (Table 1). A total of six studies used more than one marker type to estimate FST. As was the case with studies published prior to 2001 (see Merilä & Crnokrak, 2001), studies that used more than one marker type to estimate FST yielded similar conclusions from the different marker types, although the choice of marker affected the magnitude of the QST–FST difference (Jaramillo-Correa et al., 2001; Morgan et al., 2001; Volis et al., 2005). One study reported QST estimates based on three different variance component estimation methods, of which ANOVA and REML gave similar results, whereas Bayesian analysis produced larger confidence intervals that were more difficult to interpret (Evanno et al., 2006). To be conservative, in the meta-analysis below we used the estimates from the Bayesian analysis with the widest confidence intervals.


Fifty-five of the total of 62 studies were included in the meta-analysis. Studies that did not report trait-specific QST values were excluded, as were studies that did not present comparable statistics or did not use the same populations for both FST and QST estimates (see Table 1). A total of 84 FST and 873 QST values were presented in these studies, of which roughly 75% included an error estimate (64/84 for FST and 675/873 for QST). Studies reported error estimates as standard errors, standard deviations or confidence intervals, all of which were transformed to standard errors prior to the meta-analysis.
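As an illustration of one such transformation, the sketch below converts a reported confidence interval into an approximate standard error. It assumes a symmetric interval based on a normal approximation, which need not hold for bootstrap or Bayesian intervals, and it is not a description of the exact procedure used for the data set. The function name `se_from_ci` is hypothetical.

```python
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Approximate standard error from a symmetric normal-theory confidence interval."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    if upper < lower:
        raise ValueError("upper bound must not be smaller than lower bound")
    # Two-sided critical value, about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (upper - lower) / (2.0 * z)


if __name__ == "__main__":
    # Hypothetical 95% interval for a QST estimate
    print(round(se_from_ci(0.10, 0.30), 4))
```

Intervals that are strongly asymmetric around the point estimate, which is common for QST near 0 or 1, would be poorly summarized by this approximation.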

The logic of the meta-analytical model employed was to summarize the effects of different factors on both FST and QST. There were several reasons why we used the present extended model instead of a simple meta-analysis. The aim of a simple meta-analysis is to provide an estimate of an effect size, but our main interest was in explaining the variation in the effect size on the difference between FST and QST. Furthermore, we wanted to distinguish between factors that affect the ‘true’ values of FST and QST and those that only affect the estimates. In addition, the model had to be weighted to account for the different precisions of the estimates. Extension of the model also allowed us to tackle a more practical problem of missing error estimates. Thus, instead of having to drop the FST and QST estimates from those studies that did not report standard errors, we were able to include them in the analyses and estimate the missing values through the model. Because of the structure of the problem, we developed the meta-analysis model as a hierarchical model.

The following factors were included in the meta-analysis:

Study identity. The identity of the study was included as a random factor to account for the variation caused by the particular characteristics of each study that were not considered explicitly in our model (e.g. species identity, experimental treatment, location and sample sizes).

Study design. Different studies used different designs to estimate QST, including different contributions from nonadditive genetic variation and/or environmental noise. The factor has three levels:

  • 1. Wild. QST estimation is based on purely phenotypic data from wild individuals; so, the genetic and environmental contributions to phenotypic variation cannot be separated.
  • 2. Broad sense. QST is estimated from individuals kept in common garden conditions, but the estimates of genetic variation include nonadditive components. This is the case for studies that measured variation among full-sib families and, therefore, estimates of additive genetic variance may include dominance and environmental maternal effects.
  • 3. Narrow sense. QST is estimated based on additive genetic variation only. These are common garden studies that can separate additive effects from environmental and maternal effects on quantitative trait variation (i.e. half-sib designs).

QST estimation method. The statistical method used to estimate QST and its components (i.e. additive genetic variation within and among populations) can result in biased QST estimates (O’Hara & Merilä, 2005). Here, we grouped the estimation methods into three categories: restricted maximum likelihood (REML), Bayesian inference (Bayes) and analysis of variance (ANOVA).

QST error estimation. Whereas the Bayesian approach automatically gives the uncertainty (error) of all the parameters in the model, studies using REML and ANOVA usually approximate the error of QST by using jackknife or nonparametric bootstrapping, which can result in an underestimation of the error and poor coverage (O’Hara & Merilä, 2005). Here, the QST values were grouped according to the published estimation methods (Bayesian, parametric bootstrapping, nonparametric bootstrapping, residual sum of squares, jackknifing and delta method).

Marker type. The studies reviewed used a range of genetic markers with varying degrees of polymorphism that can affect the bias and accuracy of FST (Hedrick, 1999). In our model, this factor had seven levels as follows: restriction fragment length polymorphisms (RFLP), randomly amplified polymorphic DNA (RAPDs), microsatellites (MICRO), ISOZ (includes isozymes/allozymes), expressed sequence tag polymorphisms (ESTP), cleaved amplified polymorphic sequences (CAPS) and amplified fragment length polymorphisms (AFLP).

Trait type. Quantitative traits may differ in their genetic architecture. For example, life-history traits have a more polygenic nature than morphological traits and this may result in different rates of response to selection (Merilä & Sheldon, 1999). For this reason, we classified the traits into three levels, morphological (Morph), life-history (L-hist) and behavioural (Behav). As was the case in the earlier review (Merilä & Crnokrak, 2001), traits directly connected to fitness, such as growth rate and fecundity, were classified as life-history traits, whereas metric size measurements, such as skeletal dimensions, were classified as morphological traits.

The influence of these factors on either FST or QST can be estimated by developing a model for their effects. There are two responses, FST and QST, and the difference between these is ascribed to selection. The simplest approach to modelling the influences of the factors would be to fit a linear model for FST and QST [i.e. a meta-regression, sensu Thompson & Higgins (2002)], in the same way that a normal experiment might be analysed with a linear model. Because of the more complex structure of the data here, the analysis is extended to account for the dependence of QST on ‘true’ FST, to distinguish between the factors that affect ‘true’ QST and FST from the factors that affect the estimates, to account for the variation in the precision of the estimates and to include estimates for which standard errors were not reported by estimating the errors. As a by-product of the latter, we are able to use the known standard errors to investigate the factors that affect the estimates.

The hierarchical model we developed can be viewed as several generalized linear models fitted together, so that, although the full model is complex, it is made up of several simpler parts. The model is described below, with the full details given in the Appendix. The notation for describing the model follows Wilkinson & Rogers (1973): X ~ Y shows that X has a distribution with expected value Y, where Y can be the sum of several terms. These terms are the factors in the linear model.

We assumed that the observed estimate of FST is affected by both its true value and the type of marker used, i.e. there may be bias in the estimate of FST due to properties of the markers (as pointed out by Hedrick, 1999, 2005). We can represent the model as:

$$\text{Estimated } F_{ST} \sim \text{True } F_{ST} + \text{Marker type} \tag{1}$$


The true value of FST was modelled as being drawn from a distribution with a grand mean (i.e. over all studies) and a variance that reflects the variation in estimated FST.

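The observed estimate of QST could be biased by the study design and the QST estimation method, so it was modelled as:

$$\text{Estimated } Q_{ST} \sim \text{True } Q_{ST} + \text{Study design} + Q_{ST}\text{ estimation method} \tag{2}$$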


The true value of QST was then modelled as the true value of FST plus an effect of the trait type (i.e. morphological, life-history or behavioural trait) and a random effect of the study:

$$\text{True } Q_{ST} \sim \text{True } F_{ST} + \text{Trait type} + \text{Study} \tag{3}$$


True FST is continuous and is modelled as an offset (i.e. setting the coefficient to 1): this was investigated further (see below).

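The standard error for FST was modelled as being affected by the actual value of FST, the type of marker, the number of populations and number of loci in the study:

$$\mathrm{SE}(F_{ST}) \sim \text{True } F_{ST} + \text{Marker type} + \text{Number of populations} + \text{Number of loci} \tag{4}$$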


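The standard error for QST was modelled as being affected by the study design, the number of populations (treated as continuous) in the study, the actual value of QST, and the method used to estimate the standard error of QST:

$$\mathrm{SE}(Q_{ST}) \sim \text{Study design} + \text{Number of populations} + \text{True } Q_{ST} + Q_{ST}\text{ error estimation method} \tag{5}$$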


The model described above assumes that QST increases with FST in a 1 : 1 manner, i.e. it forces a correlation between the two statistics. We therefore fitted a second model that relaxed this assumption by estimating the regression coefficient for the effect of FST on QST using eqn 3.
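The contrast between the two model variants can be sketched in Python with a deliberately simplified stand-in: ordinary least squares on paired point estimates instead of the hierarchical Bayesian model. In the first fit the slope of QST on FST is fixed at 1 (FST acts as an offset), and in the second the slope is estimated. The function names `fit_offset` and `fit_slope` and the example values are hypothetical.

```python
def fit_offset(fst, qst):
    """Slope fixed at 1: QST = a + 1 * FST, so a is the mean of (QST - FST)."""
    diffs = [q - f for f, q in zip(fst, qst)]
    a = sum(diffs) / len(diffs)
    rss = sum((d - a) ** 2 for d in diffs)
    return a, rss


def fit_slope(fst, qst):
    """Slope estimated: QST = a + b * FST by ordinary least squares."""
    n = len(fst)
    mean_f = sum(fst) / n
    mean_q = sum(qst) / n
    sxx = sum((f - mean_f) ** 2 for f in fst)
    sxy = sum((f - mean_f) * (q - mean_q) for f, q in zip(fst, qst))
    b = sxy / sxx
    a = mean_q - b * mean_f
    rss = sum((q - (a + b * f)) ** 2 for f, q in zip(fst, qst))
    return a, b, rss


if __name__ == "__main__":
    # Hypothetical paired estimates, not values from the data set
    fst = [0.05, 0.10, 0.20, 0.35, 0.60]
    qst = [0.30, 0.25, 0.40, 0.45, 0.70]
    print(fit_offset(fst, qst))
    print(fit_slope(fst, qst))
```

Because the offset model is the special case b = 1 of the second model, the residual sum of squares of the second fit can never exceed that of the first; a slope below 1 means the expected difference between QST and FST shrinks as FST grows.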

A full description of the model is given in the Appendix. The model was fitted to the data using a Bayesian approach with OpenBUGS 2.2.1beta (Thomas et al., 2006): details of the priors and implementation are given in the Appendix. Missing data (e.g. standard errors) were treated by estimating them as extra parameters (see Appendix for details). The parameter estimates are summarized by the posterior mode (i.e. the most likely value) and by the 95% highest posterior density intervals (HPDIs; Gelman et al., 2004). The HPDI is the Bayesian analogue of a confidence interval (a credible interval) and has the property of being the shortest possible interval with the correct coverage.
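The idea of an HPDI as the shortest interval with a given coverage can be sketched for a set of posterior draws. The Python function below scans all windows that contain the requested share of sorted draws and keeps the narrowest one. It assumes a unimodal posterior, and the name `hpdi` is hypothetical; it is not the routine used by OpenBUGS.

```python
import math
import random


def hpdi(samples, mass=0.95):
    """Shortest interval containing a share `mass` of the draws (unimodal case)."""
    if not 0.0 < mass <= 1.0:
        raise ValueError("mass must lie in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("samples must not be empty")
    k = max(1, math.ceil(mass * n))  # number of draws the interval must contain
    best_lo, best_hi = xs[0], xs[k - 1]
    for i in range(1, n - k + 1):
        lo, hi = xs[i], xs[i + k - 1]
        if hi - lo < best_hi - best_lo:
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


if __name__ == "__main__":
    random.seed(1)
    # Hypothetical posterior draws for a difference parameter
    draws = [random.gauss(0.16, 0.10) for _ in range(5000)]
    print(hpdi(draws, 0.95))
```

For a skewed posterior the HPDI differs from the equal-tailed interval, which is why an HPDI can extend slightly below zero even when only a small share of the posterior mass is negative.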


Descriptive statistics

The comparison of mean QST and FST values revealed that 70% of the QST values exceeded their associated FST values (Table 1). The average difference between mean QST and FST estimates was 0.12 (SD 0.27), QST estimates being on average significantly larger than the corresponding FST estimates (Wilcoxon signed-rank: z = 3.31, n = 59, P = 0.0005; Fig. 2b). Furthermore, the correlation between the pairwise estimates across the studies was significantly positive with both parametric (r = 0.41, z = 0.43, d.f. = 59, P = 0.012) and nonparametric (rs = 0.39, z = 0.42, d.f. = 59, P = 0.017) tests.
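The kind of paired summary reported here can be sketched in Python as follows. The code computes the mean and SD of the differences QST − FST, the share of pairs with QST > FST, and a normal-approximation z for the Wilcoxon signed-rank test. The function names and example values are hypothetical, and no continuity or tie correction is applied, so the result need not match the output of a statistics package exactly.

```python
import math
from statistics import mean, stdev


def paired_summary(fst, qst):
    """Mean and SD of QST - FST, and the share of pairs with QST > FST."""
    diffs = [q - f for f, q in zip(fst, qst)]
    share_higher = sum(d > 0 for d in diffs) / len(diffs)
    return mean(diffs), stdev(diffs), share_higher


def signed_rank_z(fst, qst):
    """Normal-approximation z for the Wilcoxon signed-rank test on QST - FST."""
    # Round to suppress floating-point noise, then drop zero differences
    diffs = [round(q - f, 10) for f, q in zip(fst, qst)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    if n == 0:
        raise ValueError("all differences are zero")
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    return (w_plus - expected) / sd


if __name__ == "__main__":
    # Hypothetical paired estimates, not values from the data set
    fst = [0.05, 0.10, 0.20, 0.35, 0.60, 0.15]
    qst = [0.30, 0.25, 0.40, 0.30, 0.70, 0.15]
    print(paired_summary(fst, qst))
    print(signed_rank_z(fst, qst))
```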

Figure 2. The relationship between QST and FST estimates in empirical studies. (a) Mean QST estimates against FST estimates as in Merilä & Crnokrak (2001), (b) mean QST estimates against FST estimates in this study, (c) QST estimates of individual traits against FST estimates, (d) QST estimates for individual traits against standardized (Hedrick, 2005) FST estimates. Note that the standardization has been applied only to the studies reporting GST estimates. Note also that for studies that have used more than one species, each point represents one species. In all figures, the thick solid line depicts 1 : 1 correspondence, and the thin line smoothed regression of QST on FST.

Meta-analysis

The meta-analysis showed that most of the variance in the data is due to study identity and between FST estimates within studies, as well as QST estimates within studies (Fig. 3). Study design, marker type and trait type explain virtually none of the variation in the data, whereas the QST estimation method has a weak contribution (Fig. 3), with REML being associated with higher estimates (Fig. 4a). The estimate for the overall average difference (Δ) between QST and FST was positive (Δ = 0.16, 95% HPDI: −0.012–0.50, P(Δ < 0) = 0.02). The estimates were very similar for all three trait types (Fig. 5). Of the other effects, study design had no effect on the QST–FST difference (Fig. 4b), nor did the choice of marker in these data (Fig. 4c).

Figure 3. The proportion of variance in QST or FST explained by different factors in the meta-analysis. Vertical dash denotes posterior mode, boxes 50% highest posterior density intervals (HPDIs) and horizontal lines 95% HPDIs.

Figure 4. The effect of different factor levels on QST or FST. Values on the horizontal axis are the effects that each factor adds to QST or FST. Values are centred so that each subplot has a mean of zero. The posterior mode (vertical dash), 50% highest posterior density intervals (HPDIs) (boxes) and 95% HPDIs for each of the effects are shown. See text for definition of different factors and factor levels.

Figure 5. The effect of trait type on the difference between QST and FST. The values on the horizontal axis are the expected differences, QST − FST. The posterior mode (vertical dash), 50% HPDIs (boxes) and 95% HPDIs for each type of trait are shown.

Are molecular and quantitative genetic differentiation correlated?

When the effect of FST on QST was introduced as a regression into the model, the slope estimate was positive, with a posterior mode below one (posterior mode: 0.78, 95% HPDI: 0.32–1.18; note that the interval includes one). In other words, QST increases with FST, but by a smaller amount than FST itself, so that the difference between the two tends to shrink as FST increases (Fig. 2c). It is also noteworthy that at low FST values QST estimates can take virtually any value.

Variation in standard error estimates

Focusing on the published standard error estimates of QST, study design had no effect on their size (Fig. 6). However, standard errors tended to increase with increasing QST, and the method of estimation had a large effect, with Bayesian methods (set to zero) and the nonparametric bootstrap having similar sizes of estimates, and the other methods tending to yield lower estimates (Fig. 6).

Figure 6. The effects of different factors on the precision of QST. Values on the horizontal axis are the logarithms of the factors by which each effect multiplies the square of the standard error (i.e. larger values indicate larger standard errors). Thick bars represent 50% highest posterior density interval (HPDI), and thin lines 95% HPDI. Vertical dashes represent the posterior mode. The error estimation methods are compared with Bayesian inference, which is set to zero. Design effects are centred to have a mean of zero.


Our meta-analysis largely confirms the main results of the review by Merilä & Crnokrak (2001), which found a significant positive covariance between the indices of population differentiation for quantitative traits (QST) and neutral genetic markers (FST), and that on average QST > FST. The present review, however, is more powerful in the sense that the number of studies has more than doubled since 2001. In addition, our analytical approach used all the information at the individual trait level, whereas the previous published reviews averaged trait values for each study (Merilä & Crnokrak, 2001; McKay & Latta, 2002). In the following, we discuss these results in the light of their biological significance and the latest methodological and theoretical developments in the field.

The relative importance of selection vs. genetic drift in the wild

The results of our meta-analyses showed that the levels of quantitative trait divergence among populations typically exceeded what would be expected from genetic drift alone. Whether this finding can be generalized to indicate the ubiquity of natural selection as a force behind population differentiation in quantitative traits depends on the validity of the assumption that the case studies included in the meta-analyses are a representative sample from the wild of traits and species (cf. Merilä & Crnokrak, 2001). In most cases researchers have probably chosen their study populations because they are located in contrasting environments and/or are known to be phenotypically divergent in the first place. In other words, the question of interest in most studies has been to know whether the observed high degree of phenotypic differentiation reflects local adaptation or random genetic drift. Therefore, the results of our analyses support the contention that selection is the main force driving population divergence in those systems where pronounced phenotypic differences are known to exist a priori. Hence, it is important to recognize that this is not the same as saying that natural selection would be a more common and important determinant of population differentiation than genetic drift.

Given the potential bias in the selection of study populations for QST vs. FST comparisons, we see that the main utility of comparative studies of quantitative trait and neutral marker differentiation is to act as an exploratory tool. In other words, such comparisons are useful as ‘blind-search’ tools to detect traits that are under selection, but for which there is limited background information to give clear a priori expectations. Along the same lines, the QST vs. FST comparisons can be very useful for deciding whether populations should be considered as separate management or conservation units, i.e. where they might show adaptive divergence but there is lack of knowledge about potentially divergent selection pressures affecting the different subpopulations. In the context of conservation and management of wild populations, a particularly interesting feature emerging from our analysis is the fact that quantitative trait differentiation can be very high even in cases where FST values are very low. This means that even if gene flow is high among populations and/or effective population sizes are very large, selection can be an important force leading to adaptive divergence. For this reason, conservation and management policies based solely on information about variation in neutral genetic markers can potentially be very misleading.

Morphology vs. life-history traits

Evolutionary theory predicts that quantitative traits will respond differently to selection depending on their genetic architecture (i.e. number of genes and proportion of phenotypic variation due to additive genetic effects) and how closely related they are to fitness (Merilä & Sheldon, 1999). Therefore, morphological traits (which have a less polygenic nature and weaker nonadditive genetic effects than life-history traits) are expected to respond faster to selection than life-history traits, and hence have higher QSTs. In contrast to this expectation and the findings of an earlier meta-analysis (Merilä & Crnokrak, 2001), our results indicate that the average difference between QST and FST is similar for morphological and life-history traits (Fig. 4d). Behavioural traits were not considered in the previous review, but they seem to have a QST–FST difference similar to that of the other trait types. However, there is large uncertainty around the effect of behavioural traits owing to the low number (n = 2) of behavioural studies available.

Does differentiation in neutral markers predict quantitative genetic differentiation?

We found a positive correlation between QST and FST estimates across different studies suggesting that the genetic differentiation in quantitative traits might be predictable from knowledge of differentiation in neutral markers. However, this does not necessarily mean that variation in neutral markers is a good predictor of adaptive quantitative trait differentiation (r² = 0.16). In principle, divergence in neutral genetic markers per se – if behaving truly neutrally – is not expected to reflect quantitative trait differentiation as a result of selection (Pearman, 2001; McKay & Latta, 2002), although some demographic scenarios (e.g. habitat fragmentation and reduced gene flow) may lead to concerted increase in both neutral and quantitative trait divergence (Frankham & Weber, 2000). In addition to the lack of genetic dominance effects, the expectation that QST = FST under neutrality also relies on the assumption that mutation rates are significantly lower than migration rates (Hendry, 2002) and, in cases where both mutation and migration rates are low and divergence time is large, there is little room for QST to exceed FST even under strong selection (Hendry, 2002). The latter point relates to the fact that FST and QST are bounded between 0 and 1 and this constraint is apparent also in our results. The results show that at the lower range of FST values (i.e. large population sizes and high gene flow) QST can be highly variable, taking in many cases values close to unity (Fig. 2). Therefore, if QST > FST when FST is low, FST and QST cannot increase at the same rate, otherwise QST would end up being larger than 1. In addition, the high QST values at the lower range of FST might indicate a publication bias favouring those studies that report high phenotypic divergences in the face of significant gene flow.

Problems with QST?

For many species, it is very demanding to set up the controlled experimental conditions needed to estimate the levels of additive genetic variation within and among populations. This results in situations where researchers use levels of phenotypic variation as a surrogate for genetic variation (i.e. PST; Leinonen et al., 2006; Raeymaekers et al., 2007) or broad-sense estimates of heritability (i.e. variation among full-sib families or lines). If quantitative divergence is estimated from wild phenotypes, then environmental effects can mask the true levels of quantitative trait divergence. Population divergence can be overestimated in cases where phenotypic variation is mainly a plastic response to the environment or underestimated in cases where the environmental effects produce reduced phenotypic variation even if genetic divergence is high. However, our analysis suggests that studies based on information from wild phenotypes do not tend to yield higher estimates than common garden QST studies that use broad- and narrow-sense estimates of additive genetic variation (Fig. 4b), although biases might still be present in specific studies (see, e.g. Lee & Frost, 2002; Raeymaekers et al., 2007). Another aspect that has received attention in simulation studies (Lopez-Fanjul et al., 2003; Goudet & Buchi, 2006) is the potential effects of dominance and environmental maternal effects on QST. Lopez-Fanjul et al. (2003) predicted significant deviations from the true QST–FST difference for bi-allelic loci after bottlenecks, depending on the frequencies of the recessive alleles. Goudet & Buchi (2006), on the other hand, considered a wider context and showed that in inbred populations the effects of dominance are negligible and concluded that, in general, when QST > FST the inference of directional selection for different local optima is robust to nonadditive genetic effects. The present study supports this contention as using broad- or narrow-sense estimates had a negligible effect on QST estimation (Figs 3 and 4b).

Our results also give some empirical support for the simulation results of O’Hara & Merilä (2005) who investigated the bias and precision of different methods for estimating standard errors of QST. As expected from their study, non-Bayesian methods tended to yield significantly lower SE estimates (Fig. 6). However, contrary to the simulations (O’Hara & Merilä, 2005), the nonparametric bootstrap standard errors did not differ (on average) from those of the Bayesian methods, whereas parametric bootstrapping yielded lower error estimates (Fig. 6). Although our analyses cannot differentiate between alternative explanations, it is likely that the small SEs of the former two methods arise because they underestimate standard errors of QST (O’Hara & Merilä, 2005). Hence, our results give support for a contention that many of the standard error estimates in the QST literature may be severely downward biased.

Another concern arises from the poor statistical properties of the QST index. The precision of the estimate is low where the interest is in comparison of a small number of populations, and in two-population comparisons in particular (O’Hara & Merilä, 2005). The mean number of populations in studies included in this meta-analysis was 10 (median 8), which is well below the ca. 20 populations which O’Hara & Merilä (2005) identified as being a demarcation point below which QST estimates have low precision. We can only hope that new approaches and statistical innovations can improve our ability to estimate quantitative trait differentiation among relatively few populations.

Problems with FST?

A critical assumption in using marker gene-derived FST estimates as a null model against which the effect of natural selection is to be tested, is that FST estimates are measuring the differentiation to be expected under neutral evolution on the same scale as QST. This assumption may be violated if some of the loci used for FST estimation are linked to selected loci or if the equilibrium conditions are violated (Storz, 2002; Palo et al., 2003). Similarly, problems may arise if the FST estimates are bounded due to high mutation rates (Hedrick, 1999, 2005). Contrary to the findings in previous reviews (Merilä & Crnokrak, 2001) and theoretical expectations (Hedrick, 1999), our results indicate that the choice of marker and their differences in mutation rates do not seem to have an effect on the estimation of FST (Fig. 4c).

An issue that certainly deserves further attention is the possibility of underestimating FST and GST if highly polymorphic markers (i.e. microsatellites) are used for their estimation (Hedrick, 1999). To compare F-statistics obtained with different types of markers some standardization procedures have been suggested (Hedrick, 2005; Meirmans, 2006). We explored the effect of such standardization on the studies compiled for the present meta-analysis (Fig. 2d). However, we were able to apply the standardization proposed by Hedrick (2005) only to the subset of studies that reported GST estimates, as raw genotypic data would have been needed to standardize the FST estimates as well (Meirmans, 2006). Before the standardization, this subset of studies showed the same trends as did the complete data set. After the standardization, the average difference between FST and QST was not significantly different from zero for any type of trait (results not shown). Whereas this result gives support to the contention that the estimates of neutral genetic divergence may be biased downwards, it should be noted that standardizing only FST may lead to biased FST vs. QST comparisons. Similar adjustments may be needed for the within-population variance estimation of QST. Obviously, this is an issue that should be investigated in further detail before being able to assess its impact on the published studies in general. For instance, it remains to be shown that G′ST as proposed by Hedrick or the analysis of molecular variance estimates of FST as proposed by Meirmans (2006) should equal QST as a neutral expectation.
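For readers unfamiliar with the standardization, the sketch below implements the G′ST formula as we understand it from Hedrick (2005), in which GST is divided by its maximum attainable value given the mean within-population heterozygosity and the number of subpopulations. The function name and the argument names `hs` and `k` are labels chosen for this example, and the code is an illustration under those assumptions rather than the procedure applied to the studies in this meta-analysis.

```python
def hedrick_gst_prime(gst, hs, k):
    """Standardized G'ST = GST / GST(max), after Hedrick (2005).

    gst: reported GST estimate
    hs:  mean within-subpopulation expected heterozygosity (0 <= hs < 1)
    k:   number of subpopulations sampled (k >= 2)
    Argument names are assumptions made for this sketch.
    """
    if k < 2:
        raise ValueError("need at least two subpopulations")
    if not 0 <= hs < 1:
        raise ValueError("hs must lie in [0, 1)")
    # Largest GST attainable at this level of within-population diversity.
    gst_max = (k - 1) * (1 - hs) / (k - 1 + hs)
    return gst / gst_max
```

With highly polymorphic markers (high `hs`) the ceiling `gst_max` is low, so a small raw GST can correspond to a large standardized value; for example, GST = 0.1 with `hs` = 0.8 and `k` = 2 gives G′ST of about 0.9.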

Future prospects

New insights from QST–FST comparisons and alternatives as means of detecting signatures of natural selection and/or adaptive divergence among populations are emerging from the field of genomics. Gibson & Weir (2005) suggested comparing transcriptional QST (tQST) and genotypic FST statistics as a promising approach to detecting selection on transcript abundance, i.e. gene expression patterns. Two studies have applied this in relation to human disease (Rockman et al., 2003, 2004). Similarly, Whitehead & Crawford (2006) studied gene expression patterns within and among killifish (Fundulus heteroclitus) populations and found that although much of the variation in gene expression was apparently governed by drift, variation in expression of 22% of genes was consistent with selective differentiation along a steep thermal cline. Approaches to detecting allele frequency changes in candidate loci also provide a platform for comparative studies of population differentiation, but as yet, such studies are rare and subject to different sets of constraints and problems than traditional QST–FST comparisons (e.g. see Vasemagi & Primmer, 2005; Hoffmann & Daborn, 2007 for recent reviews).


Taken together, the results suggest that divergence due to natural selection and local adaptation is the norm rather than an exception in published studies. Yet, the amount of variance in data explained by different biological and methodological factors remained low, suggesting a fair amount of unpredictability and heterogeneity in FST–QST relationships. We also found a positive overall relationship between pairwise QST and FST estimates suggesting that neutral marker differentiation could at least be roughly predictive of genetic differentiation in quantitative traits. Yet, any interpretations of this relationship should be made cautiously (e.g. Hendry, 2002), and keeping in mind that for low values of FST, almost any value of QST seems to be possible. This means that a weak population structure found in neutral marker genes can hide a lot of (cryptic) genetic differentiation in genes coding quantitative traits.


We would like to thank Theresa Knopp for help in acquiring data, as well as Thrond Haugen, John McKay, Carlos Navarro, Joost Raeymaekers, Ophelie Ronce and Patrick Waldmann for providing additional data, and an anonymous referee for helpful comments. Our research was supported financially by the University of Helsinki, the Academy of Finland, and LUOVA graduate school funded by the Ministry of Education (Finland).



The model for the meta-analysis models both FST and QST as being affected by several factors. One problem with the data is that there is missing information, particularly in the estimates of standard errors. Rather than remove those studies, we treated the standard errors as missing data, and used multiple imputation (Little & Rubin, 1987) to estimate the standard errors. Technically, this assumes that the data are missing at random, in essence that the decision not to report the standard error is not influenced by the data that is presented.


For the ith (i = 1,…,I) study, it is assumed that the observed value of FST, inline image comes from a normal distribution:

image( (A1))

where inline image is the estimated variance (i.e. the square of the standard error) for that study. μF(i) is the expected value:

image( (A2))

where FST(i) is the ‘true’ value of FST and inline image (m = 1,…,M) is the bias in the estimate due to the choice of marker. It is assumed that, averaged over the markers, there is no bias, and that the marker effects are normally distributed, i.e.

image( (A3a))
image( (A3b))

(throughout, the tilde represents a mean-centred variable). It is then assumed that the true value of FST is normally distributed with a mean and variance:

image( (A4))

The square of the standard error was also modelled by assuming that it followed an inverse-gamma distribution:

image( (A5))

where ζF is the shape parameter, and νF(i) is the scale. The scale is then modelled further, primarily to estimate the missing standard errors, but also to get some insight into what factors affect the precision of the estimates. This is performed with a linear model for log(νF(i)):

image( (A6))

where inline image is the intercept, inline image is the regression coefficient for the number of populations, P(i), inline image is the regression coefficient for the number of loci, L(i), inline image is the regression coefficient for the effect of the true FST, and inline image is the (centred) effect of the mth marker. To improve mixing, the continuous covariates P(i) and L(i) are mean centred. FST(i) is approximately mean centred, as 0.22 is approximately the mean of FST. The only effect on the inference is to change the value of the intercept inline image).

The inline image are centred, and the inline image are assumed to be normally distributed, with a common variance:

image( (A7a))
image( (A7b))
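Because the display equations A1–A7 are not reproduced here, the block below gives one possible reading of the FST part of the model, reconstructed from the prose alone. Only μF(i), FST(i), ζF, νF(i), P(i), L(i), the marker index m and the centring constant 0.22 come from the text; the hats, the marker-effect symbols b and g, the coefficients a0, aP, aL and aF, and the variance symbols are placeholders assumed for this sketch and may differ from the original notation.

```latex
\begin{align}
\hat{F}_{ST}(i) &\sim \mathrm{N}\!\left(\mu_F(i),\ \hat{\sigma}^2_F(i)\right)
  && \text{(cf. A1)} \\
\mu_F(i) &= F_{ST}(i) + \tilde{b}_{m(i)}
  && \text{(cf. A2)} \\
\tilde{b}_m &= b_m - \bar{b}, \qquad b_m \sim \mathrm{N}\!\left(0,\ \sigma^2_b\right)
  && \text{(cf. A3a, A3b)} \\
F_{ST}(i) &\sim \mathrm{N}\!\left(\mu_{F_{ST}},\ \sigma^2_{F_{ST}}\right)
  && \text{(cf. A4)} \\
\hat{\sigma}^2_F(i) &\sim \text{Inv-Gamma}\!\left(\zeta_F,\ \nu_F(i)\right)
  && \text{(cf. A5)} \\
\log \nu_F(i) &= a_0 + a_P\left(P(i) - \bar{P}\right) + a_L\left(L(i) - \bar{L}\right)
  + a_F\left(F_{ST}(i) - 0.22\right) + \tilde{g}_{m(i)}
  && \text{(cf. A6)}
\end{align}
```

Under this reading, a study with a missing standard error still contributes to the likelihood, because its squared standard error is drawn from the inverse-gamma distribution whose scale is predicted from the number of populations, the number of loci, the true FST and the marker type.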


The observed jth (j = 1,…,J) estimate of QST, inline image, comes from the ith study, and is assumed to be normally distributed:

image( (A8))

where inline image is the estimated variance (i.e. the square of the standard error) for that study. μQ(j) is the expected value:

image( (A9))

where QST(j) is the ‘true’ value of QST, inline image (e = 1,…,E) is the bias in the estimate due to the statistical method used to estimate QST, and inline image (d = 1,…,D) is the bias in the estimate due to the choice of design of the experiment. It is assumed that, averaged over the markers, there is no bias, and that the effects are normally distributed, i.e. for the design parameter

image( (A10a))
image( (A10b))

and for estimation method

image( (A11a))
image( (A11b))

The true value of QST, QST(j), is then assumed to be affected by the true value of FST(i) and the type of trait, and the study identity:

image( (A12))
image( (A13))

where inline image (t = 1,…,T) is the trait effect, and inline image is the study effect (note the index by i: we assume that one study is for one group of populations with one FST estimate, regardless of whether it was reported in a paper with other studies).

The trait effect is normally distributed, but not mean centred:

image( (A14))

The variance of the true values around the expected value is inline image (i.e. this contains the variation in the actual value of QST between traits, species, etc. that is not explained by the other factors). Any difference between FST and QST will appear as QST(j)−FST(i), and the average difference trait t will be inline image. The overall difference is μQ.

Both the study and design effects are mean centred with a common variance:

image( (A15a))
image( (A15b))


image( (A16a))
image( (A16b))

Finally, the standard error in QST is modelled by assuming that its square follows an inverse-gamma distribution:

image( (A17))

where ζQ is the shape parameter, and νQ(j) is the scale. This is modelled further with a linear model for log νQ(j):

image( (A18))

where inline image is the intercept, inline image is the regression coefficient for the number of populations, P(i) (note that this is the same for all estimates of QST for study i), inline image is the regression coefficient for the true value of QST, inline image is the (centred) effect of the dth design and inline image is the effect of the νth method for estimating the standard error of QST.

The design and estimation effects are assumed to be normally distributed, with common variances:

image( (A19a))
image( (A19b))


image( (A20a))
image( (A20b))

Regression of FST

The model above assumes that the effect of changing FST is to change QST by the same amount. The effect of this assumption was examined by expanding the model to add a regression parameter in front of FST, i.e. eqn A13 is changed to

image( (A21))
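The change from the 1 : 1 model to the regression model can be sketched as follows. The equations are reconstructed from the verbal description only: η(j) for the expected true QST, τ for the trait effect, s̃ for the centred study effect, σ²Q for the residual variance and β for the regression parameter are placeholder symbols assumed for this illustration, not necessarily the original notation.

```latex
\begin{align}
Q_{ST}(j) &\sim \mathrm{N}\!\left(\eta(j),\ \sigma^2_Q\right)
  && \text{(cf. A12)} \\
\eta(j) &= F_{ST}(i) + \tau_{t(j)} + \tilde{s}_i
  && \text{(1 : 1 model, cf. A13)} \\
\eta(j) &= \beta\, F_{ST}(i) + \tau_{t(j)} + \tilde{s}_i
  && \text{(regression model, cf. A21)} \\
\eta(j) - F_{ST}(i) &= \tau_{t(j)} + \tilde{s}_i + (\beta - 1)\, F_{ST}(i)
  && \text{(implied difference under A21)}
\end{align}
```

The last line shows why a slope below one implies a shrinking difference: with β = 0.78 (the posterior mode reported in the Results), the expected QST − FST difference decreases by 0.22 for each unit increase in FST, whereas under the 1 : 1 model (β = 1) it does not depend on FST.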

Model fitting

The model was fitted using a Bayesian approach. Because of this, several parameters need priors. These were chosen to be only weakly informative. The priors were:


where exp(λ) denotes an exponential distribution with mean λ, and WC(s) denotes a wrapped Cauchy distribution with scale s [see Gelman (2006) for a motivation of this prior]. The number of loci and populations was not available for all data sets; so, these were estimated as extra parameters, using a uniform distribution on the integers between 1 and 200 as a prior distribution.

The model was fitted with OpenBUGS 2.2.1beta (Thomas et al., 2006). Two chains were run: after a burn-in of 1200 iterations, a further 50 000 iterations were run per chain, with each chain thinned to every 10th iteration, giving a total of 10 000 draws from the posterior. Convergence was assessed by eye and with the Brooks–Gelman–Rubin statistic (Brooks & Gelman, 1998).
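The bookkeeping of the sampling scheme, and the idea behind the convergence diagnostic, can be illustrated with a short sketch. The function `gelman_rubin` implements the basic potential scale reduction factor, which compares between-chain and within-chain variance; it is a simplified stand-in and is not claimed to match the exact Brooks–Gelman–Rubin computation performed by OpenBUGS. Both function names are hypothetical.

```python
from statistics import mean, variance

def thinned_draws(n_chains, n_iter, thin):
    """Number of retained draws after thinning each chain."""
    return n_chains * (n_iter // thin)

def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of equal-length lists of post-burn-in draws.
    Values close to 1 suggest the chains sample the same distribution.
    """
    n = len(chains[0])
    if len(chains) < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: mean within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled variance estimate
    return (pooled / within) ** 0.5
```

Here `thinned_draws(2, 50000, 10)` reproduces the 10 000 retained draws described above. Chains stuck in different regions of the posterior produce a factor well above 1.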

Because of the small number of levels of some of the factors, the variances reported for the effects (e.g. inline image) in the results are finite-population variances (Gelman, 2005), i.e. the variance of the estimated parameters, e.g.


rather than the super-population variance (e.g. inline image). However, the variances for FST and QST are the super-population variances (i.e. inline image and inline image).
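The distinction can be made concrete with a small sketch. Following our reading of Gelman (2005), the finite-population variance is computed within each posterior draw as the sample variance of the effects of the levels actually present in the data, which yields a posterior distribution for that variance. The function name and the input layout are assumptions made for this illustration.

```python
from statistics import variance

def finite_population_variances(effect_draws):
    """Finite-population variance of a factor, one value per posterior draw.

    effect_draws: list of posterior draws; each draw is a list holding the
    effect of every observed level of the factor (e.g. each marker type).
    Uses the (M - 1) denominator over the M observed levels.
    """
    return [variance(levels) for levels in effect_draws]
```

The resulting list can then be summarized by its posterior mode and HPDI in the same way as any other parameter, whereas the super-population variance is the variance hyperparameter of the distribution from which the effects are assumed to be drawn.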