Juha Merilä, Section of Population Biology, Department of Ecology and Systematics, PL 17, FIN-00014, University of Helsinki, Finland. Tel.: +358 9 40 8374165; fax: +358 9 191 28701; e-mail: firstname.lastname@example.org
The comparison of the degree of differentiation in neutral marker loci and genes coding quantitative traits with standardized and equivalent measures of genetic differentiation (FST and QST, respectively) can provide insights into two important but seldom explored questions in evolutionary genetics: (i) what is the relative importance of random genetic drift and directional natural selection as causes of population differentiation in quantitative traits, and (ii) does the degree of divergence in neutral marker loci predict the degree of divergence in genes coding quantitative traits? Examination of data from 18 independent studies of plants and animals using both standard statistical and meta-analytical methods revealed a number of interesting points. First, the degree of differentiation in quantitative traits (QST) typically exceeds that observed in neutral marker genes (FST), suggesting a prominent role for natural selection in accounting for patterns of quantitative trait differentiation among contemporary populations. Second, the FST – QST difference is more pronounced for allozyme markers and morphological traits than for other kinds of molecular markers and life-history traits. Third, very few studies reveal situations where QST < FST, suggesting that selection pressures, and hence optimal phenotypes, are seldom similar in different populations of the same species. Fourth, there is a strong correlation between QST and FST indices across the different studies for allozyme (r=0.81), microsatellite (r=0.87) and combined (r=0.75) marker data, suggesting that the degree of genetic differentiation in neutral marker loci is closely predictive of the degree of differentiation in loci coding quantitative traits. However, these interpretations are subject to a number of assumptions about the data and methods used to derive the estimates of population differentiation in the two sets of traits.
The study of local adaptive differentiation in ecologically important quantitative traits among contemporary populations of the same species has been (e.g. Mayr, 1963; Endler, 1977; Berven & Gill, 1983), and still remains (e.g. Conover & Schultz, 1995; Ebert, 1995; Linhart & Grant, 1996; Lively & Jokela, 1996; Blondel et al., 1999; Huey et al., 2000), one of the central themes in evolutionary biology research. Local adaptations, stemming from spatial and temporal heterogeneity in selection pressures acting on heritable traits, are thought to be responsible for much of the phenotypic diversity seen in the wild. This view is consistent with the observations that (i) strong directional natural selection is of common occurrence in the wild (Endler, 1986; Kingsolver et al., 2001), and (ii) most traits appear to be also moderately to highly heritable (e.g. Houle, 1992; Cornelius, 1994; Roff, 1997). However, random genetic drift is also a potent force causing population differentiation in quantitative traits (e.g. Lande, 1976; Lynch, 1990), but distinguishing between these processes as a cause for observed genetic differentiation among populations can be problematic (e.g. Turelli et al., 1988). Hence, the question of the relative importance of genetic drift and natural selection as determinants of population differentiation in quantitative traits has remained enigmatic.
The structure and aims of this review are as follows. First, we will introduce the standardized measures of genetic differentiation for neutral marker genes (FST) and quantitative traits (QST), and review theoretical underpinnings required to understand why these two measures of genetic differentiation can be informative about the relative roles of random genetic drift and natural selection as a cause of population differentiation in quantitative traits. Second, we will review the literature on studies that have compared the genetic differentiation among contemporary populations in neutral molecular marker genes and quantitative traits using these standardized measures of genetic divergence, and evaluate what this data tells us about: (i) the relative importance of natural selection and random genetic drift as a cause of population differentiation in different types of quantitative traits (viz. life history and morphological traits), (ii) to what degree the divergence in neutral molecular markers reflects divergence in genes coding quantitative traits, and (iii) do different molecular markers (viz. allozymes vs. microsatellites, etc.) produce different divergence relationships with quantitative traits. Third, we will evaluate and discuss the potential pitfalls involved with comparisons of FST and QST estimates, and point out some directions for future work in this area.
Theory and methods
The total genetic variation in neutral marker loci can be partitioned into within-population (vw) and between-population (vb) components, where vb is equivalent to the expected gene diversity (or heterozygosity assuming random mating) between populations that is in excess of that within populations (Nei, 1987). From this, a standardized measure of the degree of among population genetic differentiation is obtained as (Wright, 1951; Nei, 1987):

$$F_{ST} = \frac{v_b}{v_b + v_w} \qquad (1)$$

which scales from 0 to 1. Under the assumption of similar (and low, Nagylaki, 1998) mutation pressure in different populations, the FST of neutral marker loci is primarily determined by the balance between random genetic drift and migration (Kimura, 1983; Hartl & Clark, 1989). Consequently, the degree of among population differentiation at neutral marker loci as estimated by the FST index, indicates the expected degree of population differentiation as a result of the combined effect of genetic drift and gene flow (Wright, 1951; Rogers, 1986; Lande, 1992). Here, it is perhaps also worth pointing out that there are different estimators of FST, the three perhaps most commonly used ones being GST (Nei, 1986, 1987), θ (Weir & Cockerham, 1984), and RST (Slatkin, 1995) estimators. The latter is a GST analogue derived particularly for microsatellite loci.
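To make the partition into within- and between-population diversity concrete, the following sketch computes Nei's GST for a single biallelic locus from population allele frequencies. It is an illustration only: the function name `gst_biallelic` is hypothetical, populations are weighted equally, and none of the sample-size corrections used by published estimators are applied.

```python
from statistics import mean


def gst_biallelic(freqs):
    """Nei's G_ST for one biallelic locus.

    freqs: frequency of one allele in each population
           (equal population weights are an assumption of this sketch).
    """
    freqs = list(freqs)
    if not freqs or any(p < 0.0 or p > 1.0 for p in freqs):
        raise ValueError("allele frequencies must lie in [0, 1]")
    # Mean expected heterozygosity within populations (H_S, the v_w component).
    h_s = mean(2.0 * p * (1.0 - p) for p in freqs)
    # Expected heterozygosity of the pooled population (H_T = v_w + v_b).
    p_bar = mean(freqs)
    h_t = 2.0 * p_bar * (1.0 - p_bar)
    if h_t == 0.0:
        raise ValueError("locus is monomorphic; G_ST is undefined")
    # Share of total diversity that lies between populations.
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    print(gst_biallelic([0.2, 0.8]))  # strongly differentiated pair
    print(gst_biallelic([0.5, 0.5]))  # identical frequencies
```

With identical allele frequencies the between-population component vanishes and the index is zero; it grows as the frequencies diverge.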
For quantitative traits with an additive genetic basis, Wright (1951) showed that the neutral expectation for mean additive genetic variance within populations is $\sigma^2_{GW} = (1 - F_{ST})\sigma^2_0$, and that for between population variance is $\sigma^2_{GB} = 2F_{ST}\sigma^2_0$, where $\sigma^2_0$ is the expected additive genetic variance that would exist if all the populations under study formed a panmictic unit (Lande, 1992). As the expectation for total genetic variance ($\sigma^2_T$) in a trait equals $(1 + F_{ST})\sigma^2_0$, it follows that an estimate of population differentiation for a quantitative trait (termed QST by Spitze, 1993), analogous to the single-locus FST estimate, can be obtained as:

$$Q_{ST} = \frac{\sigma^2_{GB}}{\sigma^2_{GB} + 2\sigma^2_{GW}} \qquad (2)$$

In diploid species and in the case of a trait having an additive genetic basis and linkage equilibrium among loci (Latta, 1998), this index is expected to take exactly the same value as would be obtained with equation (1) if it were estimated from allele frequencies at the quantitative trait loci (Wright, 1951; Lynch & Spitze, 1994; Latta, 1998). Consequently, if the quantitative trait in question is selectively neutral (and the population is in Hardy–Weinberg equilibrium), the QST value is expected to equal the FST value. The two in the front of the within population component of variance in equation (2) is due to the fact that the QST is based on comparison of genotypes, whereas FST is based on comparison of genes (Lynch & Spitze, 1994). In other words, this term accounts for the fact that among population variance for a quantitative trait is magnified due to statistical associations between identical genes within individuals in subdivided populations.
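The neutral expectation described here can be checked numerically. The sketch below assumes the usual form of the index, QST = σ²GB / (σ²GB + 2σ²GW), and plugs in the neutral expectations for the two variance components; the function names and the value chosen for the panmictic variance are hypothetical.

```python
def qst(var_between, var_within):
    """Q_ST from between- and within-population additive genetic variance."""
    if var_between < 0.0 or var_within < 0.0:
        raise ValueError("variance components must be non-negative")
    total = var_between + 2.0 * var_within
    if total == 0.0:
        raise ValueError("no genetic variance; Q_ST is undefined")
    return var_between / total


def neutral_components(fst, var_panmictic):
    """Neutral expectations for (between, within) additive genetic variance."""
    if not 0.0 <= fst <= 1.0:
        raise ValueError("F_ST must lie in [0, 1]")
    within = (1.0 - fst) * var_panmictic   # (1 - F_ST) * sigma_0^2
    between = 2.0 * fst * var_panmictic    # 2 * F_ST * sigma_0^2
    return between, within


if __name__ == "__main__":
    for fst in (0.05, 0.2, 0.5):
        between, within = neutral_components(fst, var_panmictic=1.0)
        # Under neutrality the printed Q_ST matches the input F_ST.
        print(fst, round(qst(between, within), 6))
```

The factor of two in the denominator is what makes the two indices coincide: without it, the neutral QST would be 2FST / (1 + FST) rather than FST.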
There are three possible outcomes from the comparisons of FST and QST indices, each of which has a unique interpretation (Table 1). First, if QST > FST, then this means that the degree of differentiation in quantitative traits exceeds that achievable by genetic drift alone, and consequently, directional natural selection favouring different phenotypes in different populations must have been involved to achieve this much differentiation. Second, if the QST and FST estimates are roughly equal (this can be tested statistically), this means that the observed degree of differentiation in quantitative traits could have been obtained by genetic drift alone (Table 1). However, note that this does not prove that the observed degree of differentiation was caused by genetic drift – only that the effects of drift and selection are indistinguishable. The third possible outcome corresponds to a situation where QST < FST (Table 1). This implies that observed degree of differentiation is actually less than that to be expected on the basis of genetic drift alone, which means that natural selection must be favouring the same mean phenotype in different populations (Table 1). Alternatively, it may be that the divergence in the quantitative traits is constrained by a lack of genetic variability within populations. However, this seems unlikely given the fact most traits seem to be moderately to highly heritable (Mousseau & Roff, 1987; Lynch & Walsh, 1998). Hence, to sum up, the degree of divergence in single locus marker genes and neutral quantitative traits (as measured by FST and QST) are expected to be similar and independent of mutational variance, and depend only on effective population size and migration rates which affect all traits equally. Consequently, all else being equal, any difference between FST and QST indices estimated for the same set of populations must be attributable to effects of natural selection (but see below). 
As verified by recent theoretical treatments (Whitlock, 1999), these predictions are general and do not depend on whether an island-model or some other population structure is assumed.
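The three outcomes can be written as a small decision rule. The sketch below is one possible operationalization, not a procedure taken from the text: it assumes that a confidence interval is available for QST and compares it with a point estimate of FST, ignoring the sampling error of FST itself. The function name and the wording of the returned labels are hypothetical.

```python
def interpret_qst_fst(qst_ci, fst):
    """Classify a Q_ST versus F_ST comparison.

    qst_ci: (lower, upper) confidence limits for Q_ST (assumed available).
    fst:    point estimate of F_ST for the same set of populations.
    """
    lower, upper = qst_ci
    if lower > upper:
        raise ValueError("confidence limits are in the wrong order")
    if lower > fst:
        # More differentiation than drift alone is expected to produce.
        return "QST > FST: directional selection for different phenotypes"
    if upper < fst:
        # Less differentiation than drift alone is expected to produce.
        return "QST < FST: selection for the same phenotype"
    # Interval overlaps F_ST: drift and selection cannot be told apart.
    return "QST = FST: consistent with drift alone"


if __name__ == "__main__":
    print(interpret_qst_fst((0.35, 0.60), fst=0.10))
    print(interpret_qst_fst((0.05, 0.30), fst=0.12))
    print(interpret_qst_fst((0.01, 0.04), fst=0.15))
```

As the text stresses, the middle outcome does not demonstrate that drift caused the observed differentiation; it only means that selection cannot be distinguished from drift with the data at hand.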
Table 1. Possible outcomes and their interpretation in comparisons of quantitative trait (QST) and neutral marker gene divergence (FST). See text for further details.
These interpretations are subject to a number of assumptions underlying the inference based on FST and QST estimates. First, when using the FST values as neutral expectation for the degree of among population differentiation, the implicit assumption is that the markers behave in an effectively neutral manner. As for allozymes, although numerous exceptions exist (e.g. Mitton, 1994; Watt, 1995), variation in the majority of allozyme loci seems to conform to the expectations of neutral theory (e.g. Barrowclough et al., 1984; Nei & Graur, 1984; Skibinski et al., 1993). Noncoding microsatellite loci may nevertheless be a better option to derive the neutral expectation, although some individual microsatellite loci can be under selection (e.g. Kashi et al., 1997). However, as pointed out by Hedrick (1999; see also: Nagylaki, 1998; Allendorf & Seeb, 2000), FST estimates from microsatellite loci may under some conditions underestimate the degree of population subdivision because of their high mutation rates relative to allozymes. Even so, studies comparing FST estimates for different types of markers seem to return generally comparable values (see Allendorf & Seeb, 2000 for a recent review), giving some confidence that the estimates of neutral expectation for population differentiation are not commonly biased by the choice of a particular marker system. Ritland (2000) nonetheless recommended that comparisons of FST and QST indices should preferably be carried out by employing less mutable loci than microsatellites. Another possible problem with the FST estimates relates to the fact that it is implicitly assumed that the populations are in drift-migration equilibrium, which may not be the case, for example, if the populations have been recently bottlenecked (Hedrick, 1999). However, this may not be a large problem as the FST (or its GST analogue) appears to attain its equilibrium value much faster than its components (Crow & Aoki, 1984; Slatkin, 1993).
For QST estimates, the critical assumption is that the estimates of within and among population variance components represent purely additive effects, and are free of maternal, environmental and nonadditive genetic effects. These are trivial assumptions in the sense that these potential sources of bias can be controlled for with careful experimentation (e.g. Lynch & Walsh, 1998), but nevertheless, if ignored, the conclusions can be grossly misleading. For instance, uncontrolled maternal or common environment effects can inflate estimates of $\sigma^2_{GW}$, and result in conservative estimates of QST. Likewise, unaccounted cross-generational maternal and environmental effects specific to different populations (e.g. Roff, 1997) can inflate estimates of $\sigma^2_{GB}$, and hence, also estimates of QST. Furthermore, as estimates of $\sigma^2_{GW}$ and $\sigma^2_{GB}$ are typically derived in a common garden situation, the implicit assumption is that the effects of the rearing environment will not impose strong genotype × environment interactions on the studied traits (Prout & Barker, 1993). Finally, two additional points are worth making. First, Lynch et al. (1999) argued that epistatic gene action can cause QST values to exceed FST values even if no selection on traits in question has taken place, and consequently, different traits may well show different values of QST just because of differences in their genetic architecture. However, Whitlock (1999) pointed out that this is not so: the effect of epistatic variance is actually to decrease the expected QST, and hence, epistatic variance cannot be an explanation for a result where QST > FST. The effect of dominance variance on the QST estimates is not well understood, but either an increase or a decrease with neutral differentiation is possible (Whitlock, 1999). 
However, nonadditive genetic variance in morphometric traits is thought to be typically low (but see: Gilchrist & Partridge, 1999), although life history traits may often harbour substantial dominance and epistatic variance (Crnokrak & Roff, 1995; Merilä & Sheldon, 1999). Second, an assumption implicit in estimation of the QST index is that the population is in Hardy–Weinberg equilibrium (i.e. FIS = 0), and deviations from this can cause QST estimates to depart from the neutral expectation (Yang et al., 1996). In fact, Lynch et al. (1999) provided evidence that variation in local inbreeding was an important component of quantitative trait differentiation among Daphnia pulex populations.
In our exhaustive search for studies that have compared estimates of QST and FST indices between different populations, we were able to find 18 studies of 20 species, including four unpublished ones, which reported QST estimates (Table 2; see also: Rogers & Harpending, 1983; Waldmann & Andersson, 1999). Most of these studies have been conducted in plants (55%), whereas invertebrate (30%) and, in particular, vertebrate (15%) studies were scarce (Table 2). QST estimates in a given study have been based, on average, on eight different traits (min–max=1–24), with growth related morphological and (juvenile) life history traits dominating (Table 2). FST estimates in most studies have been based on allozymes, but a few microsatellite and RAPD based studies, as well as one nuclear RFLP and one ribosomal DNA based study, have been conducted (Table 2). One study also reported an estimate of FST based on RFLP analysis of mtDNA, which was in agreement (after taking into account that the Ne for the mtDNA genome is 1/4 of that for nuclear markers; see, e.g. Crochet, 2000) with values returned from analyses of allozymes and microsatellites (Lynch et al., 1999). Those few studies which have used more than one marker system to estimate FST return qualitatively (Karhu et al., 1996; Kuittinen et al., 1997), or even quantitatively (Lynch et al., 1999; see also: Karhu et al., 1996; Table 2), similar conclusions based on estimates from different marker types. Hence, in accordance with data from a number of studies (Allendorf & Seeb, 2000), this suggests that the choice of markers to estimate FST-values may not be of great concern (but see: Nagylaki, 1998; Hedrick, 1999).
Table 2. Synopsis of comparative studies of marker and quantitative genetic population structure. FST=divergence in marker genes, QST=divergence in quantitative traits. h2=breeding design (BS=broad sense, NS=narrow sense), ntraits=number of quantitative traits on which the QST estimate is based, nloci=number of loci on which the FST estimate is based, npop=number of populations included in the study, trait=type of trait(s) on which QST estimates are based (MOR=morphology, LH=life history, MIX=mixture of the previous), loci=type of marker system on which FST estimates are based (ALLO=allozymes, MSAT=microsatellites, rDNA=ribosomal DNA, mtDNA=mitochondrial DNA, RAPD=randomly amplified polymorphic DNA, RFLP=restriction fragment length polymorphism), ‘QST=FST’=number of traits for which QST=FST, ‘QST < FST’=number of traits for which QST < FST, ‘–’=missing data.
To get an overview of how FST and QST estimates from different studies compare with each other, and whether any general patterns emerge, we made pair-wise comparisons of FST and QST estimates across different studies using standard statistical (parametric and nonparametric) and meta-analytical tools. In standard statistical comparisons, we first restricted ourselves to the comparisons of mean FST and QST values averaged over different loci and traits, respectively. The analyses were performed separately for FST estimates based on allozymes and microsatellites, as well as pooled data in which FST estimates for studies using more than one marker system were based on allozyme data (because they were the most commonly used marker system; Table 2). Few data restrictions were made. Two of the QST estimates were not accompanied by data on corresponding FST values (Lynch & Spitze, 1994), and one study (Rogers & Harpending, 1983) lacked comparable statistics; these studies were therefore excluded from the comparisons. In one study, the QST and FST estimates were not based strictly on data from the same populations (Karhu et al., 1996), but nevertheless, this study was included as the results were not sensitive to its exclusion (results not shown). Furthermore, two of the studies have not, in the strict sense, estimated the $\sigma^2_{GW}$ and $\sigma^2_{GB}$ components (Kremer et al., 1997; Merilä, 1997), but rather, made qualified assumptions about their likely magnitude. Nevertheless, both provide sensitivity analyses and verbal arguments to suggest that the conclusions from these studies are not sensitive to even large deviations from the assumed values, and hence, they were included in the comparisons. The estimates used in the analyses of pooled marker data are marked with two asterisks in the reference column.
In addition to the pair-wise analyses, we conducted a meta-analysis of the data to test if the divergence between QST and FST estimates is significantly different from zero. The advantage that meta-analysis holds over conventional parametric tests is that the inherent variability found between studies can be accounted for statistically by weighting of effects for different sample sizes and variance components (Arnqvist & Wooster, 1995). Variation in within-study sample size will produce heteroscedasticity that in many cases violates the assumptions of parametric tests, but not meta-analytical tests (Gurevitch & Hedges, 1999). Meta-analysis not only allows generalizations about the quantitative magnitude of a certain ‘treatment’ effect (as opposed to a qualitative assessment given in verbal reviews), but also allows quantification of the degree to which variation between studies affects the variation of the treatment effect. The fundamental parameter used in meta-analysis is the effect size. For the purposes of this study, as we are interested in the magnitude of the difference between QST and FST, the most suitable effect size is the mean standardized difference between QST and FST (Hedges’ d; Rosenberg et al., 1997). In addition, because we have an a priori reason to believe that the characteristics of a study will have an effect on the magnitude of effect size (heterogeneity effects of species type, sample size and other factors unique to each study), we have decided to employ a random (or weighted) effects model as opposed to the more common, fixed effects model (Hedges & Olkin, 1985; Gurevitch & Hedges, 1999; Osenberg et al., 1999). A random effects model allows for the calculation of a distribution of effect sizes across studies and thus allows for an estimation of the variation of effect size (Gurevitch & Hedges, 1999). 
In our case, although it may be that the average true effect is significantly > 0 (there is significant divergence between QST and FST estimates), a large portion of the studies may have no significant effect differences. In this case, any interpretation of mean effect size must be carried out in light of the inevitably large variance component of effect size. The estimate of the variance of effect size can also be tested against a null hypothesis of zero (no heteroscedasticity) to determine if our studies have significant heterogeneity. In addition, in a weighted meta-analysis, greater weighting is given to studies with lower variance (i.e. larger sample sizes).
For the meta-analysis, we divided the traits into life history and morphological traits, and calculated an average QST for each study over these trait groups to see whether there was heterogeneity in divergence estimates between different types of traits. We categorized those traits that are closest to fitness (e.g. fecundity, growth rate, etc.) as life history whereas metric size measurement traits (e.g. skeletal dimensions) were categorized as morphological (for further discussion see Roff, 1997). Likewise, to investigate possible heterogeneity in neutral divergence rates, the data was divided according to marker type treating allozyme and DNA-marker [microsatellites (four studies), and RAPD (one study) see below and Table 3 for details] based studies as different groups. We were also restricted to use studies that reported individual trait and locus based estimates of QST and FST which allowed us to calculate means and standard deviations per study (references marked with †† in Table 2 were used in the meta-analysis). For those studies that reported QST estimates based on one trait (references 2 and 15 in Table 2), we used the standard deviation for FST to standardize the effect size (Rosenberg et al., 1997).
Table 3. Results for the meta-analysis and nonparametric Wilcoxon signed-ranks test for difference between QST and FST estimates. Data were divided according to locus type [allozyme vs. DNA based markers (i.e. MSAT and RAPD markers)] and trait type (LH=life history, MOR=morphology). $\bar{\Delta}$ and CI (95%)=mean effect size and 95% confidence interval, Q=chi-square-equivalent test statistic for testing whether $\sigma^2(\Delta)$=0 (no Q-values were larger than the 100 (1−α) percentile point, and therefore we cannot reject the null hypothesis), $\sigma^2(\Delta)$=variance of Δ, N=number of studies, Nn=fail safe study number (to test file drawer problem), QST−FST ± SE=mean difference between QST and FST estimates and its standard error.
The meta-analysis of random effects models requires the calculation of the mean effect size $\bar{\Delta}$, the variance of $\bar{\Delta}$, $\sigma^2(\bar{\Delta})$, and the Q statistic, which allows for testing of $\sigma^2(\Delta)$=0. The statistical methods of meta-analysis are laid out in detail in Hedges & Olkin (1985) and will not be discussed here except what is necessary for the present analysis. We used MetaWin 2.0 (Rosenberg et al., 1997) for all meta-analyses. The variance of $\bar{\Delta}$ was calculated using the following formula:
where nFST and nQST are the sample sizes for molecular and quantitative genetic variance estimates, respectively, and $\Delta^2$ is Hedges’ effect size calculated using an estimate of pooled sampling variance (Rosenberg et al., 1997). We calculated the 95% confidence interval using MetaWin (Rosenberg et al., 1997) which performs 999 bootstrap iterations. $\bar{\Delta}$ can be interpreted by reporting it in standard deviation units: i.e. QST is ‘x’ standard deviations larger than FST. Cohen’s (1977) guidelines classify a mean effect size of 0.2 as small, 0.5 as medium and anything >0.8 as large. Finally, we determined the degree to which the file drawer problem affects our statistical analysis (i.e. what is the possibility that the number of unpublished null results invalidate the results of the meta-analysis by bringing the significance down to P=0.05?) by estimating the ‘fail-safe’ number: that value of the number of filed studies (those studies not published) required to bring the probability of a type I error (rejecting null hypothesis when true) to the desired level of significance (Hedges & Olkin, 1985). A small fail safe number (in reference to the total sample size), means that the findings of the meta-analysis are not resilient to the file drawer problem and we must then reconsider our conclusions. A fail safe number of around 5n + 10 is considered a reasonable, conservative critical value (Rosenberg et al., 1997).
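As an illustration of the effect-size calculation for a single study, the sketch below uses the standard textbook expressions for Hedges' d: the difference in means divided by the pooled standard deviation, multiplied by the small-sample correction J = 1 − 3/(4(n1 + n2 − 2) − 1), with large-sample variance (n1 + n2)/(n1 n2) + d²/(2(n1 + n2)). That these are exactly the expressions applied in the analysis is an assumption, as are the function names and the example values. In the 5n + 10 rule, n is taken here to be the number of studies.

```python
from math import sqrt
from statistics import mean, variance


def hedges_d(qst_values, fst_values):
    """Return (d, var_d) for the standardized difference mean(QST) - mean(FST)."""
    n_q, n_f = len(qst_values), len(fst_values)
    if n_q < 2 or n_f < 2:
        raise ValueError("need at least two estimates in each group")
    # Pooled variance of the per-trait QST and per-locus FST estimates.
    pooled_var = ((n_q - 1) * variance(qst_values)
                  + (n_f - 1) * variance(fst_values)) / (n_q + n_f - 2)
    if pooled_var <= 0.0:
        raise ValueError("pooled variance must be positive")
    # Small-sample bias correction.
    j = 1.0 - 3.0 / (4.0 * (n_q + n_f - 2) - 1.0)
    d = j * (mean(qst_values) - mean(fst_values)) / sqrt(pooled_var)
    # Large-sample variance of d.
    var_d = (n_q + n_f) / (n_q * n_f) + d * d / (2.0 * (n_q + n_f))
    return d, var_d


def failsafe_is_adequate(failsafe_n, n_studies):
    """Compare a fail-safe number with the 5n + 10 rule of thumb."""
    return failsafe_n >= 5 * n_studies + 10


if __name__ == "__main__":
    # Hypothetical per-trait QST and per-locus FST estimates from one study.
    d, var_d = hedges_d([0.4, 0.5, 0.6], [0.1, 0.2, 0.3])
    print(d, var_d)
    print(failsafe_is_adequate(1.4, n_studies=4))
```

The sketch requires at least two estimates per group; studies reporting a single QST value need a different standardization, such as the use of the FST standard deviation described in the text.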
(i) Parametric and nonparametric analyses
A comparison of FST and mean QST values across the studies revealed that in most cases, the value of the QST index exceeds that of the FST index, and only in three cases, QST < FST (Fig. 1). Hence, in general, the degree of differentiation in quantitative traits exceeds that in neutral molecular markers (Wilcoxon signed-rank: z=2.94, n=18, P=0.0033). This comparison of average value over different traits will, of course, hide the fact that there are some individual traits nearly in all studies for which QST values are smaller than FST values, but these are relatively infrequent: on average, 25% of traits in a given study fall in this category (Table 2). In general, these results hold also if we split the data according to marker (allozyme vs. DNA data) and trait type [life-history vs. morphological traits (Table 3)]. Finally, and perhaps most interestingly, despite the difference in average magnitude of the two indices, QST and FST values are positively correlated across studies both with parametric (r=0.751, z=3.77, d.f.=16, P=0.0002) and nonparametric (rs=0.552, z=2.76, d.f.=16, P=0.023) tests. These results remain qualitatively similar even if the study by Kremer et al. (1997) using multivariate measures of differentiation is excluded, and/or if the most divergent data point (see Table 2) for Pinus sylvestris is removed. Similarly, QST and FST values were highly correlated even if microsatellite (r=0.87, z=1.88, d.f.=4, P=0.05) and allozyme (r=0.81, z=3.10, d.f.=13, P=0.0005) based studies were analysed separately.
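One common way to attach a z statistic to a correlation coefficient is Fisher's transformation, z = atanh(r) · sqrt(n − 3). Whether the z values reported above were obtained this way is an assumption; the sketch simply shows the calculation for the pooled correlation (r=0.751, with d.f.=16 implying n=18 studies), for which it gives z of about 3.78 and a two-sided P of about 0.0002. The function name is hypothetical.

```python
from math import atanh, erfc, sqrt


def fisher_z_test(r, n):
    """Test H0: rho = 0 using Fisher's z transformation.

    Returns (z, two-sided P) under a normal approximation.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 4:
        raise ValueError("need at least four paired observations")
    # atanh(r) is approximately normal with standard error 1 / sqrt(n - 3).
    z = atanh(r) * sqrt(n - 3)
    # Two-sided tail probability of the standard normal distribution.
    p = erfc(abs(z) / sqrt(2.0))
    return z, p


if __name__ == "__main__":
    z, p = fisher_z_test(0.751, 18)
    print(round(z, 2), round(p, 4))
```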
(ii) Meta analysis
To test whether the QST is typically larger than FST when controlling for possible heterogeneity in the data, we performed a series of confirmatory meta-analyses for the data stratified according to marker and trait type (Table 3). These analyses revealed relatively small variance components [$\sigma^2(\Delta)$] for all estimated $\bar{\Delta}$ (Table 3), and none of the estimates of $\sigma^2(\Delta)$ were significantly >0 for any of the data subsets (Table 3). Hence, there was no significant variation between studies for differences between QST and FST, suggesting that the separation of the whole data set according to marker and trait type adequately reduced the heterogeneity between studies.
Focusing first on the comparisons in which different types of traits were pooled, there was a substantial difference in mean effect size ($\bar{\Delta}$) between the allozyme and the DNA data sets (QB=1.35, d.f.=1, P < 0.05): the divergence between QST and FST was larger for allozyme (1.18) as compared with DNA based (0.63) molecular markers (Table 3). The confidence interval for the DNA data set encompassed zero (Table 3), indicating that the difference between QST and FST was negligible for DNA-based markers. Analysing the microsatellite data alone, we found that $\bar{\Delta}$ was 0.27 (95% CI: 0.004 to 0.48). Although this is a very small difference [coupled with a significant file drawer problem due to a low fail-safe number (1.4)], these results suggest the general conclusion that QST > FST, with the difference being significantly larger for allozyme than for microsatellite based estimates of FST.
Comparison of life history and morphology traits within the allozyme data set revealed that QST for morphological traits was on average 1.20 standard deviations larger than FST, whereas the difference for life history traits (0.73) had a wide confidence interval that included zero (Table 3). We found no significant difference between morphology and life history traits for differences in QST and FST for allozyme markers (QB=0.50, d.f.=1, P=0.51). The lack of difference in life history traits may reflect the fact that the standard deviation is larger than the absolute difference between QST and FST. Therefore, we conclude that the difference in QST and FST estimates is largely restricted to morphological traits, whereas life history traits show a similar degree of differentiation as DNA markers.
Is natural selection ubiquitous?
The review of the published (and few unpublished) estimates of QST revealed that the degree of quantitative trait differentiation typically exceeds that of neutral marker genes, suggesting a prominent role for natural selection in determining the population genetic structure for quantitative traits (cf. Tables 1 and 3). In other words, the available data suggest that directional natural selection favouring different phenotypes in different populations of the same species is ubiquitous. This is particularly the case for morphological traits, which exhibited a significant and substantial divergence between QST and FST (QST 1.20 standard deviations larger than FST) as compared with life history traits where no significant differences between QST and FST were found. However, the conclusion of the ubiquity of directional natural selection based on these findings is subject to three major assumptions. First, as discussed above, it is assumed that the estimates of QST and FST are not biased by factors – such as cross-generation (‘grand-parent effect’) maternal or environmental effects (e.g. Roff, 1997, p. 264) or selection acting on ‘neutral’ marker genes – that can make them deviate from their true values. Second, most of the estimates of QST listed in Table 2 are based on univariate approaches (but see below), the implicit assumption being that QST values returned for different traits are independent. This, of course, is not necessarily so as the different traits are often genetically intercorrelated (e.g. Roff, 1997), and the unaccounted covariance structure between them may give a biased impression of the degree of divergence when the values are averaged over different traits (Rogers & Harpending, 1983; Kremer et al., 1997). Third, even if the values plotted in Fig. 1 were free of any source of technical bias, the conclusion concerning the ubiquity of natural selection is subject to the assumption that species, populations and traits that have been included in the comparative studies of population structure are a random sample from all possible species, populations and traits in the world. In other words, we are assuming that the populations, species and traits entering into these studies were not selected, for example, on the basis that the researchers wished to study as divergent populations as possible. In the following, we will consider each of these topics in detail, and examine what the work carried out so far indicates about their role in impacting the pattern seen in Fig. 1.
As to the first possible source of bias, estimates of within population genetic variance in most (Table 2) of the studies have been based on full-sib estimates which may overestimate the levels of additive genetic variance due to nonadditive genetic, common environment and persistent maternal effects (e.g. Roff, 1997; Lynch & Walsh, 1998). Although the degree of bias caused by these factors is not known for most of the studies in Table 2, in one case, an additional narrow sense experiment was performed (Podolsky & Holtsford, 1995). This experiment utilized a subset (two of eleven) of the populations to check whether the assumption concerning the additive basis for traits obtained from the broad sense experiment would be justified. The results were somewhat inconclusive, but for at least some traits, significant nonadditive contributions were detected, and additive genetic contributions were generally much smaller than in the broad sense experiment (Podolsky & Holtsford, 1995). Whether such biases are present in other studies remains to be investigated, but at least the estimates for the two frog species in Table 2 should be free of these as they were obtained with half-sib designs accounting for maternal and dominance contributions (J. Merilä, P.A. Crochet & A. Laurila, unpublished results). It is also noteworthy that our meta-analyses failed to find any significant QST – FST difference for life history traits (known to harbour significantly higher amounts of nonadditive genetic variance than typical morphological traits; Crnokrak & Roff, 1995), whereas this difference was pronounced for the morphological traits. As the data for morphological traits are less prone to biases due to nonadditive effects, one should exercise some caution when interpreting the results for the life history traits.
Although the overestimation of within population variance will render the comparisons of FST and QST estimates conservative, a potentially more serious source of bias is the overestimation of the among population component of variance. Geographical differences in the mean value of quantitative traits are known to be frequently influenced by persistent environmental and maternal effects (e.g. Roff, 1997), and complete removal of these effects may require the organisms to be reared for several generations in a common garden situation. In none of the studies presented in Table 2 has this been carried out, and consequently, there is a possibility that the QST estimates are generally inflated to some unknown degree. To what degree this occurs may vary between studies and traits, but accounting for these effects should be of some concern in studies to come. Likewise, both the within (e.g. Hoffman & Merilä, 1999) and among population (e.g. Conover & Schultz, 1995) components of genetic variance are subject to biases stemming from genotype × environment interactions. Novel environments (such as the laboratory) may influence the expression of genetic variation in unpredictable ways (Hoffman & Merilä, 1999), and also lead to either the over or underestimation of actual genetic differences among compared populations (Conover & Schultz, 1995). Hence, the conclusion is that much of the inference from the data presented in Fig. 1 rests on the assumption that genetic parameters have been estimated with reasonable precision, but the primary data provide little direct evidence to support this assumption. Nevertheless, because QST tends to be underestimated when within population additive genetic variance is overestimated (see above) and when epistatic variance is present (Whitlock, 1999), these biases may mask the effects of divergent selection and tilt the comparisons in favour of drift over the directional selection hypothesis (cf. Table 1).
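The opposing directions of these two biases can be illustrated with a small numerical sketch. It assumes the standard definition QST = σ²GB / (σ²GB + 2σ²GW) for diploid outbreeding species (cf. Spitze, 1993), which is not restated in this section. The function name and the variance components used below are hypothetical and serve only to illustrate the argument; they are not taken from any of the reviewed studies.

```python
def qst(v_between: float, v_within: float) -> float:
    """Return QST = V_B / (V_B + 2 * V_W).

    v_between: additive genetic variance among populations (V_B)
    v_within:  additive genetic variance within populations (V_W)
    The formula is the standard one assumed for diploid outbreeders.
    """
    if v_between < 0 or v_within < 0:
        raise ValueError("variance components must be non-negative")
    total = v_between + 2.0 * v_within
    if total == 0:
        raise ValueError("at least one variance component must be positive")
    return v_between / total


# Hypothetical variance components in arbitrary units (illustration only).
baseline = qst(v_between=1.0, v_within=2.0)

# Full-sib designs may inflate V_W: QST goes down (a conservative bias).
inflated_within = qst(v_between=1.0, v_within=3.0)

# Persistent maternal or environmental effects may inflate V_B: QST goes up.
inflated_between = qst(v_between=1.5, v_within=2.0)

print(f"baseline QST:          {baseline:.3f}")
print(f"inflated within-pop:   {inflated_within:.3f}")
print(f"inflated among-pop:    {inflated_between:.3f}")
```

Under these assumed values, an inflated within population component lowers QST relative to the baseline, whereas an inflated among population component raises it. This is why the former bias is conservative with respect to detecting selection and the latter is not.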
The second potential source of bias – nonindependence of different traits – has been investigated in some detail. Rogers & Harpending (1983) were the first to propose a multivariate procedure to estimate the degree of among population differentiation in quantitative traits. This method was recently expanded upon by Kremer et al. (1997). These treatments indicate that the estimated degree of among population divergence can differ – depending on the covariance structure of traits within and among populations – according to whether single-trait or multivariate measures of differentiation are used (e.g. Kremer et al., 1997). However, only a few studies have compared multivariate and univariate estimates of population differentiation. Rogers & Harpending (1983) compared the estimated degree of among population differentiation in dermatoglyphic (e.g. finger ridge counts) and metric traits against the neutral expectation derived from allozyme data, and found that although differentiation in dermatoglyphic traits was well within the range to be expected due to drift alone, this was not the case for metric traits, which showed differentiation up to six standard errors in excess of the neutral expectation. Merilä (1997) used the multivariate method of Rogers & Harpending (1983) to account for intercorrelations between different quantitative traits in the greenfinch (Carduelis chloris), and found that the conclusions from univariate and multivariate analyses were, by and large, consistent with each other. Waldman & Andersson (1999), working with two perennial plants (Scabiosa canescens and S. columbaria) and using the methods of Kremer et al. (1997), found that the multivariate estimates of FST (CFSTmean) were roughly similar to, albeit somewhat lower than, the univariate estimates published earlier (Waldman & Andersson, 1998), whereas the multivariate estimate of QST was much higher for S. canescens, and much lower for S. columbaria, than estimated earlier with univariate methods (Waldman & Andersson, 1999). Likewise, comparison of univariate and multivariate measures of differentiation for bud burst and height growth among Quercus petraea populations revealed that the unaccounted covariance structure among traits within and among populations influenced the results, although the bias was not particularly strong in this case. Hence, the degree to which the data in Tables 2 and 3 are biased by averaging QST estimates across intercorrelated traits warrants further study, but the available evidence suggests that such biases are possible. Nevertheless, from the biological point of view, it may be (at least in some cases) more meaningful to stick to the univariate QST estimates and evaluate the degree of differentiation against a null-expectation on a trait-by-trait basis: different types of traits are known to be under different selective regimes (e.g. Merilä & Sheldon, 2000; Kruuk et al., 2000), and QST values for different types of traits may be very different (e.g. Spitze, 1993; Podolsky & Holtsford, 1995).
For instance, Lynch et al. (1999) found a positive correlation between heritability (h²) and QST across different traits in Daphnia pulex, suggesting that traits harbouring more additive genetic variance are also the ones likely to show more among population differentiation. This is also in accordance with the results from our meta-analyses, which revealed larger differences between QST and FST for morphological traits (typically high h²; Mousseau & Roff, 1987) than for life history traits (typically low h²; Mousseau & Roff, 1987). Hence, comparative studies focusing on patterns of within and among population components of genetic variance in traits of known selective regimes could be one avenue for future studies.
The third, and perhaps the most interesting, source of possible bias in the perspective emerging from Fig. 1 is the question of whether the graphed studies are a representative sample of what might be considered ‘typical’ for natural populations. Given that the reviewed studies include a variety of taxa ranging from annual and perennial plants to (a few) vertebrates, such a concern seems at first sight to be far-fetched. However, as difference – rather than similarity – is usually perceived to be biologically interesting (cf. Conover & Schultz, 1995), a bias in favour of populations known to be phenotypically divergent is quite possible. Consequently, as the number of comparative studies of genetic population structure is still relatively small, the conclusion concerning the ubiquity of natural selection on the basis of these data could be premature. However, as the evidence from other approaches testifies (e.g. Endler, 1986; Hendry & Kinnison, 1999; Kingsolver et al., 2001), it seems unlikely that this conclusion will change when more studies become available.
Does the degree of marker gene divergence predict quantitative trait divergence?
The fact that the measures of quantitative trait and neutral marker divergence were positively correlated across different studies for both allozymes and microsatellites suggests that the divergence in neutral marker genes is – perhaps against common wisdom (e.g. Karhu et al., 1996; Hedrick, 1999; Reed & Frankham, 2001) – indicative of the degree of genetic differentiation in quantitative traits. In other words, although the average magnitude of the FST and QST estimates is different [intercept of the regression of QST on FST different from zero: 0.217, SE=0.044, P (determined by randomization)=0.0007; see also Table 3 for meta-analytical support], there is a positive relationship between the two indices across studies [allozymes: slope = 0.663, SE=0.146, P (determined by randomization)=0.0011]. This relationship was first discovered by Lynch et al. (1999), and now holds true for both allozymes and microsatellites separately after the addition of eight studies. How could this relationship be explained? Assuming that the correlation is not driven by phylogenetic nonindependence of the data points, one possible explanation is that there is a common factor that drives the divergence in both FST and QST. One such factor could be geographical distance: FST for neutral marker genes typically increases with increasing geographical distance (isolation, e.g. Slatkin, 1993), and increasing distance is also likely to lead to increased heterogeneity in selection pressures, and hence, to increased QST. Although this explanation is intuitively appealing, there is also an alternative to it: if the gene flow among different populations is restricted, an increasing degree of isolation by distance will eventually result in genetic differentiation in neutral loci due to drift. Drift will also affect the quantitative trait loci insofar as they are not under strong selection. 
Consequently, part of the differentiation in quantitative traits could also be accounted for by genetic drift, and hence, result in a positive correlation between FST and QST indices. A third alternative is that not all allozyme and microsatellite loci are completely freely recombining with the rest of the genome, and small amounts of linkage between molecular marker loci and selected quantitative trait loci might be driving the correlation. However, our meta-analysis reveals that the divergence between QST and FST for allozyme markers is substantially larger than that for DNA based markers such as microsatellites. Linkage between allozyme markers and quantitative trait loci would constrain the divergence between the two estimates, as any changes caused by selection and/or drift in QST would be mirrored in FST. The large divergence between QST and FST for allozyme markers but not for DNA based markers also suggests that perhaps allozymes are less susceptible to selection than commonly thought in the present-day literature, and that, because of the high mutation rates of DNA based markers (four of the five studies used for the DNA data set meta-analysis are for microsatellites), estimates of genetic divergence based on them underestimate quantitative trait divergence caused by selection. However, it is not entirely clear which of these explanations (if any) is likely to account for the positive correlation between the two indices – further studies are required. Proportionality between FST and QST estimates for individual traits has also been observed in population level comparisons. Long & Singh (1995) discovered that pair-wise QST values (calculated across pairs of populations) for some traits scaled proportionally to pair-wise estimates of FST values, whereas other traits in particular populations deviated frequently from this proportionality. 
However, although the deviations from the proportionality were taken as evidence for locally varying selection pressures (Long & Singh, 1995), the proportionality itself did not receive any explanation. None of the other studies listed in Table 2 have conducted across population comparisons of FST and QST estimates, and hence, whether the correlation seen in Fig. 1 is a general phenomenon also for within species comparisons remains to be determined (see also: Reed & Frankham, 2001).
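The kind of randomization test referred to above for the regression of QST on FST can be sketched as follows. This is a minimal illustration under stated assumptions, not a reconstruction of the analysis reported here: the permutation scheme (shuffling QST values among studies and comparing the absolute slope), the function names, and the (FST, QST) pairs are all hypothetical, and the exact randomization procedure behind the reported P-values is not described in this section.

```python
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def permutation_p_value(x, y, n_perm=9999, seed=0):
    """Two-sided permutation test of the null hypothesis of zero slope.

    y is shuffled relative to x; the P-value is the share of permuted
    slopes at least as extreme (in absolute value) as the observed one,
    counting the observed arrangement itself once.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = ols_slope(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(ols_slope(x, y_perm)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical per-study estimates (illustration only, not data from Table 2).
fst_values = [0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50]
qst_values = [0.20, 0.28, 0.25, 0.35, 0.33, 0.45, 0.50, 0.52]

slope, p_value = permutation_p_value(fst_values, qst_values)
print(f"slope of QST on FST: {slope:.3f}, permutation P: {p_value:.4f}")
```

A permutation approach of this kind avoids distributional assumptions, which is convenient when, as here, the number of studies is small. It does not, however, address the phylogenetic nonindependence of data points mentioned above.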
Conclusions and future prospects
In conclusion, the empirical data from comparative studies of genetic population structure in quantitative traits and single locus molecular markers suggest that the degree of genetic differentiation in quantitative traits typically exceeds that for marker genes. This pattern is consistent with the interpretation that quantitative traits are typically under directional natural selection, and that the direction and magnitude of this selection vary among local populations. However, the meta-analyses – although based on limited data – suggest that the choice of molecular marker system and traits included in the study may strongly affect the magnitude of the difference between FST and QST. Given that a number of factors other than natural selection (e.g. maternal or environmental effects) may have inflated the estimates of among population components of variation, further studies testing the proposition that the deviations from the null-expectation are caused by selection, rather than by some other factor, are needed. One obvious way to do this would be to compare the consistency of QST estimates and direct measures of selection in different populations for different traits. A positive correlation between the degree of heterogeneity in selection pressures and QST values across populations would provide further evidence for a causal relationship between natural selection and the degree of quantitative trait divergence. Likewise, although the positive correlation between FST and QST estimates across different studies suggests that the degree of genetic differentiation in neutral marker loci may be roughly indicative of the degree of differentiation in genes coding for quantitative traits, further studies are required to establish whether this correlation is also common across different populations of the same species. If such a relationship exists, then the relatively high degree of marker gene differentiation among local populations of many species (e.g. Ward et al., 1992) suggests that quantitative genetic differentiation, and hence, local adaptation, is widespread.
We thank Martin Lascoux, Patrik Waldmann and two anonymous referees for insightful comments and discussions which improved this manuscript. The Swedish Natural Science Research Council, the Academy of Finland and the Nordic Academy for Advanced Study (NorFA) are acknowledged for their financial support of J.M., while Fonds FCAR Quebec is acknowledged for its financial support of P.C.