It is now common for population geneticists to estimate FST for a large number of loci across the genome, before testing for selected loci as being outliers to the FST distribution. One surprising result of such FST scans is the often high proportion (>1% and sometimes >10%) of outliers detected, and this is often interpreted as evidence for pervasive local adaptation. In this issue of Molecular Ecology, Fourcade et al. (2013) observe that a particularly high rate of FST outliers has often been found in river organisms, such as fishes or damselflies, despite there being no obvious reason why selection should affect a larger proportion of the genomes of these organisms. Using computer simulations, Fourcade et al. (2013) show that the strong correlation in co-ancestry produced in long one-dimensional landscapes (such as rivers, valleys, peninsulas, oceanic ridges or coastlines) greatly increases the neutral variance in FST, especially when the landscape is further reticulated into fractal networks. As a consequence, outlier tests have a high rate of false positives, unless this correlation can be taken into account. Fourcade et al.'s study highlights an extreme case of the general problem, first noticed by Robertson (1975a,b) and Nei & Maruyama (1975), that correlated co-ancestry inflates the neutral variance in FST when compared to its expectation under an island model of population structure. Similar warnings about the validity of outlier tests have appeared regularly since then but have not been widely cited in the recent genomics literature. We further emphasize that FST outliers can arise in many different ways and that outlier tests are not designed for situations where the genetic architecture of local adaptation involves many loci.
In the early days of molecular population genetics, Lewontin & Krakauer (1973) introduced a method of detecting loci under selection, using a large number of loci to infer the distribution of FST under neutrality and then testing for loci that appear as outliers to this neutral distribution. It took nearly three decades for the relevant data to become widely available to population geneticists, but it has now become routine to identify outlier loci from genomewide scans, either to infer demographic parameters by excluding the outliers or to understand adaptive diversity in natural populations by focusing on the outliers themselves.
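As a rough illustration of the logic behind such a test, the sketch below compares the observed variance of per-locus FST values with the approximation usually attributed to Lewontin & Krakauer (1973), Var(FST) ≈ k × (mean FST)² / (n − 1), where n is the number of subpopulations and k = 2. The formula, the value of k and the function names are assumptions of this sketch and are not specified in the text above; a ratio well above 1 would suggest that the neutral variance is larger than the approximation allows.

```python
import statistics


def lk_expected_variance(mean_fst, n_demes, k=2.0):
    """Approximate neutral variance of FST: k * mean_fst**2 / (n_demes - 1).

    k = 2 is the value commonly associated with the Lewontin-Krakauer test
    (an assumption of this sketch, not a value taken from the article).
    """
    if n_demes < 2:
        raise ValueError("at least two subpopulations are required")
    return k * mean_fst ** 2 / (n_demes - 1)


def lk_variance_ratio(fst_values, n_demes, k=2.0):
    """Observed among-locus variance of FST divided by the approximation."""
    mean_fst = statistics.fmean(fst_values)
    observed = statistics.variance(fst_values)  # sample variance across loci
    return observed / lk_expected_variance(mean_fst, n_demes, k)
```

The ratio is only a descriptive diagnostic here: it does not by itself identify which loci are outliers, and it inherits every assumption of the approximation.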
Soon after the publication of Lewontin & Krakauer (1973), Robertson (1975a,b) and Nei & Maruyama (1975) pointed out a problem with this test: correlations in gene frequencies among subpopulations could inflate the neutral variance in FST above the value assumed and thus lead to a high rate of false positives. Correlations in gene frequencies can arise if the spatial structure of the population departs from Wright's island model of population subdivision. For example, Nei & Maruyama (1975) simulated a circular stepping stone of 20 demes of N = 10 individuals, with a migration rate of m = 0.5 between adjacent demes (Nm = 5). They found a twofold increase in the variance in FST when compared to Lewontin and Krakauer's approximation, while the increase was threefold when 100 demes were simulated. Subsequent articles explored other stepping stone models with different parameter values, generally finding less increase in variance. For example, Beaumont & Nichols (1996) simulated a 2D stepping stone with 24 × 24 demes and Nm ≈ 5 and found little effect of isolation-by-distance, while Whitlock (2008) simulated a 1D stepping stone of 20 demes with N = 100 and m = 0.08 (Nm = 8), as well as long-distance migration to randomly chosen demes at a rate m = 0.01. Whitlock (2008) noted that correlations were stronger in the 1D case, but concurred with Beaumont & Nichols (1996) that the variance in FST is little altered by isolation-by-distance. Narum & Hess (2011) simulated a 1D stepping stone of 10 demes with N = 500 and m = 0.01 (Nm = 5), assuming that 5/100 simulated loci were under local selection. They obtained a high rate of low-FST outliers, perhaps because the selected loci led to an overestimate of the average FST of neutral markers (which was used to simulate their null distribution) without inflating the variance substantially.
Finally, Meirmans (2012) simulated a 2D rectangular stepping stone with 5 × 20 demes of N = 100 individuals or 39 × 51 demes of N = 30 individuals organized on the map of the Scandinavian peninsula, with a migration rate of m = 0.1 between adjacent demes (Nm = 10 and Nm = 3, respectively). This time, the variance was substantially inflated (rate of outliers >20%, Meirmans 2012).
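The kind of comparison made in these studies can be sketched with a toy forward simulation. The code below is a minimal illustration under stated assumptions and does not reproduce the protocol of any of the cited articles: loci are independent and biallelic, every deme starts at an allele frequency of 0.5, there is no mutation, drift is binomial sampling of 2N gene copies, the stepping stone is circular with the migration rate m split equally between the two neighbours, and island migrants are drawn from the average of all demes. All function and parameter names are hypothetical.

```python
import random


def migrate(freqs, m, model):
    """Allele frequencies after one round of migration."""
    n = len(freqs)
    if model == "island":
        # Migrants come from a pool averaging all demes (focal deme included).
        pool = sum(freqs) / n
        return [(1 - m) * p + m * pool for p in freqs]
    if model == "stepping_stone":
        # Circular 1D stepping stone: m/2 from each of the two neighbours.
        return [(1 - m) * freqs[i] + 0.5 * m * (freqs[i - 1] + freqs[(i + 1) % n])
                for i in range(n)]
    raise ValueError("model must be 'island' or 'stepping_stone'")


def drift(p, n_copies, rng):
    """Binomial sampling of n_copies gene copies at frequency p."""
    return sum(rng.random() < p for _ in range(n_copies)) / n_copies


def fst(freqs):
    """Variance in allele frequency among demes over pbar * (1 - pbar).

    Returns None when the locus is fixed in the whole population.
    """
    n = len(freqs)
    pbar = sum(freqs) / n
    if pbar <= 0.0 or pbar >= 1.0:
        return None
    var = sum((p - pbar) ** 2 for p in freqs) / n
    return var / (pbar * (1 - pbar))


def simulate_fst(model, n_demes, n_ind, m, n_loci, n_gen, seed=0):
    """Per-locus FST values for loci still polymorphic after n_gen generations."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_loci):
        freqs = [0.5] * n_demes
        for _ in range(n_gen):
            freqs = [drift(p, 2 * n_ind, rng) for p in migrate(freqs, m, model)]
        value = fst(freqs)
        if value is not None:
            values.append(value)
    return values
```

Running simulate_fst once with model "island" and once with model "stepping_stone", using the same number of demes, N and m, and then comparing the variance of the two lists of FST values gives a feel for the effect discussed above. Because loci start at an arbitrary frequency and are followed for a fixed number of generations, the output is not at equilibrium and should be read qualitatively. Pure Python is also slow, so small numbers of loci and generations are advisable.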
In this issue of Molecular Ecology, Fourcade et al. (2013) show that one type of population structure greatly increases the rate of false positives in FST outlier tests. Observing that river organisms have a particularly high rate of outliers, Fourcade et al. hypothesized that the characteristic spatial distribution of river organisms, in tree-like networks, could explain this observation. They used computer simulations to show how river-like population structure generates strong correlations in co-ancestry, substantially inflating the neutral variance in FST when compared to an island model. In consequence, the variance does not decrease as strongly as expected with the number of subpopulations sampled, and high rates of false positives can be obtained. It is worth stressing that the network geometry matters only to the extent that it leads to correlations in co-ancestry. Indeed, Fourcade et al. (2013) also considered the stepping stone model, showing, in agreement with the studies cited previously, that the inflation of neutral variance is most severe in 1D stepping stones of sufficient length and lower values of Nm. This observation is also in agreement with Rousset (1997), who showed analytically that FST/(1 − FST) increases linearly with distance in the 1D stepping stone, but logarithmically with distance in a 2D structure. It is a matter of debate whether the most problematic kinds of population structure will apply to many natural populations, but, as emphasized by Fourcade et al. (2013), they are most likely to be relevant for river, deep sea or coastal aquatic organisms (Fig. 1).
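The contrast between linear and logarithmic isolation-by-distance can be examined with a simple regression. The sketch below is a minimal illustration, assuming pairwise FST estimates and geographical distances are already available: it fits an ordinary least-squares line to FST/(1 − FST) against distance (1D habitat) or against the natural logarithm of distance (2D habitat). The function names and the input format are hypothetical, and no significance test (such as a Mantel test) is attempted.

```python
import math


def linearized_fst(fst):
    """Transform FST into FST / (1 - FST)."""
    return fst / (1.0 - fst)


def ibd_regression(pairs, dimension=1):
    """Least-squares fit of FST/(1 - FST) on distance (1D) or ln(distance) (2D).

    pairs: iterable of (distance, pairwise_fst) with distance > 0 and fst < 1.
    Returns (slope, intercept).
    """
    pairs = list(pairs)
    if dimension == 1:
        xs = [d for d, _ in pairs]
    elif dimension == 2:
        xs = [math.log(d) for d, _ in pairs]
    else:
        raise ValueError("dimension must be 1 or 2")
    ys = [linearized_fst(f) for _, f in pairs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Fitting both forms to the same data and inspecting the residuals is one informal way to judge whether a sampled habitat behaves more like a 1D or a 2D structure, bearing in mind that pairwise estimates are not independent of one another.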
Figure 1. Dragonflies and trout in rivers, deep-sea shrimps in hydrothermal vents and marine mussels and wrack seaweeds in tidal zones are all examples of species living in long linear landscapes, sometimes reticulated into ‘fractal’ or tree-like networks. Pictures were kindly provided by Pierre-Alexandre Gagnaire, Christophe Barla, Sophie Arnaud-Haond and Myriam Valero.
The problem of inflated neutral variance is most severe when using the original version of the Lewontin-Krakauer test, since more recent Bayesian methods for detecting FST outliers allow for unequal levels of differentiation among populations (Beaumont & Balding 2004; Foll & Gaggiotti 2008; Bazin et al. 2010), which may lessen the problem. However, even these newer methods do not overcome the problem entirely when spatial structure is hierarchical (Excoffier et al. 2009) and/or auto-correlated (Fourcade et al. 2013). In such cases, methods introduced by Excoffier et al. (2009) and Bonhomme et al. (2010) should be preferred. Indeed, Pérez-Figueroa et al. (2010) recommended comparing multiple methods, even when the spatial structure is well described by an island model. Studying pairs of populations is also recommended, although this approach also has limitations (Tsakas & Krimbas 1976; Vitalis et al. 2001).
While this limitation of the Lewontin–Krakauer test has been known since the 1970s, and reiterated by subsequent authors (e.g. Hermisson 2009), it has not always been taken into account in the interpretation of recent genomewide scans. In Fig. 2a, we report the number of citations per year of Lewontin and Krakauer's article and its earliest critics, while Fig. 2b compares citations of more recent versions of the test with citations of methods that account for correlation in co-ancestry. These figures suggest that the problem of correlated co-ancestry tends to be neglected in the more recent literature.
Figure 2. (a) Number of citations by year of Lewontin & Krakauer (1973), and its earliest critics [Robertson (1975a,b), Nei & Maruyama (1975)], who mentioned the inflation of neutral variance that arises with some forms of population structure. (b) Number of citations by year of more recent FST outlier tests (blue) and of methods that account for correlation in co-ancestry, plus Hermisson's (2009) comment (red).
Finally, we emphasize two further caveats about interpreting FST outliers, which are independent of the form of population structure. First, in addition to local adaptation, many other processes can lead to FST outliers. These include (i) background selection against deleterious mutations, which can increase the variance in FST, in particular when regions of differing recombination are analysed (Pannell & Charlesworth 2000); (ii) specieswide selective sweeps, which can lead to transient FST outliers around the selected locus (Slatkin & Wiehe 1998; Santiago & Caballero 2005; Bierne 2010; Kim & Maruki 2011); (iii) cryptic hybrid zones in which multiple loci contribute to pre- and postzygotic reproductive isolation (Bierne et al. 2011); and (iv) stochastic effects at the wave-edge of an expanding population (Klopfstein et al. 2006; Hofer et al. 2009). This caveat emphasizes that FST outliers are just candidate loci, requiring subsequent detailed investigation (e.g. Faure et al. 2008; Gosset & Bierne 2013).
The second caveat is that no FST outlier test is designed to perform well in the presence of pervasive selection. The basic assumption of all such tests is that most loci evolve neutrally and can thus be used to infer the neutral FST distribution. When a high proportion of ‘outliers’ are present, either the neutral variance has been underestimated or we should not expect to successfully infer the neutral distribution. In such a case, the relevant theoretical framework involves genetic barriers to gene flow (Barton & Hewitt 1985; Barton & Bengtsson 1986), including semi-permeable barriers (Harrison 1993). Every locus of the genome is affected by selection to some extent, and ‘true’ neutral loci (i.e. those whose dynamics are unaffected by selection at linked loci) are no longer present. Genomewide genetic barriers are often multifactorial, involving local adaptation but also intrinsic pre- and postzygotic isolation (Barton & Hewitt 1985; Bierne et al. 2011). As a consequence, the results of FST scans become difficult to interpret in terms of number of selected loci, their genomic distribution and their effect on the phenotype (extrinsic or intrinsic selection, pre- or postzygotic isolation). Furthermore, the relationship between the genetic structure and the few macroscopic ecological variables available (e.g. temperature, salinity), with which this structure often correlates, is indirect and tells us little about the type of selection affecting a specific locus (Bierne et al. 2011). Finally, genomewide genetic barriers are often the result of a long history of divergence, including a succession of contraction/expansion cycles (Hewitt 2004, 2011), and these might not be easily inferable from the current spatial distribution of neutral genetic variation (Bierne et al. 2013).