Andrés Pérez-Figueroa, Departamento de Bioquímica, Genética e Inmunología, Facultad de Biología, Universidad de Vigo, 36200 Vigo, Spain. Tel./fax: +34 986 813828; e-mail: email@example.com
We carried out a simulation study to compare the efficiency of three alternative programs (dfdist, detseld and bayescan) to detect loci under directional selection from genome-wide scans using dominant markers. We also evaluated the efficiency of correcting for multiple testing those methods that use a classical probability approach. Under a wide range of scenarios, we conclude that bayescan appears to be more efficient than the other methods, detecting a usually high percentage of true selective loci as well as less than 1% of outliers (false positives) under a fully neutral model. In addition, the percentage of outliers detected by this software is always correlated with the true percentage of selective loci in the genome. Our results show, nevertheless, that false positives are common even with a combination of methods and multitest correction, suggesting that conclusions obtained from this approach should be taken with extreme caution.
One of the key topics in evolutionary biology is to unravel the molecular basis of adaptive changes to characterize those parts of the genome subject to natural selection. This objective may have obvious applications to different fields, such as biological conservation, animal, plant or microorganism production, or even medical genetics, and it has also recently become a main research focus in molecular ecology and population genomics. Two main strategies have been used to distinguish natural selection from stochastic forces at the genome level (Nielsen, 2005; Holderegger et al., 2008; Volis, 2008): (i) to compare data from different species to detect old signatures of selection and (ii) to use information within the same species to detect recent selective changes. Regarding the second strategy, one of the most common approaches to deal with recent signatures of selection was first introduced by Lewontin & Krakauer (1973) and since then used under slightly different frameworks (Beaumont & Nichols, 1996; Vitalis et al., 2001; Beaumont & Balding, 2004). The methodology consists of identifying loci (molecular markers) that present population differentiation (FST) coefficients that are ‘distinct’ (called outlier loci) from those under neutral expectations. This strategy has been widely used to detect recent episodes of selection in nonmodel species, where the absence of detailed genomic information does not allow other alternatives.
A third alternative is bayescan (Foll & Gaggiotti, 2008), which implements a Bayesian method to estimate directly the posterior probability that each locus is subject to selection. This method is an extension of that proposed by Beaumont & Balding (2004), and it is based on a logistic regression model in which each logit value of genetic differentiation FST (i, j) for locus i in population j is decomposed as a linear combination of the coefficients of the logistic regression, αi and βj, corresponding, respectively, to a locus effect and to a population effect. The posterior probability of locus i being under selection is estimated by defining two alternative models, one that includes αi and another that excludes it. The respective posterior probabilities of these two models are estimated using a reversible jump Markov chain Monte Carlo (RJMCMC) approach. The posterior probability that a locus is subject to selection is then estimated from the output of the RJMCMC by counting the number of times that αi is included in the model. This Bayesian approach takes all loci into account in the analyses through the prior distribution, resolving the problem of multiple testing of a large number of genomic locations. bayescan has been used for codominant (Gaggiotti et al., 2009; Knapen et al., 2009; Medugorac et al., 2009; Nielsen et al., 2009) as well as dominant (Manel et al., 2009; Paris et al., 2010; Parisod & Joost, 2010) DNA markers.
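The decomposition and the counting step described above can be illustrated with a minimal sketch. It is not bayescan's code: the function names are hypothetical, and the input to `posterior_inclusion` is assumed to be a per-locus record of whether αi was in the model at each retained RJMCMC sample.

```python
import math


def fst_from_effects(alpha_i, beta_j):
    """Invert logit(FST_ij) = alpha_i + beta_j to recover FST_ij.

    alpha_i is the locus effect and beta_j the population effect.
    """
    return 1.0 / (1.0 + math.exp(-(alpha_i + beta_j)))


def posterior_inclusion(alpha_included):
    """Estimate the posterior probability that a locus is under selection.

    alpha_included holds one boolean per retained RJMCMC sample,
    True when alpha_i was part of the model at that sample.
    """
    trace = list(alpha_included)
    if not trace:
        raise ValueError("empty RJMCMC trace")
    return sum(1 for included in trace if included) / len(trace)
```

A positive αi raises FST for that locus above the level set by the population effect alone, which is the signature expected under directional selection.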
The available software has been shown to solve most of the previous criticisms (Robertson, 1975; Nei & Chakravarti, 1977; Flint et al., 1999) on the original methodology proposed by Lewontin & Krakauer (1973), as suggested by some previous simulations (Beaumont & Nichols, 1996; Vitalis et al., 2001; Foll & Gaggiotti, 2008). To our knowledge, however, there is only one study analysing the efficiency of one of these programs (dfdist) for a wide range of situations (Caballero et al., 2008). These authors found that the detection of selective loci can be a difficult and risky task. For example, under certain simulated conditions, the highest percentage of outliers was observed under the null (fully neutral) model. Only for extremely favourable conditions (strong selection coefficients, comparatively low levels of neutral gene frequency differentiation and low critical P-values), the program was able to detect outliers in a number proportional to the true percentage of selective loci existing in the genome. However, even for these favourable circumstances, the outliers detected were often false positives, suggesting that the method should be used with caution. A comparison in efficiency between this method and the other widely used ones (detseld and bayescan) has not been made so far. In addition, only a few studies have incorporated a multitest correction for detecting outliers of selection (Eveno et al., 2008; Galindo et al., 2009; Manel et al., 2009; Michalski et al., 2010), and the efficiency of this strategy has not been studied under a wide range of scenarios by simulation. Here, we use simulation data to compare the efficiency of the three alternative programs to detect selective loci from genome-wide scans using dominant markers. We also evaluate the efficiency of correcting for multiple testing of those methods that use a classical probability approach.
Materials and methods
Simulated data and scenarios investigated
The frequencies of neutral and selected dominant markers in a subdivided population were obtained by an analytical procedure following Caballero et al. (2008). Details of the procedure are also given in Appendix S1. Briefly, a classical island model at equilibrium between migration and drift was used to obtain the frequency of neutral loci in a subdivided population (two subpopulations with N = 500 individuals each were assumed). Allelic frequencies sampled from the theoretical beta distribution were assigned to each subpopulation for a number of loci (1000). Allele frequencies for selective loci were generated using a transition matrix approach. We assumed that selection acts only in one subpopulation with fitnesses 1 + s, 1 + s/2 and 1 for the genotypes AA, Aa and aa, respectively. Selective coefficients of loci (s) were sampled from an exponential distribution with mean effect s̄. The transition matrix considers the effects of selection, drift and migration between subpopulations. Genotypic values of individuals were obtained assuming Hardy–Weinberg proportions, and sample sizes of 40 individuals were assumed for each subpopulation.
A range of scenarios was run, covering different values of neutral gene frequency differentiation (FST for neutral loci equal to 0.025, 0.1 and 0.3), mean selection coefficients (s̄ = 0.005, 0.05 and 0.5) and different proportions of true selective loci in the genome (0%, 1%, 3%, 5% and 10%). Each scenario was replicated 10 times and analysed with the three programs, averaging results over replicates.
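The sampling steps for neutral loci can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Appendix S1: the beta parameters use the common parametrisation in terms of a mean frequency and FST, the mean frequency `p_mean` is a hypothetical input, and the transition matrix used for selective loci is not reproduced.

```python
import random


def neutral_frequencies(p_mean, fst, n_subpops=2, rng=random):
    """Draw one allele frequency per subpopulation from a beta distribution.

    Assumed parametrisation: Beta(p(1 - FST)/FST, (1 - p)(1 - FST)/FST),
    which has mean p and variance FST * p * (1 - p).
    """
    a = p_mean * (1.0 - fst) / fst
    b = (1.0 - p_mean) * (1.0 - fst) / fst
    return [rng.betavariate(a, b) for _ in range(n_subpops)]


def selection_coefficient(mean_s, rng=random):
    """Draw s from an exponential distribution with the given mean."""
    return rng.expovariate(1.0 / mean_s)


def sample_band_presence(p_dominant, n_individuals=40, rng=random):
    """Count sampled individuals showing the band of a dominant marker.

    Under Hardy-Weinberg proportions only the recessive homozygote,
    with frequency (1 - p)^2, lacks the band.
    """
    q = 1.0 - p_dominant
    p_band = 1.0 - q * q
    return sum(1 for _ in range(n_individuals) if rng.random() < p_band)
```

Passing a seeded `random.Random` instance as `rng` makes a replicate reproducible.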
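For orientation, the full grid of scenarios can be enumerated as below. The dictionary keys and the `scenarios` function are illustrative names only; the parameter values and the 1000 loci per data set are those given in the text.

```python
from itertools import product

NEUTRAL_FST = (0.025, 0.1, 0.3)
MEAN_S = (0.005, 0.05, 0.5)
PCT_SELECTIVE = (0, 1, 3, 5, 10)
N_REPLICATES = 10
N_LOCI = 1000


def scenarios():
    """Yield one dictionary per combination of simulation parameters."""
    for fst, mean_s, pct in product(NEUTRAL_FST, MEAN_S, PCT_SELECTIVE):
        yield {
            "fst": fst,
            "mean_s": mean_s,
            "pct_selective": pct,
            # Number of selective loci among the N_LOCI simulated loci.
            "n_selective": N_LOCI * pct // 100,
        }
```

The grid has 45 combinations, although the mean selection coefficient is irrelevant whenever the proportion of selective loci is 0%.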
1 The critical frequency for the most common allele was 0.99 (loci in which the most frequent allele had a frequency ≥ 0.99 were excluded).
2 The scale for the Zhivotovsky (1999) parameters for estimating allele frequencies was 0.25 (the accuracy of the estimation of frequencies with this parameter value was checked).
3 The number of resamplings used to obtain the confidence intervals for outliers was 10 000 (several runs were performed with 100 000 resamplings, and results did not change).
4 The smoothing proportion used was 0.04.
5 The estimate of average FST to be used in the dfdist simulations was obtained in two different ways: (i) the average estimated FST calculated by the Ddatacal program (one of the programs in the dfdist package) and (ii) a trimmed mean FST (provided also by the Ddatacal program) obtained excluding 30% of the highest and 30% of the lowest FST values (this trimmed mean FST is supposed to be an estimate of the average ‘neutral’ FST uninfluenced by outlier loci; Bonin et al., 2006).
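The trimmed mean in item 5 can be sketched in a few lines. This is not Ddatacal's code; in particular, how Ddatacal rounds the number of trimmed loci is an assumption here (the count is truncated towards zero).

```python
def trimmed_mean(values, trim_fraction=0.30):
    """Mean of the values left after dropping both tails.

    trim_fraction of the observations is removed from each end
    of the sorted list before averaging.
    """
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("nothing left after trimming")
    return sum(kept) / len(kept)
```

With ten per-locus FST estimates, the three lowest and the three highest are dropped, so a single extreme locus no longer inflates the estimate of neutral differentiation.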
The second program, detseld, was also run for all simulated marker data sets. The significance level was set at 95%, and a multitest correction based on FDR was also applied. Two different models, given by the nuisance parameters, were run in detseld. The first model was based on a range of parameters used empirically by Bonin et al. (2006). Because the number of false positives obtained with this set of parameters was very high (about 20% of outliers with a fully neutral model for a critical P-value of 5%), we obtained a second model optimized to reduce the number of outliers detected by an exhaustive search in more than 1000 sets of parameters. As a result of this optimization, we obtained a set of nuisance parameters (Table S1) yielding a minimum of 11% (FST = 0.025) or 5% (FST = 0.1) of outliers when all loci were neutral for a critical P-value of 5%. We simulated 10⁷ points for each set of parameters to ensure a correct generation of P-values.
For the third program, bayescan, the estimation of model parameters was automatically tuned on the basis of short pilot runs (10 pilot runs, length 5000), using the default chain parameters given in the program: the sample size was set to 5000 and the thinning interval to 20. The loci were ranked according to their estimated posterior probability. This probability cannot be interpreted directly or compared to P-values (Marden, 2000; Foll & Gaggiotti, 2008). Instead, all loci showing log10(Bayes factor) > 2, which corresponds to a posterior probability P(αi ≠ 0) > 0.99, were retained as outliers; this threshold provides decisive support for acceptance of the model (Foll & Gaggiotti, 2008).
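The text states only that the multitest correction was based on FDR. As an illustration, the sketch below uses the Benjamini-Hochberg step-up procedure, which is an assumption about the specific variant and is not taken from the source.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one boolean per locus: True if it remains an outlier.

    Step-up rule: find the largest rank k such that the k-th smallest
    P-value is <= q * k / m, then reject the k smallest P-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

With 1000 loci and a nominal 5% level, about 50 neutral loci are expected to fall below P = 0.05 by chance alone, whereas under this rule the smallest P-value must be below 0.05/1000 unless many loci show small P-values at once.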
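The link between the Bayes factor threshold and the posterior probability can be checked numerically. The sketch assumes a base-10 logarithm and equal prior probabilities for the two models (prior odds of 1); both are assumptions of this example, and the function names are illustrative.

```python
import math


def log10_bayes_factor(posterior, prior=0.5):
    """log10 Bayes factor in favour of the model that includes alpha_i."""
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = prior / (1.0 - prior)
    return math.log10(posterior_odds / prior_odds)


def posterior_from_log10_bf(log10_bf, prior=0.5):
    """Posterior probability implied by a given log10 Bayes factor."""
    odds = (10.0 ** log10_bf) * prior / (1.0 - prior)
    return odds / (1.0 + odds)
```

Under these assumptions a log10 Bayes factor of 2 corresponds to a posterior probability of 100/101, or roughly 0.99.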
Comparison between the performances of the methods
Any useful method to detect selective loci should present at least the following four desirable criteria:
1 The method should detect the lowest percentage (ideally none) of significant outliers under the null model (i.e. when all loci are neutral).
2 There should be a positive relationship between the percentage of outliers detected and the true percentage of selective loci simulated across scenarios.
3 The outliers detected should be typically true selective loci.
4 The method should detect a substantial proportion of true selective loci.
Thus, following Caballero et al. (2008), for each method and scenario, we obtained the average percentage of outliers detected, the average percentage of those outliers that corresponded to selective loci and the average percentage of truly selective loci detected as outliers. Every locus detected as a candidate for selection by any of the methods was recorded, and the number of such loci rightly or wrongly assigned as being under selection by one method or a combination of two or three methods was obtained. The correlation between the percentage of loci detected as outliers and the true percentage of selective loci was obtained using the Kendall’s Tau (τ) nonparametric coefficient (Sokal & Rohlf, 1995) calculated by spss for Windows version 17 (SPSS Inc., Chicago, IL, USA).
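The three summary percentages and the rank correlation can be sketched as below. The function names are hypothetical, loci are assumed to be identified by hashable labels, and the tau-b variant of Kendall's coefficient is an assumption of this example.

```python
import math


def detection_summary(outliers, selective, n_loci):
    """Return the three percentages recorded for each method and scenario."""
    outliers, selective = set(outliers), set(selective)
    true_hits = outliers & selective
    return {
        "pct_outliers": 100.0 * len(outliers) / n_loci,
        # Undefined (None) when no outlier was detected.
        "pct_outliers_selective":
            100.0 * len(true_hits) / len(outliers) if outliers else None,
        # Undefined (None) under the fully neutral model.
        "pct_selective_detected":
            100.0 * len(true_hits) / len(selective) if selective else None,
    }


def kendall_tau_b(x, y):
    """Kendall's tau-b by direct comparison of all pairs of observations."""
    concordant = discordant = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    denom = math.sqrt((pairs + tied_x_only) * (pairs + tied_y_only))
    if denom == 0:
        raise ValueError("tau is undefined when one variable is constant")
    return (concordant - discordant) / denom
```

Criterion 2 corresponds to a positive tau between the simulated and detected percentages, and criteria 3 and 4 to the second and third entries of the summary.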
We assessed whether the three different approaches to detect selective loci behave appropriately under distinct population genetic differentiation scenarios for neutral loci (FST = 0.025 and 0.1; Figs 1 and 2, respectively). For a scenario assuming a neutral gene frequency differentiation of FST = 0.3, almost no outliers were detected by any method in any scenario. The lowest FST value represents the most favourable framework for detecting selective loci under directional selection. In addition, under each scenario, we compared dfdist and detseld without (Figs 1a–f and 2a–f) and with a multitest (FDR) correction (Figs 1g–l and 2g–l). Note that bayescan already incorporates a multitest adjustment, so the black bars in panels a–f are the same as those in panels g–l. We present results using only the optimized set of parameters for detseld (Table S1) and the trimmed correction for dfdist, as they produced the best results. In addition, we show results with moderate or high average selection coefficients (s̄ = 0.05 or 0.5). In general, the performance of the methods with the lowest s̄ was substantially worse than with higher s̄, although the comparative results among methods held.
The left column of Fig. 1 (Fig. 1a, d, g, j) represents the percentage of outliers detected by each method under comparatively low mean levels of neutral genetic differentiation (FST = 0.025). Both dfdist and detseld greatly overestimated the percentage of outliers when they were not corrected by a multitest method (Fig. 1a, d). Furthermore, the percentage of outliers detected was the highest (11–14%) under the neutral model (0% selective loci). Obviously, this invalidates these approaches, as all detected outliers could in fact be false positives. However, this problem was partially corrected when a multitest correction was applied (Fig. 1g, j), with dfdist showing a higher capability to detect outliers than detseld. In fact, the relationship between the percentage of outliers detected and the true percentage of selective loci simulated was positive and significant for dfdist (Fig. 1g, j pooled; Kendall’s τ8 = 0.89, P = 0.001) but nonsignificant for detseld (Fig. 1g, j; Kendall’s τ = 0.07, P = 0.780). Interestingly, bayescan detected a proportion of outliers similar to dfdist corrected for multiple testing, showing also a positive relationship between that proportion and the true percentage of selective loci (Fig. 1g, j; Kendall’s τ8 = 0.91, P < 0.001). The percentage of selective loci found in the outliers (Fig. 1b, e, h, k) was moderate when a low percentage (1%) of selective loci was simulated but typically high when at least 3% of the loci were selective (Fig. 1h, k). The percentage of truly selective loci detected (Fig. 1c, f, i, l) was comparatively high with the highest average selection coefficient (s̄ = 0.5), but substantially smaller with the moderate average selection coefficient (s̄ = 0.05).
The same trend is described in Fig. 2 for a less favourable scenario (FST = 0.1). Briefly, dfdist and detseld were still inefficient without any multitest correction, detecting 5–7% of outliers irrespective of the true percentage of selective loci, whereas bayescan detected less than 0.5% under the null model and an increasing proportion of outliers with increasing proportions of true selective loci. dfdist and, particularly, detseld showed a rather small percentage of selective loci found in outliers without multitest correction. Again, detseld detected a very low percentage of outliers after correcting for multiple testing, whereas dfdist and bayescan behaved better. Neither detseld (Fig. 2g, j; Kendall’s τ8 = 0.28, P = 0.320) nor dfdist (Fig. 2g, j; Kendall’s τ8 = −0.27, P = 0.310) showed a significant relationship between the percentage of outliers detected and the true percentage of selective loci, suggesting a low efficiency as estimators of the percentage of selective loci in the genome. bayescan, however, performed much better, showing a positive relationship between the percentage of outliers detected and the true percentage of selective loci (Fig. 2g, j; Kendall’s τ8 = 0.94, P < 0.001), and a comparatively high percentage of selective loci in outliers (Fig. 2h, k). The percentage of selective loci detected was low (Fig. 2i, l), particularly so for dfdist and detseld for a high proportion of true selective loci (> 5%).
Figure 3 represents the number of loci detected as outliers for the different methods (detseld and dfdist using multitest correction) with a neutral gene frequency differentiation of FST = 0.025 and for a case with no selective loci (s̄ = 0) or 1% of selective loci in the genome with average effect s̄ = 0.05 or 0.5. In the neutral case, the three methods detected an average of 1.1 false outlier loci, but a further 10.9 were detected by both detseld and dfdist, 6.7 more by detseld and 4.7 more by dfdist. The three methods detected almost the same 3.9 and 10.6 true selective loci on average for moderate (s̄ = 0.05) and strong (s̄ = 0.5) average selection coefficients, respectively. A few additional true selective loci were detected by one or two methods. However, most loci detected exclusively by dfdist were false positives.
For all the scenarios analysed, we can conclude that bayescan appears to be more efficient than the other methods. bayescan usually detected a higher percentage of outliers than the other methods after multitest correction. This percentage of outliers was always correlated with the true percentage of selective loci in the genome. bayescan detected less than 1% of outliers under the neutral model and showed the highest percentage (or one as high as the alternatives) of selective loci in outliers and selective loci detected. bayescan showed another useful property, which is the automatic optimization of the parameter conditions used in simulations. dfdist and detseld with multitest correction were also shown to be efficient at least for the most favourable scenarios, but they seemed to fail in a number of situations, particularly the latter. One possible reason behind the poor performance of detseld may be that it assumes a divergence model between subpopulations with no migration, rather than an island model (assumed by dfdist, bayescan and the simulations).
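The counts by method combination summarised in Figure 3 amount to partitioning the detected loci by the set of methods that flagged them. A minimal sketch, with a hypothetical function name and loci identified by arbitrary labels:

```python
def partition_by_method(dfdist, detseld, bayescan):
    """Group outlier loci by the exact combination of methods detecting them.

    Returns a dict mapping a frozenset of method names to the set of
    loci detected by exactly those methods and no others.
    """
    methods = {
        "dfdist": set(dfdist),
        "detseld": set(detseld),
        "bayescan": set(bayescan),
    }
    regions = {}
    for locus in set().union(*methods.values()):
        key = frozenset(name for name, hits in methods.items() if locus in hits)
        regions.setdefault(key, set()).add(locus)
    return regions
```

Intersecting each region with the set of truly selective loci then separates correctly detected loci from false positives, and averaging region sizes over replicates gives non-integer counts such as those quoted above.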
The consequences of these results are straightforward. Most genome-scan studies in nonmodel species for dominant markers have used dfdist or detseld (see references in the Introduction section). Moreover, most of them have applied the methods without any multitest correction (exceptions are Eveno et al., 2008; Galindo et al., 2009; Manel et al., 2009; Michalski et al., 2010). Therefore, our results show that an unknown number of outliers detected with dfdist or detseld without multitest correction could be false positives, suggesting the need for a re-evaluation of results to confirm their conclusions. In some studies, however, the outliers detected were confirmed by a complementary argument, such as getting repeated detection under replication or pseudo-replication (Wilding et al., 2001) or comparing results between a priori adaptive cases and controls (Miller et al., 2007; Nosil et al., 2008; Galindo et al., 2009; Manel et al., 2009).
We have focused on the three most widely used genome-scan methods for identifying candidate dominant loci under selection, although other methods and software have also been proposed. For example, the program spatial analysis method (SAM) (Joost et al., 2008) performs a SAM (Joost et al., 2007) with the help of geographic and environmental information. However, this approach only permits identifying molecular markers associated with environmental variables and requires a combination with one of the above programs to differentiate the type of selection. Another example is the program winkles (Wilding et al., 2001), which is based on the same principle as dfdist, but the null distribution of genetic differentiation is here conditional on allele frequency instead of on heterozygosity. bayesfst (Beaumont & Balding, 2004) allows for Bayesian estimation of FST, and it is the basis from which bayescan was developed (Foll & Gaggiotti, 2008). More recently, arlequin 3.5.1 has implemented a new hierarchical test of selection (Excoffier et al., 2009), an extension of FDIST, which highlights the need to have a good understanding of the population genetic structure of the studied organism to accurately identify loci with unusual levels of differentiation.
In our analysis, we have assumed that the selective loci are subject to directional selection in one of the subpopulations. This implies that the most favourable situation for detecting outliers is that of a low neutral FST, as selective loci would tend to show exceedingly high FST values. In fact, the results show that for FST = 0.1, the efficiency of the methods is much lower than for FST = 0.025 (cf. Figs 1 and 2), and for FST = 0.3, the methods do not work at all. Under balancing selection, in contrast, a high neutral FST would be more favourable for detecting selective loci.
All these genome-scan methods assume independence among loci (Foll & Gaggiotti, 2008), which is an unrealistic assumption when using a large number of markers. The effect of linkage disequilibrium on these methods is not clear. An exhaustive simulation study, including different degrees of linkage disequilibrium, would be needed to address this issue. Further factors are also likely to influence the ability of the methods to detect loci under selection. Some demographic scenarios, for example, would lead to population differentiation being erroneously attributed to selection. In general, it can be assumed that the detection of selection would be more difficult under such factors. The simple scenario we assumed, where no demographic complexities are involved, can therefore be considered a favourable situation for the performance of the methods. More complex scenarios would require further detailed simulations.
Our analysis referred to dominant markers, such as AFLPs. In the conditions simulated, we expect that our results can also be extrapolated to codominant markers. The main difference between the dominant and codominant versions of the software is that the dominant versions must include a previous estimation of allelic frequencies. Because, in our study, we ran simple scenarios under Hardy–Weinberg equilibrium, the estimated allelic frequencies should be unbiased. In fact, this was checked previously for the dfdist software (Caballero et al., 2008). It is possible that deviations from Hardy–Weinberg equilibrium may also reduce the efficiency of the methods using dominant markers.
In conclusion, from our results, we can offer the following advice for designing a study to detect loci under directional selection using any of the analysed software:
1 Do not start the experiment if the populations compared show a mean gene frequency differentiation (FST) larger than about 0.2.
2 Preferably use bayescan, or alternatively dfdist or detseld with a multitest correction.
3 Use experimental controls: compare detection of outliers in situations expecting positive and negative results. Alternatively, provide independent experimental replication.
4 Be rather cautious when the percentage of outliers observed after multitest correction falls below 1%.
The authors thank N. Santamaría for her technical assistance and two anonymous referees for helpful comments. This work was supported by grants from Ministerio de Ciencia E Innovación y Fondos Feder (CGL2008-00135/BOS; CGL2009-13278-C02) and Xunta de Galicia (IN825B 2009/6-0). R. Vitalis provided a source code and helpful inputs for the detseld software. A. P.-F. was supported by an Ángeles Alvariño fellowship from Xunta de Galicia (Spain). M.J. G-P was supported by a María Barbeito fellowship from Xunta de Galicia (Spain).