Detecting recent selective sweeps while controlling for mutation rate and background selection

Abstract A composite likelihood ratio test implemented in the program sweepfinder is a commonly used method for scanning a genome for recent selective sweeps. sweepfinder uses information on the spatial pattern (along the chromosome) of the site frequency spectrum around the selected locus. To avoid confounding effects of background selection and variation in the mutation process along the genome, the method is typically applied only to sites that are variable within species. However, the power to detect and localize selective sweeps can be greatly improved if invariable sites are also included in the analysis. In the spirit of a Hudson–Kreitman–Aguadé test, we suggest adding fixed differences relative to an out‐group to account for variation in mutation rate, thereby facilitating more robust and powerful analyses. We also develop a method for including background selection, modelled as a local reduction in the effective population size. Using simulations, we show that these advances lead to a gain in power while maintaining robustness to mutation rate variation. Furthermore, the new method also provides more precise localization of the causative mutation than methods using the spatial pattern of segregating sites alone.


Introduction
Rapid advances in sequencing technology during the past few years have facilitated studies using genomewide molecular data for detecting signatures of selective sweeps (Akey et al. 2002;Carlson et al. 2005;Kelley et al. 2006;Voight et al. 2006;Wang et al. 2006;Kimura et al. 2007;Sabeti et al. 2007;Tang et al. 2007;Williamson et al. 2007;Xia et al. 2009;Qanbari et al. 2012;Ch avez-Galarza et al. 2013;Long et al. 2013;Ramey et al. 2013;Huber et al. 2014), and a large number of compu-tational methods have been developed for this purpose (e.g. Fu & Li 1993;Kim & Stephan 2002;Sabeti et al. 2002Sabeti et al. , 2007Kim & Nielsen 2004;Nielsen et al. 2005;Voight et al. 2006;Jensen et al. 2007;Boitard et al. 2009;Chen et al. 2010;Pavlidis et al. 2010Pavlidis et al. , 2013Li 2011). The various methods differ in the assumptions that they make about the selective sweep. For example, the extended haplotype test and its derivatives are powerful in cases where the beneficial mutation has not yet reached fixation in the population (Sabeti et al. 2002(Sabeti et al. , 2007Voight et al. 2006). Methods based on measures of population subdivision rest on the assumption that a selective sweep in geographically structured populations has a locally confined effect on genetic diversity, which increases population differentiation at the position of the sweep (Akey et al. 2002;Chen et al. 2010). More recently, statistics have been developed specifically for the detection of soft sweeps, that is a pattern caused by multiple haplotypes sweeping to high frequencies (Ferrer-Admetlla et al. 2014;Garud et al. 2015).
In this study, we are solely concerned with the model of a classical hard selective sweep in a single population, and we assume that the beneficial mutation has reached fixation not too long ago. The methods usually applied in this scenario aim to detect deviations in the shape of the site frequency spectrum (SFS), which can be quantified with simple summary statistics like Tajima's D or Fay and Wu's H. In addition, more powerful statistics have been developed that explicitly model the effect of a selective sweep on the SFS in a likelihood ratio framework (Kim & Stephan 2002;Nielsen et al. 2005). Kim & Stephan (2002) proposed a composite likelihood ratio statistic based on calculating the product of marginal likelihood functions for all sites on a chromosome under models with and without a selective sweep at a particular position, and under the assumption of a panmictic population of constant size. The resulting composite likelihood ratio is then computed for each position of interest to evaluate the evidence for a sweep at those positions. This method, therefore, does not only incorporate information regarding the SFS, but does so in a way that uses the spatial distribution (along the chromosome) at segregating alleles of different frequencies. The null distribution of the test statistic is approximated using simulations. An extension to this test was proposed by Nielsen et al. (2005). In this method, the overall genomic SFS is used as the neutral, or background, model instead of using the standard neutral model as the null. The distribution of the SFS under the alternative hypothesis of selection is derived by considering the way a selective sweep would modify the observed background distribution of allele frequencies. This leads to a computationally fast method, facilitating genomewide analyses. Nielsen et al. (2005) also argued that the use of the overall genomic SFS to represent the neutral case leads to increased robustness, and showed that the method was robust to a two-epoch growth model and an isolation-migration model with population growth in both populations, with parameters estimated from human single nucleotide polymorphism (SNP) data (Marth et al. 2004). Since then, it has become clear that, while this method may be more robust than some previous SFS-based approaches, it can produce a high proportion of false positives if there has been a strong recent bottleneck in population size, but a standard neutral model is used to calculate critical values (Jensen et al. 2005;Pavlidis et al. 2008).
If invariable sites are included in the analysis, then both the methods of Kim & Stephan (2002) and Nielsen et al. (2005) may be sensitive to assumptions regarding selective constraint and mutation rates. A region with strongly reduced levels of variation due to selective constraint or reduced mutation rate may be misinterpreted as a region that has experienced a recent selective sweep (Nielsen et al. 2005;Boitard et al. 2009;Pavlidis et al. 2010). For these reasons, Nielsen et al. (2005) proposed using only polymorphic sites, an option that became incorporated as default in both SWEEPFINDER (Nielsen et al. 2005) and SweeD (Pavlidis et al. 2013).
Background selection can also lead to locally reduced levels of neutral variation (Charlesworth et al. 1993(Charlesworth et al. , 1995Hudson & Kaplan 1994Nordborg et al. 1996;Charlesworth 2012;Cutter & Payseur 2013) and cannot be ignored for the study of neutral polymorphisms in many cases (Williford & Comeron 2010;Cutter & Payseur 2013;Messer & Petrov 2013). Cutter & Payseur (2013) argue that the inevitability and prevalence of deleterious mutations necessitates the incorporation of background selection in the null model when identifying positive selection. There is a well-developed mathematical framework for quantifying the strength of background selection given the genomewide mutation rate, recombination rate, position of functional elements and distribution of fitness effects (Hudson & Kaplan 1995;Nordborg et al. 1996;Nicolaisen & Desai 2013). As data sets and methods for estimating the effect of background selection for each position in the genome are becoming available (McVicker et al. 2009;Comeron 2014), the objective of developing methods for detecting positive selection that can take background selection into account is becoming tenable. However, it is unknown to what degree those currently available maps of background selection are also affected by recurrent selective sweeps (McVicker et al. 2009), which could lead to overcorrection when using those maps.
Here, we explore the potential for improving the composite likelihood ratio test of SWEEPFINDER (Nielsen et al. 2005) by either including invariant sites that differ with respect to an out-group (i.e. fixed differences), or all invariant sites, in addition to polymorphic sites. When only including fixed differences, the method incorporates the information typically represented in a Hudson-Kreitman-Aguad e (HKA) test (Hudson et al. 1987), but adds the information from the spatial distribution of allele frequencies. We show that this approach is robust to variation in mutation rate across the genome, and also develop an approach for incorporating estimates of the strength of background selection into the SWEEPFINDER framework. Using the reduction in diversity relative to divergence as a necessary hallmark of a selective sweep in our model also helps to reduce false positives, for example in the case of a recent population bottleneck. Finally, we compare results of both the old and the new version of the likelihood ratio test applied to human genetic data.

Including invariant sites into the SWEEPFINDER framework
Starting with n aligned DNA sequences, each of length L, we wish to determine whether a selective sweep has occurred at some defined position along the sequence. Based on results of Durrett & Schweinsberg (2004), Nielsen et al. (2005) derived an approximate formula for p k *, the probability of observing k derived alleles, k 2 {1,2,. . .,n-1}, in a sample of size n, immediately after a selective sweep, for a site at a particular distance (d) from the selected mutation. For each k, p k * is a function of d, the background allele frequency distribution p = (p 1 ,p 2 ,. . .,p n-1 ), and the parameter a = r ln(2 N e )/s. Here, r is the per-base per-generation recombination rate, s is the selection coefficient, and N e is the effective population size. The parameter p k is the expected proportion of sites, not affected by the sweep, in which the derived allele has a frequency of k/n in the sample. The vector p is commonly estimated as the observed SFS from the whole genome, under the assumption that only a small and therefore negligible proportion of positions are affected by selection. The parameter a quantifies the relative influence of recombination and selection, with small values of a indicating strong sweeps.
The equations in Nielsen et al. (2005) allow for the incorporation of invariant sites that may or may not be fixed differences relative to an out-group, using p = (p 0 , p 1, . . .,p n ) as the definition of p, and with the modification that the upper limit of the sum in equation (5) of Nielsen et al. (2005) is n and not n-1. The quantity p k * is a function of the probability of a lineage escaping a selective sweep, P e = 1-exp(-ad), where d is the distance between the polymorphic site and the sweep location. Our new version of SWEEPFINDER allows distances between sites to be defined as genetic distance. This is achieved by allowing d to be defined by a recombination map rather than by physical distance as in the previous version. As in Nielsen et al. (2005), we then define the composite likelihood ratio statistic CLR = 2[log (CL sweep )log(CL background )], where CL sweep is the composite likelihood maximized over alpha, and CL background is the composite likelihood calculated under the assumption of a = ∞. This is a composite likelihood ratio, and not a full likelihood ratio, because sites in the genome are not independent, but correlated due to linkage disequilibrium. One thing to notice, about which there has existed some confusion in the literature, is that this approach is not window based but in theory incorporates information from all SNPs in the genome to inform the CLR calculated for a single point in the genome. However, for computational efficiency SWEEPFINDER uses a cut-off for distances from the focal SNP to include in the calculation. As distances become large, the contribution to the likelihood ratio approaches zero. The value used for the cut-off in SWEEP-FINDER is ad = 12, corresponding to a probability of a lineage escaping a sweep of 0.999994. Furthermore, SWEEPFINDER calculates probabilities on a grid of recombination distances and uses a smooth interpolation to approximate probabilities for a particular point.
The effect of including invariant sites on the SFS is illustrated in Fig. 1. In a region close to the site of the selective sweep, variability is reduced because almost all the probability mass is concentrated on fixed alleles. Notice also that under the infinite sites assumption, as the mutation rate affects all categories proportionally, a change in the mutation rate will not change the SFS defined on {1, 2,. . .,n} (Fig. 1a, b). This statement does not hold true when invariant sites that do not differ from the out-group (Fig. 1c) are incorporated.

Correcting for background selection
A B-value (B) is the factor by which the effective population size is expected to be reduced due to background selection, that is N e * = N e B, where N e and N e * are the effective population sizes with and without background selection, respectively (Charlesworth 2012). We will assume that a reasonable estimate of the 'B-value map', the value of B for each site in the genome, is available (see, e.g., McVicker et al. 2009; for humans). We note that this assumption limits the use of our method to organisms for which such estimates have been obtained. We also note that we only model the main effect of background selection: the well-known reduction in effective population size. However, background selection can also affect the distribution of allele frequencies (Charlesworth et al. 1993(Charlesworth et al. , 1995Hudson & Kaplan 1994;Lohmueller et al. 2011a;Zeng & Charlesworth 2011;Nicolaisen & Desai 2013), an effect that is ignored here.
Based on the B-value map, the expected site frequency spectrum can be adjusted simply by multiplying all categories in the spectrum, except for the zero and the n (fixed differences) category, by B, that is, by setting p ðBÞ k ¼ Bp k for 1 < k < n-1, as the expected diversity reduction is proportional to B. The n category can be adjusted as described in the next section. The zero category can be obtained by standardization, that is k . If the zero category is not included in the analysis, all included categories will have to be standardized to ensure that the frequencies sum to 1. The calculation of the CLR then proceeds as in Nielsen et al. (2005).

Effect of background selection on number of fixed differences
We assume the availability of a sample of n chromosomes and a single chromosome from an out-group species, which split from the in-group species g generations ago. Fixed differences are defined as sites with an allele that is invariant within the in-group sample, but different from the allele at the orthologous position of the out-group chromosome (Fig. 2). The expected number of fixed differences, K, in the sample is then E[K] = l(2T anc -T in ), where T in is the time to the most recent common ancestor in the in-group sample, T anc is the divergence time between in-group and out-group, and l is the per-generation mutation rate. We further assume a standard neutral coalescent model with populations of constant sizes N e,in and N e,anc for the in-group population, and ancestral population, respectively (Fig. 2), and that the split time, g, is so large that we can assume Pr( where n is the sample size of in-group sequences, and E[T anc ] = g + 2N e,anc . Then, under an infinite sites model E½K ¼ l 2ðg þ 2N e;anc Þ À 4N e;in ð1 À 1=nÞ À Á and the relative number of fixed differences with and without background selection is The same expectations for a SFS that is extended to include the class of fixed differences (sites that are invariant in the sample, but different to an out-group species). (c) The same expectations for a SFS that is extended for the class of fixed differences and invariant sites that do not differ from the out-group species. All expectations are calculated with the formulas in Nielsen et al. (2005).
;anc À 2BN e;in ð1 À 1=nÞ g þ 2N e;anc À 2N e;in ð1 À 1=nÞ which reduces to for N e,anc =N e,in = N e . In the limit of large split times (g ? ∞), E[K (B) ]/E[K] % 1, and the effect of background selection on fixed differences can generally be ignored if g ≫ N e /n. In our new version of SWEEPFIN-DER, if the B-value map is included for sweep detection, estimates of N e,in , N e,anc and g have to be provided to the software.

Constant size and bottleneck simulations
Simulations were performed under the model described in Fig. 2 assuming L = 100 kb and n = 30, using msms (Ewing & Hermisson 2010). We set the split time, g, between in-group and outgroup to 20 coalescent time units (2N e generations), resulting in a neutral divergence of 0.1. The scaled mutation rate h = 4N e l per site was set to 0.005 and the population scaled recombination rate per site, 4N e r, to 0.02. Those parameters where chosen to be comparable to the ones in (Nielsen et al. 2005). One chromosome was sampled from the out-group species to classify invariant sites into sites that differ or do not differ to the out-group. To analyse the effect of reduced mutation rate in a genomic region compared to the background, we varied the mutation rate between 0.1 and 0.9 times the mutation rate in other regions. Further, we simulated two demographic scenarios, a constant size and a bottleneck population. In simulations with selection, the selected mutation was introduced in the population at specified times (0.01, 0.02, 0.04, 0.06, 0.08, 0.16, 0.24,. . ., 1.2), at a frequency of 1/(2N e ) with a population scaled selection coefficient of 2N e s = 200. We only kept simulations in which the mutation did not get lost (-SFC option in msms).
For the bottleneck simulations, we varied onset (0.004, 0.04 and 0.4), strength (0.05, 0.1 and 0.5) and duration (0.08 and 0.4) and explored all possible combinations of those parameters. To compare different bottleneck scenarios, h was scaled depending on the bottleneck parameters to keep SNP density constant for all simulations (on average~1850 SNPs per simulation, see Figs S2 and S3). This was achieved by calculating a scaling factor (f) using the formula of Marth et al. (2004) and the approach described in DeGiorgio et al. (2014). The recombination rate was scaled to be 4fN e r to keep the mutation over recombination rate ratio comparable to the constant size simulations. The split time was also adjusted to g/f.
For the simulations with selective sweeps, we used 200 replicates for each parameter setting and sweep start time and assumed N e = 10 000. For calculation of the false-positive rate (FPR), we conducted 4000 neutral simulations under each bottleneck condition. For power calculations, we generally assumed that the correct demographic model was known and used to identify critical values for the test, while for investigations of robustness, we used the standard neutral model to estimate critical values. In all cases, the background site frequency spectrum was estimated using 1000 neutral simulations. Note that in our analyses, the significance level is set so that 5% of all simulated 100 kb regions are expected to contain at least one outlier, that is it is an experiment-wise significance level based on our simulated sequence length.

Simulation of background selection
Background selection was simulated with the forward simulation software SFS_CODE (Hernandez 2008). To reduce the computational burden, we simulated relatively small populations of N e = 250 (Hernandez 2008). We used n = 15 and assumed constant population sizes with neutral and deleterious mutation rate of h = 0.0025 per bp, g/(4N e ) = 2, 4N e r = 0.15 and L = 100 kb. We fur- Fig. 2 Definition of fixed differences and polymorphic sites. We assume the infinite sites model, that is every mutation happens on a different site. We define fixed differences (red) as sites that are not polymorphic within the in-group and differ between in-group and out-group. Note that mutations on the lineage to the out-group also count as a fixed difference. Here, the in-group is sampled with 5 chromosomes and the outgroup with one chromosome. Background selection influences both the number of fixed differences and the number of polymorphisms.
ther assumed a selection coefficient of 2N e s = À50, reducing the neutral diversity by background selection by 40%. In the middle of the sequence (from 37.5 kb to 62.5 kb), we introduced a 100-fold reduction in recombination rate, which led to a local increase in the effect of background selection and an 80% reduction in SNP density (see Fig. S5). This reduction in recombination rate mimics a selective sweep by locally reducing diversity through the effect of background selection (Fig. S5). While the effect of background selection is more likely to act on a megabase scale (McVicker et al. 2009), we simulated strong background selection in a small segment of simulated sequence to keep the data sets small reducing the computational burden of the simulations. However, the difference in scale should not affect the generality of our conclusions.
To simulate selective sweeps in conjunction with background selection, a single positively selected mutation was introduced into the population 0.02 coalescence time units (2N e ) in the past in the middle of the sequence, with a selection coefficient of 2N e s = 2000, or 0.1 coalescence time units in the past with a selection coefficient of 2N e s = 200. Whenever the mutation was lost from the simulation, the output was discarded and the simulation was repeated. For simulations without background selection, we set the deleterious mutation rate to zero. The composite likelihood ratio was calculated using a grid of 40 points for each simulated data set. The neutral simulations described above were used as background site frequency spectrum. For the HKA test, we used nonoverlapping windows of length 5 kb.

Analysis of human data
We used data from nine unrelated European individuals sequenced by Complete Genomics (Drmanac et al. 2010). Data and filtering steps were the same as in DeGiorgio et al. (2014). We found that, in low complexity regions around the centromeres and elsewhere in the genome, diversity drops to low levels while divergence from chimpanzee stays constant or even increases relative to other regions. Those regions are highly correlated with low values of CRG100, a measure of local alignability, and increased levels of missing data. Therefore, they most likely reflect errors due to poor mappability and not patterns of recent selective sweeps. To filter those regions out, we only retained SNPs and fixed differences with a CRG100 value of 1 and full sample size. We also excluded windows with average CRG100 value of less than 0.9, in 100 kb windows moving by 50 kb. CRG100 values (Derrien et al. 2012) were downloaded from the UCSC Genome Browser at http://genome.ucsc.edu/.
We obtained recombination rates between pairs of sites from the sex-averaged pedigree-based human recombination map from deCODE Genetics (Kong et al. 2010).
For the sweep scan, we calculated a composite likelihood ratio at grid points with 1 kb spacing. We ran both standard SWEEPFINDER, using only polymorphic sites (CLR1), and our new method using polymorphic sites, fixed differences relative to chimpanzees and the B-values map from McVicker et al. (2009) (CLR2B). Each chromosome was run in parallel, taking 1 week for the whole genome. We assume an effective population size of humans and the human-chimpanzee ancestor population of 10 000 and 99 000, respectively, and a split time of 240 000 generations (McVicker et al. 2009). To look for overlaps with previous sweep scans, we use the supplementary table from (Akey 2009), compiling SFS-based scans (Carlson et al. 2005;Kelley et al. 2006;Williamson et al. 2007), LD-based scans (Voight et al. 2006;Wang et al. 2006;Kimura et al. 2007;Sabeti et al. 2007;Tang et al. 2007) and one F ST -based scan (Akey 2009).

Including diversity as a sweep signal increases power and precision
We compare the power and accuracy of the CLR test when including only variable sites (CLR1), variable sites and fixed differences (CLR2), and all sites (CLR3), in the calculation of the composite likelihood ratio. CLR1 is the CLR that is calculated by current sweep detection software (Nielsen et al. 2005;Pavlidis et al. 2013). We start with a simple scenario of a constant population size with no background selection, and an advantageous mutation in the middle of the sequence, with selection strength of 2N e s = 200 and varying start times (see Methods).
The power drops quickly with the age of the selected mutation and approaches zero for sweeps that start more than 0.5 coalescence time units (2N e generations) in the past (Fig. 3a). The root-mean-square error (RMSE) of the estimated location of the sweep also increases for older sweeps (Fig. 3b). At an age of 0.5 coalescent time units, localization using the CLR1 statistic is not better than picking a site at random. In contrast, CLR2 and CLR3 still have power until 0.8 time units in the past. Furthermore, for sweeps that start 0.2 coalescence time units in the past, there is an almost 40% increase in power. We also tested CLR2 and CLR3 on data with less neutral divergence from the out-group (1%, 5%) and do not see a reduction in power (Fig. S1). This suggests that a recent split time between in-and out-group does not negatively affect performance of the tests.
In summary, both power and accuracy of localization of the selected allele vastly increase when including fixed sites and there is little difference between including all sites (CLR3) and fixed differences (CLR2).

Including only fixed differences maintains robustness against mutation rate variation
We investigated the effect of varying mutation rates on the inference of sweeps. To this end, we use two sets of simulations in 100 kb windows: one set with a population mutation rate of 0.005 and another set of simulations with reduced mutation rates relative to the first set. The likelihood ratio is then calculated using the first set of simulations as the background SFS when calculating the CLR for the second set (see Materials and Methods). The power is estimated by running a third set of simulations, with similarly varying mutation rates as in the second set, but with a beneficial mutation with selection coefficient 2N e s = 200 arising at 0.08 coalescence units in the past. The selected site is placed in the middle of the simulated region. In both cases, the null distribution of the test statistic is obtained using simulations with a constant high mutation rate of 0.005 and no selective sweeps.
If all sites are used for inference (CLR3), the power is close to 1 irrespective of the mutation rate. However, the FPR increases rapidly with the reduction in the mutation rate, so that at a 60% reduction already half the signals are false positives and at a reduction of 40%, almost all of the signals from the neutral simulations are false positives (Fig. 4). This explains the apparently constant power. The reason for the increase in FPR with decreasing mutation rate is the reduction in the proportion of polymorphic sites relative to all other sites, which replicates what is expected after a selective sweep (Fig. 1c). In contrast to CLR3, the power of both CLR1 and CLR2 reduces with the reduction in mutation rate (Fig. 4). For CLR1, this reduction in power is due to the reduced SNP density. The power for CLR1 is only 80% to begin with and drops to 55% at a reduction in mutation rate by 50%. CLR2 performs much better: the power to detect a sweep is still at 80% with a mutation rate reduction of 50%. The FPR for both CLR1 and CLR2 stays at or below the expected 5% level, as predicted, as decreasing mutation rate does not affect the relative proportion of polymorphic sites to fixed differences. In fact, the tests become extremely conservative when a mutation rate that is too high is used to obtain the null distribution of the composite likelihood ratio. This is because the distribution of the composite likelihood ratio is not invariant with respect to the number of SNPs included in the analysis. Including many more SNPs for generating the null distribution (as a consequence of a higher mutation rate) than used in the analyses of the data will result in a conservative test.

Robustness to population bottlenecks
We simulated several bottleneck scenarios, varying onset, duration and strength of the bottleneck (Fig. 5a) and calculated the FPRs for the three sweep statistics (CLR1-3). The background SFS is calculated from neutral simulations under the respective bottleneck model. Critical values for a 5% significance level were obtained from simulations with a constant size population. For each bottleneck scenario, we adjusted mutation rate, recombination rate and split time to the out-group, so that the expected number of SNPs as well as divergence from the out-group is comparable for all bottleneck models and for the constant size model (see Methods). This is equivalent to adjusting mutation rate and recombination rate in the simulations used to obtain critical values to match the observed data. In a scenario with recent (onset = 0.004 or 0.04) and strong (strength = 0.05) or intermediate (strength = 0.1) bottlenecks, this generates a large proportion of false positives (>87%) if the population size is assumed to be constant (Fig. 5b). The proportion of false positives is smaller if the bottleneck is old, as most lineages coalesce before or during the bottleneck. This is true for all three CLR statistics. However, by including invariant sites or fixed differences in the CLR framework, we increase robustness to bottlenecks whenever the chance of surviving the bottleneck is relatively small, for example when the bottleneck is strong (5%) or intermediate (10%) and has a long duration (0.2). We also conducted simulations under the bottleneck scenarios of Fig. 5, but also varied mutation rate relative to the background mutation rate. We observe the same qualitative relationship as in Fig. 4. In particular, CLR1 and CLR2 consistently show decreasing levels of FPR with decreasing mutation rate, whereas CLR3 consistently shows increasing levels of FPR with decreasing mutation rate (Fig. S4).
As a specific example, European humans are assumed to have experienced a bottleneck during colonization of Europe. Estimated bottleneck parameters (Lohmueller et al. 2011b) indicate a relatively recent, short, but strong bottleneck (onset = 0.055, duration = 0.02, strength = 0.05). Simulating data under this scenario results in a proportion of false positives of 0.21 for CRL1 and CLR2 and 0.24 for CLR3, suggesting that constant population size is not a suitable demographic model for calculating significance thresholds for any of the three CLR tests.
False positives due to background selection are prevented by including a B-value map A strong reduction in diversity relative to divergence in regions of the genome can be caused not only by selective sweeps, but also by the effects of deleterious mutations on linked neutral variation, that is background selection. We adapted SWEEPFINDER to enable the inclusion of genomewide estimates of this effect, the B-value map, to account for this type of variation. To evaluate the method, we simulated a genomic region with increased background selection, that is a local reduction in diversity due to background selection (Figs S5, 6 and Methods). The background SFS used to calculate the CLR statistics was based on otherwise identical neutral simulations. To evaluate power in the presence of background selection, we simulated data with both background selection and a recently completed selective sweep located in the middle of the sequence (Fig. 6). The nominal FPR, which is used to determine the nominal significance level, was estimated from neutral simulations without background selection.
The HKA test and the uncorrected CLR2 and CLR3 cannot distinguish background selection from selective sweeps, as is evident from the nearly 100% false positives under our strong background selection scenario (Fig. 7a). If only polymorphic sites (CLR1) are used, the test does not suffer from an elevated level of false positives, indicating that CLR2 and CLR3 mainly pick up on the diversity reduction. However, if the diversity reduction due to background selection is factored in using a B-value map, the statistics return to the desired behaviour in that the FPR corresponds to the nominal significance level, while maintaining increased power as compared to CLR1. The same results are found for simulations with background selection and a recent population bottleneck (Fig. S7), assuming bottleneck parameters that were estimated for European humans (Lohmueller et al. 2011b). False-positive rate (FPR) and power with reduced mutation rate. FPR and power, at a nominal significance level of 5%, as a function of the reduction in mutation rate of the sequence under investigation, relative to the mutation rate of the sequence that is used to calculate the background site frequency spectrum. Both power and FPR are calculated by assuming a nominal significance level that is derived from simulations with no reduction in mutation rate (relative reduction = 1). Each 100 kb simulated region is scored as significant if it contains at least one significant outlier CLR at the 5% level.

Analysis of a human genetic variation data set
We screened the data from nine unrelated European individuals sequenced by Complete Genomics (Drmanac et al. 2010) for selective sweeps to prove the utility of our improvements to SWEEPFINDER. We compare the composite likelihood ratio across the whole genome, calculated using only polymorphic sites (CLR1), with our new approach by including fixed differences with respect to chimpanzees into the calculation (CLR2). To account for varying diversity across the genome due to background selection, we also incorporate the B-value map from McVicker et al. (2009) into the calculation of CLR2, henceforth referred to as CLR2B.
Due to the complex human demography and the added complication of background selection, we do not calculate critical values, but report the 0.2% most extreme regions in Table 1. This approach has previously been used in other selection scans (e.g. Voight et al. 2006) under the argument that it is an outlier approach, although we notice that no formal testing has been carried out here or in Voight et al. (2006) to determine the degree to which the most extreme values indeed are outlying with respect to some parametric distribution. We note however that, based on neutral simulations under a simple bottleneck model with parameters taken from Lohmueller et al. (2011b), we would expect 8 sweep signals genomewide above the CLR2B threshold of 270, suggesting 33 true positives amongst our 41 candidates in Table 1.
The strongest sweep signal is on chromosome 4, 33.6 Mbp, a region without any annotated genes. The closest gene, ARAP2, is 2.15 Mbp downstream from the CLR2 peak. This sweep region has a B-value close to one and a strong reduction in diversity relative to divergence. The peak in CLR1 shows that this region is characterized by a sweep-like site frequency spectrum. This region was also listed as a candidate region in LD-based (Voight et al. 2006;Wang et al. 2006;Kimura et al. 2007;Sabeti et al. 2007) and SFS-based sweep scans (Carlson et al. 2005;Kelley et al. 2006;Williamson et al. 2007). The gene with the strongest CLR2B signal is KIAA1217, which was suggested to affect lumbar disc herniation susceptibility (Karasugi et al. 2009). The gene is also an outlier for haplotype-based sweep statistics for detecting incomplete soft or hard sweeps, in an African population (Ferrer-Admetlla et al. 2014). This may suggest that the variant is fixed, or at very high frequency in Europe, but still polymorphic in Africa. Another gene in one of the outlier regions, HERC2, is known to modulate iris colour and blonde hair (Wilde et al. 2014). This candidate has previously been identified in a screen for population-specific sweeps using XP-CLR (Chen et al. 2010). Analyses of ancient DNA suggest that strong selection has been operating on

BGS BGS+Sweep
Only polymorphic sites Polymorphic sites and fixed differences All sites Polymorphic sites and fixed differences, corrected All sites, corrected   HERC2 in western Eurasia during the past 5000 years (Wilde et al. 2014). About half of our outlier regions in Table 1 overlap with at least one candidate region of previous sweep scans in humans (Akey 2009), and most of them are also outlier regarding CLR1. However, there are some notable exceptions: one example is the sweep region on chromosome 7, at 72.6 Mbp, with the genes BCL7B, FZD9 and BAZ1B. This region has a small CLR2B percentile rank of 0.0005, but a much larger CLR1 percentile rank (0.071), and is not listed in Akey (2009).
In conclusion, we show that CLR2B shows enrichment for previously detected candidates, but also identifies novel sweep signals. These previously undetected sweeps are likely to be enriched for sweeps that started between 0.2 and 0.8 N e generations ago and thus escaped detection with LD-, F ST -or SFS-based methods.

Discussion
We evaluated the performance of a composite likelihood ratio test for detecting selective sweeps (Nielsen et al. 2005) when including fixed differences in the likelihood ratio in addition to SFS information, using extensive simulations. We show that there can be a marked increase in power as well as a reduction in FPR for a number of different scenarios in several different models of mutation rate variation, population bottlenecks and background selection. We also show that estimates of the strength of background selection can be included into the framework, to prevent false positives in regions with strong, long-term background selection. By applying the method to human genetic data, we detect novel regions that are not identified as candidate regions with the standard SWEEPFINDER approach.

Using invariant sites increases power and robustness
Given that both diversity and divergence change proportionally with mutation rate, we integrate variation in mutation rates by including a measure of divergence to an out-group species. More specifically, we include sites that are not polymorphic within the species under investigation, but differs from an out-group sequence, that is inferred fixed differences. If the SWEEPFINDER CLR is calculated including all sites (CLR3), variation in mutation rates can create false positives (Fig. 4). However, if only fixed differences are added to the SFS (CLR2), the power, but not the FPR, increases. This strongly suggests using CLR2 instead of CLR3 when out-group information is available.
Furthermore, including invariant sites can increase robustness to certain bottleneck scenarios if the bottleneck is of intermediate to high strength, but not too

Background selection as a null model for sweep detection
What is often neglected in previous discussions of diversity-based sweep detection methods is variation in diversity across the genome that is not caused by variation in mutation rate (or conservation level), but by variation in background selection, that is by the effect of deleterious mutations on linked neutral variation (Charlesworth et al. 1993;Hudson & Kaplan 1995;Charlesworth 2012;Cutter & Payseur 2013). A locally increased level of background selection will lead to a reduction in diversity similar to that expected after a selective sweep. As data sets and methods for estimating the effect of background selection for each position in the genome are becoming available (McVicker et al. 2009), the objective of developing methods for detecting positive selection that can take background selection into account is becoming tenable. We present the first such method by including a map of predicted B-values in the calculation of the CLR. McVicker et al. (2009) provide such a Bvalue map for humans by defining functional elements based on mammalian sequence conservation, and fitting parameters to phylogenetic data. Therefore, reductions in neutral diversity in regions of the human data do not influence the local estimation of B. Our approach considers a local reduction in diversity as evidence for a selective sweep only if it is not also predicted by a local drop in B-values, that is background selection is our evolutionary null model (Cutter & Payseur 2013). We simulated background selection levels typical for humans (McVicker et al. 2009), and by accounting for background selection, we could effectively prevent false positives without loosing power. If one does not account for background selection, the proportion of false positives is large and similar to that of a HKA test (Fig. 7a).

Application to human data
Finally, by applying our method to human genetic variation data, we show that the new method detects novel regions that were not identified as candidates using the standard SWEEPFINDER approach. Based on our simulations, we would expect those regions to be enriched for old selective sweeps that started between 0.2 and 0.8 N e generations ago, a time range where the power of other SFS-based, F ST -and LD-based methods is low (Sabeti et al. 2006). Interestingly, the strongest signal we find, which has been missed by most previous scans, is near KIAA1217, a gene affecting lumbar disc herniation susceptibility. We speculate that the selection in this region may possibly be related to changes in human muscular-skeletal function subsequent to the evolution of erect bipedal walk. Increased risk of lumbar disc herniation is a likely consequence of bipedal walk. We may still be evolving to optimize muscular-skeleton functions after this recent, radical change in skeletal structure and function.

Data accessibility
We did not generate any new data set for this study. The SWEEPFINDER2 (DeGiorgio et al. 2015) software that was used for all CLR calculations is freely available at www.personal.psu.edu/mxd60/sf2.html. The human polymorphism data, the divergence to chimp data, the B-value map and the recombination rate data that we used for running SWEEPFINDER2 are available from the Dryad Digital Repository: doi:10.5061/dryad.23d0f.

Supporting information
Additional supporting information may be found in the online version of this article.