THE DISTINCTIVE FOOTPRINTS OF LOCAL HITCHHIKING IN A VARIED ENVIRONMENT AND GLOBAL HITCHHIKING IN A SUBDIVIDED POPULATION

Authors

  • Nicolas Bierne

    1. Université Montpellier 2, Place Eugène Bataillon, 34095 Montpellier, France
    2. CNRS, Institut des Sciences de l’Evolution UMR5554, Station Méditerranéenne de l’Environnement Littoral, 1 Quai de la Daurade, 34200 Sète, Montpellier, France
    3. E-mail: n-bierne@univ-montp2.fr

Abstract

Loci with higher levels of population differentiation than the neutral expectation are traditionally interpreted as evidence of ongoing selection that varies in space. This article emphasizes an alternative explanation that has been largely overlooked to date: in species subdivided into large subpopulations, enhanced differentiation can also be the signature left by the fixation of an unconditionally favorable mutation on its chromosomal neighborhood. This is because the hitchhiking effect is expected to diminish as the favorable mutation spreads from the deme in which it originated to other demes. To discriminate between the two alternative scenarios one needs to investigate how genetic structure varies along the chromosomal region of the locus. Local hitchhiking is shown to generate a single sharp peak of differentiation centered on the adaptive polymorphism and the standard signature of a selective sweep only in those subpopulations in which the allele is favored. Global hitchhiking produces two domes of differentiation on either side of the fixed advantageous mutation and signatures of a selective sweep in every subpopulation, albeit of different magnitude. An investigation of population differentiation around a locus that strongly differentiates two otherwise genetically similar populations of the marine mussel Mytilus edulis provides plausible evidence for the global hitchhiking hypothesis. Global hitchhiking is a neglected phenomenon that might prove to be important in species with large population sizes such as many marine invertebrates.

How important is local selection in maintaining genetic polymorphism at specific loci? Although this question has always interested population geneticists (Hedrick et al. 1976; Hedrick 1986, 2006), and despite the promise of the genomic era (Luikart et al. 2003; Hedrick 2006), the answer remains a matter of debate.

On one hand, several arguments suggest that the proportion of polymorphisms maintained by local adaptation in a heterogeneous environment should be low: (1) theory predicts that the conditions under which such polymorphisms are protected from fixation are restricted (Maynard Smith and Hoekstra 1980; Hedrick 1986; Kawecki and Ebert 2004), (2) although well-studied examples exist (Koehn et al. 1980; Berry and Kreitman 1993; Lenormand et al. 1999; Hoekstra et al. 2004), they remain scarce (Hedrick 2006), (3) in some cases, alternative nonadaptive evolutionary processes that cause similar departures from neutral expectations are difficult to reject (e.g., history, demography, reproductive isolation [Charlesworth et al. 2003; Hofer et al. 2009]), (4) some putative signatures of local selection on specific molecular markers have turned out to be more complex when investigated further (McDonald et al. 1996; Bierne et al. 2003b), (5) even when local selection maintains phenotypic differences it does not necessarily produce genetic differentiation at most of the genes underlying the trait (Le Corre and Kremer 2003), and those that do indeed exhibit genetic structure are expected to affect neutral variation on a very small chromosomal scale (Charlesworth et al. 1997; Barton 2000) and for short periods (Miller and Hawthorne 2005).

On the other hand, the literature is full of observations that population differentiation at single loci or subsets of loci is substantially higher than the estimated genomic average, and these observations are typically attributed to locally variable selection (Beaumont 2005; Storz 2005; Nosil et al. 2009). Due to technical advances in scoring large numbers of polymorphisms, the approach of scanning genomes for so-called outlier loci (with anomalously high levels of differentiation) has become a standard of population genetics (Luikart et al. 2003), and since the original test of Lewontin and Krakauer (1973), the sophistication and power of tests for outliers have improved dramatically (Beaumont and Nichols 1996; Vitalis et al. 2001; Beaumont and Balding 2004; Foll and Gaggiotti 2008). But as refined as these tests become, they cannot tell us the precise form of the selection that is/was responsible for the high differentiation (Faure et al. 2008). With the exception of very dense genome scans recently conducted in model organisms (e.g., humans and flies [Sabeti et al. 2007; Turner et al. 2008]), it is likely that selection has indirectly modified the pattern of diversity at many outlier loci through genetic hitchhiking (Maynard Smith and Haigh 1974; Barton 2000). It would be mistaken to think that the indirect path through which selection affects neutral diversity could be neglected, and that the behavior of a selected locus simply translates with attenuation on its chromosomal neighborhood. Models that have investigated the hitchhiking effect in structured populations have emphasized how spatial variation in genetic diversity at a hitchhiker locus can arise without spatial variation in the sign or strength of selection acting on the selected locus (Slatkin and Wiehe 1998; Barton 2000; Santiago and Caballero 2005). For example, an unconditionally favorable mutation that spreads from its deme of origin to other demes by migration can sometimes enhance population differentiation as measured by FST (Slatkin and Wiehe 1998; Barton 2000). While the favorable mutation spreads from the deme in which it originated to other demes, recombination continues to dissipate associations between the selected allele and the linked neutral variation, and therefore the hitchhiking effect lessens. Spatial differentiation is generated because the frequency of linked neutral alleles is increased more strongly in the birthplace of the favorable mutation than in remote locations in which the mutation arrives by migration. Despite our poor understanding of the evolutionary forces that can drive high FST outlier loci, few studies have tried to identify further the precise nature of the underlying regime of selection (positive vs. disruptive, direct vs. indirect, past vs. ongoing).

In the present study, I evaluate two alternative hypotheses that can account for high FST outliers in species with large population sizes (Wiehe et al. 2005; Faure et al. 2008). The first hypothesis is the typical interpretation, which I call “local hitchhiking in a heterogeneous environment.” Here, selection fixes a new mutant allele in an environmental patch in which it is beneficial and a stable cline in allele frequency is formed at the environmental boundaries with neighboring patches in which the new allele is deleterious (i.e., local selection). The second hypothesis has been relatively neglected as an explanation for high FST outliers in reviews of the topic (Luikart et al. 2003; Beaumont 2005; Storz 2005; Nosil et al. 2009), and is called “global hitchhiking in a structured population.” Here, a new unconditionally favorable mutant allele spreads and fixes in the whole range of a spatially structured population. Only by disentangling the two scenarios can we distinguish the footprints of past selection from true ongoing local adaptation.

I will begin by presenting theoretical predictions for the two hypotheses. I will emphasize the distinctive signatures left by the two selective scenarios and how chromosome walking (i.e., identifying and genotyping a set of physically linked informative polymorphisms in the region of the FST outlier locus) can discriminate between the two hypotheses. Local hitchhiking will be shown to generate a single sharp peak of differentiation centered on the adaptive polymorphism (Charlesworth et al. 1997; Feder and Nosil 2010). Global hitchhiking produces a distinctive footprint: two symmetric peaks of differentiation, one on either side of the fixed advantageous mutation. I will then present a 5-kb chromosome walk performed in the mussel Mytilus edulis at the Elongation Factor 1α (EF1α) gene, in which length polymorphism in the third intron revealed unprecedented population structure in this species and an analysis of sequence polymorphisms established the impact of genetic hitchhiking (Faure et al. 2008). Finally, the results obtained at the EF1α gene will be compared to theoretical expectations. Although the chromosomal region investigated was too small to reveal the two peaks of differentiation, support is obtained for the hypothesis of global hitchhiking. Specifically, genetic differentiation is shown to decrease toward the inferred selected site, and analogous selective sweeps are detected in both populations, but of weaker magnitude in one of the two.

Theoretical Predictions

In this section, I present simple theoretical results to determine whether the two evolutionary scenarios—local hitchhiking in a heterogeneous environment and global hitchhiking in a structured population—have different genomic signatures. I numerically analyzed deterministic models (i.e., infinite effective population size) of between-deme genetic differentiation using a classical linear one-dimensional stepping-stone model composed of n discrete demes. Although genetic drift was included in the simulations, in the main text I focus on deterministic models, which do not generate genetic structure in the absence of selection. This is justified by the experimental results treated in the next section. In particular, panmixia is not rejected for any M. edulis locus studied so far, with the sole exception of the FST outlier EFbis locus. I considered two linked biallelic loci; one is neutral and the other is either under spatially varying selection (local sweep) or under uniformly positive selection (global sweep). Although each scenario has been previously studied (e.g., Barton 2000; Wood and Miller 2006), the aim here is to contrast the outcomes of the two scenarios (Wiehe et al. 2005) to interpret the results presented hereafter.

NUMERICAL MODEL

A simple simulation model was developed to investigate the hitchhiking effect of a selected locus on its neutral neighborhood. I used a classical linear one-dimensional stepping-stone model composed of n discrete demes. The migration rate between demes was m (m/2 in either direction). I considered two biallelic haploid loci with recombination rate r between them. One locus was neutral and the other was under selection. Two types of selection regime were modeled at the selected locus. For the model of local hitchhiking in a heterogeneous environment, allele B has a selective advantage s1 over the alternative allele b in the left part of the stepping stone (habitat 1) whereas allele b has a selective advantage s2 (s1 = s2 when not stated) over B in the right part (habitat 2). For global hitchhiking in a structured population, allele B has a selective advantage s over the alternative allele b everywhere. In both cases, all demes were initially fixed for b and allele B was introduced in one deme (the first deme on the left side of the chain when not stated) at a frequency p0. At the neutral locus, I assumed that one allele, A, was initially at frequency u0 in all the demes of the chain (the other allele, a, being at frequency 1 − u0). Allele B at the selected locus originates on a chromosome carrying A at the neutral locus. Allele A is here called the hitchhiker allele. Genotypic frequencies in each deme at a given generation were deduced from the frequencies of the previous generation after accounting for migration, recombination, and selection. Most of the simulations presented in this article were deterministic. However, random drift was also simulated by a simple procedure of multinomial sampling of genotypes within each deme at each generation. Windows© executables for each of the simulated scenarios are available at the following URL: http://kimura.univ-montp2.fr/~nbierne/NewSimulators.zip. Executables (LocalSweep.exe and GlobalSweep.exe for the local and the global hitchhiking models, respectively) run with companion text files specifying the parameters used (localfile.txt and globalfile.txt). Borland Delphi 4.0 source code is available from the author upon request.
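
The deterministic recursion described in this section can be sketched in a few lines of Python with NumPy. This is an illustrative reimplementation, not the author's Delphi program. Several details are assumptions that the text does not specify: the end demes are treated as reflecting, the life cycle is applied in the order migration, recombination, selection, fitnesses are multiplicative (1 + s for the favored allele, 1 otherwise), and the new B allele is introduced on an A background by rescaling the resident haplotypes by (1 − p0). The function and variable names are hypothetical.

```python
import numpy as np

# Haplotype columns: AB, Ab, aB, ab (B/b = selected locus, A/a = neutral locus)
AB, Ab, aB, ab = range(4)

def migrate(x, m):
    """Stepping-stone migration, m/2 to each neighbour.
    Assumption: end demes are reflecting (would-be emigrants off the chain stay)."""
    left = np.vstack([x[:1], x[:-1]])
    right = np.vstack([x[1:], x[-1:]])
    return (1.0 - m) * x + 0.5 * m * (left + right)

def recombine(x, r):
    """Haploid two-locus recombination: linkage disequilibrium D decays by r*D."""
    d = x[:, AB] * x[:, ab] - x[:, Ab] * x[:, aB]
    out = x.copy()
    out[:, [AB, ab]] -= (r * d)[:, None]
    out[:, [Ab, aB]] += (r * d)[:, None]
    return out

def select(x, w_B, w_b):
    """Haploid selection with per-deme fitnesses w_B (allele B) and w_b (allele b)."""
    w = np.stack([w_B, w_b, w_B, w_b], axis=1)
    y = x * w
    return y / y.sum(axis=1, keepdims=True)

def simulate(n, m, r, w_B, w_b, p0, u0, origin, generations):
    """Return haplotype frequencies, shape (n, 4), after the given number of generations."""
    x = np.zeros((n, 4))
    x[:, Ab] = u0
    x[:, ab] = 1.0 - u0
    # B appears in one deme at frequency p0, on a chromosome carrying A
    x[origin] = [p0, u0 * (1 - p0), 0.0, (1 - u0) * (1 - p0)]
    for _ in range(generations):
        x = select(recombine(migrate(x, m), r), w_B, w_b)
    return x

n, s = 50, 0.05

# Global hitchhiking: B favored everywhere, mutation appears in deme 10 (index 9)
xg = simulate(n, m=0.01, r=0.001, w_B=np.full(n, 1 + s), w_b=np.ones(n),
              p0=0.001, u0=0.05, origin=9, generations=3000)
p_global = xg[:, AB] + xg[:, aB]   # frequency of B in each deme
u_global = xg[:, AB] + xg[:, Ab]   # frequency of the hitchhiker allele A

# Local hitchhiking: B favored in habitat 1 (demes 1-25), b favored in habitat 2
hab1 = np.arange(n) < 25
xl = simulate(n, m=0.3, r=0.001,
              w_B=np.where(hab1, 1 + s, 1.0), w_b=np.where(hab1, 1.0, 1 + s),
              p0=0.001, u0=0.05, origin=0, generations=2000)
p_local = xl[:, AB] + xl[:, aB]
u_local = xl[:, AB] + xl[:, Ab]

print("global: A in deme 10 and deme 40:", u_global[9], u_global[39])
print("local:  A at the two ends of the chain:", u_local[0], u_local[-1])
```

The parameter values follow the figure legends (p0 = 0.001, u0 = 0.05, s = 0.05, r = 0.001, m = 0.01 or 0.3); the numbers of generations are arbitrary choices made only to let each sweep complete.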

HYPOTHESIS 1: LOCAL HITCHHIKING IN A HETEROGENEOUS COARSE-GRAINED ENVIRONMENT

Since the seminal paper by Levene (1953), many models have explored the conditions under which a polymorphism can be maintained in a varied environment (reviews in Maynard Smith and Hoekstra 1980; Hedrick 1986; Kawecki and Ebert 2004). However, fewer studies have investigated the effect of local selection on linked neutral variation (Charlesworth et al. 1997; Miller and Hawthorne 2005; Pollinger et al. 2005; Wood and Miller 2006; Feder and Nosil 2010). Here, I emphasize that the dynamics of the process can be partitioned into two phases: (1) an increase in population differentiation due to local hitchhiking, and (2) the subsequent decline of the differentiation due to recombination and gene flow—that is, introgression between the genetic backgrounds defined by the selected locus. The theory of the second phase is in fact very well developed: it is the effect of a genetic barrier to gene flow (Barton 1979; Petry 1983; Bengtsson 1985). Unfortunately, the theory of genetic barriers has not permeated the literature on local adaptation at the molecular level but is primarily restricted to the hybrid zone literature.

Here, I will consider an environment that is “coarse-grained” (Levins 1968; Slatkin 1973) in that dispersal is sufficiently low relative to the scale of the environmental variation for local adaptation to be stably maintained. Populations are sampled either outside of the cline generated at the environmental boundary where they are fixed for different alleles at the selected locus, or within the cline maintained by a balance between migration and selection, where populations are polymorphic at the selected locus.

The two phases of the process: local hitchhiking and interbackground introgression

To explain the process, I present the results obtained for a population occupying two adjacent environmental patches each composed of 25 demes (Fig. 1). Demes were connected by an appreciable level of migration (m= 0.3) in a one-dimensional stepping-stone model. The model is an extension of the two-deme models studied by Miller and Hawthorne (2005) and Wood and Miller (2006). Demes were initially fixed for the b allele at the selected locus and a locally adapted mutation B appears in the far left deme. The B allele invades the demes of habitat 1 where it is advantageous (Fig. 1A) and stops at the environmental boundary between the two habitats to form a stable cline maintained by a balance between migration and selection. At the linked neutral locus, the A allele was assumed to be linked to the newly arising B allele, and so hitchhikes to high frequency in habitat 1 (Fig. 1A). Allele A will be referred to as the “hitchhiker allele.” At the completion of the local sweep, population structure is created between the two habitats. The formation of the cline at the selected locus constitutes the end of a first phase, local hitchhiking per se, and establishes population differentiation at both the selected and the neutral locus. At this stage neutral differentiation between the two habitats is maximal.

Figure 1.

Local hitchhiking in a heterogeneous environment. Frequency of the selected allele that appeared in the first left deme of a 50-deme stepping stone and is favored in habitat 1 but deleterious in habitat 2 (dotted lines) and frequency of an initially rare neutral allele that hitchhiked with the selected allele (black lines). The number of generations that have elapsed is indicated next to each of the curves. Initial frequency of the new locally adapted mutation: p0= 0.001, initial frequency of the neutral hitchhiker allele: u0= 0.05, migration rate: m= 0.3, selection coefficient: s= 0.05, recombination rate between the selected and the neutral locus: r= 0.001. (A) Phase 1, local hitchhiking. At the end of phase 1 the selected locus forms a stable cline maintained by a balance between migration and selection at the environmental boundary between the two habitats. (B) Phase 2, introgression of the neutral locus through the genetic barrier formed by the selected locus.

Once local hitchhiking is accomplished, the process enters a second phase (interbackground introgression) in which gene flow and recombination result in the homogenization of allele frequencies at the neutral locus (Fig. 1B). The homogenization of allele frequencies is slowed by the genetic barrier to gene flow generated by the selected locus in proportion to linkage between the two loci (Barton 1979). Figure S1 presents the evolution of Δu, the difference in allele frequency between populations outside of the cline at the neutral locus, as a function of time for various values of r. Local hitchhiking is a short phase whereas interbackground introgression can take a very long time. Although Wood and Miller (2006) found that population differentiation decreased rapidly in their two-deme model, spatial structure can substantially delay the homogenization process (Barton 1979; Barton et al. 2007) as observed here.

Introducing finite population sizes can affect the outcomes in at least three ways. First, the population size, N, is expected to be negatively correlated with the magnitude of the hitchhiking effect. But this is mainly because N determines the initial frequency of the favorable mutant (p0= 1/2N), and this is already incorporated in deterministic models (Barton 2000), whereas the stochasticity introduced by random drift in the dynamic of fixation and the decay of linkage disequilibrium remains negligible as soon as N is not too small (Kim and Stephan 2002). Second, random drift decreases the efficiency of selection (Nagylaki 1978) which is expected to decrease the barrier effect imposed by the selected locus on the neutral neighborhood. Introgression is therefore expected to be faster with drift. Finally, and most importantly, when population sizes are finite, the barrier of the selected locus can modify the migration–drift equilibrium at other loci, such that the differentiation does not vanish completely but remains stable at a higher level than the differentiation of unlinked loci (Charlesworth et al. 1997; Feder and Nosil 2010). However, this effect is expected to be much weaker than the genetic structure transiently generated by local hitchhiking, the footprint of which is expected to vanish over time while the equilibrium is reached. All three effects of drift are visible in Figure S2 where the dynamics of Δu are presented for various sizes of deme. As soon as N is not too small, it is clear that drift does not qualitatively affect the outcomes predicted by the deterministic model (see examples in Fig. S2). It is particularly noteworthy that appreciable deviations from the deterministic predictions appear only when N is sufficiently small relative to the strength of migration for genetic structure to be generated even in the absence of hitchhiking (Charlesworth et al. 1997; Feder and Nosil 2010). 
This can be seen on the right of Figure S2 in which the level of differentiation for neutral loci unlinked to the selected loci is shown. Decreasing N not only results in a stronger differentiation between habitats at loci linked to the selected locus but also inevitably at unlinked loci, while the difference between linked (hitchhiker) and unlinked (nonhitchhiker) loci should not be modified substantially (Feder and Nosil 2010). Importantly, in the Mytilus data analyzed below: (1) genetic structure is not observed at nonoutlier loci, and (2) the genetic signature of a selective sweep is still clearly visible in the form of a star-shaped clade of alleles, negative Tajima's D values, and significant coalescent-based neutrality tests (Faure et al. 2008).
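
The drift step mentioned in the numerical model (multinomial sampling of genotypes within each deme at each generation) can be sketched as follows in Python with NumPy. This is a minimal illustration under stated assumptions rather than the author's code: each deme is assumed to contain exactly N haploid individuals, haplotype frequencies are stored as rows of four values (AB, Ab, aB, ab), and the function names are hypothetical. The helper for the initial mutant frequency simply encodes p0 = 1/2N as written in the text.

```python
import numpy as np

def drift_step(x, N, rng):
    """Resample the haplotype frequencies of every deme from N individuals.

    x   : array of shape (n_demes, 4), rows sum to 1
    N   : number of individuals sampled per deme (assumed equal across demes)
    rng : a numpy.random.Generator
    """
    x = np.asarray(x, dtype=float)
    out = np.empty_like(x)
    for i, row in enumerate(x):
        counts = rng.multinomial(N, row / row.sum())  # multinomial sampling
        out[i] = counts / N
    return out

def new_mutant_frequency(N):
    """Initial frequency of a single new favorable mutant, p0 = 1/2N as in the text."""
    return 1.0 / (2 * N)

rng = np.random.default_rng(1)
x = np.tile([0.0, 0.05, 0.0, 0.95], (5, 1))     # five demes, A at u0 = 0.05, no B yet
small = drift_step(x, 100, rng)                 # noticeable sampling noise
large = drift_step(x, 1_000_000, rng)           # close to the deterministic values
print(small[:, 1])
print(large[:, 1])
```

Applying `drift_step` once per generation after the deterministic migration, recombination, and selection steps would give a stochastic version of the model; as N grows the sampled frequencies approach the deterministic ones, which is the sense in which drift becomes negligible "as soon as N is not too small."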

The chromosomal footprint of local hitchhiking

I now investigate the pattern of genetic variation along a recombining chromosome. Figure 2A,B shows the results obtained when populations are sampled outside of the cline, where the selected locus is fixed for one allele. The variation along the chromosome of the allele frequency, u*, of an initially rare hitchhiker allele (u0= 0.05) after a period of introgression is shown in Figure 2A. I deliberately chose u0 to be small to obtain a marked effect. As u* depends on the initial frequency, u0, which is not necessarily shared among loci sampled along a chromosome, such curves will not often be obtained with real data. However, as shown below with the Mytilus data: (1) fitting a model with varying u0 is possible, and (2) loci can be chosen from preliminary data to share a common, preferably low, u0. Because I have chosen such a straightforward and cost-effective strategy for the Mytilus experiments, I here present the theoretical results under this framework of tracking hitchhiker allele frequencies. A more general method allowing the analysis of more or less haphazardly chosen SNPs (e.g., HapMap data) is currently under investigation and will be described elsewhere together with appropriate examples.

Figure 2.

The distinctive chromosomal footprints of local hitchhiking in a heterogeneous environment and global hitchhiking in a structured population. The x-axis represents the distance from the selected locus measured in terms of the ratio between the recombination rate and the selection coefficient (r/s). (A) Local hitchhiking. Frequency in habitat 1 (circles) and habitat 2 (squares) demes of an initially rare neutral allele that hitchhiked with the selected allele after a period of introgression (generation 20,000). Fifty-deme stepping stone, initial frequency of the new locally adapted mutation: p0= 0.001, initial frequency of the neutral hitchhiker allele: u0= 0.05, migration rate: m= 0.3. Curves are fits to the local hitchhiking model (eqs. 3 and 4). (B) Local hitchhiking. Genetic differentiation between the two habitats (filled diamonds) after a period of introgression (generation 20,000). Fifty-deme stepping stone, p0= 0.001, u0= 0.05, m= 0.3. (C) Global hitchhiking. Frequency of an initially rare neutral allele that hitchhiked with an unconditionally favorable mutation, in the population in which the favorable mutation originated (deme 10, dots) and in a population reached by the favorable mutation by migration (deme 40, squares). Fifty-deme stepping stone, initial frequency of the favorable mutation: p0= 0.001, u0= 0.05, m= 0.01. Curves are fits to the global hitchhiking model (eqs. 5 and 6). (D) Global hitchhiking. Genetic differentiation between the two populations (deme 10 vs. deme 40). Fifty-deme stepping stone, p0= 0.001, u0= 0.05, m= 0.01.

Figure 2A shows that the standard signature of a selective sweep is observed in habitat 1 demes, where the new mutation is favored (the result is here portrayed as a peak of the hitchhiker allele frequency, u*). Introgression modifies the signature of the sweep quantitatively—it is softer—but not qualitatively. In habitat 2 demes, the pattern is different. Very close to the selected locus, u* is low because the genetic barrier to gene flow is strong and prevents the introgression of the hitchhiker allele. Far from the selected locus, u* is low because the hitchhiker allele did not reach high frequencies in habitat 1. u* takes the highest frequencies for intermediate recombination rates because u* is sufficiently high in habitat 1 demes and the barrier sufficiently permeable.

Population structure takes the shape of a peak of FST centered on the selected locus (Fig. 2B). Because introgression is faster for a neutral locus that is far from the selected locus (Barton 1979; Bengtsson 1985), the peak of FST becomes narrower and sharper as time elapses after the completion of the local sweep. Variation in the level of introgression along the chromosome is the main cause of the peak-shaped signature, which is similar to that expected following secondary contact between differentially fixed populations (Barton 1979; Wood and Miller 2006). Indeed, the strength of the barrier is expected to be proportional to 1/r over a broad range of conditions (Barton 1979). Figure S3 shows the results obtained when populations are sampled within the cline (where the selected locus is polymorphic). The peak-shaped signature on population differentiation is preserved, even when samples come from the same side of the cline. When standardized by Δp (the difference in allele frequency at the selected locus), the curves of the difference in allele frequency between populations are indistinguishable (Fig. S3B).

HYPOTHESIS 2: GLOBAL HITCHHIKING IN A STRUCTURED POPULATION

Slatkin and Wiehe (1998) first presented a model in which the spread of a favorable mutation from its deme of origin to other demes generates genetic structure at linked neutral markers. Faure et al. (2008) presented a modified version of Slatkin and Wiehe's model, in which two patches of demes were separated by a strong barrier to gene flow, motivated by the mosaic hybrid zone of Mytilus spp. in the eastern Atlantic. Santiago and Caballero (2005) considered hitchhiking in a two-deme model with small population sizes, such that the migration/drift equilibrium generates a strong genetic structure in the absence of hitchhiking. The central feature of each of these cases, as summarized by Barton (2000), is that “The net hitchhiking effect is expected to be much smaller than for a single population because it takes much longer for the allele to spread, giving more time for associations to dissipate. The ultimate result is a local increase in the frequency of linked neutral alleles and hence the generation of spatial differentiation.” Here, I consider a deterministic multideme model, but without a strong barrier to gene flow, and specifically, a one-dimensional stepping-stone model. My aim is to emphasize the main qualitative differences between the global hitchhiking model and the previous scenario of local hitchhiking.

The effect of global hitchhiking on a single locus

Figure 3 shows the propagation of a wave of advance of a favorable mutation and of an initially rare neutral allele that hitchhikes with the favorable mutation. The favorable mutation appeared in the tenth deme of a 50-deme stepping-stone model. In contrast to the previous scenario, demes were connected by a low level of migration (m= 0.01). Once the favorable mutation has fixed everywhere, a spatial gradient in allele frequency is observed at the linked neutral locus. As the wave propagates, linkage disequilibrium between the selected and neutral loci decreases due to recombination, and the hitchhiking effect becomes weaker. Modeling a larger metapopulation (more demes), weaker selection, or higher recombination can result in a situation in which the wave has no effect on the neutral locus at all because linkage equilibrium is reached. This led Barton to conclude that the local increase in the frequency of linked neutral alleles “will be restricted to the immediate neighborhood of the birthplace of the new allele” (Barton 2000). Once the sweep is finished, allele frequencies homogenize at the neutral locus because of migration (Fig. 3). But the time needed to reach homogenization solely by migration can be very long (Barton et al. 2007), and so a cline in neutral allele frequencies can persist for long periods. Figure 3 shows that the “hitchhiking-cline” is still clearly visible 20,000 generations after the sweep. Indeed, under the parameter values required to produce “hitchhiking-clines” (many demes = broad distribution/low dispersal), neutral diffusion is always slow. Therefore, an interesting result of this model is that a cline in allele frequency can be generated by global hitchhiking in a structured population. However, clines restricted to a subset of the markers studied are usually interpreted as evidence of local selection (i.e., selection in action) instead of a footprint of past selection. Spurious “hitchhiking-clines” represent a plausible and unappreciated form of false positives for population biologists.

Figure 3.

Global hitchhiking in a spatially structured population. Frequency of the favorable mutation that appeared in deme 10 of a 50-deme stepping stone (dotted lines) and frequency of an initially rare neutral allele that hitchhiked with the favorable mutation (black lines). Generation times are indicated next to the curves. Initial frequency of the favorable mutation: p0= 0.001, initial frequency of the neutral hitchhiker allele: u0= 0.05, migration rate: m= 0.01, selection coefficient: s= 0.05, recombination rate between the selected and the neutral locus: r= 0.001.

These results are deterministic, but when population sizes are small and migration rates are low, genetic drift can generate genetic structure in the absence of hitchhiking. When neutral structure is strong, global hitchhiking is expected to decrease population differentiation (Santiago and Caballero 2005) rather than increasing it, and so it will not generate high FST outliers. Some stochastic simulations were also performed to investigate the intermediate case, that is, when population size is finite, but large enough relative to the migration rate so that genetic structure is not generated in the absence of hitchhiking (the situation that interests us in this article). The global picture remains essentially the same (Fig. S4). However, drift introduces stochasticity in the decay of linkage disequilibrium between the neutral and the selected locus, and so the spatial pattern immediately after the sweep can be modified in various ways. The “hitchhiking-cline” can either be narrower or flatter, closer or farther from the birthplace of the favorable mutation, and sometimes the decay of linkage disequilibrium can be nonmonotonic leading to a polyclinal pattern (Fig. S4). However, after a period of homogenization by neutral diffusion “hitchhiking-clines” tend to converge toward a similar shape that resembles the deterministic expectation.

The chromosomal footprint of global hitchhiking

I now investigate the pattern of genetic variation along a recombining chromosome. Figure 2C shows the variation along the chromosome in the frequency of an initially rare hitchhiker allele in two well-separated demes (deme 10 in which the favorable mutation originated and deme 40), and Figure 2D shows the variation of FST between the two demes. As for local hitchhiking, I deliberately chose u0 to be small to obtain a marked effect, but again this need not be assumed for the analysis of real data. As for local hitchhiking, the hitchhiking effect is visible in the deme in which the favorable mutation originated. However, in contrast to local hitchhiking, global hitchhiking produces a similar pattern in both demes, although the hitchhiking effect is much stronger in the deme in which the favorable mutation originated than in the deme reached by migration. The consequence is a nonmonotonic variation of the population structure. Maximal differentiation is observed for intermediate map distance, where the hitchhiking effect is strong at the birthplace of the favorable allele but weak in a more remote location (Slatkin and Wiehe 1998). Near the selected locus the hitchhiking effect is ubiquitous. Further away from the selected locus the hitchhiking effect is weak, even in the deme in which the favorable allele originated.
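
The nonmonotonic pattern described in this paragraph can be explored with a short deterministic scan over recombination rates. The Python/NumPy sketch below is an illustration under stated assumptions, not the author's program and not a reproduction of Figure 2: end demes are treated as reflecting, the life cycle is applied as migration, recombination, then selection, the run length of 3000 generations and the grid of r/s values are arbitrary, and differentiation is computed as Wright's FST for two equally weighted demes at a biallelic locus (the estimator behind Fig. 2D is not stated in the text). Function names are hypothetical.

```python
import numpy as np

def global_sweep_scan(r_values, n=50, m=0.01, s=0.05, p0=0.001, u0=0.05,
                      origin=9, generations=3000):
    """Frequency of hitchhiker allele A per deme, for several recombination rates.

    All recombination rates are simulated at once: x[k, i] holds the
    frequencies of haplotypes (AB, Ab, aB, ab) in deme i for rate r_values[k].
    Returns an array of shape (len(r_values), n).
    """
    r = np.asarray(r_values, dtype=float)[:, None]           # shape (R, 1)
    x = np.zeros((len(r), n, 4))
    x[:, :, 1] = u0
    x[:, :, 3] = 1.0 - u0
    x[:, origin] = [p0, u0 * (1 - p0), 0.0, (1 - u0) * (1 - p0)]
    w = np.array([1 + s, 1.0, 1 + s, 1.0])                   # B favored everywhere
    sign = np.array([-1.0, 1.0, 1.0, -1.0])                  # effect of r*D per haplotype
    for _ in range(generations):
        # migration: m/2 to each neighbour, reflecting ends (assumption)
        left = np.concatenate([x[:, :1], x[:, :-1]], axis=1)
        right = np.concatenate([x[:, 1:], x[:, -1:]], axis=1)
        x = (1 - m) * x + 0.5 * m * (left + right)
        # recombination: decay of linkage disequilibrium D
        d = r * (x[..., 0] * x[..., 3] - x[..., 1] * x[..., 2])
        x = x + d[..., None] * sign
        # selection, then renormalization within each deme
        x = x * w
        x /= x.sum(axis=-1, keepdims=True)
    return x[..., 0] + x[..., 1]

def fst_two_demes(u1, u2):
    """Wright's FST for two equally weighted demes at a biallelic locus."""
    ubar = 0.5 * (u1 + u2)
    het = 4.0 * ubar * (1.0 - ubar)
    safe = np.where(het > 0, het, 1.0)                        # avoid 0/0 at fixation
    return np.where(het > 0, (u1 - u2) ** 2 / safe, 0.0)

s = 0.05
r_over_s = np.array([0.002, 0.02, 0.1, 0.3, 1.0, 3.0])
u = global_sweep_scan(r_over_s * s, s=s)
fst = fst_two_demes(u[:, 9], u[:, 39])                        # deme 10 vs. deme 40
for ratio, a, b, f in zip(r_over_s, u[:, 9], u[:, 39], fst):
    print(f"r/s={ratio:<6} u(deme 10)={a:.3f} u(deme 40)={b:.3f} FST={f:.3f}")
```

Under these assumptions the largest FST should fall at an intermediate r/s, with low values both very close to and far from the selected site. Only one side of the selected locus is scanned here; by symmetry the other side would give the second dome.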

DISCRIMINATING LOCAL FROM GLOBAL HITCHHIKING

The global and local hitchhiking models predict similar qualitative effects on a single neutral locus: population differentiation is enhanced and the frequency of the hitchhiker allele is high in one population and lower in another population. This is exactly what has been observed at the EFbis locus in M. edulis. Although Faure et al. (2008) argued that global hitchhiking could leave a distinguishable pattern of asymmetrical frequencies of the hitchhiker allele between populations, asymmetric introgression is in fact easy to generate under the local hitchhiking model. Asymmetric introgression can result from different fitness effects in the two environments (e.g., the new allele can be strongly favored in habitat 1 and slightly deleterious in habitat 2: s1 > s2) or from differences in the sizes of the environmental patches (e.g., introgression should be faster from the larger to the smaller patch [Barton 1986]). Quantitative differences could potentially be found between the global and local hitchhiking scenarios. Under local selection, population differentiation close to the selected polymorphism reaches very high levels that seem hardly achievable with global hitchhiking. If a differentially fixed locus is picked up in an FST scan, local adaptation is likely to be involved. However, FST outliers are rarely very different from the genomic average and are often far from being differentially fixed.

Fortunately, the chromosomal footprints of local and global hitchhiking are distinctive (Fig. 2). (1) Global hitchhiking generates two domes of population differentiation, one on each side of the fixed advantageous mutation (Fig. 2D), whereas local hitchhiking produces a single sharp peak of population differentiation centered on the adaptive polymorphism (Fig. 2A). (2) Under the global hitchhiking model, the characteristic signature of a selective sweep is expected to be observed in every subpopulation (Fig. 2C); it is the strength of the sweep that should differ. Under the local hitchhiking model, however, the signature of a selective sweep is expected to be observed only locally, in populations of the same habitat. In populations of other habitat types, the pattern is expected to be different, with the frequency of hitchhiker alleles decreasing toward the position of the selected locus (Fig. 2A). A decisive test to discriminate the two scenarios is thus to perform chromosome walks, that is, to identify and genotype a set of physically linked polymorphisms in the region of the FST outlier locus. The direct target of selection can be localized in the population in which the selective sweep is strongest (close to the birthplace of the favorable mutation in the global hitchhiking model, or in the habitat in which a new adaptation appeared under the local hitchhiking model). When moving toward this target, population differentiation should increase sharply under the local hitchhiking model, whereas it should decrease under the global hitchhiking model.
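The dome-shaped footprint of global hitchhiking can be illustrated with a small numerical sketch. This is not the simulation model used for Figure 2. It assumes the classical single-population approximation for the post-sweep frequency of a hitchhiker allele, u* = u0 + (1 − u0) ε^(r/s), and it represents the weaker sweep in the remote deme by scaling s by a factor alpha < 1, which is an illustrative assumption only. All function names and parameter values are hypothetical.

```python
def hitchhiker_freq(u0, eps, r, s):
    """Assumed post-sweep frequency of a hitchhiker allele at map distance r."""
    return u0 + (1.0 - u0) * eps ** (r / s)


def fst_two_demes(p1, p2):
    """FST of a biallelic locus between two equally weighted demes."""
    if p1 == p2:
        return 0.0
    p_bar = 0.5 * (p1 + p2)
    return (p1 - p2) ** 2 / (4.0 * p_bar * (1.0 - p_bar))


def global_fst_profile(distances, u0=0.1, eps=1e-6, s=0.01, alpha=0.1):
    """FST between the deme of origin and a remote deme along one chromosome arm.

    The remote deme experiences an attenuated sweep, modeled here (assumption)
    as an effective selection coefficient alpha * s.
    """
    profile = []
    for r in distances:
        p_origin = hitchhiker_freq(u0, eps, r, s)
        p_remote = hitchhiker_freq(u0, eps, r, alpha * s)
        profile.append(fst_two_demes(p_origin, p_remote))
    return profile


# Map distances from the selected site (r = 0) outward; values are illustrative.
distances = [i * 1e-4 for i in range(101)]
profile = global_fst_profile(distances)
peak_index = max(range(len(profile)), key=profile.__getitem__)
print("FST at the selected site:", profile[0])
print("distance of maximal FST:", distances[peak_index])
```

Under these assumptions, FST is zero at the selected site, rises to a maximum at an intermediate distance, and falls again further away, which is one arm of the twin-dome pattern described above.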

The simulation work above was designed to aid the interpretation of the Mytilus data presented below. It is important to understand that the explanations are relevant only when the level of genetic differentiation at neutral nonoutlier/nonhitchhiker loci is low. When the basal neutral genetic structure is high, global hitchhiking can decrease population differentiation (Santiago and Caballero 2005) and potentially generate low-FST outliers (i.e., adaptive introgression, Pialek and Barton 1997). Local adaptation is expected to always generate a peak of FST, which tends toward an equilibrium when Nem is small and a basal neutral genetic structure is generated (Charlesworth et al. 1997; Feder and Nosil 2010). However, because the neutral variance of FST is expected to increase with the level of differentiation (Lewontin and Krakauer 1973), the chromosomal region in which FST exceeds the neutral envelope is expected to remain small whatever Ne. In addition, one can imagine more complex scenarios that have not been addressed here. Analytical approximations have been derived (Barton 2000) that could be used to obtain more general results, but the major qualitative differences between the two scenarios are unlikely to change provided the basal neutral genetic structure is low. Unless we can confidently identify the true target of selection, investigating how the genetic structure varies along the chromosomal neighborhood of an FST-outlier is unavoidable if we wish to understand the nature of the selection responsible for the unusual differentiation.

Materials and Methods

SEQUENCING THE EF1α GENE IN M. EDULIS

Using the sequence data of Faure et al. (2008), indel polymorphisms were chosen to design allele-specific PCR primers in the 3′ region of intron 3 (1764-F1 and 1778-F2 in Table S1) and in the 5′ region of intron 2 (985-R1 and 987-R2 in Table S1) of the EF1α gene. Allele-specific PCR primers amplify hemizygous DNA segments from a heterozygous template (Ruano and Kidd 1989) and allow direct sequencing of the PCR product. After presence/absence genotyping with both primers, heterozygous individuals (those amplified with the two primers) were chosen from sample LU (Lupin, Charente-Poitou, France), an M. edulis population from the Bay of Biscay. A standard PCR reaction was used to amplify the 5′ region of the gene using a forward primer designed at the beginning of the first exon (25-F in Table S1). The 3′ region was amplified with a long range PCR protocol using the 5′ PCR Extender System kit (5 PRIME, Gaithersburg, MD) and a reverse primer designed at the 3′ end of the mRNA in the 6th exon (4772-R in Table S1). PCR products were treated with ExoSAP-IT (USB Corp, Cleveland, OH) and subjected to a cycle sequencing reaction with the Big Dye Terminator version 3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA) using forward and reverse primers and, for the 3′ region, various other primers designed from the mRNA sequence (Table S1). Samples were sequenced with an ABI PRISM 3130 XL Genetic Analyzer (Applied Biosystems).

GENOTYPING INDELS AND NUCLEOTIDE POLYMORPHISMS

The EFbis locus has been identified as a high FST outlier locus between M. edulis populations of the North Sea and M. edulis populations of the Bay of Biscay. Samples LU (Lupin, Charente-Poitou, France) and WS (Wadden Sea, Holland), the two representative samples of these regions already used by Faure et al. (2008), each composed of 47 individuals (i.e., 94 chromosomes), were used for a precise estimation of allele frequencies at nine loci along the EF1α sequence. Five were indel polymorphisms and four were nucleotide polymorphisms. They were chosen on genealogical grounds: either to map to the base of the hitchhiker lineage in the genealogy (the most informative polymorphisms for studying genetic hitchhiking, Barton 1998) or to map to the internal branch of the genealogy, as does the EFbis locus initially identified as an FST outlier. They were also chosen to be homogeneously distributed along the EF1α gene and to be easy to screen. Taking advantage of the critical importance of a perfect match between the 3′-most base of a PCR primer and its template, allele-specific PCR primers were designed that selectively amplify only one allele (Table S1). Homozygous genotypes amplify with only one of the two primers, whereas heterozygous genotypes amplify with both. When possible, consecutive nucleotide polymorphisms were used to facilitate selective amplification of the two alleles. This technique was used for the four nucleotide polymorphisms and also for the two indels, within introns 2 and 3, used for allele-specific sequencing. For the remaining three indel polymorphisms, genotyping was performed by simple length difference after acrylamide gel electrophoresis. The fluorescent dye 5′ end-labeled-primer technique was used, with dye 6-FAM (Sigma Genosys, Woodlands, TX). Gels were scanned in a FMBIO II fluorescence imaging system (Hitachi Instruments, Yokohama, Japan) at 505 nm.

DATA ANALYSES

Sequence alignment was performed with ClustalW (Thompson et al. 1994) in the BioEdit interface (Hall 1999) and verified by eye. Two recombinant alleles were easily identified. Recombinant alleles were first discarded to reconstruct the most parsimonious genealogy with the Genetree software (Bahlo and Griffiths 2000). In a second analysis, recombinant alleles were retained to reconstruct the recombination history with the Beagle software (Lyngsø et al. 2005). The tree and the recombination history were then synthesized into a single graph that was scaled with a molecular clock calibrated with the divergence to the sister species M. trossulus and the trans-Arctic interchange (Gérard et al. 2008). Genotype datasets were analyzed with the Genetix software (Belkhir et al. 2002). FIS and FST were estimated by the f and θ estimators of Weir and Cockerham (1984), respectively, and their departure from zero was tested by permutations. An FST outlier test of selection was performed with the method of Foll and Gaggiotti (2008).

FITTING HITCHHIKING MODELS

The local hitchhiking model assumes a selective sweep in one environmental patch of populations, in which the new mutation is favored, followed by introgression between the two backgrounds defined by the genotypes at the selected locus. At the end of the local sweep, before introgression, hitchhiking has occurred in one patch (population/habitat 1, North Sea), where the frequency of the hitchhiker allele at the neutral locus can be approximated by single population expectations (Barton 2000), while nothing has yet happened in the other patch (population/habitat 2, Bay of Biscay):

image(1)
image(2)

where u*j,i is the frequency of the hitchhiker allele at locus i in population j, u0,i is the initial frequency of the hitchhiker allele prior to the sweep (assumed the same in both populations), ɛ is the initial frequency of the favorable mutation (or more exactly the frequency from which the favorable mutation started to behave deterministically [Barton 2000]), xi is the physical position of the locus in Kb (x = 0 is the beginning of the EF1α gene), a is the physical position of the selected locus, ρ is the recombination rate, and s is the selection coefficient. Combining these frequencies with equations (80) and (81) of Wood and Miller (2006) and a gene flow factor to account for the barrier to gene flow generated by the selected locus (Barton 1979; Bengtsson 1985), the frequency of the hitchhiker allele after a period of introgression can be approximated by the following two equations:

image(3)
image(4)

where ψ is a constant proportional to the migration rate and the time elapsed since the local sweep. An additional parameter could have been introduced to take into account the asymmetry of introgression. However, it did not prove useful in the present study.

The global hitchhiking model assumes that a selective sweep occurred everywhere, although the hitchhiking effect is attenuated with increasing distance from the birthplace of the advantageous mutation. Barton (2000) gives formulae that predict the net increase in frequency of the hitchhiker allele and, implicitly, the spatial pattern produced by the global hitchhiking model. These formulae would be especially useful to analyze a “hitchhiking-cline” (i.e., a neutral cline produced by global hitchhiking, Fig. 3 and Fig. S4). Here, to analyze the two-patch structure in mussels (Bierne et al. 2003a), it is straightforward to fit the data to single population expectations (Barton 2000) by introducing a parameter that accounts for the attenuation of the hitchhiking effect:

image(5)
image(6)

where α is a parameter that measures the attenuation of the hitchhiking effect in population 2.

A least-squares method was used to find the parameter values that minimize the sum of the squared deviations between observed and predicted allele frequencies using a simulated annealing algorithm (Kirkpatrick et al. 1983) programmed in a Mathematica (Wolfram 1996) notebook.
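The fitting procedure can be sketched in a few lines. The block below is a minimal, hand-written simulated annealing routine in Python, not the Mathematica notebook used in this study. It assumes the single-population approximation u* = u0 + (1 − u0) ε^(ρ|x − a|/s) as the predicted frequency, fixes ε and ρ for simplicity, and is run on synthetic data generated from arbitrary parameter values. All names, step sizes, and the cooling schedule are hypothetical choices.

```python
import math
import random


def predicted_freq(u0, s, a, x, eps=1e-6, rho=1.7e-5):
    """Assumed single-population expectation at position x (Kb)."""
    r = rho * abs(x - a)
    return u0 + (1.0 - u0) * eps ** (r / s)


def sum_of_squares(params, positions, observed):
    """Sum of squared deviations between observed and predicted frequencies."""
    u0, s, a = params
    if not (0.0 < u0 < 1.0 and s > 0.0):
        return math.inf  # reject biologically meaningless values
    return sum((predicted_freq(u0, s, a, x) - obs) ** 2
               for x, obs in zip(positions, observed))


def anneal(cost, start, step_sizes, n_steps=20000, temp0=0.01, seed=1):
    """Minimize cost by simulated annealing with a linear cooling schedule."""
    rng = random.Random(seed)
    current, current_cost = list(start), cost(start)
    best, best_cost = list(current), current_cost
    for k in range(n_steps):
        temp = temp0 * (1.0 - k / n_steps) + 1e-12
        candidate = [v + rng.gauss(0.0, h) for v, h in zip(current, step_sizes)]
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if delta <= 0.0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = list(current), current_cost
    return best, best_cost


# Synthetic "observations" generated from arbitrary, made-up parameter values.
positions = [0.0, 0.6, 1.2, 1.8, 2.4, 3.0, 3.6, 4.2, 4.8]
true_params = (0.10, 0.011, -3.0)  # u0, s, a
observed = [predicted_freq(*true_params, x) for x in positions]

start = [0.3, 0.02, -1.0]
best, best_cost = anneal(lambda p: sum_of_squares(p, positions, observed),
                         start, step_sizes=[0.02, 0.002, 0.2])
print("best parameters (u0, s, a):", best)
print("sum of squared deviations:", best_cost)
```

Tracking the best state separately from the current state guarantees that the returned solution is never worse than the starting point, whatever the random path taken.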

Results

GENETIC VARIATIONS

EFbis is a length polymorphism locus located in the third intron of the EF1α gene in Mytilus mussels. The strikingly strong genetic structure observed at the EFbis locus between M. edulis populations of the North Sea and M. edulis populations of the Bay of Biscay, two regions that are undifferentiated at all other markers screened to date, was demonstrated to be the consequence of a selective sweep (Faure et al. 2008). I repeated the test of selection with a few additional loci, using the recent method of Foll and Gaggiotti (2008). EFbis was the only locus to exhibit a very high posterior probability of being subject to selection (Table 1). In contrast to most FST scans found in the literature, except those in marine species, nonoutlier loci revealed a total absence of genetic structure between the two M. edulis populations, which implies high Nm. This result also suggests that demography did not profoundly affect the genetic structure of nonoutlier loci (see also Boon et al. 2009). The analysis of DNA sequence polymorphism revealed the genetic signature of a selective sweep in the form of a star-shaped clade of alleles, the hitchhiker lineage (Fig. 4). This lineage was nearly fixed in the North Sea and was segregating at a moderate frequency in the Bay of Biscay. Gene genealogies significantly deviated from the standard coalescent in both populations (Faure et al. 2008). The question raised was: is the presence of the hitchhiker lineage in both populations a consequence of introgression under the local hitchhiking model, or a consequence of a softer selective sweep in the Bay of Biscay than in the North Sea, as expected under the global hitchhiking model?

Table 1. FST-outlier test of selection.

Locus name    Observed FST (1)    log10(BF) (2)
EFbis          0.255 ***          2.55
DAMP1          0.038 NS           0.54
EF2            0.023 NS           0.30
Glu-5′         0.020 NS           0.29
DAMP3          0.007 NS           0.02
OCT            0.006 NS           0.03
Glucanase      0.003 NS           0.04
MPI            0.003 NS           0.01
mac-1          0.003 NS           0.02
LAP            0.001 NS           0.02
PGI            0.001 NS           0.03
Mytilin B      0.001 NS           0.02
DAMP2          0.000 NS           0.01
MGD2-int3     −0.002 NS           0.02
EST           −0.004 NS           0
MGD2-int2     −0.008 NS           0.01
Mannanase     −0.01 NS            0

(1) Weir and Cockerham's (Weir and Cockerham 1984) estimator of FST and significance level adjusted for multiple testing. NS: not significant; ***: P<0.001.

(2) Probability of departing from the neutral FST distribution as measured by the decimal logarithm of the Bayes factor, log10(BF), computed by the method of Foll and Gaggiotti (2008). According to these authors, log10(BF)>2 is decisive evidence for selection as the data favor the model with selection over the neutral model at odds of more than hundred to one.
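The log10(BF) values of Table 1 translate directly into odds in favor of the model with selection. The short sketch below makes the arithmetic explicit; the function name is hypothetical and the input values are those reported in Table 1.

```python
def bayes_factor_odds(log10_bf):
    """Odds of the selection model over the neutral model for a given log10(BF)."""
    return 10.0 ** log10_bf


odds_threshold = bayes_factor_odds(2.0)   # "decisive" threshold: 100 to 1
odds_efbis = bayes_factor_odds(2.55)      # EFbis
odds_damp1 = bayes_factor_odds(0.54)      # DAMP1, the second highest value
print(odds_threshold, odds_efbis, odds_damp1)
```

EFbis corresponds to odds of roughly 355 to 1, well above the threshold, whereas the next locus (DAMP1) corresponds to odds below 4 to 1.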
Figure 4.

Structure of the EF1α gene in Mytilus edulis and sample genealogies of three portions of the gene in the population of the Bay of Biscay. Observed mutations are mapped on the genealogies. Nucleotide substitutions are depicted by dots and insertions/deletions by triangles. Positions of the nine polymorphisms chosen for extensive genotyping are indicated by arrows for nucleotide polymorphisms and by triangles for indels. Mutations mapping to the internal branch of the genealogy are in gray and mutations mapping to the base of the hitchhiker lineage are in black. The small striped rectangle of the E4–E6 section is a portion removed from the analysis. The North Sea sample genealogy of the E2–E4 portion (Faure et al. 2008) is given in an inset.

Using long range PCR, the full-length genomic sequence of the EF1α gene was obtained (accession numbers HM214778–HM214780). The gene is approximately 5 Kb long and is composed of six exons. The structure of the gene is shown in Figure 4. DNA sequence polymorphism was analyzed in a sample of the Bay of Biscay, in which the frequency of the hitchhiker lineage is intermediate (Faure et al. 2008). A sample of 15 sequences of the portion comprising exon 1 to exon 2 (accession numbers HM214778–HM214792) and a sample of nine sequences of the portion comprising exon 4 to exon 6 (accession numbers HM214778–HM214780 and HM214793–HM214798) were added to the existing sample of 24 sequences of the region comprising exon 2 to exon 4. Although the individuals sampled in the three cases differed for technical reasons, gene genealogies similar to the one observed by Faure et al. (2008) were obtained with the two new segments (Fig. 4), as expected.

Five indel polymorphisms and four nucleotide polymorphisms were chosen from the sequence dataset for extensive genotyping in the two populations (North Sea and Bay of Biscay). They were chosen (1) to be homogeneously distributed along the EF1α sequence, (2) to be easy to screen (indels or consecutive SNPs), and (3) to carry similar genealogical information. Indeed, the positions of mutations in the genealogy are important. Five mutations are young, mapping to the base of the hitchhiker lineage in the genealogy (Fig. 4). They are the most informative mutations in the context of studying genetic hitchhiking (Barton 1998) and should be favored under the strategy used in this study, that is, tracking informative hitchhiker alleles that share a common, preferably low, u0. However, the deletion responsible for the EFbis length polymorphism proved to be older, mapping to the internal branch of the tree (Fig. 4). This deletion and three additional mutations of this branch were chosen because they carry information similar to that of EFbis, the locus initially identified as an FST-outlier. The two categories of mutations reflect differences in their initial frequencies, u0, before the selective sweep. When selection occurred and modified the genealogy by rapidly increasing the frequency of the hitchhiker lineage, mutations mapping to the base of the hitchhiker lineage were at a lower frequency than mutations mapping to the internal branch. Because u0 is a parameter of the hitchhiking model, results are expected to be different for these two categories of mutation.

Departures from Hardy–Weinberg equilibrium were never significant. The frequency of the hitchhiker alleles increases from the 3′ toward the 5′ region of the EF1α gene in both populations (North Sea and Bay of Biscay). Population differentiation decreases moving toward the 5′ region of the gene (Fig. 5). The patterns of linkage disequilibrium in the two populations are revealed by the haplotype structures at the five polymorphisms that map to the base of the hitchhiker lineage (Fig. S5). They are characteristic of a region affected by a selective sweep due to the fixation of a favorable mutation in the 5′ region of the gene. The haplotype structure also revealed a few recombination events that are best explained by recombination after the sweep, which, in addition to the singletons found in the star-shaped clade (Fig. 4), suggests that the sweep was not very recent. A molecular clock calibrated with the divergence to the sister species M. trossulus and the trans-Arctic interchange gave a rough estimate of the date of the selective sweep of about 25 (5–100) thousand years ago.

Figure 5.

Fitting hitchhiking models to allele frequency data of the EF1α gene in populations of Mytilus edulis. (A) Frequency of the hitchhiker allele at nine loci in the sample of the North Sea (circles) and in the sample of the Bay of Biscay (squares). Curves are the fit to the local hitchhiking model (eqs. 3 and 4) with two different u0 values (bold lines: u0= 0.46, thin lines: u0= 0.63). Black symbols correspond to mutations mapping to the base of the hitchhiker lineage in gene trees (Fig. 4) and gray symbols to mutations mapping to the internal branch. (B) Genetic differentiation between the two populations and the best fit of the local hitchhiking model. (C) Frequency of the hitchhiker allele at nine loci in the sample of the North Sea (circles) and in the sample of the Bay of Biscay (squares). Curves are the fit to the global hitchhiking model (eqs. 5 and 6) with two different u0 values (bold lines: u0= 0.1, thin lines: u0= 0.35). Black symbols correspond to mutations mapping to the base of the hitchhiker lineage in gene trees (Fig. 4) and gray symbols to mutations mapping to the internal branch. (D) Genetic differentiation between the two populations and the best fit of the global hitchhiking model.

FITTING HITCHHIKING MODELS

Hitchhiking models were fitted to the observed allele frequencies in the two populations (see Materials and Methods). When fitting the global hitchhiking model (hitchhiking in the two populations, although of different magnitude), the position of the selected locus was inferred to be −3 Kb, 5′ of the start codon of the EF1α gene, or −4.8 Kb, 5′ of the FST outlier EFbis locus. The selection coefficient was inferred to be s = 0.011, the local recombination rate to be ρ = 1.7 cM/Mb, and the initial frequency of the favorable mutation to be ɛ = 5.4 × 10⁻⁶. The hitchhiking effect was estimated to have been 10 times stronger in the population of the North Sea than in the population of the Bay of Biscay (α = 0.1). Such a decrease of the hitchhiking effect implies a very low migration rate between the two populations. Faure et al. (2008) suggested m = 10⁻⁸ to be the maximal value that can explain the data under a two-patch model of population structure. The initial frequencies of the hitchhiker alleles were, from the 5′ to the 3′ region of the gene: u0,1 = 0.32, u0,2 = 0.10, u0,3 = 0.35, u0,4 = 0.13, u0,5 = 0.33, u0,6 = 0.12, u0,7 = 0.37, u0,8 = 0.11, u0,9 = 0.10. The fit to the model was very good (R² = 0.998, P < 0.0001). It remained very good when the initial frequencies were constrained to two values (R² = 0.997, P < 0.0001), and this fit was not significantly different from that of the model with nine u0,i. As expected, initial frequencies of mutations that mapped to the internal branch of the tree were higher than the initial frequencies of mutations that mapped to the base of the hitchhiker lineage. However, differences between mutations of the same type were small and not significant. As a consequence, a graphical representation was possible and shows how similar the results were to the theoretical predictions (Fig. 5). Additional support for the global hitchhiking model was found by fitting a single population hitchhiking model (eq. 5) independently to the frequency of the hitchhiker allele in each of the two populations (North Sea and Bay of Biscay). The position of the selected locus was inferred to be similar in the two populations, at −4.75 Kb in the sample of the North Sea and −3.58 Kb in the sample of the Bay of Biscay, although the r/s ratio was ∼10 times lower in the North Sea, as expected. The inferred positions (−4.75 Kb and −3.58 Kb) were not significantly different from the position inferred in the simultaneous fit (−3 Kb). A graphical representation of the two independent fits is presented in Figure S6.
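To give a feel for the magnitudes involved, the inferred parameter values can be plugged into a simple expectation. The sketch below is illustrative only: it assumes the single-population approximation u* = u0 + (1 − u0) ε^(r/s) with r = ρ|x − a|, converts ρ from cM/Mb to a recombination fraction per Kb, and assumes that the attenuation in population 2 acts by scaling s by α, which is one reading consistent with the ∼10-fold difference in the r/s ratio reported above and not necessarily the exact form of the fitted equations. Function names are hypothetical, and EFbis is placed at 1.8 Kb because it lies 4.8 Kb from a selected locus at −3 Kb.

```python
def cm_per_mb_to_per_kb(rate_cm_per_mb):
    """Convert cM/Mb to a recombination fraction per Kb (1 cM = 0.01, 1 Mb = 1000 Kb)."""
    return rate_cm_per_mb * 0.01 / 1000.0


def hitchhiker_freq(u0, eps, rho_kb, s, a_kb, x_kb, attenuation=1.0):
    """Assumed post-sweep frequency; attenuation < 1 weakens the sweep (assumption)."""
    r = rho_kb * abs(x_kb - a_kb)
    return u0 + (1.0 - u0) * eps ** (r / (attenuation * s))


rho_kb = cm_per_mb_to_per_kb(1.7)   # inferred rho = 1.7 cM/Mb
s, eps, a_kb = 0.011, 5.4e-6, -3.0  # inferred s, epsilon, and position of the selected locus
u0, x_efbis = 0.35, 1.8             # internal-branch u0 (two-value fit) and EFbis position

north_sea = hitchhiker_freq(u0, eps, rho_kb, s, a_kb, x_efbis)
biscay = hitchhiker_freq(u0, eps, rho_kb, s, a_kb, x_efbis, attenuation=0.1)
print(north_sea, biscay)
```

Under these assumptions the sketch returns about 0.94 for the North Sea and about 0.61 for the Bay of Biscay, that is, a lineage close to fixation in one population and at intermediate frequency in the other. These numbers are outputs of the sketch, not values reported in this study.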

A fit to a local hitchhiking model (hitchhiking in one population followed by introgression) was also performed. When initial frequencies were constrained to two values, the position of the selected locus was inferred to be −3.37 Kb, 5′ of the start codon, the selection coefficient was inferred to be s = 0.008, the local recombination rate to be ρ = 0.77 cM/Mb, and the initial frequency of the favorable mutation to be ɛ = 10⁻⁹. These results are similar to those obtained with the global hitchhiking model, which is explained by the fact that both models predict hitchhiking in the North Sea. Indeed, the fit to the North Sea data is equally good with the two models (local vs. global sweep); it is the fit to the Bay of Biscay data that differs (Fig. 5). The ψ parameter that measures the rate of introgression (it is proportional to the migration rate and the time elapsed since the local sweep) converged toward zero, and initial allele frequencies were inferred to be u0 = 0.46 for the five mutations mapping to the base of the hitchhiker lineage and u0 = 0.63 for the four mutations mapping to the internal branch of the genealogy. A graphical representation of the fit is presented in Figure 5. This result means that, under the local hitchhiking model, the high frequency of hitchhiker alleles in the population of the Bay of Biscay would be explained by high initial frequencies rather than by introgression. This is because the frequency of the hitchhiker allele decreases from the 5′ toward the 3′ region rather than increasing, that is, it varies in the opposite direction to that predicted by the local hitchhiking model (Fig. 2A). However, the fit of allele frequencies does not take into account the fact that hitchhiker alleles were chosen on genealogical grounds. The existence of the star-shaped clade of alleles in both populations (Faure et al. 2008, Fig. 4) implies either that the sweep was global or that introgression occurred after a local sweep. To conclude, in addition to being mediocre, the fit to the local hitchhiking model is not consistent with the analysis of gene genealogies. The global hitchhiking model, on the other hand, not only provides a better fit to the allele frequency data but also provides a simple and coherent explanation for the presence of the star-shaped clade of alleles in both populations.

Discussion

In the age of genomics, the study of adaptation in nonmodel species relies chiefly on the use of neutral markers to infer the action of selection on linked loci. Unfortunately, the indirect effect of selection on linked neutral variation is often assumed to resemble the effects of direct selection with an attenuation factor. However, theory has shown that indirect selection is better understood as within-genome variation in the effective population size within populations (Felsenstein 1974; Gillespie 2000; Charlesworth 2009) and variation in the effective migration rate between populations (Barton 1979; Bengtsson 1985; Ingvarsson and Whitlock 2000). Accordingly, strong genetic differentiation at a locus is not necessarily the consequence of ongoing selection but can also be the transient footprint left in genomes by past selection (Slatkin and Wiehe 1998; Barton 2000; Faure et al. 2008). Nevertheless, few studies have investigated further how selection is, or was, responsible for the strong differentiation observed at an FST-outlier locus (Pogson 2001; Wood et al. 2008). The present study provides the first empirical evidence that unusually strong levels of differentiation between populations might be the footprint left by global hitchhiking in a subdivided population, a scenario largely ignored as an explanation for strong population differentiation at specific loci.

TOWARD A METHOD TO DISTINGUISH GLOBAL FROM LOCAL HITCHHIKING

The major result proposed to distinguish between global and local hitchhiking is that, very close to the selected locus, global hitchhiking causes no differentiation (at least once the sweep is complete) whereas local hitchhiking does; hence the double-peaked versus single-peaked pattern. Ideally, symmetric peaks of FST on each side of a valley of diversity would provide the most convincing demonstration of the global sweep phenomenon. However, detecting this signature requires a long chromosome walk, which can be prohibitive in many nonmodel species. Fortunately, some other strong predictions can be tested by analyzing just one side of the sweep. The position of the selected locus can be inferred by standard methods of hitchhiking mapping in the population in which the sweep was strongest. Once the selected locus has been localized, global hitchhiking predicts that geographic differentiation will decrease as one approaches the selected site, whereas local hitchhiking predicts that it will increase. I summarize in Table 2 the strategy one can use to test whether a high-FST outlier locus was caused by local or global adaptation.

Table 2. A practical guide to distinguishing local from global hitchhiking.
Step: What to do
1- Find a locus with an unusually high FST value: Screen as many loci as possible and perform FST outlier tests of selection.
2- Search for evidence of a selective sweep: Sample DNA sequences and perform tests of departure from the mutation–drift equilibrium (e.g., Tajima's D, Fay and Wu's H, LD tests, coalescence-based tests).
 - No departure: Standard signatures of a selective sweep are expected to be erased when local adaptation is old and the equilibrium between migration, indirect selection and drift has been reached (Charlesworth et al. 1997; Feder and Nosil 2010). You are probably quite close to the selected locus, if the locus is not itself the direct target of selection. The FST-outlier can also be a false positive. Nevertheless, a chromosome walk is advisable.
 - Departure in some populations: A very recent local sweep (not followed by introgression) produces the standard signature of a selective sweep only in those populations in which the allele is favored (e.g., Pollinger et al. 2005).
 - Departure in every population: Global and local hitchhiking are both possible. You need to investigate the chromosomal neighborhood of the FST-outlier. Note that departures may be subtle in some populations in which the hitchhiker lineage segregates at intermediate frequencies (Santiago and Caballero 2005; Faure et al. 2008). Go to step 3.
3- Characterize the longest possible region surrounding the locus: You may be lucky enough to have the complete genome of the species or a closely related species available, or you may obtain new genomic information from existing data (e.g., if the locus belongs to a known gene you may amplify the full gDNA sequence from primers designed with the cDNA sequence, as was done in the present study) or you may need to screen a genomic library (e.g., a BAC library, Wood et al. 2008).
4- Identify the position of the selected locus in the “hard-swept” population: Perform a chromosome walk with the sampled population in which the signal of the selective sweep is strongest. This population is interpreted as being the closest to the birthplace of the favorable mutation if hitchhiking was global, or the population of the habitat type in which the new adaptation appeared if hitchhiking was local. Heterozygosity should decrease toward the position of the selected locus (or the frequency of hitchhiker alleles should increase) whatever the scenario.
5- Investigate the “soft-swept” population: Perform a chromosome walk in another population, where the selective sweep is softer.
 - If population differentiation decreases toward the inferred position of the selected locus, hitchhiking was global. Heterozygosity within the population should also decrease (or the frequency of hitchhiker alleles should increase) toward the selected locus.
 - If population differentiation increases toward the inferred position of the selected locus, hitchhiking was local. The pattern of variation within the “soft-swept” population is expected to be complex with the frequency of the hitchhiker lineage being maximal for intermediate map distance from the selected locus (Fig. 2).
6- The icing on the cake: If the chromosome fragment investigated is long enough you may be able to identify the selected locus, and observe either a single peaked (local sweep) or a twin-peaked (global sweep) pattern of differentiation.
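The decision rule of step 5 can be expressed as a simple trend test. The sketch below regresses FST on the physical distance to the inferred position of the selected locus and reads the sign of the slope. It is a minimal illustration with hypothetical function names and made-up FST values, it ignores sampling error, and it is only meaningful on the stretch of chromosome between the marker loci and the selected locus, where the expected trend is monotonic.

```python
def fst_slope_with_distance(positions_kb, fst_values, selected_kb):
    """Least-squares slope of FST against distance (Kb) from the selected locus."""
    distances = [abs(x - selected_kb) for x in positions_kb]
    n = len(distances)
    mean_d = sum(distances) / n
    mean_f = sum(fst_values) / n
    sxx = sum((d - mean_d) ** 2 for d in distances)
    sxy = sum((d - mean_d) * (f - mean_f) for d, f in zip(distances, fst_values))
    return sxy / sxx


def classify_hitchhiking(slope, tolerance=0.0):
    """Positive slope: FST falls when approaching the locus (global); negative: local."""
    if slope > tolerance:
        return "global"
    if slope < -tolerance:
        return "local"
    return "undetermined"


# Made-up example: five markers 3' of a selected locus inferred at -3 Kb.
positions = [0.5, 1.5, 2.5, 3.5, 4.5]
fst_decreasing_toward_locus = [0.10, 0.15, 0.20, 0.24, 0.27]
fst_increasing_toward_locus = list(reversed(fst_decreasing_toward_locus))

print(classify_hitchhiking(
    fst_slope_with_distance(positions, fst_decreasing_toward_locus, -3.0)))
print(classify_hitchhiking(
    fst_slope_with_distance(positions, fst_increasing_toward_locus, -3.0)))
```

In practice the tolerance should reflect the sampling variance of FST, for instance through permutation or bootstrap of individuals, rather than the zero default used here.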

EVIDENCE FOR THE SCENARIO OF GLOBAL HITCHHIKING IN A SUBDIVIDED POPULATION OF M. EDULIS

Among all the loci analyzed to date between M. edulis populations of the North Sea and of the Bay of Biscay, the EFbis locus is the only one for which a strong and significant genetic differentiation has been observed (Table 1). This locus would traditionally have been interpreted as being under local selection (Beaumont 2005; Foll and Gaggiotti 2008). However, the local hitchhiking model provides a poor fit to the observed patterns of variation in diversity versus differentiation in the chromosomal neighborhood of the EFbis locus. For the North Sea population, both the local and the global hitchhiking models predict that diversity has been swept by the fixation of a favorable mutation. The pattern observed in the North Sea suggests that the selected locus is located in the 5′ region, a few Kb upstream of the EF1α gene (Fig. 5). However, population differentiation decreases toward the 5′ region. Furthermore, the pattern observed in the population of the Bay of Biscay points to the same 5′ region for the location of the selected locus when analyzed under the global hitchhiking model. The fit to the global hitchhiking model was very good, whereas the fit to the local hitchhiking model was mediocre and inconsistent with gene genealogies. It is inferred that a mutation with a selective advantage of ∼1% was fixed ∼5 Kb away from the EFbis locus. The sweep would date back to about 25,000 years ago, and the hitchhiking effect was 10 times stronger in the North Sea than in the Bay of Biscay. The global hitchhiking model predicts the existence of a second peak of high FST on the other side of the selected locus. Stronger support would therefore require an analysis of ∼15 Kb in the 5′ region of the EF1α gene. In addition to revealing the symmetric twin peaks of differentiation, such an analysis would allow the selected locus to be identified and characterized. Although demanding in a nonmodel species such as M. edulis, this work is ongoing.

THE RELATIVE IMPORTANCE OF ONGOING-LOCAL VERSUS PAST-GLOBAL ADAPTATION IN EXPLAINING GENOME DIFFERENTIATION

The result obtained with the first FST-outlier locus studied in detail in Mytilus raises the question of the relative importance of the two alternative selective scenarios (i.e., ongoing spatially varying selection vs. past uniformly positive selection) in explaining the 5–10% of high FST-outlier loci usually detected in genome scan experiments. This is a difficult question for which a general answer is unlikely to exist as it depends on the population size, the dispersal rate, the spatial scale investigated, and the history of the species under consideration. Barton (2000) argued that because we know that the rate of global sweeps is rather low (as witnessed by the slow molecular clock), hitchhiking may only be significant if there are many more local sweeps that never show up as species-wide divergence. In Drosophila, the rate of adaptive amino acid substitution in the divergence between species (i.e., global adaptation) is estimated at one adaptive amino acid substitution every ∼500 generations (Smith and Eyre-Walker 2002). This rate seems sufficiently fast to explain (1) the correlation between recombination and nucleotide diversity (Bierne and Eyre-Walker 2004) and (2) the number of genome regions for which diversity has been significantly deflated by a recent, sufficiently strong, selective sweep (Li and Stephan 2006). If local adaptation really proves to be preponderant, these results would suggest that local adaptation does not have the same impact on genome diversity as global adaptation. Furthermore, it might seem more parsimonious to think that most departures from neutrality are footprints left by past selection on genetic diversity rather than evidence of selection in action. The answer should ultimately be found from the detailed analysis of the footprints left by the two kinds of adaptation in genome scans of population differentiation. However, distinguishing the two scenarios requires chromosome walks that are not easy to conduct in nonmodel species as the position and the chromosomal neighborhood of FST outliers are often unknown (Bonin et al. 2006).

At present, the question of the relative importance of the two alternative selective scenarios can only be partly addressed in model species in which many linked loci mapped on a known genome have been screened. In humans and Drosophila, which share a similar history of recent worldwide colonization from Africa, most of the effort to date has been concentrated on the comparison between ancestral and derived populations thought to have recently adapted to new environmental conditions. In humans, for instance, when a population-specific selective sweep is detected, the diversity of other populations is typically unaffected (Sabeti et al. 2007; Coop et al. 2009). In a sense the process has stopped at the end of the first phase of the local hitchhiking model, when introgression has not yet occurred between differentially adapted genetic backgrounds. It is too early to speculate how local adaptation may be imposing a barrier to gene flow on the chromosomal neighborhood; gene flow that in any case has remained weak enough to generate population differentiation on the whole genome (genomic average of FST between continents: ∼8–15%, Coop et al. 2009). It is even possible that some population-specific selective sweeps correspond to unconditionally favorable mutations on their way to spreading worldwide, as in the global hitchhiking model. Recently, adaptation has been studied on a smaller spatial scale within continents (genomic average of FST within continents <5%, Coop et al. 2009; Pickrell et al. 2009). In this case, some loci inferred to be candidates for local adaptation within continents may well prove to have been affected by global hitchhiking in structured populations. Although some exhibit the clinal pattern usually interpreted as evidence for local adaptation (e.g., TLR6, Pickrell et al. 2009), clines may also be produced at a neutral locus under the global hitchhiking model (“hitchhiking-clines,” Fig. 3). Recently, Chen et al. (2010) used a new method to identify genome regions with unusual patterns of population differentiation, some of which could correspond to the global hitchhiking model (for example, see positions 38 to 38.4 Mb of human chromosome 11 in Fig. 7 of Chen et al. [2010]). In any event, the size of human populations has long remained too small for the data to be interpreted with the deterministic model presented here. Genetic structure is pervasive in the human genome, most of it consistent with neutral expectations, and range expansions have likely contributed to exacerbating the observed genetic differences (Hofer et al. 2009). Moreover, the LD-block structure generated by recombination hot spots does not help in identifying gradients of population structure along chromosomes, and selective sweeps have instead generated bounded chunks of differentiation (Sabeti et al. 2007; Williamson et al. 2007; Chen et al. 2010).

Stochastic noise is expected to be less pronounced in Drosophila melanogaster owing to its larger population sizes. Despite intensive investigation of latitudinal clines (Turner et al. 2008) and of the signature of positive selection within populations (Li and Stephan 2006), few studies have investigated the chromosomal footprints of selection under varying environmental conditions (Berry and Kreitman 1993; Schmidt et al. 2008b). As for humans, the differentiation between continents is strong (FST∼15%) and the scale of interest is within-continent (FST < 5%). The results obtained so far suggest that local selection is the true explanation of the genetic signatures, because genetic differentiation is localized on a very restricted portion of the chromosome, forming a peak of differentiation. As an illustration, I have fitted a local hitchhiking model (Fig. S7) to the data of Berry and Kreitman (1993) on the Adh gene in D. melanogaster. Because the selected site responsible for the fast/slow polymorphism is not fixed, the difference in allele frequency at linked silent sites was standardized by the difference observed at the Adh-F/S site, as was done in Figure S3B. Although the first few clines studied in detail proved to conform well to the local selection model, further investigations of other clines may provide evidence for alternative scenarios. Vasemägi (2006) has already pointed out that simple isolation by distance can produce a high proportion of clinally varying loci. Here I suggest an additional explanation, that some clines are the consequence of global hitchhiking in structured populations (“hitchhiking-clines,” Fig. 3). From their genome scans of genetic differentiation, Turner et al. (2008) presented some examples of chromosomal regions exhibiting abnormally strong differentiation. One of these regions, the one surrounding the gene dmrt93b, was dome-like instead of peak-like in North American populations and could potentially fit with the global hitchhiking model. However, the model predicts a second dome of differentiation symmetric about the adaptive substitution (Fig. 4B), which was not noticeable.
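The standardization applied to the Adh data can be illustrated with a minimal sketch. The function name and the allele frequencies below are hypothetical, chosen only to show the arithmetic; they are not the data of Berry and Kreitman (1993). The between-population difference in allele frequency at each linked site is divided by the difference at the selected site, so that a value near 1 means the linked site is as differentiated as the selected site itself and a value near 0 means it is not differentiated at all.

```python
def standardized_difference(p_north, p_south, sel_north, sel_south):
    """Allele-frequency difference at a linked site, scaled by the
    difference at the selected site (hypothetical helper)."""
    delta_selected = sel_north - sel_south
    if delta_selected == 0:
        raise ValueError("selected site shows no frequency difference")
    return (p_north - p_south) / delta_selected


# Made-up frequencies, for illustration only.
selected_site = (0.8, 0.3)  # frequency of the favored allele in each population
linked_sites = {"site_A": (0.7, 0.3), "site_B": (0.5, 0.45)}

for name, (p1, p2) in linked_sites.items():
    print(name, round(standardized_difference(p1, p2, *selected_site), 2))
```

Scaling in this way allows a polymorphic selected site to be compared with the fixed-difference case, in which the denominator would simply equal 1.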

Very recently, whole-genome FST scans have been conducted in silkworm (Xia et al. 2009) and maize (Gore et al. 2009), two species with large population sizes as inferred from their levels of diversity. In the silkworm study (Xia et al. 2009), which compared domesticated and wild varieties, the genomic average of FST was high (∼15%) in accordance with the occurrence of a bottleneck during domestication. The example provided in Figure S7 of Xia et al. (2009) fits the local hitchhiking model very well because a peak of high FST is co-localized with a valley of diversity and of Tajima's D in one of the two subpopulations (the domesticated varieties). In the maize study (Gore et al. 2009), which compared tropical and temperate subpopulations, the genomic average of FST was low (3.8%), which suggests that population sizes are large relative to the migration rate. Peaks of high FST do not always co-occur with valleys of diversity and of Tajima's D, but sometimes a valley of diversity is surrounded by two peaks of high FST (for example, see positions 80–100 of chromosome 7 in Fig. S4 of Gore et al. [2009]). These data are therefore consistent with the global hitchhiking hypothesis. However, the sample size was very small in this study and more detailed investigations would be required to validate this hypothesis, or to infer the relative contributions of local and global adaptation to generating peaks of high FST in the maize genome. Interestingly, a detailed survey of a region encompassing two loci responsible for two closely linked selective sweeps (a major maize domestication locus, Tb1, and a locus involved in flowering time variation, Dwarf8) revealed a peak of FST between tropical and temperate subpopulations between the two loci (Camus-Kulandaivelu et al. 2008). This too is consistent with the predictions of the global hitchhiking model, although the analysis is complicated in this context by selective interference between the two selected loci (Camus-Kulandaivelu et al. 2008; Chevin et al. 2008).
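To make the link between a genomic average of FST and the balance of drift and migration concrete, the sketch below uses Wright's infinite-island approximation, FST ≈ 1/(1 + 4Nm). This approximation is an assumption introduced here for illustration; it is not the deterministic model of this article, and it presumes neutrality and migration/drift equilibrium. The function names are hypothetical.

```python
def fst_two_demes(p1, p2):
    """Wright's FST for one biallelic locus in two equally weighted demes:
    variance in allele frequency among demes over p_bar * (1 - p_bar)."""
    p_bar = (p1 + p2) / 2
    if p_bar in (0.0, 1.0):
        raise ValueError("locus is monomorphic; FST is undefined")
    variance = ((p1 - p2) / 2) ** 2
    return variance / (p_bar * (1 - p_bar))


def nm_from_fst(fst):
    """Invert FST = 1 / (1 + 4Nm), the island-model equilibrium value
    (an assumption, see the text above)."""
    if not 0 < fst < 1:
        raise ValueError("fst must lie strictly between 0 and 1")
    return (1 / fst - 1) / 4


# Genomic average reported for tropical vs. temperate maize (Gore et al. 2009).
maize_fst = 0.038
print(f"FST = {maize_fst}: implied Nm = {nm_from_fst(maize_fst):.1f}")

# A strongly differentiated locus (made-up frequencies) for comparison.
print(f"outlier-like locus: FST = {fst_two_demes(0.9, 0.1):.2f}")
```

With the maize value of 3.8%, the approximation gives on the order of six effective migrants per generation. The figure is only indicative, because the equilibrium conditions it relies on are unlikely to hold exactly.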

GENETIC DRAFT IN STRUCTURED POPULATIONS OF MARINE SPECIES

In model species, dense genome scans performed to date have not provided evidence for the global hitchhiking model, with the possible exception of maize. Why would the first detailed analysis of hitchhiking selection in a structured population of marine mussels strongly suggest global hitchhiking as the most probable explanation? Is this result atypical, or are there some biological characteristics of marine mussels that would favor global hitchhiking? The marine literature has a long history of providing evidence for selection at specific loci (Mitton 1997; Schmidt et al. 2008a). Some have demonstrated correlations between an enzyme polymorphism and an environmental variable (Koehn et al. 1980; Johannesson et al. 1995; Powers and Schulte 1998; Schmidt and Rand 2001), others revealed different levels of population structure between allozymes and nuclear DNA markers (Karl and Avise 1992; Pogson et al. 1995; Lemaire et al. 2000; Riginos et al. 2002; Silva and Skibinski 2009), and FST outliers are now discovered at an impressive rate (Hauser and Carvalho 2008; Schmidt et al. 2008a). Unfortunately, the question of whether selection acted directly on the polymorphisms screened or indirectly through genetic hitchhiking has not been addressed rigorously (Bierne et al. 2003b). As the present study illustrates, however, interpretations could be much more diverse and complex, if not unanticipated, under indirect than direct selection. Many marine species possess unique biological characteristics that affect their genetic diversity, structure, and evolution including high fecundities, large population sizes, external fertilization with broadcast spawning, and extended larval dispersal. Although these distinctive features are well recognized when genetic data are interpreted under a neutral framework, their consequences on the genetics of adaptation have not been thoroughly addressed (Hilbish 1996). Because some marine animals such as mussels or oysters have enormous population sizes (Sauvage et al. 2007; Boon et al. 2009), they are good candidates for the genetic draft model of Gillespie (2000). This model proposes that in large populations, neutral variability is recurrently swept by hitchhiking selection such that the mutation/drift equilibrium is hardly ever reached. How would the genetic draft model translate in a spatially or genetically structured population? What proportion of the genome should depart from the migration/drift equilibrium because of recurrent hitchhiking events? Among the unknown parameters required to answer these questions will be the relative importance of local versus global adaptation. Although local adaptation seems preponderant at the spatial and time scales investigated in humans or flies, this should not necessarily be the case in species with larger population sizes and with a different history, such as mussels or maize. The limited evidence to date in M. edulis would indeed suggest global adaptation to be preponderant (global:1–local:0). Theoreticians have identified distinguishing between genetic drift and genetic draft as a major goal of modern population genetics (Gillespie 2004; Nielsen 2005; Wakeley 2007). Marine species are thus likely to play an important role in the forthcoming debate on draft. Indeed, who could have anticipated that two populations of mussels could have stayed essentially isolated (m < 10⁻⁸) for more than 25,000 years without developing any genetic structure by drift except at a genomic region in which an unconditionally favorable mutation exacerbated the structure by draft? FST scans are just preliminary steps to more detailed investigations of the nature of selection that truly footprints genomes, investigations that are likely to reveal unanticipated complexity as exemplified in the present study.


Associate Editor: M. Hellberg

ACKNOWLEDGMENTS

I am very grateful to A. Kneidinger and H. Mathé Hubert for technical assistance and to F. Cerqueira and E. Desmarais at the IFR119 “Montpellier Environnement Biodiversité” for access to the sequencing platform. I am much indebted to J. Welch and G. Pogson as well as to N. Barton, M. Hellberg, D. Rand, and three anonymous reviewers for thoughtful comments that greatly improved the clarity of the manuscript. This work was funded by the Agence Nationale de la Recherche (Hi-Flo project ANR-08-BLAN-0334-01). This is article 2010-043 of Institut des Sciences de l’Evolution de Montpellier.
