The spatial scale of dispersal revealed by admixture tracts

Abstract Evaluating species dispersal across the landscape is essential to design appropriate management and conservation actions. However, technical difficulties often preclude direct measures of individual movement, while indirect genetic approaches rely on assumptions that sometimes limit their application. Here, we show that the temporal decay of admixture tract lengths can be used to assess genetic connectivity within a population introgressed by foreign haplotypes. We present a proof‐of‐concept approach based on local ancestry inference in a high gene flow marine fish species, the European sea bass (Dicentrarchus labrax). Genetic admixture in the contact zone between Atlantic and Mediterranean sea bass lineages allows the introgression of Atlantic haplotype tracts within the Mediterranean Sea. Once introgressed, blocks of foreign ancestry are progressively eroded by recombination as they diffuse from the western to the eastern Mediterranean basin, providing a means to estimate dispersal. By comparing the length distributions of Atlantic tracts between two Mediterranean populations located at different distances from the contact zone, we estimated the average per‐generation dispersal distance within the Mediterranean lineage to be less than 50 km. Using simulations, we showed that this approach is robust to a range of demographic histories and sample sizes. Our results thus indicate that the length of admixture tracts can be used together with a recombination clock to estimate genetic connectivity in species for which the neutral migration‐drift balance is not informative or simply does not exist.

contributes to population growth relative to local recruitment (Lowe, 2003), that is, demographic connectivity. However, they are often extremely difficult to implement (Broquet & Petit, 2009), especially for marine species in which dispersal usually takes place during a larval stage (Selkoe et al., 2016).
Indirect genetic approaches provide less demanding alternatives to evaluate average dispersal rates and distances (Broquet & Petit, 2009), although they remain uninformative regarding the contribution of dispersal to population demography, and hence stability (Lowe & Allendorf, 2010). Estimation of dispersal scales from isolation-by-distance (IBD) patterns (Rousset, 1997) has been used with success in several marine species (Palumbi, 2003;Pinsky, Montes Jr., & Palumbi, 2010;Pinsky et al., 2017;Puebla, Bermingham, & McMillan, 2012), sometimes providing consistent estimates of single-generation dispersal distances compared to direct parentage assignment methods (Pinsky et al., 2017). Nevertheless, these methods are associated with a number of assumptions which potentially limit their range of application, such as equilibrium conditions between migration and drift. Equilibrium can take a long time to establish in species with large effective population sizes, which is commonly the case in marine species. Moreover, an independent assessment of the effective density of reproducing individuals (i.e., capturing drift effects) is required for estimating the standard deviation of dispersal distances from IBD patterns (Rousset, 1997). Therefore, such approaches are not always applicable even though IBD patterns are often observed in marine species (Selkoe et al., 2016).
The recent availability of genomewide polymorphism data in nonmodel organisms has opened new research avenues for assessing connectivity, especially with the information contained in selected and hitchhiker loci (Gagnaire et al., 2015). In parallel, the availability of haplotype data has renewed interest in neutral inference, since haplotypes have a high potential to shed light on dispersal (Cayuela et al., 2018; Gagnaire et al., 2015; Pool & Nielsen, 2009). For instance, long identical-by-descent (IBD) blocks shared between individuals have been used to infer recent demography (Palamara & Pe'er, 2013; Ringbauer, Coop, & Barton, 2017). Similarly, the distribution of migrant tracts has also proved useful for inferring the timing of recent admixture events (Gravel, 2012; Pool & Nielsen, 2009). This second type of approach relies on the fact that gene flow between divergent gene pools (e.g., populations, lineages, subspecies, ecotypes) allows migrant chromosomes to enter a new genetic background with which they recombine. As migrant chromosomes diffuse through the landscape within the introgressed population, they are progressively shortened by recombination at each generation (Liang & Nielsen, 2014; Pool & Nielsen, 2009). Therefore, the length of migrant tracts (also called admixture tracts) is informative of the time elapsed since introgression, while being relatively robust to the effect of effective population size (Racimo, Sankararaman, Nielsen, & Huerta-Sánchez, 2015). Analyzing the migrant tract length distribution in a spatial context should therefore make it possible to estimate the speed at which migrant tracts diffuse within an introgressed lineage and, ultimately, single-generation dispersal distances on conservation-relevant timescales. This method only requires gene flow between genetically differentiated populations and mapped SNPs to detect and measure the length of introgressed tracts using local ancestry inference (LAI).
LAI methods have been developed that work even without the need to use phased markers (Baran et al., 2012;Guan, 2014). Nevertheless, a large variety of direct (Snyder, Adey, Kitzman, & Shendure, 2015) and indirect (Browning & Browning, 2011;Rhee et al., 2016) phasing methods can be used to facilitate the delineation of migrant tracts. Since admixture between diverging lineages is relatively common in nature (Payseur & Rieseberg, 2016), introgressed tracts have the potential to bring new information on dispersal in many species that remain difficult to study with direct tagging or classical indirect genetic approaches, which is the case for many marine species.
To illustrate this approach, we apply this framework to estimate the spatial scale of dispersal in a highly exploited marine fish, the European sea bass (Dicentrarchus labrax). This species is subdivided into an Atlantic and a Mediterranean glacial lineage, which started to diverge in allopatry about 300,000 years BP and remain currently partially reproductively isolated (Tine et al., 2014). Since the end of the last glacial period, asymmetrical gene flow allows the entry of Atlantic migrant tracts within the western Mediterranean population. In a recent study, we showed that these migrant tracts are on average shorter in the eastern compared to the western Mediterranean population, consistent with the action of recombination during the diffusion of Atlantic haplotypes across the Mediterranean Sea (Duranton et al., 2018). Here, we estimate the spatial scale of dispersal within the Mediterranean sea bass lineage by comparing the length distribution of introgressed Atlantic tracts between two different populations located at different distances from the contact zone with the Atlantic lineage. Furthermore, we use simulations to evaluate the robustness and the generality of this strategy to different admixture histories and sample sizes. With the development of new LAI methods to estimate the length distribution of admixture tracts (Corbett-Detig & Nielsen, 2017;Medina, Thornlow, Nielsen, & Corbett-Detig, 2018), we expect that quantitative assessments of dispersal will be obtained in several other species, which may help to resolve a long-standing issue in conservation biology.

| Whole-genome resequencing, phasing, and local ancestry inference
Our analysis relies on the use of haplotype-resolved whole-genome sequences already published in Duranton et al. (2018). Briefly, we sequenced different mother-father-child trios obtained in experimental crossings to perform chromosome-wide phasing-by-transmission (Browning & Browning, 2011). Females from the western Mediterranean Sea (Gulf of Lion, n = 8, ♀ W-MED) were crossed with males from either the Atlantic Ocean (English Channel, n = 4, ♂ ATL) or the eastern Mediterranean Sea (Turkey n = 2 and Egypt n = 2, ♂ E-MED) to generate 8 families: 4 ♂ ATL × ♀ W-MED and 4 ♂ E-MED × ♀ W-MED (Figure 1a). This sampling design allowed generating phased whole-genome sequences from Mediterranean populations located at different distances from the contact zone with the Atlantic lineage (either near: W-MED, or far: E-MED). Since no genetic differentiation has been found between samples from Egypt and Turkey (Duranton et al., 2018), all E-MED individuals were grouped together within a single population. Low-quality and unphased genotypes were filtered to only retain sites with unambiguous transmission patterns and no missing data.
The filtered dataset, consisting of 2,628,725 SNPs fully phased into chromosome-wide haplotypes, was used to perform LAI (Duranton et al., 2018). Blocks of Atlantic origin introgressed into the Mediterranean genetic background were identified along each individual chromosome haplotype with Chromopainter (Lawson, Hellenthal, Myers, & Falush, 2012). We then refined the delineation of tract junctions to generate the length distribution of Atlantic migrant tracts separately for the western and eastern Mediterranean populations. The limited sample size for each population in our trio design was largely compensated by the amount of haplotype information per sample, since each individual genome is composed of a mosaic of hundreds of Atlantic and Mediterranean tracts.
Therefore, only a small number of phased whole-genome sequences provided sufficient information to obtain a clear picture of the admixture tract length distribution for the W-MED and E-MED populations (Duranton et al., 2018). This important aspect of the implemented methodology was also assessed using simulations (see below).
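As an illustration of how a tract length distribution can be derived from local ancestry calls, the following minimal sketch merges runs of consecutive SNPs assigned to Atlantic ancestry into tracts. It assumes hard ancestry labels per SNP along one phased haplotype (hypothetical labels "ATL" and "MED") and takes the first and last SNP of each run as tract boundaries. It is not the junction refinement procedure used in the study, and the function name and toy data are hypothetical.

```python
from itertools import groupby


def tracts_from_ancestry(positions, labels, target="ATL"):
    """Return (start, end) positions for each run of consecutive SNPs labelled `target`.

    Assumption: `positions` is sorted and `labels` holds one hard ancestry call per SNP.
    """
    tracts = []
    index = 0
    for label, run in groupby(labels):
        run_length = len(list(run))
        if label == target:
            # Tract boundaries are set at the first and last SNP of the run.
            tracts.append((positions[index], positions[index + run_length - 1]))
        index += run_length
    return tracts


# Toy haplotype: SNP positions in bp and hypothetical ancestry calls.
positions = [100, 2_000, 5_000, 9_000, 12_000, 20_000, 26_000]
labels = ["MED", "ATL", "ATL", "ATL", "MED", "ATL", "MED"]

tracts = tracts_from_ancestry(positions, labels)
lengths = [end - start for start, end in tracts]
```

With this boundary convention a tract supported by a single SNP has length zero, which is one reason why short tracts are hard to measure and are later filtered out.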

| Estimation of introgression time from migrant tract length
Once introgressed within a divergent genetic background, migrant tracts are progressively shortened by recombination across generations (Liang & Nielsen, 2014;Pool & Nielsen, 2009). Therefore, long migrant tracts are expected to have introgressed on average more recently than short migrant tracts. In the European sea bass, blocks of Atlantic ancestry must enter the Mediterranean Sea from its western side near the Gibraltar strait before they diffuse eastward

| Data filtering
The length distribution of migrant tracts is influenced by the temporal dynamics of introgression. Given that gene flow has been introducing Atlantic alleles within the Mediterranean since the end of the last glacial period (Tine et al., 2014), haplotypes of variable ages (and therefore variable lengths) are expected to be found at any Mediterranean location. The shortest tracts, which have resided in the Mediterranean for much longer than the time it takes to diffuse from west to east, have very similar lengths among locations. Therefore, the shift in the length of Atlantic tracts between the western and eastern Mediterranean populations is most pronounced for tracts that have introgressed recently and therefore remain long (Figure 1c,d).
For that reason, we only considered blocks of Atlantic ancestry longer than 50 kb, since shorter blocks are less informative for estimating recent introgression. Applying a length threshold to remove short tracts has also been used to control for technical limitations in measuring short introgressed tracts (Ni et al., 2016). Moreover, because migrant tract length is more difficult to estimate in highly recombining regions of the genome, we excluded such regions from the analysis. In the sea bass genome, local recombination rates tend to be markedly reduced in central chromosomal regions (ρ = 4N_e r usually <5 per kb) compared to chromosome extremities (ρ usually >40 per kb) (Tine et al., 2014). We thus applied a population-scaled recombination rate threshold of ρ = 10 per kb to keep only low-recombining regions.
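The two thresholds described above can be sketched as a simple filter. This is a minimal illustration under the assumption that each tract is stored with its length in bp and the population-scaled recombination rate (ρ per kb) of the region it lies in. The record layout and the example values are hypothetical, and the additional exclusion of barrier regions is not shown.

```python
MIN_TRACT_LENGTH_BP = 50_000  # keep only tracts longer than 50 kb
MAX_RHO_PER_KB = 10.0         # keep only regions with rho = 4*N_e*r up to 10 per kb


def filter_tracts(tracts):
    """Keep tracts that are long enough and lie in low-recombining regions."""
    return [
        tract
        for tract in tracts
        if tract["length_bp"] > MIN_TRACT_LENGTH_BP
        and tract["rho_per_kb"] <= MAX_RHO_PER_KB
    ]


# Hypothetical tracts used only to exercise the filter.
example_tracts = [
    {"id": "a", "length_bp": 320_000, "rho_per_kb": 3.2},
    {"id": "b", "length_bp": 42_000, "rho_per_kb": 2.1},   # too short
    {"id": "c", "length_bp": 95_000, "rho_per_kb": 41.0},  # highly recombining region
    {"id": "d", "length_bp": 61_000, "rho_per_kb": 9.5},
]

kept = filter_tracts(example_tracts)
kept_ids = [tract["id"] for tract in kept]
```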
Our analysis of migrant tract length relies on neutral theory. In order to avoid potentially confounding effects of selection against Atlantic alleles, we filtered out genomic regions that probably contain barrier loci that locally reduce gene flow between Atlantic and Mediterranean lineages. These regions, which represent ~4% of the genome (Duranton et al., 2018), were identified using the RNDmin statistic (Rosenzweig, Pease, Besansky, & Hahn, 2016)

| Estimation of the average tract length (L)
We used two different approaches to estimate the average introgressed tract length for each of the two Mediterranean populations. Our first method specifically addresses the direct influence of local recombination rate (r) variations on the length distribution of introgressed tracts. Broad-scale variation in recombination rate along chromosomes is commonly observed in eukaryotes (Haenel, Laurentino, Roesti, & Berner, 2018), including teleost fishes (Bradley et al., 2011; Roesti, Hendry, Salzburger, & Berner, 2012) and among them the European sea bass (Tine et al., 2014). As a result, the length of migrant tracts is expected to decrease at variable rates across the sea bass genome, even though we excluded the most highly recombining regions from our analysis. To account for this variation, we calculated the average length of introgressed tracts locally within nonoverlapping 100 kb windows, using all introgressed tracts that were either fully contained within, or simply overlapping, each focal window. The average time since introgression was then estimated separately for each window using equation (1) with the average length of introgressed tracts and the local recombination rate value estimated for that window (Tine et al., 2014). We also used the ob-
The second method builds on the fact that the abundance of introgressed tracts as a function of their length follows an exponential distribution (Gravel, 2012; Pool & Nielsen, 2009; Racimo et al., 2015) with a mean L = 1/λ, where λ corresponds to the rate parameter of the exponential distribution. To estimate λ, we log-transformed the tract abundance values from the introgressed tract length distribution to obtain a linear relationship whose slope equals −λ. In order to estimate this slope using data from regions with similar recombination rates, genomic windows of 100 kb were grouped into eleven recombination rate categories, which were designed to receive an equal number of windows (i.e., 209 windows in each category).
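The grouping of windows into equal-sized recombination rate categories can be sketched as follows. This is a minimal illustration that assumes windows are simply ranked by recombination rate and cut into consecutive groups. The function name is hypothetical and the recombination rates are random placeholders, not sea bass estimates.

```python
import random


def equal_count_categories(window_rates, n_categories=11):
    """Sort windows by recombination rate and split them into equal-count groups."""
    ordered = sorted(window_rates)
    size, remainder = divmod(len(ordered), n_categories)
    categories = []
    start = 0
    for i in range(n_categories):
        # The first `remainder` categories receive one extra window if needed.
        stop = start + size + (1 if i < remainder else 0)
        categories.append(ordered[start:stop])
        start = stop
    return categories


# Placeholder rates for 2,299 windows (11 categories of 209 windows each).
rng = random.Random(1)
rates = [rng.uniform(0.0, 10.0) for _ in range(2299)]

categories = equal_count_categories(rates)
sizes = [len(category) for category in categories]
mean_rates = [sum(category) / len(category) for category in categories]
```

The per-category mean rates computed here play the role of the average recombination rate attached to each category in the regression described in this section.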
For each category, we then separated the tracts into twenty bins of tract length and used only bins with at least five tracts to fit the linear regression. Finally, we plotted the values of λ = 1/L estimated for every recombination rate category as a function of the average recombination rate of the 209 windows used in the corresponding category.
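The binning and log-linear regression described above can be sketched as follows. This is a minimal illustration that assumes equal-width bins spanning the observed length range and an unweighted least-squares fit of log counts against bin midpoints. Tract lengths are simulated from an exponential distribution with a hypothetical rate, then shifted by 50 kb to mimic the removal of short tracts.

```python
import math
import random


def linear_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


def estimate_lambda(lengths, n_bins=20, min_count=5):
    """Estimate the exponential rate from the slope of log(count) against length."""
    low, high = min(lengths), max(lengths)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for length in lengths:
        counts[min(int((length - low) / width), n_bins - 1)] += 1
    midpoints, log_counts = [], []
    for i, count in enumerate(counts):
        if count >= min_count:  # sparse bins are ignored
            midpoints.append(low + (i + 0.5) * width)
            log_counts.append(math.log(count))
    return -linear_slope(midpoints, log_counts)


rng = random.Random(42)
true_lambda = 1 / 200  # hypothetical rate per kb, i.e. a mean length of 200 kb
lengths_kb = [50 + rng.expovariate(true_lambda) for _ in range(20_000)]

lambda_hat = estimate_lambda(lengths_kb)
mean_length_kb = 1 / lambda_hat
```

Because the exponential distribution is memoryless, discarding tracts below 50 kb shifts the distribution without changing the slope of the log-transformed counts.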
We fitted a linear regression to this relationship, forcing the intercept to equal zero, and determined its slope a = (1 − f)(t − 1), where f corresponds to the admixture proportion. This allowed us to estimate the time since introgression as t = a/(1 − f) + 1 separately for the eastern and western Mediterranean populations. The difference between the two estimates corresponds to the number of generations necessary for a tract to diffuse from west to east.
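A minimal sketch of this last step is given below: a regression through the origin of λ on the recombination rate, followed by the conversion of its slope into a time since introgression. The recombination rates, admixture proportions, and introgression times are hypothetical values chosen only to check that the calculation recovers the times used to generate the data.

```python
def slope_through_origin(x, y):
    """Least-squares slope of y on x with the intercept forced to zero."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def time_since_introgression(slope, admixture_proportion):
    """Invert a = (1 - f)(t - 1) to obtain t = a / (1 - f) + 1."""
    return slope / (1.0 - admixture_proportion) + 1.0


# Hypothetical mean recombination rates (per bp per generation) of the categories.
rates = [0.5e-8, 1.0e-8, 1.5e-8, 2.0e-8, 2.5e-8]


def noiseless_lambdas(t, f):
    """Rate parameters expected under lambda = (1 - f) * r * (t - 1)."""
    return [(1.0 - f) * r * (t - 1.0) for r in rates]


f_west, f_east = 0.30, 0.15    # hypothetical admixture proportions
t_west, t_east = 1000.0, 1550.0  # hypothetical introgression times (generations)

t_west_hat = time_since_introgression(
    slope_through_origin(rates, noiseless_lambdas(t_west, f_west)), f_west
)
t_east_hat = time_since_introgression(
    slope_through_origin(rates, noiseless_lambdas(t_east, f_east)), f_east
)
t_diff_hat = t_east_hat - t_west_hat  # generations needed to diffuse from west to east
```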

| Estimate of the least-cost distance between the Mediterranean populations
We used the R package marmap (Pante & Simon-Bouhet, 2013) to estimate the least-cost distances between western and eastern Mediterranean sampling locations. Since the European sea bass is a neritic benthopelagic species occupying shallow continental waters (Pickett & Pawson, 1994), we considered that dispersal only occurs through areas where the maximum depth is less than 200 m. We estimated the distance between the western location and the southeastern and northeastern Mediterranean locations as Dist_west-south_east = 4,891 km and Dist_west-north_east = 6,005 km, respectively (Figure 1a). Since we analyzed all eastern individuals together without separating northern from southern samples, we used the average distance Dist_west-east = 5,448 km between the western and the two eastern populations to calculate the per-generation dispersal distance.
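The arithmetic of this step can be sketched as follows. The two least-cost distances are those reported above. The conversion into a per-generation distance assumes that the distance is simply divided by the number of generations needed to diffuse from west to east, which this section does not state explicitly. The diffusion time of 550 generations is the value used for the sea bass in the simulations and serves here only as an illustration.

```python
DIST_WEST_SOUTH_EAST_KM = 4891
DIST_WEST_NORTH_EAST_KM = 6005


def mean_distance(*distances_km):
    """Average of several least-cost distances."""
    return sum(distances_km) / len(distances_km)


def per_generation_distance(distance_km, diffusion_generations):
    """Assumed conversion: distance travelled divided by the diffusion time."""
    return distance_km / diffusion_generations


dist_west_east_km = mean_distance(DIST_WEST_SOUTH_EAST_KM, DIST_WEST_NORTH_EAST_KM)

# Illustrative diffusion time only (the value used for sea bass in the simulations).
example_distance_km = per_generation_distance(dist_west_east_km, 550)
```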

| Validation of the methodology by simulations
We used neutral simulations to test whether the length distribution of introgressed tracts can be used to reliably estimate the diffusion time of Atlantic haplotypes between western and eastern Mediterranean populations. To do so, we used the coalescent simulator msprime v0.6.2 (Kelleher, Etheridge, & McVean, 2016) to simulate the length distribution of Atlantic tracts introgressed within the western Mediterranean population under a secondary contact model (see legend of Figure 4a). Demographic and temporal simulation parameters were set to values that were previously shown to accurately reproduce this distribution (Duranton et al., 2018). We then aimed at modeling the diffusion of introgressed tracts from the western toward the eastern Mediterranean population using the same simulation framework. This diffusion is characterized by the decay of introgressed haplotype length due to the recombination events that occur every generation during the time it takes to cross the Mediterranean Sea, and it is therefore not directly influenced by the Atlantic population. To model this diffusion, we thus consid-
Our second objective was to determine the robustness of our method to different sample sizes and demographic histories.
In order to assess the effect of the amount of data, we used the parameter values of the European sea bass (i.e., T_c = 2,300 and T_diff = 550 generations, N_MED*m_1 = 7 migrants per generation) and sampled either 1, 2, 3, 4, 5, 6, or 7 individuals at each time point.
Simulations were run 10 times for each value, and T_diff was estimated for every replicate. We then fixed the number of sampled individuals to 4 and made the secondary contact duration parameter (T_c) vary around the estimated number of generations of admixture in the sea bass (i.e., using T_c = 250, 500, 1,000, 1,500, 2,300, 3,500, and 5,000). We then did the same for the number of migrants entering the Mediterranean population per generation (using N_MED*m_1 = 1, 3, 5, 7, 10, 15, and 20). These explored values allowed us to consider a wide range of gene flow durations and intensities, which partly covers the diversity of settings found in other species. We did not make the divergence time (T_div) vary since this parameter mostly influences the accuracy of the detection of introgressed tracts, which were called directly in our simulations instead of being inferred with an LAI method as we did from real data.
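The logic of this validation can be illustrated with a much simpler stand-in than the coalescent simulations used in the study. The sketch below assumes a single pulse of admixture, draws tract lengths directly from an exponential distribution with rate (1 − f) r (t − 1), and recovers the diffusion time from the mean tract lengths at two time points. All parameter values except T_diff = 550 and the 10 replicates are hypothetical, and the same admixture proportion is assumed at both time points.

```python
import random
import statistics


def mean_tract_length(t, r, f, n_tracts, rng):
    """Mean length (bp) of tracts drawn from an exponential with rate (1 - f) r (t - 1)."""
    rate = (1.0 - f) * r * (t - 1.0)
    return statistics.fmean([rng.expovariate(rate) for _ in range(n_tracts)])


def time_from_mean_length(mean_length, r, f):
    """Invert L = 1 / ((1 - f) r (t - 1)) to obtain t."""
    return 1.0 / ((1.0 - f) * r * mean_length) + 1.0


def estimate_t_diff(t_west, t_diff, r, f, n_tracts, rng):
    """Difference in estimated introgression time between the far and near locations."""
    l_west = mean_tract_length(t_west, r, f, n_tracts, rng)
    l_east = mean_tract_length(t_west + t_diff, r, f, n_tracts, rng)
    return time_from_mean_length(l_east, r, f) - time_from_mean_length(l_west, r, f)


rng = random.Random(7)
# Hypothetical settings: t_west = 500 generations, r = 1e-8 per bp, f = 0.1.
replicates = [estimate_t_diff(500, 550, 1e-8, 0.1, 20_000, rng) for _ in range(10)]
mean_estimate = statistics.fmean(replicates)
```

Unlike the simulations described above, this sketch ignores continuous gene flow, drift, and variation in recombination rate, so it only checks the internal consistency of the recombination clock.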

| Dating introgression using the log-transformed distribution of tract length
Our second method, which models the log-transformed distribution of admixture tract length to estimate the mean tract length (L), relied on the analysis of 2,299 windows grouped into eleven recombination rate categories (Table 1). Although we delimited the range of each recombination rate category to evenly distribute windows across categories, the total amount of information slightly differed among categories due to varying numbers of admixture tracts per window. For that reason, the slope of the regression of the log-transformed distribution of admixture tract length was only marginally significant for some recombination rate categories with a limited amount of data (i.e., five categories in the eastern population, Table 1). As expected, the estimated average length of intro-

| Validation of the methodology using simulations
We used simulated data to test whether the average length of introgressed tracts measured at two time points after entering a recipient population can be used to reliably estimate their diffusion time. We showed that there is a strong correlation between the simulated and estimated values of diffusion times (T_diff) (Figure 4b).
This indicates that measuring the difference in time since admixture between two populations connected by gene flow makes it possible to accurately estimate the number of generations that it takes to connect them through dispersal. Although we did not explicitly con-

| DISCUSSION
We used the information contained in the length of admixture tracts as a means to estimate the spatial scale of dispersal within a population receiving genetic material from a distinct lineage. Introgressed tracts entering a recipient population are progressively shortened every generation by recombination, providing a clock that keeps track of the history of introgression (Liang & Nielsen, 2014; Pool & Nielsen, 2009). Here, the proposed methodology relies on the fact that introgressed tracts get on average shorter when they reach locations farther away from the contact zone, a process considered to be relatively independent from the effective population size (Gravel, 2012; Racimo et al., 2015). The difference in tract length between two locations at different distances from the contact zone thus represents the action of recombination during the time needed to connect these two locations through multigenerational dispersal (Figure 1b) (Duranton et al., 2018 (Ni et al., 2016). This category of tracts is, however, less informative for the type of analysis presented here, since their whole history of recombination within the Mediterranean is likely much longer than the time needed to diffuse across the Mediterranean Sea. In contrast, long migrant tracts are more likely to differ in length between remote locations. For these reasons, we considered neither highly recombining genomic windows (ρ = 4N_e r > 10 per kb), where the differential in tract length is rapidly lost, nor introgressed fragments shorter than 50 kb in the remaining windows.
Although removing short tracts allows extracting the most in- length within windows, which in turn should affect the distribution of the time since introgression. Therefore, the estimated time may be underestimated for both populations, but possibly more so for the eastern population that contains a relatively higher fraction of short tracts. Our filtering of short tracts may thus lead to an underestimation of the diffusion time between locations, that is, an overestimation of the per-generation dispersal distance.
To overcome this potential difficulty, we used a second methodology that models the mean length of introgressed tracts from their log-transformed distribution. As such, this approach is largely independent of the distribution tail and thus insensitive to removing short tracts. This filtering step only reduces the amount of data and therefore the power of the regression approach, without modifying the regression slope. On the downside, this method needs to group windows with similar recombination rate values so that each category has enough introgressed tracts to fit well-powered regressions. The average recombination rate value used for each of the eleven categories may cause some loss of precision regarding fine-scale recombination rate variation, as compared to our first approach. However, windows within a given category displayed a small variance in recombination rate. Therefore, we speculate that averaging recombination rate values among windows within categories did not strongly affect our inferences. This conjecture is supported by the reasonably high proportion of variance in mean tract length explained by the linear regression models fitted for each recombination rate category (Table 1).
Overall, our two methodologies can be seen as complementary.
The first one accounts for finer-scale variation in recombination rate along the genome but might be sensitive to the removal of short tracts in the presence of historical admixture. The second approach is probably robust to the removal of short tracts but at d west-east_2 = 6.85 km (CI95% = 5.73; 18.61])). As expected, the second method, which is less prone to overestimating dispersal due to the removal of short tracts, provided a smaller estimate of the per-generation dispersal distance. Since our simulation study also supported the validity of the implemented approach (Figure 4b), we are confident that our empirical numerical estimates provide reliable indications of the spatial scale of dispersal in the European sea bass.
One possible limitation of this study could be that the analytical expectation we used assumes a unique pulse of admixture (Gravel, 2012;Racimo et al., 2015), while gene flow between the two sea bass lineages has been ongoing since the last glacial retreat (Tine et al., 2014). Methods accounting for continuous gene flow (Gravel, 2012;Ni et al., 2016) provide more realistic modeling of the migration history but are more suitable for inferring recent admixture (i.e.,  -Sfar et al., 2000). This illustrates the complex relationships existing between pelagic larval duration and gene flow in marine species (Nanninga & Manica, 2018;Selkoe & Toonen, 2011) and raises the question of the long-distance benefits of marine reserves in terms of demographic connectivity (Manel et al., 2019).
Here, we used the European sea bass as a case study to illustrate the potential of admixture tracts for estimating dispersal in nonequilibrium populations. However, our main message is that similar approaches could be applied to a wide range of species, especially marine organisms in which estimating dispersal distances remains a challenging issue (Gagnaire et al., 2015). An important prerequisite for applying the methodology developed in our study is to accurately identify and measure introgressed tracts. Although this may require phased haplotype data to perform LAI, haplotype phasing approaches are making this task increasingly feasible (Browning & Browning, 2011;Rhee et al., 2016). Having access to a chromosome-level reference genome assembly will no longer remain necessary with haplotype-resolved genome sequencing methods based on long read sequencing technologies (Browning & Browning, 2011;Snyder et al., 2015). In parallel to ongoing progress in sequencing, a wide variety of methods for LAI have been developed (Geza et al., 2018;Yuan et al., 2017 (Baran et al., 2012) or genetic (Guan, 2014) position of variants to identify ancestry blocks (Baran et al., 2012;Guan, 2014). Recently, a new method has also been developed to perform LAI directly from reads pileup data in population samples with arbitrary ploidy (Corbett-Detig & Nielsen, 2017). Therefore, there is a good potential for using the length of tracts to date admixture, including with reduced-representation sequencing data that still represent the most common type of data used in population genomic studies. An example of this kind of approach has been successfully applied with ddRAD SNPs for identifying introgressed tracts in supplemented populations of wild brown trout (Leitwein, Gagnaire, Desmarais, Berrebi, & Guinand, 2018).
The ability to correctly estimate the length of introgressed tracts admittedly depends on the density of markers. However, the minimal marker density required depends on the average length of introgressed tracts and therefore on the time since the beginning of admixture. The more recent the introgression, the lower the marker density required. In all cases, the precise delineation of migrant tracts will be facilitated by a stronger divergence between admixing lineages (Gravel, 2012). Therefore, reduced-representation sequencing approaches such as RAD-sequencing, which can generate from 10 to 1,000 loci per Mb (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016), can offer the flexibility needed to date both recent admixture between young lineages and more ancient introgression between divergent lineages.
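The trade-off between admixture time and marker density can be made concrete with a rough calculation. The sketch below assumes a single pulse of admixture, so that the mean tract length is L = 1/((1 − f) r (t − 1)), and multiplies this length by the marker density to obtain the expected number of markers per tract. The recombination rate, admixture proportion, and admixture times are hypothetical. The two densities are the bounds of 10 and 1,000 loci per Mb quoted above.

```python
def expected_mean_tract_length_bp(t, r_per_bp, f):
    """Mean tract length under a single pulse: L = 1 / ((1 - f) r (t - 1))."""
    return 1.0 / ((1.0 - f) * r_per_bp * (t - 1.0))


def expected_markers_per_tract(markers_per_mb, mean_length_bp):
    """Expected number of markers falling within a tract of average length."""
    return markers_per_mb * mean_length_bp / 1e6


R_PER_BP = 1e-8  # hypothetical recombination rate per bp per generation
F = 0.1          # hypothetical admixture proportion

markers = {
    (t, density): expected_markers_per_tract(
        density, expected_mean_tract_length_bp(t, R_PER_BP, F)
    )
    for t in (10, 100, 1000)   # generations since admixture
    for density in (10, 1000)  # loci per Mb
}
```

Under these assumptions, a density of 10 loci per Mb leaves roughly one marker per average tract after 1,000 generations, whereas it provides more than a hundred markers per tract after 10 generations.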
Finally, our simulation study showed that the proposed methodology can give accurate results across a wide range of sample sizes, demographic histories, and diffusion times. For instance, assuming a demographic history similar to that of the European sea bass, we showed that a single individual may be sufficient to provide reliable results. More generally, the method can be applied either with a small number of whole-genome sequences for cases of historical gene flow or using larger sample sizes with reduced-representation sequencing data for studying recent admixture.
Moreover, because the method seems to be accurate over a large range of gene flow intensities and diffusion times, it can be used to measure dispersal at a more refined spatial scale than the one considered here.

| CONCLUSION
Our study illustrates the potential of admixture tracts for estimating the spatial scale of dispersal in nonequilibrium populations, which is an essential parameter to design appropriate management and conservation actions. Although methodological improvements will be needed to better account for ancient migrations, the proposed approach provides a roadmap to generate valuable information on a conservation-relevant timescale and is already well suited for species with relatively recent admixture histories. The development of new methods that simultaneously estimate local ancestry and the time since admixture (Corbett-Detig & Nielsen, 2017) should further accelerate the interest for this kind of approach. This is especially true for species in which direct measures of dispersal are not applicable and the neutral migration-drift balance is not informative or simply does not exist, which is the case for many marine species.

ACKNOWLEDGEMENTS
This work was supported by the ANR grants LABRAD-SEQ 11-PDOC-009-01 and CoGeDiv ANR-17-CE02-0006-01 to P.-A.G. We thank the International Marine Connectivity Network (GDRI iMarCo) for insightful discussions. The authors are also grateful to the Associate Editor Luciano Beheregaray and two anonymous reviewers for their constructive comments.

CONFLICT OF INTEREST
None declared.

DATA ACCESSIBILITY
Sequence reads are available on GenBank under the accession code PRJNA472842 (Duranton et al., 2018).