Multiple chromosomal rearrangements in a hybrid zone between Littorina saxatilis ecotypes

Abstract Both classical and recent studies suggest that chromosomal inversion polymorphisms are important in adaptation and speciation. However, biases in discovery and reporting of inversions make it difficult to assess their prevalence and biological importance. Here, we use an approach based on linkage disequilibrium among markers genotyped for samples collected across a transect between contrasting habitats to detect chromosomal rearrangements de novo. We report 17 polymorphic rearrangements in a single locality for the coastal marine snail, Littorina saxatilis. Patterns of diversity in the field and of recombination in controlled crosses provide strong evidence that at least the majority of these rearrangements are inversions. Most show clinal changes in frequency between habitats, suggestive of divergent selection, but only one appears to be fixed for different arrangements in the two habitats. Consistent with widespread evidence for balancing selection on inversion polymorphisms, we argue that a combination of heterosis and divergent selection can explain the observed patterns and should be considered in other systems spanning environmental gradients.

genetic architectures that suppress recombination between loci involved in these traits are likely to evolve (Smadja & Butlin, 2011). This is the case for chromosomal rearrangements, including inversions, translocations and fusions/fissions. Here, we focus on inversions where effective recombination is severely reduced or even completely suppressed in heterozygotes for two arrangements (i.e., heterokaryotypes), particularly near breakpoints (Coyne, Meyers, Crittenden, & Sniegowski, 1993;Navarro, Betrán, Barbadilla, & Ruiz, 1997;Sturtevant 1921;Sturtevant & Beadle, 1936, Schaeffer et al., 2003. It has been claimed that the recombination-suppression effect of inversions can contribute to adaptation and speciation with gene flow in various ways: (a) extending the impact of barrier loci (i.e., loci contributing to reproductive isolation) to linked loci over wider genomic regions and facilitating the accumulation of additional barrier loci within inverted regions despite gene flow between populations (Navarro & Barton, 2003;Rieseberg, 2001), (b) preventing species merging after secondary contact and so paving the way for the accumulation of additional reproductive barriers (e.g., by reinforcement) (Noor, Grams, Bertucci, & Reiland, 2001) and (c) protecting favourable combinations of locally adapted alleles from being lost, including stochastic loss (Kirkpatrick & Barton, 2006;Rafajlovic, Emanuelsson, Johannesson, Butlin, & Mehlig, 2016) or maintaining combinations of alleles that contribute to different barriers, including assortative mating and incompatibilities (Dagilis & Kirkpatrick, 2016;Ortiz-Barrientos, Engelstädter, & Rieseberg, 2016). A prediction underlying these different roles is that, in the presence of gene flow, inversions will tend to be enriched for barrier loci.
Empirical data from an increasing number of taxa support the role of inversions in adaptation and speciation (Hoffmann & Rieseberg, 2008;Hooper & Price, 2017;Wellenreuther & Bernatchez, 2018), although for historical reasons much of the evidence concerning the evolutionary genetics of inversions still comes from one genus; Drosophila (Dobzhansky & Sturtevant, 1938;Krimbas & Powell, 1992). However, the power to detect the genomic regions involved in adaptive traits and/or reproductive isolation is generally higher within rearrangements. This is because the effects of selection extend to linked sites across large regions of the genome, thus increasing the probability of detection by genome scans (Ravinet et al., 2017) and potentially biasing evidence in favour of inversions.
On the other hand, studies showing that adaptation and speciation in some taxa are not influenced by inversions (e.g., Davey et al., 2017;Rafati et al., 2018) may receive less attention than those with positive results. In order to achieve an unbiased view of the occurrence and impacts of inversions, approaches are needed that allow for the detection of inversions without relying on pre-existing information either from cytogenetic evidence, which remains limited to taxa where high-resolution chromosome preparations can be obtained, or from genome scans for differentiation.
Hybrid zones offer a singular setting for investigating the genomic regions involved in reproductive isolation between natural populations (Barton & Hewitt, 1985;Harrison, 1993;Harrison & Larson, 2016). Classic hybrid zone theory predicts that alleles at loci under divergent selection or loci involved in incompatibilities introgress less compared with other markers (Barton & Hewitt, 1985;Rieseberg, Whitton, & Gardner, 1999). This results in clines in allele frequency with the slope at the cline centre, relative to dispersal distance, increasing with the intensity of selection (Barton & Gale, 1993;Barton & Hewitt, 1985;Slatkin, 1973). Inversions may be favoured by selection on one side of the hybrid zone because they may keep together combinations of locally adapted alleles at different loci, preventing or severely reducing recombination with migrant haplotypes from generating less fit individuals (Kirkpatrick & Barton, 2006). Previous studies of hybrid zones between taxa differing by inversions, translocations or fusions, have revealed greater differentiation at neutral markers in genomic regions within or near chromosomal rearrangements, suggesting a barrier to gene flow (Giménez et al., 2017;Lee et al., 2017;Rieseberg et al., 1999). Altogether, this suggests that hybrid zone studies can provide useful information about the presence of chromosomal rearrangements and their role in adaptation and speciation.
Although hybrid zones have been extensively studied, the opportunity that they provide to detect rearrangements de novo using genome-wide markers has not been widely exploited (see Lee et al., 2017 andWestram et al., 2018 for exceptions). A wide variety of genotypes is produced by recombination in the central part of a hybrid zone, but linkage disequilibrium (LD) is continuously generated by dispersal (Barton & Hewitt, 1985). Inverted regions with suppressed recombination are expected to alter the balance between these forces, generating blocks of LD that stand out against the genomic background. In a sample taken from a transect across a hybrid zone, LD will also be generated by differentiation between parental populations but the loci involved are expected to be spread across the genome, rather than gathered in blocks. Therefore, patterns of LD among loci enable the de novo detection of inversions. Importantly, the same data can then be used to estimate inversion clines, allowing simultaneous assessment of their role in divergence. Candidate rearrangements can be validated by complementary approaches (e.g., linkage maps, genome synteny, BAC-FISH). The sequence of events building up associations between adaptive alleles and inversions is not yet well known for most case studies (Jackson, Butlin, Navarro, & Faria, 2016) and hybrid zone studies may help here as well.
The rocky intertidal encompasses steep gradients of several factors (e.g., wave exposure, temperature, salinity, humidity, predation, competition and facilitation; Raffaelli & Hawkins, 1996), providing a fertile ground to improve our understanding of adaptation and the origins of reproductive isolation. The presence of locally adapted distinct ecotypes in the intertidal has been investigated in several gastropod species (Nucella lapillus, Littorina saxatilis and ("Crab" and "Wave") across different geographic regions (e.g., Spain, Sweden and the UK) facing similar selective pressures (mainly crab predation and wave exposure) . In many locations across the species range, the two ecotypes meet at steep environmental transitions (on scales ~10 m). Parallel divergence between ecotypes involves multiple phenotypic traits (e.g., shell thickness, shell size, shell shape, shell colour and boldness) (Johannesson et al., 2010), and multiple loci (Westram, Panova, Galindo, & Butlin, 2016). This provides a setting in which suppressed recombination within inverted regions could play an important role in protecting favourable combinations of alleles at different loci, fostering adaptation to a multidimensional environment.
Despite the identification of multiple genomic regions likely to contain barrier loci between ecotypes, until recently the genetic architecture of ecotype divergence and speciation in L. saxatilis remained unknown.
Low LD in a hybrid zone in the UK suggested that outlier loci were dispersed in the genome (Grahame, Wilding, & Butlin, 2006). However, resources now available for this species, including a reference genome and a genetic map for the Crab ecotype , have altered this picture. A study of a hybrid zone between L. saxatilis ecotypes in Sweden, using targeted resequencing of approximately 40,000 regions of the genome revealed a large number of SNPs (1,891) with clinal patterns that are not compatible with neutral expectations (based on system-specific simulations), suggesting the influence of divergent selection. Remarkably, ~75% of these SNPs (non-neutral or linked to non-neutral loci) were shown to be clustered in large genomic regions of high LD (12.5-29.5 cM) in three out of 17 linkage groups (putative chromosomes), suggesting large regions of low recombination compatible with the presence of chromosomal rearrangements . Finally, rare fixed differentiation between ecotypes, combined with steep clines, at many of these loci led Westram et al. (2018) to suggest a component of balancing selection, rather than purely divergent selection. Interestingly, balancing selection has frequently been documented for inversion polymorphisms (e.g., Butlin & Day, 1985;Dobzhansky, 1950;reviewed by Wellenreuther & Bernatchez, 2018).
Using cytogenetic techniques, the karyotype of L. saxatilis has been established, with a haploid number of 17 chromosomes that appears to be conserved among ecotypes and closely related species (Birstein & Mikhailova, 1990;Janson, 1983;Rolán-Alvarez, Buño, & Gosalvez, 1996). However, the poor resolution of these techniques did not allow for identification of chromosomal rearrangements. Here, we combine genomic resources and genetic data from laboratory crosses for L. saxatilis with LD information from a hybrid zone. We test the proposal that the genomic blocks of outlier SNPs detected by Westram et al. (2018) correspond to inversions and we survey the rest of the genome for additional polymorphic inversions. For all putative inversions detected, we examine arrangement frequency clines in order to reveal evolutionary forces shaping these polymorphisms.

| MATERIAL S AND ME THODS
We reused a data set published by Westram et al. (2018), consisting of SNPs derived from targeted resequencing of individuals from a L. saxatilis hybrid zone transect, a reference genome assembly and a linkage map generated for a Crab-ecotype family. We add similar resequencing data from four families of the Wave ecotype.

| Data from Westram et al. (2018)
Snails were collected in Sweden (Ängklåvebukten; N 58° 52' 15.14", E 11° 7' 11.88") across a 152-m transect along the shore and their positions in three dimensions was recorded (Figure 1). The transect spanned an environmental gradient from a boulder field to a cliff area, the typical habitats of the Crab and Wave ecotypes, respectively (Figure 1a,b). After DNA extraction from 373 individuals using a CTAB protocol , a targeted-capture sequencing approach was implemented using 120-bp probes designed for 40,000 regions in the L. saxatilis genome. After library preparation, sequencing was performed using an Illumina HiSeq 2000 platform. A custom bioinformatics pipeline was implemented, including steps for stringent quality control (for details see Westram et al., 2018). A final set of 44,251 variants was later used in the linkage disequilibrium and principal component analyses. Individuals with more than 50% of missing data were removed.
In addition, we made use of two other key resources from LGs corresponds to the haploid number of chromosomes described for L. saxatilis (Birstein & Mikhailova, 1990;Janson, 1983;Rolán-Alvarez et al., 1996).

| Genotyping of Wave families
In order to infer recombination in the Wave ecotype, juvenile virgin females were collected from a wave-exposed habitat at Ängklåvebukten (north end) and kept in separate aquaria with running seawater. At the time of female maturity (9 months later), adult males were collected from the same area and paired with the females. Crosses resulted in four full-sib families (8, 21, 12 and 11 offspring). Although a different female was used in each cross, the first three families shared the same father. Genotyping of one female parent failed so that only three families were available for analysis of female-informative markers.
Targeted resequencing was performed using the same targeted-capture sequencing approach but using about half of the probe set used in Westram et al. (2018). We preferentially retained informative probes and avoided probes close together within contigs. In total, 25,000 (120 bp) enrichment probes were used. Following Lemmon, Emme, and Lemmon (2012), indexed libraries were prepared for 58 individuals (52 offspring, 4 mothers and 2 fathers) from genomic DNA on a Beckman Coulter FXp liquid-handling robot, and enriched using an Agilent SureSelect enrichment kit at Florida State University's Center for Anchored Phylogenomics (www.anchoredphylogeny.com). Following qPCR and Bioanalyzer-based quality control, libraries were sequenced on a partial Illumina 2,500 lane with paired-end 150-bp reads and 8-bp indexing read.
Raw reads were cleaned with trimmomatic v. 0.36 (Bolger, Lohse, & Usadel, 2014) with default parameters for paired-end reads and quality confirmed with fastqc v0.11.5 (Andrews, 2010), resulting in the removal of three samples due to low quality. Cleaned reads were mapped to the L. saxatilis reference genome using bwa v0.7.15 (Li & Durbin, 2009), retaining all probe regions that were covered by at least five reads in at least 50% of samples. Since the probes cover only a subset of the reference genome, contigs with lower coverage or not included in the probe design were merged into a single "superscaffold" to reduce computational time for SNP calling. Reads were again mapped using bwa to this new reference genome.
PCR duplicates were identified and removed, and InDel realignment performed with piccard v. 1.138 (http://broadinstitute.github. io/picard/), before SNP calling, which was performed using gatk unifiedgenotyper v3.7-0 (DePristo et al., 2011) with default parameters and a minimum base quality filter of 20. The SNP calling was restricted to the better-covered probe region using a bed file, ignoring the entire "superscaffold" region. We removed positions and individuals with <25% call rate and retained only biallelic SNPs then used technical replicates to train a variant quality score recalibration model in order to improve parameter values for SNP calling. Lastly, we used hard-filters (mapping quality >40, Phred-scaled p-value for strand bias <10, symmetrical odds ratio test for strand bias <3 and test for read position bias between 0 and 8.0) and only retained SNPs with coverage depth ≥8. The SNP filtering workflow was performed with vcftools (Danecek et al., 2011) and vcfilter from vcflib (https:// github.com/vcflib/vcflib). This set of SNPs was filtered using the criteria in Westram et al. (2018), with minor allele frequency >0.05 and excluding sites with genotypes for fewer than 20 out of 55 individuals). A genotype file for the final set of SNPs (34,787) was generated using vcftools and was used as input for the recombination analysis.

| Linkage disequilibrium
We analysed patterns of disequilibrium among SNPs in order to detect clusters of loci with unusually high LD that might be generated by chromosomal rearrangements. A matrix of pairwise LD (r 2 ) between all SNPs within each linkage group was generated for all individuals in the transect sample with the r package "genetics" (Warnes, Gorjanc, Leisch, & Man, 2013). This matrix was then used to detect clusters of SNPs in high LD (i.e., outlier clusters relative to other LD clusters within each linkage group) using the r package "ldna"-linkage disequilibrium network analysis (Kemppainen et al., 2015). Two key parameters can be set by the user to make the analyses more lenient or conservative in the identification of outlier clusters (OC). The minimum number of edges |E|min, corresponds to the minimum number of connections among the vertices (SNPs) of a cluster (an "edge" is present between a pair of SNPs if their LD value exceeds a threshold), and indirectly controls the minimum number of SNPs within a cluster. Parameter φ controls the minimum LD threshold above which the median pairwise LD within a cluster is higher than the intercluster LD for the group of SNPs to be considered an OC. After several test runs, we set |E|min = 30, representing a compromise between detecting clusters large enough to represent chromosomal rearrangements and avoiding noise created by small networks that result from physical linkage within contigs.
As in most cases the number of edges did not correspond to the number of SNPs, only clusters with a minimum of 32 SNPs were retained. In order to explore a wide range of the parameter space of φ, we first registered all identified clusters with at least 32 SNPs setting φ = 0 and then increased the value of φ by 1 in each iteration until no more LD clusters were obtained within a linkage group. Given that chromosomal rearrangements are expected to generate strong LD, clusters with a low median intracluster LD (r 2 < 0.3) were also discarded. Whenever clusters obtained for the different values of φ shared SNPs, the one with the smaller number of SNPs (and higher median LD) was retained. The only two exceptions occurred when SNPs from two overlapping clusters became fused into a single larger cluster at higher φ, suggesting a common source of LD. In these cases, only the merged larger cluster was retained for downstream analyses. Although LDna allows detection of two different types of clusters, single-outlier clusters (SOCs) and compound-outlier clusters (COCs), the latter were disabled as they can be generated by different evolutionary forces acting simultaneously, making the interpretation of results difficult (Kemppainen et al., 2015).

| Principal component analysis (PCA)
ldna can detect clusters of loci that are in LD for various different reasons, primarily the effects of inversions (or other chromosomal rearrangements) on recombination, spatial population structure, or structure generated by local adaptation. Kemppainen et al. (2015) suggest that LD clusters due to inversion polymorphism can be identified because the SNPs involved are genomically clustered and they identify groups of genetically distinct individuals that correspond to different karyotypes. Within an inversion segregating in a population, we expect that suppressed recombination between arrangements will result in the presence of three distinct groups of individuals (homokaryotypes for the reference arrangement, heterokaryotypes, and homokaryotypes for the alternative, inverted arrangement).
Allele frequencies at many SNP loci are expected to differ between arrangements because of their partly independent evolution. the vertices. Therefore, we performed PCA using the r package pcadapt (Luu, Bazin, & Blum, 2017) for each SOC within each linkage group, using all the SNPs within the coordinates (not just those in high LD that led to identification of the SOC). For comparison, we also ran a PCA for the SNPs within the same linkage group outside the SOC coordinates, that is, within putatively collinear regions. The composition of groups of genotypes was then identified using the r function "kmeans," which clusters data based on similarity using the algorithm developed by Hartigan and Wong (1979). The number of groups was set to three, or six when two SOCs presented overlap-

| Genetic diversity
If the groups detected in the PCA represent homo-and heterokaryotypic individuals for polymorphic inversions, then we expect the central group (heterokaryotypes) to have high heterozygosity relative to the more extreme groups (homokaryotypes) on PC1 (and PC2 where there are 6 groups) and relative to collinear regions of the same linkage group. This pattern is expected to be particularly marked for SNPs with strong allele frequency differences between arrangements. We tested this prediction for each candidate inversion, by calculating observed heterozygosity (H obs ). Kemppainen et al. (2015) used this prediction to distinguish between LD clusters generated by inversions and those generated by population structure. In our case, this distinction is less clear-cut since we expect an increase in heterozygosity in the centre of the transect and individuals from this region may also fall centrally on PC1. Observed heterozygosity for each variable position in the whole data set was estimated for each group identified by the PCA (the homokaryotypes for each candidate inversion arrangement and heterokaryotypes) using the "dfgenin" function of the r package "adegenet" (Jombart & Ahmed, 2011 to give a first view of the ages of inversions but note that other factors influence these statistics (see below). Nucleotide diversity was estimated for the two homokaryotypes of each LGC using vcftools in order to compare arrangements. π per site was estimated for each SNP and then averaged across all sites within the length of each probe region, including invariant sites (~120 bp). Pairwise divergence between the putative homokaryotypes (d XY ) was estimated in the same way as π for each probe region, using the fact that π t (for a group containing both homokaryotypes) is based on a mixture of comparisons between and within karyotypes. Specifically, where n x and n y are the numbers of the two homokaryotypes, N = n x + n y , and π x and π y are nucleotide diversities for the two homokaryotypes separately. Differences in π and d XY between putatively inverted and collinear regions (as control) within each LGC were examined using Wilcoxon rank-sum tests. Finally, differences in π between the homokaryotypes for each LGC were also tested for inverted and noninverted regions (as control) using Wilcoxon ranksum tests. All tests were adjusted for multiple comparisons using the sequential Bonferroni correction.

| Recombination patterns
The presence of inversion polymorphism can be confirmed by their effects on patterns of recombination (in our case, the realised crossover patterns detected in offspring

| Detection of candidate chromosomal rearrangements
The implementation of the LD analyses followed by our filtering criteria resulted in the detection of 17 LD clusters of loci (SOCs) identified as the candidate chromosomal rearrangements that were then characterized in downstream analyses (Figure 3, Table 1).

Six
LGs contained no LD cluster, 11 LGs contained at least one LD cluster and five of these contained at least two clusters. Each LGC12.1 showed the least support from PC1 (only 15% and 14% of the variance explained, respectively), whereas LGC17 showed the highest (50%).
The PCAs for most LD clusters revealed that individuals were aggregated into three genotypic groups (mainly on the first component) with intermediate genotypes between them absent or rare ( Figures 4 and 5, Supporting Information Figure S1). This pattern suggests the presence of the two alternative homokaryotypes for a given rearrangement, with the heterokaryotypes in the middle, without recombination between the three genotypic groups. The rare exceptions (LGC4.1, LGC6.1, LGC7.2 and LGC12.2; Figure 5 and Supporting Information Figure S1) comprised some individuals with intermediate positions between homo-and heterokaryotypes, compatible with gene conversion or double crossovers that are known to occur in inversion heterokaryotypes (Stevison et al., 2011).
Additionally, six groups of genotypes were observed in the region spanned by two LGCs (6.2 and 14.2), compatible with the presence of three homokaryotypes and three heterokaryotypes without obvious intermediate genotypes ( Figure 5, Supporting Information Figure S1). This pattern is expected in regions of overlap between inversion events and so corroborates our interpretation of overlapping rearrangements on these two LGs.
In contrast, the collinear regions of the LGs containing LD clusters did not reveal obvious genotypic groups (Supporting Information Figure S1, right panels). Two exceptions were LG10, where three distinct groups were observed but the PC1 explained only 6.6% of variance; and LG12 where the number of groups and their limits are not very clear, with the PC1 explaining also a low proportion of variance (9.1%) when compared to all candidate inver- sions. An inversion may be present in one or the other LG, containing a small number of markers and/or composed by SNPs with a lower median intracluster LD (r 2 < 0.3), or the inversions detected on these LGs could extend further than our current estimates so that some SNPs currently included in the collinear regions are actually in the inversions. Alternatively, the grouping pattern could result from the influence of selection on markers that are not contained in a chromosomal rearrangement.
The sizes of the LD clusters in terms of map distance in the Crab linkage map (Table 1)

| Genetic diversity
We compared heterozygosity (H obs ) between the PCA groups in order to confirm the expectation based on the interpretation of LD clusters as inversions. Putative heterokaryotypes (central PCA groups) were expected to have higher heterozygosity than putative homokaryotypes. Heterokaryotypes were also expected to have higher heterozygosity than is observed in corresponding collinear regions. In most LGs, H obs within the inverted regions of the putative heterokaryotypes was significantly higher than within the inverted regions of the putative homokaryotypes (Figures 4 and 5, Supporting Information Figures S2 and S3, Table S1). The only exception was observed for the putative inversions located in LG12, where one of the homokaryotypes did not show significant differences from the heterokaryotypes (Table S1). When comparing the putative heterokaryotypes with collinear regions for the same individuals, significantly higher H obs was again revealed for almost all LGs containing LD clusters (Figures 4 and 5 Table S2). The arrangement with lower π can be inferred to be the derived arrangement, although the arrangements may also have been differently influenced by selection.
The same tests performed in the noninverted regions of the same LGs (as control, using groups of individuals defined by their inversion karyotypes) revealed no significant differences with only one exception (LG12.1).
In all LGCs, divergence between arrangements, mean d XY , was higher than control values from collinear regions (which should equal the diversity, π, in those regions), implying accumulation of genetic differences since the origin of the inversion (Figures 4 and 5

, Supporting
Information Figures S6 and S7, Table S3). Although only eight cases remained significant after the Bonferroni correction, the sample size in one of the two groups was often small, resulting in low power.
TA B L E 1 LD clusters comprising our 17 candidate rearrangements obtained with LDna and after filtering based on the PCA. Linkage groups, boundaries (with start and end positions according to the Crab linkage map produced by Westram et al. (2018)), numbers of SNPs in LD and their contigs, median LD between these SNPs as well as the variance explained by PC1 are shown

| Recombination patterns
Based on the hypothesis that LD clusters represent inversions, we predicted that recombination in heterokaryotypes would be absent or rare (and only due to gene conversion or double crossover) within the region covered by the LD cluster. For homokaryotypes with the alternative arrangement (relative to the Crab map reference) recombination patterns were predicted to be more consistent with the reversed gene order (Figure 2).
The identification of recombination events within four Wave families revealed no recombination event within the candidate inversions in 124 possible cases (parent-offspring combinations for each candidate inversion) in offspring genotyped from parents that were heterozygous for the different arrangements (karyotype "RA," where "R" is the reference and "A" the alternative arrangement) (Table S4). In contrast, 145 recombination events were detected in the same regions of 1,137 cases in offspring from parents that were homozygotes for the inversions (RR and AA) (

| Inversion frequencies along the transect
The distributions of the different alternative arrangements for each LG cluster across the transect were highly variable. Some LGCs had homokaryotypes that were present only or mainly in one of the ecotype-specific habitats (e.g., LGC6.1) while others had one homokaryotype that was found mainly in the central part of the transect (e.g., LGC11.1) or both homokaryotypes present across the entire transect (e.g., homokaryotypes of LGC1.1) (Figures 4 and 5, Supporting Information Figure S1).
Most inversions showed a significant clinal change in arrangement frequency across the transect (except LGC1.2, LGC6.2-group 2, LGC7.2; Table S5). The asymmetrical cline model was never a better fit than the symmetrical cline (not shown). Additionally, the cline was a poor fit to the data (deviance explained <10%), despite meeting the ∆AIC criterion compared to a constant frequency, for three other inversions (LGC9.1, LGC11.1 and LGC14.2-group 2), which were excluded from subse- Three main groups were observed in the inverted region consistent with two homokaryotypes (black and green) and heterokaryotypes (red). (b) Observed heterozygosity across the genetic map for each of these groups (b1-b3). (c) Distribution of the three groups across the transect (distance from the Crab end is shown). (d) Boxplot of nucleotide diversity (pi) for the inverted and collinear regions of the groups 1 and 3. Divergence (d XY ) between groups 1 and 3 within (Inv) and outside (Col) the inverted region (e1), as well as across the genetic map (e2) polymorphic in both transect ends (Supporting Information Table S5, Figure 6).

| D ISCUSS I ON
Early work emphasized the impact of chromosomal inversions on adaptation and speciation (Dobzhansky, 1970;Sturtevant, 1926Sturtevant, , 1938 but, subsequently, structural rearrangements received less attention, despite some prominent exceptions (e.g., Balanyà, Huey, Gilchrist, & Serra, 2009;Coluzzi, Sabatini, della Torre, Di Deco, & Petrarca, 2002). More recently, inversions have been detected in many systems (Wellenreuther & Bernatchez, 2018), prompting renewed interest in the role they play in local adaptation and speciation. Advances in sequencing technologies and genomics have promised to make structural variation more readily detectable (Alkan, Coe, & Eichler, 2011). However, this task remains difficult for many nonmodel organisms, with approaches based on short reads especially prone to both high rates of false positives and negatives (Sedlazeck et al., 2018;Lledó & Cáceres, 2013 and refs. therein). In practice, the presence of inversions has often been inferred from patterns of divergence in genome scans (e.g., Jones et al., 2012) with subsequent confirmation using sequencing or genetic mapping approaches (e.g., Twyford & Friedman, 2015). This can give a biased impression of the role of inversions because suppression of recombination makes their contribution to adaptive differentiation easier to detect than the contribution of loci in freely recombining regions and because inversions that are not differentiated between populations may remain undetected.
The linkage-disequilibrium-based approach we implemented here, using data gathered from a hybrid zone between two Littorina saxatilis ecotypes, allowed us not only to detect rearrangements de novo but also to infer their frequencies and putative contribution to ecotype divergence. This not only circumvented the need for subsequent genotyping to estimate karyotype frequencies but also contributed to reducing the number of candidate rearrangements for further validation because attention could be focused on those with strong clinal patterns. Providing that linkage maps (recombination information) or high-quality reference genomes are available, the candidate rearrangements detected by their LD signatures can be confirmed and their type (e.g., inversions or translocations) can also be identified. Detection of rearrangements using information from LD between markers, complemented by PCA and genetic diversity information, was proposed previously (Kemppainen et al., 2015). Our results further demonstrate the utility of this approach.
However, clusters of markers in high LD can be generated by other processes, especially when selection acts in opposition to gene flow on regions of low recombination (Burri, 2017), and the choice of thresholds in LDna has not been validated by comparison to simulations. This means that the LD clusters themselves are only indicative. Observing distinct genotypic clusters through PCA, with the expected patterns of heterozygosity, supports the hypothesis that LD clusters represent inversions. However, additional independent lines of evidence are needed to confirm the chromosomal rearrangements. This evidence can come from recombination mapping, as we used here.
We detected 17 candidate rearrangements, including three that correspond to LD blocks reported by Westram et al. (2018). This number is dependent on the parameter values chosen in the initial LDna and PCA and may be an underestimate because we aimed to set conservative thresholds. All candidates, except three (LGC7.2, LGC9.1 and LGC14.3), were supported by recombination patterns from Crab or Wave families, which tends to confirm that our criteria were stringent. Due to the limited number of offspring available to identify recombination events and the particular genetic composition of the parents used in the crosses, not all of the remaining inversions were equally supported. Thus, future validation of some of these candidates using cytogenetics (as in Lee et al., 2017) and/or long-read sequencing is desirable.
Studying other localities across the wide geographic and environmental range occupied by this species may well reveal further rearrangements. We believe that this approach can be extended successfully to other case studies with similar data available, but it is likely that thresholds (e.g., minimum number of loci within an LD cluster) will need to be fine-tuned through exploratory analyses in order to make informed decisions concerning some parameters.
The number of inversions detected in L. saxatilis is high when compared with other systems (Wellenreuther & Bernatchez, 2018), likely at least partly due to the use of different methodology. If inversions cause a fitness cost on heterokaryotype individuals due to the generation of unbalanced gametes when single crossovers occur within inversions, then this large number of polymorphic inversions could represent a substantial load. However, although it occurs in plants, this type of cost seems rare in animals (Hoffmann & Rieseberg, 2008;Rieseberg, 2001). The range of inversion sizes in this system is within that observed for other species (Wellenreuther & Bernatchez, 2018). However, it is important to keep in mind that inversion sizes, defined according to the Crab map, are unlikely to F I G U R E 5 Characterization of LGC6.2 (smaller inversion in grey overlapping with LGC6.1). PCA based on markers located in: (a1) the region spanned by the first part (from 0 to 8.7 cM) of LG6, spanned by LGC6.1; (a2) the region of LGC6.2, which overlaps with the LGC6.1 inversion; and (a3) in the collinear region. Unlike the first part, where three main groups were observed as in most of the other LGs, six groups were observed in the region where the two inversions overlap (from 8.7 to 29.3 cM), consistent with three homokaryotypes (blue, pink and red) and three heterokaryotypes (green, black and cyan LGC2 Information gathered from the Crab map . c Both Crab parents were heterozygotes for the inversion. d One exception contradicting the expected pattern out of 28. e RR supported, recombination in AA not informative. correlate well with physical lengths because the parents used to construct that map were sometimes heterozygotes for those inversions (e.g., LGC1.1 and LGC4.1; Figure 2). Also, this approach alone cannot be used to infer rearrangement breakpoints with precision.
The coordinate ends we present must be interpreted as boundaries of the regions influenced by the rearrangements and some of the SNPs at the ends of an LD cluster may actually be outside, although close to the rearrangements' breakpoints. Nevertheless, assessment of the genotypic information from the Wave families allowed us to verify that the inferred boundaries were compatible with inversions (changes in orientation within the same chromosome) rather than with the exchange of genetic material between chromosomes through translocations.
The levels of observed heterozygosity further supported the inversion status of the LD clusters. The middle groups identified in the PCA presented higher H obs within each of the LD cluster regions than the other two groups, as expected for heterozygotes for the inversions relative to the homokaryotypes. For most LD clusters, the two homokaryotypes presented significant differences in nucleotide diversity. This imbalance is expected for inversions where the derived arrangement is young, having originated recently as a single haplotype. Over time, the younger haplotype is expected to accumulate diversity through mutation but an imbalance may remain because the less common haplotype has a smaller effective population size and because of the strong effect of background selection and selective sweeps on both haplotypes.
Divergence between arrangements is also expected to accumulate with time, due to suppression of recombination in heterokaryotypes. This prediction is generally supported by our data in some LGCs (Supporting Information Figure S7 and Table S4). Observed divergence may also be influenced by selection and by gene flux due to double recombination and gene conversion. It would be premature to interpret these diversity and divergence data in terms of inversion ages, but they do suggest that the origins of the inversions pre-date postglacial colonization of the Swedish coast (<10,000 generations ago).
Recombinants between the three or six genotypic groups from the hybrid zone were generally absent in the transect within rearranged regions ( Figure S1). The rare exceptions could result from gene conversion or double crossovers, which are known to occur within inversions, although at low rates (~10 −4 for double crossovers and ~10 −5 for gene conversion in Drosophila; Stevison et al., 2011). Missing data at informative markers for distinguishing the inversion genotypes (defined according to the PC1 scores) could also result in apparent intermediacy of individuals in the PCA. A close inspection of their genotypes for informative markers supports both explanations (not shown). Nevertheless, the recombination information gathered from the Wave families showed recombination to be absent (or rare) within the candidate inverted regions for the heterokaryotype parents. This, together with recombination patterns, is consistent with a reversed gene order relative to the Crab map and provides independent support that these candidate regions correspond to inversions. According to our results, most LGs (at least 11 out of 17) carry inversions, together encompassing ~25% of the total number of SNPs analysed. Given the suppressedrecombination effects observed here, these inversions are likely to play a major role in shaping the recombination landscape in this system.
Given that many inversions are segregating in this population, an important question is whether they contribute to local adaptation. Are these inversions influenced by divergent selection? Westram et al. (2018) estimated that the majority of outlier SNPs were clustered in regions that overlap with the inversions that we detected in LG6, 14 and 17. Our cline-fitting analysis of most inversions revealed that their frequencies change clinally across the transect, with varying width and position. However, simulations of this system by Westram et al. (2018) show that a clinal pattern can appear for neutral loci due to isolation by distance and a genome-wide barrier effect close to the habitat transition.
Therefore, significant cline fits are not, in themselves, good evidence for divergent selection.  Westram et al. (2018). Balancing selection has been demonstrated for adaptive shell colour traits in this system (Johannesson & Butlin, 2017) and may also influence other traits.
If this hypothesis is confirmed, further studies should also aim to distinguish among the different forms of balancing selection that may play a role (e.g., frequency-dependent selection, heterosis or spatially variable selection) and, if heterosis is observed, to understand how it is generated (e.g., associative overdominance or coadaptation; Butlin & Day, 1985;Kirkpatrick, 2010). Further work is needed to test the hypothesis of a combination of balancing and divergent selection, seeking observations or experiments that clearly distinguish it from the other hypotheses mentioned above.
Nevertheless, we suggest that this possibility should also be considered for inversion polymorphisms in other species.
Balancing and/or divergent selection between habitats could have maintained inversions for long periods of time, resulting in the high diversity and divergence for some inversions noted above (Faria, Johannesson, Butlin, & Westram, 2019) . These inversions may have been segregating in ancestral populations where analogous Crab and Wave environments occur, and subsequently underpinned rapid adaptation to the Crab and Wave habitats following colonization of the Swedish coast. The presence of many inversions encompassing a large proportion of the genome can explain why we observe such a high number of divergent loci between ecotypes after such a short time since postglacial colonization. In addition, more recent gene flow between populations is likely to contribute to the efficient spread of inversions, especially if they contain adaptive variation (Johannesson et al., 2010;Morjan & Rieseberg, 2004). In these ways, inversions could help to explain the pattern of sharing of loci putatively influenced by selection, which is greater on smaller than on large geographic scales . Thus, determining the ages and spatial distributions of the inversions described here will be critical to further understanding of local adaptation and the evolution of reproductive isolation between L. saxatilis ecotypes.
Finally, complementary evidence to understand the link between inversions, adaption and selection can come from determining the genes present within inversions and the phenotypes that are associated with inversion polymorphisms. Although most of the outlier SNPs identified by Westram et al. (2018) are located within inversions, it is unlikely that they are all under direct selection. Genome annotation for L. saxatilis (M. Panova and T. Larsson, personal communication) will allow the identification of candidate genes and functions that may play a role in adaptation and ecotype divergence.
Association mapping in this hybrid zone has already revealed that a large proportion of the genetic variance observed for some key adaptive phenotypes may be explained by genetic variation within some of these inversions . Studies with additional localities and further phenotypes will extend this connection.
We have demonstrated the power of LD patterns to detect inversions. There may be some bias in this approach towards inversions associated with local adaptation. Nevertheless, in our study site, inversions apparently make a major contribution to adaptive divergence.

ACK N OWLED G EM ENTS
We would like to thank particularly Laura Brettell, Mårten Duvetorp,

DATA ACCE SS I B I LIT Y
DNA sequence data and genotypes to assess recombination in the Wave family were archived in NCBI SRA (PRJNA493979) and Dryad