Modular chromosome rearrangements reveal parallel and nonparallel adaptation in a marine fish

Abstract Genomic architecture and standing variation can play a key role in ecological adaptation and contribute to the predictability of evolution. In Atlantic cod (Gadus morhua), four large chromosomal rearrangements have been associated with ecological gradients and migratory behavior in regional analyses. However, the degree of parallelism, the extent of independent inheritance, and functional distinctiveness of these rearrangements remain poorly understood. Here, we use a 12K single nucleotide polymorphism (SNP) array to demonstrate extensive individual variation in rearrangement genotype within populations across the species range, suggesting that local adaptation to fine‐scale ecological variation is enabled by rearrangements with independent inheritance. Our results demonstrate significant association of rearrangements with migration phenotype and environmental gradients across the species range. Individual rearrangements exhibit functional modularity, but also contain loci showing multiple environmental associations. Clustering in genetic distance trees and reduced differentiation within rearrangements across the species range are consistent with shared variation as a source of contemporary adaptive diversity in Atlantic cod. Conversely, we also find that haplotypes in the LG12 and LG1 rearranged region have diverged across the Atlantic, despite consistent environmental associations. Exchange of these structurally variable genomic regions, as well as local selective pressures, has likely facilitated individual diversity within Atlantic cod stocks. Our results highlight the importance of genomic architecture and standing variation in enabling fine‐scale adaptation in marine species.

the rate and predictability of adaptation (Stern, 2013;Tigano & Friesen, 2016). Recent evidence has demonstrated that the amount of shared variation within a genome and how this variation is organized structurally can facilitate local adaptation and may influence rates of genomic parallelism (Nelson & Cresko, 2018;Pearse, Miller, Abadia-Cardoso, & Garza, 2014). However, empirical support demonstrating the role of shared variation and structural variation across systems is only beginning to be understood.
Standing variation in these genomic architectural features can allow rapid responses to selection and directly influence genomic parallelism (Bolnick, Barrett, Oke, Rennison, & Stuart, 2018).
Atlantic cod (Gadus morhua), a heavily exploited, demersal, broadcast spawning marine fish, provides a study system well suited to investigate the extent of modularity and parallelism among genomic architectural features. Atlantic cod exhibit high connectivity and dispersal, but also show divergent adaptation and genetic structuring associated with genomic architectural variation (Barth et al., 2017;. Past genetic studies of Atlantic cod have suggested parallel environmental adaptation to temperature and salinity in the east and west Atlantic (Berg et al., 2015;Bradbury et al., 2010). Phenotypic parallelism of migratory and stationary ecotypes occurring in Europe and North American coastal waters has also been identified (Robichaud & Rose, 2004).
This environmental and behavioral divergence has likely been facilitated by rearrangements consisting of multiple potential inversions on four separate linkage groups (LG 1, 2, 7, and 12, Kirubakaran et al., 2016;Sodeland et al., 2016). However, several outstanding questions remain regarding the origin and function of diverged, rearranged genomic regions among individual Atlantic cod across the range. The extent of coinheritance and functional modularity of rearrangements in Atlantic cod, as well as the extent that the same rearrangements are shared across the Atlantic remains unclear.
Previous range-wide comparisons have revealed interchromosomal linkage between rearranged regions, suggesting a shared evolutionary trajectory for rearranged regions (Berg et al., 2015;Bradbury et al., 2014). However, structuring associated with environmental clines and trans-Atlantic divergence can produce interchromosomal linkage that may mask how frequently chromosomal rearrangements are co-inherited at finer scales.
Here, we build on previous studies and explore the potential for exchange of these rearrangements across the entire Atlantic range, and their role in local adaptation to fine-scale ecological variation in Atlantic cod. First, we quantify coinheritance of chromosomal rearrangements within populations to test whether multiple rearrangements show a shared evolutionary trajectory. We then conduct environmental and migratory associations with genomic markers using multivariate and machine-learning methods that can detect relationships between alleles that would be missed in single-locus models and genome scans (Forester, Lasky, Wagner, & Urban, 2018;Gagnaire & Gaggiotti, 2016;Stephan, Stegle, & Beyer, 2015). Next, we use multivariate clustering to determine whether loci within rearrangements show consistent associations with the same environmental variables, indicating functional modularity (Lotterhos, Yeaman, Degner, Aitken, & Hodgins, 2018). Finally, we test for signatures of adaptation from shared variation to infer whether the same haplotypes in rearranged regions are shared across the Atlantic.
Overall, the results from our study highlight how features of genomic architecture and history can enable repeated differentiation and adaptation across a variety of habitats within a highly connected marine species.

| Genotype data
To identify population structure and genome architecture variation across the range of Atlantic cod, we combined five datasets that were previously genotyped using the same 12K SNP array, along with additional samples from the Flemish Cap genotyped using the same array (Barth et al., 2017;Berg et al., 2017;Kess et al., 2019;. Atlantic cod genotypes from North America and Europe were obtained from 44 separate sampling trips from 37 distinct sampling locations (Figure 1a, sampling site and year in Table S1). Detailed genotyping and collection information are available in the studies cited above. These genotypes were combined into a final dataset of n = 1,230 (n = 531 North America, n = 699 Europe) individuals genotyped at 6,669 informative SNPs genotyped in all populations, corrected for strand flips, and filtered to remove SNPs with ambiguous genotype coding across datasets.

| Range-wide population structure
We quantified range-wide population structure using principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE), a nonlinear machine-learning clustering algorithm (van der Maaten & Hinton, 2008), with genotypes from all 1,230 individuals from all populations. We detected outliers driving population clustering in PCA by calculating Mahalanobis distances for each SNP from a vector of z-scores describing its relationship with K tested principal components. Using the R package pcadapt (Luu, Bazin, & Blum, 2017), we carried out PCA with K = 2 to identify SNPs associated with the first two PC axes explaining most geographic variation, assuming a false discovery rate (FDR, Storey & Tibshirani, 2003) cutoff of q = 0.1.
We then used K = 25 to identify SNPs that were correlated with principal components that explained less variation but captured more subtle genetic differences between populations and individuals. To represent this variation on two axes, we also calculated t-SNE scores for each individual using the tSNE R package (Donaldson, 2016). We assigned a perplexity value of 25; this parameter characterizes the number of effective nearest neighbors expected in the data.

| Individual genetic variation
To quantify individual genetic variation, we separated individuals into geographic groups from Canada North, Canada South, Gilbert Bay, Flemish Cap, Iceland and Northeast Artic, and Coastal Europe ( Figure 1a,b). These groupings are supported by separation of geographic regions in both PCA and t-SNE, although the split with Glibert Bay is more distinct in t-SNE, suggesting multiple axes of genetic divergence (Figures 1b and S1). We then conducted PCA within each of these clusters using pcadapt. Outliers differentiated between individuals were identified by significance of Mahalanobis distances for each SNP, using an FDR of q = 0.

| Range-wide genetic structure of rearrangements
Separate Bayesian clustering of SNPs within each chromosomal rearrangement was conducted using STRUCTURE v2.3.4 (Pritchard, Stephens, & Donnelly, 2000) to determine individual rearrangement genotype. We carried out separate analyses of the 531 individuals within North America and the 699 individuals within Europe. We conducted separate runs for all SNPS within rearrangements on LG1, LG2, LG7, and LG12 in R using parallelstructure (Besnier & Glover, 2013

| Coinheritance and linkage disequilibrium
We tested for coinheritance of all pcadapt outlier SNPs identified within each regional cluster exhibiting within-population architectural variation using two linkage-based methods. We calculated an r 2 matrix quantifying pairwise linkage disequilibrium (LD) between each SNP using plink 1.9 (Chang et al., 2015). LD calculations were performed separately within Canada North (n = 553 SNPs), Canada South (n = 490 SNPs), Iceland and Northeast Arctic (n = 618 SNPs), and Coastal Europe population clusters (n = 730 SNPs). We visualized linkage between outlier SNPs using an LD heatmap in the superheat R package (Barter & Yu, 2017). Using the LDna R package (Kemppainen et al., 2015), we generated network plots to visualize each SNP as a node connected by edges exceeding an r 2 threshold of .25.

| Environmental and migratory association with genomic variation
To identify association of all 6,669 SNPs with environmental gradients, we carried out genome-wide association using two polygenic methods: redundancy analysis (RDA, Rao, 1964) and random forest (Breiman, 2001). We conducted RDA and random forest separately for all individuals in North America and in Europe to limit the possibility of nonparallel genetic architecture and residual population structure diluting signals of local adaptation. We extracted 12 environmental variables for each sampling location from the Bio- we standardized each variable by its standard deviation and then conducted separate PCAs on variables for salinity, temperature, and dissolved oxygen. We used the first PC for each of these variables as individual phenotypes used as predictors in an RDA model to identify environmental association. SNPs with absolute RDA scores exceeding the 95th percentile were categorized as outliers. We conducted redundancy analysis using the vegan R package (Oksanen et al., 2011). We did not include a covariate for geographic structure in environmental association, as true signals of association may be lost when environmental adaptation tracks geographic separation (Yeaman et al., 2016), and dimensionality reduction used in redundancy analysis has been shown to effectively control for background structure (Forester, Jones, Joost, Landguth, & Lasky, 2016 We explored genetic associations with migration phenotype for all individuals in Europe using all 6,669 SNPs, following the association methods used in Kess et al. (2019) in North America.
Regional-level migration phenotype was assigned to each individual from migratory classes identified by Robichaud and Rose (2004). As above, we identified SNP associations with migratory phenotype using redundancy analysis without correction for geographic structure and random forest classification.
To determine the direction of association between environmental and migratory variables and rearrangement frequencies, we calculated Pearson's correlation between the frequency of each rearrangement and each environmental variable. We assessed significance at a lenient p < .05, as the goal of this test was to quantify direction of association.

| Clustering of loci by multiple environmental associations
We examined patterns of SNP co-association to uncover groups of loci that exhibited similar patterns of associations across multiple environmental variables. As in Lotterhos et al. (2018) variables will be found in the same cluster using hierarchical clustering; these clusters represent proxies for functional modules. The R package superheat was then used to visualize patterns of shared associations using paired dendograms and heatmaps.

| Signatures of adaptation from shared variation
To identify whether environment and migration-associated rearrangements represent adaptation from shared variation, or from de novo mutations, we measured differentiation and separa-  (Langella, 2012) and visualized trees using FigTree v1.74 (Rambaut, 2012).

| Population structure and individual genetic variation
Using dimensionality reduction, we identified population structure associated with differentiation among geographic regions, and structuring by rearrangement genotype within these geographic regions.
Structuring within the total dataset uncovered trans-Atlantic and lati-  (Table S2).

| Coinheritance and linkage disequilibrium
Consistent with independent inheritance of rearrangements, we ob-

| Environmental and migratory association with genomic variation
Tests of polygenic environmental adaptation revealed associations primarily within chromosomal rearrangements in both North America and Europe, supporting a role for these regions in local adaptation.
Redundancy analysis (RDA) indicated that variation in principal components of salinity, oxygen, and temperature was strongly associated in with LG1, LG2, LG7, and LG12 rearrangements in both North America and Europe (Figure 5a,b). Random forest regression of SNPs with environmental variables also revealed consistent association of rearrangements with variation in temperature, oxygen, and salinity (Table 1, Figures S5 and S6). We also identified a shared positive association with at least one temperature variable and LG2, LG7, and LG12 rearrangement haplotype frequency on either side of the Atlantic (Table S3).
Redundancy analysis and random forest classification have been previously used to identify an association between migratory behavior and the LG1 rearrangement in North America (Kess et al., 2019). Using both of these methods to repeat this analysis in European waters, we found migratory phenotype associations predominantly with SNPs within the LG1 rearrangement, (Figure S7), suggesting parallelism of genetic architecture for this trait. We also found that frequency of the rearranged LG1 haplotype was positively correlated with migration in both North America and Europe (Table S3).

| Clustering of loci by multiple environmental associations
We found that most SNPs within rearranged regions belonged to the same functional cluster, indicating functional modularity, but in several instances outlier SNPs within multiple rearranged regions were found in the same environmental co-association cluster (Figure 6a,b). In North America, environmental clusters were driven by strength of association with principal components. Although  SNPs within the same rearranged region tended to show similar patterns of co-association, we observed SNPs from each rearrangement within the same cluster across multiple major co-association clusters (Figure 6a). In Europe, environmental clustering was driven by associations between principal components as well as individual environmental variables. The pattern of co-association of SNPs with the same environmental variable across rearrangements was less pronounced in Europe, indicating greater functional modularity.

Reduced co-association between rearrangements was greatest in
LG12; SNPs in LG12 exhibited almost no clustering with SNPs within other rearrangements (Figure 6b).

| Signatures of adaptation from shared variation
For each rearrangement, trees of genetic distance showed groupings distinct to rearrangement genotype. For LG1, individual genetic distance trees revealed no separation of LG1 homozygous rearranged genotype individuals by geography, supporting a hypothesis of adaptation from shared variation for these rearrangements (Figure 7a). LG2 rearranged homozygous LG2 nonrearranged homozygous LG2 rearrangement heterozygous

Rearrangement LG7
LG7 rearranged homozygous LG7 nonrearranged homozygous LG7 rearrangement heterozygous LG12 rearranged homozygous LG12 nonrearranged homozygous LG12  In LG2 and LG7, we observed separate clusters corresponding to homozygous rearranged, homozygous nonrearranged, and heterozygous genotype individuals (Figure 7b-c), which exhibited low trans-Atlantic differentiation relative to divergence by rearrangement genotype. Genetic distances between these clusters were larger in LG7 relative to LG2, indicating different rates or of differentiation.
In LG12, we observed low divergence among rearranged genotype individuals, suggesting this region is also shared across the North Atlantic (Figure 7d). We found strong separation between North America and Europe in nonrearranged genotype individuals in LG12 and observed distinct heterozygous genotype clusters for these regions, reflecting divergence in nonrearranged LG12 genotype individuals (Figure 7d).

Comparison of differentiation between North American and
European individuals at the same rearranged genomic region revealed patterns consistent with adaptation from shared variation for each rearranged region, but also revealed elevated trans-Atlantic divergence within nonrearranged haplotypes for LG1 and LG12 (Figure 7e,g,h).
These values were significantly below the chromosome-wide average for rearranged genotype individuals in LG1 (W = 5,129, p < .04, Independent inheritance of adaptive chromosomal rearrangements could facilitate exchange of these rearrangements as distinct adaptive units, enabling rapid production of novel ecotypes and colonization of a variety of habitats (Kirkpatrick & Barrett, 2015).
We found a shared pattern of migratory association of the LG1 rearrangement in independent polygenic association tests in Europe and North America. Associations of LG1 with migratory phenotype found here are consistent with past genome scan studies using migratory and nonmigratory ecotypes from waters around Europe  and genome-wide association in North American waters (Kess We find signatures of adaptation from shared variation across the Atlantic for chromosomal orientations specific to each chromosome. Using differentiation-based and genetic distance comparisons, we show low trans-Atlantic differentiation between the rearranged orientation of LG1, and LG12, and the nonrearranged orientation of LG7, suggesting selection on shared variation in these regions in both Europe and North America. These patterns of localized reduced differentiation are consistent with incorporation of these genomic regions into the genetic architecture of adaptive traits from shared variation (Barrett & Schluter, 2008). This observation is similar to differentiation observed between marine and freshwater stickleback in comparison with the inverted region housing the Atp1a1 gene, although the stickleback study focused specifically on multiple derived populations (Roesti, Gavrilets, Hendry, Salzburger, & Berner, 2014).
Our results provide further support for the increasingly prominent role of shared structural variation in facilitating local adaptation (Barrett & Schluter, 2008;Kirkpatrick & Barton 2006).
We also uncovered elevated trans-Atlantic differentiation in the nonrearranged orientations of LG1 and LG12, indicating distinct evolutionary histories of these chromosomal regions in Europe and North America. The identification of elevated trans-Atlantic divergence within rearranged LG1 and LG12 genotype individuals suggests gene content in these regions may be amenable to repeated selection during ecological specialization in Atlantic cod. Repeated divergence of the same genomic region has been found to underlie parallel evolution of pelvic spine reduction in threespine stickleback through repeated deletions within Pitx1 regulatory regions (Chan et al., 2010), and evolution of albinism in Astyanax cavefish has also arisen from independent deletions within the Oca2 locus (Protas et al., 2006). Alternatively, reduced recombination and a long period of trans-Atlantic divergence in ancient rearrangements predating speciation may also drive this elevated divergence relative to the genomic background. Surprisingly, we find that nucleotide diversity is reduced in the LG12 rearranged region for nonrearranged individuals despite high trans-Atlantic F ST and d XY , suggesting diversity-reducing sweeps in these regions. However, low precision due to low density of SNPs, ascertainment bias, and high variance in d XY when applied to SNP data may bias measures of diversity and divergence (Wakeley, 1996). Future characterization of long-read sequence from LG1 and LG12 across Europe and North American Atlantic cod may help clarify factors driving divergence identified here.
Shared, modular, structural variation is hypothesized to be integral to ecological adaptation and may facilitate repeatability of the genomic architecture underlying parallel adaptation. We show that modular, shared chromosomal rearrangements are associated with ecological adaptation within Atlantic cod. Additionally, we show that separate divergence in LG12 and LG1 rearrangements has occurred in North American and European populations, revealing a nonparallel component to adaptation within the same, structurally variable region of the genome. Our findings here also provide new understanding of biocomplexity organized at different spatial scales within this species. Substantial cryptic variation is mediated by these rearrangements, indicating different adaptive potentials among cod stocks. Understanding and conserving this variation may be crucial in F I G U R E 7 Comparison of F ST along LG1, 2, 7, and 12 between individuals with each rearrangement genotype in North America and Europe and genetic distance trees for each rearrangement. (a-d) Comparison of F ST along LG1, 2, 7, and 12 between alternative homozygous rearrangement genotype categories in North America and Europe. Black vertical lines indicate the boundaries of each rearrangement. (e-h) Range-wide individual trees of genetic distance for LG1, 2, 7, and 12 rearrangements calculated using Cavalli-Szforza and Edwards chord distances mitigating anthropogenic impacts on Atlantic cod such as those imposed by climate change and overharvest.

ACK N OWLED G M ENTS
We thank the Centre for Integrative Genetics (CIGENE) for genotyping of Atlantic cod used in this study. This study was supported by the Genomics Research and Development Initiative (GRDI) of the Department of Fisheries and Oceans Canada (DFO) and the Natural Sciences Engineering and Research Council Canada (NSERC).

CO N FLI C T O F I NTE R E S T
None declared.