Genomic differentiation in Pacific cod using Pool‐Seq

Abstract Patterns of genetic differentiation across the genome can provide insight into selective forces driving adaptation. We used pooled whole genome sequencing, gene annotation, and environmental covariates to evaluate patterns of genomic differentiation and to investigate mechanisms responsible for divergence among proximate Pacific cod (Gadus macrocephalus) populations from the Bering Sea and Aleutian Islands and more distant Washington Coast cod. Samples were taken from eight spawning locations, three of which were replicated to estimate consistency in allele frequency estimation. A kernel smoothing moving weighted average of relative divergence (F ST) identified 11 genomic islands of differentiation between the Aleutian Islands and Bering Sea samples. In some islands of differentiation, there was also elevated absolute divergence (d XY) and evidence for selection, despite proximity and potential for gene flow. Similar levels of absolute divergence (d XY) but roughly double the relative divergence (F ST) were observed between the distant Bering Sea and Washington Coast samples. Islands of differentiation were much smaller than the four large inversions among Atlantic cod ecotypes. Islands of differentiation between the Bering Sea and Aleutian Island were associated with SNPs from five vision system genes, which can be associated with feeding, predator avoidance, orientation, and socialization. We hypothesize that islands of differentiation between Pacific cod from the Bering Sea and Aleutian Islands provide evidence for adaptive differentiation despite gene flow in this commercially important marine species.

elevated differentiation have been referred to as 'genomic islands of differentiation' (Wolf & Ellegren, 2017) or 'genomic islands of speciation' (Turner et al., 2005). We use the phrase 'genomic islands of differentiation' because here we focus on local adaptation rather than speciation and these regions are not necessarily associated with speciation.
Natural selection in different environments may influence the emergence and persistence of islands of differentiation, resulting from a complex range of mechanisms (Yeaman, 2013). Islands of differentiation can arise from chromosomal rearrangements, selection with hitchhiking, and background selection. Low recombination in regions of genomic rearrangements may facilitate the evolution of islands of differentiation, because genetic linkage reduces recombination of adaptive combinations of alleles (Via, 2012;Yeaman, 2013).
Chromosomal rearrangements, or inversions, restrict recombination in heterokaryotypes and can be adaptive or neutral (Lotterhos, 2019;Noor & Bennett, 2009). Background selection can result in increased levels of differentiation due to selection against deleterious alleles at linked loci (Charlesworth et al., 1993). Selection with hitchhiking occurs when regions of the genome subject to strong selection promote secondary changes in gene frequencies in closely linked regions (Smith & Haigh, 1974;Wolf & Ellegren, 2017).
The capacity for local adaptation rests on a balance between the strength of selection and gene flow, or migration-selection balance (Graham et al., 2018). During the emergence of local adaptation among connected populations, the majority of the genome is thought to be subject to gene flow and few divergent clusters are expected (Nosil et al., 2009;Via, 2012). Stronger selection is thought to cause islands of differentiation to increase in size (Graham et al., 2018;Nosil et al., 2009;Yeaman & Whitlock, 2011).
Background selection against strongly deleterious mutations can also lead to nonneutral distortions in the allele frequency spectrum (Charlesworth et al., 1993;Cvijović et al., 2018;Good et al., 2014).
In a region in which the deleterious mutation rate is elevated, there will be an excess of rare alleles because selection cannot act instantaneously (Cvijović et al., 2018;Good et al., 2014). As such, distinguishing background selection from positive selection and selection with hitchhiking might not be possible because they may both lead to an excess of rare alleles in the site frequency spectrum (Cvijović et al., 2018). Nevertheless, a comparison of relative (F ST ) and absolute (d XY ) differentiation between populations with Tajima's D and nucleotide diversity within populations may provide insights into the relative importance of selection and migration in shaping patterns of genome variability.
Selection in connected populations is particularly relevant in marine species supporting commercial fisheries to understand the biological basis behind established management units. Genetic population structure has been documented among parapatric groups of Pacific cod that spawn in ecologically and biophysically different regions across their range (Drinan et al., 2018;Spies, 2012;Spies et al., 2020). In Alaska, Pacific cod are managed in three management units corresponding to the Gulf of Alaska, Aleutian Islands, and eastern Bering Sea. A general pattern of isolation by distance among spawning groups of Pacific cod (Cunningham et al., 2009;Drinan et al., 2018;Spies, 2012) is punctuated by locally pronounced genetic differentiation between the Aleutian Islands and eastern Bering Sea (Spies, 2012) and between the western and eastern Gulf of Alaska (Drinan et al., 2018;Spies et al., 2021). However, divergence across the genome between Aleutian Islands and eastern Bering Sea cod has not been examined. Such data may further elucidate levels of connectivity as well as mechanisms of local adaptation in these areas, which support the largest cod fishery in the United States, with an average annual gross value of $379 million between 2016 and 2020 (Fissel et al., 2020).
Examination of adaptive divergence in Pacific cod is also merited given differences observed in physical environment and prey of cod in these two regions. The Bering Sea contains a prominent and broad (500 km) continental shelf adjacent to a deep-sea basin that stretches from the Alaska Peninsula to the Bering Strait. In contrast, the Aleutian Islands consist of a narrow steep shelf formed by an 1800 km long chain of volcanic mountaintops (Logerwell et al., 2005). While Pacific cod occurs in the western and eastern Bering Sea (O'Leary et al., 2022), we focus on Pacific cod in the United States management region of eastern Bering Sea (Stevenson & Lauth, 2019). The eastern Bering Sea (EBS) continental shelf consists of three depth regions: the inner (0-50 m depth), middle (50-100 m), and outer domain (100-180 m; Stabeno et al., 2012). The extent of sea ice in the Bering Sea during winter varies interannually, but has declined in recent years (Grebmeier et al., 2006;Stevenson & Lauth, 2019). The timing of the maximum ice extent and its retreat affects the species composition across a range of taxa from the zooplankton community to higher trophic level predators (Stevenson & Lauth, 2019). The physical environment of the Aleutian Islands is dynamic; islands are separated by passes that allow the transfer of water from the North Pacific Ocean northward and the water quality and species distributions change along the chain in response to shifting currents and passes (Hunt Jr & Stabeno, 2005;Logerwell et al., 2005). For example, there is an abrupt shift in the physical environment and species composition at Samalga Pass (169° W), an area that divides warmer, nutrient-poor coastal waters of the Alaska Coastal Current to the east and the colder, nutrient-rich oceanic Alaskan Stream to the west (Hunt Jr & Stabeno, 2005).
These distinct physical environments, and the resulting shifts in ecological communities, drive diet differences between Pacific cod in the Aleutian Islands and EBS. Walleye pollock (Gadus chalcogrammus) are a major diet component of Pacific cod in the EBS (26%), but in the Aleutians, Atka mackerel (Pleurogrammus monopterygius) and sculpins are the predominant fish prey (15% each), while pollock comprise only 5% (Aydin et al., 2007). Snow and tanner crab (Chionoecetes opilio and C. bairdi) make up 9% of cod diets in the EBS, but less than 3% in the Aleutian Islands. In contrast, squids comprise over 6% of cod diets in the AI but are a negligible proportion of diets in the EBS. Myctophids (family Myctophidae) are also found only in cod diets in the Aleutian Islands, but not in the EBS even though they are present in the diets of other EBS fish species (Aydin et al., 2007;Lang & Livingston, 1996).
Lessons from the congeneric Atlantic cod (Gadus morhua) may help inform the genomics of Pacific cod; in Atlantic cod, several ecotypes are known, including Northeast Arctic cod (NEAC) and the Norwegian coastal cod (NCC). The genomic basis of these ecotypes appears to be related to inversions on linkage groups 1, 2, 7, and 12 (Árnason & Halldórsdóttir, 2019;Barth et al., 2017;Hemmer-Hansen et al., 2013).
Genes within these inversions appear to be responsible for maintaining selective differences related to habitats and life history (Barth et al., 2017;Kirubakaran et al., 2016). It is currently unknown if such inversions also drive population structure in Pacific cod.
Some knowledge of the demographic history of Pacific cod is valuable for interpreting genomic differences. Pacific cod are believed to have originated from an Atlantic cod ancestor that moved into the Pacific Ocean 3.8 million years ago (Árnason & Halldórsdóttir, 2019).
During Pleistocene glaciations, water levels fluctuated and the Bering Sea shelf was exposed during several time periods, forming the Bering land bridge as recently as 30,000-18,000 years ago (Elias et al., 1996). Glaciers extended from the Alaska Peninsula to the Aleutian chain through the southern extent of Alaska to northwestern Canada southward to Puget Sound (Batchelor et al., 2019), during which time cod likely migrated southward. Cod recolonized the North Pacific as glaciers receded, as recently as 14-15 kyr before present (Mann & Hamilton, 1995;Menounos et al., 2009). Analysis of the putative zona pellucida gene is consistent with this scenario, as the present-day southern populations are more closely related to Atlantic cod and more northerly populations are more derived at this locus (Spies et al., 2021).
The aim of this study was to compare genomic patterns of differentiation among proximate versus distant Pacific cod populations from ecologically distinct regions and evaluate regions of the genome that may be subject to local adaptation. We applied Pool-Seq to Pacific cod from the ecologically different Aleutian Islands and Bering Sea among which gene flow may occur, and to allopatric groups (EBS and Washington Coast), between which migration has not been observed despite thousands of tagging records (Bryan et al., 2021;Rand et al., 2014;Shimada & Kimura, 1994). These parapatric and allopatric groups were selected to provide examples across several levels of population divergence (Stankowski & Ravinet, 2021;Wolf & Ellegren, 2017). At the time this study was initiated, individual-based whole genome sequencing was not a costeffective option. The power to identify departures from panmixia in marine fish with large effective population sizes increases with the number of genetic markers and individuals (Kalinowski, 2005;Vendrami et al., 2017;Waples & Gaggiotti, 2006). Therefore, we selected Pool-Seq, which offers the capacity to sequence the genome more thoroughly than sequence capture (Harvey et al., 2016) or reduced representation sequencing (e.g., RAD-seq, Andrews et al., 2016), and at a lower cost than individual-based sequencing.
Pool-Seq also provides the combined contributions of individuals in the pool and has been shown to provide robust and reliable allele frequencies (Anand et al., 2016). The disadvantages to Pool-Seq stem from the need to calculate allele frequencies over all pooled samples, which masks individual genotypes and precludes its utility for tests of linkage disequilibrium or population assignment (Dorant et al., 2019). In addition, unequal contributions of individuals in the pool can increase the error of allele frequency estimates (Rode et al., 2018). Nonetheless, Pool-Seq has been used to successfully examine adaptive divergence and genetic population structure in a range of species (Dennenmoser et al., 2017;Dorant et al., 2019;Han et al., 2020;Kurland et al., 2019). This study was also well suited to Pool-Seq because previous studies have defined genetic population structure and spawning units, and all cod were presumed to be in or near spawning stage (Cunningham et al., 2009;Spies et al., 2020).
A primary hypothesis was that genomic divergence between eastern Bering Sea and Aleutian Islands cod would manifest as a limited number of highly differentiated clusters if selection were sufficiently high, because gene flow is thought to be limited (Spies, 2012;Spies & Punt, 2015;Yeaman & Whitlock, 2011). We also expected that temperature would play a role in local adaptation, as winter bottom temperatures in the Aleutian Islands are roughly 2°C warmer in the winter than the Bering Sea, and cod are adapted to a relatively narrow temperature range at all life stages (Barbeaux et al., 2020;Hurst et al., 2010;Laurel et al., 2008). For example, in the Gulf of Alaska, Pacific cod has suffered recent declines in abundance due to the impact of marine heatwaves, and increased ocean temperatures are predicted under climate change (Barbeaux et al., 2020;Capotondi et al., 2012). Specifically, we hypothesized that (1) genetic differentiation between parapatric groups which may be subject to gene flow would vary across their genomes, while differentiation would be both more pronounced and more widespread across the genomes of allopatric groups, (2) regions of elevated divergence may provide clues to the selective mechanisms responsible for divergence among the   Figure 1). Studies of genetic stock structure in Pacific cod are typically limited to spawning fish because populations may intermingle outside the spawning season (Cunningham et al., 2009;Drinan et al., 2018;Rand et al., 2014). While Pacific cod typically spawn between January and April , the Zhemchug sample, taken May 9, 2017, was considered close enough to that time window to be considered a late spawning sample. Individual gonadal condition was not recorded, but all samples were from adult cod that were taken from known spawning areas during spawning season and likely to be reproductively mature and in spawning condition. DNA from tissues stored in 95% ethanol from 88 to 96 individuals from each unique collection location and year ( Table 1)  nwgs.gs.washi ngton.edu/). Pooled samples were sheared using a Covaris LE220 focused ultrasonicator targeting an insert size of 350-380 base pairs (bp). The resulting sheared DNA was cleaned with Agencourt AMPure XP beads to remove sample impurities prior to library construction. A two-sided AMPure cleanup was then performed in order to further restrict the fragment sizes to the desired range. End-repair, A-tailing, and ligation were performed as directed by the KAPA Hyper Prep Kit (KR0961 v1.14) protocol without amplification. A final AMPure cleanup was performed after ligation in order to remove excess adapter dimers from the library. All library construction steps were automated on a Janus Automated workstation (Perkin Elmer, Inc.).
Final library concentration prior to sequencing was determined by triplicate qPCR using the KAPA Library Quantification Kit (KK4824), and molecular weight distributions were verified using an Agilent Bioanalyzer (Agilent Technologies). Samples were sequenced on a HiSeq X using Illumina's HiSeq X Ten Reagent Kit (v2.5). Cluster generation was performed on a cBot modified for use with the HiSeq X flow cells, and flow cells were loaded onto the HiSeq X machine for sequencing.

| Bioinformatics pipeline
The pipeline we used deviated from the standardized pipeline to account for pooled sequences, the nonmodel species, and the alignment of our sequence data to the reference genome of a related species (Atlantic cod). Within GATK (v4), we used the GenomeAnalysisToolkit v. 4.1.2.0 to identify a set of singlenucleotide polymorphisms (SNPs) and to estimate allele frequencies for each pool. We designed and built a variant calling pipeline, including filtering for minimum and maximum read depth, TA B L E 1 Collection location of samples used in each pool, month and year of collection, latitude, longitude, and the number of individuals in collection (n)

Location
Month A second set of filters was applied to remove SNPs exceeding set ranges for read depth per pool. The minimum read depth was 20× for the linkage group and scaffold data, and the maximum read depth was 2% of the total reads of each pool. The second filtering step removed any loci with missing data in any of the pools and retained only biallelic and variant SNPs. The full pipeline is described in more detail in the Appendix S1. Lin's (1989Lin's ( , 2000 concordance correlation coefficient, ρ c , was calculated to measure relative agreement and validate results between the estimated allele frequencies among duplicated pools, using the R package epiR (Stevenson, 2020). More closely related pools were expected to have higher correlation. This statistic, which ranges from 0 to 1, with larger values indicative of higher concordance, compares allele frequency calls between two pools to determine deviation from perfect concordance:

| Assessing patterns of genetic differentiation
where μ x and μ Y are the means of the allele frequencies in pools X and Y, and X and Y are the corresponding variances. The concordance correlation coefficient, ρ c , was calculated for the three duplicated pools (Washington, Pervenets, and Kodiak) to determine the relative correlation among them, as well as all other pools for comparison. The allele frequency matrix with the full filtered set of 1,944,780 variant SNPs was used for this analysis.
F ST and d XY were used to screen genomic regions of elevated divergence. F ST is a relative measure of genetic differentiation (Wright, 1931) whose expectation is inversely correlated with genetic variation in one or both populations under comparison. Elevated pairwise F ST may be due to lower within-group genetic variation, rather than divergence among groups (Cruickshank & Hahn, 2014).
Therefore, an absolute measure of divergence, d XY , whose expectation is independent of genetic variation was also used. The statistic d XY summarizes the average number of nucleotide differences between pools of samples using an equation for unphased data: where p X is the proportion of the reference allele in pool X and p Y is the proportion of the reference allele in Dennenmoser et al., 2017;Schirrmann et al., 2018). Pairwise F ST was calculated globally between all pools and per-SNP between the three superpools described below (EBS, AI, and WA). F ST was calculated in PoPoolation2 (Kofler et al., 2011) using Nei's (1973) where Ĥ T is the expected heterozygosity in combined pools/superpools and Ĥ S is the expected heterozygosity in each pool/superpool (Hivert et al., 2018). The components Ĥ T and Ĥ S were calculated using Hivert et al. (2018), equations 17 and 18. While this measure of F ST is considered biased (Hivert et al., 2018), the bias was consistent among pools because we retained similar read depth and sample size per pool (Table 1).
Pooled data were grouped into superpools according to the previously identified patterns of genetic differentiation between cod spawning in those regions (Drinan et al., 2018;Spies et al., 2020) and patterns identified in the principal components analysis (PCA) described below. Collections from Kodiak Island were included in the Bering Sea superpool based on these criteria. Three superpools 1 40°W 120°W Longitude Latitude were formed: the eastern Bering Sea superpool ("EBS"), the Aleutian Islands superpool ("AI"), and the Washington Coast ("WA") superpool (Table 1). The Washington coast samples were designated as a third superpool due to their genetic distinctiveness. Allele frequencies for superpools were calculated as the arithmetic mean of the pool allele frequencies. We adapted the methodology of Hohenlohe et al. (2010) and Waters et al. (2018) for the KSMWA to accommodate pooled whole genome sequence data. The window was shifted along the genome by a step size that allowed the windows to overlap.

| Sliding window analysis and islands of differentiation
We evaluated a range of step sizes from 7 to 50 kb and choices of window size sigma (σ) from 10 to 80 kb to balance smoothing the Gaussian weighted average was calculated for each of the 10 6 draws, arranged in random order. p-values for each window were estimated as the proportion of bootstrap replicates higher or equal than the empirical weighted mean F ST 's. Statistics were adjusted for multiple tests using the Benjamini-Hochberg false-discovery rate method (Benjamini & Hochberg, 1995 was generally a threshold above which outlier windows appeared prominent ( Figure S1). Outlier regions of interest were defined by the center points of full windows that shifted by step sizes of 20,000 bp. Therefore, the size of outlier regions was allowed to be smaller than full window sizes (180,000 bp). The coefficient of variation of the number of significant F ST outlier windows per linkage group was also quantified for the EBS-WA and the EBS-AI superpool comparisons.
Weighted mean d XY and F ST , and mean nucleotide diversity (π), and Tajima's D (T D , Tajima, 1989) were summarized over all F ST outlier and nonoutlier windows for each KSMWA comparison (EBS-AI and EBS-WA) and for each EBS-AI island of differentiation. Islands of differentiation for EBS-WA comparisons were not included due to the much larger scale of differentiation among these superpools. For islands of differentiation between EBS-AI superpools, we quantified the presence of windows with d XY greater than the upper 5% quantile over all windows and windows with d XY greater than the genome-wide average. In early stages of population differentiation, d XY may not be a reliable indicator of reduced gene flow (Cheng et al., 2021;Dennenmoser et al., 2017;Feulner et al., 2015). Therefore, for the EBS-AI superpool comparison, Tajimas's D provided information on whether selection or reduced gene flow may be implicated in regions of increased divergence, and nucleotide diversity was used to assess genetic variation (Nei, 1986). Reduced nucleotide diversity (π) can result from strong directional selection, as well as background selection and selective sweeps (Booker & Keightley, 2018). Nucleotide diversity (π) was estimated in each of the 29,825 windows for each superpool as: where j is the derived allele count (j = the reference allele frequency × ploidy), C is the ploidy, or (20) × the number of pools in each superpool, summation takes place over all sites (including monomorphic sites) in each window, and n is the size of the window (180,000 bp). Tajima's D was also estimated in each sliding window as: Here, S n = the number of SNPs in each sliding window. Note that each superpool had the same number of SNPs per window, but the number of SNPs per window varied.
Principal components analysis (PCA) was performed using all 1,944,780 SNPs in the full dataset using the R package PCAdapt (Luu et al., 2017) to visualize the genetic relationships among pools. The optimum number of principal components retained for analysis was determined based on Cattell's rule (Cattell, 1966). We

| Gene annotations
The 11 EBS-AI islands of differentiation were aligned with all Chlorophyll has been widely used as a proxy for marine productivity because it is indicative of phytoplankton biomass, and zooplankton, the prey of larval fish, forage on phytoplankton (Boyce et al., 2014;Hughes et al., 2018;Kristiansen et al., 2011). Recruitment in Pacific cod is more correlated with flow along and across the Bering Slope than other groundfish species, indicating that current velocity and direction are significant factors in Pacific cod early life history (Vestfals et al., 2014). We used BayPass v.2.2 (Gautier, 2015) to identify SNPs that were correlated to environmental variables within a Bayesian framework.

| Sliding window analysis and islands of differentiation
For the F ST sliding window analysis, a step size of 20 kb and σ = 30 kb was selected for the Gaussian kernel smoothing moving weighted average (KSMWA) because it optimized between SNPs per window, variation in F ST , and reduced noise within windows ( Figure S3 and  (Figure 3). Furthermore, the coefficient of variation of the number of outlier windows per linkage group was higher for the EBS-AI comparison (CV = 0.026) than for EBS-WA (CV = 0.015), indicating more uneven distribution of outlier regions in the EBS-AI comparison. The distribution of weighted average F ST values for all windows was higher in EBS-WA than in EBS-AI outlier and nonoutlier windows, indicative of a generally higher F ST between allopatric than parapatric Pacific cod (Table 2, Figure 4). In contrast, mean d XY was identical between EBS-WA and EBS-AI comparisons in nonoutlier regions, and slightly higher (0.00059 vs. 0.00053) in the EBS-WA comparisons than in the EBS-AI comparison ( Table 2).

Mean Tajima's D was generally negative across all outlier and
nonoutlier regions across all EBS-AI and EBS-WA superpool comparisons, but lower in outlier regions versus nonoutlier regions (Table 2). Tajima's D was similar between the EBS among outlier versus nonoutlier regions (EBS-WA comparison), but remained negative. Nucleotide diversity, π, was lower in the EBS and Aleutian Islands outlier than nonoutlier regions, and EBS pools showed higher nucleotide diversity than Washington coast or Aleutian Island pools, particularly on LG06 and LG19 islands of differentiation (Table 2).

| Gene annotations
Within the 11 islands of differentiation, we identified 68 regions with similarity to known genes (Table 4 and Table A5, Figure 5). The genes were responsible for a variety of functions, but notably there were five genes related to vision were identified within F ST outlier regions (Table A5) Table A5.

TA B L E 4 Annotated genes within
which F ST outlier SNPs regions were found located adjacent to the F ST outlier on LG18 in a region of aboveaverage d XY and were therefore considered potentially relevant.

| Environmental correlation
Salinity, temperature, current velocity, and chlorophyll patterns differed among regions ( Figure S4 and Table A6). The lowest chlorophyll was typically found in Bering Sea sampling areas, Zhemchug and Pervenets. Salinity was highest at the Washington Coast site and lowest off Kodiak. Temperature was highest off the Washington Coast (~8°C) and lowest in Bering Sea sites, 2-3°C.
The highest current velocities were found in the Aleutian Islands, with Adak the highest, followed by Near and Kiska. BayPass results were significantly correlated over each set of three independent runs (Table A7); therefore, results were averaged over each set of three runs. Among the EBS-AI comparisons, the most notable increase in Bayes factors in regions identified as islands of differentiation was found in linkage group 12 ( Figure   S5). Tabulation of the correlates most strongly correlated with allele frequencies indicated that in LG12_1, velocity had the largest effect, with 135 SNPs with Bayes factors >30 (Table 2).
Velocity was the biggest covariate in LG08_1, with 38 SNPs strongly correlated. The second most significant correlate after velocity was salinity, with 29 strongly correlated SNPs in LG08_1 (Table 2). In the EBS-WA BayPass analysis, results were tabulated over all data due to the increased number of high F ST regions. In this comparison, the highest number of strongly correlated Bayes factor SNPs was correlated with salinity (36%) and velocity (32%), versus temperature (20%) and chlorophyll (12%, Table 2).
The EBS shelf has lower water clarity than the Aleutian Islands  Table 2. Shaded vertical lines represent outlier regions within which linkage groups with F ST outliers are greater than 0.030 and pairwise d XY is greater than the genome average (0.000539 × 10, horizontal black line). Blue points represent the upper 5% quantiles for d XY , and pink dots indicate F ST outlier windows. Orange triangles represent locations of annotated genes within F ST outlier windows, which are labeled corresponding to the gene number in Table A5, and vision gene names are listed.

depth, where light transmission to the seafloor in the Aleutian
Islands is slightly more than an order of magnitude higher than the EBS (difference in near-bottom optical depth: 2.45). In addition, there is an east-west break in GAM residuals in the Aleutian Islands around Samalga Pass (169°28′W), where bottom optical depth is higher ("darker") than predicted to the east of Samalga Pass than to the west (Figure 6b).

| DISCUSS ION
While Pacific cod from the Aleutian Islands and EBS are sufficiently genetically differentiated to merit separate management units (Drinan et al., 2018;Spies, 2012;Spies & Punt, 2015), there is little information on the mechanisms driving these differences.
We examined how genomic patterns of differentiation and environmental covariates differed among these proximate and distant comparisons, inferred processes by which differentiation occurs across the genome, and examined genetic factors that may play a role in divergence within islands of differentiation between Aleutian Island and Bering Sea cod. A primary finding was a pattern of heterogeneous genomic differentiation between Pacific cod from the Aleutian Islands and EBS. We identified 11 genomic islands of differentiation in nine (of 23) linkage groups between the proximate EBS and Aleutian Islands, and the presence of many more F ST outlier regions between the EBS and Washington coast (Table 2; Figure 3).
The EBS-AI islands of differentiation ranged in size from 60,000 to 940,000 bp, and weighted average F ST over these 11 islands ranged from 0.023 to 0.067 ( Figure 3; Table A4). The limited number of EBS-AI islands of differentiation interspersed with regions of low F ST were consistent with emergent local adaptation among populations experiencing migration-selection balance, similar to our first hypothesis, although alternative explanations and limitations are discussed below.
High F ST and elevated d XY in five of the 11 genomic islands, particularly on LG12 and LG16 ( Figure 5; Table 2), may indicate local adaptation or inversions (Lotterhos, 2019), but the Pool-Seq data preclude further examination of the nature of the outlier regions.
There was evidence for selection in the Aleutian Islands region of LG06_1 and in the EBS region of LG08_1. In LG08_1, reduced nucleotide diversity in the EBS and Aleutian Islands could alternatively be a result of background selection or directional selection with hitchhiking in both regions. In LG14_1, reduced Tajima's D and reduced nucleotide diversity provided evidence for background selection, although it is not always possible to distinguish between background selection and directional selection with hitchhiking (Booker & Keightley, 2018). Higher nucleotide diversity in EBS pools could be a result of selection on Aleutian Islands or Washington Coast cod, or due to the disproportionately larger size of the EBS cod stock, as larger population sizes may lead to higher nucleotide diversity (Spies et al., 2020;Subramanian, 2019).
The pairwise F ST between the EBS and Washington coast was roughly double that between the Aleutian Islands and Bering Sea spawning populations (Table 3; Figure 3). This may be consistent with a framework in which gene flow acts to homogenize allele fre- Islands populations began diverging soon after they colonized the region following the recession of the last glacial maximum (Canino et al., 2010).
Most EBS-AI F ST outlier regions were also present in the EBS vs. Washington comparison (Figure 3), indicating that these EBS-AI  outlier regions may be important for EBS cod. Regions of elevated differentiation due to hitchhiking with genes under selection start as smaller regions that grow into wider regions over time and tend to be more pronounced in small populations where linkage is higher (Feder & Nosil, 2010). In contrast, islands of differentiation that are the result of genomic inversions are more likely to emerge in high gene flow species with strong selective pressure (Schaal et al., 2022). The divergent region on linkage group 12, the most predominant island of differentiation with the highest EBS-AI mean F ST , was not present in the EBS-WA comparison, suggesting that this outlier region may have developed relatively recently if EBS and AI populations diverged more recently (Figure 3).
While there is no clear evidence for the mechanism leading to the island of differentiation observed on LG12 in Pacific cod, it is slightly less than 1 megabases in size, much smaller than four large inversions that have been identified in Atlantic cod between Northeast Arctic and Norwegian Coastal ecotypes Sodeland et al., 2016). In Atlantic cod inversions on linkage groups 1, 2, 7, and 12 were 17, 5, 9.5, and 13 megabases long, respectively (Kirubakaran et al., 2016;Sodeland et al., 2016). While the island of differentiation in Pacific cod is on the same linkage group as one inversion in Atlantic cod (linkage group 12), it appears located between 15 and 16 mB and is outside the range suggested by previous studies (0.4-0.6 mB-14mB) which aligned to GadMor2 (Barth et al., 2017), and GadMorCeltic (Kirubakaran et al., 2016).
We hypothesized that temperature would be the most significant environmental correlate to genomic variation; however, current velocity and vision stood out as potential factors leading to islands of differentiation but temperature did not. Strong correlation with velocity based on BayPass results indicated that genes located within linkage group 12 outliers were associated with different environments of the EBS and AI, although the precise relationship is not clear (Table 2, Figure S4). Furthermore, the finding of five genes related to vision in outlier regions differentiating the Aleutian Islands and Bering Sea may be an indication of the genes underlying selection. However, identifying targets of selection within islands of differentiation is difficult because the lack of recombination, whether due to inversions or selection with hitchhiking, can link true targets selection with false positives . Visual systems have been shown to be under strong natural selection in several fish species including Atlantic cod in which differential expression of visual opsins and the rhodopsin rhI gene varies by ecotype Hofmann & Carleton, 2009;Pampoulie et al., 2015;Valen et al., 2018). In other species, visual systems have evolved in response to environment; for example, the visual system of Midas cichlids (Amphilophus cf. citrinellus) has rapidly evolved to adapt to the clear water in crater lakes from more ancestral turbid lakes (Torres-Dowdall et al., 2017).
Our analysis of near-bottom optical depth suggests there are depth-associated differences in water clarity among cod habitat in the EBS and Aleutian Islands, as well as differences within the Aleutian Islands chain itself ( Figure 6). The break in water clarity conditions at Samalga Pass is presumably due to higher concentrations of chlorophyll (Mordy et al., 2005), chromophoric dissolved organic matter, and nonalgal particulate in the Alaska Coastal Current than in the more ocean-influenced waters to the west. Specific genes identified in outlier regions for Pacific cod included CRB1, which plays a role in photoreceptor morphogenesis in the retina, two genes related to retinoic acid (rpe65c and Gprc5), a gene related to retinal rod rhodopsin (PDE6G) and an opsin receptor (OPN3), and are consistent with life history strategies and adaptation associated with vision (Table 4). While the association between vision genes and islands of differentiation in Pacific cod is not evidence for causation, research in other teleosts and the commonality with Atlantic cod in which vision genes are differentially expressed among ecotypes provides a basis for further exploration of potential effects of foraging, predator avoidance, orientation, and social behavior (Hofmann & Carleton, 2009;Valen et al., 2018).
The data passed multiple checks, confirming the reliability Pool-Seq methodology for genotyping. Biological replicate pools have been used in other studies too as a quality check for accurate sample contribution and allele frequency calling (Dorant et al., 2019;Guirao-Rico & González, 2021;Hivert et al., 2018). Read depth (50-100) was within the range that is considered sufficient for resolution of the allele frequency spectrum, distinguished evolutionary patterns, and provided sufficient power for Tajima's D (Ferretti et al., 2013), while the number of samples per pool (43-48) was sufficient to reduce experimental bias (Dorant et al., 2019;Kofler et al., 2011). High relative agreement in allele frequencies among the Washington Coast pools was consistent with expectations that pools consisting of the same or similar individuals would produce highly correlated allele frequencies. The PCA also confirmed similarity among duplicate and proximate samples (Figure 2), although allele frequencies differed between the two Washington replicated pools by only adding five individuals (Figure 2). In the case of the Pervenets pools, high correlation indicated that pooling based on concentration of DNA within the observed ranges did not impact the results. F ST estimates decreased with increasing concordance correlation coefficients, providing a second line of evidence for genotype similarities among pools ( Table 3). The concordance correlation coefficient between the two Kodiak samples taken in different years was relatively high (6th highest out of 55 comparisons), but was lower than several comparisons between Kodiak and EBS spawning cod. The similarity between Kodiak and Pervenets samples is confirmed by pairwise F ST ; F ST between temporal replicates from Kodiak were higher (indicating less similar) than any pairwise comparisons between Kodiak 2003Kodiak /2005 and Pervenets Pool A or B. We do not completely understand the level of connectivity between western Gulf of Alaska spawning cod and those from the Bering Sea shelf, although the PCA is consistent with isolation by distance observed in previous studies (Figure 2; Cunningham et al., 2009). Similarly, several previous studies found similarity among Kodiak samples and Unimak spawning cod from the EBS ( Table A2; Cunningham et al., 2009;Drinan et al., 2018).
While this was the first population genomics study of Pacific cod using whole genome sequencing, the pooled WGS platform posed some drawbacks. Without individual-level data, there was no insight into linkage disequilibrium, which could provide evidence for genomic inversions. Further, ploidy was limited to 20 per pool, so differences among pools could only be observed in increments of 0.05, and subtle differences among pools may have been lost, although this is unlikely given the large number of SNPs. High replicability between pools provided confidence that the underlying measures of population allele frequencies were accurate. While the patterns observed in the PCA appear sound, the nature of Pool-Seq limited the number of principal components in the data to 10, and the resulting PCA was optimized with only a single principal component that explained 13.3% of the variance. We also acknowledge that our analyses were performed over large window sizes (σ = 30 kb, step size = 20 kb), which could reduce the possibility of finding the effects of single genes. Another concern was that while alignment to the genome of a congeneric species is routine among nonmodel species, chromosomal rearrangements may not have aligned and could have been missed. Therefore, we anticipate that our approach laid the groundwork for future studies to examine outlier regions in Pacific cod in more detail. Future work that includes low coverage whole genome sequencing (lcWGS) on an individual basis will provide further understanding of the outlier regions identified here, as well as identifying whether outlier regions represent inversions or regions of linked selection (Lou et al., 2021).
Overall, data provided new evidence for heterogeneous differentiation across the genome between spawning populations of cod from the EBS and Aleutian Islands, which we hypothesize may be the result of local adaptation despite some low level of gene flow. Furthermore, we found more extensive levels of relative differentiation but similar levels of absolute divergence among cod from the allopatric EBS vs. Washington coast. These results indicate that the EBS and Aleutian Islands may have started diverging soon after cod recolonized the North Pacific following the last glacial maximum.
The presence of 11 islands of differentiation, five of which showed some level of elevated d XY , and evidence for directional selection and background selection on two other islands of differentiation indicate that local adaptation has occurred between cod from these different environments. Results also indicate that these spawning groups are in migration-selection balance and that selection may be strong enough to balance the effects of gene flow. While our study was not designed to provide direct evidence of genes responsible for adaptation, annotated genes suggested that vision may play a role in adaptation to the distinct ocean environments of the Bering Sea and Aleutian Islands. Results build upon previous studies indicative of divergence between EBS and Aleutian Islands Pacific cod.

ACK N OWLED G M ENTS
The authors thank the RACE Groundfish Programs of the NOAA Fisheries Alaska Fisheries Science Center, Alisa Abookire, Asia

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available in the Sequence Read Archive (SRA), which is accessible from National Center for Biotechnology Information (NCBI) at https:// www.ncbi.nlm.nih.gov/sra/, reference number PRJNA675289 (University of Washington, 2021).