Evidence for selection and spatially distinct patterns found in a putative zona pellucida gene in Pacific cod, and implications for management

Abstract Genetic differentiation has been observed in marine species even when no obvious barriers to gene flow exist, and understanding such differentiation is essential for effective fisheries management. Highly differentiated outlier loci can provide information on how genetic variation might not only contribute to local adaptation but may also be affected by historical demographic events. A locus which aligned to a predicted zona pellucida sperm‐binding protein 3 gene (ZP3) in Atlantic cod (Gadus morhua) was previously identified as the highest outlier based on F ST in a RADseq study of Pacific cod (Gadus macrocephalus) across the West Coast of North America. However, because of the limited length of the RAD sequence and restricted geographic area of sampling, no conclusion on the functional significance of the observed variation was possible. In other marine species, ZP3 is involved in reproductive isolation, local adaptation, and has neofunctionalized as an antifreeze gene, and so it may provide important insights in functional population structure of Pacific cod. Here, we sequenced a 544‐bp region of ZP3 in 230 Pacific cod collected from throughout their geographic range. We observed striking patterns of spatial structuring of ZP3 haplotypes, with a sharp break near Kodiak, Alaska, USA where populations within ~200 km of each other are nearly fixed for different haplotypes, contrasting a pattern of isolation by distance at other genetic markers in this region (F ST = 0.003). Phylogenetic analysis of ZP3 haplotypes revealed that the more southern haplotypes appear to be ancestral, with the northern haplotype evolving more recently, potentially in response to a novel selective pressure as Pacific cod recolonized northern latitudes after glaciation. The sharp break in haplotype frequencies suggests strong selective pressures are operating on small spatial scales and illustrates that selection can create high divergence even in marine species with ample opportunities for gene flow.


| INTRODUC TI ON
A primary objective of fisheries management in the marine realm involves understanding intraspecific genetic diversity. While the ocean offers few apparent boundaries to long distance dispersal at any life stage, population divergence has been observed in many species (Hauser & Carvalho, 2008). Accounting for differences among local populations and understanding the mechanisms behind divergence are key to managing for optimal yield (Spies et al., 2015).
Genetic differentiation across the genome is the result of complex interactions between genetic drift, selection, mutation, migration, and recombination (Nosil et al., 2009). Natural selection may override the homogenizing effects of gene flow, not only at loci under selection but also at linked loci, often leading to 'genomic islands of divergence', that is, sections of the genome where differentiation is much higher than elsewhere (Nosil et al., 2009;Via, 2012). Outlier loci with particularly high levels of divergence can provide the opportunity to evaluate population structure, examine selective forces that maintain population diversity , identify migrating individuals (Fisher et al., 2021), and estimate population contributions to mixed fisheries (Bekkevold et al., 2015;Dahle et al., 2018).
Pacific cod (Gadus macrocephalus) provide an interesting case study to investigate the effects of selective differentiation. Pacific cod are found demersally throughout the Western Pacific as far south as the Korean Peninsula, northward to the Bering and Chukchi Sea, throughout the Aleutian Islands and southward along the eastern Pacific as far as northern California. Populations at the southern edges of both the eastern and western Pacific show deep genetic divergence from northern populations (Canino et al., 2010), and studies of neutrally evolving loci have identified moderately genetically differentiated populations of Pacific cod (F ST < 0.02) throughout the North Pacific that display a general isolation-by-distance pattern indicating somewhat limited dispersal (e.g., Cunningham et al., 2009;Drinan et al., 2018;Smirnova et al., 2018Smirnova et al., , 2019. Pacific cod undertake annual feeding migrations that are not well characterized but they return to their natal spawning areas during winter months to reproduce Rand et al., 2014). While many of the genetic differences between Pacific cod populations can be explained by reproductive isolation caused by strong homing behavior, the mechanisms underlying selection have not been specifically explored. Such selective processes are increasingly relevant to management, given recent ocean warming events that have resulted in steep declines in Gulf of Alaska populations (Barbeaux, Holsman, et al., 2020), and anomalous summer feeding migrations into the northern Bering Sea (Spies et al., 2020). These events emphasize sensitivity to temperature related to growth and survival in Pacific cod (Barbeaux, Holsman, et al., 2020;Hurst et al., 2009;Laurel et al., 2008). Furthermore, the broad species range of Pacific cod, from temperate climates of Korea (34°N) and the Pacific Northwest (47°N) to the arctic and subarctic conditions of the Bering and Chukchi Seas (>58°N) suggests that local populations of cod are adapted to thermal profiles specific to their habitat.
One of the prime candidates for a gene under selection detected in a previous restriction site-associated DNA sequencing (RADseq) study (Table A1, Drinan et al., 2018) was a gene encoding for the zona pellucida sperm-binding protein 3 (ZP3) predicted in Atlantic cod (Gadus morhua). ZP3 egg coat proteins are part of a multigene family that is central to reproductive isolation and prevention of polyspermy in mammals (Shu et al., 2015), but that may have more complicated functions in fish (Conner et al., 2005;Wassarman, 2008). Zona pellucida egg coat proteins function in a range of capacities such as protecting the embryo and mediating fertilization.
Furthermore, duplicated copies of zona pellucida genes have neofunctionalized to produce antifreeze proteins in the eggs of Antarctic notothenioid fishes (Cao et al., 2016;Gupta et al., 2012). In general, genes coding for proteins on gamete surfaces appear to evolve quickly and may be subject to positive selection (Palumbi, 2009).
In this study, we used DNA sequencing to assess geographic variation and examine signals of selection in the putative ZP3-coding gene among samples representative of the range of Pacific cod. We also integrated ZP3 data with a previously published RADseq dataset (Drinan et al., 2018) to compare neutral variation with variation at the putative ZP3-coding region. Our study had three specific aims: (i) to determine spatial patterns of differentiation at ZP3 across most of the range of Pacific cod, (ii) to confirm elevated differentiation among Pacific cod populations at ZP3 compared to neutral markers, and (iii) to test for signals of selection at this gene. We hypothesized that patterns of diversification in this putative ZP3 gene would provide further support for strong divergent selection across small spatial scales and would provide information about the functional significance of this gene.

| DNA extraction, DNA sequencing, screening for variation
Fin clips from Pacific cod were collected from 16 spawning locations across the species range and preserved in 95% ethanol (Table 1).
Samples were taken during the winter spawning season (December through May) to take advantage of spawning site fidelity and avoid sampling of population mixtures. Samples were taken from a range of years from 2003 to 2019. Pacific cod populations have demonstrated genetic stability over time; therefore, samples collected over a range of years were expected to be representative of the population-specific allele frequencies (Cunningham et al., 2009).

K E Y W O R D S
fisheries management, Pacific cod, selection, zona pellucida DNA was extracted using Qiagen Blood and Tissue kits following manufacturers' instructions (Qiagen, Inc.).
Four sets of DNA primers were designed to screen for variation in four exon regions aligning with the ZP3-coding region in Pacific cod (Table A2). All primers were designed using Primer3 v. 0.4.0 (Koressaar & Remm, 2007). Five individuals of Pacific cod from the Hecate Strait, Prince William Sound, Washington (WA) Coast, and Kodiak Island collections were sequenced for initial screening (Table 1). Polymerase chain reactions (PCRs) for the first four sets of primers were carried out in 40 μl volume with 10 μM forward and reverse primers using the Qiagen Taq PCR Master Mix kit and approximately 200 ng DNA (Table A2). Thermal cycling conditions were 95°C for 15 min, followed by 35 cycles of 94°C for 30 s, T M (Table A2)  A total of 230 Pacific cod were sequenced to screen 544 bp of the variable region of ZP3 using primers ZP_GM (Table 1). Sanger sequencing was performed bidirectionally using forward and reverse primers at MCLAB (320 Harbor Way). Contigs were aligned using Sequencher v. 5.0 (Gene Codes Corporation) and scores below 80% quality were discarded. Sequence calls were double checked and confirmed by two readers when ambiguities were present. Consensus sequences were aligned in BioEdit v. 7.2 (Hall, 1999). Sanger DNA sequence data were transformed from unphased sequences to fasta files with two haplotypes for all individuals. Haplotypes for ambiguous (heterozygous) nucleotides at ZP3-segregating sites were inferred using Bayesian methodology with a priori expectations based on coalescent theory in PHASE v2.1.1 and SeqPHASE (Flot, 2010;Stephens et al., 2001). The most likely pair of haplotypes was selected for each individual based on the highest posterior probability.
Sequences were aligned to two Atlantic cod genomes in GenBank using BLASTn to ensure our sequenced region aligned with ZP3 and to identify synonymous and nonsynonymous substitutions (Sayers et al., 2019). The Atlantic cod genome is the most closely available genome to Pacific cod (Árnason & Halldórsdóttir, 2019). The gadMor3.0 (RefSeq GCF 902167405.1) assembly of Atlantic cod originates from an individual from the North East Arctic Sea, and thus likely from a cold-adapted type, while a genome assembly available from a more southerly Celtic Sea cod likely represents a warm-adapted genotype (Kirubakaran et al., 2020). We chose to test alignments in the northern and southern Atlantic cod reference genomes to examine whether ZP3 would differ among warm-and cold-adapted individuals. The predicted zona pellucida sperm-binding protein 3 (ZP3) spans a 3407-bp DNA sequence of  (Schliep et al., 2017). We selected best model using the Bayesian Information Criterion (BIC) because BIC places a higher penalty on model complexity than Akaike Information Criterion (AIC Akaike, 1978).
G ST was selected to measure the variance in allele frequencies among populations because it is a standard measure of genetic differentiation for DNA sequence data (Nei, 1973). G ST can be dependent on sample size, so we tested relationship between sample size and G ST using a simple regression. Global and pairwise G ST were calculated using the R package mmod, and confidence intervals were generated with 1000 bootstrap replicates (Winter, 2012).
Dendrograms of hierarchical clusters of sample collections were generated based on pairwise G ST using the pvclust package, and support for each cluster was based on 10,000 bootstrap replicates (Suzuki & Shimodaira, 2006). Dendrogram probabilities were presented as bootstrap probabilities (BP, green) and approximately unbiased values (AU, red), which may be more accurate than bootstrap values (Suzuki & Shimodaira, 2006). Haplotype (h) and nucleotide diversity (π) were calculated using the R package pegas. We calculated expected heterozygosity (H e ) per population using the estimator θ H , which corrects for potential overestimation when few loci are present, in Arlequin v. 3.5 (Excoffier & Lischer, 2010

| Statistical tests of selective neutrality
Nonsynonymous ( sampling theory (Slatkin, 1996) compares the probability of the observed sample with that of a random neutral sample with the same number of alleles and identical size, while the Ewens-Watterson homozygosity test (Watterson, 1986) is based on Ewens sampling theory (Ewens, 1972). For these statistics, the observed haplotype frequencies (F) were calculated from the data and the expected F

| Analyses with previously published RADseq dataset
We re-analyzed a previously published RADseq dataset that in-  (Rousset, 2008).
We also calculated pairwise F ST (Nei, 1987) between Kodiak Island and Prince William Sound collections using the R packages adegenet and hierfstat (Goudet, 2005;Jombart et al., 2008), to compare F ST at the ZP3 region with other genomic locations.
The short read sequence from Drinan et al. (2018) contained two SNPs at positions 313 and 339 (

| DNA sequencing, screening for variation
Our initial screening efforts revealed no variation in DNA sequence within Exons 1, 5, 6, or 8, which were sequenced using primers ZP_ L1, ZP_L3, or ZP_L4. Variation was only observed in Exons 2, 3, and 4 in sequences generated using primers ZP_L2 (GenBank Accession numbers MW468336-MW468402). New primers (ZP_GM) were designed to sequence variation observed in Exons 2, 3, and 4 because primer ZP_L2 did not amplify in all collections (Table 1, Sequences were trimmed to a final length of 544 bp, which included 40 amino acid residues from Exon 2, 35 amino acid residues from Exon 3, and 4 amino acid residues from Exon 4. There were nine variable sites; three were located in exons and six in introns (Table 2). Variable sites at location 313 and 339 bp were located in Exon 3 and the variable site at location 542 was in Exon 4 (Table 2).

TA B L E 2
The position of each segregating site in the 544-bp sequence of Pacific cod (Gadus macrocephalus), and its position on the zona pellucida sperm-binding protein 3 DNA-coding region of linkage group 9 in the gadMor3.0 alignment (NC_044056.1). Also shown are its codon position (1, 2, or 3), location in an intron or exon, and resulting amino acid change

| Spatial structuring
There were 14 unique DNA haplotypes present among all sample collections ( Translation of the coding regions into amino acid sequences resulted in eight segregating polypeptide sequences (Table 4)

| Sequence variation
G ST was not found to be dependent on sample size (R 2 = 0.07).
The model with the lowest BIC score was a four-parameter model with a proportion of invariable sites (I) and a moderate number of parameters, 31 (F81, Felsenstein, 1981). Under this model, Haplotype 13 had the lowest d F81 , indicating the greatest similarity to gad-Mor3.0, followed by Haplotypes 12 and 5, and then Haplotypes 9 and 11 (Table 3). Haplotype 6 was the most different from Atlantic cod, followed by 7, then 14, 1, 3, and 2.

| Statistical tests of selective neutrality
Patterns of selective neutrality varied by region and were not consistent among all tests (Table 5). Large and significant values of Ewens-

Watterson homozygosity test (E-W p) and the exact test based on
Ewens' sampling theory (Slatkin's p) (>0.9), indicative of directional selection, were observed within the Kodiak collections, and within the combined Bering Sea (Table 5). Slatkin's p was also significant and >0.9 in the WA coast sample (  Table 5). The R 2 test showed only one significant increase in population size in the Near Islands sample (Table 5).

| Analyses with previously published RADseq dataset
In total, we retained 177 SNPs from the RAD data on LG09 after aligning to gadMor3.0 as described. The only SNP with F ST > 0.01 was found in the short read sequence identified in Drinan et al. that aligned to the predicted ZP3 gene (2018; values > 0.20 ( Figure 5), indicating that there were no major inversions within this linkage group.

| D ISCUSS I ON
Multiple studies have documented neutral genetic diversity and limited gene flow in Pacific cod, but the extent of local adaptation is currently unknown. Our goal was to assess spatial variation in ZP3, a gene that is putatively under strong selection (Drinan et al., 2018), to understand whether there was evidence for selection across a large geographic range and to investigate potential mechanisms for this selection. Significant recent declines have occurred in this species due to anomalously warm conditions in the Gulf of Alaska, and results were intended to assist in conservation and management of these stocks. We hypothesized that patterns of population structure at the ZP3 gene may differ from genetic markers in the rest of the genome, and show evidence for directional selection. We also anticipated that patterns of diversity in ZP3 might provide insight into its functional significance.
Our first significant finding was striking spatial differences based on DNA sequence data of the ZP3 variable region that suggest limited gene flow at this locus. DNA sequence variation clus- On the western side of the Pacific Ocean, there appeared to be a transition region north of Japan where northern and southern haplotype co-occurred (Figure 3). Samples from Korea exhibited only a single coding sequence type, and could be considered a third subgroup (Polypeptide II, Figure 2).
Our second main finding was high genetic differentiation in ZP3 compared to the rest of the genome (Figure 4)

F I G U R E 2
Map with pie charts showing (a) relative haplotype frequencies between all collections, and (b) amino acid sequences based on nonsynonymous changes in the ZP3 DNA-coding region. The haplotype numbers correspond to Table 3, and roman numerals indicate polypeptide designation, as shown in Table 4  lineages (Kryazhimskiy & Plotkin, 2008). Local adaptation is likely responsible for the geographic pattern of coding sequence haplotypes coupled with high levels of differentiation. The lack of linkage disequilibrium ( Figure 5), negative Tajima's D values, and significant departures from HWE (Table 5) associated with ZP3 SNPs support the conclusion that the ZP3 gene is under selection. The lack of linkage disequilibrium also indicates selection rather than hitchhiking associated with a nearby gene ( Figure 5).
In samples with high diversity, F IS was remarkably high, and there was a notable lack of heterozygotes in several collections, such as Prince William Sound, Strait of Georgia, Kiska, and Puget Sound (Table 5). In addition, there were no heterozygotes for the two most common haplotypes, Haplotypes 1 and 13 (Table A5). This deviation from Hardy-Weinberg equilibrium may suggest that the two haplotype groups mate assortatively, with very little gene flow between them. However, additional experimental studies are required to test this notion.
While one premise of this paper was to examine the putative ZP3 gene for evidence of selective diversification, the gene function, especially in relation to gamete recognition and temperature tolerance, cannot be tested without laboratory experimentation.
Such studies would help provide insight into the implications of diversification of ZP3 in Pacific cod. A greater understanding of the role of thermal adaptation for the productivity and persistence of Pacific cod would be beneficial to management. ZP3 appears to have undergone neofunctionalization in fish species as antifreeze glycoproteins for cold-water adaptation (Cao et al., 2016;Conner & Hughes, 2003). Experiments have shown that zona pellucida proteins can noncolligatively lower the freezing and melting point of a solution (Cao et al., 2016). Zona pellucida paralogues as antifreeze glycoproteins are expressed in the exocrine pancreas in adult fishes, circulated throughout the body, and may aid in reducing body ice load (Cao et al., 2016). The collections in the Bering Sea group were situated within or adjacent to the southeastern Bering Sea, an ecosystem structured by cold pool water <2°C that remains following the retreat of the Bering Sea ice sheet in spring (Stevenson & Lauth, 2019). Residence within colder water may be consistent with selective adaptation to colder temperatures via ZP3 as antifreeze glycoproteins, but further testing is needed to substantiate this hypothesis. The narrow thermal window (3-6°C) of successful hatching in Kodiak Island cod may be in part responsible for the high mortality in the Gulf of Alaska during recent marine heat waves (Laurel & Rogers, 2020). Given the amino acid differences between different ZP3 haplotypes, further investigation of the significance of this genetic variation for tolerated thermal windows may be of interest for management and conservation under climate change.
Demographic trajectories of cod in different parts of the Gulf of Alaska support the difference in temperature tolerance within the Gulf of Alaska. During and after the heat wave in the Gulf of Alaska (2014)(2015)(2016), the population size of the Eastern Gulf of Alaska cod stock remained low but stable (Barbeaux, Holsman, et al., 2020). In contrast, the cod stock in the Central Gulf of Alaska declined by about 30% per year, and by 23% per year in the Western Gulf of Alaska. Although these population trajectories are consistent with different thermal tolerances related to ZP3 haplotypes, this hypothesis remains to be tested.
In Atlantic cod, genetic divergence among ecotypes has been shown occur within large genomic inversions on linkage groups 1, 2, 7, and 12, each of which span several Mb (Berg et al., 2017;Kirubakaran et al., 2016). While RADseq data are not powerful for identifying small genomic inversions, larger inversions would cause additional F ST outliers and linkage disequilibrium extending over greater genomic distances (Huang et al., 2020;Petrou et al., 2021). Here, only one SNP out of 177 SNPs on linkage group 9 had an F ST over 0.01, and linkage disequilibrium was minimal between loci separated by more than a few The specific mechanism driving selection and diversification in Pacific cod certainly merits further research.
Our data are consistent with a demographic movement scenario as follows: Pacific cod likely originated in the Pacific Ocean from an ancestor to Atlantic cod that invaded the Pacific ~4 Million years ago (Árnason & Halldórsdóttir, 2019). Oceanographic events of the Pleistocene likely forced extant Pacific cod southward. Pacific cod with ZP3 haplotype 13 identical to the warm-and cold-adapted Atlantic cod counterparts are currently found in the southern part of the range (Table 3). During Pleistocene glaciations, water levels fluctuated and the Bering Sea shelf was exposed during several time periods, forming the Bering land bridge as recently as 30,000-18,000 years ago (Elias et al., 1996;Hopkins, 1967 Pacific, Aleutian Islands, and Bering Sea (Canino et al., 2010), mutations leading to the common haplotype 1 likely occurred in the putative ZP3 gene. The Bering Sea in the northernmost extent of the Pacific Ocean contains a deep basin <3500 m to the southwest and continental shelf <200 m to the northeast (Stabeno et al., 1999), and is currently habitat for the largest spawning population of Pacific cod within its range (Spies et al., 2020).
The extreme differentiation among Pacific cod populations at ZP3 with almost no shared haplotypes between groups suggests an almost complete isolation of cod in eastern and western Gulf of Alaska. This finding is hard to reconcile with the lack of differentiation elsewhere in the genome. An alternative explanation for our finding would be that the different haplotypes of ZP3 actually represent paralogous loci that amplified differentially between the two populations. Indeed, Antarctic fishes may have up to 35 copies of ZP genes that occur as single copies in temperate fishes (Cao et al., 2016). It is, therefore, not impossible that the observed differentiation represents copy number variation among Pacific cod populations, similar to that observed in American lobster Homarus americanus  and capelin Mallotus villosus . However, this explanation seems unlikely, because (i) a series of different primers were tested, with identical results, and (ii) there was no evidence for sequencing ambiguities other than those in heterozygotes. In any case, even if the variation we observed was copy number variation rather than allelic variation, it would demonstrate a genetic difference between these two groups that is readily identifiable and likely functionally significant. Nevertheless, for accurate functional analysis, the possibility of gene duplication should be further investigated.
The ability of Pacific cod to adapt in a changing climate may depend on the depth of genetic variation acquired through its complex evolutionary history (Árnason & Halldórsdóttir, 2019 Rand, and two anonymous reviewers for providing helpful suggestions. This work was funded by the Alaska Fisheries Science Center.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available Note: Primer set GP_GM was used to amplify a variable region. The annealing temperature, T M (°C), used in polymerase chain reaction is shown for each primer set. The position of the forward and reverse primer on the gadMor3.0 genome sequence is shown, as well as which exons are included in the intervening sequence.

TA B L E A 3 Frequency of observed nucleotide haplotides in sample collections of Pacific cod (Gadus macrocephalus)
No.