Low‐coverage whole‐genome sequencing reveals molecular markers for spawning season and sex identification in Gulf of Maine Atlantic cod (Gadus morhua, Linnaeus 1758)

Abstract Atlantic cod (Gadus morhua, Linnaeus 1758) in the western Gulf of Maine are managed as a single stock despite several lines of evidence supporting two spawning groups (spring and winter) that overlap spatially, while exhibiting seasonal spawning isolation. Low‐coverage whole‐genome sequencing was used to evaluate the genomic population structure of Atlantic cod spawning groups in the western Gulf of Maine and Georges Bank using 222 individuals collected over multiple years. Results indicated low total genomic differentiation, while also showing strong differentiation between spring and winter‐spawning groups at specific regions of the genome. Guided regularized random forest and ranked F ST methods were used to select panels of single nucleotide polymorphisms (SNPs) that could reliably distinguish spring and winter‐spawning Atlantic cod (88.5% assignment rate), as well as males and females (95.0% assignment rate) collected in the western Gulf of Maine. These SNP panels represent a valuable tool for fisheries research and management of Atlantic cod in the western Gulf of Maine that will aid investigations of stock production and support accuracy of future assessments.


| INTRODUC TI ON
Fisheries stock assessments and management rely on accurate biological data to effectively estimate population abundances and project the status of stocks. The task of building an accurate stock assessment can become difficult when a species exhibits a complex life history, population structure, or migratory behaviors. Delineating stock boundaries is crucial to unbiased stock assessments  and becomes even more important when populations exhibit substantial differences in life history. When a stock is delineated, it is treated as one homogenous population; however, that is not the case for species exhibiting subpopulation structure (especially those with differing reproduction and recruitment) and any metapopulation stock assessment operating under an assumption of homogeneity will likely suffer inaccuracies (Dean et al., 2019).
The Atlantic cod (Gadus morhua, Linnaeus 1758) is a primary historical example of external forces causing the collapse of a valuable commercial fishery, necessitating swift and effective fisheries management. High fishing pressure in the 1960s and 1970s led to drastic reductions of several Atlantic cod stocks and despite decades of management, many of those stocks are still exhibiting depleted biomass and fishing mortality rates at unsustainable levels (COSEWIC, 2010;NEFSC, 2013). The management of Atlantic cod stocks in the western Gulf of Maine (wGOM) is made difficult because they exist as metapopulations Kovach et al., 2010; but are currently managed as one homogenous unit (NEFSC, 2013), despite evidence that suggests employing substock models would increase productivity yield in the wGOM (Kerr et al., 2010).
A multidisciplinary evaluation of biological stock structure for Atlantic cod in U.S. waters has recently been conducted by the Atlantic Cod Stock Structure Working Group that incorporated data from fishermen's ecological knowledge, life history, genomics, natural markers, and tagging to propose five biological stocks in the U.S.: Georges Bank, southern New England, wGOM and Cape Cod winter spawners, wGOM spring spawners, and the eastern Gulf of Maine (NEFSC, 2021). Each of the proposed stocks is spatially distinct except for the wGOM where there is spatial overlap between spring and winter-spawning groups that are genetically isolated (Barney et al., 2017;Clucas, Kerr, et al., 2019;Kovach et al., 2010) by seasonal differences in spawning behaviors. The spawning groups likely developed due to strong interannual spawning site fidelity , group spawning aggregation behaviors , and were likely facilitated by larval recruitment and/or natal homing. While all Atlantic cod in the wGOM spawn during similar bottom water temperature conditions (6-8℃), "spring" spawning peaks in May-June and "winter" spawning peaks in November-December Zemeckis, Hoffman, et al., 2014). These seasonal differences in spawning time have large effects on the temperature regimes and oceanic conditions experienced by pelagic larvae and early juveniles, with spring-spawned Atlantic cod hatching into warm stratified waters and winter-spawned Atlantic cod hatching into cooling, wellmixed waters (Huret et al., 2007). It is possible that these conditions could translate to stark differences in production and recruitment success, especially as climate change impacts the wGOM and average summer temperatures continue to climb (Pershing et al., 2015).
The complex population structure and reproductive dynamics observed in the wGOM Atlantic cod have been shown to be perpetuated on a molecular level by selective pressures acting upon chromosomal inversions that affect spawning behavior in the face of gene flow (Barney et al., 2017;Clucas, Kerr, et al., 2019;Clucas, Lou, et al., 2019); however, chromosomal inversions are not unique to Atlantic cod in the wGOM. In Scandinavia, Atlantic cod have repeatedly been characterized by chromosomal inversions linked to migratory behaviors or inshore/offshore ecotypes (Barth et al., 2017;Berg et al., 2016;Karlsen et al., 2013;Kirubakaran et al., 2016;Rodríguez-Ramilo et al., 2019;Sodeland et al., 2016) and physical water parameters like salinity, oxygen, and temperature (Berg et al., 2015).
Similarly, Therkildsen et al. (2013) found that chromosomal inversions in Greenland Atlantic cod corresponded with inshore/offshore ecotypes, as well as temperature and salinity. In Canadian waters, Atlantic cod showed differences in chromosomal inversions associated with migratory behaviors (Sinclair-Waters, . When examining Atlantic cod collected at several locations throughout its range, chromosomal inversions were associated with ocean basin Bradbury et al., 2010Bradbury et al., , 2013Bradbury et al., , 2014, migratory behaviors Kess et al., 2019), or ocean temperature (Bradbury et al., 2010). Chromosomal inversions and selection are common in Atlantic cod, and the genomic differentiation they create may represent a valuable opportunity to develop new tools for Atlantic cod science and management.
Taking advantage of genomic differentiation to create molecular tools for management has been a valuable endeavor for Atlantic cod in other parts of northwest Atlantic. For instance, Sinclair-Waters,  used SNP genotyping to develop a method that can accurately assign Atlantic cod to a genetically distinct population in a marine protected area (MPA), allowing managers to protect a critical species by designating MPA boundaries in a biologically meaningful way, as well as providing new data to redesign and reevaluate future MPA boundaries. A similar approach could be utilized in the wGOM to develop a molecular tool to identify the spawning season, and potentially sex, of Atlantic cod. A recent nonmolecular tool has been developed to identify the spawning season of wGOM Atlantic cod using the diameter of first annulus of the otolith and a logistic regression model (Dean et al., 2019). This otolith method has provided valuable data to the management of Atlantic cod in the wGOM, including the identification of decreased recruitment of spring-spawned Atlantic cod over time. Likewise, a molecular tool may augment these results by increasing potential for automation, decreasing processing times, potentially improving accuracy, and adding sex information without sacrificing individuals.
In the present study, we used low-coverage whole-genome sequencing to evaluate the genomic population structure of Atlantic cod collected in the wGOM and Georges Bank with samples collected over several years. We report levels of genetic divergence for these cod populations as well as patterns across the genome, showing isolated divergence patterns that are consistent with previous studies. In accordance with a research recommendation from the Atlantic Cod Stock Structure Working Group for the development of rapid assessment tools for assignment of spring and winters spawners in the wGOM (McBride et al., 2021), we then focus on the selection of small numbers of genetic markers to develop powerful assignment tools for Atlantic cod, that can identify spawning group and sex of unknown fish. It is our hope that the results will provide a wealth of new data for understanding Atlantic cod population biology and stock structure. In total, this represents a profound opportunity to add valuable science to support dynamic, biologically meaningful Atlantic cod management in the wGOM.

| Sample collection
Caudal fin clips were taken as a source of genetic material from 222 Atlantic cod individuals collected in the wGOM and Georges Bank ( Figure 1; Table 1). In the wGOM, Atlantic cod were col-

| DNA extraction, library preparation, and sequencing
DNA was extracted from caudal fin tissue using the Macherey-Nagel genomic DNA NucleoSpin Tissue kit according to manufacturer's instructions and adapted for use on an Eppendorf EPmotion TMX5075 automated liquid handling workstation. Following extraction, all DNA was purified and size-selected to remove fragments less than ~1,000 bp in length using a paramagnetic bead cleanup with a 0.5:1 bead to sample ratio (KAPA Pure Beads) and a final elution in 10 mM Tris-HCl, pH 8.0-8.5. The concentration of DNA of each size-selected sample was estimated using a SpectraMax QuickDrop Micro-Volume Spectrophotometer (Molecular Devices) prior to library preparation.
A separate 250-450 bp dual-indexed library was prepared for each individual using the KAPA Hyper Prep Kit adapted for use on the Eppendorf EPmotion TMX5075 automated liquid handling workstation following the manufacturers' recommendations.
The mode size of each library preparation was estimated using a Fragment Analyzer 5300 (Agilent) and the DNA concentration of each library was estimated using the NEBNext Library Quant Kit for Illumina (New England BioLabs) and run on a QuantStudio12K Flex Real-Time PCR System (Thermo Fisher). Based on the size and concentration results, libraries were pooled into equimolar concentrations prior to paired end sequencing (2 × 150 bp) in three separate runs using NextSeq 500/550 v2 high output kits on an Illumina NextSeq 500. Libraries were then re-sequenced using NextSeq 500/550 v2 paired end (2 × 150 bp) mid output kits with pools reassembled based on previous sequence coverage in an attempt to sequence all individuals to between 1x and 4x coverage.

| Sequence filtering, mapping, and SNP calling
Bioinformatic processing generally followed the approach outlined by Therkildsen and Palumbi (2017). Forward and reverse raw reads from each lane were concatenated into aggregate files before removing any bases with a quality score threshold below three from the leading and trailing ends using Trimmomatic (Bolger et al., 2014).
Additionally, a sliding window approach was used to remove any bases in a four base window that had a mean quality score below 15.
Any remaining reads that fell below a minimum length of 36 bases were removed.
Trimmed reads were aligned to the gadMor2 reference ge- Reads for each individual were aggregated and realigned around indels using IndelRealigner within GATK v3.8 (McKenna et al., 2010) prior to calling SNPs at sites with a probability of <1 -6 being monomorphic based on the mapped reads for all 222 individuals using ANGSD v0.930 (Korneliussen et al., 2014). The following filters and parameters were used in the ANGSD analysis: Sites with a total read depth <50 and >500 were excluded, and the retained sites were further filtered by setting a minimum number of 150 individuals with data at each site and a minimum quality score of 20. Individual genotype posterior probabilities and genotype likelihoods in Beagle format were generated only for loci with a minor global allele frequency of ≥5%. Additionally, major and minor alleles were inferred from genotype likelihoods across all individuals at all SNPs and any SNP loci that were out of Hardy-Weinberg Equilibrium (HWE) at p ≤ 1 -6 were removed. Genotype likelihood estimation and SNP calling were repeated in the same fashion for individuals collected in the wGOM (excluding Georges Bank fish).

| Population genomics
To produce a visualization of potential Atlantic cod differentiation, a covariance matrix between all individuals was produced based on genotype likelihoods from ANGSD in PCAngsd (Meisner & Albrechtsen, 2018) and resolved into principal components using the eigen function in R 4.0.3 (R Core Team, 2020). The resulting principal component analysis (PCA) was plotted along the first two principal components in R using ggplot2 (Wickham, 2016 window output for wGOM individuals comparing spawning season and sex was plotted across all LGs using the ggplot2 package in R.  et al. (2018). Random forest was run using 5,000 trees and the number of variables randomly sampled as candidates at each split equal to the square root of the number of loci evaluated. The GRRF was run ten separate times with 5,000 trees, and a coefficient of regulation for each locus dictated by importance scores from the random forest run and gamma=0.3. After each GRRF run, any locus with a positive mean decrease accuracy (importance score) was considered a selected locus. The final list of selected loci were any loci with positive importance scores from any of the ten GRRF runs; selected loci were aggregated and ranked according to mean importance scores across all runs.

| Evaluating population assignment accuracy
The total number of loci selected from the GRRF runs was used to compare to the ranked F ST method so an equal number of top-ranked F ST loci were also selected for evaluation. The genotypes of selected loci by ranked F ST and GRRF were pulled out of the total data set, and any missing data were again imputed using random forest in the method described above. Assignment accuracy for each selection method was estimated using K-fold cross-validation and implemented in the R package assignPOP (Chen et al., 2018 (Anderson, 2010). The K-fold cross-validations were run so that K = 2-10 for each set of selected loci tested with loci tested five at a time starting with the top five ranked loci (by F ST or GRRF), proceeding to the top ten ranked loci, and so on until in increments of five until the total number of selected loci was evaluated. The mean assignment rate was calculated across all values of K for each set of loci tested and plotted using ggplot2 in R.

| Reevaluating sex and spawning season differences using selected loci
After identifying the locus suites that were most accurate and practical for assigning wGOM Atlantic cod to a sex or spawning season, genotype likelihood data for only those locus suites were used to produce separate sex and spawning season covariance matrices in PCAngsd and further principal components using the prcomp function in R. The results were plotted across two principal components with normal data ellipses around each group using ggbiplot (Vu, 2011) in R.

| Genomic sequencing data quality
The 222 individuals in the Atlantic cod dataset received a mean of 3.5 Gbp and 22.3 million reads. On average, 4.5% of bases were removed from the raw reads following quality trimming and 94.3% of the reads mapped to the gadMor2 reference genome. The average coverage of the reference genome across all individuals was 1.2× (range: 0.5-3.6), which yielded 3,760,540 called SNPs after 21 loci were removed due to failure to meet HWE expectations.

| Genomic differentiation
Pairwise comparisons of F ST between spring and winter-spawning Atlantic cod as well as Georges Bank fish were low, indicating little genetic differentiation among the groups (Table 2 While the genetic distance between spring and winter-spawning Atlantic cod in the wGOM was negligible on a genome-wide scale, estimating locus-by-locus F ST values between spring and winter fish showed that there were isolated areas of the genome (LGs 2, 12, 21, and most strongly 18) showing elevated levels of differentiation above the general baseline ( Figure 3). Examining differentiation across the genome for male and female Atlantic cod captured in the wGOM revealed a peak of elevated genetic distance on LG 11 ( Figure 4); however, the baseline and the peak F ST values between males and females were much lower relative to those covered during the seasonal comparison. There were several areas of the genome under significant selection likely associated with chromosomal inversions or selection on LGs 1, 2, 3, 7, 12, 18, and 21 ( Figure 5), many of which overlap with the high areas of differentiation observed between spring and winter-spawned Atlantic cod.

| Marker selection, validation, and individual assignment
The GRRF method identified a total of 262 SNP loci with positive importance scores among the ten individual runs that suggested   (Figure 7b). The low genetic differentiation observed here also coincides with observed levels of neutral variation in previous studies (Barney et al., 2017;Clucas, Kerr, et al., 2019;Clucas, Lou, et al., 2019;Kovach et al., 2010) and indicates that there is potential gene flow occurring between the seasonal spawning groups in the wGOM. While limited adult movement and strong spawning site fidelity promote stock structure and metapopulations for Atlantic cod , there are likely a small number of "migrants" among the seasonal spawning groups in the wGOM that exhibit spawning behavior in the opposite season of their natal spawning season, which decreases genetic distance between spring and winter-spawned cod. While there is evidence of gene flow between groups of Atlantic cod collected in the northwest Atlantic, neutral genetic variation is only one piece of evidence for the delineation of fish stocks, which should ideally be a multidisciplinary approach (Begg & Waldman, 1999;, that includes differences in life history, natural markers, tagging, and fishermen knowledge, all of which support separating Atlantic cod in the western wGOM into spring and winter-spawning stocks with Georges Bank Atlantic cod as a separate stock (NEFSC, 2021).

| D ISCUSS I ON
Some of the strongest evidence to support the delineation of spring and winter-spawning Atlantic cod in the wGOM is presented here and in previous studies (Clucas, Kerr, et al., 2019;Clucas, Lou, et al., 2019;Kovach et al., 2010)  and not on LG 12. As previously discussed, the observed selection in Atlantic cod in LGs 1, 2, 7, and 12 is likely driven by documented chromosomal inversions that have persisted in response to water chemistry parameters including temperature, salinity, and oxygen (Berg et al., 2015;Bradbury et al., 2010;Therkildsen et al., 2013). It is unclear whether SNPs under selection on LGs 3, 18, and 21 represent chromosomal inversions or just areas of strong selection. Many of the genes found on these regions are crucial to the production or reception of reproductive hormones and selective pressures causing differentiation at these loci are likely the cause of the prezygotic isolation between spring and winter-spawning Atlantic cod in the wGOM (Clucas, Lou, et al., 2019).
The GRRF method routinely outperformed ranked F ST when assigning to Atlantic cod to spawning season, which is similar to past findings when differentiating population segments or phenotypes in fish populations (Brieuc et al., 2015  . Additionally, 4 of the SNPs matched loci previously identified by Star et al. (2016)  differentiation among sexes may have made it difficult for meaningful trees to take root in the GRRF method for distinguishing males and females.
Another factor that may play a part in the efficiency of the GRRF method is the complexity of the trait being examined. The random forest technique has traditionally been used to distinguish between phenotypes of complex polygenic traits (Bureau et al., 2003 A final reason for the difference in success of the methodologies could be due to the evolutionary timing of the divergence observed between traits. The evolution of the sex gene in gadoids occurred at least 45 million years ago (Kirubakaran et al., 2019), well before TA B L E 3 List of SNP loci used for assigning GOM Atlantic cod to spawning season (i.e., spring or winter-spawning groups) and sex (males or females) with loci naming conventions of LG followed by the position on each LG and loci listed in order of importance scores

F I G U R E 7
The resulting SNP panels developed here will prove to be a valuable tool for fisheries research, management, and future stock assessments for Atlantic cod in the wGOM. With declining Atlantic cod biomass in the wGOM (NEFSC, 2013) and a severe decline in the recruitment of spring-spawned Atlantic cod (Dean et al., 2019), there is a critical need to incorporate accurate tools that can be applied nonlethally into future stock assessments and research. The biology of Atlantic cod in the wGOM creates a complex and dynamic metapopulation structure, which should make the management of the fishery dynamic in turn. Incorporating all available information and tools into our understanding and management of wGOM Atlantic cod will provide the most accurate, concrete, and useful data to enact and evaluate any future regulatory changes.

ACK N OWLED G M ENTS
Atlantic cod fin samples were provided thanks to the coordination and efforts of M. Armstrong, M. Dean, and W. Hoffman of MADMF.
Thanks to Z. Dench for contributing to laboratory processing of Atlantic cod samples. Funding for this project was provided by Massachusetts state earmark funds in the year 2018-2019.

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data generated during the present study in the form of raw sequencing reads have been uploaded to NCBI's Sequence Read Archive, accessible by bioproject number PRJNA669394, accession numbers SAMN16447555-SAMN16447776. All scripts used during bioinformatic processing and Beagle format genotype likelihood data files for selected SNP panels were uploaded to Dryad, DOI accession number: https://doi.org/10.5061/dryad.ht76h drg3.