Genome‐wide analysis reveals the genetic stock structure of hoki (Macruronus novaezelandiae)

Abstract The assessment of the genetic structuring of biodiversity is crucial for management and conservation. This is particularly critical for widely distributed and highly mobile deep‐water teleosts, such as hoki (Macruronus novaezelandiae). This species is significant to Māori people and supports the largest commercial fishery in New Zealand, but uncertainty about its stock structure presents a challenge for management. Here, we apply a comprehensive genomic analysis to shed light on the demographic structure of this species by (1) assembling the genome, (2) generating a catalogue of genome‐wide SNPs to infer the stock structure and (3) identifying regions of the genome under selection. The final genome assembly used short and long reads and is near complete, representing 93.8% of BUSCO genes, and consisting of 566 contigs totalling 501 Mb. Whole‐genome re‐sequencing of 510 hoki sampled from 14 locations around New Zealand and Australia, at a read depth greater than 10×, produced 227,490 filtered SNPs. Analyses of these SNPs were able to resolve the stock structure of hoki into two genetically and geographically distinct clusters, one including the Australian and the other one all New Zealand locations, indicating genetic exchange between these regions is limited. Location differences within New Zealand samples were much more subtle (global F ST  = 0.0006), and while small and significant differences could be detected, they did not conclusively identify additional substructures. Ten putative adaptive SNPs were detected within the New Zealand samples, but these also did not geographically partition the dataset further. Contemporary and historical N e estimation suggest the current New Zealand population of hoki is large yet declining. Overall, our study provides the first genomic resources for hoki and provides detailed insights into the fine‐scale population genetic structure to inform the management of this species.


| INTRODUC TI ON
Defining biologically meaningful units for sustaining biodiversity is one of the major goals of population management and conservation biology (Allendorf et al., 2010b;Moritz, 1994). In particular, the detection of genetic structure provides a crucial tool to identify isolated units, and to assess the degree of connectivity among populations (Bernatchez, 2016). Neglecting consideration of population structure may increase risks of overexploitation or mismanagement (Waples, 1998). Marine ecosystems are traditionally considered to be highly connected, and this is typically attributed to the large population sizes of many marine species, coupled with the presence of few barriers to gene flow (Nielsen et al., 2009). In addition, in many marine teleost species, the early life history is characterized by the presence of a planktonic larval stage during which larvae can be transported over long distances by ocean currents (e.g. 100s of km). Based on these characteristics, it is expected that marine species are usually highly connected owing to the combination of large population sizes and high dispersal, which makes it more challenging to characterize population structures accurately.
Small datasets containing neutral loci have been widely used to analyse population structure, gene flow and demographic changes over time. However, the small numbers of markers often lacked the statistical power to detect low rates of genetic differentiation in the high gene flow environments common for marine teleost species (Nielsen et al., 2009). Recent population genomic approaches employing thousands of genome-wide markers hold promise to provide the degree of resolution required for essentially any socioeconomic or ecologically important marine species (Ellegren, 2014). While the field is still evolving, numerous studies on marine species already demonstrate its potential to offer deeper insight into the dynamics of natural populations (Hohenlohe et al., 2010;Larson et al., 2014).
The ability to apply large numbers of DNA markers to conduct dense genome scans not only has greatly enhanced the power to identify genomic regions exhibiting genetic structure (Nielsen et al., 2012), but also has enabled identification of outlier regions associated with adaptation. Adaptive genomic signatures may be associated with local adaptation or reveal traces of cryptic population structure obscured by gene flow across most of the genome (Duranton et al., 2018;Gagnaire et al., 2015). These outlier loci can reveal regions in the genome of genetic differentiation where neutral markers often remain uninformative, and can prove useful to delineate locally adapted stocks and redefine conservation units (Funk et al., 2012). This approach is appealing because selection may be more efficient than genetic drift in opposing the homogenizing effect of migration, in particular when populations have large effective population sizes, which is the case for many important fisheries species. Genome scans work by detecting significant departures from genomic background patterns observed (Ahrens et al., 2018), while Gene-Environment-Associations (GEA) methods work by identifying genetic variants associated with particular environmental factors (Dallaire et al., 2021). However, outlier loci can also arise through a wide variety of evolutionary mechanisms apart from local adaptation (Bierne et al., 2011), in particular in response to varying patterns of recombination (Booker et al., 2020). Such patterns are commonly caused by structural genomic variants (Mérot et al., 2020;Wellenreuther & Bernatchez, 2018;Wellenreuther et al., 2019), and recent work indicates that these can affect more base pairs than SNP variants (Catanach et al., 2019), and are widespread throughout the genome.
Here, we assess the population genomic structure of hoki (Macruronus novaezelandiae, Family: Merlucciidae), which supports one of the most valuable deep-water fisheries in New Zealand. Hoki are widely distributed throughout New Zealand and Australian waters, being found in most abundant quantities in depths of 200-800 m (Horn & Sullivan, 1996;Livingston & Schofield, 1996). They have long pelagic larval and juvenile phases, maturing at the age of four, and exhibit extensive migratory behaviours (Horn, 2011). The current stock assessment for hoki in New Zealand is based on an assumed two-stock migration model between the Western and Eastern stocks ( Figure 1). These have been defined based on data showing that fish in different geographical locations grow and mature at different rates and have different morphometric characters (Horn & Sullivan, 1996;Livingston & Schofield, 1996;Livingston & Sullivan, 2007). The Western stock encompasses spawning hoki from the West Coast of the South Island of New Zealand. The larvae and juveniles originating from this stock are thought to be then transported to the East Coast nurseries feeding areas on the Western Chatham Rise via the Westland and D'Urville currents (Smith et al., 1996). As young adults, these fish are then thought to migrate from the Chatham Rise to feeding grounds in sub-Antarctic waters, and subsequently moving between these and the spawning grounds on the West Coast of the South Island as mature adults. In contrast, hoki belonging to the Eastern stock are thought to spawn in the Cook Strait (and at other locations east of the South Island), with larvae and juveniles then migrating from these locations to the Chatham Rise nursery and feeding grounds. Juveniles are assumed to recruit to their respective stocks at maturity at ages of 3-8 years (O'Driscoll, 2004), and mature adults are thought to move between the Chatham Rise and the known spawning grounds in Cook Strait and at Pegasus Canyon. In the spawning grounds, hoki typically form large midwater aggregations, consisting almost entirely of the species.
Modern genomic technologies offer a powerful toolset to independently determine stock structure in this species, but has not been used to date. To enable assessment of this species using genomics solutions and given the significance of this species to Māori, Te Ohu Kaimoana and kaitiaki (cultural guardians) provided advice on a process to manage the gathering, storage, access and use of genetic data. We assembled a high-quality de novo genome using a combination of short-and long-read sequencing. To investigate the genomic stock structures, we sampled 12 locations in New Zealand and two locations in Australia (≥30 individuals per location) and performed whole genome sequencing at greater than 10× coverage to generate a powerful genome-wide SNP dataset. This genome-wide dataset was then used to assess the degrees of genetic diversity between all sampling sites to identify independent clusters, and how these are related to other sampling locations. The dataset was further used to test for population genetic differentiation using both neutral SNPs and putative adaptive SNPs. Our results are compared and discussed in the light of previous studies on hoki, and other teleost species.

| Indigenous considerations
Gathering genomic data requires strong relationships with indigenous peoples (Hudson et al., 2020). Engagement with Te Ohu Kaimoana started before the operational aspects of the project were initiated and carried on throughout the project. Specifically, a working group was developed to advise on a process to manage the gathering, storage, access and use of samples and genetic data before, during and after the research presented here. An agreement considering Māori tikanga (traditional protocols) was set up for the genomic data lifecycle, and the raw and analysed data were placed in a managed repository based in New Zealand.

| Sample and metadata collection and DNA extraction
A single adult female hoki from the Cook Strait was sampled in May 2020 (commercial fishing vessel FV Otakou) to generate DNA for the genome assembly. Five tissues were preserved in RNAlater following the manufacturer's instructions: brain, gills, liver, heart and white muscle. High-quality DNA was extracted from preserved liver with a CTAB-based extraction buffer as follows: 100 mg of tissue was homogenized in 1 ml CTAB buffer (2% CTAB [hexadecyltrimethyl ammonium bromide], 2% PVP K40 [polyvinyl pyrrolidinone K40], 2 M NaCl, 25 mM EDTA, 100 mM Tris-HCl pH 8.0) with a sterile plastic pestle, on ice. After adding 25 µl Proteinase K (20 mg/ml) and 10 µl 1 M dithiothreitol, the mixture was incubated at 50°C for 18 h in a thermomixer 30 s on, 300 rpm, 30 min off. The sample was extracted once with an equal volume of chloroform:isoamyl alcohol (24:1 v/v), by gently mixing by hand for 5 min. The phases were separated by centrifugation (5 min at 16,089 g at room temperature), and the aqueous phase was transferred to a new tube and precipitated with 2 volumes of 100% ethanol at room temperature. After ~5 min, the DNA was collected by centrifugation (10 min at 13,000 rpm at room temperature). The pellet was dissolved in 400 µl TE buffer F I G U R E 1 Sampling locations of hoki in New Zealand and Australia used for the population genomic part of this study. See Table 1 for site information (10 mM Tris-HCl, 1 mM EDTA pH 7.5) and 100 µl 5 M NaCl. RNA was digested by adding 8 µl of RNase A (50 mg/ml) and incubating at room temperature for 5 min. Twenty-five microlitres of 10% SDS was added, and the sample was incubated in the thermomixer at 37°C at 300 rpm for 15 min. The sample was extracted with an equal volume of chloroform:isoamyl alcohol (24:1 v/v), mixing gently by hand for 5 min, and the phases were separated as described above. The DNA contained in the aqueous phase was precipitated with 0.7 volumes isopropanol, mixed, and incubated at room temperature for 10 min. The DNA was collected by centrifugation (10 min at 13,000 rpm at room temperature). The pellet was washed with 1 ml 70% ethanol and centrifuged again. The In total, two sets of 100 individual samples were collected per trawl and data were recorded on fish size and sex. An individual larger than 75 cm was considered to be an adult, and smaller individuals were classed as juveniles but sufficiently old to have separated as per the stock model hypothesis. As fish under 45 cm were still deemed to be at risk of being from either presumptive stock, these were not sampled. Some adults were classed as spawning individuals when they were collected during July and August at known spawning sites, and these were as follows: Hokitika Canyon, Kahurangi (both these are off the West Coast of the South Island of New Zealand) and Cook Strait and Pegasus Canyon (the latter off the East Coast of the South Island). The GPS location of trawls was recorded and used as location indicators. One fin clip was collected from each fish and stored in 95% ethanol in a screw-cap tube at room temperature until DNA extraction. A random subset of 30 or 40 fish was used from 14 collection sites for the DNA extraction and sequencing. The collection sites were selected to represent a geographic distribution, including various life stages. A description of the 510 hoki samples can be seen in Table 1. DNA extraction was performed at Neogen GeneSeek, Lincoln, NE, USA, based on the LGC sbeadex™ magnetic bead kit that uses a two-step binding mechanism to provide high-quality DNA for downstream SNP and NGS protocols (www.lgcgr oup.com).

| Genome sequencing and assembly
High molecular weight DNA was sent in July 2020 to the Australian Genome Research Facility (AGRF) for short-and longread sequencing. The DNA was quality checked by AGRF upon arrival, and passed quality thresholds. The TrueSeq DNA Nano short-read sequencing (150 bp PE reads). For long-read sequencing, the Oxford Nanopore Technologies (ONT) technology was used. Three MinION FLO-MIN106 flow cells were run in-house and at AGRF, to verify that DNA could be sequenced with this technology, which produced approximately 3 Gb of sequencing data. Then, ONT libraries were prepared at AGRF using the SQK-LSK109 kit and run using a PromethION FLO-PRO002 flow cell, using minKNOW version 4.0.5 and base calling using Guppy 4.0.11, producing 7.57 M reads totalling 19.7 Gb of data (passed reads).

| Outlier scans for stock structure assessment
Outlier SNPs were identified using three methods -tess3 (Caye et al., 2016), pcadapt (Luu et al., 2017) and LEA (Frichot & François, 2015). Tess3 searches for genomic variants under selection by applying matrix factorization algorithms to allele frequencies; pcadapt runs genome-wide selection scans based on principal component analyses; and LEA calculates F ST statistics from ancestral allele frequencies that are estimated using the packages snmf function.
The R package pcadapt v4.3.3 was implemented using default settings and method = "mahalanobis". The following parameters were used to run LEA's snmf function: genotype, K = 1:14, entropy = T, ploidy = 2, repetitions = 10, tolerance = 1e−05. The three analyses were conducted on two different datasets -the first contained all filtered SNPs and all hoki samples (510 samples); the second contained all filtered SNPs but only New Zealand samples (450 samples).
A Q-value of 0.05 was used as a threshold of statistical significance for all three analyses. Q-values were calculated using the qvalue R package (Dabney et al., 2010). Variants that met this threshold for all three analyses were considered as outliers that were putatively under divergent selection, whilst any variant that was not considered under selection by any of the analyses was considered to be putatively neutral.

| Stock structure and size
Genetic diversity and population structure were investigated using p-values were also estimated for each dataset and each sampling site using R package StAMPP (Pembleton et al., 2013), applying: nboots = 1000, per cent = 95. Population structure was investigated using the adegenet R packages find.clusters and DAPC functions (Jombart & Ahmed, 2011). Find.clusters was used to initially explore the optimal K value (number of ancestral clusters) for each dataset before the DAPC analysis was run using this optimized K value, in addition to an optimized number of principal components (PCs). The snmf function in the R package LEA was used to explore the ancestral admixture (Frichot & François, 2015). A range of K values were explored (K = 1:14), in addition to the following parameter settings: entropy = T, ploidy = 2, repetitions = 10, tolerance = 0.00001. For the DAPC and LEA analyses, individual analyses were run for each of the six datasets. Contemporary effective population size (N e ) was estimated for New Zealand samples using the linkage disequilibrium (LD) method in NeEstimator v2 (Do et al., 2014), whilst SNeP v1.1 was used to estimate historical N e (Barbato et al., 2015). The complete SNP dataset for New Zealand samples was thinned by 10,000 sites in vcftools before contemporary and historical N e values were estimated.

| Reference genome assembly
In total, 22.7 Gb of ONT long-read sequencing data was generated.
PromethION and MinION sequencing resulted in 19.69 and 3.01 Gb of passed reads with N50 greater than 4.9 and 5.2 kb, respectively.
Assembly using the ONT data and using the FLYE software resulted in a total of 501,488,236 bp assembled into 566 scaffolds ( Table 2).
The N50 of scaffolds was greater than 11 Mb and 15 scaffolds accounted for more than half the total assembly (L50). The largest scaffolds were approximately 26 Mb in length. BUSCO analysis after polishing using Illumina short reads and using PILON resulted in 93.8% complete BUSCOs (0.9% duplicated) out of 3640 zebrafish conserved genes.

| Individual hoki samples sequencing, variant calling and filtering
A total of 6148 Gb of Illumina sequencing data were generated for all 510 sampled fish (Table S1), corresponding to a mean of 12.05 Gb of data per individual (median 11.68 Gb), which corresponds to a mean read depth of ~24× based on the assembled 501 Mb hoki genome. Only four individuals had less than 9 Gb of total data and the individual with the lowest yield had greater than 5 Gb of data, which is still greater than the target 10× coverage. The mapping rate to the reference genome ranged between 81.5% and 94.9%, with a median of 93%, and only seven individuals had mapping rates lower than 90%. In total, 3,906,729 raw variants were detected of which 227,490 were retained after filtering. The complete SNP dataset was also examined with only New Zealand samples included to determine whether there was any additional structure among the New Zealand sampling sites. The kmeans clustering analysis determined there was one cluster (K = 1) within the dataset ( Figure S3a,b), with the DAPC scatterplot displaying overlap between all sampling sites ( Figure S4). FMA6 (FMA denotes Fisheries Management Areas) Snares and FMA3 J1 did, however, display a low rate of genetic divergence from the other sites when DA 1 and DA 2 were examined. However, it should be noted here that a replicate trawl from FMA3 did not show the same pattern as FMA3 J1. The LEA ancestral admixture analysis also determined that the optimal number of ancestral populations for the New Zealand samples was K = 1 ( Figure S3c).  Figure S10).

| DISCUSS ION
Knowledge of the stock structure of important fisheries species is a key component that informs management plans. Marine fish populations typically consist of large numbers of individuals with high dispersal potential, which can result in high gene flow and weak population structure (Bernatchez et al., 2017;Papa et al., 2020).
Genomic approaches provide an excellent tool to gain insights into the fine-scale population structure and can also shed light on the environmental drivers that impose selection on the genome (Bernatchez, 2016  While panmixia of New Zealand hoki stocks was inferred, outlier scans including the two Australian sites revealed a large number of loci under putative selection. In total, 2797 SNPs could be identified as outliers that were in common across all three analysis methods.
This high number of SNPs under putative selection indicates that the wider Australasian hoki population has a high potential for future genetic adaptation in response to changing environmental conditions.
In the context of a rapidly changing world, this high standing adaptive potential may provide this species with a resilience buffer to combat future changes. In contrast to the large number of outlier loci in the overall dataset, when only New Zealand locations were analysed, the number of shared outlier loci between sampling locations fell to only 10 SNPs. This finding could indicate that large-scale environmental selection selects for different locally adapted genome associations in each cluster; however, further analysis is required to better understand this small number of shared outlier SNPs. Our finding of multiple outlier loci, particularly in the complete dataset, is not in line with previous views on the adaptive ability of marine teleosts (Waples, 1998). In the past, adaptive divergence was widely believed to be rare in marine fishes owing to fewer barriers to gene flow in the marine environment, the often-long dispersal phase of larvae, and also migratory adults; together, these factors were thought to hinder local adaptation (Kawecki & Ebert, 2004). Our results, and those of other studies in recent years (see for a review Bernatchez, 2016), strongly indicate that this view is no longer warranted and that many marine fishes exhibit signatures of local adaptation. Indeed, recent work suggests that the often-large population sizes of marine fishes may be fuelling selection, particularly when selection is stable and consistent over large areas, and the selection differentials are large (Hemmer-Hansen et al., 2007, 2014). This appears to apply strongly to species that are characterized by large populations, meaning that they are generally little affected by random genetic drift, and probably respond to even relatively weak selection, as locally beneficial alleles have a good chance of sweeping through the population.
Our population size estimates of N e indicated a general decline in the population size of hoki, from a historical N e of 228,681, 999 generations ago, to a N e of 42,816, approximately 189 generations ago. These estimates have to be taken with caution, however, as estimation of N e values is always plagued by high uncertainty owing to the need to sample a high number of individuals to produce accurate estimates (Blower et al., 2019). In fact, simulations indicate that around 1% of the total number of individuals might have to be sampled to ensure sufficiently precise estimates of N e, and our sample size is well below this recommend threshold. Indeed for species with large populations, this would mean that several thousands to millions of individuals would have to be sampled (Marandel et al., 2019). Promising recent developments that rely on the reconstruction of close relatives, such as the close kin mark-recapture (CKMR) approach (Bravington et al., 2016), have demonstrated an improved ability to calculate N e with smaller confidence intervals (Ruzzante et al., 2019). However, application of this method to species with large population sizes has been little explored to date and is likely to be cost prohibitive in most cases.  In New Zealand, hoki forms the largest fishery by volume of catch (Papa et al., 2020). Our population genomics findings about hoki stock structure should be interpreted alongside other biological data on the species. Data on spawning patterns, morphometrics and growth rates in part corroborate the currently used two-stock model of hoki (Hicks et al., 2003;Livingston & Schofield, 1996;Livingston & Sullivan, 2007). However, our findings suggest that the Eastern and Western spawning grounds may undergo sufficient mixing to prevent genetic divergence to the degree where genetic clustering emerges. Furthermore, the finding of overall genetic panmixia and the low rate of pairwise location differentiation together indicate that growth differences could be due to phenotypic plasticity rather than genetically derived stock differences. Despite genetic panmixia in New Zealand, we detected some evidence for selection on some genomic regions, indicating that the overall hoki stock may hold some locally adapted genomic regions that may convey a fitness advantage. In the light of the prediction that changing climatic conditions can negatively affect marine productivity (Lindegren et al., 2018), it will be important for fisheries management to monitor the adaptive potential of this fishery. Future steps may focus on management approaches that seek to maintain locally adaptive variants to avoid depletion of biodiversity that could potentially lead to population decline (Reiss et al., 2009 (Dahle et al., 2018;Hemmer-Hansen et al., 2019).
With the data from this study, it is already possible to use identified divergent SNPs between the New Zealand and Australian clusters to inform hoki seafood traceability, as these could be used to link the catch to its geographic origin (A list of these SNPs is available upon request).
In conclusion, genetic variation at the genome-wide level is invaluable to identify fish stock structure in fisheries management, and the recent increase in the accessibility and resolution of population genetic data has facilitated the detection of previously unidentified structures, as well as signatures for natural selection in wild populations (Bernatchez, 2016). We highlight the importance of using large numbers of markers distributed across the genome to fully characterize the genetic diversity of marine species. In our current study, this allowed us to separate the Australian and New Zealand clusters, as well as some more subtle differences within New Zealand hoki that otherwise could have been overlooked, and allowed us to scan the genome to detect regions under selection.
This study contributes to increasing the genetic knowledge of this important fisheries species, and results can be used to improve our understanding of population dynamics and stock structure. Insights into the species demography is particularly challenging in high gene flow environments, such as many marine fishes, where small genetic differences across most of the genome can mask genetic divergence of strong functional significance. Thus, our study also serves as an example of the increased power offered by population genomics for conservation and management (Allendorf et al., 2010a;Hunter et al., 2018).

ACK N OWLED G EM ENTS
We like to thank all the vessel owners, managers and crew as Mark Jarvis for help with contracting.

CO N FLI C T O F I NTE R E S T
The authors declare no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
Permission from representatives of the Indigenous Peoples (Māori) was obtained for specimens used in this study. Further studies using this material, raw sequencing data and final genome assembly will require consent from the Māori iwi (tribe) who exercises guardianship for this material according to Aotearoa New Zealand's Treaty of Waitangi and the international Nagoya protocol on the rights of indigenous peoples. Raw and analysed data are available through the Genomics Aotearoa data repository at https://www.genom ics-aotea roa.org.nz/data. Access to these data will require permission from Te Ohu Kaimoana (ika@teohu. maori.nz).