Estimating the contribution of Greenland Halibut (Reinhardtius hippoglossoides) stocks to nurseries by means of genotyping‐by‐sequencing: Sex and time matter

Abstract Identification of stocks and quantification of their relative contribution to recruitment are major objectives toward improving the management and conservation of marine exploited species. Next‐generation sequencing allows for thousands of genomic markers to be analyzed, which provides the resolution needed to address these questions in marine species with weakly differentiated populations. Greenland Halibut (Reinhardtius hippoglossoides) is one of the most important exploited demersal species throughout the North Atlantic, in particular in the Gulf of St. Lawrence, Canada. There, two nurseries are known, the St. Lawrence Estuary and the northern Anticosti Island, but their contribution to the renewal of stocks remains unknown. The goals of this study were (a) to document the genetic structure and (b) to estimate the contribution of the different identified breeding stocks to nurseries. We sampled 100 juveniles per nursery and 50 adults from seven sites ranging from Saguenay Fjord to offshore Newfoundland, with some sites sampled over two consecutive years in order to evaluate the temporal stability of the contribution. Our results show that after removing sex‐linked markers, the Estuary/Gulf of St. Lawrence represents a single stock which is genetically distinct from the Atlantic around Newfoundland (F ST = 0.00146, p‐value = .001). Population assignment showed that recruitment in both nurseries is predominantly associated with the St. Lawrence stock. However, we found that the relative contribution of both stocks to the nurseries is temporally variable with 1% contribution of the Newfoundland stock one year but up to 33% for the second year, which may be caused by year‐to‐year variation in larval transport into the Gulf of St. Lawrence. This study serves as a model for the identification of stocks for fisheries resources in a context where few barriers to dispersal occur, in addition to demonstrating the importance of considering sex‐linked markers and temporal replicates in studies of population genomics.


| INTRODUC TI ON
The use of genomic tools for conservation and management offers the opportunity to better understand the dynamics and demography of natural populations (Manel, Gaggiotti, & Waples, 2005). Population genomics allows answering conservation issues, such as assessing reproductive success (Ford, Pearsons, & Murdoch, 2015), testing for local adaptation (Grummer et al., 2019;O'Malley, Camara, & Banks, 2007), documenting migratory patterns of exploited and/or threatened populations (Larson et al., 2013), and defining management units Palumbi, 2004). In the case of marine species, the lack of statistical power is a main issue when trying to define stocks of exploited species, where exchange of genetic material is high due to minimal dispersal constraints and differentiation between stocks is further limited because of large effective population sizes (Waples, Punt, & Cope, 2008). However, the improvement of next-generation sequencing since the mid-2000s has allowed the definition of subtle differentiations at fine local scales in order to delineate populations (i.e., stocks) by making it possible to genotype thousands of markers on hundreds of individuals in a relatively short time (Metzker, 2010), as exemplified by recent studies of various marine organisms such as the American lobster (Homarus americanus; Benestan et al., 2015), the giant California sea cucumber (Parastichopus californicus; Xuereb, Kimber, Curtis, Bernatchez, & Fortin, 2018), or the Atlantic cod (Gadus morhua; Barth et al., 2017), to name a few.
Nevertheless, increasing the number of markers is not always sufficient to resolve population structure for species with large populations and/or high connectivity such as marine species (Gagnaire et al., 2015). For reduced-representation sequencing data, it is known that filtering parameters have a major impact on results (Puebla et al., 2014;Rodriguez-Ezpeleta et al., 2016Shafer et al., 2017). For example, as shown by Gagnaire et al. (2015), removing SNPs under directional selection from assessment of migratory pattern and connectivity of marine species populations might result in elimination of informative loci with high F ST between "real" populations and thus creates false-negative interpretation. Also, Roesti, Hendry, Salzburger, and Berner (2012) showed that the outcome of population genomics studies can be systematically biased if markers with a low minor allele frequency are included in the analysis. The reason is that these "uninformative" polymorphisms lack the adequate potential to capture signatures of drift and hitchhiking. Sex-linked markers, which consist of genomic markers located in genomic regions associated with sex determinism, may also influence population structure results. As shown in Benestan et al. (2017), removing SNPs linked to sex resulted in a lower and more realistic estimation of differentiation for the anadromous species Arctic charr (Salvelinus alpinus) and in American lobster (Homarus americanus). More recently, a study of stock assessment in deacon rockfish (Sebastes diaconus) found similar results. When removing the 92 identified SNPs presumably linked to sex, F ST values between males and females decreased from 0.45 to 0.0036, although still significant (Vaux et al., 2019). Despite these results, little to no attention has been made to evaluate the importance of this type of marker in other species.
One specific application of genetic stock definition is to determine their relative contribution in a context of mixed stock fisheries. In particular, this issue may be addressed using assignment methods, which consist of determining the most probable origin (or reference stock in the case of fisheries) of an individual or a group of individuals from unknown sources based on multilocus genotype information Dann, Habicht, Baker, & Seeb, 2013;Ensing, Crozier, Boylan, O'Maoiléidigh, & McGinnity, 2013). The statistical power of assignment tests highly depends on the level of differentiation between putative source populations (Pritchard, Stephens, & Donnelly, 2000).
Greenland Halibut (Reinhardtius hippoglossoides), also named turbot, is a flatfish of the Pleuronectidae family with a circumpolar distribution throughout the Northern Hemisphere. In the Gulf of St.
Lawrence (Canada), spawning occurs every year during the winter months from January to March. Following emergence, larvae drift for a few months and then settle in a nursery area until growth is complete (Sohn, Ciannelli, & Duffy-Anderson, 2010). A nursery can be defined as an area where the density of juveniles (fish older than larvae, but in which gonads are not yet mature) is above average compared to elsewhere in the range of the species. These habitats are also usually characterized by an abundance of smaller prey and a relatively low predation rate (Beck et al., 2001;Dahlgren et al., 2006). There are two known nurseries of Greenland Halibut in the St. Lawrence system, one being located in the Estuary and the other in the Gulf north of Anticosti Island (Youcef, Lambert, & Audet, 2013). Benthopelagic fishes such as Greenland Halibut are characterized by a high potential to disperse due to the prolonged pelagic larval phase, which increases connectivity between populations, thus complicating the definition of genetically distinct population and management units (Bailey, 1997;Diopere et al., 2018;Hoarau, Rijnsdorp, Veer, Stam, & Olsen, 2002;Le Moan, Jiménez-Mena, Bekkevold, & Hemmer-Hansen, 2019;Vandamme et al., 2014). Several studies have aimed at documenting the extent of genetic connectivity of the species in the western Atlantic with different outcomes. First, several authors proposed that Greenland Halibut from the Gulf of St. Lawrence comprise a single population, distinct from other stocks in the Atlantic. This hypothesis has been supported from studies on allozymes (Fairbairn, 1981) and prevalence of parasites (Arthur & Albert, 1993;Khan, Dawe, Bowering, & Misra, 1982). Arthur and Albert (1993) (Roy et al., 2014). It is possible that these contradictory results stem from the limited resolution offered by previous types and number of genetic markers that were used previously to assess population structure in Greenland Halibut. This situation supports revisiting the population genetics of the species using a more powerful method, such as genotyping-by-sequencing (GBS).
Moreover, no study has attempted to document the contribution of different putative Halibut populations to the juveniles that use the two known nurseries within the St. Lawrence system.
The first goal of this study was to test the null hypothesis of the absence of population structure among Greenland Halibut from the St.
Lawrence system and Atlantic sampling locations near Newfoundland.
The second goal was to determine the origin of juveniles from the two nurseries located within the St. Lawrence system. We used GBS to genotype a total of 850 adults from 7 localities distributed from Saguenay Fjord to offshore of Newfoundland and 200 juveniles from both nurseries (two annual temporal replicates of 100 for each nursery). In addition, we assessed the effect of sex-linked markers on the definition of population structure. In light of our results, we discuss (a) the importance of using a large number of markers to refine the resolution of population structure in weakly structured marine species, (b) the importance of sexing individuals being genotyped, and (c) the importance of performing temporal replicates to assess the temporal stability of patterns being documented.

| Sampling
Fisheries and Oceans Canada (DFO) and volunteer fishermen from the area collaborated to achieve the sampling (Figure 1). Every fish was measured and its sex identified at the time of capture. Fish larger than 31 cm were considered mature, whereas those under were classified as juveniles (DFO, 2012;Morin and Bernier, 2003

| DNA extraction
The genetic material was extracted according to the protocol detailed in Aljanabi and Martinez (1997). Samples were migrated on a 1% agarose gel to ensure genomic DNA was sufficiently abundant and of high quality for DNA sequencing. DNA concentration was determined by colorimetry using a spectrophotometer (NanoDrop).

| Bioinformatics and genotyping
Following DNA extraction, every sample was diluted to a concentration of 10 ng/μl in 10 μl to normalize at 100 ng of DNA  The resulting dataset was checked for overall quality using FASTQC (Andrews, 2010). Adapters were removed from reads using cutadapt v.1.8.1 (Martin, 2011). Sequences were demulti- and alignments with score lower than 0 were discarded (T = 0).
SNPs were identified at each locus with pstacks with a minimum depth of coverage to report a stack set to 1 (m = 1). Then, cstacks was selected to construct a reference catalog with default parameters using all the individuals sampled in adult localities and nurseries. Individual SNPs were called using the catalog with sstacks also with parameter left as is. The minimum coverage depth for a locus was fixed to 7 (m = 7) using populations module in STACKS set with the minimum percentage of individuals in a population required to process a locus set to 60% (r = .6), the minimum number of populations a locus must be present to process it fixed at 6 (p = 6, while the population map included the seven sample localities plus the two nurseries) and the maximum p-value to keep a F ST measurement set to 0.1. After, the ddRADseq dataset was filtered for SNPs calling at over 70% of genotyped individuals in every population (see details in Table 1). Following filtering was performed with stacks_workflow available online (https://github. com/enorm andea u/stacks_workflow). Reads were filtered for a minimum coverage depth of 10× and a maximum coverage depth of 100×. This step ensures working with markers that have enough coverage to limit false positives and also remove those that could be located in repetitive segments in the genome, which causes SNP overrepresentation. Next, we removed SNPs that were genotyped in fewer than 70% of overall individuals. Observed heterozygosity was set to be superior than 0.5 (Ho > 0.5) within samples in order to remove potential homeologs from the dataset (Davey et al., 2011). Both individuals and loci with a proportion of more than 10% of missing genotype overall were discarded. Minor allele frequency was set to more than 0.001 (MAF > 0.001) for all of the sampling sites to make sure any sequencing error from the dataset was removed. Finally, a relatedness analysis was performed to remove individuals closely related or that may have been pooled twice (relatedness > 0.9) using VCFtools (Danecek et al., 2011).
When a pair of individuals with a high relatedness coefficient originated from the same sampling site, one was selected randomly to be kept in the dataset, if they were from different sites, both individuals were discarded. One SNP per read was kept based on which one had the highest minor allele frequency. We performed this type of filtering selection in order to increasing discrimination power needed to detect differentiation with a marine species (Gagnaire et al., 2015).

| Identifying sex-linked and selected SNPs
We then removed SNPs under balanced selection using Bayescan v.2.1 (Foll & Gaggiotti, 2008). Balancing selection can take different forms, including disassortative mating, frequency-dependent selection, overdominance, or selection that varies in direction and intensity across space or time. Any of these processes will generally result in higher allelic diversity and/or less population differentiation at loci under balancing selection (King, Stansfield, & Mulligan, 2006).
This software uses a Bayesian approach to estimate F ST coefficients.
It categorizes SNPs under balanced and directional selection based on a posterior distribution set by the user, which determines how the neutral model is more likely to occur. We used a posterior odd of 10,000, as suggested by Lotterhos and Whitlock (2015), with 9,999 pairwise iterations. For further analysis, we kept all neutral and divergent loci, to keep all variation explained by either reproductive isolation or differentiation in selection pressure between sites (Gagnaire et al., 2015). We chose to remove markers potentially under balancing selection since this force causes both alleles to be maintained equally frequent when comparing sites, which reduce discrimination power and may wrongly lead to overinterpreting the extent of genetic connectivity between populations (e.g., Benestan et al., 2015).
After removing SNPs under balancing selection, the dataset was duplicated. One duplicate set was left as is and the other was used to remove sex-linked markers. To identify SNPs associated with sex, we performed a redundancy analysis (RDA) using the R package vegan (Oksanen et al., 2007). The dependant matrix was represented by the genotypic data and the explanatory variable as sex. A SNP was correlated with the sex variable when the standard deviation for the distribution of RDA loadings was higher than 95%. To perform following analysis, the VCF file obtained from the filtering steps was converted to a genepop file using PGDspider (Lischer & Excoffier, 2012), as well as VCFtools to translate into a PLINK file (Purcell et al., 2007).

| Assignment of juveniles
We performed an individual assignment test with the R package as-

| Genotype-by-sequencing
The mean read coverage depth was 50× overall. For additional details about general quality of the data, see Figures S1 and S2. In total, 32 individuals were removed because of high mean missing data and another 32 individuals because of high relatedness (>0.9). All of the subsequent analyses were made using a total of 26,965 SNPs when sex-linked SNPs were kept and 26,681 SNPs when they were removed (see below and Table 1 for details).

| F-statistics
Pairwise F ST values were calculated between all sampling sites (including nurseries) were relatively low (see Table 2). For the dataset including sex-linked markers, we found a significant difference between fish originating from the Atlantic Ocean with those of North

| Effect of sex-linked markers on population structure
Concerning the clustering analysis, the lowest cross-validation error was obtained when grouping individuals into one cluster for datasets with and without sex-linked markers (see Figure S4). However, since   (GSL) and Newfoundland (NFL). Monte Carlo cross-validation test showed an improvement of correct assignment for both reference stocks when including all loci ( Figure 5). For this reason, we kept all markers for subsequent analysis. Difference between the corresponding proportion of juveniles between years was significant (pvalue = 6.882e−07 for ANTJ and p-value = .001964 for ESTJ). Also, for the second year of sampling, proportion of juveniles originating from NFL were higher for the ANTJ nursery than ESTJ (.3034 vs.

| Assignment of juveniles to identified stocks
.1389, p-value = .04775). Therefore, this corresponded to an average juvenile contribution of the Newfoundland stock during the second year of sampling, albeit with a higher proportion of juveniles in the most eastern ANTJ nursery (30.33% for ANTJ and 13.89% for ESTJ; Figure 6).

| D ISCUSS I ON
The main purposes of this study were to determine the origin of Greenland Halibut juveniles from nurseries within the St. Lawrence system and to test for the relative contribution of the different stocks. First, we performed a structure analysis with ADMIXTURE and a PCA to identify adult populations to be used as reference for the population assignment with SNPs defined by GBS. We found that sex-linked markers affect significantly structure result by masking signal of differentiation between the GSL and NFL stocks that were defined. Second, we used the software assignPOP to determine the most likely stock of origin of juveniles of two different years of sampling between populations identified previously without sex-linked markers. Our results revealed an important inter-annual difference in proportion of juveniles originating from the NFL population, from being almost absent in the first year to almost a third of the total contribution the second year. We discuss the implications of these results on management strategies.

| Population structure: effect of sexlinked markers
Our results show a significant differentiation between geographical regions of Atlantic Ocean and the St. Lawrence, including the Saguenay Fjord, the Estuary, and the Gulf (p-value = .001) but no evidence of population structure within St. Lawrence system with the dataset. This result is similar to the analysis of allozymes done by Fairbairn (1981), which supported a pattern of high gene flow between the St. Lawrence and the Atlantic Ocean, although St. Lawrence showed a significant signal of isolation based on two unique allozymes found in this region. Still, it is probable that the significant pairwise F ST value we observed might be a consequence of the increased number of loci and samples, which inflates chances of obtaining false positive, instead of being a biological genomic differentiation signal (Helyar et al., 2011). On the other hand, as mentioned above, weak but biologically significant values of genetic differentiation have frequently been reported in the literature for marine species (e.g., Benestan et al., 2015;Xuereb et al., 2018). Here, the biological significance is revealed by the fact that we could confidently reassign juvenile fish from the nurseries to one population or the other, in proportion varying between years. This would not be possible if the genomic differentiation had not biological meaning.
Here, we showed that population structure was different when removing sex-linked SNPs from dataset. This result corroborates the study of Benestan et al. (2017) who showed that 12 highly differentiated sex-associated SNPs (with a total of approximately 10,000 SNPs) were sufficient to create population structure previously found between inshore and offshore locations in American lobster weight to these findings and show the importance of considering sex-linked markers in population genomics studies for species, especially in species characterized by weak genetic differentiation.
Another salient result of our study is that current management strategy in the southwestern distribution range of Greenland Halibut appears to reflect real population structure. To be more accurate, our results suggest that management zones 4R, 4S, and 4T should be combined and managed as single unit, which is already the case (DFO, 2018). Even though no sample sites were located in the  Bay, which has been suspected (but not confirmed) to be the main nursery for Greenland Halibut in the east of Atlantic Ocean population (Cooper, Maslenikov, & Gunderson, 2007;Junquera, Roman, Morgan, Sainza, & Ramilo, 2003;Riget & Boje, 1989 (Han, Loder, & Smith, 1999;Saucier, Roy, Gilbert, Pellerin, & Ritchie, 2003;Wu, Tang, & Hannah, 2012). We also hypothesize that the proportion of larvae entering the Gulf might depend on the strength of this current during the same year, which would need to be more formally tested. This could also explain why a significant superior number of juveniles from the Atlantic are found in the more eastern ANTJ nursery. Belle-Isle strait has been reported to be an important migration route for several species, including Atlantic redfish (Sebastes mentalla; Benestan et al., unpublished data), snow crab (Chionoecetes opilio; Puebla et al., 2008), and capelin (Mallotus villosus ;Cayuela et al., 2019;Kenchington, Nakashima, Taggart, & Hamilton, 2015).

| Assignment of juveniles to identified stocks
More specifically, the study on redfish aimed at documenting the connectivity between two ecotypes: deep water ecotype, found in Gulf of St. Lawrence, and the shallow water ecotype, found mostly in Labrador Sea and eastern Newfoundland coast. Authors found more than 10% of the total number of individuals classified as the GSL ecotype in the region corresponding to the shallow ecotype. From this finding, linked to the low density of adult located in the Belle-Isle strait, the authors suggested that larval drift plays a role in gene flow between the two ecotypes. For management purposes, the inter-annual variation shown in population assignment demonstrates the relevance of performing temporal replicates to better reflect the biological dynamics of the system. As such, our results also show that for questions related to recruitment, time needs to be taken into account even for population of large size such as Greenland Halibut.
Even though individual assignment is a reliable method to determine the origin of unknown individuals using frequentist statistics, we were not able to corroborate these results using a likelihood method (i.e., genetic stock identification). Assignment output from software gsi_sim (Anderson, Waples, & Kalinowski, 2008) , 2015), Atlantic salmon (Griffiths et al., 2010), and Green sturgeon (Israel et al., 2009). Clearly, although significant, the extant of population differentiation measured here is most likely too low to efficiently use this approach.

| Conclusion and perspective
Overall, our results show that performing both sex identification and temporal replicates are important in the type of study we report here, especially so when studying weakly structured marine species. This study is also part of a larger ongoing research pro-

ACK N OWLED G EM ENTS
We thank the Department of Fisheries and Oceans Canada for the planification and execution of sampling over the entire study range, as well as commercial fishermen for their contribution to sampling, more specifically Leopold Ghinter (UQAR) for his help in the field.
We also thank Renée Gagné (RAQ) and Céline Audet ( as support for participating in conferences.

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
Raw demultiplexed sequences are available on the Sequence Read Archive (SRA) on the study accession number: PRJNA613476.