Impact of supplementation on deleterious mutation distribution in an exploited salmonid

Abstract Deleterious mutations have important implications for the evolutionary trajectories of populations. While several studies recently investigated the dynamics of deleterious mutations in wild populations, no study has yet explored the fate of deleterious mutations in a context of populations managed by supplementation. Here, based on a dataset of nine wild and 15 supplemented Lake Trout populations genotyped at 4,982 single nucleotide polymorphisms (SNP)s by means of genotype by sequencing (GBS), we explored the effect of supplementation on the frequency of putatively deleterious variants. Three main findings are consequential for the management of fish populations. First, an increase in neutral genetic diversity in stocked populations compared with unstocked ones was observed. Second, putatively deleterious mutations were more likely to be found in unstocked than in stocked populations, suggesting a lower efficiency to purge deleterious mutations in unstocked lakes. Third, a population currently used as a major source for supplementation is characterized by several fixed putatively deleterious alleles. Therefore, other source populations with lower abundance of putatively deleterious mutations should be favored as sources of supplementation. We discuss management implications of our results, especially pertaining to the joint identification of neutral and deleterious mutations that could help refining the choice of source and sink populations for supplementation in order to maximize their evolutionary potential and to limit their mutation load.

. Systems in which genetic drift prevails over natural selection (e.g., small population size) are of major concern for conservation and management programs because they may allow random deleterious variants to increase in frequency (Lande, 1994;Lynch, Conery, & Burger, 1995;Wright, 1931). Moreover, inbreeding can unmask recessive deleterious mutations, affecting fitness of individuals (Charlesworth & Willis, 2009). In wild populations, the presence of deleterious mutation has been recently found in small populations and used to evaluate the mutation load and the potential for accumulation of deleterious mutations (Benazzo et al., 2017;Perrier, Ferchaud, Sirois, Thibault, & Bernatchez, 2017). Predicting the functional effect of substitution is now possible and thus may be an important criterion to consider when implementing management actions that are currently not considered in conservation or management programs.
Lake Trout is an important recreational fish widely distributed in cold freshwater lakes of North America (Scott & Crossman, 1998).
The frequencies of such deleterious mutations, relative to the entire polymorphism within populations, were positively correlated with inbreeding, suggesting that the efficacy of purifying selection was negatively correlated with local level of inbreeding. Furthermore, approximately 46% of the Lake Trout populations in the southern part of province of Québec (Canada) have been stocked (mostly from wild populations), and the maintenance of several of these populations is considered dependent on stocking relative to fishing pressure (communication with the wildlife Québec agency, Ministère de la Forêt, de la Faune et des Parcs-MFFP). Based on 19 microsatellites markers, an increase in genetic diversity and a twofold decrease in genetic differentiation among stocked compared to unstocked populations were reported across Lake Trout populations (Valiquette et al., 2014). Furthermore, levels of admixture in stocked populations were correlated with stocking intensity, resulting in genetic homogenization of the heavily stocked populations (Valiquette et al., 2014).
The potential effect of stocking on the accumulation of deleterious mutations has not been tested yet in Lake Trout nor in any other exploited fish species to our knowledge. The extensive analyses of the population genetic impacts of stocking and of the accumulation of deleterious mutations in wild Lake Trout populations have paved the way for asking the timely question of how stocking affects the accumulation of deleterious mutations, which is of broad relevance for stocking management in general.
The main objective of this study was to assess the effect of stocking on the accumulation of putative deleterious variants. Specifically, this study will answer this question: Do the putative deleterious variants accumulate differentially in unstocked and stocked populations? We achieved this objective using a total of 578 fish from 24 populations in Québec, Canada, that we genotyped using a large single nucleotide polymorphism (SNP) panel designed by a genotyping by sequencing approach. On the basis of the results obtained, we discuss how stocking practices could be adjusted toward improving their usefulness by maintaining the evolutionary potential of populations, and how the use of SNPs may help improving stocking F I G U R E 1 Geographic location of the 24 sampled Lake Trout populations in Québec (Canada). Gradient color corresponds to the average daily air temperature, and the symbols indicate whether the population has been stocked or not

| Sampling, study system and molecular analyses
Samples from 24 lakes from Québec (Canada) were selected from the study of Valiquette et al. (2014) (Figure 1, Table 1) to maximize the (a) representation of genetic groups previously identified (Valiquette et al., 2014), (b) variation in levels of stocking, and (c) quality of samples for performing GBS. Stocking history information was extracted from the MFFP database. Among selected lakes, nine have never been stocked while 15 have been stocked at least once (Table 1).
It is worthwhile to mention that Lake Trout lakes are generally not supplemented from hatchery, but from wild fish from another lakes (Supporting Information Table S1). The median number of individuals per lake was 23, ranging from 10 to 41 (Table 1).
DNA was extracted from the same adipose fin clips for GBS genotyping using a modified version of salt extraction protocol (Aljanabi & Martinez, 1997). An RNAse A (Qiagen) treatment was applied following the manufacturer's recommendation. DNA quality was checked using agarose gel electrophoresis. DNA was quantified using a NanoDrop spectrophotometer (Thermo Scientific) and then using Quant-iT Picogreen dsDNA Assay Kit (Invitrogen). GBS librairies were prepared following a modified version of the two-enzyme GBS protocol, using PstI and MspI restriction enzymes (Poland, Brown, Sorrells, & Jannink, 2012), and single-end 100-bp-length sequencing on Illumina HiSeq2000 platform was conducted for all samples as detailed in Bernatchez et al. (2016).

| DNA sequencing, genotyping and genetic diversity
Cutadapt version 1.9.dev0 was used to remove adaptor sequences with a maximum error rate set to 0.2 and processed reads shorter than 80 bp were removed. Sequence quality was then inspected using FastQC 0.11.1 (Andrews, 2010). Libraries were demultiplexed and filtered for overall quality using the module process_radtags from Stacks version 1.30 (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011;Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013). One mismatch in individual tags was allowed for quality and ambiguous barcodes. To discard Illumina adaptors at the end of the read, reads were trimmed to 80 bp (Pujolar et al., 2013).
TA B L E 1 Population code and corresponding lake size (in hectares), sample size (N), administrative region, latitude, longitude, altitude (in meters), and stocking state (see Supporting Information Table S1 for more details about stocking and air temperature) Sequence reads were aligned to the Rainbow Trout (Oncorhynchus mykiss) genome (Berthelot et al., 2014) using the GSNAP version 2015-12-31v.9 (Wu & Nacu, 2010). Stacks pipeline version 1.40 was then used to discover loci and call genotypes. Pstacks was used to extract the stacks aligned to the reference genome and to identify SNPs at each locus with a minimum depth coverage of four to report a stack. Ten individuals per population (individuals with the highest number of sequence reads) were used to build the catalog using ctsacks. Set of stacks were then searched against the catalog using sstacks performing gapped alignment and using the -g option.
The populations module was used to call genotypes for loci having a minimum genotype likelihood of −10. Subsequent filtering was applied to retain SNPs that were biallelic, genotyped in a least 50% of the individuals of a given population, and had a global minor allele frequency (MAF) higher than 0.05. This filter was used to avoid the introduction of sequencing error in the dataset and to avoid restriction of MAF in local populations at the same time. Only a single SNP per locus was kept. All filtering details are presented in Supporting Information

| Removing markers under potential selection
Detection of putative outliers was performed using two approaches: (a) SNPs potentially under balancing and divergent selection were identified using BAYESCAN v.2.1 (Foll & Gaggiotti, 2008) and (b) SNPs under polygenic selection linked to temperature were detected using a random forest algorithm (Boulesteix, Janitza, Kruppa, & Konig, 2012;Chen & Ishwaran, 2012;Goldstein, Polley, & Briggs, 2011). This second step was produced because different seasonal temperature variables may impose different selective pressures on Lake Trout (Perrier et al., 2017). The putative outliers were subsequently removed from the SNP datasets, such that only putative neutrals remained (hereafter called "neutral markers") for the subsequent analyses. Details of Bayescan and random forest are provided in Supporting Information Data S1.

| Heterozygosity, neutral genetic differentiation, and structure
GENEPOP version 4.2 (Raymond & Rousset, 1995;Rousset, 2008) was used to estimate observed and expected heterozygosities (H O and H E , respectively), and a t test was performed between H E of unstocked versus stocked populations. The R package assigner v.0.3.9 (Gosselin, Anderson, & Bradbury, 2016) was used to estimate pairwise F ST (Weir & Cockerham, 1984) across all populations on both the neutral loci and the putatively deleterious mutation datasets and tested for significance with 1,000 bootstraps. Estimations of neutral admixture proportions of SNPs were performed with a discriminant analysis of principal components (DAPC). Because PCA methods do not accept missing data, imputations by lake were performed using the random forest algorithm implemented in the function "genomic_converter" from the "stackr" R package version 0.5.5. Then, the function "find.clusters" from the "adegenet" R package version 2.0.1 (Jombart & Ahmed, 2011) was used to assess the optimal number of groups with the Bayesian information criterion (BIC) method, considering no prior on group individual populations and allowing a maximum of 30 clusters. At last, the posterior individual assignment probabilities to each group (Q matrix) were obtained running the "dapc" function using the optimal number of discriminant functions to retain the optimal α-score obtained from each dataset (Jombart, Devillard, & Balloux, 2010).

| Identifying deleterious mutations
Deleterious mutations were first estimated by performing a BLAST on all SNPs against the Rainbow Trout transcriptome using blastx.
We used the Rainbow Trout transcriptome because it was the best transcriptome available among salmonids at the time we produced this work and for the sake of similarity with Perrier et al. (2017). All hits with a similarity higher than 25 amino acid of 26 possible (considering 80-bp-long read) and more than 95% similarity between the query read and the transcriptome sequence were retained. As described in Perrier et al. (2017), we then identified the nonsynonymous mutations across the significant hits. As the prediction of the functional effect of amino acid changes may depend on the direction (i.e., the predicted effect of Gly to Asp does not equal an Asp to Gly change), for each nonsynonymous substitution, we identified the specific amino acid substitution and reported the structural/physicochemical category to each amino acid according to the classification reported in Supporting Information Table S3.
We  Table S4), and we therefore show only PROVEAN scores for the subsequent analyses because unlike PolyPhen-2, this method is not limited to human genes.
Provean scores correlate with biological activity and may be used as an indicator for the degree of functional impact of a given protein variation. We investigated whether the values of these scores were significantly different across changes of amino acid categories by conducting two ANOVAs. In the first one, the direction of a category to another was kept whether in the second ANOVA, we clustered together two changes that implied the same categories with the goal of confirming the importance of the directionality of change on the potential deleterious state of a mutation.
At last, a default deleterious threshold value of −2.5 in Provean scores, as commonly employed by other studies (e.g., Renaut & Rieseberg, 2015), was used to classify a nonsynonymous mutation as potentially deleterious (<−2.5) or neutral (>−2.5). We thus applied Provean scores to test whether local deleterious allele frequencies tions. The latter ratio is expected to be lower in populations that purged deleterious mutations faster (see Perrier et al., 2017) and was compared between stocked and unstocked populations using a t test. Because deleterious alleles are negatively selected, it is expected that their frequencies should be lower than nondeleterious ones (Fay, Wyckoff, & Wu, 2001). Thus, t tests were also used to verify whether average "deleterious allele frequency" is lower than the ones of "nonsynonymous, but nondeleterious SNPs" and to the ones of "synonymous SNPs" within each stocked and unstocked groups.

| Effect of stocking on neutral genetic variation and accumulation of deleterious mutations
The effects of stocking and spatial autocorrelation on the genetic variation of both neutral genetic structure and deleterious alleles were tested using a distance-based redundancy analysis (db-RDA; Legendre & Legendre, 1998). First, we estimated a pairwise F ST distance matrix (neutral loci or loci with potential deleterious mutations) and produced principal coordinates analyses (PCoAs) on this pairwise F ST distance matrix to obtain the matrix to be explained by the db-RDA tests. For neutral SNPs, the Q matrix (i.e., admixture proportions; DAPC averaged at population level) was also used to serve as proxies for genetic structure to be explained by a db-RDA test.
Only PCo-axes explaining at least 5% of the variance were considered meaningful and kept. As a proxy for stocking intensity, a matrix of the total weight (in kg) of fish stocked from 13 sources (column) to the 24 studied lakes (row) was extracted from the MFFP database (Supporting Information Table S1). Because the density of stocked fish in a lake is a good indicator of stocking pressure (Létourneau et al., 2017), each row was divided by the surface of the stocked lake in hectare. Furthermore, a Hellinger transformation was produced to avoid the effect of absence of stocking on the similarity among objects (Legendre & Legendre, 1998). A PCA was produced on the transformed database and a broken-stick distribution was thereafter used to choose the meaningful axes related to stocking pressure. To control for spatial autocorrelation, a distance-based Moran's eigenvector map (db-MEM) based on latitude, longitude, and altitude, all pretransformed in meter with the function "geoXY" of the R package "SoDA," was produced. The PC-axes related to stocking intensity and db-MEMs related to spatial components constituted the two explanatories matrices. To identify the best model, the function "ordistep" was used to select the best explanatory variables among stocking and spatial matrices. The function "rda" was finally used to compute the db-RDAs on the selected model (see Benestan et al., 2016;Laporte et al., 2016;Le Luyer et al., 2017;Marengo et al., 2017, for examples of similar methodology). In addition, to ensure that stocking variables explained a significant part of the variation, a partial db-RDA analysis was produced to test for the effect of the stocking variables after controlling for the spatial autocorrelation was computed with the function "RsquareAdj." When not mentioned, R functions were part of the "vegan" package.

| Comparison SNPs/microsatellites
Single nucleotide polymorphism database produced in this study contained the exact same individuals that the one of Valiquette et al. (2014) based on 19 microsatellites. We thus took advantage of both of the large dataset to compare their respective capacity to detect an effect of stocking on neutral genetic variation. Because results were similar between both datasets (although SNPs showed an increase in power given the much higher number of loci screened), this section is only presented in Supporting Information Data S2 for sake of comparison.

| DNA sequencing, genotyping, and genetic diversity
The total number of demultiplexed and cleaned reads was After DNA control quality and filtering out individuals with more  Table S5). The median value of SNPs genotyped per population was 4,335 (Table 2), and the median depth of coverage was 35.

| Removing markers under potential selection
Among the 4,982 SNPs, BAYESCAN identified 437 SNPs (8.8%) that were potentially under divergent selection and 722 SNPs (14.4%) under balancing selection (Supporting Information Data S1; Figure   S1.1). Analyses with the random forest algorithm selected 135 and 139 SNPs associated with temperature for population structure with K = 21 and K = 24, respectively (Supporting Information Data S1). In each case, model performance was not improved by reducing the number of selected loci according to the cross-validation.

| Heterozygosity, neutral genetic differentiation, and structure
The median estimates of the observed and expected heterozygo-  Figure S2.1).
The DAPC further confirmed the pronounced genetic structure observed among populations. The BIC values decreased as the number of K increased until K = 24 (Supporting Information Data S2; Figure   S2.2). Thus, K = 24, which also corresponds to the total number of sampled populations, was used to estimate the posterior individual probabilities of assignment (Q matrix; Supporting Information Data S2; Figure S2.3). deleterious alleles) with a frequency lower than 5% was 52% across lakes (ranging from 34% (lake TU) to 61% (lake PA). This proportion of minor allele with low frequency (<5%) was higher than the same proportions observed for both nonsynonymous/nondeleterious and synonymous polymorphisms (respectively 41% (from 34% (lake CR) to % (lake MG)) and 36% (from 30% (lake AL) to 42% (lake MG)); Figure 5). Together, these results suggest that deleterious mutations are more likely to be lost than putatively neutral polymorphisms (especially in stocked versus unstocked populations) but that there are still relatively common across all populations. At last, it is important to note that certain deleterious mutations were fixed in five populations, including an important source population used for stocking other lakes (TZ Lake, Table 2).

| Effect of stocking on neutral genetic variation and accumulation of deleterious mutations
Based on the broken-stick distribution, four stocking PC-axes were considered meaningful and thus retained. Together, these  Table   S6). No stocking source was highly associated (correlation >0.60) with the fourth stocking PC-axis (Supporting Information Table   S6). In addition, 16 axes were obtained from the db-MEM analysis (Supporting Information    (Db-MEM # 2,9 and 14) and one stocking variables (PC-3, associated with source lake Mitis) and explained 24.5% of the five PCoaxes (p < 0.001). A total of 15.7% of the variation was explained by the stocking variable alone after controlling for spatial variables (p < 0.01). For the Q matrix, 12 PC-axes (totalizing 61% of explained variance) each explaining at least 5% of the variance were kept.
Global model retained five spatial variables (Db-MEM # 1, 2, 10, 12 and 14) and all four stocking variables and explained 26.5% of the 12 PCo-axes (p < 0.001). A total of 16.9% of the variation was explained by the stocking variables after controlling for spatial variables (p < 0.001).
For deleterious mutations, two PCo-axes (totalizing 63.3% of explained variance) each explaining at least 5% of the variance in pairwise F ST distance matrix were kept. The global model retained one spatial variable (Db-MEM #1) and one stocking variable (PC-2, associated with source lake Tremblant), and marginally explained 13.2% of the five PCo-axes (p < 0.1). A total of 8.8% of the variation was also marginally explained by the stocking variable alone after controlling for spatial variables (p < 0.1).

| D ISCUSS I ON
The main goal of this study was to assess the stocking impacts on the accumulation of putative deleterious mutations in an exploited fish, Lake Trout (Salvelinus namaycush). Our results highlighted an increase in neutral genetic diversity and a lower number of deleterious mutations in stocked populations compared with unstocked ones, as well as an effect on deleterious mutation variation related to a major source for supplementation characterized by several fixed putatively deleterious alleles (stocking PC-2, Tremblant lake). Below we discuss the usefulness of GBS for monitoring putative deleterious alleles in source and stocked populations in order to mitigate potential harmful effects linked to stocking practice.

| Effect of stocking on neutral genetic diversity and variation
An increase in genetic diversity (at both marker types, see Supporting Information Data S2 for a detailed discussion on this topic) was observed in stocked compared with unstocked lakes, as already reported by Valiquette et al. (2014). However, the percent-

| Detection and analyses of putative deleterious mutations
Our empirical GBS data on a nonmodel species highlights the fact that the prediction of the functional effect of amino acid change actually depends on the direction of amino acid substitution as suggested by Vasemägi et al. (2017). For several categories, one direction exhibits mainly "neutral" nonsynonymous mutations, whereas the other direction presents mainly putative deleterious mutations. A total of 113 putative deleterious SNPs was detected in our dataset corresponding to almost 53% of all nonsynonymous variations, which is in accordance with the 124 (60%) deleterious variants detected with a different dataset of natural Lake Trout populations (Perrier et al., 2017). For comparison, the percentage of nonsynonymous sites estimated to be deleterious in other species ranges from 3% in bacterial populations (Hughes, 2005) to 80% in the human genome (Fay et al., 2001). Such relatively abundant levels of deleterious mutations in Lake Trout populations could be attributed to their colonization history and to their small effective population sizes (Perrier et al., 2017), which are known to accumulate deleterious variants over time (Allendorf, Aitken, & Luikart, 2013;Balick, Do, Cassa, Reich, & Sunyaev, 2015;Benazzo et al., 2017;Frankham, Ballou, & Briscoe, 2010;Renaut & Rieseberg, 2015). Admittedly, the putative harmful mutations detected could also partly be linked to gene duplication and residual tetrasomic inheritance present in all salmonids due to their whole-genome duplication (WGD) (Dehal & Boore, 2005;Jaillon, Aury, Brunet, Petit, & Stange-Thomann, 2004;Macqueen & Johnston, 2014). Nevertheless, a recent study demonstrated that gene duplication can also impart fragility, not only robustness over mutations (Diss et al., 2017), suggesting that even duplicated regions remain strong candidates to harbor deleterious mutations. Furthermore, deleterious mutations should be kept at low frequency by natural selection (Kimura, 1983;Whitlock, Griswold, & Peters, 2003). The average proportion of minor alleles at potential deleterious SNPs with a frequency lower than 5% was more important than the proportion of nonsynonymous/nondeleterious and synonymous polymorphisms for both stocked and unstocked populations. These observations are consistent with the action of an effective purifying selection (Fay et al., 2001). Nevertheless, those rare predicted deleterious alleles frequencies varied among populations and some of them segregate at high frequency in a few populations, suggesting that the monitoring of markers harboring putative deleterious mutations has the potential to improve stocking practices, for instance for guiding the choice of sources and sinks for supplementation.

| Effect of stocking on deleterious mutations
It is interesting that the average relative abundance of putative deleterious alleles over the entire polymorphism found in a population is higher in unstocked than stocked populations. In stocked populations, the addition of new individuals within a population from another lake could indeed both diminish the effect of drift as well as the subsequent fixation of putative deleterious, whereas in unstocked populations, genetic drift remains the main driver allowing putative deleterious variants to increase in frequency as observed in natural Lake Trout populations (Perrier et al., 2017) and in small populations of another species (Benazzo et al., 2017). Nevertheless, the potential of the several harmful impacts of stocking must be kept in mind, including genetic homogenization, loss of genetic integrity, and the weakening of local adaptation (Halbisen & Wilson, 2009

| CON CLUS I ON S AND MANAG EMENT RECOMMENDATION
On the basis of a comprehensive empirical dataset, we presented the first study answering the timely question of how stocking can affect the accumulation of putative deleterious mutations and consequently addressed recommendations for Lake Trout management.
Our results suggest a slightly more effective purifying selection on deleterious variation in stocked than in unstocked populations.
Nevertheless, stocking may cause harmful effects on wild populations, particularly when source populations are adapted to different environments (i.e., hatcheries or remote populations inhabiting a distinct environment). In natural Lake Trout populations, lake size was positively associated with genetic diversity (Perrier et al., 2017), which points to the importance of avoiding small size lakes as stocking sources. In this study, an effect of stocking associated with the source lake Tremblant was observed on deleterious mutations variability. Across our sampling of lakes, Tremblant is a medium size lake that showed relatively high frequency/fixed deleterious mutations.
In addition, this lake appears to have difficulty purging its deleterious mutations, which overall suggests that fish supplementation from this source is not desirable and should be avoided. This result highlights the importance of assessing the genetic variability (including neutral and maladaptive alleles) before choosing a source for stocking and to avoid the spreading of deleterious mutations in the wild.
Moreover, choosing a stocking source from the same area with similar environmental conditions should be favored with the goal to minimize potential impacts on recipient population's local adaptation. However, local adaptation is not always related to geography.
For instance, ecotypic variation, such as piscivorous and planktivorous Lake Trout, could be intermingled across large geographic areas . Therefore, phenotypic traits such as diet regime and growth rate could also represent other important cues to be considered before choosing a population as a source of stocking in Lake Trout.

ACK N OWLED G EM ENTS
We thank all biologists and technicians of the Ministère des

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interests.

AUTH O R S CO NTR I B UTI O N
L.B. and C.P. conceived the study. C.P. carried out laboratory work.
A.-L.F. did bioinformatic analyses. A.-L.F. and M.L. carried out statistical analyses and wrote the manuscript. All co-authors critically revised and contributed to editing the manuscript and approved the final version to be published.

DATA A R C H I V I N G S TAT E M E N T
Individual read raw sequences have been submitted at the Sequence Read Archive (SRA) (Project Accession Number: SRP149671).