Neutral and adaptive drivers of genomic change in introduced brook trout (Salvelinus fontinalis) populations revealed by pooled sequencing

Abstract Understanding the drivers of successful species invasions is important for conserving native biodiversity and for mitigating the economic impacts of introduced species. However, whole‐genome resolution investigations of the underlying contributions of neutral and adaptive genetic variation in successful introductions are rare. Increased propagule pressure should result in greater neutral genetic variation, while environmental differences should elicit selective pressures on introduced populations, leading to adaptive differentiation. We investigated neutral and adaptive variation among nine introduced brook trout (Salvelinus fontinalis) populations using whole‐genome pooled sequencing. The populations inhabit isolated alpine lakes in western Canada and descend from a common source, with an average of ~19 (range of 7–41) generations since introduction. We found some evidence of bottlenecks without recovery, no strong evidence of purifying selection, and little support that varying propagule pressure or differences in local environments shaped observed neutral genetic variation differences. Putative adaptive loci analysis revealed nonconvergent patterns of adaptive differentiation among lakes with minimal putatively adaptive loci (0.001%–0.15%) that did not correspond with tested environmental variables. Our results suggest that (i) introduction success is not always strongly influenced by genetic load; (ii) observed differentiation among introduced populations can be idiosyncratic, population‐specific, or stochastic; and (iii) conservatively, in some introduced species, colonization barriers may be overcome by support through one aspect of propagule pressure or benign environmental conditions.


| INTRODUC TI ON
Accidental and intentional human-driven introductions of nonnative species are ubiquitous, yet the drivers of successful colonization are rarely known (Hayes & Barry, 2008;Lee, 2002). Invasive species often experience fitness advantages by having fewer predators and inhabiting generalist niches, which can offset inbreeding depression while also facilitating adaptation and plasticity to novel, variable environments (Colautti et al., 2017). Determining consistent predictors of colonization success in association with environmental and genetic factors is particularly challenging, albeit imperative for mitigating species invasions and for improving reintroduction strategies of endangered species (Lee, 2002;Louback-Franco et al., 2020;Sakai et al., 2001;Sauers & Sadd, 2019).
Although successful species introductions often occur into habitats with familiar environmental conditions (Hayes & Barry, 2008;Moyle & Marchetti, 2006), species may also colonize novel environments when adequate propagule pressure (i.e., number of introduction events and number of individuals introduced) ensures that sufficient genetic variation is available for survival and adaptation (Arismendi et al., 2014;Facon et al., 2006;Duncan, 2011). For example, genetic diversity should be increased through propagule pressure by introducing more individuals and/ or by carrying out multiple introductions (Hamilton et al., 2015;Via & Lande, 1985). Introduced species can also become locally adapted to both abiotic and biotic environmental factors (Carroll et al., 2001;Filchak et al., 2000), with a population-and contextspecific nature (Briscoe Runquist et al., 2020;Coulson et al., 2017;Schindler & Parker, 2002). Therefore, successful introductions are thought to be dependent on genetic factors associated with both sufficient propagule pressure and adaptive response to the introduced environment (Allendorf & Lundquist, 2003;Prentis et al., 2008). However, studies with the genomic resolution needed to clarify the relative influence of both processes are rare (Dlugosch & Parker, 2008;Narum et al., 2017;Yoshida et al., 2016).
Genomic approaches can improve understanding of how propagule pressure and environmental factors affect genetic diversity during introductions into novel environments (Frachon et al., 2019;Micheletti & Narum, 2018;Narum et al., 2017). Most evolutionary changes at the molecular level, and most of the variation within species, are neutrally evolving due to random genetic drift, gene flow, and bottlenecks (Narum et al., 2017). Conversely, when a genetic variant is putatively adaptive, molecular divergence at loci underlying adaptive traits is maintained by selection, commonly inferred by comparing neutral and adaptive genetic diversity among populations (Dennenmoser et al., 2017;Hamilton et al., 2015;Hecht et al., 2015). Emerging methods such as whole-genome pooled sequencing (pool-seq) can capture such genomic variation by sequencing pooled groups of individual DNA samples from the same populations together to characterize single nucleotide polymorphisms (SNPs) throughout the genome (Schlötterer et al., 2014). Pool-seq is particularly useful to quantify genome-wide neutral and adaptive variation of introduced species in a cost-effective manner (Davey et al., 2011;Kurland et al., 2019;Narum et al., 2013;Stanford, 2019).
Socio-economically important salmonid fishes are among the world's most invasive species and ideal for examining factors influencing colonization success (Krueger & May, 1991;Lecomte et al., 2013;Vigliano et al., 2007). Salmonid invasions resulted from transplants worldwide in over 97 countries for sport-fishing or through aquaculture escapees into the wild (Fausch, 2007). Hatchery stock management influences the level of genetic diversity in introduced salmonid populations, as stocks with high genetic variability are considered important for introduction success and for avoiding founder effects or bottlenecks (Bert et al., 2007;Kelly et al., 2006). Because environmental factors such as spawning area availability and habitat (stream/lake) size can regulate salmonid abundance (Krueger & May, 1991) and genetic diversity (Bernos & Fraser, 2016;Neville et al., 2009;Rieman & Allendorf, 2001), they may also create conditions for introduced salmonid genotypes under selection to confer fitness advantages and thereby potentially influence colonization success (Benjamin et al., 2007;Hecht et al., 2015;Kinnison et al., 2008).
Here, we examine population genomic structuring of nine populations of brook trout (Salvelinus fontinalis) established through extra-limital introductions using SNPs generated from a pool-seq approach, with the aim to quantify the neutral and adaptive genetic variation associated with successful introduction. To enhance sport-fishing opportunities, 100,000's of brook trout were stocked between 1941 and 1973 into mountain lakes in several national Parks in the Rocky Mountains of Canada ( Figure 1; Table 1), by hatcheries using a hatchery strain originating from the eastern USA (National Parks stocking records). To restore native aquatic F I G U R E 1 Map of nine sampled lakes for brook trout in their introduced range of Alberta (AB) and British Columbia (BC), Canada TA B L E 1 Summarized environmental information for the nine study lakes (2017)(2018), along with Parks Canada stocking information for brook trout   ecosystems, Parks Canada has initiated the manual removal of brook trout in several lakes (Pacas & Taylor, 2015). These lakes represent novel environments for brook trout relative to their eastern North American range: high elevations (1,185-2,400 m), covered by ice for 7-9 months of the year (native range 4-9), high pH (7.73-8.45), and variation in spawning site availability due to snowfall runoff among populations (Fassnacht et al., 2018;Power, 1980;Wood et al., 2014;Yates et al., 2019;  (iii) postcolonization bottleneck events would be associated with lower neutral genetic variation and lead to a lower proportion of deleterious mutations (Hedrick & Garcia-Dorado, 2016); and (iv) an increase in putatively adaptive loci would be associated with broad environmental differences among lakes; while signals of adaptive genetic variation would be positively correlated to lake volume and to the relative availability of spawning sites.

| Study site
Within Banff, Kootenay, and Yoho national parks, Canada, nine lakes were selected for their physical isolation from other lakes, small size, limited inlet/outlet expanse, and brook trout dominance ( Figure 1).
According to available records (Donald & Alger, 1984

| Sampling methods
Sampling of brook trout was conducted in August 2017, using a standardized, mixed-mesh gill net protocol until 5%-10% of fish in each lake were captured; captured fish were euthanized with an overdose of clove oil following CCAC and Parks Canada-approved procedures. Caudal fin tissue was collected from each fish and stored in 95% ethanol for DNA extraction, while sex was determined by abdominal dissection. Methodology associated with capture method (i.e., net mesh sizes/lengths, set durations) and census population size estimates obtained in concurrent research and used in analyses below are described in Yates et al. (2021).
To examine whether environmental variables were positively correlated with neutral genetic diversity and adaptive differentiation, 14 abiotic and biotic environmental variables were quantified in each lake between May 2017 and August 2018 (Table 1; Appendix S2). Variables were chosen for their relationship to regulating abundance in salmonids such as habitat size, spawning availability, resource competition, and survivability (i.e., Temperature, pH); winter measurements were not taken due to accessibility (Krueger & May, 1991;National Research Council, 2004

| DNA extraction, pooling, and sequencing
DNA extractions from fin tissue were conducted using Qiagen blood and tissue kits (Qiagen, Germany) and the manufacturer protocol.
To ensure equal quantities of DNA in the pooled samples, DNA quality and quantity were initially assessed by 1% Agarose gel electrophoresis using HindIII digested Lambda DNA run at 100V to assess possible DNA degradation. Multiple quality tests per individual were conducted on a Qubit Fluorometer 2.0 (Invitrogen, USA) selecting for quantity >20 ng/µl and confirmed in a NanoDrop spectrophotometer (Thermo Scientific, USA) as well as estimates of 260/230 and 260/280 ratios greater than >1.8 quality.
Individual DNA was then pooled by sex, and population (

| Neutral genetic diversity and differentiation
Estimates of nucleotide diversity within each pool were established using PoPoolation 2 (Kofler et al., 2011) with files provided by the PPalign module. We ran PoPoolation 2 with a minimum count of four for the minor allele and minimum coverage of 20 so as not to lose SNPs because a minimum must be met across all pools. We also used a maximum coverage of 100 to remove potentially paralogous regions (Li, 2011). Lastly, we used "sanger" fastq-type for Phred64, a window and step size of 250, and a pool size represented by 2× the individuals, which is suggested for diploid species. Pairwise estimates of genetic differentiation (using F ST ) between the 18 pools were determined with Poolfstat v 1.1.1 (Hivert et al., 2018) with a minimum read count of two, a minimum coverage per pool of 20, and a maximum coverage per pool of 100, while using the same minor allele frequency as the original file of 0.05 and removing indels, as commonly adopted (Kofler, Orozco-terWengel, et al., 2011;Micheletti & Narum, 2018).

| Relationships between neutral genetic diversity and propagule pressure or environmental variables
For regressions between nucleotide diversity and environmental or stocking variables, a correlation matrix for scaled (with the scale function in R) stocking and environmental variables was firstly run to remove potentially correlated variables at a cutoff of 0.7 with the psych R package v 1.9.12 (Revelle, 2019 additive and interaction terms were not calculated as we lacked sufficient power with 9 pools per sex (Table S1). Sexes were run separately to avoid pseudo-independence, as they had identical variable data but different dependent variables (nucleotide diversity). Model visualizations were conducted with ggplot2 v 3.2.1 (Wickham, 2016). All analyses were completed in R.

| Deleterious mutations within populations
Before identifying putatively deleterious mutations within the study populations, all variants were annotated using SnpEff v 4.3 t (Cingolani et al., 2012). At the beginning of annotation steps, a genome database was built based on both gff3 and fasta files obtained from NCBI (Agarwala et al., 2018). The deleterious variants were sorted by the following three categories, named for their putative impact: "high," a variant with a significant deleterious impact on the coding region (e.g., start codon lost, stop codon gained, frameshift variant); "moderate," a non-disruptive variant that may affect efficacy (e.g., missense variant, splice region variant); and "low," an in- We also used lake-specific, CMH chi-squared tests to examine using Benjamini and Hochberg correction. Identified genes were run through SnpEff to annotate the VCF file from the reference charr genome, and then through BLAST (Altschul et al., 1990) and QuickGO (Binns et al., 2009), to determine gene ontology and function by taking the first available annotated gene product for each BLAST definition. After annotation, SNPs on the scaffold of the vcf file were removed, leaving only biallelic SNPs aligned to linkage groups of the charr reference genome.
PCAdapt was run using the allele frequencies of the 18 pools with a Bonferroni p-value adjustment and false discovery rate (FDR) bounded at .05. Applying a Bayesian framework, PCAdapt determines population structure using K z-scores to fit SNPs to K principal components based on Cattell's rule (Cattell, 1966), where SNP-specific Mahalanobis distances are used to evaluate putatively adaptive loci from the normal z-score distribution, explained by the K factors. We chose PCAdapt because of its ability to run pool-seq data and its strength in examining a divergence model with hierarchical population structure, while maintaining statistical power and a controlled FDR (Luu et al., 2017). Furthermore, PCAdapt has been shown to have consistent strength for detecting putatively adaptive loci under weak, moderate, and strong selection (Lotterhos & Whitlock, 2015).
A redundancy analysis (RDA) was used to determine the driving habitat and environmental factors (n = 14; Table 1) of putative loci under selection. The RDA, a form of multivariate genotypeenvironment association, was conducted using the package vegan v 2.5-6 (Oksanen et al., 2019) on pooled allele frequency data from putatively adaptive loci attained from PCAdapt, as we were unable to run the RDA on a larger dataset due to computational restrictions. This RDA was also based on habitat and environmental predictors scaled with the scale function in R and filtered to a cutoff of 0.7 with psych package v 1.9.12 to avoid multicollinearity.
Significance was computed using F-statistics for each constrained axis (Legendre et al., 2010) to examine whether specific habitat and environmental predictors explained PCAdapt-based, putatively adaptive loci.

| DNA sequencing
For the 18 pools (i.e., 9 populations with 2 pools per population (one for each sex)), raw sequence counts totaled 10,775,432,330 reads, of which unique reads averaged 431,983,622 in each pool (72.2%) with quality scores of 36 ( Figure S1). All pools passed base quality scores, and per sequence GC content was normally distributed around a mean of 44%. Mean filtered depth of coverage among the 18 pools was 15.4X with a standard deviation of 1.7 ( Figure S2).
There were 68,836,296 total SNPs called, of which 10,721,236 were removed via quality and mapping parameters, 25,898,239 SNPs were removed due to global minor allele frequency restrictions, and 3,726,203 were removed as indels. Therefore, 28,490,618 SNPs were retained for downstream analyses, covering 70.9% (1,539,082,007 bp) of the reference genome ( Figure S3). The proportion of each chromosome covered across all libraries, apart from scaffolds, averaged 77.2% ( Figure S4). These SNPs were further filtered for analyses to 362,493 common SNP variants among the 18 pools.

| Neutral genetic diversity and differentiation
Levels of nucleotide diversity did not differ significantly among pools; in fact, the highest levels of nucleotide diversity were seen in lakes with very different propagule pressure (Table 1). Neutral genetic differentiation among populations was negligible: F ST was effectively zero, averaging −0.020 (−0.062 to −0.002; Table 2).

| Relationships between neutral genetic diversity and propagule pressure or environmental variables
Significant beta regressions were negative relationships between nucleotide diversity and lake volume in female populations, and the number of tributaries in male populations (Table S1). With Margaret and Cobb removed (as outliers), the negative trends of lake volume and the number of tributaries remained weakly negative, yet not significant, in females and males, respectively ( Figures S5 and   S6). Contrary to our hypotheses, populations with larger lake volume (habitat size) had lower nucleotide diversity, while the remaining tested stocking and environmental variables were not drivers of nucleotide diversity in these data (Figure 2). Log transforming the stocking variables associated with propagule pressure did not change the results.

| Deleterious mutations within populations
Overall, the results only partially supported our prediction that bottleneck events and a lesser proportion of deleterious mutations would lead to lower neutral genetic variation.
Putatively highly deleterious mutations (297) had lower allele frequencies than moderate (9,015) and low-impact (11,030) deleterious mutations, consistent with a role for purifying selection in these populations ( Figure 3). T-tests of the mean number of deleterious alleles of all levels (high, moderate, and low) across populations were similar between most populations; however, some populations were significantly different from others (Table S2).
Population-specific analyses with CMH tests estimated a total of 286 candidate loci that differentiated in allele frequency across all pairwise population comparisons and from 0 to 17 loci per individual pairwise comparison (n = 9, scaffold removed; Table 3). Of the 286 loci, 23 either had no exon associated, no results, or an uncharacterized locus. Only four of 286 loci appeared in more than one population; their functions and all population-specific comparisons are found in Table S3, while no loci with allele frequency differences were associated with putative local adaptation.

| Adaptive genetic differentiation
After filtering, PCAdapt identified 2,768 putatively adaptive loci Re-analysis of the RDA running VIF analysis <10 to avoid overfitting the model remained non-significant (adjusted R 2 = 1.8%, p = .11, with 999 permutations), when removing upstream and downstream water discharge, number of tributaries, surface area, depth, temperature variance, and number of discernable spawning sites.

| DISCUSS ION
Species introduction success to novel environments is thought to be directly linked to the product of propagule pressure and genomic variation as it facilitates adaptation (Dlugosch et al., 2015;Lee, 2002;Sakai et al., 2001), but empirical investigations using whole-genome resolution are rare (Dennenmoser et al., 2017;Yoshida et al., 2016).
Our pool-seq study on isolated brook trout populations introduced from a common source into alpine lakes on average ~19 generations ago suggests that wide variation (20-fold differences) in propagule pressure does not result in proportional standing genetic variation among introduced populations. Nor did we find that wide habitat variation in introduced environments leads to proportional variation in neutral or adaptive variation. First, we found little support for a role of abiotic and biotic environmental variables in affecting neutral genetic diversity among populations despite over 100-fold variability in these variables (Table 1). Second, when examining putative adaptive loci, there were very low levels of largely populationspecific adaptive differentiation that were seemingly independent from the environmental variables measured (Bolnick et al., 2018).
As founder effects are considered impediments to introduction success, adequate propagule pressure and robust source populations may be adequate to support colonization through increased TA B L E 2 Neutral genetic differentiation between introduced brook trout populations F ST , based on male pools ("M") and female ("F") pools (bottom); male-female pool comparisons within each lake are denoted with asterisks

F I G U R E 2
Regressions analyses of nucleotide diversity against tested noncollinear variables. Trends associated with lake volume in female-based (F) pools and number of tributaries in male-based pools (M) were statistically significant (trend lines) neutral genetic diversity (Allendorf & Lundquist, 2003;Ellstrand & Elam, 1993;Lavergne & Molofsky, 2007). Despite a range of propagule pressure (one introduction event and as little as 2,500 individuals to 22 such events and up to 55,500 individuals), the introduction effort was enough to result in successful colonization of all study populations and might also help to explain the similar levels of standing genetic variation observed among populations.
However, we acknowledge that our study only contains successful colonizations and that unavailable hatchery broodstock genetic diversity metrics may be responsible for any genetic bottlenecks during introduction.
Contrary to previous work across different species and taxa (Bernos & Fraser, 2016;Bert et al., 2007;Briscoe Runquist et al., 2020;Lachmuth et al., 2010;Narum et al., 2017;Schindler & Parker, 2002), we did not definitively find that neutral genetic diversity was positively associated with propagule pressure, habitat size (lake volume), or biotic factors such as prey availability. Furthermore, our study's populations were closed to gene flow effects and immigration from non-sampled introduced populations, or from the hatchery source. Instead, nucleotide diversity was negatively associated with lake volume in females, and the number of tributaries in males, while no relationships were detected for the other tested variables. The negative trends associated with lake volume were driven largely by Margaret, as it is both the largest lake and the population with the most nucleotide diversity in female populations. We suspect this negative relationship is driven primarily by weak propagule pressure compared to other lakes (Table 1); when Margaret was removed, the relationship was insignificant, but remained weakly negative ( Figure   S5). Similarly, the negative association between nucleotide diversity and the number of tributaries for male populations is driven by the lack of visible tributaries around Cobb Lake coupled with the most nucleotide diversity in male populations, and when Cobb was removed, the relationship was insignificant, but remained weakly negative ( Figure S6).
There was little indication that low proportions of deleterious mutations were associated with low levels of neutral genetic diversity among populations. Instead, most populations had similar levels of deleterious mutations, despite varying propagule pressures.
Highly deleterious alleles were significantly less common across populations than moderate or low deleterious mutations, suggesting that purifying selection may be present and purging deleterious mutations. Weak population structure may be driven by the relatively short duration for population differentiation since introduction and a potential lack of major founder effects, or the greater coverage afforded by the pool-seq methodology, as individual genotyping has a greater association with ascertainment bias and selection of highly polymorphic SNPs (Gaughran et al., 2018;Kurland et al., 2019;Malomane et al., 2018).
Plasticity-mediated population persistence is predicted to buffer against adaptive evolution in new environments (Morris et al., 2014), while the direction of plasticity is generally opposite to the direction of adaptive evolution (Ghalambor et al., 2015). These evolutionary processes may be influencing salmonid colonizations in alpine environments without requiring a process of adaptive differentiation.
Fourth, as the study lakes all occur in a similar geological area and alpine environments, the environmental contrasts between them might still be too similar for the environment to maintain adaptive divergence or, alternatively, other adaptive processes are at play (e.g., stabilizing selection). Finally, nuances in population histories from stocking events and subsequent establishment may generate population-specific idiosyncrasies in neutral and adaptive diversity, consistent with predictions for the consequences of phenotypic plasticity to novel environments.

| CON CLUS IONS
Our genome-wide analysis using pool-seq facilitated greater resolution for examining the roles of genetic and environmental factors in colonization of introduced species. Understanding the underlying factors that contribute to successful species colonization is crucial for applications in conservation, mitigating effects on endangered species, and population maintenance (Adams et al., 2000;Higgins & Zanden, 2010;Lodge, 1993). In our study, wide ranges in both environmental and propagule pressure did not lead to significant genetic variation among populations. Moreover, population differentiation and signals of local adaptation were not stronger in conditions expected to promote them. Our work suggests that propagule pressure and environmental predictors of neutral genetic diversity are not mutually exclusive and should be considered together, as, in this study, support through one aspect of these variables may have been responsible for colonization success despite events reducing genetic variation. Our work adds to a growing literature supporting proactive/preventative approaches to invasive species management rather than reactive approaches, as even weak propagule pressure of an effective colonizer can quickly lead to uncontrolled invasion (Kratzer et al., 2020;Leung et al., 2002;Rockwell-Postel et al., 2020). To better understand the neutral and adaptive differentiation of introduced species, we encourage future analyses to use wholegenome approaches across a greater range of sample sites that include populations with a large range of times since introduction to accumulate adaptive differentiation.

CO N FLI C T O F I NTE R E S T
We declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data are available in Dryad (https://doi.org/10.5061/dryad.np5hq bzvd).