Genomic dynamics of brown trout populations released to a novel environment

Abstract Population translocations occur for a variety of reasons, from displacement due to climate change to human‐induced transfers. Such actions have adverse effects on genetic variation and understanding their microevolutionary consequences requires monitoring. Here, we return to an experimental release of brown trout (Salmo trutta) in order to monitor the genomic effects of population translocations. In 1979, fish from each of two genetically (F ST = 0.16) and ecologically separate populations were simultaneously released, at one point in time, to a lake system previously void of brown trout. Here, whole‐genome sequencing of pooled DNA (Pool‐seq) is used to characterize diversity within and divergence between the introduced populations and fish inhabiting two lakes downstream of the release sites, sampled 30 years later (c. 5 generations). Present results suggest that while extensive hybridization has occurred, the two introduced populations are unequally represented in the lakes downstream of the release sites. One population, which is ecologically resident in its original habitat, mainly contributes to the lake closest to the release site. The other population, migratory in its natal habitat, is genetically more represented in the lake further downstream. Genomic regions putatively under directional selection in the new habitat are identified, where allele frequencies in both established populations are more similar to the introduced population stemming from a resident population than the migratory one. Results suggest that the microevolutionary consequences of population translocations, for example, hybridization and adaptation, can be rapid and that Pool‐seq can be used as an initial tool to monitor genome‐wide effects.


Genomic dynamics of brown trout populations released to a novel environment
Sara Kurland, Nima Rafati, Nils Ryman, Linda Laikre Appendix S1 Detailed material and methods Page 2 Table S1 Mapping statistics Page 6 Table S2 Number of variants and windows used for diversity metrics Page 6 Table S3 Pooled heterozygosity score (HP) for each population pool Page 6 Table S4 Statistical tests for amount HP between pairwise pools Page 7 Table S5 Statistical tests for π between pairwise pools Page 7 Table S6 Statistical tests for Tajima's D between pairwise pools Page 7

Table S7
Distributions of SNP counts in the different delta allele frequency bins for different functional impacts.

Table S8
Functional categories of SNPs fixed between introduced populations Page 10 Table S9 Read depth for non-synonymous SNPs identified as candidates for adaptive divergence between introduced populations A and B, and candidates for novel selection in the new lake system Page 10

Table S10
Genes of adaptive importance for introduced populations Page 11-12 Table S11 Genes putatively shaped by relaxed selection in the new environments on chromosome 28 Page 13

Table S12
Genes putatively under directional selection in the new environments on chromosome 7 Page 14-15

Table S13
Glossary for some of the terminology used Page 17

Figure S1
Genome wide heterozygosity per pool (HP) in 5 kb windows. Page 18 Figure S2 Genome-wide nucleotide diversity (π) and Tajima's D (TD) Page 19 Figure S3 Boxplots of difference in allele frequency (ΔAF) between each of the introduced populations A and B and established populations LB and HV for putatively selective regions Page 20

Figure S4
Potentially adaptive differences between the two introduced populations A and B

Page 21
Appendix S1: Detailed material and methods

Study system
The introduced populations A and B exhibited significant allele frequency differences at multiple allozyme loci, particularly at AGP-2, where the population from Lake Kallsjön was close to fixation for one of the alleles at this locus, while in Lakes Fälpfjälltjärnarna two alleles occurred at about equal frequencies (Palm & Ryman 1999). Spawners from each population were chosen to be homozygous for different alleles at this locus, thereby enabling discrepancy of lineages after release to the same environment (Palm & Ryman 1999). Brown trout established rapidly in the lakes Bävervattnen area after the release and have been sampled continuously with tissues stored in -80 o C freezers. The first documented reproduction occurred only a few years after the release (Palm & Ryman 1999) and ten years later both lakes downstream of the release site harbored strong populations from which sampling was easy. During the period 2003-2007 several lakes below the release site were sampled and brown trout establishment confirmed in all of those lakes ( Figure 1; unpublished data).

Table.
Sampling information for the material used in this study (cf. Figure 1).

DNA extraction
Genomic DNA was extracted and eluted in 100 µl elution buffer. DNA fragmentation was assessed on 2% agarose gels stained with GelRed and absorbance at 260/280. High molecular weight DNA from each individual was quantified using fluorometry (Qubit; Thermo Scientific) and combined at equal concentrations for each population to achieve 3 μg pooled genomic DNA in a volume within the range of 65-120 μl.

Library construction and sequencing
PCR-free paired-end libraries had an average insert size of 350 bp, sequencing used read length 150 bp and was performed by NGI across 3 (introduced populations A and B) or 4 (established populations LB and HV) lanes per pool.

Mapping and variant calling
The quality of sequenced reads from each pool were assessed using FastQC v.0.11.5 (Leggett et al. 2013) and results from different pools jointly evaluated using MultiQC v. , only keeping sites found in both files. These were then controlled for mapping quality, number of high-quality bases, read positional bias, base quality bias, and mapping quality versus strand bias, before filtering for mapping quality 100. The final set (7.4x10 6 variants) were annotated in SnpEff v.5.0 (Cingolani et al. 2012).

Patterns of genomic variation and divergence
Nucleotide diversity (π; Tajima, 1983) and Tajima's D (TD; Tajima, 1989) were estimated using the 'variance-sliding.pl' script of PoPoolation v.2.2 (Kofler et al. 2011a) in 5 kilo base pair (kb) non-overlapping windows, with a minor allele count of 2 for a SNP to be called and applying the same depth thresholds as for the subsampling (20-150X). Windows were retained for subsequent analysis if at least 80% of the window was covered with data within these depth thresholds.
We primarily used the default FST (Nei 1973) but also computed the alternative option in PoPoolation2 which is the approach of Karlsson et al. (2007) since these two approaches have different merits (Saha et al. 2021, their Supporting Appendix S4). A minor allele count of 3 was applied for SNP calling. Windows were only retained if the fraction of the number of sites within a window covered with data exceeded 80%. Genetic relationships among populations were examined by creating a dendrogram in TreeMix (Pickrell & Pritchard 2012), which constructs a maximum likelihood phylogeny based on the genomic data and compares the covariance structure calculated for the estimated dendrogram to the observed covariance between populations. TreeMix was run with default settings and the results visualized in MEGA X (Kumar et al. 2018).

Genomic distribution of SNPs
SnpEff v.5.0 (Cingolani et al. 2012) was used to annotate the genomic distribution of variants and to classify them into functional elements (non-synonymous, synonymous, untranslated region (UTR), 5 kb upstream, 5 kb downstream, intragenic, and intergenic). Functional elements from less confidant annotations, e.g. missing start and stop codons from the transcript (Barrio et al. 2016), were omitted and some functional elements were combined in order to avoid too small classes at high ΔAF (e.g. 3' UTR and 5' UTR combined to functional class "UTR"). For each of these functional categories, the allele frequency differences between introduced populations were sorted into bins (10 equally large bins of ΔAF=0-0.1, ΔAF=0.1-0.2 etc.). Log2 fold change was retrieved by comparing the observed and expected number of SNPs per category and bin. The expected number of SNPs in each category and bin was estimated as p(category) x n(bin), where p(category) equals the proportion of a given SNP category in the full genome and n(bin) the number of SNPs in an allele frequency bin. Statistical significances of deviations of observed from expected SNP counts were tested with standard chi-square tests of independence.

Screening for indications of selection in the new environment
We screened for signatures of selection in the new environments by considering local reductions of heterozygosity score within each pool (HP) relative to its pool-based genome-wide average (cf. Rubin et al. 2010;Kardos et al. 2015;Kjaerner-Semb et al. 2016;. Normalized HP (ZHP) was used in this approach ZHP = (HP -µHP)/ ϭHP) as calculated per population pool (following Rubin et al. 2010).
In order to identify putatively selected genes in the new environment, we selected windows (5 kb) of particularly low ZHP out of the windows with ZHp <0 in descendant populations (below 5 th percentile of ZHP in LB and HV, respectively). Precedence was given to the windows with lowest levels of ZHP in both descendant populations and regions with multiple adjacent windows. Patterns of population divergence (FST) surrounding these windows was then studied to corroborate assumptions of selection. The same approach was given to windows of markedly low diversity within introduced populations. The functional annotation of SNPs within windows of low HP in introduced populations and established populations respectively were extracted from the VCF annotated in SnpEff v.5.0 (Cingolani et al. 2012). Table S1. BAM file statistics from QualiMap for Pool-seq data from 4 populations pools mapped to the S. trutta assembly (size 2.4 Gb), using bwa mem and default settings. Results are presented for properly paired-end reads with minimum base quality 20. Average coverage and its standard deviation (SD), and mean mapping quality is also included. Down-stream analyses were additionally limited to reads with read depth 20-150X.  Table S2. Summary of number of variants and windows (5 kb in size) supporting each diversity metric. The mpileup file used 1.8x10 9 variant loci of which only biallelic SNPs, mapped to chromosomes (no orphans), and within coverage 20-150X were used for estimating population genomic parameters. Minimum coverage of 80% (min. cov. 0.8) implies whether windows were retained for subsequent analysis if at least 80% of the window was covered with data within these depth thresholds (marked "Yes"), otherwise (marked "-") all windows were used. Allele frequency was only estimated per variant site and not within windows.         Table S10. Genes putatively of adaptive importance for introduced populations found by considering windows (5 kb) of high differentiation between the two introduced populations simultaneously showing low diversity within each introduced population (FST above 95 th percentile and diversity within each introduced population below 5 th percentile of π). Gene model predictions are given for nonsynonymous SNPs found within these windows that exhibit significant allele frequency difference between introduced populations, tested using Fisher's exact test in Popoolation2 (significance threshold p<2.5x10 -11 , corresponding to α=0.05 corrected for multiple testing across a genome size of 2 Gb). SNP coordinate and the frequency of the major allele in each of the four populations is given. Predicted gene models (trout gene) are described as found in the brown trout genome. Gene model predictions (gene) and the general description of the biological function (function) found in the nearest related species (species) is also given for each gene model.  Table S11. Genes putatively shaped by relaxed selection in the new environments on chromosome 28. Gene model predictions are given for non-synonymous SNPs found within c. 1 Mb region flanking a window (5 kb) with reduced heterozygosity (HP) in introduced populations compared to established populations (ZHP>0 in both descendant populations and ZHP<-4 in both introduced populations). The coordinate is given for each SNP along with the frequency of the major allele in each of the four populations. Predicted gene models (gene) are described as found in brown trout, three of which overlap (LOC115165612, LOC115165613, and LOC115165615). Gene name (gene) and the general description of the biological function (function) found in the nearest related species (species) is also given for each gene model.  Table S12. Genes putatively under directional selection in the new environments found on chromosome 7. Gene model predictions are given for non-synonymous SNPs found within regions of reduced variation on chromosome 7 in established populations compared to introduced populations (ZHP<-2 in both established populations and ZHP>0 in both introduced populations; see Appendix 2 for details). The coordinate is given for each SNP along with the frequency of the major allele in each of the four populations. Predicted gene models (gene) are described as found in brown trout. Gene name (gene) and the general description of the biological function (function) found in the nearest related species (species) is also given for each gene model.  Table S13. Glossary for some of the terminology used.

Concept Description AF
Allele frequency within a population pool of the globally major allele per variant site (SNP), i.e., the allele that is most common across all four population pools analyzed in the present study.

ΔAF
Pairwise difference in allele frequency of major allele between population pools  . Distribution of genome wide nucleotide diversity (π) and Tajima's D (TD) within population pools. Tables include the population mean for each measure, their 95% confidence intervals (CI), and range. π and TD were estimated across 331,179 windows corresponding to 5,740,787 variants. Window size = 5,000 bp (5 kb) and fraction depth covered ≥ 0.8 (i.e. a window was only retained if at least 80% of its SNPs had a read depth between 20X and 150X).   (5 kb) showing low diversity within each of introduced population A and B (diversity within each introduced population below 5 th percentile of π; blue and yellow circles for introduced population A and B, respectively) and of marked divergence between introduced populations (FST above 95 th percentile, red circle), and their overlap. Candidates for adaptive divergence are defined as windows exhibiting marked differentiation between introduced populations with simultaneously low diversity within each introduced population; 403 such windows are found and genes predicted for non-synonymous SNPs within these windows are listed (Table  S8). (B) Diversity and divergence surrounding candidate for adaptive divergence; a c. 2 M base pair (Mb) region on chromosome 2 containing a swarm of fixed SNPs FST between introduced populations A and B. FST between released introduced population A and B and established populations LB and HV, and π and TD for each population is included. Arrows refer to gene models predicted for non-synonymous SNPs within these regions (Table S8). Windows where differentiation between the released introduced populations is above the 95 th percentile of FST and diversity within each introduced population below the 5 th percentile of π are marked in black. FST and π were estimated within 5 kb windows.