Northern European Salmo trutta (L.) populations are genetically divergent across geographical regions and environmental gradients

Abstract The salmonid fish Brown trout is iconic as a model for the application of conservation genetics to understand and manage local interspecific variation. However, there is still scant information about relationships between local and large‐scale population structure, and to what extent geographical and environmental variables are associated with barriers to gene flow. We used information from 3,782 mapped SNPs developed for the present study and conducted outlier tests and gene–environment association (GEA) analyses in order to examine drivers of population structure. Analyses comprised >2,600 fish from 72 riverine populations spanning a central part of the species' distribution in northern Europe. We report hitherto unidentified genetic breaks in population structure, indicating strong barriers to gene flow. GEA loci were widely spread across genomic regions and showed correlations with climatic, abiotic and geographical parameters. In some cases, individual loci showed consistent GEA across the geographical regions Britain, Europe and Scandinavia. In other cases, correlations were observed only within a sub‐set of regions, suggesting that locus‐specific variation was associated with local processes. A paired‐population sampling design allowed us to evaluate sampling effects on detection of outlier loci and GEA. Two widely applied methods for outlier detection (pcadapt and bayescan) showed low overlap in loci identified as statistical outliers across sub‐sets of data. Two GEA analytical approaches (LFMM and RDA) showed good correspondence concerning loci associated with specific variables, but LFMM identified five times more statistically significant associations than RDA. Our results emphasize the importance of carefully considering the statistical methods applied for the hypotheses being tested in outlier analysis. Sampling design may have lower impact on results if the objective is to identify GEA loci and their population distribution. Our study provides new insights into trout populations, and results have direct management implications in serving as a tool for identification of conservation units.


| INTRODUC TI ON
Improving our understanding of the genetic basis of local adaptation is a main aim in evolutionary biology and is also of significance in applied research because of its relevance to the conservation of genetic resources, management of exploited populations and for predicting impacts of climate change (Allendorf, Hohenlohe, & Luikart, 2010;Lehnert et al., 2019). Traits that confer local adaptation are typically polygenic quantitative traits, and identification of the loci that determine variation in such traits is usually challenging (Savolainen, Lascoux, & Merilä, 2015). Testing hypotheses of local selective sweeps and their association with environmental drivers by means of scanning genomic profiles across diverged populations provides novel insights but has also been criticized, as a range of factors may obscure or lead to false-positive inference about adaptive processes and the ecological mechanisms that structure populations (Ahrens et al., 2018). Demographic processes may for instance obscure inference about selection and the role of the environment in driving spatial patterns of adaptation (de Villemereuil, Frichot, Bazin, Francois, & Gaggiotti, 2014). There is a general call for evaluation of statistical methods (Vatsiou, Bazin, & Gaggiotti, 2016), particularly for populations connected by gene flow (Bradburd, Coop, & Ralph, 2018;Luu, Bazin, & Blum, 2017). It has been suggested that in analyses of gene-environment associations (GEA), sampling multiple populations exposed to similar environmental conditions is a means to increase detection power of true positives, especially for associations with weakly selected loci (Lotterhos & Whitlock, 2015). However, studies applying such sampling design are still rare (Roschanski et al., 2016) and tend to be restricted to local geographical scales (Ahrens et al., 2018).
Brown trout, Salmo trutta (in its anadromous form known as sea trout), is an ecologically and socioeconomically important salmonid fish species that allows for testing sampling effects on detection of local selective sweeps. Owing to its extreme ecological adaptability, it shows a widespread distribution throughout freshwater systems in most north-east Atlantic and western Asian regions (Klemetsen et al., 2003). The species is considered as an indicator of habitat quality in its native range (Imhof, Fitzgibbon, & Annable, 1994), and there are concerns about its conservation status under a range of anthropogenic stressors and climate change (Ayllón et al., 2016). Since the first paper by Allendorf, Ryman, Stennek, and Stahl (1976), four decades of genetic marker-based studies have identified genetically differentiated local populations of brown trout, sometimes even at small (<1 km) spatial scales (Ferguson, 1989; see Andersson et al., 2017 for a recent example). Although sea trout is renowned for natal homing, straying between rivers maintains gene flow and reduces the impact of genetic drift on population demographics (Hansen, 2002;Hansen, Fraser, Meier, & Mensberg, 2009). A suite of studies has suggested that local adaptation plays a role in population structuring and dynamics (see Meier, Hansen, Bekkevold, Skaala, & Mensberg, 2011 and references herein). As a poikilotherm, temperature directly affects the rate of biological processes and trout is expected to display evolutionary adaptations to reach homoeostasis.
Coupled with an anadromous life cycle, which for the iteroparous brown trout may entail several repeated movements between fresh, brackish and marine waters, trout is required to both adapt to local conditions while still retaining the capability for coping with strongly varying environments. Altogether, these characteristics render the species an optimal model for testing ecological and evolutionary parameters, including local adaptation and its association with specific environments (Jensen et al., 2008).
In spite of brown trout being among the best studied fish species (Klemetsen et al., 2003), genomic resources have until recently been scarce compared to other salmonids, such as Atlantic salmon (S. salar L.) and rainbow trout (Oncorhynchus mykiss W.), which has hampered the study of GEA. Genomic resources for brown trout are developing rapidly. Nonetheless, genome-wide SNP analyses have hitherto not been applied to examine broadscale population genetic relationships and associations between genomic variation and evolutionary drivers across spatial scales. Thus, to date there has been no assessment of genome-wide population structure beyond geographically restricted populations, limiting our understanding of the processes determining large-scale population connectivity. Evidence for local adaptation is commonly based on comparisons of populations at local scales Lemopoulos, Uusi-Heikkilä, Huusko, Vasemägi, & Vainikki, 2018;Meier et al., 2011), while assessments rarely concomitantly address small-and large-scale patterns. Lack of knowledge about ecological drivers of population processes is especially problematic given that many trout populations are considered under threat due to disturbances acting on both large scale, for example climate change (Jacquin et al., 2017;Pujolar, Vincenzi, Zane, & Crivelli, 2016;Vera, Martinez, & Bouza, 2018), and local scale. For example, anthropogenic habitat destruction, creation of impassable dams preventing gene flow (Hansen, Limborg, Ferchaud, & Pujolar, 2014) and genetic introgression from widespread stocking with nonnative strains (Gil, Labonne, & Caudron, 2016;Hansen et al., 2009) is expected to affect a wide number of populations.
In order to investigate population differentiation and its potential environmental drivers in brown trout using genome-wide analysis, we here developed and applied a SNP array encompassing ca. 3.8K mapped SNPs. In GEA analyses, populations showing hierarchically structured levels of gene flow (as in brown trout) can lead to obscured or false-positive inference about adaptive processes and the ecological drivers of diversification (Ahrens et al., 2018;Bradburd et al., 2018;Forester, Lasky, Wagner, & Urban, 2018;Luu et al., 2017). Following Lotterhos and Whitlock (2015), we therefore applied a paired-population sampling approach in order to analyse 72 S. trutta populations, spanning a central part of K E Y W O R D S brown trout, genotype-environment association, local adaptation, outlier test, salmonid the distribution of anadromous populations in Europe in order to first, describe regional scale population structure, and second, determine whether population divergence was associated with GEA and selective sweeps. Concomitantly, our study was designed to evaluate effects of sampling design and analytical approach on detection of outlier loci and GEA in a hierarchical population scenario.
The aim was to contribute new insights on environmental selection pressures in general and in anadromous fish species in particular, and at the same time add to the knowledge of population structure of S. trutta, a key indicator species for the health and conservation of rivers and streams.

| Population samples
A SNP development panel was built using genomic DNA from 2 to 3 fish from each of seven ascertainment populations (Table 1).
Populations were spread out geographically so as to span the region from the British Isles in west to the Baltic Sea in east (>1,500 km), and from Norway in north to the Wadden Sea in the southern North Sea (>1,000 km). All ascertainment samples represented the species' Atlantic clade (Bernatchez, 2001) and covered the geographical area represented by population collections in the present study.
To describe population genomic patterns, trout were collected from 74 spawning locations in 72 rivers (on average 36 fish collected per river; Table 1) draining into the North Sea, Skagerrak, Kattegat and the Western Baltic Sea (Figure 1). In analyses of GEA, sampling multiple populations exposed to similar environmental conditions represents a means to increase detection power of true positives, especially for associations including weakly selected loci (Lotterhos & Whitlock, 2015). We therefore aimed to sample a minimum of two rivers from each geographical sub-area. Sub-areas were here defined as geographically proximate river systems expected to share environmental drivers. When individual rivers show some degree of demographic isolation, as is the assumption in a natal-homing anadromous fish, the paired-sampling approach thus represents a way to cross-validate identified GEA loci. Sampling multiple rivers per subarea was not possible in all cases. Data therefore included four collections each representing only a single sub-region. Eight locations were sampled twice, 5-14 years apart, to examine temporal stability.
Collections consisted of adipose fin clips from electrofished, anaesthetized adults caught on spawning sites, or of 0-1 year juveniles, depending on availability (Table 1; Table S1). The River Ätran was represented by two collections, one from the main stem and one from its tributary Högvadsån. For the River Weser, collections of 5-9 fish from each of three neighbouring tributaries were combined. Sampling was aimed at natural populations with gene pools presumed to be relatively weakly affected by human-mediated introgression from farmed or foreign populations. However, several of the Danish populations were previously stocked with two closely related hatchery strains, leading to admixture and introgression (Hansen et al., 2009). Samples from the stocked hatchery strains were therefore also included in analyses in order to identify potential impact of introgression (Table 1). To compare regional structuring with patterns at broader geographical scales, collections also included three geographically remote populations: the Estonian Vainupea River (draining into the Gulf of Finland in the eastern Baltic Sea), River Tamar in Cornwall, UK (draining into the western English Channel), and from les Usses River, draining into the River Rhone in the Haute-Savoie in southern France. The latter representing the species' Mediterranean clade (Bernatchez, 2001). DNA from all samples was extracted from adipose fin clips using a commercial kit (E.Z.N.A.™ kit; Omega BioTek).

| SNP array development, genotyping and linkage analysis
A custom Illumina iSelect SNP array (with 6,000 SNPs) was developed by aligning reads in 16 male and female ascertainment samples against a draft genome assembly. Nextera sequencing libraries were prepared from genomic DNA and sequenced using a Illumina HiSeq 2000 to generate paired-end reads (2 × 100 bp); between 5 and 13 Gb of sequence was generated for each individual (average 6.9 Gb, see Table S2). After filtering to remove adapter sequence, low-quality sequence and any reads less than 60 bp, SOAPdenovo (Li et al., 2010) was used to generate a de novo assembly. Ignoring contigs <150 bp, the resulting fragmented assembly included 1,131,000 contigs (N50 = 2,281) and contained 1.4 Gb of sequence. The assembly was repeat masked using RepeatMasker software (http:// www.repea tmask er.org/) and a locally developed salmonid-specific repeat library developed with the RepeatModeler software. Reads from the 16 individuals were aligned to the reference using BWA (Li & Durbin, 2009); putative SNPs within each individual were identified using SAMtools . Selection criteria required that heterozygous SNPs should be supported by minimum of two reads in two individuals for the minor allele, across all samples at least one individual should be homozygous with a minimum of four reads, and only bi-allelic SNPs separated by >60bp to the nearest indel or SNP were considered. A list of 47,380 putative SNPs was initially reduced by considering factors such as read depth, proximity to repeats and Illumina's Assay Design Score. Subsequent filtering prioritized SNPs based on whether they mapped to full-length cDNA sequences/contigs from Atlantic salmon (Lien et al., 2011) were distributed across larger (>7,750 bp) brown trout contigs or fell within specific genes of interest.
To order SNPs by linkage group (LG) and generate a linkage map, we used the 6K SNP array to genotype both the 2,606 population samples and family material consisting of 320 individuals applying the manufacturers' recommended protocol and by calling genotypes using GenomeStudio V2 (Illumina). Family material came from a study by Jensen et al. (2008) and consisted of 10 LGs which correspond to the expected karyotype of brown trout (Martínez et al., 1991). After their initial grouping, SNPs were ordered within LGs using the BUILD and FLIPSN options in CRIMAP. The CHROMPIC option in CRIMAP was then used to phase genotypes within LGs, and a custom-made script was used to correct or remove erroneous genotypes based on unlikely tight double recombination events. Finally, multipoint linkage maps for Country Region (Sea) Rivers (number in Figure 1) TA B L E 1 Collections of Salmo trutta indicating country, geographical region and river, where number refers to map location in Figure 1 the 40 LGs were constructed using the FIXED option of CRIMAP.
SNPs that could be mapped to LG and that showed Mendelian segregation of genotypes in pedigree material were selected (excluding SNPs statistically deviating from expected proportions in chi-square tests across >10% individuals). This resulted in the retention of information for 3,782 SNPs.

| Population analyses
Global and per SNP observed and expected heterozygosity (H O and H E ) were determined for each of the 84 collections (samples from 74 rivers, two hatchery strains and eight temporal replicates) using the R package adegenet (Jombart, 2008). Samples comprising siblings may bias allele frequency estimates (Hansen & Jensen, 2005). To avoid this, we used the maximum-likelihood-based method described in Wang (2013) to analyse genetic relationships in samples exhibiting statistically significant F IS estimates. Individuals exhibiting relatedness levels corresponding with half-and full-sib relationships were recorded, and data were subsequently trimmed to exclude any such related individuals. This meant that 19 of 40, mostly juvenile, fish from the River Ure were excluded from further analysis.
Using adegenet, locus-specific F ST values were calculated across all samples and deviation from HWE within samples was tested using chi-square testing. Linkage disequilibrium was estimated using the squared correlation between alleles at two loci, both per sample and averaging across samples, using the R package genetics (Warnes,   . p-values of HWE tests were adjusted for multiple comparisons using false discovery rate (FDR; Benjamini & Hochberg, 1995).
All loci were retained in initial analyses, but in GEA and outlier analyses only one SNP was retained when pairs of SNPs showed very high LD (r 2 > .8) across samples.
Estimates of population structure and detection of outlier loci may be downwardly biased if markers are selected using specific criteria about the numbers of times they occur in the SNP development sample (Rosenblum & Novembre, 2007). Here, ascertainment bias (AB) was potentially non-negligible due to the relatively strict criteria applied when selecting candidates for the SNP array, particularly the criterion that at least two individuals from the ascertainment panel had to have the alternate allele in at least two reads.
Potential effects of AB were assessed by inspection of minor allele frequencies per population sample, where the expected nonbiased distribution describes a beta distribution with high proportions of low-frequency markers rapidly decreasing towards low proportions of high-frequency markers.

| Genome scan
A genome scan approach was used to distinguish genome-wide processes expected to mainly reflect demographic histories, from processes at individual loci potentially reflecting local processes, particularly selection. Genome scans may suffer from inflated numbers of false positives under hierarchical spatial structure coupled with isolation-by-distance dynamics (Excoffier, Hofer, & Foll, 2009). We therefore used a principal component analysis with a Mahalanobis distance-based approach to identify outlier loci (Luu et al., 2017).
This method, implemented in the R package pcadapt, allows for an examination of how different levels of population clustering affect outlier detection. It is reported to yield increased power compared to Bayesian models, especially when there is hierarchical population structure under divergence and range expansion scenarios (Luu et al., 2017). Initially, 50 principal components (PC) were used to assess the best supported genetic clustering among sampled individuals, where the optimal number (or range) of PC was determined using Cattell's graphical rule, following Luu et al. (2017). Outliers were then detected applying different levels of sample clustering and FDR to control error. Throughout, loci with global MAF <0.05 were excluded.
To examine effects of sampling specific populations on outlier detection, analyses were finally repeated for sub-sets of data: (a) including one half of the geographically paired collections selected at random (N = 34), (b) including the other half of paired samples (N = 34) and (c) combining these two sets of data (N = 68; Table 1). In all three latter cases, information was excluded for temporal replicates and for the four sub-regions for which only a single collection was available, including extant populations Tamar, Vainupea and Haute-Savoie (Table 1). pcadapt results were compared with results generated with the often used Bayesian bayescan method (Foll & Gaggiotti, 2008).
This method is expected to exhibit low false-positive rates, particularly when many population samples can be included (Narum & Hess, 2011), which was the case here. Settings followed recommendations in Foll and Gaggiotti (2008), excluding information for 66 SNPs exhibiting global MAF below 0.05. Following Foll and Gaggiotti (2008), we used Jeffrey's scale of evidence and defined potentially selected loci as markers having log 10 (PO) above 1, but excluded loci with Qvalues above 0.01, to minimize FDR bias.

| Genotype-environment association
Signatures indicative of local adaptation to environmental variables were investigated using the univariate GEA method implemented in LFMM v. 1.5 (Frichot, Schoville, Bouchard, & François, 2013). The approach uses latent factor mixed models that take into account neutral population structure when testing associations between gene variation and candidate environmental variables. Environmental variables are entered into the model as fixed effects while population structure is modelled using latent factors. We used population clustering results from DAPC and pcadapt to guide the choice of k latent factors, varying k 9-14, and running 10 replicates per factor level, with Gibbs sampling algorithm with 20K iterations, discarding the first 5K iterations as burn-in. In the LFMM model, a matrix term models the part of genetic variation that cannot be explained by the environmental variables. Selective responses in anadromous brown trout populations are expected to be mainly governed by regional, rather than by small-scale environmental drivers (Hansen, 2002).
Accordingly, we examined variables expected to reflect effects of different types of drivers on genetic variation. Firstly, to control for IBD dynamics and postglacial founder events, we entered sample latitude and longitude, and whether origin was either the British Isles, European continent or Scandinavian Peninsula (as a categorical variable). Secondly, potential relationships with abiotic conditions on spawning sites were examined by entering data on geochemistry (using soil type and soil pH), annual average water temperature, minimum winter temperature and maximum summer temperature, annual temperature range, average annual precipitation and precipitation in driest month at collection site. Climate data were obtained from https ://esdac.jrc.ec.europa.eu/. Data on soil type (entered as categorical variable), and pH values were obtained from http:// euros oils.jrc.ec.europa.eu/. Finally, to test for relationships between genetic variation and abiotic environment first encountered during smolt sea-migrating stage, information was included on average ambient water temperature and salinity 1 km from river confluence with the sea (from http://marine.coper nicus.eu/). Thirdly, we entered altitude at collection site as a proxy for both upstream spawning run and smolt downstream migration barrier. Fourthly, to examine effects of genetic introgression from stocked hatchery strains, we entered categorical information about whether populations had previously been stocked with either of the two hatchery strains (pertains to Danish populations). Associations among variables were tested using the function PCAMIX in the R package PCAmixdata (Chavent, Kuentz-Simonet, Labenne, & Saracco, 2014) allowing for PCA of mixed qualitative and quantitative data. Based on eigenvalues for 24 dimensions and using Cattell's graphical rule, the first five principal components from that analysis were used to reduce variables tested in the model.
Recent simulation analyses suggest that multivariate GEA methods, such as redundancy analysis (RDA), under some conditions may perform better than PCA-based methods in identifying environmental predictors of genotype variation and in selective outliers (Capblancq, Luu, Blum, & Bazin, 2018;Forester et al., 2018). RDA is a multivariate ordination approach that combines PCs from allele frequency and multivariate environmental distance matrices to produce canonical axes predicting relationships between environments and particular loci. We therefore compared results from LFMM with an RDA analysis using the vegan package (Oksanen et al., 2015) following the procedure detailed in Forester et al. (2018). Missing data were imputed using the most common genotype observed within samples. As for the LFMM analysis, the five environmental PCs were tested against SNP data in 68 sample collections. The function vif.
cca was used to ascertain lack of multi-collinearity among variables, as expected from the use of composite environmental variables, which alleviated the need for variable reduction. There was no a priori control for population structure. Following Forester et al. (2018), we classified SNPs as showing statistically significant association with individual environmental parameters when they loaded with more than three standard deviations from the mean. We estimated correlations between these SNPs and their most strongly associated environmental variable following Forester et al. (2018).

| SNP array performance
All SNP clusters were visually inspected using Genome Studio and all markers subjectively classified according to sample data point clustering precision and accuracy. A total of around 4,000 markers (72%) displayed tight grouping into three well separated genotype clusters, 1,106 (20%) appeared to be monomorphic in the examined samples, and the remainder displayed clustering patterns making SNP genotype calling unreliable.
The 40 brown trout LGs with 3,894 SNPs represent, on average, 95 SNPs per LG (range 13-173) and adds up to a male map of 1,316 cM and a female map of 2,494 cM (Table S2 for details on linkage map, assembly and annotation). Relative to the expected beta distribution, MAFs were skewed towards high values ( Figure   S1). Average MAF was 0.28, and 1.5% of loci had global MAF <0.05 (Table S3). In total, 153 pairs of loci showed evidence of linkage disequilibrium with genotype associations at R 2 > .8.

| Genetic relationships within and among populations
The 3 (Table S4).
Relatively low differentiation was observed between les Usses and the Danish hatchery strain #1, corresponding with the fact that this strain has been exported to France and is represented in brood stock used for stocking in the region. Hence, this strain has now introgressed into several French Mediterranean populations (Gil et al., 2016).
Average Ɵ between temporal replicates within location was 0.011, ranging from 0.001 to 0.052 (Table S4). None exhibited statistically significant differentiation, except for the River Skals, where one replicate sample consisted of only 10 individuals, which may have biased allele frequency estimates. Forty-three comparisons between spatial collections did not exhibit statistically significant differentiation (Table S4). In all cases, they represented collections from either neighbouring rivers or rivers from the same geographical area separated by less than 70 km. These included ( Table 1). Pairwise differentiation between collections was positively associated with geographical distance (Monte Carlo randomization test; 999 randomizations: observed = 0.7157, p < .001, observed standard deviation = 12.868, expectation = −0.0002, variance = 0.0031). However, density analysis also suggested that IBD dynamics differed across geographical regions, as evidenced by the presence of at least two distinct clusters in the data (Figure 2). One of the clusters corresponded with comparisons between British and European/Scandinavian samples, as shown by the fact that one of two main clusters disappeared when British samples were excluded from analyses ( Figure S2). When British samples were tested alone, two distinct clusters were suggested ( Figure S3). Although sample size was somewhat limited encompassing 15 British collections in total, this suggested disjoint IBD dynamics with one strong north-south genetic break within this geographical region.
A distinct separation between northern and southern British and all other populations was also evident in the PCA. Here, the three first principal components (PCs) described 5.3% of the variation, accounting for 2.4%, 1.6% and 1.3%, respectively ( Figure S4). PC1

| Identification of SNP outliers
The pcadapt analyses comprising all samples showed agreement with the results using adegenet by providing the optimal model resolution when grouping genotypes into 10 clusters ( Figure S8). When controlling for population clustering, pcadapt identified 24 outlier SNPs distributed across 13 LGs, which was considerable lower than the global result from bayescan, which identified 576 outlier SNPs (183 with lower and 393 with higher than expected divergence, Table S3). Seventeen outlier SNPs were identified with both methods. Comparing pcadapt outliers detected in paired sub-sets of data, 11, 11 and 17 outliers were identified in sub-set 1, sub-set 2 and subset 1 and 2 combined, respectively. There was good correspondence in the numbers of geographical clusters identified for the three data sets ( Figure S9), but relatively little overlap between loci identified as outliers in the sub-sets 1 and 2 (four of 11 outlier loci, Table S3, Figure S10). The low overlap in outliers detected with different subsets of samples was also evident with the bayescan approach, where just 137 of 401 outlier loci (34%) were identified in both sub-sets 1 and 2 (Table S3, Figure S11). Outlier SNPs identified with bayescan were distributed across all LGs (Figure 3, Table S3). There was no trend for covariance between LG size and numbers of outliers identified and no apparent clustering of outliers within LG.

| Genotype associations with environment
Principal component analysis of spatial and environmental variables showed that the five top PCs explained 69% of the variation ( Figure   S12). PC1 (32% variation explained) described parameters distinguishing the three main geographical regions, Britain, Continental Europe and Scandinavian Peninsula, including temperature (higher for Britain) and salinity (lower for Continental European samples). PC2 (15%) was associated primarily with climatic variables related to precipitation. PC3 (10%) was associated with soil pH. PC4 (6%) was associated with high summer temperatures, and PC5 (5%) was associated with soil type (Tables S5 and S6 (Table S3).

| D ISCUSS I ON
Our study has two main merits. Firstly, by taking advantage of genomic resources developed for S. trutta, this study provides the most detailed examination of large-scale genetic structure in a central part of the species' native range. We identified very strong sample clustering corresponding with broad geographical regions, as well as clear genetic breaks among samples within regions, that in several

| Inference from genome-wide SNP variation in S. trutta
Genomic resources are rapidly growing for S. trutta (Carruthers et al., 2018;Leitwein et al., 2017;Lemopoulos et al., 2018), but our study is the first to apply genome-wide SNP data to examine broadscale population relationships and associations with evolutionary drivers across regional scales. The stringent criteria for selecting SNPs are likely to have caused some ascertainment bias, thereby skewing MAF towards higher values and lowering power to disentangle effects of neutral versus selective processes.
Ascertainment bias in SNP selection may bias inferences of population demography (Guillot & Foll, 2009, but see also Albrechtsen, Nielsen, & Nielsen, 2010). In this study, individuals from all major genetic clusters detected were represented in the ascertainment panel, suggesting that our assessment of spatial structure is unlikely to suffer substantial bias with respect to the magnitude of F I G U R E 4 Association between environmental variables and frequency of major allele for populations in the three geographical regions Britain (open symbols, dotted line), Scandinavian peninsula (black symbols, solid line) and Continental Europe (grey symbols, stippled line), exemplifying gene-environment association relationships that are as follows: (a) specific to geographical regions; locus "Gdist_S82886_3406," identified as selective outlier in both pcadapt and bayescan, (b) general across regions; locus "cDNA_S415600_5145," showing no outlier behaviour but association with temperature parameters and latitude (shown here) or (c) general across regions characterized by salinity gradients, but not in the salinity-invariant region Britain; locus "cDNA_S84920_5746." None of the three loci exhibited statistically significant outlier behaviour in RDA (see text). Least squares regression lines are shown to guide inference genetic differences among populations, at least across larger geographical scales. Individuals in the SNP development ascertainment panel represented both the NE British and the SE British population clusters. The lower overall genetic diversity observed in samples from Britain was thus unlikely an artefact of ascertainment bias. Although methods have been proposed to correct for ascertainment bias (Guillot & Foll, 2009), most require some assumption about allele frequency distributions under un-biased conditions. Lacking such information, we therefore refrained from estimating demographic parameters, such as phylogenetic relationships and divergence times. Yet, we emphasize that given the genomic coverage and quantity of markers, the relative magnitudes of differentiation detected among samples are expected to be robust.

| Geographical breaks in population structure
Our results support previous inference based on smaller marker panels and samples with more restricted geographical coverage.
Accordingly, this study reinforces the notion that trout commonly show temporally stable (at least on a decadal scale) genetic structure on local river or neighbouring watershed scales and that the distribution of genetic variation can often be described by isolation-by-distance dynamics, attributed to the species' homing instinct (Griffiths, Koizumi, Bright, & Stevens, 2009;Meier et al., 2011;Stelkens, Jaffuel, Escher, & Wedekind, 2012). Although our sampling scheme was not exhaustive, we benefitted from substantial geographical sample coverage allowing fine-scale inference about structural relationships from wide to small geographical scales. First, we identified strong genetic breaks among populations inhabiting the landmasses Britain, the European continent and the Scandinavian Peninsula, most likely reflecting allopatric divergence during postglacial colonization events (Bernatchez, 2001;Cortey, Vera, Pla, & García-marín, 2009;Hewitt, 2000).
A novel result was the identification of strong divergence between northern and southern British populations. Our sampling design did not allow for precise definition of the geographical break between the two clusters of samples, nor whether geographically intermediate populations constituted a hybrid zone between clusters.
Analyses of mitochondrial DNA markers have not identified genetic breaks between populations sampled in south and north (McKeown, Hynes, Duguid, Ferguson, & Prodohl, 2010). However, complex phylogeographical processes and incomplete lineage sorting could have affected this result (McKeown et al., 2010) masking the divergence observed with nuclear markers. In contrast to Northern Britain, the area spanning from Wales in west to east Anglia in east was ice-free during the last glaciation and thus has a different geological history from that of the north. Moreover, the entire Southeast Britain previously constituted the catchment area for the River Thames draining into the southern North Sea (Rose, 1994). Analyses of additional samples from the River Thames and from rivers draining into the English Channel support the existence of close genetic relationships among SE British trout populations (D. Bekkevold, A. King, J. Stevens, unpublished results). Trout populations in SE Britain are generally characterized by inhabiting short, lowland rivers, whereas NE British populations included in our study generally inhabit larger upland rivers with stronger gradients and higher water flow. The genetic discontinuity identified in our study is hence in agreement with expectations for two population clusters inhabiting different environments and having discrete evolutionary histories potentially associated with selective sweeps that maintain separate demographics to present day.
We identified another prominent genetic break between populations from the Scandinavian Peninsula and Continental Europe (including the Danish Belt Sea islands). There was apparently a strong barrier to genetic exchange between rivers draining into east and west of the Skagerrak and Kattegat seas, in some places separated by less than 60-km waterway. Such strong genetic separation is remarkable, given the species' propensity for long distance feeding migrations (Koljonen, Gross, & Koskiniemi, 2014), including within the Kattegat area (D. Bekkevold, unpublished data), and points to accurate homing to specific geographical regions.
As expected, genetic structure was also evident within regional population clusters. In most cases, genome-wide differentiation followed geographies in correspondence with founder events combined with isolation-by-distance dynamics. Nonetheless, we stress that although inference about overall genetic relationships among populations should be robust, the potential ascertainment bias of the applied markers could somewhat influence the inference of the underlying demographic processes. Even so, the markers analysed here will be useful for delineating conservation units and their distributions in time and space, as well as for individual assignment and identification of population admixture (Nielsen et al., 2012).

| Contrasting results from different genome scan approaches
We used one of the, to our knowledge, largest sample sizes to date for outlier analysis across a large number of geographically widespread populations and typed more than 3,000 SNPs, which is the suggested threshold to limit false discovery bias (Ahrens et al., 2018). The application of two different genome scan methodologies allowed further insights into the processes hypothesized to drive population differentiation. The overarching result from both methodologies was the finding of statistical outliers across broad expanses of the trout genome. Large blocks of highly divergent loci positioned in one or a few linkage groups (sometimes referred to as "genomic islands" often attributed to chromosomal structures such as inversions) have been identified to differentiate ecotypes in fish (Bradbury et al., 2013;Hemmer-Hansen et al., 2013;Lamichhaney et al., 2017). However, unless a population has undergone a recent selective sweep, local selection is unlikely to act on only a few largeeffect loci or genomic regions (Rockman, 2012). Although the genetic structures we observe likely also reflect drift and, potentially, postglacial secondary contact between lineages, our results are thus in line with the pattern from other salmonids Pritchard et al., 2018) that populations exhibit genomic variation indicative of multiple sweeps and divergent selection acting on broad expanses of the genome.
When all collections were included in global analyses, the two methodologies returned large numerical differences in detected outlier loci. Thus, bayescan identified 11% outliers, whereas when controlling for regional population clustering, pcadapt identified less than one per cent outliers. Albeit both figures at face value differ from estimates of ~5% outlier loci seen in other species, they are still within the reported range (reviewed in Ahrens et al., 2018).
Our analysis is also consistent with results being dependent on both statistical methods used (Bradbury et al., 2013) and the type of genetic variation studied (de Villemereuil et al., 2014;Vasemägi, Nilsson, & Primmer, 2005; but see Ahrens et al., 2018). The bayescan approach may yield low detection power under an "isolation with migration" model, as is expected for brown trout populations, and sampling large numbers of genotypes may concurrently inflate numbers of false positives (de Villemereuil et al., 2014). The population clustering parameter (k) applied in our pcadapt analysis was based on all loci, rather than restricted to sub-sets of loci presumed to reflect neutral demographic processes. This could potentially have reduced number of outliers detected with the pcadapt approach, compared to the "demography-naïve" bayescan approach.
Conversely, in a minor number of cases, paired rivers did not exhibit statistically significant divergence, which could have led to slightly increased numbers of false positives identified across sub-set analyses. All of these error sources may have contributed to the observed discrepancy between the two outlier analysis approaches.
The inference gained in our study from comparing methods is that outliers may be highly specific to hierarchical population clusters, as evidenced by the much lower number loci detected with the pcadapt approach incorporating demography than in the approach not incorporating demography. The relatively low overlap in outliers identified in sample sub-sets corroborates the view that sampling design may have a strong effect on detection of genomic regions underlying selection. However, when comparing outliers identified across both sample sub-sets, all four pcadapt outliers and 90% of 137 bayescan outliers also showed GEA. This supports the notions that comparing results from different statistical methods (de Villemereuil et al., 2014) and applying a paired-population sampling design (Lotterhos & Whitlock, 2015) can strengthen inference about selective processes under a hierarchical population scenario.

| Genotype-environment associations within and among clusters
We identified several GEA related to climatic variables temperature and precipitation, which is consistent with results in other fish species, including other salmonids Chen, Farrell, Matala, & Narum, 2018;Hecht, Matala, Hess, & Narum, 2015;Matala, Ackerman, Campbell, & Narum, 2014;Perrier, Bourret, Kent, & Bernatchez, 2013). Such general relationships are not surprising given that in fishes, temperature is linked to key physiological, developmental and behavioural processes, rendering fish highly sensitive to climatic and thermal conditions (Crozier & Hutchings, 2014;Eliason et al., 2011). Climatic drivers are hence expected to exert selection pressures on local populations, also in trout (Jensen et al., 2008), although some salmonid studies indicate stronger effect of phenotypic plasticity rather than adaptation to specific temperatures (Solberg, Dyrhovden, Matre, & Glover, 2016 LD, especially high within salmonids, is likely to show low decay across broad genomic regions, also obscuring the direct relationship between specific SNP's and environmental variables. Thus, confirmation of functional and adaptive significance of post hoc identified genes requires rigorous testing, and although several GEA studies report annotation of SNPs found to be associated with environmental variables, we follow the argumentation in Pavlidis, Jensen, Stephan, and Stamatakis (2012) and refrain from mining for annotation of GEA markers. In GEA testing, both the univariate latent factor mixed model approach LFMM and the ordination-based method RDA identified loci associated with environmental parameters, but the latter method identified less than 5% of the numbers of SNPs as the former. This differs from results of a recent study in Populus, where RDA had superior statistical power and showed lower nondetection rates than LFMM (Capblancq et al., 2018). That study also found that the two approaches did not consistently identify known QTLs, whereas in our study, SNPs showing GEA with RDA generally also were identified with LFMM. Interestingly, there was relatively stronger consistency in loci identified with both methods to be associated with the variable Dim1 that comprised the main differences among geographical regions than across analyses for Dim2-5, associated with environmental variables that were less specific to individual geographical regions. This indicates that GEA methods were most consistent for variables showing the strongest inter-regional divergence. Numbers of GEA identified with LFMM were high (totalling almost 30% of all loci) and spread across all LGs. Although results were controlled for inflation of false positives, this suggests that absolute numbers of GEA identified may have been upwardly biased. Irrespective of a potential bias, the identification GEA across broad expanses of the trout genome with both methods is suggestive of locally adapted variation being pervasive throughout multiple genomic regions. Our results may also reflect the expectation that locally adapted traits often are polygenic and governed by loci that individually exhibit low effect that are difficult to detect statistically (Savolainen et al., 2015).
A strength of our sampling design is that it represents a pairedgradient design, in the sense that it allowed for an assessment of whether the same loci were associated with climatic gradients across presumably allopatric population clusters. Although several loci showed consistent relationships with environmental variables, there were also several cases where relationships were evident within only one or two population clusters. This was exemplified by the locus showing maximal association with salinity. Here, both the Scandinavian and the European mainland population clusters, which co-habit the North Sea-Baltic Sea salinity gradient, were found to display increasing allele frequencies with decreasing salinity, although variance among populations was pronounced. For the same locus, allele frequencies also varied greatly across the British populations that invariantly inhabit rivers, which drain into high-salinity coastal environments. This result could indicate that salinity conditions, rather than demography alone, drive dynamics in that locus in Scandinavian and European mainland populations, and that neutral, or at least dissimilar, dynamics drive allele frequencies in British populations. The application of composite environmental variables in the GEA models, rather than examining single, in several cases inter-correlated variables individually may have obscured identification of strong relationships between specific variables and SNPs.
However, in the present context association between environments and genomic regions suggests that adaptation to local environments may be complex but is pervasive across populations.

| Genetic management of brown trout
Although genomic analysis and identification of adaptive variation is not a requirement for conservation per se (Flanagan, Forester, Latch, Aitken, & Hoban, 2018), our results have direct management implications. First, our results can be applied to define conservation units and to prioritize management actions (Funk, McKay, Hohenlohe, & Allendorf, 2012; also see discussion in Mee, Bernatchez, Reist, Rogers, & Taylor, 2015). Specifically in salmonids, supplementary stocking has been a popular management tool to mitigate dwindling populations in the face of habitat deterioration and fisheries exploitation. A substantial body of literature has addressed potential genetic effects of stocking non-native genetic material into wild populations (Laikre, Schwartz, Waples, & Ryman, 2010) where effects on population fitness are mainly expected to be negative (Edmands, 2007). Our SNP data represent a valuable genomic resource for assessment and monitoring of introgression in local populations (Glover et al., 2013;Vera et al., 2018) and as a tool to design experimental tests of fitness effects of introgression (Hagen et al., 2019). A practical advantage of our SNP array approach, in comparison with, for example RAD and other genotyping-by-sequencing-based approaches (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016), is that our markers are directly transferrable between genotyping platforms and that information can be used to tailor analyses addressing specific management objectives. To maximize the efficiency of conservation efforts, there is increasing effort to tailor releases by using the genetically most suitable stocking material (Caudron, Champigneulle, Guyomard, & Largiader, 2011). Genomic coverage was relatively low in our study, and the knowledge gained here is unlikely to fully reflect functionally significant differentiation within and among populations. It is therefore possible that some of the populations exhibiting no genetic differentiation in fact are locally adapted. Moreover, our sampling design was not equally comprehensive across all geographical regions. Although our study thus has shortcomings, it nonetheless provides essential information on the geographical distribution of populations more likely to share evolutionary histories that would allow for successful reintroductions, where needed. We identified geographical regions exhibiting overall weak genetic differentiation among neighbouring rivers, as, for example, was the case for several Swedish Skagerrak rivers.
Where human-mediated gene flow can be discounted, genetically similar populations are inferred to also display stronger demographic connectivity. Results could indicate that population dynamics can be described in a meta-population context, where demographic stability may be dependent on regional, rather than local processes (Østergaard, Hansen, Loeschcke, & Nielsen, 2003). Especially in systems consisting of small, temporally unpredictable streams (in terms of discharge), straying between neighbouring rivers might be a strategy that has been favoured by natural selection. Conversely, in a number of cases, river populations within a relatively constrained geographical area exhibited marked genetic divergence, suggestive of selection against interbreeding. In such cases, restocking activities should refrain from mixing gene pools, which could result in introgression and outbreeding depression. Although extensive genomic and experimental analyses are required to predict the suitability of directed releases of specific non-native strains, most applied conservation work relies on the establishment of practical guidelines that do not require detailed case-by-case study. An applied conservation guideline adhered to in Denmark is that restocking material should represent fish of the genetically closest related population and from the geographically closest population if that information is not available (Berg & Hansen, 1998). Governance of salmonid stocking varies strongly among north-east Atlantic legislative units, and there is a call for increased attention to halt unsustainable management practises (Aas et al., 2018). Our results serve as a tool that can be directly implemented in outlining conservation units and to advice on the geographical distribution of genetic populations that can be expected to be suitable for preserving adaptive state, for example under local restocking activities.

ACK N OWLED G EM ENTS
We are grateful to a great number of people for collecting samples,

DATA AVA I L A B I L I T Y S TAT E M E N T
SNP genotype data are deposited at https ://data.dtu.dk/10.11583/ DTU.10008020.