Comparison of anadromous and landlocked Atlantic salmon genomes reveals signatures of parallel and relaxed selection across the Northern Hemisphere

Abstract Most Atlantic salmon (Salmo salar L.) populations follow an anadromous life cycle, spending early life in freshwater, migrating to the sea for feeding, and returning to rivers to spawn. At the end of the last ice age ~10,000 years ago, several populations of Atlantic salmon became landlocked. Comparing their genomes to their anadromous counterparts can help identify genetic variation related to either freshwater residency or anadromy. The objective of this study was to identify consistently divergent loci between anadromous and landlocked Atlantic salmon strains throughout their geographical distribution, with the long‐term aim of identifying traits relevant for salmon aquaculture, including fresh and seawater growth, omega‐3 metabolism, smoltification, and disease resistance. We used a Pool‐seq approach (n = 10–40 individuals per population) to sequence the genomes of twelve anadromous and six landlocked Atlantic salmon populations covering a large part of the Northern Hemisphere and conducted a genomewide association study to identify genomic regions having been under different selection pressure in landlocked and anadromous strains. A total of 28 genomic regions were identified and included cadm1 on Chr 13 and ppargc1a on Chr 18. Seven of the regions additionally displayed consistently reduced heterozygosity in fish obtained from landlocked populations, including the genes gpr132, cdca4, and sertad2 on Chr 15. We also found 16 regions, including igf1 on Chr 17, which consistently display reduced heterozygosity in the anadromous populations compared to the freshwater populations, indicating relaxed selection on traits associated with anadromy in landlocked salmon. In conclusion, we have identified 37 regions which may harbor genetic variation relevant for improving fish welfare and quality in the salmon farming industry and for understanding life‐history traits in fish.


| INTRODUCTION
One of the most extreme adaptations in Atlantic salmon (Salmo salar) occurred during land rise following the most recent ice age ~10,000 years ago, when numerous salmon strains became landlocked throughout the geographical distribution in the Northern Hemisphere (Hutchings et al., 2019; Tonteri et al., 2005). Since the end of the ice age, landlocked salmon populations have adapted to a life in freshwater, losing selection pressures associated with seawater, marine diets, and seaborne pathogens. It is likely that different landlocked populations of salmon have been exposed to similar selection pressures and relaxed selection on seawater traits and gone through similar genetic adaptation, sometimes independently of each other. Such populations present a unique opportunity to identify genomic regions under selection for different important traits, as successfully demonstrated in salmon for the age at maturity (Ayllon et al., 2015; Barson et al., 2015) and on genes associated with disease resistance (Kjaerner-Semb et al., 2016; Zueva et al., 2018).
Previous studies on landlocked salmon populations have found that many of the phenotypic transitions associated with preparatory changes for a life in seawater differ from those of their anadromous counterparts in immunology (Ronneseth et al., 2005) and in morphology and hypo-osmoregulatory capacity (McCormick et al., 2019; Nilsen et al., 2007, 2008). We hypothesize that developmental traits associated with marine life in ancestral anadromous populations have been lost or suppressed in landlocked salmon due to relaxed selection on seawater traits while advantageous traits have been positively selected. Comparisons between landlocked and anadromous salmon may therefore provide an excellent model for identifying genetic mechanisms underlying evolution of important phenotypic traits during seawater adaptation such as smoltification, resistance to seaborne diseases, and omega-3 synthesis. Farming of Atlantic salmon is a growing industry; however, sustainability issues such as seaborne diseases associated with seacage rearing are currently limiting further growth of the industry. In the recent past, the industry has also reported an increasing incidence of welfare problems associated with production of fast-growing, large smolts in modern industrial facilities including osmoregulatory problems, disease, poor growth, and precocious maturity. Domestication of salmon may have affected important traits associated with seawater adaptation such as osmoregulation, disease resistance, growth, reproduction, and behavior (Glover et al., 2017). Currently, we do not understand the genetics behind key traits for aquaculture, for example, smoltification, which is a key step in the transition into seawater and, if not properly controlled by farmers, will result in reduced growth and high mortality in the sea phase.
Hence, there is an increasing demand to explain the genetic basis of traits relevant to current aquaculture production, which may support selective breeding programs aiming to increase the welfare and survival of farmed fish.
Here, we have sequenced and compared genomes of anadromous and landlocked salmon populations throughout their geographical distribution. We found several genes and genomic regions where all the assayed landlocked populations show signs of parallel selection. We also identified genes potentially important during the marine phase by screening for regions showing consistently relaxed purifying selection in landlocked compared to anadromous salmon.

| Sample collection and DNA extraction
All tissue samples have been obtained from scientific sampling or from professional or recreational fishers, except the landlocked fish from Gullspång and Blege, which were reared in freshwater in our hatchery facility (Matredal, Norway) for one (Blege) and two generations (Gullspång) under conditions similar to standard commercial fish farming, and are therefore exempt from the Norwegian Regulation on Animal Experimentation (NARA). Rearing and sampling of salmon from Connecticut River and Sebago Lake have been described previously (McCormick et al., 2019) and were in accordance with U.S. Geological Survey (USGS) institutional guidelines and protocol LSC-9096 that was approved by the USGS Leetown Science Center Institutional Animal Care and Use Committee.
Genomic DNA was extracted from scales or fins collected from fish representing all the populations included in this study using one of several methods including Qiagen DNeasy Blood and Tissue or Mini Kits (Qiagen), or a salt-based extraction protocol as performed by Tonteri et al. (2005). Populations are shown on a map of the Northern Hemisphere in Figure 1 and are listed in Table 1, with a more detailed description in Table S1. A schematic overview of the organization of populations and analyses is presented in Figure S1.

| Pooled genome sequencing
DNA purity was assayed using Nanodrop (Thermo Fisher), and fluorometric quantification with Qubit (Thermo Fisher) was used to measure DNA concentrations of each sample. DNA pools were made by pooling equal amounts of genomic DNA from 10 individuals from the same population (Rubin et al., 2010).
One to four pools were made for each population, and DNA integrity was inspected by gel electrophoresis. Paired-end libraries were made for each pool using Genomic DNA Sample  (Ayllon et al., 2015; Kjaerner-Semb et al., 2016) and include only males, while the rest of the populations either contain separate pools of males and females, or pools where males and females have been mixed. Atlantic salmon lack typical sex chromosomes but instead contain the sex-determining sdy locus, which is the only difference between males and females (Yano et al., 2013). Therefore, it is unlikely that any other regions are highly different between male and female salmon. All sequence data used in this study are available on SRA (BioProject ID: PRJNA627844), with accession numbers of all sequenced pools listed in Table S1.

| Processing of sequence data and SNP calling
To minimize batch effects from the use of different versions of the Illumina HiSeq sequencing platform, stringent filtering steps were applied to the data. Quality analysis of the sequence data, including screening for degenerated adapter sequences, was done using FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Read pairs were filtered using Cutadapt (v. 1.18) (Marcel, 2011) with the following specifications: The first and last two bases of each read were removed (using the parameters -u 2 -u -2 -U 2 -U -2), and low-quality bases were trimmed from the 3' end of each read by setting the option -q to 25. Minimum overlap between adapter and read sequences was set to 15 bp using the -O option, and reads containing adapters and reads shorter than 75 bp were discarded (--discard and -m options, respectively). Filtered reads were mapped to the Atlantic salmon reference genome (v. ICSASG_v2) (Lien et al., 2016) using Bowtie2 (v. 2.3.4.3) (Langmead & Salzberg, 2012) with default parameters. The mapped reads were further processed with Samtools (v. 1.9) (Li et al., 2009) for duplicate removal, quality filtering, and SNP calling as follows: First, the alignment files were converted to BAM format with 'samtools view' using the -b option, followed by sorting by read names using 'samtools sort' with the -n option. Read mate information was updated using 'samtools fixmate' with the -m option, followed by coordinate sorting with 'samtools sort', before marking duplicated reads with 'samtools markdup'. Finally, reads were filtered with 'samtools view' with the -q option set to 20 to remove reads with ambiguous mapping and setting the -F option to 1024 to remove duplicated reads. SNPs were called for each population using 'samtools mpileup' with the -a and -B options, and minimum base and mapping quality thresholds (-Q and -q options, respectively) of 20. The resulting mpileup file was further converted to sync format (as used in PoPoolation2) (Kofler et al., 2011) and filtered (using custom scripts) by retaining only SNPs having global minor allele counts of at least 2 and exactly two alleles.

FIGURE 1 Geographical overview of salmon populations. Sequenced genomes of Atlantic salmon from six landlocked populations (green) and 12 anadromous populations (combined into three groups, blue) were analyzed in this study and are indicated by numbers on a map of the Northern Hemisphere. Genetic distances between the populations are illustrated as a phylogenetic tree based on pairwise calculations of fixation index ($F_{ST}$), where the scale bar indicates $F_{ST}$. Organization of subpopulations is illustrated in Figure S1, and a more detailed description of the populations is given in Table 1.

TABLE 1 Anadromous and landlocked populations analyzed by pooled whole-genome sequencing
Note: WN contains four pools from each of six populations from Western Norway, and NN contains three pools from each of five populations from Northern Norway. The average depth of coverage for SNPs is given as peak values (the depth value that was most prevalent in each sample), as visualized in Figure S2. A more detailed description of all populations used in this study is presented in Table S1 and illustrated in Figure S1.
Pools belonging to the same population were merged by summing the allele counts of the pools. Subsequently, the anadromous populations from Western Norway (n = 6) and Northern Norway (n = 5) were grouped as two populations by summing the allele counts for the populations contained in each of the two groups (since they had been sequenced in the same way, with similar depths of coverage), resulting in a total of 9 populations (illustrated in Figure S1 and listed in Table 1, with more details shown in Table S1). Finally, each SNP was required to have a minimum depth of coverage of 5 in each of the 9 populations (i.e. SNPs with coverage less than 5 in any populations were discarded from the entire dataset), and SNPs in unplaced scaffolds or in mitochondrial DNA were discarded. The SNPs were annotated and divided into functional categories with SnpEff (v. 4.2) (Cingolani et al., 2012) using the Atlantic salmon reference genome annotation.
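The allele-count filtering and pool merging described above can be sketched in a few lines of Python. This is a minimal illustration, not the custom scripts used in the study: the function names are hypothetical, the sync column order (A:T:C:G:N:del) follows the PoPoolation2 convention, and depth is assumed here to be the sum of the A, T, C, and G counts.

```python
# Illustrative sketch only; function names are hypothetical and not from the study.

def parse_sync_line(line):
    """Parse one sync line into (chrom, pos, ref, pools).

    Each pool is a tuple of (A, T, C, G) counts; the N and deletion
    counts in the sync column are ignored in this sketch.
    """
    fields = line.rstrip("\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    pools = [tuple(int(x) for x in f.split(":")[:4]) for f in fields[3:]]
    return chrom, pos, ref, pools


def merge_pools(pools, groups):
    """Sum allele counts over the pool indices listed in each group."""
    return [tuple(sum(pools[i][b] for i in idx) for b in range(4))
            for idx in groups]


def passes_filters(populations, min_minor=2, min_depth=5):
    """Keep a SNP if it is biallelic across all populations, has a global
    minor allele count >= min_minor, and depth >= min_depth everywhere."""
    totals = [sum(p[b] for p in populations) for b in range(4)]
    observed = sorted((c for c in totals if c > 0), reverse=True)
    if len(observed) != 2:          # require exactly two alleles
        return False
    if observed[1] < min_minor:     # global minor allele count
        return False
    return all(sum(p) >= min_depth for p in populations)
```

For example, three pools where the first two belong to one population would be merged with `merge_pools(pools, [[0, 1], [2]])` before calling `passes_filters` on the result.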

| Phylogenetic analysis of sequenced populations
$F_{ST}$ was calculated (with a custom Python script, Script S1) for all pairwise population comparisons across all identified SNPs, using the formula presented in Nei (1977) for each SNP, where p represents the allele frequency of the reference allele for each of the two populations in each pairwise comparison. $F_{ST}$ values of all SNPs were averaged for each pairwise comparison to make a distance matrix. The distance matrix was used to generate a neighbor-joining tree using Neighbor from the Phylip package (v.
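As an illustration of the per-SNP calculation and averaging described above, the sketch below assumes the gene-diversity form $F_{ST} = (H_T - H_S)/H_T$ for two equally weighted populations, with $H = 2p(1-p)$. This choice of formula, and all function names, are assumptions made for the example and may differ from Script S1.

```python
# Illustrative sketch; the exact formula in Script S1 may differ.

def nei_fst(p1, p2):
    """Per-SNP F_ST for two populations from reference allele frequencies,
    assuming F_ST = (H_T - H_S) / H_T with equal population weights."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                       # total heterozygosity
    if h_t == 0:                                        # monomorphic in both
        return 0.0
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population
    return (h_t - h_s) / h_t


def mean_fst_matrix(freqs):
    """Build a symmetric distance matrix of mean per-SNP F_ST.

    freqs maps population name -> list of reference allele frequencies,
    with SNPs in the same order for every population.
    """
    names = sorted(freqs)
    matrix = {a: {b: 0.0 for b in names} for a in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            values = [nei_fst(x, y) for x, y in zip(freqs[a], freqs[b])]
            matrix[a][b] = matrix[b][a] = sum(values) / len(values)
    return names, matrix
```

The resulting matrix could then be written out in whatever format the tree-building program expects.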

| Identifying differentiated SNPs and selective sweeps
Identification of SNPs that were differentiated between anadromous and landlocked populations was done by calculating the difference in allele frequency between the two groups (dAF) (Carneiro et al., 2014) using the formula $\mathrm{dAF} = |p_L - p_A|$ for each SNP, where $p_L$ and $p_A$ are the average reference allele frequencies of the landlocked (n = 6) and anadromous (n = 3) populations, respectively. Our aim was to uncover differentiated genomic regions, indicating selective sweeps, so we performed a genomewide screen for regions containing several highly differentiated SNPs. Selective sweeps were predicted in 100 kb sliding genomic windows with 50 kb step size, only considering windows having at least 10 SNPs with a minimum dAF of 60%. Each window was then extended 50 kb to each side, and overlapping windows were merged. Regions passing these criteria were considered as putative selective sweeps.
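The window screen described above can be sketched as follows for a single chromosome. This is a simplified illustration under stated assumptions: the function names are hypothetical, windows are taken to start at position 0, a SNP counts as differentiated when dAF is at least 0.6, and extended windows that touch or overlap are merged.

```python
# Illustrative sketch; names and boundary conventions are assumptions.

def daf(landlocked_freqs, anadromous_freqs):
    """Absolute difference between the group-average reference allele frequencies."""
    p_l = sum(landlocked_freqs) / len(landlocked_freqs)
    p_a = sum(anadromous_freqs) / len(anadromous_freqs)
    return abs(p_l - p_a)


def find_sweeps(snps, window=100_000, step=50_000,
                min_snps=10, min_daf=0.6, flank=50_000):
    """Return merged (start, end) regions for one chromosome.

    snps is a list of (position, dAF) tuples. A window qualifies when it
    holds at least min_snps SNPs with dAF >= min_daf; qualifying windows
    are extended by flank on each side and merged when they overlap.
    """
    hits = sorted(pos for pos, d in snps if d >= min_daf)
    if not hits:
        return []
    regions = []
    start = 0
    while start <= hits[-1]:
        end = start + window
        n = sum(1 for pos in hits if start <= pos < end)
        if n >= min_snps:
            lo, hi = max(0, start - flank), end + flank
            if regions and lo <= regions[-1][1]:
                regions[-1] = (regions[-1][0], max(regions[-1][1], hi))
            else:
                regions.append((lo, hi))
        start += step
    return regions
```

Counting SNPs by scanning the full hit list for each window keeps the sketch short; a genome-scale run would normally use a two-pointer scan or binary search instead.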

| Pooled heterozygosity
In order to ensure high quality of the data, SNPs with inconsistent depths of coverage were removed from the initial set of SNPs by using strict filtering with the requirement that the depth of coverage for each SNP had to be within one standard deviation of the peak depth for each population (Figure S2). If a SNP had depth of coverage outside this threshold in any population, it was discarded from the entire dataset. Heterozygosity was calculated in 50 kb sliding genomic windows with a step size of 1 bp. Using 1 bp step size provides a much higher genomic resolution as it includes all possible genomic windows, and is explored in more detail in Qanbari et al. (2012). Windows having low numbers of polymorphic loci are more susceptible to spurious fixation signals and uncertain heterozygosity values, so to increase the confidence of the analysis, windows having fewer than 10 SNPs were discarded (Qanbari et al., 2012; Rubin et al., 2010). For each population, the pooled heterozygosity of a window (Hp) was calculated with the formula

$$H_p = \frac{1}{n}\sum_{i=1}^{n} 2p_i(1 - p_i)$$

where $p_i$ is the allele frequency of the global major allele for the i-th SNP in a given window containing n SNPs. This is similar to what has been done in Qanbari et al. (2012) and Rubin et al. (2010), except that we calculate the heterozygosity for each SNP and take the average for each window.
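A minimal sketch of the windowed heterozygosity calculation is given below, assuming the per-SNP heterozygosity is $2p(1-p)$ averaged over the SNPs in each window, as the text describes. The function names are hypothetical, and the Z-transformation (subtracting the mean and dividing by the standard deviation of all window values) follows the usual ZHp convention and is included here as an assumption.

```python
# Illustrative sketch; names and the Z-transform convention are assumptions.
from statistics import mean, pstdev


def pooled_heterozygosity(major_freqs):
    """Average per-SNP heterozygosity 2p(1-p) over the SNPs of one window."""
    return sum(2 * p * (1 - p) for p in major_freqs) / len(major_freqs)


def window_hp(snps, window=50_000, step=1, min_snps=10):
    """Return (window_start, Hp) for every window with >= min_snps SNPs.

    snps is a list of (position, major allele frequency) tuples for one
    chromosome and one population. Two pointers track the SNPs inside
    the current window as it slides by step.
    """
    snps = sorted(snps)
    if not snps:
        return []
    positions = [pos for pos, _ in snps]
    het = [2 * p * (1 - p) for _, p in snps]
    results = []
    lo = hi = 0
    start = max(0, positions[0] - window + 1)
    while start <= positions[-1]:
        end = start + window
        while lo < len(positions) and positions[lo] < start:
            lo += 1
        while hi < len(positions) and positions[hi] < end:
            hi += 1
        n = hi - lo
        if n >= min_snps:
            results.append((start, sum(het[lo:hi]) / n))
        start += step
    return results


def z_transform(values):
    """Standardize window Hp values; assumes the values are not all equal."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]
```

With a 1 bp step the number of windows equals the chromosome length, so a practical implementation would keep a running sum rather than re-summing each window.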

| Genotyping individual fish
To obtain individual-specific genotype distributions and to investi-  Table S2). From each of the populations listed in Table S1, 10-61 individuals were genotyped for both SNPs. The genotyping assays were run on QuantStudio 5 (Thermo Fisher).

| Gene annotation and tissue-specific gene expression analysis
The Atlantic salmon reference gene model GFF file (v. ICSASG_v2) (Lien et al., 2016) was used to identify genes in genomic regions of interest by overlapping the GFF file with BED files containing selected regions using the 'intersect' tool from the Bedtools package (v. 2.26.0) (Quinlan & Hall, 2010). Genes were annotated by performing alignment searches using BLASTP (Altschul et al., 1997) with the amino acid sequences from the reference gene models against the Swiss-Prot database (v. 2015.08.10). Tissue-specific expression profiles of genes in these genomic regions of interest were examined using RNA-Seq data from various salmon tissues obtained from SRA (BioProject ID: PRJNA72713). Briefly, sequence reads were mapped to the gene models using Bowtie2, and read counts were summed for each gene ID and normalized by total mapped read counts.
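The gene-to-region overlap step can be illustrated with a small Python sketch. It is not a replacement for the Bedtools call used in the study; it only shows the coordinate logic, assuming GFF features are 1-based and inclusive while BED regions are 0-based and half-open. The function names, chromosome labels, and gene IDs are hypothetical.

```python
# Illustrative sketch of interval overlap; names and IDs are hypothetical.

def gff_to_half_open(start, end):
    """Convert 1-based inclusive GFF coordinates to 0-based half-open."""
    return start - 1, end


def genes_in_regions(genes, regions):
    """Return IDs of genes overlapping any region by at least 1 bp.

    genes:   list of (chrom, gff_start, gff_end, gene_id)
    regions: list of (chrom, bed_start, bed_end)
    """
    hits = []
    for chrom, g_start, g_end, gene_id in genes:
        s, e = gff_to_half_open(g_start, g_end)
        for r_chrom, r_start, r_end in regions:
            if chrom == r_chrom and s < r_end and r_start < e:
                hits.append(gene_id)
                break   # report each gene once
    return hits
```

Mixing up the two coordinate conventions shifts gene boundaries by one base, which matters mainly for genes that end exactly at a region boundary.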
Heatmaps were made by first discarding genes that had normalized read counts <50 in all assayed tissues, before using J-Express (v. 2012) (Dysvik & Jonassen, 2001) to generate heatmaps using highlevel mean and variance normalization, with complete linkage clustering and Euclidean distance measure. Gene expression in gills of salmon exposed to saltwater for 24 hr was examined using RNA-Seq data obtained from Array Express (accession number E-MTAB-8276), described previously in Iversen et al. (2020). Sequence reads were filtered using Cutadapt with parameters -q 20 -O 8 and -m 40 and mapped to the salmon gene models with Bowtie2 using default settings. DESeq2 (Love et al., 2014) was used to identify differentially expressed genes between fish exposed to saltwater (n = 84) and controls (n = 83) divided into six different sampling points (the fish were approximately 7 months of age at experiment start). Read counts were summed for each gene ID and normalized by total mapped read counts.
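The count normalization and the expression filter applied before plotting can be sketched as below. The scaling constant is not stated in the text, so `scale` is a hypothetical parameter (set here to one million as an assumption), and the function names are illustrative only.

```python
# Illustrative sketch; the scale factor and names are assumptions.

def normalize_counts(counts, totals, scale=1_000_000):
    """Divide per-gene read counts by the total mapped reads of each tissue.

    counts: dict tissue -> dict gene_id -> summed raw read count
    totals: dict tissue -> total mapped read count
    """
    return {
        tissue: {gene: c * scale / totals[tissue]
                 for gene, c in gene_counts.items()}
        for tissue, gene_counts in counts.items()
    }


def expressed_genes(normalized, min_count=50):
    """Drop genes whose normalized count is below min_count in all tissues."""
    genes = set().union(*(t.keys() for t in normalized.values()))
    return sorted(
        g for g in genes
        if any(t.get(g, 0) >= min_count for t in normalized.values())
    )
```

Genes returned by `expressed_genes` would then be passed on to clustering and heatmap generation.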

| Determination of missense SNP ancestral state
Ancestral state of a missense SNP in the candidate gene cadm1 (see Results) was determined by aligning the Cadm1 amino acid reference sequence (accession: XP_013992853) against the refseq_protein database using BLASTP (https://blast.ncbi.nlm.nih.gov), only including matches to teleost fishes (taxid: 32443).

| Genomewide SNP analysis reveals 28 selective sweeps
Identification of differentiated SNPs was based on the difference between average SNP allele frequencies between two groups (dAF).
This allowed us to identify parallel selection on genetic variation in multiple landlocked populations, where SNPs present in the ancestral anadromous populations were subjected to strong positive selection for the same allele after the formation of the landlocked populations.
We used two different thresholds for reporting differentiated SNPs: dAF > 0.5, which resulted in 15,038 SNPs, and dAF > 0.6, resulting in 2,194 SNPs. Regions harboring at least ten differentiated SNPs (dAF > 0.6) in 100 kb sliding genomic windows were regarded as selective sweeps, and genomewide screening revealed 28 sweeps containing many differentiated SNPs, potentially resulting from different selection pressures in the landlocked and anadromous populations (Figure 2a, Table 2).

| Regions with consistently relaxed selection in landlocked salmon
Heterozygosity is commonly used as an index of genetic diversity and can also provide indications of purifying selection that keeps genomic regions from accumulating deleterious mutations. If a gene or region becomes less relevant in a population, it is more likely to accumulate mutations that are not purged from the population. This can be used to identify genomic regions that are under purifying selection due to a conserved function of the genes in that region.
For example, genes that are vital for survival at sea can be expected to accumulate more mutations in landlocked salmon that no longer require that specific function to be maintained since they no longer migrate to the sea, and therefore experience a reduction in selection pressure. Therefore, we aimed to uncover genomic regions and genes that show increased genetic diversity in landlocked salmon compared to anadromous salmon, potentially leading to discovery of genes associated with seawater-related traits relevant for aquaculture, such as resistance to seaborne diseases or smoltification.
The populations were grouped into four groups: Ladoga, Onega, Barents Sea, and White Sea, by calculating the average allele frequency for each SNP marker in each group. Heterozygosity was analyzed using the same parameters as for the sequence data in the present study, and the regions with reduced heterozygosity in anadromous salmon that overlapped with our data are reported. In total, 1,217 regions showed reduced heterozygosity in anadromous populations relative to landlocked populations in that dataset, 16 of which overlapped with the regions showing reduced heterozygosity in anadromous populations in our data (shown in Table 3, and indicated by red dots in Figure 3). Since they are conserved in both datasets, these regions are expected to contain potential candidates for genes that are important for the seawater phase. The 16 regions covered 34 genes and, interestingly, included insulin-like growth factor 1 (igf1) (Figure 4). Igf1 is known to promote the development of salinity tolerance in Atlantic salmon (McCormick, 1996; Sakamoto et al., 1993), and transfer to seawater is associated with increasing plasma levels of Igf1 (McCormick, 2001). Together with growth hormone and cortisol, Igf1 is involved in increasing Na+/K+ ATPase activity in gills in different salmonids to promote seawater tolerance (Bjornsson et al., 1987; Madsen, 1990; McCormick, 1996; Seidelin et al., 1999). Igf1 is also involved in growth regulation of vertebrates including teleost fish (McCormick et al., 1992; Wood et al., 2005), and in farmed Atlantic salmon, SNPs in igf1 have been associated with overall body weight and fillet weight (Tsai et al., 2014). It is therefore possible to speculate that the gene is conserved in anadromous salmon because of its importance in smoltification and seawater growth, which are processes that have become less relevant for landlocked salmon (Nilsen et al., 2008). Another interesting gene showing consistently reduced heterozygosity in anadromous populations was TGF-beta receptor 1 (tgfbr1), which is involved in regulation of many different processes in salmonids (Maehr et al., 2012). It has been shown to have a widespread tissue distribution and is highly expressed in the brain and muscle, as well as in immune-related cells in rainbow trout (Maehr et al., 2012), although in Atlantic salmon the highest expression level was found in ovary (Figure S3). It is also worth noting that of the 16 regions with consistently reduced heterozygosity in anadromous salmon, two of the regions contained paralog regions that were duplicated in the salmonid-specific whole-genome duplication (Lien et al., 2016) (Table 3). This indicates that the genes in these regions are under strong purifying selection in anadromous salmon, which has been relaxed in landlocked salmon. The paralog regions overlapped the genes signal peptidase complex subunit 3 (spcs3), WD repeat domain 17 (wdr17), and ankyrin repeat and SOCS box containing 5 (asb5). Their functions in fish are not well characterized; however, Wdr17 has a function in eyes in mice (Chiang et al., 2020), and there is evidence that spectral sensitivity and eye pigments differ in freshwater and seawater life stages in salmon (Temple et al., 2008). spcs3 and asb5 have been assigned to Reactome pathways (https://reactome.org) such as "Viral mRNA Translation" and "Class I MHC mediated antigen processing and presentation", respectively, suggesting that these might be related to resistance against seaborne diseases.
We also identified regions with consistently reduced ZHp in landlocked populations included on the SNP array data presented in Zueva et al. (2018). In total, 1,274 regions showed consistently reduced ZHp in the landlocked populations (Files S1 and S2), and comparison with ZHp values from the pool-seq data revealed 63 regions with consistently reduced ZHp in both datasets (Table S3). Further, two of the regions overlapped with selective sweeps identified on Chr 9 and 24 (Table 2). It is worth noting that the relatively low number of marker positions on the SNP array compared to the genomic sequence data restricts the analysis to only regions covered by a sufficiently large number of SNPs on the SNP array.

FIGURE 2 Differentiated genomic regions. (a) Manhattan plot showing SNP allele frequency differences (dAF) between landlocked and anadromous populations of Atlantic salmon in the Northern Hemisphere. The x-axis shows chromosomal positions along the salmon genome, and the y-axis shows the difference in allele frequencies between the two groups. SNPs in selective sweep regions (n = 28), identified using a threshold of dAF > 0.6 using 100 kb nonoverlapping genomic windows, are marked in red. (b) Heatmap showing tissue distribution of normalized gene expression of genes in the identified selective sweeps. Green = increased expression, blue = reduced expression. A detailed view of the heatmap including gene IDs is shown in Figure S4. (c) Upregulation of ppargc1a in gills after 24 hr saltwater exposure. The y-axis shows normalized read counts for ppargc1a in salmon gills, and the x-axis shows the sampling points given as number of days since experiment start. Blue indicates salmon challenged with saltwater (SW) for 24 hr and green indicates salmon kept in freshwater (FW). Contrasts between FW and SW were significant at each sampling point (p adj < 4.38E-41).

TABLE 2 Selective sweeps differentiated between anadromous and landlocked populations. Regions harboring ≥ 10 SNPs having dAF values ≥ 0.6 identified in 100 kb nonoverlapping genomic windows.
Note: Gene symbols of genes from the reference annotation are shown as obtained from the annotation against Swiss-Prot, where genes lacking a gene symbol are indicated by "unknown". Sweeps overlapping regions with reduced ZHp in landlocked salmon (n = 7) are presented in bold. Sweeps overlapping regions with reduced ZHp in both our data and the data from Zueva et al. (2018; n = 2) are indicated by †. A detailed description of the genes can be found in File S2.

| Tissue-specific gene expression of genes in selective sweeps
Since genetic variants in the selective sweeps can affect one or more

TABLE 3 Regions with reduced heterozygosity in anadromous salmon. Listing regions showing consistently reduced heterozygosity in anadromous compared to landlocked populations (intersect of our data and the data from Zueva et al. (2018)).
Note: Gene symbols of genes from the reference annotation are shown as obtained from the annotation against Swiss-Prot, where genes lacking a gene symbol are indicated by "unknown". A detailed description of the genes can be found in File S2.

Table 1 predominantly expressed in brain and gonads (Figures 2b and S4, File S2). These gene expression patterns point, although not conclusively, to selection acting on genes related to traits such as immune response, behavior, and reproduction. We also wanted to investigate if we could observe any tissue-specific enrichment for genes under selection. Compared to other tissues, gonad and brain express a large number of genes (Lien et al., 2016; Sonawane et al., 2017), which will cause a bias toward genes expressed in those tissues, making it difficult to identify any potential over-representation of genes under selection in certain tissues. Distribution of tissue-specific gene expression of a representative set of genes selected at random did not differ from that of genes in the sweeps (Figure S5), indicating that such enrichment is either not present, or the large number of genes in the sweeps that are not under selection masks the enrichment.

| Genes in selective sweeps differentially expressed in the gill in response to saltwater
We also screened the sweeps for genes differentially expressed in juvenile fish exposed to saltwater by re-analysis of a recently published RNA-Seq dataset (Iversen et al., 2020) from salmon gills. This revealed that 14 of the genes in the sweeps were differentially expressed (p adj < .001) in at least one sampling point in fish challenged by saltwater for 24 hr at six sampling points over a 110-day period (File S3). Strikingly, it further revealed a highly significant upregulation of pparg coactivator 1 alpha (ppargc1a) at all sampling points (p adj < 4.38E-41, Figure 2c). This gene encodes a transcriptional cofactor located in a sweep on Chr 18 (positions 49,600,000-50,100,000) and is a master regulator of mitochondrial biogenesis and energy expenditure (Fernandez-Marcos & Auwerx, 2011). Mice lacking this gene show reduced mitochondrial respiratory capacity and an increased expression of lipogenic genes (Leone et al., 2005).
Adaptation to seawater is an energy-demanding process (Hoar, 2008) and salmon smolt show elevated respiratory enzyme activity and mitochondrial proliferation (Maxime et al., 1989), suggesting that ppargc1a can be a potential target for selection on salinity tolerance and smoltification.

| The most differentiated selective sweep on Chr 15
The selective sweep with the most differentiated SNPs was found on  Table S1) for the SNP in the 3′ UTR of sertad2 confirmed our observation (Figure 5b).
The gene sertad2 has been shown to modulate adipocyte function, and mice lacking the gene show increased lipolysis (Liew et al., 2013).
If the causative variant affects sertad2 gene regulation differently in landlocked and anadromous salmon, it is possible to imagine a mechanism where reduced expression in landlocked salmon inhibits lipolysis, allowing them to retain their lipid stores, which could be beneficial in a nutrient-poor environment. The gene gpr132 encodes a membrane receptor involved in modulation of several biological processes. In mammals, it is highly expressed in macrophages (Bolick et al., 2009; Chen et al., 2017), where it has been shown to facilitate macrophage M2 activation and to have a pro-inflammatory effect. In the salmon tissue distribution dataset (Figure S4), we observed higher expression in immune-related tissues such as spleen and head kidney, suggesting a possible role of this gene in immune defense in salmon. It is possible that different pathogen or parasite exposure in freshwater and seawater has been a driving force for selection on disease resistance (a topic discussed in more detail by Zueva et al. (2018)). Not much is known about cdca4; however, the gene encodes a regulator of transcriptional activation involved in cell proliferation (Hayashi et al., 2006) and has been shown to interact with p53 to promote apoptosis upon DNA damage (Hsieh et al., 2002; Pang et al., 2019). In humans, tRNA copy number variations can have phenotypic effects (Iben & Maraia, 2014; Kirchner & Ignatova, 2015). Since the two most differentiated SNPs in the sweep were located up- and downstream of a threonine tRNA, it is possible that they affect the transcription of the tRNA and thereby affect phenotypic traits or physiological processes dependent on a certain amount of available threonine tRNA in the cell.

| The selective sweep on Chr 13 contains a missense SNP in cadm1
Because of hitchhiking effects, where polymorphic loci in proxim-  (Zhiling et al., 2008), and mice lacking cadm1 show impaired social interactions and increased anxiety (Takayanagi et al., 2010), in addition to male mice becoming sterile (Fujita et al., 2006). It also has a function in the immune system and has been reported in relation to human herpesvirus 8 (Hunte et al., 2018) and human T-cell lymphotropic virus-1 (Masuda et al., 2010; Pujari et al., 2015). Because Atlantic salmon cadm1 is expressed in several tissues and highly expressed in the brain (Figure S4), it is difficult to speculate what function might be under selection, as behavior, immune response, and reproduction are all potentially relevant traits for adaptation to a life in different environments. Interestingly, it is known that landlocked salmon do not have the nerve innervation of important brain regions thought to be involved in downstream endocrine regulation of smolting. Most teleost fishes have a threonine in the position corresponding to the missense SNP, indicating that this may be the ancestral state; however, both amino acids can be found in different salmonids (File S4).

| Selective sweep on Chr 5 is linked to ISA resistance
We also identified a selective sweep on Chr 5 (positions 8,550,000-8,800,000) which contains a SNP previously found to explain 5.83% of phenotype variation in resistance to infectious salmon anemia (ISA) in commercial Saint John River Atlantic salmon (Holborn et al., 2020). The sweep contains the two genes sh3 domain-containing ring finger 1 (sh3rf1) and kinesin family member 3a (kif3a).
While kif3a is a microtubule motor protein involved in organelle organization and vesicle-mediated transport, sh3rf1 is assigned to the Reactome pathway (https://reactome.org) "Class I MHC mediated antigen processing & presentation" and can regulate T-cell differentiation and activation in mice (Cunningham et al., 2013, 2016).
Sh3rf1 has also been shown to be essential for production and release of HIV-1 in humans (Alroy et al., 2005), suggesting a possible function in disease resistance in Atlantic salmon. Future studies will investigate which genetic variants in this sweep are associated with resistance to ISA.
The selective sweeps presented in this study provide a basis for identification of genetic variants with potential for increasing welfare of farmed animals. However, further studies are required to determine the precise function of genes and genetic variants under selection to be able to evaluate if any of these contribute to life-history traits relevant for aquaculture, including growth, smoltification, and disease resistance. When selective sweeps have been connected to specific tissues, pathways, and traits in salmon, this knowledge can be further used to identify potential targets for introducing genetic variants possibly conferring relevant traits into farmed salmon strains to increase their robustness, for example by the use of marker-assisted breeding or gene editing.

| CONCLUSIONS
We describe genomic regions under divergent selection in anadromous and landlocked populations of Atlantic salmon across the Northern Hemisphere, and we report genes and genetic variants that may be of relevance for improving fish welfare in aquaculture production and for conservation and management related issues.
The analyses were done using pooled whole-genome sequencing of 12 anadromous and 6 landlocked salmon populations, which were used in a large genomewide association study. The study revealed 28 highly differentiated selective sweeps with SNPs close to fixation in all assayed landlocked populations, indicating parallel selection of alleles beneficial for a landlocked life cycle. Among the most interesting selective sweeps, we found gpr132, cdca4, sertad2, and threonine tRNA on Chr 15, cadm1 containing a highly differentiated missense SNP on Chr 13, and ppargc1a on Chr 18, which displays increased expression in gills upon saltwater exposure. Further, we identified regions in the genome where the landlocked salmon show consistent signs of relaxed purifying selection, including the gene igf1, indicating genomic regions containing genes that are important during the seawater phase. Further studies will aim to characterize candidate genes and genotypes from the selective sweeps to pinpoint causative variants with potential for improving welfare in farmed salmon strains and to enhance our understanding of the underlying biology of transition into seawater.

CONFLICT OF INTEREST
None declared.

DISCLAIMERS
Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. government.

DATA AVAILABILITY STATEMENT
All genomic sequence data used in this study have been deposited on SRA with BioProject ID PRJNA627844, with accession numbers for each sequenced pool listed in Table S1.