Ecological niche modelling and population genomics provide insights into the geographic and demographic ‘explosion’ of a non‐indigenous salmonid

Effective management of non‐indigenous species requires knowledge of their dispersal factors and founder events. We aim to identify the main environmental drivers favouring dispersal events along the invasion gradient and to characterize the spatial patterns of genetic diversity in feral populations of the non‐native pink salmon within its epicentre of invasion in Norway.


| INTRODUC TI ON
An ecological explosion means the enormous increase in numbers of some kind of living organism-it may be[…]a green plant like the prickly pear, or an animal like the grey squirrel.I use the word "explosion" deliberately, because it means the bursting out from control forces that were previously held in restraint by other forces.Elton (1958: 15) Biological invasions are a major challenge of global environmental change, where the introduction and establishment of non-indigenous species (NIS) beyond their natural distributional range is occurring at unprecedented rates around the world (Bax et al., 2003;Shackleton et al., 2019;Simberloff et al., 2013).The impact posed by NIS on the communities and ecosystems they invade is substantial and complex (Bellard et al., 2016;Giakoumi et al., 2019;Shackleton et al., 2019;Simberloff et al., 2013), which vary in degree (negative, neutral, positive), magnitude (benign, moderate, severe) and scale (individual, community, ecosystem) (Mrnak et al., 2023).While only a small proportion of introduced NIS become invasive in the recipient habitat, community or ecosystem (Blackburn et al., 2011), invaders can exert severe per-capita effects on local communities, causing detrimental ecological and economic impacts (Giakoumi et al., 2019;Shackleton et al., 2019).Invasive NIS can have an immediate impact on recipient communities through direct ecological interactions, including competition, predation and disease transmission, as well as an indirect impact by modifying habitats (e.g.reduced plant cover, increased soil erosion, increased eutrophication, pollutant transference), community structure and dynamics and possibly disrupting their suitability (Dunlop et al., 2021;Lurgi et al., 2016;Simberloff et al., 2005).
Furthermore, invasive NIS can have large impacts on the genetic integrity of indigenous species through hybridization, introgression and selection, causing further shifts in the evolutionary pathway of native species and posing a threat to species integrity (Allendorf & Lundquist, 2003;Echelle & Echelle, 1997;Thirstrup et al., 2015).As a result, actions that prevent species establishments before they occur as well as post-establishment control measures have become major conservation priorities, with significant investments in invasive species and biosecurity research in the recent years (Mellin et al., 2016;Sandlund et al., 2019).Effective and successful management of NIS requires an understanding of how they were introduced, how they spread and how they are genetically structured.
Ecological niche and species distribution models (SDMs) provide an analytical framework for interpolating and extrapolating species distributions by linking, respectively, physiological (mechanistic models) or distribution data (correlative models) to environmental predictor variables that contribute to a species' survival and propagation (Araújo et al., 2019;Sillero & Barbosa, 2021;Václavík & Meentemeyer, 2009).The objective of SDMs in the context of NIS is to identify the environmental factors that promote or inhibit the establishment of NIS in non-native ranges and to predict their distribution and suitable habitats, as well as to determine their potential for spread factoring in the mode and strength of dispersal capacity (Burgess et al., 2022;El-Barougy et al., 2021;Melo-Merino et al., 2020;Rödder et al., 2009).Moreover, already built SDMs can be projected as well to other scenarios in space and time (Barbosa et al., 2009;Sillero & Barbosa, 2021;Werkowska et al., 2017;Yates et al., 2018) to further uncover broader suitable habitats in the face of ongoing anthropogenic pressures (Burgess et al., 2021(Burgess et al., , 2022;;El-Barougy et al., 2021;Melo-Merino et al., 2020;Rödder et al., 2009).
Such data are essential for making informed management decisions, such as identifying and prioritizing infestation-prone habitats (Melo-Merino et al., 2020) and planning post-establishment interventions such as harvesting, culling (Altenritter et al., 2022;Sandlund et al., 2019) or disrupting dispersal routes through restricting access (e.g.fencing, trapping) in conjunction with harvesting and/or culling (Burgess et al., 2022;Letnic et al., 2015).Therefore, SDMs have become popular tools for nature conservation and management, and their robustness can be increased by incorporating population genetics and genomics data to account for the adaptive capacity of populations distributed across a heterogeneous landscape (Burgess et al., 2021;Bush et al., 2016;Mathieu-Bégné et al., 2021;Scoble & Lowe, 2010).
Applied population genomics can reveal the demographic history of NIS, which can identify and/or validate its founders or putative introduction sources.In combination with historical, ecological and indigenous knowledge, genomic tools can be very informative that SDMs and genomic data can reveal species distribution determinants and provide indicators to aid in post-control measures and potentially inferences about their success.

K E Y W O R D S
adaptive management, invasive bridgehead effect, non-indigenous species, Oncorhynchus gorbuscha, pink salmon, species distribution modelling and influence management decisions by highlighting important information regarding population dynamics and biological invasion processes (Burgess et al., 2021;Hogg et al., 2022;Matheson & McGaughran, 2022).Invading or introduced species often have low genetic diversity in their pioneer populations because of founder effects (Colautti et al., 2017;Dlugosch et al., 2015;Estoup et al., 2016).Since normally there is no hybridization in this first wave of invaders, genetic diversity among these individuals is projected to be much lower than in the founder population.Individuals can be assigned to specific populations based on their genetic make-up, and major disruptions to gene flow can be identified, which is useful information in the fight against subsequent dispersal events along invasion (dispersal) gradients or 'invasion front'.
Pink salmon Oncorhynchus gorbuscha (Walbaum, 1792) is one of six Pacific salmon species in the family Salmonidae (Radchenko et al., 2018;Scott & Crossman, 1973).Pink salmon have a 2-year lifespan, during which they spend about 1 year at sea before returning to native rivers to spawn at age 2 in odd or even years (Quinn, 2005).
The temporal isolation (allochronic) between odd-year and evenyear spawners has led to the evolution of reproductively isolated year-classes, or -lineages (Aspinwall, 1974;Christensen et al., 2021;Quinn, 2005).Genome-wide studies showed that genetic divergence is greater between broodlines rather than geographically (Christensen et al., 2021;Limborg et al., 2014;Tarpey et al., 2018) but not exceeding the general level of inter-population differences in salmonids (Althukov et al., 2000;Sandlund et al., 2019).Pink salmon is an economically valuable, heavily exploited species in its native range, so conservation efforts have been made to maintain its exploitation levels.Pink salmon is a native of the southern Arctic Ocean (between North America and Asia) and the northern Pacific Ocean (between Alaska and British Columbia, on Russian side south to Japan), but it is increasingly expanding outside its native and introduced ranges due to translocation and rising temperature (Brenner et al., 2012;Hendry et al., 2004;Quinn, 2005;Sandlund et al., 2019;Ueda, 2012).
In the late 1950s, pink salmon was transferred from the northern Pacific Ocean and introduced it to the Laurentian Great Lakes and White Sea drainage basins of North-West Russia.Pink salmon has been detected in Norwegian waters since the late 1950s, after the first successful introduction of pink salmon fry in rivers draining to the White Sea.Pink salmon in Norway has a predominantly odd-year broodline, with even-year fish being far less abundant.During the last 5-10 years, the odd-year broodline of the 2-year life salmonid has shown a rapid and possibly climate-driven geographic and demographic expansion, with an unanticipated peak in 2017 in Norway and other parts of the North Atlantic area (Armstrong et al., 2018;Millane et al., 2019;Mo et al., 2018;Sandlund et al., 2019).In Norway, in the year 2017, local initiatives were implemented with the aim of capturing and eradicating the pink salmon (Mo et al., 2018).This action was taken in response to the official designation of the species as an undesirable invasive species, despite the limited scientific evidence available at the time (Jonsson & Jonsson, 2018).However, later in 2021, the pink salmon was observed in rivers from north to south, along the entire coastline of Norway (Berntsen et al., 2022;Diaz Pauli et al., 2023).Secondary spread and establishment to locations other than the original target release area(s) is now an ongoing major challenge for aquatic and natural resource management in Norway and other northern European countries because it becomes impossible to contain unwanted species with a high rate of longdistance straying, such as pink salmon (Brenner et al., 2012;Hendry et al., 2004;Quinn, 2005;Ueda, 2012).
Here, we used SDM and population genomics to investigate mechanisms that shape the distribution of the range-expanding pink salmon in northern Europe.We first conducted SDM using four modelling techniques with varying levels of complexity, which encompassed both regression-based and tree-based machine-learning algorithms, using climatic data from the present to 2050.Then, we used the triple-enzyme restriction site-associated DNA sequencing (3RADseq) approach of Bayona-Vásquez et al. (2019) to genotype high-quality single-nucleotide polymorphisms (SNPs) in several sampling populations collected from different river systems within the invasion hotspot of the pink salmon in Arctic Norway.We specifically aimed to (1) identify the key climatic drivers of pink salmon distribution during its reproductive stage, (2) determine the climatic favourable habitats within and beyond the current non-native and introduced ranges in in northern Europe, (3) determine the extent of projected distributional ranges under a climate scenario by the 2050s, (4) understand how genetic drift and gene flow influenced the spatial genetic patterns of the pink salmon following the 2017 surge and culling events within the invasion hotspot.We anticipate that such baseline data will aid in making genomic-informed management decisions on the species' spatial invasion gradient from northeast (Russia) to north-west (Norway).

| Species distribution modelling
We inferred climatically favourable areas for the pink salmon under current and future climate with ecological niche and species distribution models (SDMs) based on this species' current distribution in Norwegian rivers on presence data obtained during its spawning period.In terms of criteria for handing and processing of response and predictor variables, as well as model creation and evaluation, we followed best-practice standards in ecological niche and species distribution modelling (Araújo et al., 2019;Sillero & Barbosa, 2021).As Appendix S2, we present a description of the six modelling stages used to construct our SDMs in accordance with the ODMAP (Overview, Data, Model, Assessment and Prediction) protocol of Zurell et al. (2020).
Although prior research has emphasized the significance of utilizing both the native and invaded ranges for predicting the spread of invasive species (e.g.Zhang et al., 2020 and references therein), we opted to focus on the invaded range restricted to Norwegian estuarine and freshwater ecosystems.This is because the pink salmon in Atlantic waters were introduced over six decades ago (~30 generations) and in eastern part of North (or North-eastern) Norway, they constitute already established, locally adapted feral populations (Bjerknes & Waag, 1980;Gordeeva et al., 2015;Niemelä et al., 2016;Sandlund et al., 2019;Zubchenko et al., 2004), which are likely themselves the source of additional new introductions rather than from invaders originating from the native introduced range.This self-accelerating process whereby invasion begets invasion, that is, the so-called invasive bridgehead effect, is not that uncommon for highly invasive species (Bertelsmeier & Keller, 2018;Lombaert et al., 2010).Consequently, while global occurrence data may provide for a more accurate assessment of the whole potential invasive range, it is more important from a management standpoint to predict the places most likely to be invaded next (Baquero et al., 2021;Barbet-Massin et al., 2018).Indeed, SDMs built on NIS occurrences only at invaded areas have proven capable of producing accurate forecasts and providing actionable information for the management of invasive species (Baquero et al., 2021;Barbet-Massin et al., 2018;Muñoz & Real, 2006;Pereira et al., 2020).Occurrence and environmental data retrieval and processing and distribution modelling were performed with R v.4.1.3(R Core Team, 2022) using custom R scripts.

| Occurrence data
We collected occurrence data of pink salmon in Norway from the GBIF-the Global Biodiversity Information Facility-biodiversity database (https:// www.gbif.org/ ).These records were complemented with additional information from invasive species monitoring reports (Berntsen et al., 2018(Berntsen et al., , 2020(Berntsen et al., , 2022;;Husa et al., 2022) and peer-reviewed scientific literature (Diaz Pauli et al., 2023;Sandlund et al., 2019).We mapped the cleaned data and visually inspected points of occurrences to identify and remove occurrence data with coordinate uncertainty >10 km, given that our target spatial resolution of grid cells was 10 km × 10 km (i.e. 100 km 2 ).Additionally, we cleaned species occurrence data by removing duplicates, records of 'absence' (with abundance or individual count reported as zero) and occurrences with missing and/or unlikely coordinates.The final database included 300 UTM cell occurrences, where each occupied cell counted as only one presence, regardless of the overall number of observations per cell.This is equivalent to thinning occurrence data at 10-km intervals, and it lowers the impact of sample bias caused by disproportionally extensive survey efforts in few particularly accessible places (Baquero et al., 2021;Pereira et al., 2020).Given the invasion history of pink salmon in Norway, we considered any remaining spatial autocorrelation in the data, that is, 100-km 2 cells having a higher chance of presence because they are close to other 100-km 2 cells with presence, is a natural effect of dispersal processes.Consequently, we did not intentionally remove such cells for downstream distribution modelling (Baquero et al., 2021;Legendre & Legendre, 2012).
Similarly, to assess the probable future distribution in pink salmon invasive range, we utilized the climate projections from WorldClim.
All environmental variables except elevation were obtained from the representative concentration pathway (RCP) 2.6 scenario (minimum emission hypothesis) under the global climate model community earth system model (CCSM4) (Gent et al., 2011) for the period 2050 (average for 2041-2060).
The learning rate is used to govern how much each individual tree contributes to the larger model.The number of nodes in a tree is adjusted by tree complexity, which governs the model's interaction complexity (e.g. if tree complexity is 3, up to three-way interactions can be fit).During preliminary model runs, we evaluated differing combinations of learning rate and tree complexity values, and determined that shrinkage = 0.1 and interaction.depth= 2 were the parsimonious optimum parameters for building the BRT model.

| Model validation
To evaluate model performance, we initially partitioned the data into fivefold cross-validation sets: the grid cells of the study area were split into five groups, or folds, using R package blockcv v.2.1.4(Valavi et al., 2018), with a systematic selection and assignment of spatial blocks into 10 100 × 100 km 2 folds.The tactic of spatially separating our data into training and test data sets enabled us to determine if models perform effectively in both nearby and distant areas, which is especially crucial if models are to be extrapolated in space or time (Baquero et al., 2021;Valavi et al., 2018).This resulted in five different SDMs in which each fold was sequentially left out of the training data and used for validation of model predictions (Fielding & Bell, 1997), where an overall total of 20 models (4 modelling algorithms × 5 replicates) were computed in the cross-validation phase.
We then used the R package modeva v.3.5 (Barbosa et al., 2013) to assess the cross-validation performance of each modelling algorithm.Following Lake et al. ( 2020)'s proposal to employ various and diverse metrics for model evaluation, we selected three metrics that focus on different aspects of model performance: (i) the Area Under the receiver operating characteristic Curve (AUC) that assesses overall discrimination performance, that is, the model's ability to distinguish between presence and absence localities by assigning them higher predicted values; (ii) the True Skill Statistic (TSS), which assesses classification performance by calculating the proportion of correctly classified presences and absences (Allouche et al., 2006), with training prevalence as the threshold value for identifying predicted presences and predicted absences; and (iii) Miller's calibration slope that evaluates model reliability, defined as the total deviation of predicted probabilities from observed occurrence frequencies (Miller et al., 1991;Pearce & Ferrier, 2000).The values of these metrics were then averaged across the five cross-validation folds, and models whose average cross-validation performance fell below an acceptable threshold for any of the model evaluation criteria were eliminated from further analysis (Baquero et al., 2021).This performance threshold was set to 0.7 for AUC, which limits weak predictions (Baquero et al., 2021;Rapacciuolo et al., 2012;Swets, 1988).
Given that AUC can vary from 0 to 1 and TSS can range from −1 to 1, the TSS proportional performance threshold was set to 0.4.We selected a performance criterion of 0.5 above or below the ideal value of 1 for Miller's calibration slope (Baquero et al., 2021).

| Model post-processing and projection
After cross-validation, we projected retained models to a broader Area across the Arctic and North Atlantic Oceans, including the Baltic and Mediterranean Seas, and to a future climate scenario.
Given the sensitivity of model projections to removal of data points (Araújo et al., 2019), models were fitted again using the whole data set (i.e.without excluding validation data) in this instance.
For all regression-based approaches, we transformed projected probability values to prevalence-independent favourability, which is a function of presence probability and the presence-absence ratio of the species in the modelled sample (Real et al., 2006), using the Fav() function of the R package fuzzySim.This eliminates TA B L E 1 Variables used for modelling climatic favourability modelling of pink salmon.the effect of species prevalence, making the predicted values consistent and directly comparable across SDMs, species and time periods (Acevedo & Real, 2012;Real et al., 2017).We further used fuzzySim to calculate the similarity of the predicted habitat suitability and favourability between the best-performing models (MaxEnt, GLM, GAM and BRT) and the ensemble predictions for current and future climate scenarios.We used two niche comparison metrics of Warren et al. (2008), namely Schoener's D index (Schoener, 1968) and Warrant's I index, which range from 0 (no similarity or overlap) to 1 (complete similarity or overlap) (Warren et al., 2008(Warren et al., , 2010)).
Moreover, we examine the extrapolation potential of the SDMs across the North Atlantic and Arctic Oceans using the Multivariate Environmental Similarity Surface (MESS: Elith et al., 2010), with the mess() function of the diSmo v.1.3-9R package (Hijmans et al., 2022).

| Sampling and DNA extraction
Following the 2017 surge and culling events within the pink salmon's invasion hotspot, we launched a small citizen science initiative in 2019 in which local river guards, fishermen and anglers provided tissue or fin clip samples for the species in its putative invasion hotspot.The sampling area was comprised of three river basins (Munkelva, Neidenelva and Klokkerelva) and two sections of a fjord arm, Jarfjord (Jarfjord mouth and Jarfjordbotn), on Varanger Fjord in Sør-Varanger Municipality in Troms and Finnmark county, Norway (Table 2, Figure 1).It is likely that samples collected in fjords are fish that had not migrated to their spawning rivers or were a secondary wave of invaders.Nevertheless, we consider fjords as (unknown) 'river' basins.We used the DNeasy Tissue Kit following the manufacturer's instructions (Qiagen) to extract total genomic DNA from each sample.Then, we quantified all DNA stocks with the Quantus™ Fluorometer using the QuantiFluor® ONE dsDNA System (Promega).Finally, we normalized each DNA sample to 10 ng/μL in nuclease and ion-free water and stored the working stocks at −21°C prior to 3RAD library preparation.

| 3RAD library preparation and sequencing
We prepared RADseq libraries using the Adapterama III library preparation protocol (Bayona-Vásquez et al

| Read mapping and variant calling
After assessing the quality of the sequencing run using faStQC (Andrews, 2010) plus multiQC v.2.31 (Ewels et al., 2016), we processed the data as follows.First, we demultiplexed, cleaned and trimmed the raw sequences with the perl script process_radtags.plincluded as part of StaCkS v.2.51 (Rochette et al., 2019).We ran the script using 'inline_inline' mode, removed reads with an uncalled bases (-c) and discarded reads with low quality scores (-q) with a default slid- Fisheries and Oceans Canada (Christensen et al., 2021).Third, we then constructed the FM-index for the reference genome and individually aligned the cleaned, demultiplexed PE reads from each fish to the indexed reference genome using the BWA-MEM algorithm of BWA v.0.7.17 aligner (Li & Durbin, 2009), excluding reads with a minimum quality score of <30.Alignments were sorted, indexed and read pairs were fixed using tools from the SamtoolS v.1.9suite (Li et al., 2009), and we obtained alignment quality control (QC) statistics for the sorted and indexed alignments using BAMQC as implementing in Qualimap v.2.2.20 (Okonechnikov et al., 2016).Finally, we used the alignments that passed QC to assemble RAD-loci, call SNPs and construct genotypes for individual fish using the perl script ref_map.
pl also implemented in StaCkS, which consisted of two components (gstacks and populations).We ran population analysis in ref_map.pl,with (i) a minimum of 75% of individuals in a population required to process a RAD-locus for that population (-r 0.75), (ii) a minimum of 70% of populations a RAD-locus must be present in to process a locus (-p 4), (iii) discard unpaired reads (--rm-unpaired-reads), (iv) a minimum mapping quality of 20 to consider a read (--min-mapq 20) and (v) a minimum minor allele count (MAC) of 2 required to process an SNP at a RAD-locus (--min-mac 2).

| Data filtering
A recent whole-genome duplication occurred in the common ancestor of salmonids 80 million years ago (Mya) following their separation from Esociformes 125 Mya, and a substantial percentage of the salmonid genome remains duplicated (Allendorf et al., 2015;Lien et al., 2016;MacQueen & Johnston, 2014).Therefore, in the genomes of salmonids, there is a prevalence of paralogous loci, which can result in the occurrence of both disomic and tetrasomic inheritance within the same genome (Gidskehaug et al., 2011;McKinney et al., 2017).This implies that the presence of SNPs originating from paralogous regions of the genome, known as multi-site vari- MSVs, we further filtered out SNPs that were located in unplaced scaffolds, that is, contigs that were not part of the 26 chromosomes of the pink salmon reference genome used in the present study.We used pGdSpider v.2.1.1.5(Lischer & Excoffier, 2012) to transform VCF files into input files for the respective downstream population genomics programs, where applicable.

| Genome scans
Loci under selection are expected to behave differently from neutral loci in terms of population-related patterns of diversification and are thus inappropriate to infer population demographic history (Holderegger et al., 2006;Luikart et al., 2003).Moreover, neutrality was an assumption underlying the population structure models that we used; thus, we investigated whether SNPs were putatively under selection using a population-level method (BayeSCan) and an individual-level method (pCadapt).For BayeSCan v.2.1 (Foll, 2012;Foll & Gaggiotti, 2008), we set the prior odds (pr_odds) at 10,000 after explorative runs (pr_odds = 100; 1000), which is appropriate for the number of markers in our data sets (Foll, 2012;Lotterhos & Whitlock, 2015), ran the model using 10,000 iterations with a thinning interval of 10, a burn-in of 200,000 steps and 20 pilot runs of 5000 iterations.Then, we assessed convergence by visual inspection of trace plots, and additional diagnostic tests were carried out to confirm convergence such as obtaining estimates of autocorrelation and effective sample size, Geweke's convergence diagnostic and Geweke-Brooks plots (Geweke, 1991) with the R-package Coda v.0.19.4 (Plummer et al., 2006).For further post-processing of BayeSCan output in R, we defined an FDR q-value threshold of 0.05 and defined the respective locus-specific effects (α) and q combinations to classify SNPs: (i) positive α and q ≤ 0.05 as suggestive of diversifying selection, (ii) negative α and q ≥ 0.05 as suggestive of balancing selection and (iii) positive α and q ≥ 0.05 as suggestive of neutrality.For pCadapt v.4.3.3 (Luu et al., 2017;Privé et al., 2020), we examined scree plots to determine background population structure as K principal components and used the first 10 components that captured the majority of population structure in the data.To control the false positive, we set the false discovery rate (FDR) threshold value to 0.05.While BayeSCan is suitable for our population-level sampling design (Foll, 2012;Foll & Gaggiotti, 2008), pCadapt is more reliable for species with complex, hierarchical population structure and are less sensitive to admixed individuals and outliers in the data (Luu et al., 2017;Privé et al., 2020).Thus, to derive a putatively neutral SNP data set for estimating population genetics parameters for population structure and gene flow analyses, we removed outliers identified in any of the two methods to be putatively under selection.

| Genetic diversity and differentiation
To characterize genetic diversity, we calculated average observed heterozygosity (H O ) and within population gene diversity (H s ), Wright's inbreeding coefficient (F IS ) and a population-specific index of differentiation relative to all other sampling populations (β WT , Weir & Goudet, 2017) using the R package hierfstat v.0.5-11 (Goudet, 2005;Goudet & Jombart, 2022).We computed the 95% confidence intervals (CIs) of F IS and β WT values using the bootstrap method (1000 bootstraps) as implemented in hierfstat.We also calculated the number of private alleles at each locus and sampling population with the function 'privateAlleles' of the R package stratag v.2.5.01 (Archer et al., 2017).We calculated genome-wide pairwise F ST estimates using the Weir and Cockerham (1984) method with the function 'pairwise.WCfst' in hierfstat, and computed the 95% CIs of F ST values using the bootstrap function 'boot.ppfst'(nboot = 1000) as implemented in hierfstat.Indices were considered significant if the 95% CI did not include zero.

| Genetic clustering analysis
To investigate population clustering, we first computed a matrix of Nei's genetic distance D A (Nei, 1972;Nei & Takezaki, 1983) for each pair of populations with the function 'stamppNeisD' of the R package Stampp v.1.6.3 (Pembleton et al., 2013).The distance matrix was then used to perform a principal coordinate analysis (PCoA) using the function 'pcoa' of the R package ape v.5.et al., 2018).We first inferred a co-ancestry matrix with the script RADpainter.Subsequently, clustering was done with the Markov chain Monte Carlo method of fineradStruCture, running for 500,000 generations and sampling every 1000 generations; the first 200,000 generations were discarded as burn-in (non-default parameters: -x 200,000 -y 300,000 -z 1000).We also inferred a tree for visualization with fineradStruCture using the tree-building algorithm of Lawson et al. (2012) with 10,000 attempts (nondefault parameters: -mT -x 10,000).fineradStruCture results were plotted with R scripts included in the fineradStruCture package.

| Gene flow analysis
To estimate contemporary gene flow among river basins, we employed BayeSaSS3-SnpS v.1.1 (Mussmann et al., 2019), an expansion of BayesAss v.3.0.4 (Wilson & Rannala, 2003) that facilitates the use of large amounts of SNPs and automates the fine-tuning of model parameters (Mussmann et al., 2019).We performed a first round of analyses with the auto-tuning tool, BA3-SNPs-autotune, developed as part of BA3-SNPs to extensively find optimal mixing parameters for each run with short exploratory runs of BA3-SNPS.
We performed a maximum of 10 repetitions, each time discarding 10,000 generations as burn-in and running for 100,000 generations.The number of repetitions required to find optimal mixing parameters was recorded for each, and mixing parameters verified to produce adequate MCMC acceptance rates (i.e.0.2 < acceptance rate < 0.6).Subsequently, we conducted the final analysis by running three replicates for 10,000,000 MCMC generations, sampling every 1000 generations and discarding the first 1000,000 generations as burn-in.Convergence of parameter estimations was checked with coda.

| Species distribution modelling
The selected uncorrelated variables (Table 1) included mean diurnal range (BIO2), isothermality (BIO3), maximum temperature of warmest month (BIO5), mean temperature of wettest quarter (BIO8), precipitation seasonality (BIO15), precipitation of warmest quarter (BIO18) and elevation (Elev).The contributions of these variables were largely consistent in the four selected models, and all generated SDMs were accurate and performant, where the cross-validation means of AUC, TSS and Miller's slope across five folds ranged from 0.793 to 0.807, 0.482 to 0.547 and 0.575 to 0.961, respectively (Figure S1).We found that MAXENT suitability predictions provided essentially similar spatial patterns to those of favourability from GLM, GAM and BRT, with which they were highly correlated (r 2 > .85; Figure S2).S1).

| Population genomics
We generated a total of 332,338,509 raw paired-end reads from the  similar to the full-SNP data sets (Table 3b).Interestingly, there was an increasing trend in β WT from north-east to north-west, where western populations had nearly a twofold higher β WT compared to eastern populations.
The global F ST for the full-SNP data set was 0.0003, and there was shallow population differentiation with pairwise F ST ranging from 0.002 to 0.005 and 0 to 0.003 (Table 4).The 250-SNP panel uncovered population differentiation congruent with geographic spatial pattern, with pairwise F ST ranging from 0.054 to 0.224 (global F ST = 0.114).We found that pairwise F ST was lowest among western population comparisons (KKE, MKE and NDE) and highest for the eastern versus western population comparisons and between western populations (JAF and JFB) (Table 4).Interestingly, all pairwise F ST including NDE was shallow, except for JAF versus NDE pairwise comparison (Table 4).
We explored genetic clustering using several independent methods, and we did not detect genetic structuring with the full SNP data (data not shown) but with the subset data set of 250 diagnostic SNP ('ancestry-informative' SNPs, hereafter).and MaxMedK) all identified K = 4 as the most likely number of genetic clusters (Figure S5).Notably, the genetic composition of NDE was largely composed of admixed pink salmon (Figure 3b).
Last, the fineradStruCture co-ancestry between genotype pairs uncovered clustering of genotypes by river basin, although the NDE genotypes were weakly clustered where the separation into two subgroups was not well supported (Figure 4a).Most individuals from NDE had a genomic admixture profile and corroborated ADMIXTURE analyses.
Recent migration rates estimated using the BA3-SNPs program were highly consistent between three independent runs of the program.The average migration rates across the three runs were low and asymmetrical among river basins, except for the high migration rates (>0.2) from NDE to all other rivers (Figure 4b,c).

| Climate favourability and potential invasion hotspots
Based on predictions from our SDMs, we uncovered that Norway in the latter half of August or the first part of September (Gjelland & Sandlund, 2012;Niemelä et al., 2016;Diaz Pauli et al., 2023;Sandlund et al., 2019).Furthermore, the spawning grounds of the pink salmon are frequently in the lower regions of rivers (~50 km) and within the tidal zone in short coastal streams (Heard, 1991;Sandlund et al., 2019), though this can vary to >100 km upstream (Niemelä et al., 2016;Sandlund et al., 2019 and references therein).Moreover, the projection of our models in space revealed larger favourable areas for pink salmon in the Pan-Atlantic, which corresponded to confirmed occurrences of the species in the Fennoscandian Peninsula (Gordeeva et al., 2015;Niemelä et al., 2016;Staveley & Ahlbeck Bergendahl, 2022;Zubchenko et al., 2004), Svalbard (Hindar et al., 2020;Sandlund et al., 2019), Greenland (Nielsen et al., 2020), Ireland (Millane et al., 2019;Whelan, 2017), Faroe Islands (Eliasen & Johannesen, 2021), Scotland (Armstrong et al., 2018;Skóra, Jones, et al., 2023a) and Iceland (Gudjonsson & Scarnecchia, 2009;Skóra, Guðbergsson, et al., 2023b;Whelan, 2017).This integration of our models with additional information from the primary literature provides high confidence in the robustness of the provided predictions of habitat suitability and climatic favourability for pink salmon in its non-native range.Climate change is expected to accelerate the current spatial invasion-front and expand the species' potential ranges towards southern Europe and the northwest over the next three decades, according to our future projected models, if no effective post-control measures are implemented.Nonetheless, it has not escaped our notice that our models implicitly rely on ecological niche concepts; consequently, we assume that the pink salmon occurs everywhere where environmental conditions are favourable and that dispersal is not a limiting factor (Jeschke & Strayer, 2008), which does not apply to NIS (Václavík & Meentemeyer, 2009).
Indeed, NIS are frequently absent from specific locations not due to poor habitat quality, but because the species has not dispersed to that location due to stochastic events, geographical barriers and dispersal constraints (Araújo et al., 2005;Araújo & Guisan, 2006;Higgins et al., 1999).As a result, we must exercise caution when employing SDM projections in the context of future climate change.
Nevertheless, considering the pink salmon's propensity to stray and homing behaviour, along with its widespread distribution beyond its introduced and non-native habitats, we argue that the aforementioned conceptual issue with ecological niche concepts is unlikely to skew our future SDM projections for the species.In fact, all known pink salmon occurrences were strongly supported by our predictions, where the species' introduced range in the White Sea region was well supported in all of our space and time model projections.

| Genetic diversity, genetic structuring and connectivity
Human-mediated introductions, as well as the subsequent establishment and spread of NIS, have the potential to cause a founder or bottleneck effect, resulting in low genetic diversity and an increased likelihood of inbreeding (Colautti et al., 2017;Dlugosch et al., 2015;Estoup et al., 2016;Hänfling, 2007).Strikingly, our results revealed that genome-wide genetic variation in the 2019 odd-year cohort of non-native populations of the pink salmon were higher than those of native populations in the North Pacific Ocean (Tarpey et al., 2018).
With a panel of 16,681 SNPs, Tarpey et al. (2018) found that the mean H O and H E per population (n = 7) for the odd-year broodline ranged from 0.154 to 0.172 and 0.156 to 0.172, respectively, with a global F ST of 0.033 (95% CI: 0.032-0.035).Moreover, we found negligible levels of inbreeding within river basins.Several studies demonstrated a similar trend in that introduced, non-native populations can exhibit a genetic paradox in which they can pass through the population bottleneck with minimal genetic diversity loss, exhibit no negative signs of low genetic diversity or inbreeding, rapidly adapt to their novel environment and successfully establish a self-sustaining population in that novel environment (Bodt et al., 2020;Buchholz et al., 2023;García et al., 2017;Kim et al., 2019;McCann et al., 2014;Wang et al., 2019).
Together, these results add to the growing body of knowledge that a founding event is not necessarily linked to a paucity of genetic diversity in invasive species (Gillis et al., 2009;Hänfling, 2007;Roman & Darling, 2007).Furthermore, based on the 'neutral' full-SNP data set, our results likely reflect that pink salmon in Norway may have originated from a single admixed source population.However, the spatial genetic structure inferred with ancestry-informative SNPs uncovered the presence of four ancestral genetic clusters corresponding to river basins, where eastern sampling locations were the most divergent.These genetic clusters are likely the outcome of substantial genetic drift caused by previous successive founder effects during range expansion, combined with the intense culling events in 2017.Furthermore, it has not escaped our notice that the invasive bridgehead effect could be at play here (Bertelsmeier & Keller, 2018;Lombaert et al., 2010).Although we did not have access to contemporary 'native'/introduced populations to explicitly investigate the bridgehead effect in our study system, the contemporary gene flow rates revealed that the unmanaged river network NDE is likely the main source of invaders.Our results did not indicate distancerestricted gene flow based on limited dispersal capability but rather suggested homing behaviour with some degree of straying, where easternmost sampling locations were the most divergent.

| Genome-enabled invasive species management
The management of invasive species requires knowledge of regional-scale dispersal processes and their associated genetic diversity patterns.Results from our population genomic analysis of pink salmon in Varanger Fjord located within the Sør-Varanger Municipality provide actionable information for management officials particularly in terms of major dispersal pathways and the likely source of invaders within the municipality.Genetic clusters aligned fairly well to 'river' basins and to the broad fjord arm divisions of the Varanger Fjord, where these boundaries might provide directly actionable units for municipal management efforts.
For instance, river of importance for migratory salmonids within the Neiden/Munkefjord was found to harbour feral populations of pink salmon with the lowest levels of genetic structure and highest within-population genetic variation.Moreover, our results uncovered that these rivers likely constitute distinct genetic backgrounds and pools of standing genomic variation.
Remarkably, a small panel of 11 'ancestry-informative', outlier SNPs revealed pronounced genetic differentiation between earlyrun and late-run spawners in odd-year lineage of the pink salmon in its native range (Zelenina et al., 2023).These findings supported the hypothesis that the run of pink salmon may be bimodal and driven by temporal separation in the hatching times for offspring of early and late spawners in its native range (Lennox et al., 2023;Taylor, 1980).Interestingly, a similar bimodal run of the odd-year lineage was also reported in several rivers within the invasion centre in Finnmark (Bjerknes, 1977;Lennox et al., 2023).The postcontrol measures in Finnmark involve the use of river trap systems in the lower parts of rivers and fishing nets (Berntsen et al., 2022), but so far the timing of the establishment and removal of these restricting access measures has often allowed the arrival of both early-run and late-run spawners in the local rivers (personal correspondence with local river guards).Consequently, the current study serves as a benchmark for integrating genomics to answer questions that inform management decisions.

| CON CLUS IONS
To our knowledge, this study provided the first assessment of ecological niche and species distribution models (SDMs) and population genomics of the pink salmon outside its native distribution.
Our integrative framework of SDMs and population genomics window of 15% of the length of the read and raw Phred score of TA B L E 2 Information about localities (geographic coordinates of the river mouth, catchment area and mean annual flow at the outlet of rivers;Sandlund et al., 2019) and sample sizes (N) for pink salmon from the respective Norwegian rivers and fjord.rescued sequence tags (internal tags) and RAD-tags (enzyme over-hang) within two mismatches of their expected sequence (-r); otherwise, reads were discarded.We truncated (-t) PE150 reads to 140 nt to have equal length among all reads with different barcodes.Second, we downloaded the chromosomal-level reference genome of the pink salmon (accession number GCA_017355495.1; assembly version: Ogor_1.0 (odd year); scaffolds: 20,642; contigs: 20,663; N50: 1.8 Mb; L50: 386) from NCBI Genome database (https:// www.ncbi.nlm.nih.gov/ genome/ ) under BioProject PRJNA556728, ants (MSVs), can lead to significant distortions in the process of SNP calling and biased population genomics estimates(Nadukkalam Ravindran et al., 2018).Therefore, we base our data filtering on the prior work of McKinney et al. (2017) to filter out multiple copy loci F I G U R E 1 (a) Sampling map for the pink salmon Oncorhynchus gorbuscha in Varanger Fjord, Norway, and image of a pink salmon obtained from Wiki for Fishing Planet (https:// wiki.fishi ngpla net.com/ Pink_ Salmon).(b) The eastern-most Varanger Fjord arm Bugøyfjord, where sampling river network Klokkerelva (orange point with white outline) drains.(c) The two adjacent Neiden-and Munkefjord arms of Varanger Fjord, where the central-eastern sampling river networks Neidenelva (purple point with white outline) and Munkelva (red point with white outline) discharge.(d) The western-most Varanger Fjord arm Jarfjord along which pink salmon was collected at two sampling sites within the upper and lower ends, 'Jarfjord' (blue point with white outline) and Jarfjordbotn (green point with white outline), respectively.(duplicatedSNPs, MSVs) from the StaCkS filtered SNP data set.To achieve this, we used the filtering procedures and custom scripts available in StaCkS WorkfloW v.2 (https:// github.com/ enorm andeau/ stacks_ workflow).A detailed account of the filtering procedures can be found within the Appendix S1.After filtering out Figure S2).Moreover, favourability remained fairly high across the current invasive distribution range of pink salmon, with model projections across the Arctic and North Atlantic Oceans, including the Baltic and Mediterranean Seas, revealing a high number of suitable habitats (Figure2), many of which had previously reported pink salmon occurrences, including the Fennoscandian Peninsula, First, the PCoA revealed geographic clustering of pink salmon from different river basins, where samples from respective river basins formed individual clusters except for Neidenelva (NDE).Overall PCoA clustering patterns revealed at least five genetic clusters based on the first three principal components (PCs), which together accounted for 48.7% of the genetic variation (Figure 3a).Second, the postprocessing of the ADMIXTURE results based on lowest average CV F I G U R E 2 Space and time projections of habitat suitability for top performing species distribution models for pink salmon Oncorhynchus gorbuscha.Colour scale ranges from darker purple/blue-least suitable-through darker green to brighter yellow/green-most suitable.MAXENT habitat suitability and favourability (FAV) from GLM, GAM and BRT models for space (a) and (b) time projections.Shaded semitransparent grey overlays depict multivariate environmental similarity surfaces (MESS) maps for environmental dissimilarity of projection layers relative to modelled layers, where predictions in those areas should be treated with strong caution.TA B L E 3 Genetic variation descriptors of pink salmon within each sampling river basin and fjord, based on the full 3RAD SNP data set (a) and the diagnostic SNP data sets of 250 SNPs based on the highest locus-specific F ST (b): number of individuals (N), number of private alleles (A P ), allelic richness (A R ), observed heterozygosity (H O ), gene diversity (H S ), inbreeding coefficient (F IS ) and sampling population-specific F ST (β WT ).
score across replicates identified K = 3 as the most likely number of genetic clusters (FigureS4), which was consistent with the number of clusters based on first two principal components and discriminant analysis of the PCoA (Figures 3b).
However, the four estimators of the Puechmaille method (MedMeaK, MaxMeaK, MedMedK harbours a high proportion of climatically favourable areas for the pink salmon, where the northernmost distribution and invasion centre in Finnmark was confirmed as the invasion hotspot.Our models identified that the main environmental drivers for the ecological explosion of the pink salmon were temperature-related (BIO2, BIO3, BIO5 and BIO8) and precipitation-related (BIO15 and BIO18) variables.Additionally, elevation (Elev) was identified as a key driver for habitat suitability, where lower altitudinal areas had the highest favourability.In their non-native range, these main environmental drivers coincide with the seasonal period of arrival and river ascent of adult pink salmon back into their natal rivers for spawning.In Norway, pink salmon migrate up into the rivers primarily from mid-June to late July/early August, where spawning then usually occurs F I G U R E 3 Clustering analyses for a global picture of genetic cluster patterns of pink salmon individuals from Varanger Fjord across five river basins based on 250 genome-wide SNPs.(a) Principal coordinate analysis (PCoA) and (b) admixture clustering analyses.CLUMPAKaveraged Bayesian clustering (ADMIXTURE) outputs for 20 independent runs of K = 2-5 for pink salmon individuals from Finnmark.Numbers below each K-value show the proportion of runs that converged to solution presented.A single vertical line represents each individual; a black line separates sample sites; the whole sample is divided into K colours representing the number of clusters inferred.The colours show the estimated individual proportions of cluster membership.

F
I G U R E 4 (a) fineradStruCture pairwise co-ancestry matrix and tree based on 250 genome-wide, diagnostic SNPs.Outer tick marks represent individuals, but labels have been removed and replaced with a coloured bar corresponding to river basin of origin for visualization.The central matric panel shows co-ancestry between genotypes, with light yellow indicating low co-ancestry, and darker yellows, oranges and reds to black indicating progressively higher co-ancestry.Solid black squares indicate the four major clades corresponding to individual river basins and the direction of invasion-front from east to west.Dashed squares indicate the admixed NDE river basin and the likely two sub-clades.(b) Circular migration plot exhibiting, for each river basin, the fraction of migrant individuals (size of the band) per generation and its origin.(c) The insert table depicts mean migration rate between river basins colour coded from low as green to high as red.
provided a general paradigm for understanding climatically favourable areas, population differentiation and dispersal patterns involved in inter-basin straying for the pink salmon in its northernmost invasion and current distribution in Norway.Our study uncovered novel and important results regarding feral population of pink salmon following the 2017 surge and culling events.First, we uncover the invasion hotspots for the pink salmon in Norway and the broader the Pan-Atlantic under current and future climates with SDMs.Second, we found that genome-wide patterns of genetic diversity within populations varied little between river basins but varied with an increasing trend from north-east to north-west when using a subset of 250 ancestry-informative SNPs.Third, our results revealed that the 2019 cohort of the pink salmon constitute of at least five genetic clusters that follow a hierarchical geographic pattern.Last, contemporary migration rates were asymmetrical with Neidenelva (NDE) serving as a source population of invaders in the ongoing secondary spread of the pink salmon in Norway, or at least one of the source areas.These findings derived from our SDMs and population genomic analysis of pink salmon furnish management officials with practical insights, specifically concerning climatically favourable areas, significant dispersal routes and the probable origin of intruders within the of 96 pink salmon samples.Processing of raw Illumina data by the program process_radtags resulted in the removal of a mean insert length of 515.1 bp (±187.6 bp) and a mean effective per-sample coverage of 16x (min = 5.8x, max = 38.0x).After the initial SNP filtering with the populations program, we retained 165,231 RAD loci composed of 181,572 SNPs.After secondary filtering, thinning step in StaCkS WorkfloW, purging closely related individuals and removal of MSVs and SNPs within non-chromosomal scaffolds, we retained a final panel of 43,719 polymorphic SNPs (genotyping rate = 0.98; n = 73; Figure S3, TableS2).We detect 9356 putative SNPs under balancing selection and zero SNP putatively under divergent selection with BayeSCan and 643 putative SNPs under divergent selection with pCadapt, where 140 outlier SNPs overlapped between the two approaches.We removed all WT within rivers was among pink salmon in the western-most river KKE (Klokkerelva; 0.005, 95% CI: 0.001-0.008).With the 250-SNP data set, H O and H S ranged from 0.23 to 0.272 and 0.233 to 0.268, and F IS values ranging from −0.058 to −0.065 (Table3b).The observed A P ranged 0-4.The overall β WT index was 0.235 (95% CI: 0.180-0.290),where pink salmon in KKE had a larger β WT Pairwise F ST among sampling populations of pink salmon estimated using the full and the diagnostic SNP data sets.ST -values for full data set are below the diagonal and those of the diagnostic data set are above the diagonal.