Post‐glacial establishment of locally adapted fish populations over a steep salinity gradient

Studies of colonization of new habitats that appear from rapidly changing environments are interesting and highly relevant to our understanding of divergence and speciation. Here, we analyse phenotypic and genetic variation involved in the successful establishment of a marine fish (sand goby, Pomatoschistus minutus) over a steep salinity drop from 35 PSU in the North Sea (NE Atlantic) to two PSU in the inner parts of the post‐glacial Baltic Sea. We first show that populations are adapted to local salinity in a key reproductive trait, the proportion of motile sperm. Thereafter, we show that genome variation at 22,190 single nucleotide polymorphisms (SNPs) shows strong differentiation among populations along the gradient. Sequences containing outlier SNPs and transcriptome sequences, mapped to a draft genome, reveal associations with genes with relevant functions for adaptation in this environment but without overall evidence of functional enrichment. The many contigs involved suggest polygenic differentiation. We trace the origin of this differentiation using demographic modelling and find the most likely scenario is that at least part of the genetic differentiation is older than the Baltic Sea and is a result of isolation of two lineages prior to the current contact over the North Sea–Baltic Sea transition zone.


| INTRODUC TI ON
Genetic divergence of populations following establishment in new environments provides important insights into fundamental evolutionary processes, such as local adaptation, hybridization, speciation and post-glacial recolonization (Abbott et al., 2013;Hewitt, 2011;Schluter, 2009). In particular, recent colonization of a new environment gives several advantages for study; one being that the potential source populations are still present and available for analysis. A second advantage is that the accumulation of new genetic variation from new mutations and the continuous loss of genetic variation through genetic drift will have had less impact compared to populations established a long time ago.
For recent events, most of the observed genetic differentiation between the source and the new population is likely to be a consequence of sorting of genetic variation that is already present in the source population(s) by the altered selection pressure in the new environment (Barrett & Schluter, 2008). Today, genome-wide scans of genetic variation are used to analyse the genetic background of recent colonization events and, if complemented with experimental studies at the ecological and phenotypic level, will contribute understanding of when and how new environments are colonized.
Strong phenotypic and genetic changes over small-scale environmental shifts are common in a range of plant and invertebrate populations (Soria-Carrasco et al., 2014;Twyford & Friedman, 2015;Westram et al., 2018). Emblematic examples of such shifts in vertebrates include populations of three-spined stickleback Gasterosteus aculeatus invading recently formed habitats of lakes and streams over extremely short periods of time (50-150 years) (Lescak et al., 2015;Marques et al., 2016) and the repeated evolution of resistant populations of the Atlantic killifish Fundulus heteroclitus in polluted habitats (Whitehead, Clark, Reid, & Hahn, 2017). An opportunity to study strong divergence of populations at much larger spatial scales is to focus on post-glacial establishment in habitats that became available after the retreat of the last glacial ice cap in northern Europe and North America. Studies of colonization of post-glacial lakes have repeatedly suggested that large-scale phenotypic and genetic divergence is involved in local adaptation both among and within lakes (Bernatchez, Laporte, Perrier, Sirios, & Bernatchez, 2016;Gagnaire, Pavey, Normandeau, & Bernatchez, 2013;Wilson et al., 2004).
The transformation of the post-glacial Ancylus Lake into the brackish water Baltic Sea some 8-10,000 years ago (Björck, 1995;Zillén, Conley, Andrén, Andrén, & Björck, 2008) offers opportunities to study genetic differentiation in a number of marine species colonizing the new basin. After the post-glacial formation of the Baltic Sea initially at a salinity ~1/2 of ocean salinity, but currently of less than 1/3, some 10% of marine species present in the North Sea outside the Baltic Sea have colonized the new environment and established local populations and formed contact zones with the North Sea populations (Johannesson & André, 2006). The majority of the successful colonisers for which we have genetic data show rather dramatic genetic shifts coinciding with the steepest part of the salinity gradient southeast of the Danish straits (Barth et al., 2019;Johannesson & André, 2006;Le Moan, Gaggiotti, et al., 2019;Luttikhuizen, Drent, Peijnenburg, van der Veer, & Johannesson, 2012;Martinez-Barrio et al., 2016). However, there are also examples of shifts taking place further inside the Baltic Sea (e.g., Momigliano et al., 2017). Notably, some Baltic populations originate from recently derived Pacific lineages (Nikula, Strelkov, & Väinölä, 2007;Väinölä & Strelkov, 2011), whereas others have North Atlantic origins that had, prior to establishment in the Baltic Sea, been isolated in different refugia for some time (Le Moan, Gaggiotti, et al., 2019), or diverged very rapidly during an initial phase of isolation during establishment in the Baltic Sea (Momigliano et al., 2017). The sand goby (Pomatoschistus minutus Pallas, 1770), a small marine shallow-water annual fish, has successfully colonized the complete North Sea-Baltic Sea salinity gradient (Kullander, Nyman, Jilg, & Delling, 2012;Wiederholm, 1987). As detailed below, salinity is likely to affect a multitude of traits in this species, including physiological, ecological and behavioural traits. The effects of salinity on reproductive traits might be particularly important, as one expects these traits to be under strong selection compared to other traits (e.g., Hendry, 2016). With external fertilization, gametes are also likely to be adapted to the local habitat (Browne et al., 2015;Griffin et al., 1998;Morisawa, 2008), and in the case of sand goby, it seems that salinity only affects the males' ability to reproduce since eggs will develop and hatch regardless of female origin as long as the male spawns in his native salinity (Svensson et al., 2017).
Here, we investigate whether the establishment of sand goby in the brackish water Baltic Sea involves adaptive genomic divergence and local adaptation by combining analyses of genetic variation with experimental tests of sperm motility, which is a key reproductive trait (Gage et al., 2004;Gasparini, Simmons, Beveridge, & Evans, 2010).

| Study species
The sand goby is a small annual fish distributed in shallow seas and estuaries in the NE Atlantic Ocean from northern Norway (Tromsø) to southern Spain (Guadalquivir estuary), the northern Mediterranean Sea and the western Black Sea (Kullander et al., 2012;Miller, 1986). It is established in the Baltic Sea at least up to Luleå and Oulu archipelagoes in the Bothnian Bay (Kullander et al., 2012), where salinity is as low as 1.8-3.8 PSU (Andersen et al., 2017). During reproduction, goby males provide paternal F I G U R E 1 (a) Map of the North Sea-Baltic Sea transition zone with sampling sites. Kristineberg is the site for the sand goby reference genome individual. For details on sampling and site information, see Table S1. (b) Plot of principal component analysis (PCA) of all 22,190 sand goby SNP genotypes. Per cent of variation explained is 2.4982% on the x-axis and 0.9237% on the y-axis. (c) PCA plot of the significant outlier SNPs (n = 314) from BayeScEnv. Per cent of variation explained is 21.115% on the x-axis and 3.379% on the y-axis care (Breder & Rosen, 1966). The sand goby male builds a nest under a substrate (e.g., bivalve shell, stone) and females attach their eggs to the ceiling of the nest (Forsgren, 1999). Both sexes breed repeatedly during their single breeding season. A male typically cares for eggs of multiple females and parasitic spawning is common (Järvi-Laturi, Lindström, Kvarnemo, & Svensson, 2011;Singer, Kvarnemo, Lindström, & Svensson, 2006;Svensson & Kvarnemo, 2007).
Ecologically, gobies play an important role in the marine ecosystem both in the Baltic Sea and in the Atlantic. For example, seasonally some 50% of pelagic fish larvae in the northern Baltic proper are gobies (Parmanne & Lindström, 2003) with the sand goby being the most abundant (Ojaveer, Gross, Laur, Arula, & Klais, 2017).
Sand gobies are benthic and as they are prey for many pelagic fish (Salvanes & Nordeide, 1993), they form an important trophic link between the benthic ecosystem and pelagic ecosystem.
Salinity has a direct impact on osmoregulation and many other traits of sand goby. Geographical differences in salinity also affect sand goby parasite composition (Zander & Kesting, 1998).
Although sperm competition remains remarkably similar in the two environments, despite an expectation that nest site shortage in the Baltic Sea would have forced more males to resort to parasitic spawnings in that area Singer et al., 2006

| Sperm motility in relation to salinity
Reproductively mature males were collected from Härnösand and Kristineberg on 29 May 2012 ( Figure 1a, Table S1). The fish were transported to the laboratory in temperature-controlled coolers filled with local water to within 24 hr of being caught. All fish were dissected 2 days after they had been captured. Only males with breeding colouration were included in the study (n = 8 from Kristineberg; n = 8 from Härnösand).
Each male was euthanized (hit in the head followed by decapitation), immediately dissected, and had both testes removed.
One testis was put into liquid nitrogen for later gene expression analysis. The other testis was used to provide a sperm sample for analysis of sperm motility (proportion motile sperm), assayed at high and low salinity. Before dissection, two tubes were prepared for each male, each with either 750 µl of high salinity water (31 PSU, made from filtered natural seawater) or 750 µl of low salinity water (six PSU, made from the same filtered natural water, but diluted with distilled water). Tubes were submerged in water of 15.5 ± 0.5°C.
The assay of sperm motility followed Havenhand and Schlegel (2009 Videos were post-processed and analysed with ImageJ using the CASA plugin (Wilson-Leedy & Ingermann, 2007) to determine the proportion of motile sperm (i.e., sperm moving faster than 15 µm/s on average). The effect of male origin on the proportion of motile sperm was tested with analysis of variance (ANOVA). We compared the males among themselves, with sperm of each male being tested at both high salinity and low salinity. Salinity was therefore used as repeated factor within each male. To achieve normality and homogenous variances, the proportion motile sperm was arcsin square root transformed before analysis.

| Sample collection for DNA analyses
The fish used for the reference genome sequencing was a single in-  Table S1). We processed 25 individuals per population for genotyping. Sex was recorded for 89 of these 200 fish, and 42% were female and 58% were male. Since the sex-determining mechanism is unknown, genotypic sexing was not possible. All fish were caught in sandy and shallow habitat from mid-spring to late-autumn.
Each individual was euthanized in an overdose of MS222 or a hard hit on the head followed by decapitation. Samples were stored in alcohol prior to analyses. This study was performed under Swedish ethical permits 135-2010 and 143-2012.

| Laboratory methods
For the genome sequenced individual, genomic DNA was extracted using a simplified CTAB-protocol (Panova et al., 2016). In short, fragments of fins and muscles were digested in CTAB-buffer, mercaptoe- and 5 µl of Proteinase K (Promega) for 1.5-2 hr at 65°C. Proteins were precipitated with 150 µl 6M NaCl and then centrifuged at 11,000 g for 30 min. After centrifuging, the supernatant was removed to a new tube and isopropanol was added in 1:1 ratio by volume. The samples were inverted several times to mix, and DNA was allowed to precipitate for 15-30 min. Samples were then centrifuged at 13,000 g for 15 min at 4°C. The supernatant was removed, and 300 µl of ice cold 70% EtOH was added to wash the pellet.
Samples were then centrifuged at 4°C for 5 min at 13,000 g. The ethanol was carefully removed, and the pellet was allowed to dry at RT, after which 30 µl of nuclease-free H 2 O (SIGMA) was added to the tube. Quantity and quality of the DNA samples were estimated using the Nanodrop 1000× (Thermo Scientific, Vantaa, Finland) and by electrophoresis on a 1.5% agarose gel.
RNA was extracted from tissues with phenol-based phase separation (Tri-reagent, Ambion, Life Technologies) following the Trizol-protocol from SIGMA with minor modifications as follows:
Sequences were processed with cutadapt v1.12 (Martin, 2011) to remove partial adapters and bad quality sequences. Paired reads with one read shorter than 40 bases were also removed. Flash v1.2.11 (Magoč & Salzberg, 2011) was used to merge paired-end reads that had 10 or more bases overlapping. Lighter v1.1.1 (Song, Florea, & Langmead, 2014) was then used to correct sequencing errors via kmer correction. Sequences were scanned using a kmer length of 23 and an estimated genome size of 0.9 GB. Reads were trimmed or discarded if they could not be corrected. A python script (https://github.com/enorm andea u/Scrip ts/blob/maste r/fastq Combi nePai redEnd.py) was used to re-pair fastq files since Lighter may remove one sequence of a pair without removing the mate.
Soapdenovo2 v2.04 (Luo et al., 2012) was then used to assemble a draft genome using kmer size 29, and GapCloser v1.12 was used to fill in the gaps in the scaffolds. Input sequence libraries, insert sizes, read numbers and the Soapdenovo2 config file are available in the Supplementary Information.

| Variant calling for genotyping by sequencing
The reads generated for the GBS analysis where demultiplexed and barcodes were removed using a custom Perl script. Low-quality bases and leftover adapter sequences were then removed using cutadapt v1.8 (Martin, 2011). Paired-end reads for each individual were aligned to the sand goby reference genome using Bowtie2 v2.3.2 (Langmead & Salzberg, 2012). The SAM files were subsequently converted to sorted BAM files using samtools v0.1.19 (Li et al., 2009), and variants were called using the samtools mpileup function with extended BAQ computation, in combination with bcftools v0.1.19 (Li et al., 2009

| Annotation of transcriptomes
RNA sequences from testicular tissue from one individual collected at Kristineberg in 2012 and brain, liver, spleen and ovary from two individuals (brain from one and liver, spleen and ovary from the other) The logic for this procedure is that frameshifts are often introduced into the transcript sequence during the Trinity assembly. By using the strand information provided by the btop string to adjust the frameshift, a single open reading frame can be obtained from the transcript. We parsed the query sequence according to the btop string, thus removing any frameshifts and then translated them using the tool fastatranslate (in reading frame one) from Exonerate v2.2. CD-HIT v4.6.8 (Li & Godzik, 2006) was used to cluster the transcript sequences in order to remove some of the redundant amino acid sequences (one representative sequence per cluster where retained). The resulting sequences were used in Orthofinder to determine the most likely orthologues (Emms & Kelly, 2015). In order to identify orthogroups in Orthofinder, we used the Ensembl 96 protein databases for tigertail seahorse (H. comes), three-spined stickleback, zebrafish, and mudskipper as well as the Uniprot reference proteome (release 2019_4) for zebrafish.

| Differential expression in testes
Testes from the fish in the sperm motility experiment were compared for differential expression. Sequences obtained from the RNA of the testes were processed using Cutadapt v1.8 (Martin, 2011) to remove partial adapter sequences and bad quality bases, as well as read pairs when one of the pairs was less than 40 bases. Reads were mapped to the sand goby draft genome using STAR v2.7.2 (Dobin et al., 2013). A reference genome index was built in STAR using the testes transcriptome from the multiple tissue samples (above). This sample was used for annotation because the sequences were generated using a stranded library protocol whereas the testis samples from the salinity experiment were not. The reads from this one sample were mapped to the genome using STAR with a pair-end overlap threshold of 10 and other default parameters. Cufflinks v2.2.1 (Trapnell et al., 2010) was used to assemble the transcripts and create the gene transfer format (GTF) file. A new genome index was then created using this GTF file.
All testis samples from the salinity experiment were mapped using this genome index, and the splice junction files were retrieved. Splice junction files were concatenated and filtered by choosing only junctions that were supported by more than five uniquely mapped reads in one individual. All samples were then mapped again using both the gtf file and the splice junction file, and gene-level read counts were determined using the quantMode GeneCounts option. EdgeR v3.5.2 (Robinson, McCarthy, & Smyth, 2010) was used to normalize expression using the TMM method (Robinson & Oshlack, 2010) and remove genes with low expression (minimum count = 25, minimum total count = 100). Then, a linear model, limma trend analysis (Law, Chen, Shi, & Smyth, 2014), was implemented using the R package limma v3.5.1 (Ritchie et al., 2015) to assess differential expression.
Transcripts were considered differentially expressed if an adjusted p-value less than 0.05 was observed (Benjamini & Hochberg, 1995).
For transcripts for which no orthogroup was identified, a search was performed using BLASTX, with default parameters and the nonredundant protein database (nr) (https://blast.ncbi.nlm.nih.gov/Blast.cgi).

| Population differentiation
Gene diversity (expected heterozygosity) was calculated and averaged over loci for each of the eight samples. Pairwise F ST values (Weir & Cockerham, 1984) between all populations were calculated using ARLEQUIN v3.5 (Excoffier & Lischer, 2010), and their significance assessed using 10,000 permutation steps. Significance was adjusted for multiple testing using FDR at q = 0.05 (Benjamini & Hochberg, 1995).
Genetic isolation by geographic distance was estimated by correlating the most direct waterway distances calculated with Google earth and pairwise genetic differentiation, (Rousset, 1997).
Individual ancestry and the number of genetic clusters (K) were assessed using STRUCTURE v2.3.4 (Pritchard, Stephens, & Donnelly, 2000), allowing admixture, with uncorrelated allele frequencies but no locprior. Three replicates of 50,000 Monte Carlo Markov chain (MCMC) iterations (discarding the first 10,000 iterations as burn-in) were performed, testing for K = 1-5. Convergence was assumed by observing consistent results in all three replicates.

| Inference of the demographic history
We

| Genetic variation associated with salinity
To detect high F ST outliers potentially underlying traits involved in local adaptation, we used two methods. First, the Bayesian regression approach implemented in Bayescan v2.1. (Foll & Gaggiotti, 2008), using default settings with the following exceptions: 75,000 Markov chain Monte Carlo (MCMC) iterations with thinning 15 and prior odds 1,000. False discovery rate (FDR) cut-off was set at q = 0.05 (Benjamini & Hochberg, 1995). Convergence of the MCMC chain was assessed using the software CODA (Plummer, Best, Cowles, & Vines, 2006). Second, we used the frequentist approach FDIST2 by Beaumont and Nichols (1996) northern Baltic Sea (Rairanta and Härnösand). The number of coalescent simulations performed was 50,000 and we used an FDR cutoff at q = 0.05.
In addition to the high F ST outliers, we identified outlier loci associated with salinity, using BayeScEnv v1.1 (de Villmereuil & Gaggiotti, 2015). This software is an extension of Bayescan and can test hypotheses for specific drivers of local adaptation. We used default settings and prior jump 10, thin 15 and FDR q = 0.05.
Convergence of the MCMC chain was assessed using CODA (Plummer et al., 2006).
Outlier SNP positions from the BayeScENv analysis were con-

| Availability of data and material
The head of the sequenced sand goby male has been deposited as  (Rougeux et al., 2017), the likelihood of the model (lik.), the population mutation rate (Θ), the effective population size of the North Sea and the Baltic Sea populations (N eNS and N eBS ), the growth factor (b NS and b BS ), the per generation migrants (m NS>BS and m NS>BS ), the reduced per generation migrants (mr NS>BS and mr NS>BS ), the time of split (T S ) and the time of secondary contact or time of ancestral migration (T SCorAM ). All parameter estimates are relative to Θ (see δaδi manual for more explanation), and see Suppl.

| Sperm motility: effects of male origin and salinity
Sperm motility (proportion of sperm that were motile) was affected by test salinity and male origin, as shown by a significant interaction between the two factors. Specifically, a larger proportion of sperm were motile in the salinity that matched with the male origin than in the reciprocal salinity (repeated measures ANOVA: male origin * test salinity: F 1,14 = 114.45, p < .001; Figure 3).

| Population structure
Heterozygosity levels were slightly lower in the North Sea, but overall similar for the eight populations (  (Figures 1b and 4, and Figure S2). The main split overlaps with the steepest salinity shift (Figure 1a). The two main populations were further subdivided into an eastern and a southern cluster of subpopulations in the North Sea, and a northern (with only one subpopulation, Härnösand) and a south-eastern cluster in the Baltic Sea ( Figure 1b). Notably, the differentiation between the Härnösand subpopulation and the other Baltic Sea subpopulations largely disappeared when only outlier loci were analysed, suggesting that this difference was more related to genetic drift effects than to divergent selection ( Figure 1c). If plotted on waterway distance, the population structure corroborates a pattern expected from isolation by distance (r 2 = .87; p < .001; Figure S1).

| Demographic history
Both model comparisons (Westdiep vs. Härnösand, and Westdiep vs. Kalmar) showed that the most likely scenario describing the origin of the North Sea-Baltic Sea divergence involved past isolation followed by a secondary period of gene flow with heterogeneous migration rate along the genome (i.e. secondary contact with heterogeneous migration, SC2m) (Table 1 and Figure S3). Under such scenario, the isolation period was estimated to be 10 to 15 times longer than the period of secondary contact (ratio T split /T sc ). By using a generation time of 1 year and a mutation rate of 10 -8 , the divergence started 200kya (Table S5), whereas the secondary contact period was estimated to have happened 17 kya (15 kya for WE vs. HA and 20 kya for We vs. KA, Table 1 and Table S5). Interestingly, the population of Härnösand (inner Baltic Sea) was inferred to be half of the size of Westdiep (400k vs. 200k, Table S5 (Table 1) and provides nearly as good fit as the SC2m scenarios for the Westdiep versus Härnösand comparison (Δ AIC < 2, fit Figure 5). This model suggests that the Baltic Sea was initiated by a founder population of 10% of the ancestral population that experienced an exponential growth at a factor of 16 (Table 1). In this model, the split between the North Sea lineages and the Baltic Sea lineages was estimated to happen about 300 kya (Table S5), which predates the last glacial maximum and the opening of the Baltic Sea about 8,000 years ago.

| Outliers
A significant number of outliers were identified, 252 (q < 0.05) from the Arlequin hierarchical analysis, 231 from the Bayescan analysis and 314 from the BayeScEnv environmental association analysis. Of these, 148 were shared between all three methods ( Figure 6). The pairwise differentiation for the outliers was about ten times higher as compared to the differentiation using all SNPs (Table S4b) were found to be close to outlier SNPs (Table S6). Twenty-one BayescEnv outliers with high statistical significance (q < 0.0001) were, however, not adjacent to genes identified in our transcriptome analyses.

| Testes gene expression: effect of male origin
Differential transcript expression in testicular tissue compared between the Härnösand and Kristineberg populations revealed only seven differentially expressed genes, and only three of these seem to be protein-coding genes (serine/threonine-protein phosphatase 2A 56 kDa regulatory subunit beta isoform, translocator protein, and one uncharacterized protein). The other four genes did not appear to be protein coding since they lack unambiguous open reading frames and a significant similarity to known protein-coding genes found in NCBI.

| Population structure, demographic history and outlier loci
Our study identified two distinct populations of sand goby in the NE Atlantic Ocean, a fully marine population living in the North Sea and a brackish water population living in the Baltic Sea. In addition, we found differentiation among subpopulations within each basin and an overall pattern of isolation by distance, both supporting substructure within each population. Nevertheless, most differentiation is present over the saltwater-brackish water transition at the entrance of the Baltic Sea (Figures 1 and 4), and a large number of outlier loci correlate with this salinity transition (Table S6). Distinguishing secondary contact and isolation with migration models is difficult, and with multiple models that fit almost similarly well, interpretations should be done with caution, also taking into account that other alternative models may fit better than the models currently being compared. This said the demographic inference of both models (divergence with and without gene flow) suggests that the separation of the two populations started long before the opening of the Baltic Sea. The interpretation of this is that two lineages have colonized the North Sea-Baltic Sea from different refugia and thereafter formed a contact zone with genetic exchange. Currently, the contact zone between the populations overlaps with the steepest part of the salinity gradient in the Baltic Sea. This demographic history has led to a semi-permeable barrier to gene flow, inferred by strong support for a model with heterogeneous gene flow, which are classically associated with recent events of speciation (Wu, 2001). We found several markers within genes or closely linked to genes that were much more differentiated than most other loci over the contact zone, suggesting that genes underlying functional adaption involved in, for example osmoregulation, vision and reproduction, are segregating over the salinity gradient. Differences in sperm motility that we found suggest local adaptation to the different salinities in this key reproductive trait, although only slight evidence of differential adaptation was evident in the male gonad transcriptome data.
A history of colonization of the Baltic Sea by gradual transition from the North Sea sand goby population was previously suggested based on the finding that many haplotypes were shared between the North Sea and Baltic Sea groups (Gysels, Hellemans, Patarnello, & Volckaert, 2004;Larmuseau, Raeymaekers, Ruddick, Van Houdt, & Volckaert, 2009;Larmuseau, Vancampenhout, Raeymaekers, Van Houdt, & Volckaert, 2010). Here, models with gradual transition from the North Sea provide good fit of the data only when they account for a strong bottleneck followed by rapid population growth. However, limited evidence of such strong demographic effect appeared in the data, as the diversity of the Baltic Sea populations is similar to those of the North Sea populations (Table S3).  (Larmuseau et al., 2010) might be either ancestrally shared variation, or a consequence of recent gene flow over the contact zone (Gagnaire et al., 2015). The importance of secondary gene flow in the structure of the Baltic Sea is corroborated in our inferences by lower proportion of the genome resisting to gene flow in the population close to the transition zone.
However, the presence of ancestral divergence does not reject the possibility of in situ genetic differentiation due to divergent selection over the salinity transition from standing variation, as divergence can be very efficient, in particular, if high numbers of genes are contained in large haplotype blocks or inversions (Faria, Johannesson, Butlin, & Westram, 2019). In addition, following a mixing of gene pools due to introgression between two ancestral lineages might lead to evolution of new and unique lineages (Marques, Meier, & Seehausen, 2019), as observed in Baltic blue mussels and Baltic clams (Nikula et al., 2007;Väinölä & Strelkov, 2011).

| Genetic differentiation involves multiple traits
The large number of outlier loci and the lack of functional enrichment suggest the involvement of several physiological processes in adaptation to the Baltic Sea. For example, among outliers, we found several genes related to ion transport located in close proximity (within 4 kb) to outlier SNPs. For instance, we identified: solute carrier family 5 member 3 (slc5a3), a well-known cotransporter that affects osmolyte levels in mammals (Kwon et al., 1992); acid-sensing ion channel 2 (asic2), which is involved in sodium ion transport (Price, Snyder, & Welsh, 1996); adenylate cyclase type 8 (adcy8), which is involved in the regulation of cytosolic calcium ion concentration and water balance (Steel & Anderson, 2002). Additionally, we found SNPs near genes related to cellular water balance, such as aquaporin-4 (aqp4) and guanine nucleotide-binding protein subunit beta-4 (GNB4). The latter is a molecule that provides energy for vasopressin and thus affects membrane permeability allowing for water transport through aquaporin channels (Moeller, Fenton, Zeuthen, & Macaulay, 2009 The myo-inositol synthesis pathway is linked to osmoregulation Kalujnaia, Hazon, & Cramb, 2016;Sacchi, Li, Villarreal, Gardell, & Kültz, 2013) and upregulated in response to increased salinity. Myo-inositol is a small molecule used to adjust cellular osmolality, and by increasing the concentration, organisms can cope with hyperosmotic stress (Burg, Ferraris, & Dmitrieva, 2007).
We found the inositol-3-phosphate synthase 1 (isyna1) gene located on one end of scaffold 315. The gene has three outlier SNPs, and one of the few differentially expressed transcripts in sand goby testis tissue is located immediately downstream of this gene on the antisense strand. Increasing evidence of antisense noncoding transcripts involved in transcriptional regulation (Pelechano & Steinmetz, 2013) suggests this transcript is involved in regulation of the isyna1 gene. Additionally, in humans, the highest expression of this gene is found in testicular tissue (Guan, Dai, & Shechter, 2003), suggesting an important role in reproduction. Another gene involved in inositol metabolism is inositol hexakisphosphate and diphosphoinositol-pentakisphosphate kinase (ppip5k1b) which is also adjacent to an outlier SNP in another scaffold. A third gene related to inositol is the previously mentioned slc5a3b gene, which is a sodium/myo-inositol cotransporter. Whether ancient or relatively recent, this broad spectrum of genetic variation in osmoregulatory genes seems to make the sand goby well prepared for establishment in a truly marginal marine habitat.
Genes linked to outliers were enriched for the Uniprot key word 'mitochondrion' and more than 10 genes linked to outlier SNPs are involved in mitochondrial functions (  (Hill, 2017).
We also found that many outlier loci were related to immune function (Table S5) Since habitats vary dramatically in pathogen communities, it is not surprising that immune genes are important for local adaptation to the Baltic Sea (Eizaguirre, Lenz, Kalbe, & Milinski, 2012).
An additional outlier was the visual perception gene, phosphodiesterase 6H, cGMP-specific, cone, gamma, paralog a (PDE6HA) that likely has a role in visual adaptation in sand goby (Shimizu-Matsumoto et al., 1996). Sand gobies are sight predators active at low light levels (Larmuseau et al., 2009). Both day length and sea ice cover, which both correlate with the salinity gradient, affect the visual environment during the sand goby life cycle. Sea ice restricts the penetration of light and reduces turbidity from wind-induced mixing (Thomas et al., 2017).

| Reproductive incompatibilities
Limited gene flow due to selection against immigrants through inviability, infecundity or reproductive dysfunction is likely to facilitate genetic differentiation between populations inhabiting different habitats (Lemel, Belichon, Clobert, & Hochberg, 1997;Nosil, 2012;Nosil, Vines, & Funk, 2005;Svensson et al., 2017). We found clear signs of local adaptation to the native salinity in sperm motility, with a much larger proportion of sperm being motile in the salinity that matched the male's origin. These results are further supported by sperm velocity showing similar response (Svensson et al., 2017). We cannot completely rule out that acclimation may have affected our results, since sperm were sampled from fish that were kept in original salinity. However, other results support immigrant reproductive dysfunction in other traits between individuals of the same sand goby populations and salinities as tested here, after 4.5-8 months of acclimation to treatment salinities, (Svensson et al., 2017). Also, the round goby, Neogobius melanostomus, shows similarly strong effects of salinity on sperm motility and velocity, immediately and after 2 months of acclimation (Green, 2020). Since sperm are freeswimming single cells, all the osmoregulation genes discussed above would be relevant for sperm adaptation to salinity. Additionally, several genes involved in microtubule formation are present among the genes adjacent to outlier SNPs. Microtubules are an important component of sperm cell structure as well as motility. Rho-kinase 2 (rock2) was identified adjacent to an outlier SNP and has a role in regulating intracellular pH during the acrosome reaction in sea urchin (de la Sancha et al., 2007).

| CON CLUS ION
Isolation prior to establishment over the North Sea-Baltic Sea salinity gradient, followed by secondary contact with gene flow and divergent selection can explain the observed genetic structure of North Sea and Baltic Sea populations of the sand goby. Whether all genetic divergence is ancient or some divergence is more recent, the current Baltic Sea lineage is well adapted to the low salinity in traits related to osmoregulation, immunity, vision and reproduction. Many loci also appear to be involved in these traits, but the specific functional mechanism (e.g., coding sequence, regulatory loci) remains to be clarified. We conclude that the first steps on the speciation continuum trajectory have been taken and show interesting parallelisms with other taxa inhabiting the North Sea-Baltic Sea transition.

DATA AVA I L A B I L I T Y S TAT E M E N T
The draft scaffolded genome assembly is available in Dryad: https:// doi.org/10.5061/dryad.rfj6q 5780.