Adaptive genetic variation distinguishes Chilean blue mussels (Mytilus chilensis) from different marine environments

Abstract Chilean mussel populations have been thought to be panmictic with limited genetic structure. Genotyping‐by‐sequencing approaches have enabled investigation of genomewide variation that may better distinguish populations that have evolved in different environments. We investigated neutral and adaptive genetic variation in Mytilus from six locations in southern Chile with 1240 SNPs obtained with RAD‐seq. Differentiation among locations with 891 neutral SNPs was low (FST = 0.005). Higher differentiation was obtained with a panel of 58 putative outlier SNPs (FST = 0.114) indicating the potential for local adaptation. This panel identified clusters of genetically related individuals and demonstrated that much of the differentiation (~92%) could be attributed to the three major regions and environments: extreme conditions in Patagonia, inner bay influenced by aquaculture (Reloncaví), and outer bay (Chiloé Island). Patagonia samples were most distinct, but additional analysis carried out excluding this collection also revealed adaptive divergence between inner and outer bay samples. The four locations within Reloncaví area were most similar with all panels of markers, likely due to similar environments, high gene flow by aquaculture practices, and low geographical distance. Our results and the SNP markers developed will be a powerful tool supporting management and programs of this harvested species.


Introduction
Genetic data have been widely used in recent years to facilitate delineation of management units; however, this has been difficult for species with high gene flow and low differentiation (Waples 1998). Most marine species of fish and invertebrates have historically been thought to be panmictic with limited genetic structure due to pelagic larvae that are distributed over wide geographical areas by ocean currents (e.g., Grosberg and Cunningham 2001). Thus, it has been particularly challenging to identify management units for marine species (Allendorf et al. 2010), but recent studies have demonstrated the importance of studying genomewide variation for improving estimates of fine-scale population structure (Sanford and Kelly 2011;Limborg et al. 2012;Milano et al. 2014).
Recent advances in sequencing technology have possible to perform population genetic analyses in nonmodel species using thousands of SNP markers obtained by genotyping by sequencing (GBS) of reduced representation libraries like restriction-site-associated DNA (RAD-seq; Baird et al. 2008). Large panels of SNP markers allow researchers to test for patterns of neutral and adaptive genetic variation related to population structure and local adaptation. High F ST loci that are putative outliers have also been shown to be highly informative to determine the geographical origin of individuals (Ogden 2011;Ogden and Linacre 2015). This approach has been useful to improve resolution of fine-scale population structure of various nonmodel species (Larson et al. 2014) and provide insight into life-history variation within species (Hess et al. 2013;Pujolar et al. 2014).
Marine mussels of the genus Mytilus are widely distributed throughout all the oceans in the world and are typically found in cold and temperate waters of both hemispheres following an antitropical distribution pattern (G erard et al. 2008). Like many marine invertebrates, mussel larvae have a planktonic lifetime of weeks or months, and thereby can be potentially dispersed over large geographical areas by marine currents or humanmediated activities (Hilbish et al. 2002). Due to external fertilization, hybrid zones are often observed when different species of mussels inhabit the same geographical area (Hilbish et al. 2002;Toro et al. 2002Toro et al. , 2004a. In southern Chile, Toro et al. (2005) detected the presence of alleles from M. galloprovincialis and M. edulis using the Me 15-16 nuclear marker developed by Inoue et al. (1995). However, they did not perform the subsequent restriction analysis developed by Santaclara et al. (2006) to discriminate the allele of M. galloprovincialis from M. chilensis, confounding both species and consequently overestimating the frequency of M. galloprovincialis allele in southern Chile. Currently, it is generally accepted that the presence of M. galloprovincialis in Chile is restricted to the Arauco Gulf (S 37°06 0 16″, W 73°21 0 33″) (Tarifeño et al. 2012). Additionally, M. trossulus alleles were first observed in Chile by Larra ın et al. (2012), and a subsequent study reported in a hybrid zone of M. chilensis, M. edulis, and M. trossulus in the Magellan Strait surrounding the international port of Punta Arenas (Oyarz un et al. 2016).
Chilean blue mussel (M. chilensis) is an important commercial species in Chile, and it is distributed along approximately 1900 km of the Chilean coast from the Arauco Gulf (35°S) in the north to Cape Horn (55°S) in the south (Hern andez and Gonz alez 1976). Genetic diversity and population structure of this economically important species are unresolved, but critical for supporting management and conservation programs of this harvested species. Genetic structure of M. chilensis has been explored using five RAPD primers (54 presumptive domi-nant loci), and with seven and 26 allozyme loci, in populations from Arauco (35°S) to Punta Arenas (53°S) (Toro et al. 2004b(Toro et al. , 2006C arcamo et al. 2005), finding no evidence of discrete stocks (0.011 ≤ F ST ≤ 0.055), with the possible exception of an austral population from Punta Arenas (53°S). Studies with microsatellite markers have found similar results of limited genetic structure (Larra ın et al. 2014. All of this evidence indicates low genetic differentiation among populations of Chilean blue mussel, likely due to dispersion of long-lived (45 days) planktotrophic larvae across the Chilean coastline by coastal currents that homogenize populations (Toro et al. 2006).
In this study, we used RAD-seq to genotype 1240 SNPs in 190 M. chilensis individuals collected from six locations in southern Chile, including areas used as seed collection centers for the local mussel aquaculture industry. This study was designed to investigate patterns of neutral versus adaptive genetic variation within this species and identify a subset of genetic markers that could improve the ability to trace individuals to their population of origin, especially in areas with strong aquaculture activities.

Samples collection and preparation
Samples of M. chilensis were collected in 2009 from subtidal zones in six different locations, to capture diversity of the southern distribution of Chilean blue mussels in Chile, especially in areas used as seed collection centers for the local aquaculture industry (Table 1 and Fig. 1). For mussel species, "seeds" refer to individuals in the juvenile phase of their life cycle. We included samples from four locations in the Reloncav ı Gulf (Quillaipe, Caleta La Arena, Canutillar, and Pichicolo; Fig. 1), one location from southern area of Chilo e Island (Canal Coldita) and one population from the southern Patagonia area (Isla Peel; previously identified to be most genetically differentiated from the others) (Toro et al. 2004b(Toro et al. , 2006. All samples were seeds with a shell size of 15-25 mm with the exception of Patagonia samples, which were adults. Mantel tissue was collected from all individuals and stored in ethanol.

DNA extraction and species identification
Mantle edge tissue (50-100 mg) was used to extract DNA using the phenol-chloroform method described by Larra ın et al. (2012). Extracted DNA was quantified using the Quant-iT dsDNA pico-green assay kit (Thermo Fisher Scientific Inc, Waltham, MA) in 96-well plates and a Perkin Elmer Victor 5 plate reader. Since some of the collected individuals were too small (15-25 mm) to identify phenotypic species, these samples were tested with diag-nostic species markers as described by Larra ın et al.  (Santaclara et al. 2006). Three specimens (1.6% of samples) were identified as hybrids and were removed from the study.

Prediction of expected number of RADtags
Number of RADtag was predicted using the R package SimRAD (Lepais and Weir 2014) using the haploid   Araneda et al. (2015).

RAD library preparation and sequencing
Restriction-site-associated DNA (RAD) libraries were prepared with the restriction enzyme SbfI following the methods of Hess et al. (2013) and sequenced on an Illumina â HiSeq 1500 genetic analyzer (Illumina Inc., San Diego, CA). Two libraries were constructed containing 96 individuals per library and 500 ng DNA per individual. In each library, pooled individuals were tagged by ligation of one of 96 unique 6-bp barcode adapter (P1 adapter) to the SbfI site, and 4 lg was sheared by sonication using a Bioruptor UCD-300 instrument (Diagenode â , Denville, NJ). After sonication, each library was concentrated using the Qiagen MinElute PCR purification kit (Qiagen â , Venlo, Netherlands) and submitted to AMPure bead purification and size selection (Agencourt â , Beckman Coulter, Brea, CA). Size selection was performed on each library by first binding DNA fragments larger than 700 bp with a 2:1 ratio of DNA to beads, followed by a 1:1 ratio of DNA to suspension buffer (binding DNA fragments larger than 200 bp) in order to yield final products with DNA fragments ranging from 200 to 700 bp. Prior to sequencing, RADtag libraries were quantified by quantitative PCR (Kappa Biosystems Inc, Woburn, MA) on an ABI 7900HT Sequence Detection System (Thermo Fisher Scientific Inc, Waltham, MA). The libraries were then sequenced in a total of four lanes (two each) by single-end 100 base reads using an Illumina â HiSeq 1500 sequencer (Illumina Inc., San Diego, CA). The total number of reads obtained was 822,488,647, of which 738,252,110 passed the quality filters and were retained for SNP identification (89.76%). The number of sequences obtained per individual ranged from 155,007 to 21,512,094 with an average of 4,236,976 reads. Two individuals from Pichicolo were excluded from subsequent analysis due to low number of quality reads (<807,570).

Bioinformatics
Software package, version 1.08 (Catchen et al. 2011(Catchen et al. , 2013, was used for SNP discovery and genotyping. All 100-bp reads were quality-filtered (Phred 33), trimmed to 75 bp at the 3 0 end, and demultiplexed based on barcodes using "process_radtags" in STACKS. Prior to running "ustacks" for SNP discovery, all samples were standardized to 3.5 million reads to assure that each individual was contributing in same proportion of the overall variation in sequence (Hecht et al. 2013). Discovery of SNPs with the "ustacks" module included setting the "m" parameter to 10 (minimum depth of coverage required to create a stack) and using the "snp model type." A catalog of SNPs was created with the "cstacks" module including two individuals from every location, and this catalog was used to genotype each individual with the "sstacks" module. Finally, the "population" module was used to produce a single Genepop formatted file containing 190 individuals from six populations. Several quality filters were applied during the SNP discovery process in order to remove putative false SNPs and paralogs. First, SNPs with genotyping success lower than 70% and minor allele frequency (MAF) less than 5% were removed, retaining 4305 SNPs for further steps. Second, 2140 SNPs were removed because they were not genotyped in all six populations. Third, only one bi-allelic SNP per RADtag was permitted, removing another 865 SNPs. Fourth, to filter out potential paralogous markers, 60 SNPs were removed because they showed significant deviations from Hardy-Weinberg equilibrium (HWE; Genepop 4.2; Raymond and Rousset 1995;Rousset 2008) in three or more locations with FDR-BY corrected of critical level of 0.005238 (Narum 2006). This combination of filters resulted in a remaining panel of 1240 SNPs genotyped on 190 individuals.

Detection of loci under positive selection and neutral loci
Two scenarios using LOSITAN (Antao et al. 2008) were analyzed to produce different sets of candidate SNPs. Scenario 1 was performed including all six locations and scenario 2 excluded the most divergent collection (Isla Peel samples from Patagonia zone) and keeping only five locations in the geographical area used by the Chilean aquaculture industry as a collection site (Fig. 1, Table 1). In both scenarios, LOSITAN was run using FDIST2 method (Beaumont and Nichols 1996) with 1,000,000 simulations and confidence interval (CI) 0.995, false discovery rate of 0.1, and subsample size of 50. Simulated neutral F ST values were 0.01166 and 0.00597 for scenarios 1 and 2, respectively. In order to reduce false positives, loci were considered candidates under positive selection above a probability level of 0.995, and neutral loci were defined as falling between intervals of 10-90% of the F ST distribution. Underlying population structure was not considered to be a substantial factor since global F ST was near zero across all loci (F ST = 0.005).

Population structure and assignment tests
Population analyses were performed independently with three panels of markers: putatively neutral loci and two panels of outlier loci obtained from scenarios 1 and 2. A discriminant analysis of principal component (DAPC) using the R package adegenet 3.1-1 (Jombart 2008; Jombart and Ahmed 2011) was performed and the number of clusters (K) was identified using the lowest Bayesian information criteria (BIC). A cluster is an abstract object composed by genetically similar individuals that are not necessarily coincident with collecting locations. Global and pairwise F ST values (Weir and Cockerham 1984) were computed with Genepop 4.2. (Raymond and Rousset 1995;Rousset 2008), and 95% confidence intervals were estimated by bootstrapping using the R package diveRsity with 10,000 replicates (Keenan et al. 2013).
Isolation-by-distance (IBD) gene flow analysis using a Mantel test (Rousset 1997) was performed on F ST /(1ÀF ST ) and the logarithm of geographical distance estimated following the coastline among pairs of locations, with 10,000 permutations in program ISOLDE implemented in Genepop on the web (http://genepop.curtin.edu.au/index.html).
Assignment power of SNP panels was evaluated with leave-one-out tests (LOO) and with the double cross-validation proposed by Anderson (2010) using the "training set/holdout set" protocol (STH) and implemented in GeneClass2.0 (Piry et al. 2004), to compare the performance of both outlier SNP panels to resolve fine-scale population structure. With the LOO method, individuals were assigned to a population if the assignment probability to that population was higher than to any other population. We selected the frequency-based algorithm (Paetkau et al. 1995) as recommended by Piry et al. (2004) and because a former study showed it had the best performance in matching Mytilus individuals to their geographical origin (Larra ın et al. 2014). Additionally, to avoid high-grade bias in estimating classification accuracy (Anderson 2010), we randomly divided the samples into training and holdout sets ( Table 1). The training set was used as baseline to reassign individuals from the holdout set (Paetkau et al. 1995).

Results
The number of RADtags predicted using SimRAD was 8530 close to the mean number of RADtags obtained per samples with STACKS (10,030). In these RADtags initially were included 16,888 presumptive SNPs in the catalog, in which 4305 SNPs were genotyped in the 70% or more samples. However, after subsequent filtering, a final panel of 1240 SNPs was retained with an average genotyping success of 90% in the remaining 190 samples.

Genetic diversity
Average expected heterozygosities per location across all 1240 SNPs were similar, with values of 0.2321 for Caleta La Arena, 0.2322 for Quillaipe, 0.2330 for Canutillar, and 0.2333 for Pichicolo in the Reloncav ı zone, 0.2328 for Canal Coldita in Chilo e Island zone, and 0.2244 for Isla Peel in Patagonia zone.

Detection of loci under positive selection and neutral loci
Outlier analyses run in LOSITAN identified 58 and 34 SNPs as candidates for positive selection among scenarios 1 and 2, respectively (upper limit of F ST CI 99.5%), with 17 shared SNPs between both scenarios. Outlier tests also identified 981 putatively neutral loci for scenario 1, which fell inside of a conservative CI of 10-90% of F ST distribution.

Population structure
Global F ST of the neutral panel in scenario 1 was very low (0.005) but significant (95% CI: 0.001, 0.011). Higher global genetic differentiation was observed in scenario 2 with a F ST of 0.088 (95% CI: 0.065, 0.111) with the panel of 34 SNPs and for scenario 1 with a global F ST of 0.114 (95% CI: 0.101, 0.128) for the 58 SNP panel.
Pairwise F ST values for the neutral panel were not significantly different from zero for 10 of the 15 interlocation comparisons as shown by 95% confidence intervals, and significant values were only observed in comparisons where Isla Peel location (Patagonia zone) was involved (Table 2). This pattern of high pairwise F ST values between Isla Peel and the other five northern locations was observed for scenario 1, with an average pairwise F ST value of 0.228 (Table 2).
On the other hand, the location Canal Coldita from Chilo e Island zone shows moderate genetic differentiation with the other four populations from Reloncav ı Gulf with an average pairwise F ST of 0.081 in scenario 1. A similar pattern was observed for pairwise F ST in scenario 2, with significant genetic differentiation observed between Canal Coldita (Chilo e Island zone) and the four populations from Reloncav ı zone, but with a higher average pairwise F ST than scenario 1 of 0.159.
The four populations from Reloncav ı zone showed practically zero genetic differentiation according to the limits of 95% CI for the F ST values in both scenarios. However, there were two exceptions with the comparisons between Pichicolo and Caleta La Arena for scenario 1 (F ST = 0.023; 95% CI: 0.002, 0.050; Table 2) and scenario 2 (F ST = 0.034; 95% CI: 0.004, 0.074; Table 3), and the comparison between Canutillar and Caleta La Arena for the scenario 2 (F ST = 0.029; 95% CI: 0.001, 0.065; Table 3).

Assignment to origin populations
The lowest Bayesian information criterion (BIC) on DAPC for the putatively neutral panel was obtained for K = 1, indicating only one cluster across the three studied zones when this panel was used. However, more than one cluster was evident for the subsequent scenarios that included putative outlier SNPs. For scenario 1, three clusters (K = 3) were identified with DAPC with two discriminant functions extracted with 60 principal components retaining a 93.3% of conserved variance. For scenario 2, again three clusters (K = 3) were observed with 50 principal components retaining a 97.8% of conserved variance (Fig. 3).
The three clusters observed in scenario 1 correspond to the three main geographical zones included in this study (Fig. 2). All individuals (100%) from Isla Peel (Patagonia zone) were assigned to one cluster, 28 of 31 individuals (90.3%) from Canal Coldita (Chilo e Island zone) were assigned to a second cluster, and 115 of 127 individual (90.6%) from the four locations in Reloncav ı zone were classified to a third different cluster. The performance of assignment with the panel of 58 SNPs was consistent when it was evaluated with the LOO and STH methods. The correct re-assignment to Patagonia, Chilo e Island, and Reloncav ı zones was 100%, 83.9%, and 92.1% for LOO method and 100%, 73.3%, and 96.8% for STH method, respectively (Table 4).
In scenario 2, DAPC classified 83.9% of samples from Canal Coldita (Chilo e Island zone) in one cluster. Assignment to this location with the panel of 34 SNPs was 87.1% (27 of 31 samples) for LOO method and 86.7% (13 of 15 samples) for STH method (Table 5). Interestingly, individuals from the four locations from Reloncav ı zone were classified in two different clusters with DAPC (Fig. 3). However, assignment with GeneClass2 to the Reloncav ı zone with this panel of SNPs was high, 94.5% and 93.5% using the LOO and STH methods (Table 5), respectively.
The IBD analysis showed significant and positive Spearman correlations for the neutral SNP panel (r S = 0.863, right-tailed P = 0.0263) and outlier SNPs in scenario 1 (r S = 0.9393, right-tailed P = 0.0054) since they included the most distinct and geographically distant collection from Patagonia. However, for scenario 2 (excluding the most austral population of Isla Peel), the IBD Spearman correlation was nonsignificant (r S = 0.6485 right-tailed P = 0.1491). genetic diversity among populations of marine fishes in comparison with anadromous and freshwater fishes (Ward et al. 1994;DeWoody and Avise 2000). These differences have been ascribed to high levels of gene flow in marine environment and large effective population sizes (Ward et al. 1994). Panmixia has been proposed for other marine invertebrates with long-lived planktonic larvae like Chilean blue mussels due to estimates of low F ST from allozymes and microsatellite studies (C arcamo et al. 2005;Toro et al. 2006;Larra ın et al. 2014Larra ın et al. , 2015. However, recent studies with much larger panels of SNP markers have begun to elucidate more extensive patterns of population structure in marine species as a result of sampling both neutral and adaptive loci throughout the genome (Candy et al. 2015). For Mytilus species, few SNP markers have been published and molecular resources are scarce. Previous SNP markers have been developed for pedigree reconstruction to assist breeding programs (Vera et al. 2010;Nguyen et al. 2014;Pino-Querido et al. 2015), and also for identification of Mytilus species or hybridization in Europe (Zbawicka et al. 2012(Zbawicka et al. , 2014, but to date no SNPs have been developed specifically to assess population structure of South American populations or taxa. SNP markers in Mytilus have been developed by mining EST databases (Vera et al. 2010;Zbawicka et al. 2012), shotgun genome  sequencing using Illumina technology (Nguyen et al. 2014), and targeted enrichment sequencing (Fra€ ısse et al. 2015). However, with declining costs of sequencing technology (Liu et al. 2012) and the use of RAD-seq to reduce genome complexity (Baird et al. 2008), it is possible to discover and genotype SNP markers at same time in an effective way, opening the opportunity to apply SNPs in nonmodel species (Ogden 2011;Narum et al. 2013). Therefore, SNPs developed by GBS methods such as RAD-seq or double-digest RAD-seq (Peterson et al. 2012) are emerging in the recent literature (Saarman and Pogson 2015). Thus, for the current study, we used RADseq to develop 1240 SNP markers de novo and identified outlier loci to effectively characterize the genetic structure of M. chilensis from southern Chile, focusing in the area with strong aquaculture activities in order to trace the individuals to their origin populations. With this approach, our study identified finer scale genetic differentiation than has been detected previously for M. chilensis.

Reviews of allozymic and microsatellite studies have revealed lower levels of genetic differentiation and higher
Our analysis of population structure using the neutral panel of 981 SNPs showed nonsignificant F ST values for the locations in the area used by Chilean aquaculture industry concordant with the high gene flow within this aquaculture production region due to exchange of seed stocks among facilities (Pantoja et al. 2011). However, the samples from Patagonia zone (Isla Peel) showed low (average pairwise F ST = 0.0127) but significant differentiation with the five northern locations, confirming the findings of Toro et al. (2004bToro et al. ( , 2006. Further analyses with outlier loci for two scenarios of fine-scale population structure revealed more extensive genetic differentiation for M. chilensis in the study zones. We analyzed collections of M. chilensis from three different local environments: inner protected bays submitted to intense aquaculture (salmon and mussel production) in Reloncav ı zone, outer bay exposed to open sea influence in Chilo e Island, and extreme climatic conditions in Patagonia zone (scenario 1) with different contribution of  freshwater and variable levels of salinity. Across these three zones, the 58 outlier SNP panel showed the highest levels of genetic differentiation ever reported among collections from these three different zones, and they consistently discriminate individuals collected from these three geographical areas using DAPC, LOO, and STH reallocation methods. This pattern of genetic differentiation was concordant with an effect of isolation by distance at a large geographical scale. Our analyses support strong genetic differentiation between the Patagonia location and northern locations (Reloncav ı and Chilo e Island zones). This is consistent with previous studies using RAPD and allozyme markers (Toro et al. 2004b(Toro et al. , 2006, and also with our findings with the neutral panel of 981 SNPs. Large differentiation of samples from Isla Peel could be due to a combination of neutral and adaptive processes. Isolating (neutral) effects of the poleward Cape Horn current (Strub et al. 1998) presumably restrict migration of larvae from Patagonia zone to the north. Adaptation to local channel conditions in Patagonia is also possible with high precipitation levels (2.5 m/year), water discharge from rivers and glaciers, extreme photoperiod regime, and very low water temperature (5.5°C at 54°S) in this zone (Pantoja et al. 2011). Samples from the Canal Coldita location (Chilo e Island zone) were also highly distinct from the other two regions with both outlier SNP panels, which have not been previously detected with other molecular markers. This possibly reflects different environmental conditions in the southern area of Chilo e Island influenced by sea water (Strub et al. 1998) with salinity of 31.1 ppt reported in Quell on near the Canal Coldita location which is higher than the Reloncav ı zone (average salinity of 21.5 AE 1.55 ppt) (Aranda et al. 2015). Samples from the Reloncav ı zone in the inner sea of Chilo e (41-44°S) were distinct from the other two zones, supporting the finding that the population of M. chilensis in Chilo e is genetically divergent with these outlier loci. The four sites within Reloncav ı zone were not differentiated with both outlier SNP panels, possibly due to less variable environmental conditions with close levels of salinity ranging from 19.4 ppt in Cocham o (close to Canutillar location) to 21.0 ppt in Quillaipe and similar average water temperature (17.3 AE 0.7°C in summer and 11.4 AE 0.3°C in winter; Aranda et al. 2015). However, Reloncav ı samples were grouped in two DAPC clusters with balanced distribution of individuals from each location in each cluster. This pattern could indicate two divergent groups in the Reloncav ı zone that could be    (Avendaño et al. 2011). M. chilensis in the Reloncav ı zone typically have a seven month spawning season from September to March (Stotz 1981). Active spawning was observed from December/January to March in Quillaipe and Pichicolo by Avendaño et al. (2011) and seeds reach a size of 1-2 cm after 3 or 4 months after fixing (Clasing et al. 1998). This size is concordant with our sampling dates in June for Reloncav ı locations (Larrain et al. 2012), suggesting that seeds collected in this zone came from the natural banks of these same locations. Thus, the signal of genetic differentiation within this region would be due to either neutral variation among cohorts or possibly adaptive variation associated with family traits in each cohort.
Outlier tests have been shown to have high false-positive rates under certain study conditions, so candidate loci must be interpreted with caution (Lotterhos and Whitlock 2014). The F ST outlier detection method FDIST2 used here (Beaumont and Nichols 1996) has proved to have good performance in species with low genetic differentiation and can be powerful at detecting adaptive variation relative to other outlier tests under certain scenarios (Narum and Hess 2011;De Mita et al. 2013;Lotterhos and Whitlock 2014). In our study, some F ST outlier SNPs may be interpreted as genetic variation in response to adaptation to local environmental conditions (Beaumont 2005), but other alternative explanations are also possible (Gosset and Bierne 2013) such as the examples of different cohorts described above or other endogenous barriers (Bierne et al. 2011) and introgression (Fra€ ısse et al. 2015). Mytilus species are prone to produce hybrid in geographical areas where two sibling species coexist (Daguin et al. 2001;Toro et al. 2004a), given the opportunity to introgress alleles from one species into the other, producing higher F ST when these loci are evaluated in the genome of the recipient species. Such processes have been shown to cause signals of outlier F ST in two length polymorphism loci between Atlantic and Mediterranean populations of M. galloprovincialis (Gosset and Bierne 2013). Very large F ST values ≥0.5 can be found when different species of Mytilus and their hybrids are compared, even when outlier loci are removed (Zbawicka et al. 2014), but this large genetic differentiation was not evident with the neutral or outlier SNP panels developed here. However, in light of a hybrid zone recently reported in the Magellan Strait (Oyarz un et al. 2016), introgression is still a possibility to explain high F ST values reported here. This hybrid zone shows decreasing levels of hybridization from the Magellan Strait to the north and is far from Isla Peel, our southern sampling location.
In our study, both neutral and adaptive processes, and even introgression, may be contributing to high F ST values for outlier loci. Regardless of the underlying processes affecting these loci, these outliers provide useful tools for assigning stocks of M. chilensis to region of origin.

Traceability implications
Traceability of aquaculture products is necessary in order to effectively regulate poaching and overharvest of natural populations, maintain genetic diversity of propagated stocks (e.g., Ogden 2011) and for food quality and safety purposes (Larra ın et al. 2014). For M. chilensis, traceability with genetic methods is an important goal in order to preserve natural stocks and improve aquaculture practices in this species. Previous studies with small numbers of genetic markers (nine microsatellites) have only achieved 50.7% correct re-assignment to origin locations which is too low to be used effectively in traceability applications (Larra ın et al. 2014). However, our current study with 1240 SNPs has demonstrated that various panels of these markers can strongly assign individuals to the three macrogeographical zones included in scenario 1 (~92% of overall correct assignment). Also it was possible to reassign individuals to the Chilo e Island location with relatively high performance (~87%) using the 34 SNP panel (scenario 2).
In conclusion, our study demonstrates that species with high gene flow may require analyses at different geographical scales and multiple panels of putative adaptive loci to reveal hidden population structure. The SNP marker panels developed here increased assignment accuracy compared to previous microsatellite loci, probably due to the use of adaptive loci and the number of markers employed. Genetic traceability for M. chilensis with these F ST outlier SNP panels is much improved but it is necessary in future research, to test the effectiveness of these panels in a wider zone and across years to evaluate the temporal stability of allele frequencies.