Genomic survey of edible cockle (Cerastoderma edule) in the Northeast Atlantic: A baseline for sustainable management of its wild resources

Abstract Knowledge on correlations between environmental factors and genome divergence between populations of marine species is crucial for sustainable management of fisheries and wild populations. The edible cockle (Cerastoderma edule) is a marine bivalve distributed along the Northeast Atlantic coast of Europe and is an important resource from both commercial and ecological perspectives. We performed a population genomics screening using 2b‐RAD genotyping on 9309 SNPs localized in the cockle's genome on a sample of 536 specimens pertaining to 14 beds in the Northeast Atlantic Ocean to analyse the genetic structure with regard to environmental variables. Larval dispersal modelling considering species behaviour and interannual/interseasonal variation in ocean conditions was carried out as an essential background to which compare genetic information. Cockle populations in the Northeast Atlantic displayed low but significant geographical differentiation between populations (F ST = 0.0240; p < 0.001), albeit not across generations. We identified 742 and 36 outlier SNPs related to divergent and balancing selection in all the geographical scenarios inspected, and sea temperature and salinity were the main environmental correlates suggested. Highly significant linkage disequilibrium was detected at specific genomic regions against the very low values observed across the whole genome. Two main genetic groups were identified, northwards and southwards of French Brittany. Larval dispersal modelling suggested a barrier for larval dispersal linked to the Ushant front that could explain these two genetic clusters. Further genetic subdivision was observed using outlier loci and considering larval advection. The northern group was divided into the Irish/Celtic Seas and the English Channel/North Sea, while the southern group was divided into three subgroups. This information represents the baseline for the management of cockles, designing conservation strategies, founding broodstock for depleted beds and producing suitable seed for aquaculture production.


| INTRODUC TI ON
The genetic structure of marine species and patterns of connectivity between discrete populations is central to their health and resilience to external pressures such as parasites and pathogens (Rowley et al., 2014), pollution, exploitation and climate change over ecological and evolutionary timescales (Burgess et al., 2014;Cowen & Sponaugle, 2009). Detection of genetic structure in marine species remains challenging due to the often large effective population sizes and high levels of gene flow facilitated by the scarcity of physical barriers, which would be expected to lead to genomic homogenization (Danancher & Garcia-Vazquez, 2011;do Prado et al., 2018;Zhao et al., 2018). However, genetic differentiation can be driven by features such as oceanographic barriers (e.g. current systems, fronts, gyres and eddies) or environmental gaps limiting dispersal (Blanco-González et al., 2016;Nielsen et al., 2004;Vera et al., 2016;Xuereb et al., 2018), but also natural local selection in response to environmental variation (Clucas et al., 2019;Jiménez-Mena et al., 2020;Vilas et al., 2015). Furthermore, historical vicariance, not yet erased by gene flow, and reproductive isolation can be other factors explaining differentiation (Gagnaire et al., 2015;Riginos et al., 2016;Saha et al., 2021). Distinguishing between neutral and adaptive genetic variation in the marine landscape has become a central issue in conservation biology, allowing for interpreting genetic variation in both historical/demographic and adaptive terms (Bernatchez, 2016;Nielsen et al., 2009). This information is essential for identifying the genetic diversity needed for conservation and breeding programmes in marine aquaculture (do Prado et al., 2018;Hughes et al., 2019).
The edible cockle, Cerastoderma edule, is a bivalve mollusc naturally distributed along the Northeast Atlantic coast, from Senegal to Norway, inhabiting intertidal soft sediments (Hayward & Ryland, 1995). The species plays a crucial role as a food source for birds, crustaceans and fish (Norris et al., 1998). Moreover, the species is highly appreciated for cuisine, and its main fisheries are located in Ireland, the United Kingdom, France, Spain and Portugal, where its commercialization drives employment of thousands of collectors, processors and sellers (http://www.cockl es-proje ct.eu/). Cerastoderma edule is dioecious and can live up to 10 years in the wild with a fast sexual maturation (reached in its first year) and high fecundity (Honkoop & van der Meer, 1998), with its reproductive period spanning from late spring to mid-autumn (Mahony et al., 2020;Malham et al., 2012). Larvae are planktonic and remain in the water column for around 30 days, which allows for larval dispersal by ocean currents that drive connectivity and gene flow between populations along the Northeast Atlantic coast (Dare et al., 2004;de Montaudouin et al., 2003).
The genetic structure of C. edule across the Northeast Atlantic has been studied over the last 40 years. Pioneering studies using allozymes detected genetic differences between populations located on either side of the English Channel (collected in Wales, France and The Netherlands; Beaumont et al., 1980), but also a high connectivity and gene flow from France to Denmark (Hummel et al., 1994), although other processes such as low genetic drift effect combined with recent population divergence may explain the observed patterns. Mitochondrial DNA (mtDNA) sequencing carried out on a wider sampling scale (from Morocco to Russia) revealed the presence of two major mtDNA groups in northern and southern areas, suggesting the presence of a northern cryptic refuge for C. edule (Krakau et al., 2012;Martínez et al., 2015). Studies using microsatellites showed high homogeneity in the southern beds from Portugal and Spain (Martínez et al., 2013, while two main clusters were identified in the northern area in the British Isles and the North Sea. In summary, three major areas were defined from microsatellite data: (i) a southern region (Morocco, Portugal, Spain and French beds up to the English Channel); (ii) Ireland, Great Britain and southern North Sea (the Netherlands and Germany); and (iii) a northern group (Scotland, Denmark, Norway and Russia) . However, the low amount of microsatellite markers used has greatly limited the investigation of local adaptation and population connectivity at the fine scale necessary for the appropriate management of exploited species (Bernatchez et al., 2017).
Recently, Coscia et al. (2020) analysed the genetic structure and connectivity among cockle populations within the Celtic/Irish Seas using Restriction-site Associated DNA sequencing (RADseq) data and a population genomics approach in combination with information on ocean conditions and larval dispersal modelling. They identified a significant genetic differentiation in the regional study area (three groups; F ST = 0.021), mainly driven by salinity, larval dispersal by oceanographic currents and geographical distance. These results suggest that a finer structure can underlie cockle distributions in the Northeast Atlantic and that a genomic scan covering southern and northern beds combined with understanding of the dispersal of larvae in the area is necessary to understand how the species is structured for its appropriate management.
In this study, we applied a 2b-RADseq genotyping by sequencing approach to assess the genetic structure of C. edule along the Northeast Atlantic coast considering environmental drivers and models of larval dispersal with the aim of providing essential information for the sustainable management of its resources. Major regions previously identified with microsatellites were confirmed, but refined information was obtained, mostly in agreement with the ocean dynamics and resulting larval dispersal patterns across the Northeast Atlantic.

K E Y W O R D S
2b-RAD, adaptive variation, fisheries management, genetic structure, larval dispersal modelling 2 | MATERIAL S AND ME THODS

| Oceanography of the study area
The study area covers the Northeast Atlantic from southeast Portugal to northeast Ireland and the southern North Sea (Figure 1). This area is divided into several oceanographic regions: Iberian coastal waters, the Bay of Biscay, the English Channel, the Celtic Sea, the Irish Sea and the North Sea. These regions are to some extent discrete units with limited oceanographic connectivity between them resulting from either divergent coastal currents or frontal currents generated during summer heating (Galparsoro et al., 2014). During summer months (corresponding to the spawning period of C. edule), when upwelling is a prominent feature along the Galician coast (NW Spain), the Portuguese coastal currents transport surface waters southward along the west coast of the Iberian Peninsula, whereas during the winter months the Iberian Poleward Current shoals and moves surface waters northwards (Teles-Machado et al., 2016). Along the southern Bay of Biscay, a strong westward transport develops during the summer months which changes direction during the winter months and links into the slope current along the Armorican and Aquitaine Shelf (W France). The coastal circulation along the east Bay of Biscay is characterized by northward transport by the Iberian Poleward Current during the winter, which reverses in direction and reduces in strength during the summer months (Charria et al., 2013). Tidal mixing fronts separating mixed from seasonally stratified waters form in early summer and extend into the autumn at the eastern entrance to the English Channel (Ushant Front) Group "Grepma" (1988) and between the Celtic and the Irish Sea (Celtic Sea Front). From late spring to early autumn, a current system develops in the Celtic Sea that transports waters from southwestern Britain via F I G U R E 1 Study area for Cerastoderma edule genetic analysis and larval dispersal modelling. Ocean bathymetry is shaded in blue. Summer surface currents are schematically represented by magentacoloured arrows (see section Study Area for a detailed description and references). Locations of fronts are depicted by purple dotted lines (CSF, Celtic Sea Front; UF, Ushant Front). Location of the C. edule beds for the genetic analysis are shown in dark red (see Table 1 for location codes) and particle release locations for larval dispersal modelling are shown in yellow and numbered from 1 to 51. Location codes in panel A are detailed in Table 1 the frontal jet associated with the Celtic Sea Front to the south and west coasts of Ireland as the Irish coastal current (Brown et al., 2003;Fernand et al., 2006;Horsburgh et al., 1998). Flow through the English Channel is generally north-eastward and links into the anticyclonic circulation of the North Sea, which is characterized by a southward flow along the east-coast of the UK that joins into the coastal current transporting waters northward along the southern North Sea coastlines and the Norwegian Coastal Current (Winther & Johannessen, 2006).

| Sampling
A total of 545 individuals from 14 cockle beds distributed along the Northeast Atlantic Coast (Figure 1) were collected during the period 2017-2019 and stored in 100% ethanol (Table 1). Temporal replicates, to analyse genetic stability across generations, were obtained for six beds in 2017 and 2018 (Table 1). All specimens belonged to the 0+ class of the year, and they were collected to avoid generation overlapping between consecutive year cohorts.

| Single nucleotide polymorphism genotyping
Total DNA was extracted from gill tissue samples using the e.Z.N.A. E-96 mollusc DNA kit (OMEGA Bio-tech), following manufacturer recommendations. Single nucleotide polymorphism (SNP) identification and selection, as well as genotyping and validation protocols followed those described by Maroso et al. (2019). Briefly, AlfI IIb restriction enzyme (RE) was used to construct the 2b-RAD libraries, which were evenly pooled for sequencing in Illumina Next-seq including 90 individuals per run. The recently assembled cockle's genome (794 Mb; A. Bruzos, J. M. C. Tubio, unpublished data) was used to align reads from each individual using Bowtie 1.1.2 (Langmead et al., 2009), allowing a maximum of three mismatches and a unique valid alignment (-v 3 -m 1). Individuals with <250,000 reads were discarded. STACKS 2.0 (Catchen et al., 2013) was then used to call SNPs and genotype a common set of markers in the sample set, applying the marukilow model with default parameters in the gstacks module of Stacks 2.0. This SNP panel was further filtered by applying the following criteria: (i) genotyped in >60% individuals in the total sample; (ii) minimum allele count (MAC) ≥3 in the total sample; (iii) conformance to Hardy-Weinberg equilibrium within each sample (HWE) across the whole collection; that is, loci with significant deviation from HWE (p < 0.05) in more than 25% of samples were removed; and (iv) selection of the most polymorphic SNP in each RAD-tag.

| Genetic diversity and population structure
Genetic diversity (i.e. observed (H o ) and expected (H e ) heterozygosities, proportion of polymorphic loci), departure from Hardy-Weinberg equilibrium (HWE) and intrapopulation fixation index (F IS ) were estimated for each bed using GENEPOP v4.0 (Rousset, 2008) and ARLEQUIN v3.5 (Excoffier & Lischer, 2010). H e , H o and F IS were also calculated for each population using only markers that were polymorphic within each bed (MAC ≥ 2). Linkage disequilibrium between pairs of loci regarding physical distance (bp) was estimated with r 2 across the cockle genome using PLINK 1.9 (Chang et al., 2015; http://www.cog-genom ics.org/plink/ 1.9), and its significance calculated through exact tests over genotypic contingency tables using GENEPOP v4.0.
Global and pairwise relative coefficients of population differentiation (F ST ) between cockle beds were calculated with ARLEQUIN v3.5 using 10,000 permutations to test for significance. The number of genetically homogenous population units (K) was estimated using the variational Bayesian clustering method implemented in the package fastSTRUCTURE v2.3.4 (Raj et al., 2014) for K = 1 − number of samples + 1, with an admixture ancestry model, convergence criterion of 1 × 10 −7 , five cross-validated sets and the simple prior (flat-beta prior). The most likely number of K was estimated using the 'chooseK.py' programme. This programme gives the best K value and K corresponding with weak population structure in the data using the heuristic scores. Summarized outputs were carried out using the software DISTRUCT 1.1 (Rosenberg, 2004). Discriminant analyses of principal components (DAPC) were run in ADEGENET package (Jombart et al., 2010;Jombart & Ahmed, 2011) for the R platform (R Development Core Team, 2014; http://www.r-proje ct.org/). Data were transformed using PCA (principal component analysis), and an appropriate number of principal components (PC) and discriminant functions (DF) were retained to explain >90% of the variance. Analyses of molecular variance (AMOVA) to study the distribution of genetic diversity within (F SC ) and among (F CT ) bed groups obtained from fastSTRUCTURE analyses were carried out with the program ARLEQUIN v.3.5 and their significance tested with 10,000 permutations. Isolation by distance (IBD) was evaluated by the correlation between geographical (measured as the shortest ocean distance) and genetic (measured as F ST /1 − F ST ; Rousset, 1997) distance matrices checked with a Mantel test with 10,000 permutations using NTSYS v.2.1 (Rohlf, 1993). All these analyses were performed with the complete SNP data set (9309 markers), the neutral data set (8021 markers) and the detected divergent outlier loci data set (554 markers) (see Section 3). Finally, the search algorithm implemented in the TREEMIX program (Pickrell & Pritchard, 2012) was used to infer population history of cockle beds. The program creates a maximum-likelihood phylogenetic tree that incorporates admixture on the basis of allele frequencies and a Gaussian approximation to genetic drift, allowing patterns of splits and mixtures in multiple populations to be inferred. The different beds were used as independent clusters, grouping the temporal replicates, and only the neutral data set was used. To avoid biases linked to missing data, the neutral data set was further filtered for this analysis to retain only markers shared by at least 90% of the individuals (2940 SNPs).
To account for linkage disequilibria, SNPs were grouped in windows of five markers. Given the density of markers used, this number ensured that markers in different windows were unlinked (see below). A tree without migration events was initially generated; then, migration events were sequentially added one by one up to 14, and the likelihood was monitored to check when it reached a plateau for estimating the number of migration events.

| Outlier tests and gene mining
Several statistical approaches were applied to identify outlier loci subject to selection in different geographic scenarios, namely the whole Northeast Atlantic region and the two main northern and southern groups identified by fastSTRUCTURE (see Section 3). The Bayesian F ST -based method implemented in BAYESCAN v2.01 (Foll & Gaggiotti, 2008) was used by grouping individuals by beds in all the analyses performed. BAYESCAN was run using default parameters (i.e. 20 pilot runs; prior odds value of 10; 100,000 iterations; burn-in of 50,000 iterations and a sample size of 5000). Additionally, the FDIST F ST method implemented in ARLEQUIN v3.5, which uses a maximum-likelihood approach (Beaumont & Nichols, 1996), was applied to incorporate a priori information regarding population structure. Thus, two different scenarios were tested for the whole corresponding to the cockle's haploid karyotype (Insua & Thiriot-Quiévreux, 1992). To establish the size of the windows for genome mining and the reliability of each region under selection, linkage disequilibrium (LD) was evaluated across the cockle's genome within each cockle bed. Furthermore, we also evaluated LD using the whole data set, keeping in mind population structure at genomic islands of differentiation produces enhanced LD. It was expected that those regions under selection would show a higher LD than the average due to selective sweeping. LD was represented against physical distance for all pairs of markers within each chromosome across the whole genome using PLINK 1.9, between markers separated up to 5000 kb in each bed and up to 500 kb in the whole cockle collection of the Northeast Atlantic respectively. The corresponding r 2 values (the square of the correlation coefficient checking for the nonrandom association of alleles at pairs of loci) were averaged within 50 and 1 kb genomic windows within each bed and for the whole sampling collection respectively. Considering that LD was negligible above

| Landscape analyses
Genetic differentiation explained by the different spatial (latitude and longitude) and abiotic factors (see below) was studied following a canonical redundancy analysis (RDA) using the VEGAN software (Oksanen, 2015)   for the period 2014-2018. The nearest model cell with water was taken to extract the data. Then, averages for the two seasonal extremes, winter (i.e. from January to March) and summer (i.e. from July to September), and for the spawning season (i.e. from April to September, see Malham et al., 2012;Mahony et al., 2020) were calculated for each bed.
ANOVA was performed to test the significance of the variance associated with the different variables using 1000 random permutations with VEGAN. The variance inflation factor (VIF) was estimated to explore collinearity (correlation) among landscape variables in the data set. VIF values higher than 10 would denote important collinearity problems, and values from 5 to 10 moderate problems, while values lower than 5 would indicate no collinearity problems (Marquardt, 1970). Different models were adjusted following a forward selection process with the PACKFOR package in R (Dray et al., 2009). This selection process corrects for highly inflated type I errors and overestimated amounts of explained variation (Vandamme et al., 2014). Thus, the reduced panel of explanatory variables is used to recalculate the total proportion of genetic variation in the variance partitioning. The weight of the different loci on the significant environmental vectors was obtained using VEGAN. All these analyses were performed separately for the complete, neutral and divergent outlier SNP data sets for the whole Northeast Atlantic region. etry.fr/en/data/produ cts/auxil iary-produ cts/globa l-tide-fes.html).

| Larval dispersal modelling
Freshwater fluxes from rivers were implemented for 33 rivers across the model domain. The model has been extensively validated and quality controlled (Sotillo et al., 2015(Sotillo et al., , 2020. For the larval dispersal model, particles were released from 51 coastal sites (see Figure  While it is possible that spawning varies latitudinally, a recent literature and data review (Mahony et al., 2020) found no clear evidence of such a pattern. We therefore apply the same spawning period for all sites. At present, very little is known about cockle larval behaviour. Existing studies show that bivalve larvae appear not to be randomly dispersed throughout the water column but are aggregated at certain depths in the water column (mostly between 10 and 40 m) (Irigoien et al., 2004;Mann, 1985;Raby et al., 1994;Scrope-Howe & Jones, 1986;Tremblay & Sinclair, 1990) with a number of studies showing an association of bivalve larvae with the thermocline (Raby et al., 1994;Scrope-Howe & Jones, 1986;Tremblay & Sinclair, 1990). The approach taken here allowed us to explore different scenarios of dispersal focussing on the ocean current dispersal at different depths rather than on very uncertain and potentially even 'artificially' induced larval behaviours that are not backed by evidence from real ocean scenarios.
Results are presented in terms of probability distribution maps and connectivity networks. To calculate connectivities, it was assumed that the larvae were able to settle during the last 10 days of their larval stage (i.e. days 31-40); therefore, each particle's positions during this time were used for the analysis of larval connectivities between sites. Larval connectivities were calculated as the percentage of larvae released from the source location arriving within a square of 0.2° latitude/longitude distance of one of the other 50 sink locations or returning to the source location (self-recruitment). Connectivities were averaged over the three vertical release levels, but also presented separately in the Supporting Information.

| SNP genotyping and genetic diversity within beds
A total of ~2000 M raw reads were sequenced (~3.7 M reads per sample

| Outlier detection and gene mining
Detection    (Table 3; Tables S3 and S4). A total of 446 genes were identified in those windows, which showed significant enrichment (FDR 5%) for GO terms re-  (Table S4). All these observations suggest selective sweeps at those regions, that is selection of favourable haplotypes driven by particular environmental factors determining LD and/or loss of genetic diversity.
These regions included clusters of outliers (average 4.9), ranging from 3 (several) to 12 (C4 and C12) with a total of 106 annotated genes (Tables S4 and S5). Although no significant functionally enriched GO terms were identified (FDR 5%) among this set of genes, several ubiquitin-related genes, ATP-binding kinases, Kelch domain-related proteins, DNA binding/metabolism-related proteins, transporter, exocytosis and transmembrane proteins were F I G U R E 2 Linkage disequilibrium (r 2 ) with regard to physical distance between pairs of markers in the Cerastoderma edule genome in a representative cockle bed (Noia) and using the whole data set from the Atlantic Area found, four of them (ubiquitin conjugation factor E4 B, reverse transcriptase domain-containing protein, NRAMP, monocarboxylate transporter 12-like) including paralogous copies in the same or in different chromosomes (Table S5).

| Population structure: temporal and geographical factors
All pairwise F ST values between temporal replicates were nonsignificant (p > 0.050), suggesting temporal genetic stability between consecutive cockle cohorts in the Northeast Atlantic (Table S6). This stability was confirmed when integrating the whole data set using an AMOVA analysis, where the percentage of variation associated with differences among temporal replicates within beds (F SC ) was nonsignificant and negligible, while the percentage among sampling sites (F CT ) was highly significant (p < 0.001) and above 3% (Model I,

TA B L E 4 AMOVAs for European
Cerastoderma edule beds in the Northeast Atlantic region. For models based on fastSTRUCTURE results (II, III and IV), beds were grouped following the groups found with this programme including the Bay of Biscay (Arcachon, France) and the western Iberian coast beds (Figure 3a). Significant IBD patterns using the different SNP data sets were suggested in the northern group (complete data set: r = 0.49761, p = 0.041; neutral data set: r = 0.35403, p = 0.074; outlier data set: r = 0.65359, p = 0.020) and southern group (complete data set: r = 0.44919, p = 0.067; neutral data set: r = 0.55559, p = 0.039; outlier data set: r = 0.58677, p = 0.003), indicating that the IBD pattern observed in the whole Northeast Atlantic region is not an artificial consequence of vicariance between two genetically differentiated genetic groups. AMOVA using these two groups showed that this structuring (F CT = 2.98% of the total genetic diversity) captured close to the 80% of the total differentiation among beds (F ST = 3.77%) (Model II, Table 4). The best K values for the neutral and the outlier data sets were 1 and 2, respectively, with identical results to those found with the complete data set when the number of simulated clusters was low (3 and 2, respectively; Figure S2). AMOVA results with the neutral data set were similar to those obtained with the whole data set, the main difference being also detected among groups (F CT = 1.22%) close to 75% of total differentiation among beds (Model III, Table 4). The second most probable clustering suggested by the outlier data set  Table 4), while the lowest variance proportion among beds within groups (Table 4), confirming their genetic homogeneity. Within the two main northern and southern groups, fastSTRUCTURE suggested K = 1 as the most probable value for all the analyses except for those using outlier data sets.
With these latter data sets, the most likely structure of the northern and southern groups was identical to that suggested by the analysis with the complete data set (three groups in the northern region and three in the southern region, see Figure S2). Moreover, for the northern region, at higher number of simulated clusters (K = 7) each bed constituted a single cluster ( Figure S2). Although sampling with higher spatial resolution in the Bay of Biscay could F I G U R E 3 Population structure of Cerastoderma edule in the studied region using fastSTRUCTURE for the complete data set (a) and the divergent outlier data set (b) taking into account all beds studied. Each vertical bar represents one individual, and the colour proportion for each bar represents the posterior probability of assignment of each individual to the different clusters (K) inferred by the programme (K = 2 and K = 9 for a and b respectively). Codes are shown in Table 1 N Results using the complete data set (a), neutral data set (b) and divergent outlier data set (c) are shown. Codes are given in Table 1 (a) aid to refine genetic structure in edible cockle, ours results suggest that the cluster structure found is consistent.
Discriminant analyses of principal components plots confirmed the results found with fastSTRUCTURE regarding the main north-south subdivision, but further clustering was suggested within groups. The analyses with the complete and neutral data sets showed an important scattering within each group (Figure 4a, and southern groups using their respective outlier data sets also fitted with the clustering suggested by fastSTRUCTURE ( Figure S3).
The population trees provided by TREEMIX without migration events also identified the aforementioned groups, clustering beds according to northern and southern groups and identifying the substructure suggested by outliers ( Figure 5). The best number of migration events was three, with the strongest connection between Irish Sea beds and weaker connections between beds from the northern and the southern groups. An additional TREEMIX analysis was also run with beds grouped according to the six previously described clusters identified with fast STRUCTURE based on 554 outlier markers. This tree provided similar results with the clustering of i, ii, iii (northern group) in one branch and iv, v and vi (southern group) in the other. The best fitting model was for three migration events, with the connection between groups i and iv being strongest ( Figure S4).

| Genetic-environmental associations
When all the seascape variables were included in the RDA analyses, latitude was suggested as the main driver for genetic differentiation, for the three different data sets (i.e. complete, neutral and outlier divergent loci) and seasons (i.e. reproductive, winter and summer) analysed (Table 5). Latitude was mainly associated with the first axis (Figure 6a), which separated the beds in the two groups suggested by fastSTRUCTURE (i.e. northern and southern groups).
Moreover, sea surface temperature (SST) was a suggested driver for the complete and divergent data sets during the reproductive and winter periods respectively. When spatial variables (latitude and longitude) were excluded from the analysis, SST was the main driver in most scenarios, separating northern and southern groups for all data sets (Figure 6b). Also, SSS appeared as the main driver during summer for all data sets and in the reproductive period for the outlier data set (

F I G U R E 6
Redundancy analyses (RDA) plots of Cerastoderma edule samples from the studied area using the different genetic data sets (complete: complete data set; neutral: neutral data set; divergent: divergent outlier data set) and seasons (reproductive period, winter and summer) taking into account all landscape variables (a) and only abiotic factors (b). BSS, bottom shear stress; SBS, sea bottom salinity; SSS, sea surface salinity; SST, sea surface temperature. Population code colours: northern region (red), southern region (blue)

(a) (b)
so despite this information still being useful, it should be considered with caution for interpretation and management decisions.

| Genetic diversity
Genetic diversity is crucial for the adaptation of natural populations to environmental changes. The preservation of connectivity among populations is essential to maintain the genetic diversity within and between populations thus contributing to their genetic robustness and aiding to counteract the threat of demographic depletion caused by overexploitation or diseases (Frankham et al., 2002). Moreover, genetic diversity in wild populations is important for commercial species, such as C. edule, to support breeding programmes for selecting high-value seed for intensive production in particular areas (do Prado et al., 2018;Lallias et al., 2010;Petereit et al., 2018;Vera et al., 2019 (Etter et al., 2011). Genetic diversity observed for C. edule in our study was also similar to that reported for the sis- and Crassostrea gigas (no MAF filtering but mean MAF of selected SNPs = 0.182; H e ~ 0.300; Gutierrez et al., 2017), although in the latter cases comparison is not so straightforward since the filtering scenario was not the same. These data suggest that despite genetic diversity differences could in part be due to filtering protocols and genotyping with preselected SNP chips, the genus Cerastoderma appears to lie in the lower range of genetic diversity of molluscs studied to date.
Regarding genetic diversity within the edible cockle, higher genetic diversity would be expected at lower latitudes of the Northern Hemisphere because of their role as glacial refugia during Quaternary glaciation (Hewitt, 2000). This hypothesis fits to our data as genetic diversity (H e ) and the proportion of polymorphic loci (MAC ≥2) were slightly higher but significant or marginally significant in the south-  (Martínez et al., 2013), and even supported higher diversity in the Celtic/Irish Seas, English Channel and the North Sea  corresponding to the present northern group. Moreover, higher genetic diversity has been found in northern beds from the Fennoscandian region and Russia, using mtDNA markers, suggesting a cryptic northern glacial refuge for C. edule (Krakau et al., 2012;Martínez et al., 2015), which has also been described for other marine species (Luttikhuizen et al., 2008;Maggs et al., 2008;Sotelo et al., 2020). Our extensive genome-wide analysis indicates slight differences in genetic diversity for the edible cockle in the whole Northeast Atlantic compatible with the existence of important glacial refuges in the south.

| Population structure and connectivity
Knowledge of population structure, including local adaptations and historical processes, is crucial to define and apply sustainable strategies for the management and conservation of exploited species (Bernatchez et al., 2017;Frankham et al., 2002). For C. edule, the presence of at least two main population units within the Northeast Atlantic, delimited by the western English Channel, had been reported in previous studies (Krakau et al., 2012;Martínez et al., 2015) and has been confirmed in this study with a more in depth population genomic analysis using both whole, neutral and outlier SNP data sets with several complementary statistical approaches (i.e. STRUCTURE, DAPC, TREEMIX analyses). These two units would correspond to the two glacial refugia of the species during the Pleistocene, with Ireland, Great Britain and southern North Sea being the suggested area for the secondary contact between those divergent lineages . The northern group in our study, comprising of the Celtic and Irish Seas, the English Channel and North Sea beds, corresponds to the northern group defined by Martínez et al. (2015). This structure has also been documented in other mollusc species with a similar distribution range such as O. edulis (Vera et al., 2016). The results from our larval dispersal modelling, based on potential ocean-driven larval flows between sites along the Northeast Atlantic, suggest a potential discontinuity between the Bay of Biscay Genomic scans help detect footprints of selection with regards to the neutral background, which represents invaluable information for sustainable management and conservation of fisheries (Bernatchez, 2016;Nielsen et al., 2009). Genetic markers showing a significant departure, above or below the neutral data set are potential outliers under divergent or stabilizing selection, respectively. Divergent outliers can unravel a fine-scale structure related to environmental variables critical for resources management (Coscia et al., 2020;Longo et al., 2020;Schulze et al., 2020;Vandamme et al., 2021;Vera et al., 2019;Whitaker et al., 2020). Temperature, salinity and other abiotic factors have been suggested as potential drivers for adaptive differentiation in C. edule (Coscia et al., 2020), as in other marine species in the region (do Prado et al., 2018;Vera et al., 2016). In our study, the RDA analyses also suggested the effect of temperature and salinity in a wider geographical range, the major genetic subdivision northsouth in the Northeast Atlantic being associated with a latitudinal annual mean temperature gradient of ~5.5°C between the warmest (Ria de Formosa) and the coldest station (Dundalk). However, these results should be taken with caution due to the collinearity between variables in half of the scenarios tested (VIF > 10).
We carried out detailed gene mining on the most consistent genomic regions putatively under divergent selection. They were targeted using different criteria to avoid false positives (two or more consecutive outliers criterion) and to confirm characteristics compatible with selective sweeps (significance and magnitude of linkage disequilibrium). Signals of balancing selection were very weak, as expected in a low structure scenario, and the few markers detected did not meet the criteria established. to gene flow as found in other species (Domingues et al., 2010;López et al., 2015;Riquet et al., 2019). This substructure did not appear to be related to dispersal limitations of larvae according to our modelling data and neutral data set. Therefore, more detailed sampling should be carried out in the Bay of Biscay to define the potential biogeographical barriers in this region. Northwards, the areas in the central Bay of Biscay defined as the ICES VIIIb fishery subzone (Bay of Biscay centre) have previously been described as potential barriers for different marine organisms such as Hippocampus guttulatus (Riquet et al., 2019) and Zoostera noltei (Chust et al., 2013).
As outlined above, the genetic structure of the edible cockle in the Northeast Atlantic shows a notable correspondence with the larval dispersal modelling performed here, although some discordances are suggested depending on the set of markers used (neutral vs. outlier loci). However, both data sets might be fully compatible if other variables such as larval depth in the water column is considered in larval dispersal modelling. The simulations carried out with larvae located at three different depths suggest that deeper depths (15 and 30 m) correlate better with the major subdivision observed with neutral loci (north vs. south) and are in accordance with other studies on bivalve larval behaviour, which show that larvae are preferably located below the surface of the water column rather than close to it (Irigoien et al., 2004;Scrope-Howe & Jones, 1986).
This result also highlights the importance of oceanographic studies of larval distributions throughout the water column, and that this knowledge on larval behaviour cannot be solely gained from tank experiments under laboratory conditions. Large uncertainties persist since the larval behaviour of C. edule with regard to swimming behaviour remains to be studied in detail. While the surface releases showed a higher degree of interannual variability, the deeper releases had stable connectivity patterns between years, thus tying in with the genetic structure which seems stable interannually.
It could be expected that larval dispersal and environmental variables had distinct imprints on genetic diversity, unless patterns of dispersal and environmental variables were correlated and influenced by the same oceanographic features. Moreover, different spatial patterns of genetic diversity associated with neutral and outlier markers would be expected, the former being related to ocean current patterns and fronts and outlier loci with environmental factors driving selection (Riginos et al., 2016). However, as suggested before, a correlation between larval dispersal and environmental factors seems to occur in the northeast Atlantic scenario, and thus, neutral and outlier markers show quite similar outcomes at least at larger scales. This is clear when considering the two groups divided at French Brittany and the related north-south variation in temperature, but it is likely that ocean current patterns and seasonally generated frontal flows may too be structuring environmental ecosystems, and hence the correlation observed, not only at macrogeographic level, but also at a meso-scale. Moreover, both types of markers would also be affected by historical population processes, and accordingly, outliers could also be disclosing other phenomena different from selection, such as hybridization between the main genetic units identified or secondary contacts between divergent lineages related to glacial refugia. Thus, genomic regions associated with divergent selection could be related to local adaptation, but also to genetic incompatibilities or differences in mate choice among differentiated units or lineages (Bierne et al., 2011(Bierne et al., , 2013Vilas et al., 2015). A refined analysis of genetic diversity at a local scale around the main fronts and physical barriers described here additionally covering holy sampling in the Biscay Bay would be essential to disentangle the different factors shaping genetic variation of the edible cockle in the Northeast Atlantic.

| Fisheries management
Despite sampling limitations (e.g. limited sampling along Bay of Biscay), the discontinuities together with further information on larval behaviour and more realistic modelling would aid to a more detailed picture.
Further, the imminent release of the annotated cockle's genome will add more refined information on candidate genes in genomic regions subjected to divergent selection to detect signals of local adaptation.
Moreover, the information from this study might be useful to define sets of markers, starting from outlier loci, which could be applied to found brookstock for restocking depleted populations and to track individuals to their units that could aid the identification of illegal transferences between countries or from disease-affected areas. Our data represent the baseline to monitor restocking and to evaluate the impact of intensive aquaculture on cockle beds.

ACK N OWLED G EM ENTS
The research leading to these results has received funding from the Interreg Atlantic Area Programme through the European Regional

CO N FLI C T O F I NTE R E S T
Authors have no conflict of interest to declare.