RAD sequencing of common whelk, Buccinum undatum, reveals fine‐scale population structuring in Europe and cryptic speciation within the North Atlantic

Abstract Buccinum undatum is a subtidal gastropod that exhibits clear spatial variation in several phenotypic shell traits (color, shape, and thickness) across its North Atlantic distribution. Studies of spatial phenotypic variation exist for the species; however, population genetic studies have thus far relied on a limited set of mitochondrial and microsatellite markers. Here, we greatly expand on previous work by characterizing population genetic structure in B. undatum across the North Atlantic from SNP variation obtained by RAD sequencing. There was a high degree of genetic differentiation between Canadian and European populations (Iceland, Faroe Islands, and England) consistent with the divergence of populations in allopatry (F ST > 0.57 for all pairwise comparisons). In addition, B. undatum populations within Iceland, the Faroe Islands, and England are typified by weak but significant genetic structuring following an isolation‐by‐distance model. Finally, we established a significant correlation between genetic structuring in Iceland and two phenotypic traits: shell shape and color frequency. The works detailed here enhance our understanding of genetic structuring in B. undatum and establish the species as an intriguing model for future genome‐wide association studies.


| INTRODUC TI ON
The reconstruction of spatial genetic structure can provide valuable insights into the evolutionary processes affecting speciessuch as genetic drift, adaptive selection, and gene flow between populations Funk et al., 2012;Kohn et al., 2006)-while facilitating the characterization of historical demographic events and ongoing evolutionary trajectories (Emerson et al., 2010). Marine invertebrate phylogeography in the north Atlantic Ocean has been shaped by climate variation during the Pleistocene Epoch; glacial cycles have repeatedly altered species' distributions over the last two million years. Several marine invertebrate species have gone extinct in North America and have subsequently been recolonized from Europe (Ingolfsson, 1992;Vermeij, 1991;Wares & Cunningham, 2001). Expansion from southern refugia since the last glacial maximum 18-25 k years Magnúsdóttir et al., 2010;Martel et al., 1986), further supporting the diversification of the Canadian and ENA lineages. Within the ENA lineage, multiple studies of B. undatum have reported patterns of local population structure following an isolation-bydistance model (Magnúsdóttir, Pálsson, Westfall, Jónsson, Goodall, et al., 2019;Mariani et al., 2012;Pálsson et al., 2014;Weetman et al., 2006) at distances as short as 30 km in Icelandic waters (Magnúsdóttir, Pálsson, Westfall, Jónsson, Goodall, et al., 2019;Pálsson et al., 2014). However, despite repeated observations of fine-scale population structure, genetic divergence (as measured by F ST ) across the entire ENA lineage is uncharacteristically low (Weetman et al., 2006). Large effective population size and recent divergence may explain the limited genetic differentiation.
Semi-continuity has also been proposed as a potential mechanism maintaining gene flow among the European populations (Mariani et al., 2012;Weetman et al., 2006).
Comparing intraspecific patterns of genotypic and phenotypic variation may also elucidate concordant or distinctive responses to environmental landscapes (Zamudio et al., 2016). Common whelk displays discordant relationships between genotypic and phenotypic variation across small geographic scales in Iceland (Magnúsdóttir et al., 2010) and Ireland (Mariani et al., 2012). In Breiðafjörður (western Iceland), shell traits such as shape and color are highly differentiated among geographically proximate populations (20-30 km), exhibiting small but significant amounts of neutral genetic variation (Magnúsdóttir, Pálsson, Westfall, Jónsson, Goodall, et al., 2019;Pálsson et al., 2014). Fine-scale phenotypic patterns have been documented, with shell shape and color exhibiting gradients from the inner to the outer bay that correlated with environmental variables (Magnúsdóttir et al., 2018). Limited demographic connectivity was apparent from the patterns of phenotypic variation; still, with only low levels of neutral genetic differentiation among the same populations, it was hypothesized that plastic responses to environmental variation are driving the observed phenotypic divergence (Mariani et al., 2012).
Although phenotypic variation may reflect population differentiation at the molecular level, a certain level of discordance can be expected due to sampling of markers, selection on functional traits, or plasticity. Previous studies on common whelk were based on a handful of loci (microsatellite and mtDNA COI) (Magnúsdóttir, Pálsson, Westfall, Jónsson, Goodall, et al., 2019;Mariani et al., 2012;Pálsson et al., 2014;Weetman et al., 2006).
Studies utilizing genome-wide genetic datasets such as RAD sequencing (Davey & Blaxter, 2010) may be required to delineate concordant responses between the genotypic and phenotypic variations observed in common whelk. RAD sequencing has been used to resolve fine-scale population structure in several marine invertebrates species, including the American and European lobster (Benestan et al., 2015;Jenkins et al., 2019), great and Mediterranean scallop (Vendrami et al., 2017(Vendrami et al., , 2019, sea scallop (Van Wyngaarden et al., 2016), and staghorn corals (Drury et al., 2016). Additionally, RAD sequencing has the potential to detect uncharacterized cryptic species complexes within known species distributions (e.g., in the River Limpet ).
As such, utilization of large-scale RAD sequencing datasets may help clarify the low overall genetic differentiation across the ENA lineage and provide further evidence of the cryptic species found within common whelk.
The current study utilizes new and extensive geographic sampling of common whelk across the North Atlantic combined with RAD sequencing technologies to assess the genetic structure within the species at varying geographic scales, ranging from a broad-scale analysis of genetic divergence across the North Atlantic to a more targeted fine-scale analysis of divergence within Iceland. In addition, broad-scale analyses (herein referred to as North Atlantic-specific analyses) aimed to evaluate the existence of cryptic species in Canada and Europe and the low genetic divergence within the ENA lineage (Magnúsdóttir, Pálsson, Westfall, Jónsson, Goodall, et al., 2019;Pálsson et al., 2014;Weetman et al., 2006). Within Iceland, individuals from Breiðafjörður Bay (western Iceland) display exceptionally high levels of shell color variation (Magnúsdóttir et al., 2018).
Detailed analyses of geographical patterns of phenotypic shell traits in Breiðafjörður have been described by Magnúsdóttir et al. (2018).
Therefore, in addition to describing fine-scale population trends in Breiðafjörður using a RAD sequencing approach, we aim to examine further the relationship between population genetic differentiation and existing phenotypic profiling data (shell color and shape) within the bay.  Table 1. For all individuals, foot tissue was sampled via dissection and fixed immediately in 96% ethanol. Samples were stored at 4°C for 2 days before being exchanged with fresh ethanol and stored at -30°C.

| DNA extraction, RAD library preparation, and sequencing
Fixed foot tissue was dried briefly (to evaporate residual ethanol) and cut into small pieces that underwent DNA isolation using an Omega Individuals were assigned a molecular identification tag using combinatorial barcoding of forward (5bp) and reverse (6bp) reads, respectively, then pooled into four separate libraries before size selection and amplification using 10 PCR cycles. Resulting library concentrations were quantified using SYBR Gold double-stranded DNA assay measured on a TECAN GENios plate reader (TECAN ™ , www.tecan. com), and the sizes estimated by 2% agarose gel. Library quality was assessed using two pilot sequencing runs on Illumina MiSeq (pairedend 2 × 150 bp). Libraries were sequenced using Illumina HiSeq 2500 (paired-end 2 × 125 bp) across four lanes, generating an average of 4 ± 3.5 SD million paired raw reads per individual.
The initial pass through the de novo stacks pipeline was de- GPS coordinates for each sample site can be found in Table 1 run in de novo mode (-P option enabled). Finally, populations analyses were run with -p 3 and -r 0.5 options enabled, where the --popmap designated individuals based on their country of origin (i.e., Canada, England, Faroe Islands, and Iceland).
The resulting populations .vcf file was imported into Radiator (Gosselin, 2017) for quality analysis. Radiator's filter_rad pipeline was run in interactive mode, with the aim of blacklisting poor-quality markers and individuals. Filtering parameters and outputs are listed in Table S2. After Radiator filtering, the following samples were  Table 1 TA B L E 1 Sample locations, depths, GPS coordinates, and sample numbers (pre-and post-data filtering) for Buccinum undatum sampled within Breiðafjörður (Iceland) and across the North Atlantic (Canada, England, Faroe Islands) Sstacks and tsv2bam steps were run using default settings, and gstacks run in de novo mode (-P option enabled), yielding 900,825 loci, with an effective per-sample coverage of 17.7 (± 6.8 standard deviation). Populations analyses were again run with -p 3 and -r 0.5 options enabled, and --popmap designating individuals based on their country of origin. Low-quality individuals and markers were again filtered using Radiator (parameters and outputs listed in Table   S3). The number of individuals retained, postfiltering, across all geographic scales of analysis is listed in Table 1. The average coverage per marker following Radiator was 18.6 (±6.3 standard deviation).
Departure from Hardy-Weinberg equilibrium (HWE) was assessed using a chi-squared test (χ 2 ) of expected genotype frequencies in the R-package Pegas (Paradis, 2010), where the false discovery rate of p was controlled for using Benjamini and Yekutieli (2001) procedure for multiple testing. HWE tests were undertaking in a population-wise manner, where population status was assigned based on country of origin, except for Icelandic individuals who were assigned population status relative to their sample site of origin in Breiðafjörður. In addition to population-wise chi-squared values, combined probabilities across all populations were calculated using Fisher's combined probability (Sokal & Rohlf, 1969) (Foll & Gaggiotti, 2008), with parameters set for 50,000 iterations with prior odds of 100, following 20 pilot runs of 5,000 interactions each, a thinning interval of 10, and a burn-in length of 50,000. Confirmation of Markov chain convergence was conducted using Gelman and Rubin multiple sequence diagnostic and Heidelberger and Welch's Convergence diagnostic tests within the R-package CODA (Plummer et al., 2006).
Putative adaptive SNPs were removed from the North Atlantic or Icelandic datasets using Adegenet (Jombart & Ahmed, 2011) (loci were considered outliers under a false discovery rate (FDR) of 0.05).

All downstream analysis of the quality filtered North Atlantic-and
Iceland-specific datasets were conducted independently.

| Analysis of genetic variation and population divergence
The estimation of genetic variation and divergence was undertaken in both North Atlantic and Icelandic datasets using the neutral SNP dataset exclusively. Mean observed heterozygosity (H O ), expected heterozygosity (H E ), and inbreeding coefficient (F IS ) per population were calculated using Hierfstat (Goudet, 2005). All pairwise F ST calculations were computed in DartR following 1,000 permutations (Gruber et al., 2018). The presence of putative population clusters (K) within the genomic data was estimated using the find.clusters function of Adegenet (Jombart, 2008;Jombart & Ahmed, 2011). The optimal number of clusters (K) was selected by computing the value of K with the lowest associated goodness of fit value estimated with BIC.
Population genetic structure was visualized by using Gower PCoA ordination (using Euclidean distance) in DartR. Population structure was visualized using PCoA both when considering population assignments as sample site of origin or as a function of the putative population clusters (K) assignments. Population admixture was estimated using the Discriminant Analysis of Principal Components (DAPC) in Adegenet (Jombart et al., 2010). The number of principal components retained within DAPC was optimized using the a-score optimization test prior to analysis. Admixture was expressed both when considering population assignments as sample site of origin or as a function of the putative population clusters (K) assignments.
Isolation-by-distance was assessed by analyzing the association of genetic and geographic distances for all individuals using a Mantel test in Ade4 (Dray & Dufour, 2007;Mantel, 1967

| Comparison of genetic and phenotypic trends in Breiðafjörður, Iceland
Comparisons of genetic and phenotypic distances were performed for the Icelandic dataset only. Phenotypic data corresponding to each of the Icelandic sites was sourced from Magnúsdóttir et al. (2018).
No phenotypic data could be sourced from RAU (Rauðasandur), and therefore, individuals from this site were excluded from phenotype association analyses. Phenotypic data corresponding to the following measures were sourced: (a) shell shape, based on eleven geometric morphometric landmarks on the ventral surface of the shell that combine information on shell spire/body whorl ratio, aperture shape, shell shape, and shell lip thickness; and (b) shell color frequency, based on the primary color of the shell (See Table S4).
Correlation of genetic distances (euclidean) with phenotypic distances was tested on a per-sample site basis, using Mantel's test, with 1,000 permutations in the R-package vegan (Oksanen et al., 2017).
Euclidean distances were calculated for variation in shell shape between sites, while differences in shell color composition at each site were assessed using Bray-Curtis distances (Oksanen et al., 2015).
To account for spatial autocorrelation due to either geographic distance or differences in depth, a partial Mantel test was implemented (Smouse et al., 1986) whereby either the effect of geographic distance or depth differences was kept constant.
The relationship between population genetic and phenotypic structuring was tested using cluster (K = 2) population assignments.
Correlation between putative population clusters and shell shape differences were assessed using permutational multivariate analysis of variance of shape distances, implemented using the adonis() function in vegan (Oksanen et al., 2017) with the assigned putative genetic clusters as the explanatory variable. Similarly, correlation between putative population clusters and shell color were assessed by comparing differences in shell color frequencies between putative population clusters, using Fisher's exact test (Fisher, 1934).

| Population genetic structure and divergence in Iceland
Quality filtering reduced sample representation across all sample sites (particularly for ODD), with Iceland-specific analyses being conducted across BJA (n = 9), BRJ (n = 9), HVA (n = 34), HRU

| Comparison of genetic and phenotypic variation in Iceland
No significant correlation between shell shape distances or color dissimilarity were observed (p > .05) when considering individuals on a site-specific basis, regardless of correction for spatial autocorrelation (geographic distance or depth). Sites were grouped into putative population clusters based on K-means clustering (K = 2) and analyzed for correlations between genetic distance and phenotypic traits, shell shape distances, and color frequency (Magnúsdóttir et al., 2018). Significant differences in both color frequency (Fisher's test: p = .0005) and shape distances (F′ = 15.95, p = .001) were observed between the two putative genetic clusters. The outer bay cluster consisted primarily of lighter shell colors, with orange and white being the most common (Table S4), while the inner bay cluster consisted predominantly of green and brown shells. With regard to shell shape, the outer bay cluster was correlated with rounder shells, a proportionally shorter spire (compared with the body whorl), and a more elongate aperture (Magnúsdóttir et al., 2018), whereas the inner bay cluster was correlated with more elongate shells, with a comparatively taller spire, and smaller rounder apertures.

| D ISCUSS I ON
The current study undertook a comprehensive analysis of North Atlantic population genetic structure in the common whelk utilizing a RAD sequencing approach. WNA (Canada) and ENA (England, Faroe Islands, and Iceland) populations were highly divergent, with an F ST ≥ 0.57 relative to within-ENA divergence (F ST ≤ 0.1), bolstering previous claims that the two lineages constitute cryptic species that diverged under allopatric speciation (Magnúsdóttir, Pálsson, Westfall, Jónsson, Goodall, et al., 2019;Pálsson et al., 2014). Smaller Direct-developing, benthic marine species are hypothesized to exhibit greater population genetic structure, relative to broadcast spawners, due to their limited dispersal capabilities. While rafting of egg masses may confer some means of dispersal for direct-developing gastropods (see Johannesson, 1988;Kyle & Boulding, 2000;Marko, 2004), the frequency of such events is unlikely to maintain gene flow to an extent which counteracts the effects of high self-recruitment. Existing population genetic studies of B. undatum described genetic structuring following an isolationby-distance model across the North Atlantic (Magnúsdóttir, Pálsson, Westfall, Jónsson, Goodall, et al., 2019;Mariani et al., 2012;Pálsson et al., 2014;Weetman et al., 2006); however, it remained unclear whether the variation observed from mitochondrial/microsatellite markers was representative of broader genomic trends. In the present study, trends in population structure both within Iceland and Our results reaffirm mtDNA data at the North Atlantic scale, indicating the Canadian and ENA populations constitute two distinct genetic lineages, which diverged some 2.1 million years ago . Our results support claims of low, but significant, genetic divergence as a persistent feature within the ENA lineage. Further research is required to derive the mechanisms driving low divergence within the ENA; however, our data suggest that maintenance of gene flow under a stepping stone model is unlikely given little F I G U R E 3 Summary of inferred population genetic structure for Buccinum undatum (n = 164) sampled from across the North Atlantic. All plots are derived from 35,518 putatively neutral SNP. Plots corresponding to the following: (a) Gower PCoA ordination (using euclidean distance) plot where population is assigned relative country of origin; (b) Admixture plot where population is assigned relative country of origin; (c) Gower PCoA ordination plot where population is assigned relative to optimal cluster (K = 3) assignments; (d) Admixture plot where population is assigned relative to optimal cluster (K = 3) assignments. Individuals are labeled relative country of origin: Canada (CAN, orange), Iceland (ICE, purple), Faroe Island (FAR, blue), and England (ENG, green) to no admixture is observed between Icelandic, Faroe Islands, and English whelk. At the Icelandic scale, our estimates of F ST (ranging from 0.003 to 0.068) fall within the ranges previously estimated for Breiðafjörður using mtDNA (ranging from 0.0003 to 0.145) (Pálsson et al., 2014). On a regional basis, global F ST for Iceland (global  To summarize, trends in genetic diversity and divergence in Iceland do not indicate unidirectional offshore migration, as seen in Britain. Instead, modeling of migration patterns indicated that some degree of migration likely occurs between all sites; however, said migration is primarily restricted to shallower more coastal sites and, even then, occurs mostly within the putatively assigned outermost and innermost bay population clusters (a trend also reflected by estimations of admixture).
Interestingly, both BJA and ODD, which represent the deepest as witnessed by findings of glacial moraines (Norddahl et al., 2008).
Given the species' limited dispersal capabilities, the formation of mul-

F I G U R E 5
Summary of inferred population genetic structure for Buccinum undatum (n = 104) sampled from Breiðafjörður, Iceland. All plots are derived from 35,836 putatively neutral SNP. Plots corresponding to the following: (a) Gower PCoA ordination (using euclidean distance) plot where individual population is assigned relative to capture site in Breiðafjördur; (b) Admixture plot where individual population is assigned relative to capture site in Breiðafjördur; (c) Gower PCoA ordination plot where individual population is assigned relative to optimal cluster (K = 2) assignments; (d) Admixture plot where individual population is assigned relative to optimal cluster (K = 2) assignments.

ACK N OWLED G EM ENTS
The project was funded by research grants #141302-051 and #196417-051 from the Icelandic Centre for Research (RANNÍS).
We want to thank Símon Sturluson, the captain of Ronja SH-53, and Will Butler for their assistance in sampling in Breiðafjörður.
We are also immensely grateful to Bernard Sainte-Marie (Maurice

CO N FLI C T O F I NTE R E S T
None to declare.

AUTH O R CO NTR I B UTI O N
Jake Goodall: Conceptualization (equal); Data curation (equal);