Oceanographic variation influences spatial genomic structure in the sea scallop, Placopecten magellanicus

Abstract Environmental factors can influence diversity and population structure in marine species and accurate understanding of this influence can both improve fisheries management and help predict responses to environmental change. We used 7163 SNPs derived from restriction site‐associated DNA sequencing genotyped in 245 individuals of the economically important sea scallop, Placopecten magellanicus, to evaluate the correlations between oceanographic variation and a previously identified latitudinal genomic cline. Sea scallops span a broad latitudinal area (>10 degrees), and we hypothesized that climatic variation significantly drives clinal trends in allele frequency. Using a large environmental dataset, including temperature, salinity, chlorophyll a, and nutrient concentrations, we identified a suite of SNPs (285–621, depending on analysis and environmental dataset) potentially under selection through correlations with environmental variation. Principal components analysis of different outlier SNPs and environmental datasets revealed similar northern and southern clusters, with significant associations between the first axes of each (R 2 adj = .66–.79). Multivariate redundancy analysis of outlier SNPs and the environmental principal components indicated that environmental factors explained more than 32% of the variance. Similarly, multiple linear regressions and random‐forest analysis identified winter average and minimum ocean temperatures as significant parameters in the link between genetic and environmental variation. This work indicates that oceanographic variation is associated with the observed genomic cline in this species and that seasonal periods of extreme cold may restrict gene flow along a latitudinal gradient in this marine benthic bivalve. Incorporating this finding into management may improve accuracy of management strategies and future predictions.


| INTRODUCTION
The application of population genomic-based approaches to the study of marine population structure has revealed increasingly higher levels of genetic differentiation and population structure than previously recognized in multiple marine species (e.g., Benestan et al., 2015;Bradbury et al., 2013;Corander, Majander, Cheng, & Merila, 2013;Milano et al., 2014;Moura et al., 2014). Recent observations of finescale differentiation are changing our view of marine connectivity and marine population dynamics (Hauser & Carvalho, 2008). Limited dispersal may contribute to fine-scale population differentiation, but given large populations and large environmental gradients, selection may also contribute significantly to genetic differentiation among marine populations (Hauser & Carvalho, 2008). As such, studies supporting a role for selection in regulating marine connectivity continue to accumulate (Bradbury et al., 2010;Clarke, Munch, Thorrold, & Conover, 2010;Limborg et al., 2012;Milano et al., 2014;Sjöqvist, Godhe, Jonsson, Sundqvist, & Kremp, 2015;Van Wyngaarden et al., 2017). Researchers increasingly recognize the important role of selection in population connectivity, particularly for economically important species, because an accurate understanding of population structure and environmental influences can contribute to the identification of conservation units and allow prediction of a species' response to climate change (Allendorf, Hohenlohe, & Luikart, 2010;Conover, Clarke, Munch, & Wagner, 2006;Sale et al., 2005).
The sea scallop, Placopecten magellanicus (Gmelin) (Figure 1), is an economically important benthic marine bivalve with a range that extends from North Carolina, USA, to Newfoundland, Canada (Posgay, 1957). The scallop fishery in both the United States and Canada extends back over 100 years and is one of the most economically important fisheries on the east coast of North America (DFO 2016; Naidu & Robert, 2006;NOAA 2016). The sessile adult scallops live in isolated beds up to 300 m deep and undergo broadcast spawning.
Juvenile scallops have a planktonic period of development of up to 40 days, which is conducive with the potential for long distance dispersal among populations (Davies, Gentleman, DiBacco, & Johnson, 2014;Tian et al., 2009). The scallop range spans a vast latitudinal area where the cold Labrador Current meets the warm Gulf Stream. This convergence leads to large gradients in ocean temperature and other environmental factors experienced by different scallop populations, all of which may be influenced by the oceanographic properties of major currents and storm-related mixing along coastal areas (Townsend, Thomas, Mayer, Thomas, & Quinlan, 2006). Several oceanographic barriers (such as current fronts) along the range may also influence larval dispersal and survival among populations (Townsend et al., 2006).
Previous genetic work on the sea scallop detected significant but weak population structure among geographic locations off eastern Northern America using microsatellites, AFLPs, and SNPs (Kenchington, Patwary, Zouros, & Bird, 2006;Owen & Rawson, 2013;Van Wyngaarden et al., 2017). Recently, Van Wyngaarden et al. (2017) resolved significant spatial structuring in sea scallops visible primarily in outlier loci detected using F ST -based outlier detection methods and genomewide RAD-seq (restriction site-associated DNA sequencing)derived SNPs. Taken together, the results from these studies suggest that both limited dispersal and selection associated with local adaptation across the species range may spatially structure scallop populations, despite high potential for gene flow. Here, we build directly on existing studies and known population structure to identify environmental variables that may contribute to non-neutral divergence among sea scallop populations. Using the RAD-seq-derived SNPs identified in Van Wyngaarden et al. (2017), we focus solely on environmental association-based outlier tests to identify loci potentially under selection and directly compare our results to existing work and the results of previous outlier tests.
Considering the unique oceanographic features, the large latitudinal range, and previously identified clinal population structure along the range of the sea scallop, we hypothesize that directional selection F I G U R E 1 Placopecten magellanicus. By Dann Blackwood, USGS [Public domain], via Wikimedia Commons and local adaptation drive sea scallop population structure and that ocean temperature likely contributes significantly to adaptation of the species to its local environment. Our specific objectives are to: (1) explore spatial variation in environmental variables across the range of the sea scallop, (2) use environmental correlation-based outlier detection methods to pinpoint potential targets of environment-based selection across the genome of the sea scallop, and (3) identify potentially important environmental drivers of population structure and adaptation in scallops. This work represents one of the first population genomic studies on a bivalve species or on an invertebrate species with a planktonic juvenile and sessile adult life stage. This work also incorporates both environmental association-based outlier detection and nonlinear random-forest analysis (Breiman, 2001), a machinelearning strategy only recently applied to genomic analysis that can help to account for interaction and covariation between variables (Brieuc, Ono, Drinan, & Naish, 2015). We extend a previous study identifying latitudinal clinal trends in allele frequency across the range using 7163 RAD-seq-derived SNPs (Van Wyngaarden et al., 2017) and identify environmental associations and possible mechanisms.

| Sample collection and RAD-seq
Sample collection and RAD-seq follow Van Wyngaarden et al. (2017).
Using divers or bottom trawls, 252 adult scallops were collected from a total of 12 locations across the entire range of the species between 2011 and 2013 (Table 1, Figure 2). A minimum of 12 scallops were collected per population (mean ± SD, 20.4 ± 2.8 scallops).
Muscle tissue samples were collected and preserved in AllProtect (Qiagen) or 80% ethanol. DNA extraction and RAD-seq library preparation were performed at the Aquatic Biotechnology Lab, Bedford Institute of Oceanography in Dartmouth, Nova Scotia. RAD-seq libraries were prepared using Sbf1 as described by Etter, Preston, Bassham, Cresko, and Johnson (2011) (see also Etter, Bassham, Hohenlohe, Johnson, and Cresko, 2011) with modifications. Each library was created using 22 individuals from each sampling location (or 20 individuals for SUN) with a unique in-line barcode in the P1 adapter for each individual. The P1 adapter barcodes were 6 bp in length for all populations except for SSB, GEO, and SUN where the barcodes ranged from 5 bp to 9 bp. The barcodes for SSB, GEO, and SUN were chosen to ensure equal distribution of all nucleotides at each base position and to maximize the edit distance (Faircloth & Glenn, 2012). Based on edit tags analysis (Faircloth & Glenn, 2012), the variable length barcodes edit distance ranged from two to eight (modal edit distance was six SNPs were detected using the de novo pipeline in STACKS v.0.9999 (Catchen, Amores, Hohenlohe, Cresko, & Postlethwait, 2011). Loci were assembled using ustacks, requiring a minimum depth of coverage for a stack (m) of five and allowing four maximum nucleotide mismatches (M) between stacks. The catalog of loci was assembled using cstacks allowing a distance between loci in the catalog (n) of six. The final dataset was filtered using PLINK v.1.07 (Purcell, 2009;Purcell et al., 2007) to include only RADtags present in 75% of individuals in SNP discovery and calling; all SNPs included in the analysis were present in 75% of individuals with a minor allele frequency (MAF) greater than 5%. Furthermore, we excluded individuals with more than 20% missing loci from the analysis. Loci were filtered for Hardy-Weinberg equilibrium using the program GENEPOP v.4 (Rousset, 2008), excluding loci out of equilibrium in six or more populations from the analysis (<0.7% of all loci).

| Environmental data collection and processing
We amalgamated environmental data from several databases; from Fisheries and Oceans Canada: Climate (Gregory, 2004)  We averaged data from all data sources within a bounding box of one square degree around each sample site to create site-specific averages for each data type used in the analysis. Data were separated into surface and depth values based on the collection depth for each sampling location. Surface values encompassed depths between zero and 20 m except for collection sites less than 20 m depth, where 10 m was used as the surface cutoff. We averaged values from a cutoff approximately 10 m above a given collection depth to the collection depth for depth-profiled variables. In cases where multiple sample collection depths were provided, depth cutoff parameters were altered to include the entire range of collection depths ( minimum, maximum, and seasonal calculations. We repeated all analyses using only the temperature, salinity, and chlorophyll a variables (n = 36 variables, henceforth CST), given that we expected these to be the most likely to associate with selection; they not only provide evidence of food availability but also characterize water mass properties that can affect all trophic levels.

| Detection of outlier loci
We used two separate methods to detect outlier loci using both environmental datasets (four tests in total). The first method used a Bayesian framework implemented in the program BAYENV2 (Coop, Witonsky, Di Rienzo, & Pritchard, 2010;Guenther & Coop, 2013).
This method calculates a set of "standardized allele frequencies" that controls for population history and structure when detecting loci whose allele frequencies show significant associations with environ-  Table 3). We selected loci with BFs in the top 5% of the range of BFs for each bin as outliers.
Latent factor mixed models (LFMMs) as described in Frichot, Schoville, Bouchard, and François (2013) were implemented as the second method of outlier detection in the R package LEA (Frichot & François, 2015

| Environmental factors that influence genetic variation
We conducted principal components analysis (PCA) using the AllEnvOutlier and CSTOutlier loci using the R package adegenet (Jombart, 2008) to examine population structure among the sampled populations at outlier loci. To examine the relationship between environmental and genetic variation among our collection sites, we calculated population-specific allele frequencies for AllEnvOutlier and CSTOutlier using the R package gstudio (Dyer, 2014). Next, we ran PCA on population-specific allele frequencies for AllEnvOutlier and CSTOutlier (AllEnvOutlierPCA and CSTOutlierPCA), and the population-specific environmental data in AllEnv and CST (AllEnvPCA and CSTPCA) using the R package adegenet. Linear regression was then performed between the first principal component (PC) from AllEnvOutlierPCA (AllEnvOutlierPC1) and the first PC from the PCA on AllEnv (AllEnvPC1) as well as the first PC from CSTOutlierPCA (CSTOutlierPC1) and the first PC from the PCA on CST (CSTPC1).
We then conducted redundancy analysis (RDA), a multivariate canonical correlation analysis, using the R package vegan ( (Arnold, 2010).
We also used nonlinear random-forest (RF) analysis to identify important environmental variables and then compared key drivers with those identified using multiple linear regressions. One key attribute of RF is the automatic computation of variable importance, which allows us to determine which environmental variables influence population structure. Additionally, RF considers interaction between predictor variables and may manage the covariation among environmental variables more effectively than the multiple linear regression approach (Brieuc et al., 2015). This ensemble approach benefits from growing a large group of decision trees to improve overall performance.
RF cannot tolerate missing data, so we used a method based on weighted k nearest neighbors (KNN) called KNNcatImpute (Schwender, 2012) to impute the missing genotypes in our genetic SNP data using the scrime package in R (Schwender & Fritsch, 2013

| Gene ontology
We performed gene ontology (GO) analysis on AllEnvOutlier and CSTOutlier in the program Blast2GO (Conesa et al., 2005) using the program default parameters and InterProScan to improve GO annotation quality.

| Detection of outlier loci
The neutral matrices calculated to generate "standardized allele frequencies" for BAYENV2 varied little within runs and when compared to the F ST matrix calculated for the neutral loci; we therefore chose a single matrix for further calculations with BAYENV2. LFMM identified K = 2 as the most supported number of clusters (and thus latent factors) using GIF analysis, with values of 0.85 for AllEnv and 0.83 for CST. According to Frichot and François (2015), p-values in this analysis calibrate correctly when GIF approaches one.
Overall, LFMM identified more loci potentially under selection than BAYENV2. Combining the results from both programs, AllEnv identified 621 loci (8.7% of all loci) as under selection, whereas CST identified 285 loci (4.0% of all loci) as under selection; 250 loci were shared between the two datasets (Table 6). Using AllEnv, BAYENV2 detected 128 loci as putatively under selection (1.8% of total loci), whereas LFMM detected 511 (7.1% of total loci). Only 18 loci were common to both the BAYENV2 and LFMM sets. Using CST, BAYENV2 detected 72 loci (1.0% of total loci), whereas LFMM detected 218 (3.0% of total loci), with only five loci shared between the two methods. Within the BAYENV2 results, the AllEnv outlier list and CST outlier list shared 37 loci (Table S1). The LFMM analysis of AllEnv and CST overlapped completely in loci identified (Table S2). We also compared our combined  (Table S3).

| Patterns of genetic and environmental variation
PCA of all individuals and sets of outlier loci detected using AllEnv and CST both split north and south populations along the first PC, separating the populations into two clusters as seen in the BAYENV2 results ( Figure 3). Using AllEnvOutlier, the first PC explained 2.38% of the total explainable variance in the model, and using CSTOutlier, the first PC explained 3.33% of the total explainable variance. The PCA on the population-specific allele frequencies for AllEnvOutlier and CSTOutlier T A B L E 5 Number of Placopecten magellanicus individuals and number of SNP loci included in initial RAD-sequencing and final analysis following quality control

| Environmental factors that influence genetic variation
To examine the effects of climate versus geography on the genetic variation within the outlier SNP loci, we selected five PCs from AllEnvPCA and four from CSTPCA for use as explanatory variables in RDA, each explaining more than 5% of the total variance in the PCA. In AllEnvPCA, the five selected axes explained 89.78% of the total model variance, and in CSTPCA, the four selected axes explained 88.96% of the total variance.
Backwards stepwise variable selection on the RDA for AllEnv retained only AllEnvPC1 as an important explanatory variable, whereas selection on the RDA for CST retained both CSTPC1 and CSTPC4 (Figure 7). Both To choose environmental parameters to include in the multiple linear models, we examined variable weightings on the PC axes selected during RDA and retained the five most highly weighted variables from each axis. For all variables included in each global model, we calculated cumulative AIC c weights and model-averaged parameter estimates (Table 7).
Model selection using CSTEnv and all 10 selected environmental variables could not determine best fit models and provide accurate estimates for parameter weights and coefficients due to overfitting of the model. Using RF, we calculated the importance proportion for all environmental variables using both AllEnvOutlier and CST Outlier (Figure 9).
Using AllEnvOutlier, deep average summer salinity, deep minimum salinity (occurred in spring), and deep maximum salinity (occurred in autumn) ranked as the most important environmental variables. Surface average autumn temperature, deep average winter temperature, and deep minimum temperature (occurred in winter) were also selected as important variables. CSTOutlier once again ranked salinity-associated variables as most important; however, deep average winter temperature and deep minimum temperature ranked highly and the importance proportions for CSTOutlier exceeded those from AllEnvOutlier.

| Gene ontology
Blast2GO functionally annotated very few outlier loci. CSTOutlier determined annotation matches for only four loci (1.4% of loci), with a BLAST hit but no GO annotation at one further locus. In AllEnvOutlier, only five loci (0.8% of total loci) matched, with a BLAST hit but no GO annotation in one further locus. The two lists of outliers shared three matches, with GO annotations split between molecular function (calcium ion and carbohydrate binding) and metabolic processes (regulation of transcription and steroid hormone-mediated signaling). In CSTOutlier, GO annotation of the remaining locus identified a molecular function (oxidoreductase activity) and a metabolic process (oxidation-reduction process). In AllEnvOutlier, the GO annotations of the remaining two loci differed, one locus with molecular functions (oxidoreductase activity) and metabolic processes (oxidation-reduction process) and the other locus with several annotations (molecular function/catalytic activity, transferase activity, and folic acid binding, and metabolic processes/cellular metabolic processes) (Table 8).

| Environmental variables driving adaptation
Our results highlight ocean temperature as a critical environmental factor contributing to population structuring of the sea scallop. The sea scallop's distribution spans almost 10° latitude encompassing an  (Townsend et al., 2006). In contrast, the warm Gulf Stream moves north from the Gulf of Mexico along the east coast of North America. These two currents meet and move roughly offshore around Nova Scotia, exposing scallop populations to large differences in water temperature (and other oceanographic variables) in different areas of their range (Townsend et al., 2006). Our environmental PCAs clearly detected the differences in environment associated with these currents. The first PC in our environmental PCAs illustrates the split between northern and southern populations and for both AllEnv and CST explains more than 40% of the variation in the environmental data.
Using F ST -based outlier detection methods, Van Wyngaarden et al.
(2017) identified a strong genetic population separation related to these currents and the other oceanographic and environmental features in the region; populations in the north, generally exposed to colder temperatures, clustered separately from the southern populations, which are often exposed to warmer temperatures. This genetic split also clearly appears in the first PC of our genetic outlier PCAs, however the first   However, in many regions other environmental features often covary with temperature (e.g., salinity or ChlA) and in some analyses temperature may act as an unintentional proxy for the true selective force (a species may appear to adapt to temperature when in fact they are experiencing selection due to another variable such as ocean productivity). This may have particular relevance for the sea scallop, because our sampling locations and temperature gradient both span the same north-south axis. There is also some evidence of a slight north-south salinity gradient, with the lowest salinities in MGD and NTS and the highest salinity in MDA (Figure 6b,c). Although clear associations between genetic variation and temperature have been reported in several other species, including Pacific invertebrates (Pespeni & Palumbi, 2013), studies also demonstrate genomic adaptation to environmental gradients other than temperature, such as adaptation to salinity gradients in several Baltic Sea species (Berg et al., 2015;Limborg et al., 2012;Sjöqvist et al., 2015). In our analyses, in addition to cold temperatures RF analysis also identified salinity as an important environ-

| Mechanisms of adaptation
The genomic associations with ocean temperature during periods of extreme cold (i.e., winter) suggest temperature-associated mortality may significantly structure sea scallop populations. Sea scallops reproduce via broadcast spawning, generally in the autumn, although timing varies along their range. Given that scallops tend to spawn in the warmest water (Thompson, 1977), generally between August and October (Beninger, 1987;Langton, Robinson, & Schick, 1987;Naidu, 1970), and they likely settle before December (Naidu & Robert, 2006), a link between winter temperatures and larval mortality appears unlikely. Our analyses point to the overwinter survival of juvenile scallops as a potentially important structuring force limiting the effective dispersal of scallops between our northern and southern population groups, rather than selective mortality of planktonic larval scallops, and future experimental studies on larval and juvenile scallops may help to clarify this possibility. Some evidence suggests that temperatures experienced by adults can help ensure a healthy larval year class (Dickie, 1955;DuPaul, Kirkley, & Schmitzer, 1989;Kirkley & Dupaul, 1991;Langton et al., 1987;Macdonald & Thompson, 1985). Interestingly, our study identified surface temperature rather than temperature at depth as the most important driver of selection, contrary to expectations of juvenile scallop survival. One possible explanation is that deep temperature values are often estimated or provided as a range at collection sites, presumably reducing accuracy of those measurements relative to those for surface temperature. Our Blast2GO results identified possible genetic matches with several cellular processes, which may be temperature dependent, highlighting potential mechanisms of thermal adaptation in the sea scallop. Unfortunately, the lack of

| Alternative contributions to population structure
Despite the clear association observed with ocean temperature and population structure, selective processes may not be the sole driving The sea scallop range encompasses an area of complex oceanography and several processes could contribute to the neutral genetic separation of populations including potential current-related fronts that may prevent larval movement between regions, upwelling and water movement related to the continental shelf, and storm mixing along the coast (Townsend et al., 2006). These factors may inhibit the successful movement, settlement, and growth of larvae and can be difficult to accurately incorporate into models of connectivity that try to calculate larval dispersal. Additionally, larval behavior significantly impacts dispersal in many cases (Shanks, 2009) and can be difficult to accurately model and evaluate. Using postsettlement genetic structure to determine what processes influence connectivity among sea scallop populations, our methods inherently account for the effects of larval behavior and complexity when drawing conclusions. Although we believe our results to be robust to complications of neutral population structure and geographic distance, additional sampling (especially from populations at the same latitude) will help to more thoroughly separate the joint effects of climate and geography on scallop population structure as it may allow sampling of populations with a similar climatic profile at varying separation distances.

| Challenges and limitations
Many reviews on environmental association studies recommend removing the effects of neutral population structure to fully assess the effect of selection on population structure in natural systems (e.g., Rellstab, Gugerli, Eckert, Hancock, & Holderegger, 2015) and accounting for geographic distance and isolation by distance when examining potential isolation by ecology (e.g., Shafer & Wolf, 2013); however, this is a particular challenge in our system. Because a single northsouth population split characterizes our sample sites rather than a classic isolation-by-distance pattern (Van Wyngaarden et al., 2017), geographic distance among populations may not influence our results the way it would in a system characterized by a classic stepping-stone pattern. Our samples also align along the north-south axis of the population range providing few opportunities to examine the effects of distance between samples without also removing the effects of latitude. To minimize the potential bias of neutral population structure on our results, we focused our analysis solely on outlier loci potentially under selection in the genome, likely making our analyses less prone to the confounding effects of neutral population structure. We also compared the results of RDA and pRDA, which controls for geographic distance among populations. Even when controlling for geographic distance, our results nonetheless indicate climate as a significant population structuring force, although the patterns of population clustering change slightly.
Another source of bias in population genomics studies is the effect of age-related structuring in population samples, which has been documented in scallops previously (Owen & Rawson, 2013), potentially due to recruitment events or yearly environmental fluctuations.
Although our samples were collected over a relatively short time period (2011-2013), we made attempts to cover multiple age classes to avoid this issue.
Our analyses pinpointed potential environmental influences on sea scallop population structure; however, annotating the outlier SNPs of interest remains challenging. Although RAD-seq generates vast quantities of SNPs in organisms without reference genomes (Benestan et al., 2015;Catchen et al., 2013;Hohenlohe et al., 2012;Reitzel, Herrera, Layden, Martindale, & Shank, 2013), the lack of more detailed genetic resources makes inference on the causal mechanisms contributing to local adaptations in sea scallops difficult, as noted by our lack of GO matches. Fortunately, with continued development of resources for P. magellanicus and related species, future studies will likely identify and study the features most important in characterizing sea scallop population structure.

| CONCLUSIONS
Our results show that ocean climate plays a role in structuring populations of sea scallops, particularly the influence of the coldest