RAD sequencing reveals genomewide divergence between independent invasions of the European green crab (Carcinus maenas) in the Northwest Atlantic

Abstract Genomic studies of invasive species can reveal both invasive pathways and functional differences underpinning patterns of colonization success. The European green crab (Carcinus maenas) was initially introduced to eastern North America nearly 200 years ago where it expanded northwards to eastern Nova Scotia. A subsequent invasion to Nova Scotia from a northern European source allowed further range expansion, providing a unique opportunity to study the invasion genomics of a species with multiple invasions. Here, we use restriction‐site‐associated DNA sequencing‐derived SNPs to explore fine‐scale genomewide differentiation between these two invasions. We identified 9137 loci from green crab sampled from 11 locations along eastern North America and compared spatial variation to mitochondrial COI sequence variation used previously to characterize these invasions. Overall spatial divergence among invasions was high (pairwise FST ~0.001 to 0.15) and spread across many loci, with a mean FST ~0.052 and 52% of loci examined characterized by FST values >0.05. The majority of the most divergent loci (i.e., outliers, ~1.2%) displayed latitudinal clines in allele frequency highlighting extensive genomic divergence among the invasions. Discriminant analysis of principal components (both neutral and outlier loci) clearly resolved the two invasions spatially and was highly correlated with mitochondrial divergence. Our results reveal extensive cryptic intraspecific genomic diversity associated with differing patterns of colonization success and demonstrates clear utility for genomic approaches to delineating the distribution and colonization success of aquatic invasive species.

Ecological impacts have been well studied in coastal marine habitats, which represent some of the most heavily invaded ecosystems due to human-mediated introductions from ship ballast and hull fouling, as well as the aquarium trade and aquaculture activities (Grosholz, 2002).
In contrast to ecological impacts, evolutionary dynamics of marine invasions are less well studied including both the genomic basis for invasiveness (Lee, 2002) and evolutionary consequences of invasion (Chown et al., 2015;Cristescu, 2015). Increasingly, it is becoming apparent that cryptic diversity can contribute to variation in invasion success and ultimately to impacts on marine and freshwater ecosystems (e.g., Folino-Rorem, Darling, & D'Ausilio, 2009;Lee, 1999;Sherman et al., 2016). A better understanding of the presence of intraspecific diversity within invasive species and how it correlates with variation in invasion success is necessary to successfully manage invaders and mitigate ecological and evolutionary impacts.
An evolutionary perspective of marine invasion dynamics and of cryptic diversity within invaders has been limited by the huge interspecific diversity among invasive taxa and a lack of genomic resources.
The ability to study "invasion genomics" using relatively new tools such as restriction-site-associated DNA sequencing (RAD-seq), RNA-seq genotyping, or transcriptomics allows the exploration of evolutionary history, connectivity, range expansion rates, and invasion patterns with markers from across the entire genome without the requirement of an existing whole-genome sequence (Chown et al., 2015;Rius & Darling, 2014;Sherman et al., 2016;Viard, David, & Darling, 2016). Such genomewide sampling techniques can be used to identify so-called invasion genes or marker associations with environmental factors (Chown et al., 2015;Wang et al., 2011) and can be useful in understanding patterns and rates of expansion (e.g., Krehenwinkel & Tautz, 2013), as well as the roles of neutral and adaptive pressures in invasion dynamics (e.g., Tepolt & Palumbi, 2015). Genomewide markers are particularly useful for marine invasive species whose adult and/or larval phases may be transported by coastal currents or by human-mediated vectors, and thus, invasion genomics can greatly facilitate our understanding of the processes behind the initial invasion and subsequent impacts at a scale not attained with typical genetic markers.
European green crab (Carcinus maenas) are a notorious invasive species now present on every continent with the exception of Antarctica (Roman, 2006). Green crab have a prolonged larval phase and high tolerance of variable environmental conditions allowing for further expansion throughout their introduced ranges, including both the Pacific and Atlantic coasts of North America (Roman, 2006). In the northwest Atlantic, green crab likely experienced multiple cryptic introductions, having invaded the eastern USA and Canada at least twice from genetically distinct northern and southern European populations in 1817 and again in the 1980s (Darling, Tsai, Blakeslee, & Roman, 2014;Roman, 2006). This species' range is continuing to expand, having recently been discovered in Placentia Bay, Newfoundland, in 2007, with a likely introduction c. 2002 from a Scotian Shelf population (Blakeslee et al., 2010). Green crab population structure has been investigated previously using traditional genetic markers, including mitochondrial DNA (16S, Geller, Walton, Grosholz, & Ruiz, 1997;COI, Darling, Bagley, Roman, Tepolt, & Geller, 2008;Roman & Palumbi, 2004;Roman, 2006) and microsatellites (Darling, 2011;Darling et al., 2008Darling et al., , 2014Pascoal et al., 2009;Tepolt, Bagley, Geller, & Blum 2006).
However, a genomewide comparison (i.e., RAD-seq) of fine-scale spatial structuring on the east coast has been absent and is warranted to compare these independent invasions and characterize the full genetic diversity of populations along this part of their invasive range.
Our overall objective was to evaluate population structuring using genomewide SNPs obtained by RAD-seq for green crab from 11 locations in eastern North America to explore the genomic extent of differences between colonization events. Specifically, our goals were to (i) quantify genomewide differentiation among samples encompassing both the invasions in eastern North America from northern and southern European source populations, (ii) describe the spatial distribution of these two invasions using a large number of genomewide SNPs, and (iii) compare spatial variation at both highly divergent and selectively neutral SNPs, as well as the mitochondrial cytochrome oxidase I (COI) gene to better understand the evolutionary dynamics associated with the recent invasion and northwards expansion of the species. We predict (i) widespread genomic differentiation among these independent invasions based on existing work and the geographic isolation of the source populations and (ii) that larger numbers of loci used here will reveal finer-scale population structure than has previously detected. We build directly on previous studies (e.g., Blakeslee et al., 2010;Darling et al., 2008Darling et al., , 2014Tepolt & Palumbi, 2015) by comparing neutral and divergent SNPs to COI sequence data between the two invasions. The datasets developed here will allow for monitoring future range expansion of populations at both the southern and northern limits of the current range and also permit the formulation of conservation strategies that are region-and population-specific.    with the restriction site) and maximizing the edit distance (Faircloth & Glenn, 2012). Based on edittags analysis (Faircloth & Glenn, 2012), the variable-length barcodes had an edit distance ranging from 2 to 8 and a modal edit distance of 6. Gel size selection performed after sonica-

| Bioinformatics
RAD data for 242 individuals were cleaned and demultiplexed using process_radtags in Stacks v.1.21 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013). We used ustacks for de novo loci formation with a minimum depth of 5 to create a stack and a maximum distance of 4 between stacks. One individual from Kejimkujik (KJI) was removed due to low sequence coverage, and so, we retained 241 individuals overall. We then used cstacks to construct a locus catalog based on sequence identity with 1 mismatch allowed between stacks.
We used settings of M = 4 as the maximum distance between two loci to merge and n = 1 to allow two catalog loci to merge in case they are differently fixed versions of the same locus. All RAD tags were 80 bp in length. These were then filtered in the Stacks populations module to include only RAD tags present in each population and in ≥75% of individuals and a minor allele frequency (MAF) greater than 5%. This dataset was filtered for missing data and deviations from Hardy-Weinberg equilibrium using PLINK (Purcell et al., 2007), where individuals missing >25% of loci and loci missing >5% of genotypes were removed.

| Population statistics and detecting divergent loci
Multiple approaches were used to identify loci putatively experiencing adaptive selection (outliers). We first employed BayeScan v2.1 (Foll & Gaggiotti, 2008)  where the highest major allele frequency per SNP among populations is given a value of 1.0 and the lowest major allele frequency a value of 0, for outlier and neutral SNPs, were created using custom R scripts.

| Population structure
Spatial structuring in neutral and outlier SNPs was explored using several methods. We conducted a discriminant analysis of principal components (DAPC) using adegenet 2.0.0 (Jombart, 2008;Jombart & Ahmed, 2011) in R to first find the number of clusters to best represent our 11 populations using the find.clusters function. All principal components were retained at this step. This function defines the best number of clusters using a Bayesian information criterion (BIC).
We then conducted the DAPC using the defined clusters as prior information and retained the first 50 principal components to determine the best value of K for our data. The DAPC was conducted on the full SNP panel, as well as on the neutral and outlier datasets separately.  (Slatkin, 1995) values and geographic distance in ade4 for the full SNP panel and the COI dataset as a test of isolation by distance. Least-cost distances between each pair of sampling locations were estimated in marmap (Pante & Benoit, 2013) implemented in R. An analysis of molecular variance (AMOVA) with 1,000 permutations was conducted for each dataset in Arlequin based on our optimal K = 2, where each sampled population was considered part of either a northern or southern group. Locus-specific F ST values ranged from 0.00001 to 0.551, and heterozygosities ranged from 0.095 to 0.564 for the full SNP panel ( Figure   S1B). More than half (52%) of the loci exhibited an F ST >0.05.

| Outlier SNP detection
We utilized three detection methods to obtain our outlier SNP panel. Principal component analyses on outlier and neutral SNPs revealed the same overall population structure as the DAPC, with clear separation of the northern and southern populations along the first principal component for both the neutral and outlier locus datasets ( Figure S3).
Neighbor-joining dendrograms of chord distances showed high support for northern and southern populations for each SNP panel.
Bootstrap support for each cluster using the outlier SNP set was 100, with PLB clustering with the southern population. The neutral SNP panel had bootstrap support values of 52-100, but PLB clustered with the northern population in this case (Figures 5 and S4). Bootstrap support for the northern populations was 100 when excluding PLB in all cases. The chord distances for the outlier tree was twice the magnitude of the full or neutral SNP panels.
Standardized allele frequencies of the outlier loci and neutral loci revealed a clear latitudinal cline overall with a sharp change in allele frequency occurring between CBI and KJI (Figures 6a and S5).
Interestingly, PLB shared more alleles with southern sites than northern sites (Figures 6a and S5). Linkage disequilibrium r 2 values among outliers ranged from 4.75 −07 to 0.92 following removal of loci located on the same RAD tag with r 2 values of 1.0 ( Figure 6b).

| Marker comparisons
Haplotype distributions displayed a clear transition from dominance of haplotype H01 in the southern population to dominance of haplotypes H04 to H06 in the northern population ( Figure 7). However, approximately 25% of the crabs collected at KJI and PLB were of the southern haplotype. This geographic split mirrors that exhibited by the spatial clustering analysis of the RAD-seq data above. Pairwise

| DISCUSSION
The use of genomic approaches to study invasive species demographics and evolutionary history has become increasingly common with

| Overall spatial population structure
Our results clearly reveal high levels of genomewide differentiation separating the two invasions into northern and southern groups. This is consistent with the findings of Tepolt and Palumbi (2015) who suggest secondary contact as the dominant structuring force in the introduced range. The presence of two distinct groups of green crab has been consistently reported previously using mtDNA (Roman, 2006), microsatellites , and RNA-seq SNPs (Tepolt & Palumbi, 2015). This broad structuring of the two independent invasions in turn mirrors the spatial structuring of the native European range and is likely maintained by environmental differences between the northern and southern regions (Roman, 2006). Our spatial analysis did reveal that two locations (PLB and KJI) appear intermediate between the northern and southern clusters which may be indicative of admixture and hybridization Tepolt & Palumbi, 2015). One of these locations, Placentia Bay, is characterized by heavy shipping traffic, and the introduction of a previously admixed population from the Scotian Shelf is likely being maintained at this site (Blakeslee et al., 2010). The other location, Kejimkujik, represents the area where the two invasions (north and south) are coming into contact which appears to be the likely source of observed hybridization . Previous studies suggest the possible southward movement of this contact zone over time Pringle et al., 2011) so the temporal stability of the patterns observed here remains to be seen.
The presence of these two groups was strongly supported by RADseq loci distributed across the genome. Of interest is the fact that the small panel of highly differentiated loci (i.e., outliers, ~1.2% of loci) detected the same level of population structuring as the full panel of SNPs, reinforcing the use of relatively few "diagnostic" markers for fine geographic scale population identification. Similar conclusions were F I G U R E 6 (a) Spatial clines in standardized allele frequency at each outlier locus for the 11 sampling sites sorted by latitude. (b) Heat map of linkage disequilibrium r 2 values for each outlier locus. Those loci with an r 2 of 1.0 were located on the same RAD tag, and so only, one locus per RAD tag was retained Europe (Roman, 2006). Our observations of genomewide divergence support a hypothesis of significant and long-standing isolation between the two invasions. Ultimately, isolating mechanisms in both the native or introduced range, the temporal stability of these patterns, and the role of hybridization and selection in structuring these two invasions in North America will require further study and is beyond the scope of this work.

| Genetic marker comparisons
Previous studies have used COI sequence variation to study green crab population structure and to trace the invasion history (Darling, 2011;Darling et al., 2008Darling et al., , 2014Roman, 2006). Overall, our SNP data were similar to COI in terms of resolving the current spatial structure of the invasions and suggests that patterns of divergence found in the mitochondria are widespread across the nuclear genome. We identified fewer COI haplotypes relative to previous studies in this system (Darling et al., 2008 and less evidence of northern haplotypes in the southern population. Nearly 25% of crabs at both KJI and PLB were of the southern haplotype, suggesting anthropogenic introduction from a mixed or hybridized Scotian Shelf population source to PLB (Blakeslee et al., 2010), and secondary contact between northern and southern populations at KJI where the southwards expansion of the more recent invasion met the northern limit of the initial invasion . Overall, the COI data were equally able to detect isolation by distance and population structuring via pairwise F ST values among sampling sites compared to the SNPs. The SNP panel clearly revealed this divergence was widespread across the genome and the outlier SNPs were better able to detect genetic differences between the northern and southern groups explaining significantly greater components of the spatial variance.

| Management and conservation implications
The data presented here are consistent with previous descriptions of green crab invasion history Tepolt & Palumbi, 2015), suggesting two groups with southern Nova Scotia acting as the (potentially transient) geographic divide between northern and southern populations. This location is consistent with an initial observation by Darling et al. (2014) who documented an admixture zone between northern and southern populations between New Brunswick and Nova Scotia. Knowledge of invasion histories is necessary to predict secondary spread, which ultimately will determine the impact of an invasive species, particularly those where introduction and spread are mediated by anthropogenic vectors (Cristescu, 2015). A better understanding of the population structuring and environmental factors that sustain this structuring within this invasive species will allow us to monitor and potentially predict range expansion, which in turn helps to focus the development of preemptive and targeted monitoring and mitigation strategies in Atlantic Canada. Green crab have the potential to impact commercial shellfish fisheries (e.g., McClenachan et al., 2015) and lead to regional reduction of eelgrass beds (Garbary et al., 2014;Matheson et al., 2016;Neckles, 2015), and so management strategies need to be implemented to minimize their environmental impact. In fact, it may be necessary to consider these crabs as two distinct ecotypes across their invasive range due to their potential adaptations to different thermal regimes and behavioral differences (Rossong et al., 2012)

| Summary
The recent green crab range expansion in eastern North America appears to be driven by a second invasion from the northern region of their native range in Europe (Roman, 2006). Our results show that genetic differences between the northern and southern populations of green crab in eastern North America are extensive and genomewide and present strong support for two genetically distinct groups of crabs that were the result of two separate invasions. These crabs may continue to expand northwards and southwards, and potentially hybridize at contact zones or regions high in shipping traffic. The genetic differences observed here are so distinctive that they warrant the division of green crab into two separate ecotypes in the northwest Atlantic, yet the precise latitudinal boundary between ecotypes has yet to be determined and may be transient. Further characterization of the outlier loci based on a whole-genome sequence and investigation of introgression between the two populations of green crab studied here will facilitate continued monitoring and future exploration.

ACKNOWLEDGMENTS
The authors wish to thank staff at Fisheries and Oceans Canada for their assistance with sample collection, in particular Erica Watson, Sophie Boudreau, Lorne Penny, Marie-Helene Theriault, Dawn Sephton, Roland Hagan, Paul Jivoff, and April Blakeslee. They also wish to thank the staff of the Aquatic Biotechnology Lab at the Bedford Institute of Oceanography for assistance with DNA extractions and RAD library preparation.
F I G U R E 8 (a) Isolation by distance relationships of linearized pairwise F ST values versus least-cost geographic distances between each site. Correlations were significant for both COI using a Mantel test with 1,000 permutations (r = .60, p = .008) and the SNP panel (r = .624, p = .004). The solid line represents the SNP correlation, while the dotted line represents the COI correlation. (b) Comparison of spatial structuring at all three SNP panels and mtDNA data. Mantel tests with 1,000 permutations showed strong (r = .93-.95) and significant (p < .001) correlations for each dataset. The solid line indicates the full and neutral SNP correlation, while the dotted line indicates the outlier relationship T A B L E 3 Analysis of molecular variance (AMOVA) for the (a) neutral SNP panel, (b) outlier SNP panel, and (c) COI data after categorizing each sampling location into a northern or southern population based on discriminant analysis of principal components

DATA ACCESSIBILITY
DNA sequences: All raw COI mtDNA sequence data will be uploaded to NCBI and accession numbers provided. The RAD-seq genotypes used in this study as well as all distance matrices and geographic least-cost distances will be deposited in Dryad prior to final acceptance. All corresponding sample codes and locations are in Table 1

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
IRB, R. Beiko, and CDB conceived of the study. JF, KM, CHM, R.
Bernier, and CDB collected specimens for use in this study. LCH conducted DNA extraction and RADseq library preparation. NWJ, MVR, RES, PNR, R. Beiko conducted bioinformatics and statistical analyses of the data. All authors aided in the interpretation of the data and initial study design. All authors contributed to writing, revising, and approving the final draft of the manuscript.