Population genomic structure of the gelatinous zooplankton species Mnemiopsis leidyi in its nonindigenous range in the North Sea

Abstract Nonindigenous species pose a major threat for coastal and estuarine ecosystems. Risk management requires genetic information to establish appropriate management units and infer introduction and dispersal routes. We investigated one of the most successful marine invaders, the ctenophore Mnemiopsis leidyi, and used genotyping‐by‐sequencing (GBS) to explore the spatial population structure in its nonindigenous range in the North Sea. We analyzed 140 specimens collected in different environments, including coastal and estuarine areas, and ports along the coast. Single nucleotide polymorphisms (SNPs) were called in approximately 40 k GBS loci. Population structure based on the neutral SNP panel was significant (F ST .02; p < .01), and a distinct genetic cluster was identified in a port along the Belgian coast (Ostend port; pairwise F ST .02–.04; p < .01). Remarkably, no population structure was detected between geographically distant regions in the North Sea (the Southern part of the North Sea vs. the Kattegat/Skagerrak region), which indicates substantial gene flow at this geographical scale and recent population expansion of nonindigenous M. leidyi. Additionally, seven specimens collected at one location in the indigenous range (Chesapeake Bay, USA) were highly differentiated from the North Sea populations (pairwise F ST .36–.39; p < .01). This study demonstrates the utility of GBS to investigate fine‐scale population structure of gelatinous zooplankton species and shows high population connectivity among nonindigenous populations of this recently introduced species in the North Sea. OPEN RESEARCH BADGES This article has earned an Open Data Badge for making publicly available the digitally‐shareable data necessary to reproduce the reported results. The data is available at: The DNA sequences generated for this study are deposited in the NCBI sequence read archive under SRA accession numbers SRR6950721–SRR6950884, and will be made publically available upon publication of this manuscript.


| INTRODUC TI ON
Invasive species are widely recognized for their negative effects on biodiversity and ecosystem functioning (Carlton & Geller, 1993;Simberloff et al., 2013). During the last decades, globalization of maritime traffic has increased invasion rates of marine organisms by facilitating dispersal over large geographical distances (Hulme, 2009;Ricciardi & MacIsaac, 2000;Ruiz, Fofonoff, Carlton, Wonham, & Hines, 2000). Although only a fraction of the introduced species is able to thrive in a new environment, establishment of permanent populations can have dramatic effects on the local community (Katsanevakis et al., 2014;Molnar, Gamboa, Revenga, & Spalding, 2008;Ojaveer et al., 2015). Therefore, management of nonindigenous species should be a priority for marine conservation. Effective control measures and impact prediction rely on an accurate understanding of dispersal and population connectivity, which can be investigated with genetic approaches (Allendorf, Hohenlohe, & Luikart, 2010;Chown et al., 2015;Sherman et al., 2016;Viard, David, & Darling, 2016). Moreover, genetic reconstruction of invasion histories provides the opportunity to study the eco-evolutionary mechanisms underlying long-distance dispersal, range expansion, and local adaptation (Cristescu, 2015;Sax et al., 2007).
Marine invasions often affect valuable coastal and estuarine ecosystems (Grosholz, 2002). Marine species typically show limited population differentiation, and establishing appropriate management units is challenging. Estimating population connectivity is complicated by many factors specific to the marine environment (Palumbi, 2003). Obvious physical barriers to migration are seemingly absent, and habitat connectivity and high mobility promote long-distance dispersal (Allendorf et al., 2010;Cowen & Sponaugle, 2009). Distribution ranges and population densities are shaped by seascape features such as ocean currents and physicochemical fluctuations and temperature (Hohenlohe, 2004;Johansson et al., 2015;O'Connor et al., 2007;Selkoe et al., 2016), and demographic processes such as local recruitment (Jones, Planes, & Thorrold, 2005).
Moreover, the dispersal capabilities of pelagic organisms can be affected by other factors such as antropogenic pollution (Puritz & Toonen, 2011). Marine invertebrate species typically have a high reproductive output, and large population sizes prevent the accumulation of neutral divergence by genetic drift (Deagle, Faux, Kawaguchi, Meyer, & Jarman, 2015). Traditional genetic methods may lack the resolution to identify genetically differentiated populations. In contrast, genotyping-by-sequencing (GBS) allows identification of SNPs at thousands of loci to investigate fine-scale population structure and accurate population assignment in the context of weak genetic structure (Andrews, Good, Miller, Luikart, & Hohenlohe, 2016;Davey et al., 2011;Narum, Buerkle, Davey, Miller, & Hohenlohe, 2013).
One of the most successful marine invaders is the ctenophore Mnemiopsis leidyi. This species is native to the Atlantic coasts of North and South America but is nowadays widespread across European seas. Introduction of M. leidyi in coastal ecosystems can induce community trophic cascades (Tiselius & Møller, 2017), and outbreaks have coincided with alarming changes in the pelagic food web (Oguz, Fach, & Salihoglu, 2008;Shiganova & Bulgakova, 2000). The invasive success of M. leidyi is attributed to its broad tolerance for environmental variability (Fuentes et al., 2010;Purcell, Shiganova, Decker, & Houde, 2001), flexible planktivorous diet (Costello, Bayha, Mianzan, Shiganova, & Purcell, 2012;Costello, Sullivan, Gifford, Van Keuren, & Sullivan, 2006;Rapoza, Novak, & Costello, 2005), and high fertility (Costello et al., 2012;Jaspers, Møller, & Kiørboe, 2015). Individuals are simultaneous hermaphrodite and capable of self-fertilization. Maturity can be reached in a few weeks, after which thousands of eggs per day can be released (Jaspers, Costello, & Colin, 2014). Mnemiopsis leidyi is pelagic through its entire life cycle (Rapoza et al., 2005), which possibly promotes long-distance dispersal via ocean currents. Molecular studies have identified distinct genetic clusters in Southern Europe (Black, Caspian and Mediterranean Sea) and Northwestern Europe (Baltic and North Sea; Bayha et al., 2015;Ghabooli et al., 2011;Reusch, Bolte, Sparwel, Moss, & Javidpour, 2010). These originate from distinct introductions that can be linked to the climatic conditions in the indigenous range. The Southern European cluster originates from a limited number of founders from the Gulf of Mexico, and subsequent outbreaks along the coast suggest a stepping-stone scenario of colonization (Bolte et al., 2013;Fuentes et al., 2010;Ghabooli et al., 2013). The Northern European cluster originates from the Atlantic coast of North America.
However, these studies were based on a limited number of genetic markers and provided limited geographical resolution for risk assessment of contemporary outbreaks in Northwestern Europe. In the indigenous range, coastal embayments and estuaries support overwintering populations that populate adjacent coastal areas (Costello et al., 2006). Similar source-sink dynamics are expected in the nonindigenous range in Northwestern Europe (Collingridge, Molen, & Pitois, 2014;Schaber et al., 2011;Vansteenbrugge, Ampe, Troch, Vincx, & Hostens, 2015).
In the current study, we used high-density SNP markers to investigate the spatial population structure of M. leidyi on three geographical scales. First, we estimated fine-scale population differentiation and connectivity within the Southern part of the North Sea. Sampling locations covered three potential source populations (the ports of Dunkirk and Ostend, and the estuary of the Scheldt River) and potential sink populations along the Belgian coastal zone. We investigated the presence of distinct source populations and estimated their contribution to the coastal population. Based on source-sink dynamics, we expected to find higher genetic diversity in the three putative source populations compared with the population(s) in the North Sea. Second, we compared geographically distant regions; the Southern part of the North Sea and the Kattegat/Skagerrak region (DK) in the north. Third, sampling covered one location in the indigenous species range (Chesapeake Bay, USA). We expected low levels of population structure among the regions in the Southern part of the North Sea, due to local, annual migration, and increasing genetic differences between individuals from the Southern part of the North Sea, the Baltic Sea, and Chesapeake Bay due to isolation by distance. Our main aims were to develop a GBS procedure for M. leidyi with a focus on marker density and data completeness, and to describe spatial variation in SNP frequencies after the introduction and secondary spread in the North Sea. Specific goals were to

| Sample collection
Mnemiopsis leidyi specimens were collected from 23 sampling locations ( Figure 1). Several stations along the Belgian coast, in the port of Ostend and in the Scheldt estuary were sampled using plankton net trawling (both CalCOFI net and hyperbenthic sledge towed at three knots, mesh size 1,000 µm), scuba diving, or dipnet sampling from June until November 2014 (Table 1)  EcoRI (G|AATTC), EcoT22I (ATGCA|T), and PstI (CTGCA|G). Ten libraries were prepared for each enzyme using a set of ten specimens.

| Library preparation
The performance of the enzymes was evaluated by gel electrophoresis of the restriction digest and a pilot sequencing run. MspI was selected for genotyping all other samples (see Section 3). In short, 100 ng of genomic DNA was digested, and adapters were ligated with T4 DNA ligase. A barcode adapter-common adapter system was used with in-line barcode sequences. Adapter sequences were generated with the tool from Deena Bioinformatics (http://www. deena bio.com/servi ces/gbs-adapters). Barcode sequences were 4 to 9 bp, differed from each other by at least 3 substitutions and contain homopolymers of maximum 2 bp. Restriction digests and adapter ligations were performed according to the enzyme manufacturer's recommendations (New England Biolabs). Individual libraries were purified with 1.6× MagNa magnetic beads (Rohland & Reich, 2012) and eluted in 50 µl 0.1× TE buffer. Short fragments were amplified by

| Read filtering and mapping
GNU parallel (Tange, 2011) was used for parallelization of all following steps. Reads were demultiplexed with GBSX 1.1.5 (Herten, Hestand, Vermeesch, & Houdt, 2015), allowing 1 mismatch in the barcode. Common adapter sequences, restriction site remnants, and intact restriction sites were trimmed with cutadapt 1.9.1 (Martin, 2011). Reads containing ambiguous bases and reads with an average base quality below 30 were discarded with prinseq-lite 0.20.4 (Schmieder & Edwards, 2011). Sequence quality was checked with  Figure S1b). We excluded samples with less than 40 k loci for genotyping of the sample collection ( Figure S1c).

| Genotype calling
Genotypes were called with GATK 3.7 using the HaplotypeCaller  Figure S2). Deviating SNPs with low H o may be fixed variants and were kept.
A substantial portion of our SNP dataset consisted of low-frequency alleles. These are usually discarded for population analysis, because they are deemed uninformative and may contain errors (Roesti, Salzburger, & Berner, 2012). However, careful consideration of MAF filtering was recently recommended (Linck & Battey, 2019). Therefore, we compared the distribution of alleles over the geographical regions for MAF thresholds of minimum 1 and 5% (results not shown). Low-frequency alleles in our datasets (between 1% and 5%) were unevenly distributed over the geographical regions. Therefore, these alleles represented relevant genetic diversity and we applied a MAF threshold of minimum 1% ( Figure S3).

| Outlier detection
To find SNPs that might be under natural selection, or linked to such loci, we used outlier detection. Individual outlier detection methods differ in their ability to identify outliers, and a combination of methods is recommended (Narum & Hess, 2011;Villemereuil, Frichot, Bazin, François, & Gaggiotti, 2014

| Population genetic analysis
The mean allelic richness and mean expected and observed heterozygosity (A r , H e and H o ) were calculated for the geographical regions (Table 1) using Hierfstat 0.04-22 (Goudet, 2005). Population structure was analyzed using the panel of all SNPs and the panel of neutral SNPs, both of datasets including and excluding Chesapeake Bay. Analysis of molecular variance (AMOVA) based on Nei's genetic distance (Nei, 1972) was performed with pegas 0.10 (Paradis, 2010) and StAMPP 1.5.1 (Pembleton, Cogan, & Forster, 2013), using 100 permutations. Pairwise genetic differentiation between the regions was estimated with the unbiased F ST estimator θ (Weir & Cockerham, 1984), and pairwise 95% confidence intervals were calculated with StAMPP using 100 bootstraps. Negative F ST values were set to zero.
The population structure and population assignment of individuals were further described with discriminant analysis of principle components (DAPC) implemented in Adegenet 2.1.0 (Jombart, 2008), using 20 PCs.

| Optimization of GBS genotyping in M. leidyi
Of the six restriction enzymes tested, EcoT22I and EcoRI were ex- MspI. The number of loci shared between samples was determined per enzyme ( Figure S1b). These curves indicated that MspI was more efficient to sequence common loci compared with ApeKI.
Therefore, we decided to use MspI for GBS profiling of the sample collection.

| Distribution of read data and SNPs
We obtained 140 libraries with more than 40 k GBS loci ( Figure S1c) and discarded 35 libraries with less than 40 k GBS loci.  Figure S2), and around 82 k SNPs with a MAF below 1% ( Figure S3). Of the remaining 74 k SNPs, 73% were located within the genes and 34% within the exons predicted by Ryan et al. (2013). No mitochondrial SNPs were detected.

| Identification of outlier SNPs
The SNP thinning algorithm of PCAdapt reduced the SNP dataset including Chesapeake Bay from 74 to 52 k SNPs, and two relevant PCs were retained based on the clustering of samples ( Figure S4a

| Genetic diversity, population structure, and population assignment of individuals
Mean H e and A r showed similar patterns among geographical regions ( Figure 3). The highest diversity was measured for Chesapeake Bay, followed by the port of Ostend. Mean H o was generally lower than mean H e , except for the port of Dunkirk, which also had the highest mean H o of all nonindigenous regions. AMOVA indicated highly significant population structure between the seven regions (p < .001) for both panels (all SNPs or only neutral SNPs) and both the datasets including and excluding Chesapeake Bay (Table 2). Pairwise comparisons revealed significant differentiation of two regions, Chesapeake Bay and Ostend port, from all other regions (Table 3).
Pairwise In contrast, the remaining five regions showed a nearly equal contribution of loci to all individuals ( Figure 5). Two specimens collected in Ostend port were assigned to the North Sea cluster (Figures 4b   and 5).

| Population connectivity among environmentally distinct regions in the Southern part of the North Sea
The recurrent presence of M. leidyi in the Southern part of the North Sea indicates the establishment of a persistent population (Vansteenbrugge et al., 2015). Mnemiopsis leidyi has been observed during the winter in the Scheldt estuary and ports along the Belgian coast, and the presence of larvae and high population densities suggested that reproduction events in these areas might populate the adjacent coastal zone (Vansteenbrugge et al., 2015). Our data provide clear evidence for the presence of two genetically distinct clusters in the Southern part of the North Sea. The first cluster represents a broadly distributed population, present in the Scheldt Estuary, the Belgian coastal zone, and Dunkirk port and is consistent with the suggested exchange of individuals between the Scheldt estuary, ports, and the North Sea. The second cluster represents a secluded population that was only sampled in the port of Ostend.
Adaptation to the local environment is an important driver of population differentiation in marine organisms (Gagnaire et al., 2015). However, since the population structure based on the neutral SNP panels was highly significant, much of the differentiation between these two clusters was driven by neutral processes. Genetic drift and limited gene flow due to physical barriers (e.g., sluices) might promote local differentiation of the Ostend port population.
However, this port was not completely isolated as we identified two individuals of the North Sea population within this location. Also, fluctuating population size of the Ostend port population might reduce the effective population size and promote neutral differentiation (Kalinowski & Waples, 2002;Wright, 1938). It is unlikely that genetic bottlenecks recently occurred since the Ostend port population was more genetically diverse than any other region sampled in the indigenous range. Alternatively, recent reintroduction from distant populations or secondary spread from within the nonindigenous range might explain the difference between both genetic clusters in the Southern part of the North Sea.
Our data further show that Ostend port is probably not an important source area for the North sea, because no individuals collected in the adjacent coastal area were assigned to this population. The two other potential source areas, the Scheldt Estuary and Dunkirk port, were inhabited by the prevalent genetic cluster and are better candidate source populations. However, we did not detect a higher genetic diversity in the Scheldt Estuary and Dunkirk port compared to the coastal area. This can be explained by population expansion during the seasonal reproduction events.

| Population connectivity between geographically distant regions of the North Sea and Baltic Sea
We investigated the population structure of M. leidyi over a large north-south range across the North Sea. We compared specimens collected in the Southern part of the North Sea in 2014 with specimens of the Kattegat/Skagerrak region that were collected in the same year. Although these regions are 500 km apart, both groups were indistinguishable. The lack of population differentiation in marine invertebrates is commonly explained by large population sizes (Deagle et al., 2015) and population connectivity due to limited physical barriers (Cowen & Sponaugle, 2009 Bayha et al., 2015). Also, long-term survival and dispersal in the open ocean were described in the indigenous species range (Bayha et al., 2015). Secondary spread via ocean currents is possibly an important process maintaining homogeneity (David et al., 2015;Grosholz, 1996). to occur, as the area is part of a heavily trafficked maritime transport network (Kaluza, Kölzsch, Gastner, & Blasius, 2010 (Bayha et al., 2015;Ghabooli et al., 2011;Reusch et al., 2010).
This issue was reevaluated in this study using high-density molecular markers.
We showed the presence of extensive population structure This scenario is likely because local extinction and recolonization were also observed in the nonindigenous range (Jaspers et al., 2018).
Secondly, it should be noted that our sample collection included only seven specimens from a single location, while reconstruction of the invasion histories and the identification of native source populations require extensive sampling in the indigenous species range (Gaither, Bowen, & Toonen, 2013;Muirhead et al., 2008).

| CON CLUS ION
Our results show that GBS is a powerful method to investigate fine-

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N
CV, LV, HM, OH, TR, IRR, and KH planned the experiment. CV and TK carried out the laboratory experiments. CV and TR analyzed the NGS data. CV, LV, SD, and KH conducted the population genetic analysis. CV, LV, SD, OH, TR, IRR, and KH wrote the manuscript. All authors read and approved the final manuscript.

DATA ACCE SS I B I LIT Y
The DNA sequences generated for this study are available at the NCBI sequence read archive under SRA accession numbers SRR6950721-SRR6950884.

O PE N R E S E A RCH BA D G E S
This article has earned an Open Data Badge for making publicly available the digitally-shareable data necessary to reproduce the reported results. The data is available at: The DNA sequences generated for this study are deposited in the NCBI sequence read archive under SRA accession numbers SRR6950721-SRR6950884, and will be made publically available upon publication of this manuscript.