Population genetic structure of the deep‐sea mussel Bathymodiolus platifrons (Bivalvia: Mytilidae) in the Northwest Pacific

Abstract Studying population genetics of deep‐sea animals helps us understand their history of habitat colonization and population divergence. Here, we report a population genetic study of the deep‐sea mussel Bathymodiolus platifrons (Bivalvia: Mytilidae) widely distributed in chemosynthesis‐based ecosystems in the Northwest Pacific. Three mitochondrial genes (i.e., atp6, cox1, and nad4) and 6,398 genomewide single nucleotide polymorphisms (SNPs) were obtained from 110 individuals from four hydrothermal vents and two methane seeps. When using the three mitochondrial genes, nearly no genetic differentiation was detected for B. platifrons in the Northwest Pacific. Nevertheless, when using SNP datasets, all individuals in the South China Sea (SCS) and three individuals in Sagami Bay (SB) together formed one genetic cluster that was distinct from the remaining individuals. Such genetic divergence indicated a genetic barrier to gene flow between the SCS and the open Northwest Pacific, resulting in the co‐occurrence of two cryptic semi‐isolated lineages. When using 125 outlier SNPs identified focusing on individuals in the Okinawa Trough (OT) and SB, a minor genetic subdivision was detected between individuals in the southern OT (S‐OT) and those in the middle OT (M‐OT) and SB. This result indicated that, although under the influence of the Kuroshio Current and the North Pacific Intermediate Water, subtle geographic barriers may exist between the S‐OT and the M‐OT. Introgression analyses based on these outlier SNPs revealed that Hatoma Knoll in the S‐OT represents a possible contact zone for individuals in the OT‐SB region. Furthermore, migration dynamic analyses uncovered stronger gene flow from Dai‐yon Yonaguni Knoll in the S‐OT to the other local populations, compared to the reverse directions. Taken together, the present study offered novel perspectives on the genetic connectivity of B. platifrons mussels, revealing the potential interaction of ocean currents and geographic barriers with adaption and reproductive isolation in shaping their migration patterns and genetic differentiation in the Northwest Pacific.


| INTRODUC TI ON
Hydrothermal vents and cold seeps generally occur in tectonically active areas and along continental margins, where neighboring sites are often separated by tens to hundreds of kilometers in the ocean (Le Bris et al., 2016). Despite differences in water temperature and main source of fluid, vent and seep ecosystems are both fueled mainly by chemosynthesis, the conversion of carbon dioxide and/or methane into organic matters in microbes via oxidation of reduced substances, such as hydrogen sulfide, methane, and hydrogen, unlike shallow-water ecosystems that are driven primarily by photosynthesis (Tunnicliffe, Juniper, & Sibuet, 2003). High chemosynthetic primary production enables these ecosystems to support a much higher biomass and abundance of megafauna compared to the surrounding seabed (Levin et al., 2016).
Most marine benthic animals, including those in the deep ocean, have a biphasic life with a pelagic larval stage through which they achieve connectivity across different habitats (Cowen & Sponaugle, 2009). Knowledge on population connectivity of vent and seep animals sheds light on the scale, direction, and frequency of dispersal, which will not only enhance our understanding of the mechanisms shaping their global and regional biogeography, but also provide key insights into their recovery potential in response to environmental and anthropogenic disturbances Kinlan & Gaines, 2003;Miller, Thompson, Johnston, & Santillo, 2018;Rogers et al., 2012).
Deep-sea mussels in the genus Bathymodiolus (Bivalvia: Mytilidae) are one of the most iconic, dominant, and important foundation taxa in chemosynthesis-based ecosystems (Van Dover, 2000). Dense Bathymodiolus mussel beds generate a highly complex Vents and seeps are represented in red and green dots, respectively. Ocean currents were redrawn based on You et al. (2005) and Nan et al. (2011). Three flow patterns of the Kuroshio Current when passing through the Luzon Strait, namely the leaping, the leaking, and the looping path, are indicated by 1, 2, and 3, respectively.  Govenar, 2010;Vrijenhoek, 2010). Bathymodiolus mussels produce planktotrophic larvae capable of migrating to surface water and dispersing across a long distance in ocean currents with very long planktonic larval durations (Arellano, Van Gaest, Johnson, Vrijenhoek, & Young, 2014;McVeigh, Eggleston, Todd, Young, & He, 2017;Young et al., 2012). To date, 30 species including eight fossil species of Bathymodiolus have been reported (MolluscaBase, 2018), although the genus appears to be polyphyletic (Lorion et al., 2014). Among them, mussels of the Northwest Pacific species Bathymodiolus platifrons are considered to be a good candidate for population genetic studies due to their wide horizontal (22° to 35°N) and bathymetric (642 to 1,684 m) distribution ranges, as well as their capability to inhabit both hydrothermal vents and methane seeps (Fujikura et al., 2007;Suess, 2005;Watanabe, Fujikura, Kojima, Miyazaki, & Fujiwara, 2010;http://www.godac.jamstec. go.jp/bio-sample/index_e.html, April 2018).
Previous studies based on one or several mitochondrial genes revealed a lack of genetic differentiation among vent and seep populations of B. platifrons (Kyuno et al., 2009;Miyazaki et al., 2013;Shen et al., 2016). Nevertheless, our recent study based on 9,307 genomewide single nucleotide polymorphisms (SNPs) generated by the type IIB restriction site-associated DNA (2b-RAD) approach detected a clear genetic divergence between individuals of B. platifrons from a vent field in the Okinawa Trough (OT) and a methane seep in the South China Sea (SCS) . However, due to the difference in results based on different genetic marker types and limited sampling locations in these studies, the population structure and migration patterns of B. platifrons in the Northwest Pacific remain puzzling.
Therefore, we carried out a population genetic study of B. platifrons combining both mitochondrial genes and genomewide SNP markers derived from samples from all representative habitats of this species known thus far. Through this study, we aimed to better understand the patterns and causes of genetic connectivity, genetic divergence, and migration dynamics of B. platifrons, the most widely distributed Bathymodiolus mussel in the Northwest Pacific.

| Sample collection and DNA extraction
A total of 110 adults of B. platifrons used in this study were collected from four hydrothermal vents and two methane seeps be- Sagami Bay (SB). These sampling locations span a horizontal distance of more than 2,400 km. Upon arrival on the deck, mussels were either dissected immediately for preservation in 95%-100% ethanol or frozen immediately at −80°C for later dissection. Genomic DNA was extracted from the adductor muscle of each individual using the phenol/chloroform extraction protocol (Sambrook, Fritsch, & Maniatis, 1989). Concentration and purity of the extracted DNA were measured using a NanoDrop ND-1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE, USA), and the integrity of DNA was checked by 1.0% agarose gel electrophoresis.
In addition, the three mitochondrial genes of each individual were concatenated into a single sequence using SequenceMatrix v.1.7.8 (Vaidya, Lohman, & Meier, 2011), and the concatenated sequences were then used to estimate pairwise F ST between local populations using Arlequin, with 10,000 permutations applied to test for significance. Furthermore, to investigate the population structure, TCS haplotype networks were reconstructed for each mitochondrial gene using POPART v.1.7 (Leigh & Bryant, 2015).

| RAD library construction, sequencing, and data filtering
The 2b-RAD method (Wang, Meyer, McKay, & Matz, 2012)  Raw sequencing reads were filtered using a custom Perl script to remove adaptors and the 3-bp terminal sequences of each read (Jiao et al., 2014). Reads with ≥10 nucleotide positions having a Phred quality index <20, without restriction sites, with ambiguous bases, or ≥30% homopolymer regions, were all discarded.

| SNP identification and genetic statistic estimation
Based on the recognition sites of the BsaXI enzyme, 2b-RAD tags were extracted from the draft genome of B. platifrons , which served as a reference for SNP identification. Alignment of the filtered reads of each individual to the reference was achieved by SOAP v.2.21 (Li et al., 2009) using the match mode of "find the best hits" (-M 4), the maximum number of allowed mismatches of two (-v 2), and no repeat allowed (-r 0  (Tajima, 1989) and Fu and Li's D* (Fu & Li, 1993), for each polymorphic loci of each local population, were estimated based on the batch mode implemented in DnaSP v.5 (Rozas, Sánchez-DelBarrio, Messeguer, & Rozas, 2003).

| Outlier SNP detection and characterization
The coalescent method implemented in Arlequin was used to screen for candidate outlier SNPs. It has been reported that the hierarchical island model in this software is more powerful than the finite island model in outlier SNP detection for hierarchically subdivided populations or populations with a recent common ancestry (Excoffier, Hofer, & Foll, 2009). Therefore, based on the geographic affinity and the result of principal component analyses (PCA) carried out using the entire SNP dataset (see section 3.4), two hierarchical island models were assumed in Arlequin to screen for candidate outlier SNPs: Genomic regions of the identified candidate outlier SNPs were determined by mapping the 2b-RAD tags that harbored outlier SNPs against the draft genome of B. platifrons . For those mapped to the genic regions [i.e., coding DNA sequence (CDS), intron, or 3/5′-untranslated region (3/5′-UTR)], their corresponding proteins (i.e., outlier-associated proteins) and annotations  were extracted for functional classification.

| Genetic differentiation estimation and its relatedness to geographic distance
Values of pairwise F ST between local populations were estimated using Arlequin based on the entire SNP dataset and the two outlier SNP datasets, with 10,000 permutations to determine significance.
The Mantel test implemented in the same software was carried out to correlate genetic distance (i.e., values of pairwise F ST calculated based on the entire SNP dataset) and geographic distance (km), also with 10,000 permutations applied to test for significance.

| Population structure and individual assignment based on the entire SNP dataset and the two outlier SNP datasets
A Bayesian approach implemented in STRUCTURE v.2.3.4 (Pritchard, Stephens, & Donnelly, 2000) was applied to detect population structure. The LOCPRIOR model which uses sampling locations as prior information to assist the clustering and the corrected allele frequencies model were selected. The number of genetic clusters K was set from 1 to 6, each with five replicates, using a burn-in of 100,000 followed by 1,000,000 iterations. The optimal K was evaluated using STRUCTURE HARVESTER v.0.6.94 (Earl & vonHoldt, 2012). An optimal alignment of replicate runs at the optimal K was determined using CLUMPP v.1.1.2 (Jakobsson & Rosenberg, 2007), and the graph of genetic structure was visualized using DISTRUCT v.1.1 (Rosenberg, 2004). The PCA implemented in the R package SNPRelate (Zheng et al., 2012) were used to perform individual assignment based on the genetic variation among individuals.
All of the above analyses were carried out using both the entire SNP dataset and the two outlier SNP datasets separately, with only one SNP per locus retained in each dataset to avoid bias derived from potential linkage disequilibrium.

| Introgression analyses based on the outlier SNP dataset
Since two genetic backgrounds were detected in the local population of HK based on the second outlier SNP dataset (see section 3.7), the USEPOPINFO model which uses sampling locations to screen for migrants or hybrids in STRUCTURE was applied to test the hypothesis of mixed ancestry for individuals in HK during the past two generations (i.e., GENSBACK = 2). Two clusters (i.e., K = 2) were defined according to the results of STRUCTURE and PCA based on the second outlier SNP dataset (see section 3.7), with one containing individuals in the S-OT, and the other containing those in the M-OT and SB with the three JR-like individuals in SB excluded. The program was run with a burnin of 100,000 followed by 1,000,000 iterations based on the second outlier SNP dataset (only one SNP per locus retained). Three different values of MIGRPRIOR, which is v, were applied to test whether the results were robust (Pritchard et al., 2000). Individuals with less than 50% posterior probability of having pure ancestry from the designated population (i.e., the S-OT) were considered to be migrants or hybrid descendants (Falush, Stephens, & Pritchard, 2007;Pritchard et al., 2000).
Introgression signature in the local population of HK was further investigated by calculating the hybrid index (h) (Buerkle, 2005) using

| Migration dynamic analyses based on the entire SNP dataset
The web-based software divMigrate-online (Sundqvist, Keenan, Zackrisson, Prodöhl, & Kleinhans, 2016) was applied to infer the directional relative migration patterns using the G ST statistic (Nei, 1973) as a measure of genetic differentiation. The method implemented in this software is based on defining a hypothetical pool of migrants for a given pair of populations and estimating an appropriate measure of genetic differentiation between each of the two populations and the hypothetical pool (Sundqvist et al., 2016). The directional genetic differentiation can be used afterward to evaluate the relative levels of migration between the two populations. The larger of the two relative migration values indicates the population is most likely the source population, whereas the smaller of the two values indicates the population is most likely to be the sink population (Sundqvist et al., 2016).

| SNP identification and genetic statistic estimation
Sequencing of 2b-RAD libraries generated approximately 2.2 billion reads in total, with a mean of 19.9 million reads per individual.
Quality control reduced the data to a mean of 15  Figure S1), showing no evidence for isolation-by-distance throughout the known distribution range of B. platifrons.

| Population structure and individual assignment based on the entire SNP dataset
STRUCTURE analyses based on the entire set of 5,458 SNPs (only one SNP per locus retained) revealed the occurrence of two genetic groups (i.e., optimal K = 2) of B. platifrons in the Northwest Pacific (Supporting Information Figure S2a
By defining a hierarchical island model composed of the S-OT, the M-OT, and SB, a total of 138 candidate outlier SNPs associated with 125 loci (p < 0.01) were identified (Figure 4b), with the allele frequency of each outlier SNP shown in Supporting Information  Table S12).
Although the number of candidate outliers was comparable under the two assumed hierarchical island models, only seven SNPs were shared between these two outlier SNP datasets (Figure 4a,b), indicating the two hierarchical island models relied on different genetic architecture.   (Supporting Information Figure S3).

| Signature of introgression in HK based on the outlier SNP dataset
By using the USEPOPINFO model in STRUCTURE analyses based on the second outlier SNP dataset containing 125 outlier SNPs (only one SNP per locus retained), eight individuals (53.3%) in the local population of HK were identified to have mixed ancestry as either migrants or hybrid descendants (Table 5) (Table 6). Among them, the eight individuals showing mixed ancestry revealed by STRUCTURE analyses were estimated to have a moderate to large values of h ranging from 0.3643 to 0.7033, which indicated that these individuals were more likely hybrid descendants rather than migrants.  (Fu, 1997;Tajima, 1989). However, the dual analyses of Tajima's D and Fu and Li's D* statistics on SNP markers paved an argument for selective sweeps at the mitochondrial genes, as only a few SNP markers agreed with the scenario of population expansion. In addition, the scenario of selective sweeps at the mitochondrial genes also fitted well with the observed lack of genetic differentiation between local populations of B. platifrons as revealed by the mitochondrial genes. Therefore, we focused our discussion below on the results obtained by using genomewide SNPs.

| Co-occurrence of two cryptic semi-isolated lineages of B. platifrons in the Northwest Pacific
Oceanographic connection between the semi-enclosed marginal SCS and the Northwest Pacific is mainly achieved through the seasonal intrusions of the Kuroshio Current, the North Pacific Intermediate Water (NPIW), as well as the Pacific Deep Water through the Luzon Strait (Nan et al., 2011;Qu, Girton, & Whitehead, 2006;You et al., 2005). The Kuroshio Current is a dominant and strong warm surface current in the Northwest Pacific that originates from the North Equatorial Current and runs northeastward along the Philippines coast (Andres et al., 2015). When passing by the Luzon Strait, a branch of the Kuroshio water can intrude into the SCS in a strong anticyclonic circulation, which is known as the "looping path," and then flows out of the SCS to join its mainstream Such fine-scale population genetic structure could be explained as a result of natural selection in response to local adaptation (Gagnaire et al., 2015;Milano et al., 2014). Furthermore, the topography of the OT may have also played a key role in shaping the observed population subdivision. The OT is a back-arc rifting basin formed behind the Ryukyu trench-arc system. The M-OT is located at the transitional region between the deeper S-OT and the shallower northern OT, which is associated with the occurrence of intra-trough grabens (Ikegami, Tsuji, Kumagai, Ishibashi, & Takai, 2015). These geological settings indicate the existence of subtle geographic barriers between the S-OT and the M-OT, which may decrease the dispersal success of B. platifrons larvae across different regions of the OT.
In addition, the ancestry inference of STRUCTURE analyses and the moderate to large values of h revealed that eight individuals in the local population of HK to be hybrid descendants of individuals in the S-OT mating with those originating from the M-OT and SB.
This result indicated that HK may represent a contact zone for larval dispersal and genetic exchange of B. platifrons in the OT-SB region.
Individuals of B. platifrons in HK inhabited the deepest vent field included in this study, and this vent field is contained within a caldera.
Therefore, it is possible that the introgression signature observed here was related to such bathymetry and topography, which may serve to trap mussel larvae from different genetic groups.

| Directional migration patterns of B. platifrons in the Northwest Pacific
The

| CON CLUS I ON S AND PER S PEC TIVE S
By using genomewide SNPs rather than mitochondrial genes, two cryptic semi-isolated lineages of B. platifrons in the Northwest Pacific were identified in this study, which may have been formed due to the barrier effect of the Luzon Strait or the contact zone been trapped by it. Among them, one lineage is mainly distributed in the semienclosed marginal SCS, while the other is mainly distributed across the OT-SB region. In addition, a fine-scale population structure was detected for B. platifrons in the OT-SB region, which might be due to the existence of subtle geographic barriers between the S-OT and the M-OT. The occurrence of three individuals with a high genetic affinity to those in JR in the OH site of SB remains puzzling and warrants further investigation to test whether this is related to differences in habitat preferences (i.e., methane seeps vs hydrothermal vents). The mixed genetic backgrounds detected in the local population of HK in the S-OT indicated that this area may represent a contact zone for larvae from different locations in the OT-SB region.
Moreover, the local populations in the S-OT (especially DK) may serve as a potential source of B. platifrons in the Northwest Pacific.
Overall, the present study has enhanced our understanding of the genetic connectivity, fine-scale genetic structure, and migration patterns of B. platifrons. Together with several recent studies (Breusing et al., 2016(Breusing et al., , 2017, this study exemplified the usefulness of SNP data especially from high-throughput sequencing for population genetic studies of chemosynthetic ecosystems. More importantly, the results from this study will lay a foundation for an effective determination of biogeographic regions, establishment of informed management plans, and designation of deep-sea reserves in the Pacific Ocean, in preparation for an upcoming era of deep-sea resource exploitation (Miller et al., 2018;Sigwart, Chen, & Marsh, 2017).

ACK N OWLED G EM ENTS
We thank the captains and crews of the R/Vs Natsushima, Kaiyo, and

DATA A R C H I V I N G S TAT E M E N T
The mitochondrial gene sequences were deposited in GenBank of National Center for Biotechnology Information (NCBI) under the accession numbers of MH389991 to MH390100 for cox1, MH390101 to MH390210 for nad4, and MH390211 to MH390320 for atp6.