Substantial gene flow caused by long-term translocation between natural bank populations of the Peruvian scallop ( Argopecten purpuratus ) is supported by RAD-Seq analyses

The Peruvian scallop ( Argopecten purpuratus, Lamarck 1989) is a marine bivalve of high commercial value in the aquaculture industry, with wild populations distributing from northern Peru to Chile. Growing demand for it in the world aquaculture markets and limited availability of hatchery-based seeds caused long-term seed translocations among wild populations to recover depleted local populations and for production needs. We investigated long-term translocation effects on the genetic diversity and structure of wild populations using next-generation RAD sequencing. We sam-pled individuals from Sechura, Lobos de Tierra, Samanco, and Bahia Independencia in Peru, and La Rinconada in Northern Chile. We identified 8,345 polymorphic RAD loci and 24,218 SNPs for the five populations. We estimated high observed heterozygosity for all populations and high SNP frequency compared to similar studies on marine bivalves. We detected no spatial divergence among populations in Peru (pairwise F ST from 0 to 0.003), but strong differentiation with the population in Chile. Migration rate estimates suggested asymmetric directionality of seed translocation. Overall, our results support a remnant effect of an intense historic translocation and ongoing gene flow among wild populations in Peru, chal-lenging the identification of outlier loci and certification of sustainable origin of cultured scallops using genetic markers. detected a genetic homogeneity among Peruvian populations and a strong differentiation with a population in the north of Chile. Our results suggest that the genetic similarity among Peruvian populations is a footprint of long-term man-made seed translocations, beyond what would be expected by larval dispersion and oceanographic factors. Our study indicates that the traceability of individuals from wild or aquaculture populations using molecular markers of population like kinship or inbreeding as an alternative.

. It has a long planktonic larval stage and inhabits sheltered bays along the coasts of Peru and Chile, from Paita (5.1 S) to Valparaiso (33.1 S) (von Brand, Abarca, Merino, & Stotz, 2016;Wolff & Mendo, 2000). This species has been subject to local translocations for decades, fueled by the expansion in Peruvian scallop harvest during the 1980s (Mendo, Wolff, Carbajal, Gonz ales, & Badjeck, 2008). In 1982-1983 Niño Southern Oscillation (ENSO) event caused the overgrowth of A. purpuratus populations, resulting in overexploitation, jeopardizing the integrity of natural bank stocks. In response, the Peruvian government promoted the adoption of aquaculture practices to guarantee the sustainability of this resource. But, the limited availability of hatchery-produced seeds led to periodical extractions and translocations of young individuals from healthy natural banks to recover depleted local populations (Mendo et al., 2008). This practice was aimed to promote the recovery of diminished natural stocks and to facilitate scallop production in nearby hatcheries.
Translocation of individuals among wild populations of marine species with commercial value is a widespread practice with consequences that are not yet fully understood (Lemer & Planes, 2012). Translocation is defined as the manmade movement of individuals from a given population to another within the species range of distribution (Beaumont, 2000). This practice serves specific purposes, the main being the recovery of depleted populations for securing harvest (Carlton & Mann, 1996). Evidence indicates that estimates and distribution of genetic diversity and relatedness can change in wild or natural bank populations as a result of translocations. For example, a genetic homogenization in wild populations of previously distinct black-lip pearl oyster (Pinctada margaritifera) was found following 10 years of translocations (Arnaud-Haond et al., 2004). A recent study in the same species reported high levels of genetic diversity in wild populations adjacent to farmed populations because of "gene leaking" (Lemer & Planes, 2012).
The intensity and directionality of translocations of Peruvian scallops are poorly documented. An early study investigated the level of genetic structure for three Peruvian scallop populations in Peru (Sechura, Samanco, and Bahia Independencia). They found a significant, albeit weak, population differentiation using nine microsatellite and mitochondrial markers (Marín, Fujimoto, & Arai, 2013). The authors suggested that, despite intense translocations, at least one population retained part of its genetic signature (Bahia Independencia) because of asymmetric translocations and to the effect of geographic distance for realized larval dispersal. A recent study investigated the genetic structure in Peruvian scallop populations spanning its entire geographic distribution using two mitochondrial markers (COI and Cytb genes) and found a slightly phylogeographical structure between Peru and Chile populations (Acosta-Jofré et al., 2020). Their results suggested that Peruvian scallop populations experienced a recent population range expansion, with moderate to low levels of gene flow after the expansion. Both studies attributed the observed patterns to high man-mediated gene flow through seed translocation, and suggested that further studies with higher marker resolution are needed to understand population structuring.
Genome-wide molecular markers could provide the resolution power needed to analyze and detect structure in large populations with high gene flow (Waples, 1998). Genome-wide markers, particularly those generated by genotyping-by-sequencing approaches (Davey et al., 2011) such as Restriction-Site Associated DNA (RAD) markers (Baird et al., 2008) can detect variation and resolve fine-scale population structure for species with large dispersal capacity and potential high gene flow, for example, mosquitoes (Raši c, Filipovi c, Weeks, & Hoffmann, 2014) and American lobster (Benestan et al., 2015). We evaluated the genome-wide variation in individuals of the Peruvian scallop from five natural bank populations along their distribution range using thousands of SNP markers obtained from next-generation RAD sequencing. The objective was to investigate the influence of long-term translocations on the current genetic structure of Peruvian scallop wild populations. We (1) estimated levels of genomic diversity among and within wild populations, (2) described the spatial population structure, and (3) assessed the level and directionality of gene flow among them.
Despite its major role in the Peruvian aquaculture industry, the sustainable harvest of Peruvian scallop is not totally guaranteed. To this day, the Peruvian aquaculture industry continues to rely on wild seed extraction, with much of the extracted seed being moved to nearby hatcheries or distant ones adjacent to natural banks (E. Hanschke, personal communication). This hinders the performance of Peruvian producers in international markets (I. Hanschke, personal communication). Traceability tools, such as genome-wide molecular markers, could provide a sustainable origin certification of aquaculture species, demanded by the global bivalve market (Nielsen et al., 2012).
In this study, we also discuss perspectives on the use of genetic markers and population-wide genetic descriptors (i.e., heterozygosity and inbreeding coefficients) for traceability.  Table 1). We obtained approximately 200 mg of adductor muscle tissues and stored it in RNAlater (Ambion, Thermo Fisher Scientific, Waltham, MA) at room temperature for the first 24 hr and then stored permanently at À20 C. We isolated whole genomic DNA using the GeneJET Genomic DNA Purification Kit (Thermo Fisher Scientific, Waltham, MA) following manufacturer's instructions. DNA concentration (ng/μl) was measured using the Qubit dsDNA BR Assay Kit in a Qubit 2.0 fluorometer (Invitrogen, Thermo Fisher Scientific, Waltham, MA). Genomic DNA integrity was verified by agarose gels electrophoresis.

| Genomic libraries, de novo assembly, and dataset filtering
To construct the Genomic RAD libraries, we followed a recommended method (Etter, Bassham, Hohenlohe, Johnson, & Cresko, 2011) with minor modifications. We digested about 1 μg of high-quality genomic DNA per sample with a low-frequency endonuclease restriction enzyme ("Sbf1", Thermo Fisher Scientific). We ligated the digested fragments to unique modified Solexa 5 bp-barcode-labeled P1 adapters (see Data S1) using the following reaction parameters: 2 μl P1 adapter, 10 mM rATP, T4 1X DNA ligase buffer (50 mM NaCl, Tris-HCl 10 mM pH 7.5), and 200 CEU T4 DNA Ligase; incubating overnight at 4 C. We pooled the P1-adapter-ligated fragments bearing unique barcode identifiers into four libraries (15 individuals per library) for further fragmentation using a COVARIS S220 Focused-ultrasonicator (Woburn, MA). We recovered fragments ranging from 300 to 600 bp from a low-melting-point agarose gel electrophoresis using the GeneJet Gel Extraction kit (Thermo Scientific). We repaired fragment ends with the Fast DNA End Repair Kit (Thermo Scientific), and added a 3'dA overhang with a Klenow Fragment Exo À Kit (Thermo Scientific). We then ligated a P2 adapter (divergent "Y" adapter; see Data S2) to all libraries using the following reaction parameters: 1 μl P2 adapter, 200 CEU DNA T4 Ligase, and 1X T4 Ligase Buffer; incubating 1 hr at room temperature. To verify successful ligation of both adapters, we run a PCR pre-enrichment reaction using P1 and P2 specific primers (see Data S3). Reaction setup was as follows: 1X Phusion Master Mix with GC buffer (Thermo Scientific), primer P1 (0.5 μM), primer P2 (0.5 μM), BSA (10 μg), and 5 μl library DNA. Note: n = number of individuals; Ho, observed heterozygosity; He, expected heterozygosity; π, nucleotide diversity; Fis, inbreeding coefficient. Standard deviation (SD) is indicated within parenthesis.
Amplification parameters were as follows: 30 s at 98 C, 25 cycles of 10 s at 98 C, 30 s at 65 C, and 30 s at 72 C, followed by a 5 min extension at 72 C. Amplification products were verified in a 1% agarose gel electrophoresis.
Successfully ligated libraries were enriched by PCR, nine PCR replicate reactions per library, using the same PCR preenrichment reaction setup and the following amplification parameters: 30 s at 98 C, 18 cycles of 10 s at 98 C, 30 s at 65 C, and 30 s at 72 C, followed by a 5 min extension at 72 C. We recovered amplified fragments ranging from 300 to 600 bp from a low-melting-point agarose gel electrophoresis using the GeneJet Gel Extraction kit (Thermo Scientific) into 30 μl elution buffer. We estimated library concentration in nM using a Qubit 2.0 Fluorometer. RAD-seq libraries were pooled and run in two lanes of an Illumina HiSeq2500 sequencer at the University of Chicago Genomics Facility (USA), with a single end 100 bp kit using fast mode.
We used FastQC (Andrews, 2010) to assess the quality of RAD sequences. Sequences with Phred quality scores greater than 20 were kept. All sequences were pruned to 80 bp to minimize variability in the sequence quality. For sample demultiplexing, de novo assembly, and RAD loci and SNPs identification, we used the Perl modules implemented in Stacks v1.10 Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013). For the de novo assembly and to generate a catalog of usable loci, we established the following parameters: (1) a minimum depth of six RAD reads per stack or locus (Àm), (2) a maximum of two mismatches to add reads (secondary) to a given stack, and (3) a maximum number of three mismatches allowed (ÀM) between loci within each individual (Benestan et al., 2015). The resulting genotypes were combined using the populations module in Stacks, filtered to include all loci shared for at least 75% of individuals (r = 0.75 from each population [p = 5] and minimum allele frequency equal or greater than 0.01, to increase the opportunity to identify rare alleles or singletons). To generate final output files for downstream analyses, we used a combination of different output formats such as VCF, Genepop, and Structure (Pritchard, Stephens, & Donnelly, 2000).

| Molecular estimates and genetic analyses
We calculated basic molecular estimates at the population level (i.e., observed heterozygosity-Ho, expected heterozygosity-He, and inbreeding coefficient-Fis) using the output provided by the populations module from Stacks.
We analyzed deviations from Hardy-Weinberg equilibrium using 999 iterations with GenoDive software v2.0 (Meirmans & Van Tienderen, 2004). The test was conducted by locus for each population, and a Bonferroni correction (Rice, 1989) for multiple comparisons was applied, starting with the lowest p value reported. To investigate how genetic diversity is distributed among populations, we conducted an analysis of molecular variance (AMOVA) and tested for population structure estimating the Weir and Cockerham's F ST unbiased estimator (Weir & Cockerham, 1984) between all population pairs. We used GenoDive with 999 permutations and a Bonferroni correction (Rice, 1989) to adjust p values for these calculations. To further test for significant structure among populations, we conducted a principal components analysis (PCA) using the R package adegenet version 2.0.1 (Jombart, 2008;Jombart & Ahmed, 2011), implemented in the software R within Rstudio 1.0.136 (Rstudio Team, 2015). We retained 5-axis components to maintain the original number of putative populations and to observe discrimination.
To investigate the level and directionality of historical translocations between natural bank populations, we estimated the number of migrants (N m ) using a coalescent approach implemented in the program Migrate-n v3.6.4 (Beerli & Felsenstein, 2001). Migrate-n uses Bayesian inference to estimate effective population size (variable θ) and migration rates (variable M), which can be used to calculate N m (N m = M Â θ). Migrate-n was run with the full migration Matrix model, adjusting the transition/transversion rate value to 2.0, for nuclear DNA and otherwise using default parameters. The dataset for this analysis included loci that were shared by all individuals from all populations.
SNP markers suggested to be under positive selection by BayeScan (Foll & Gaggiotti, 2008) were removed from the dataset for this analysis.

| RESULTS
We developed and sequenced four genomic RAD libraries that yielded approximately 260 million sequencing reads, corresponding to single raw reads. On average, <1% of these sequences had ambiguous barcodes, <1% were low quality sequences, and less than 7% had ambiguous RAD loci. These estimates varied among individuals and among populations (see Data S4 and Data S5). After the initial screening and filtering, 225,410,859 sequences in total were retained for further analyses. The number of retained sequences varied among individuals and populations. The number of sequences per individual varied between 498,875 and 9,743,838 (mean value: 3,954,576.47, SD: ± 2,101,529.36) and the mean percentage of retained sequences was 85% (see Data S4). The average Phred score value was 30. All sequences were pruned to 80 bp to minimize variability in the sequence quality. The de novo assembly identified 9,593 RAD loci. After filtering the data to obtain only polymorphic loci for at least 75% of individuals from each population, we recovered 8,345 polymorphic RAD loci (with ≥1 SNP and 246 average depth read) and 24,218 SNPs, which were used for genetic diversity and population structure analysis. The number of loci was reduced to 3,027 (yielding 7,608 SNPs) when including only loci shared by all individuals from all populations (zero missing data) and removing outliers SNPs (outlier SNPs = 4) for gene flow estimates.

| DISCUSSION
Despite being the most economically valuable bivalve aquaculture species in Peru, up to date, it is impossible to differentiate batches of exported wild Peruvian scallops from those produced in a sustainable way. This is mostly because of the lack of a reliable regulation system and/or methods for traceability and certification of origin for hatchery produced batches. Genetic tools offer an opportunity for reliable traceability of seafood (Bernatchez  Marín et al., 2013). These markers were useful in determining genetic differences between populations, but they lack the power to detect small, but significant, genetic differences in nearby and large populations (Waples, 1998 diversity levels, our estimates of SNP frequency for A. purpuratus were also higher compared with other bivalve species. We detected, on average, a SNP every 20 bp, compared with estimates of 1SNP/30 bp for the Chilean mussel and 1 SNP/40 bp for the Pacific oyster (Sauvage, Bierne, Lapegue, & Boudry, 2007). Our estimates agree with high polymorphic levels previously reported for bivalves.
Overall, the genetic diversity did not differ significantly among populations, albeit the population of Bahia Independencia, which exhibited the highest genetic diversity values (Ho = 0.2044). This agrees with a previous study where the population of Bahia Independencia exhibited higher genetic diversity than populations from Sechura and Samanco (Marín et al., 2013). Sechura and Samanco populations had the lowest genetic diversity estimates, which is in agreement with a previous study (Marín et al., 2013). These two populations are adjacent to hatcheries, so the low levels of diversity could be the result of migrants coming from them. A similar scenario has been observed previously for the black-lip pearl oyster (Lal et al., 2016). Low genetic diversity in hatchery populations is expected because of the reproductive dynamics in captivity, where a few progenitors are responsible for all the individuals produced, thus promoting allele loss. This has been observed in populations of captive invertebrates (Miller et al., 2013).
Samanco (Fis = 0.0655) and La Rinconada (Fis = 0.0556) ( Table 1) exhibited the highest inbreeding coefficient while Lobos de Tierra and Sechura exhibited the lowest values. The surplus of homozygotes can be explained by selffertilization caused by closed circulation in these bays. The Peruvian scallop is a functional hermaphrodite with external cross-fertilization and sperm release before the oocytes, to prevent self-fertilization (Uriarte, Rupp, & Abarca, 2001); the areas in Samanco and La Rinconada present gyres or gyre-like currents (Avendaño, Cantill anez, Thouzeau, & Peña, 2007;Tenorio, 2016), which promote larval retention. In contrast, the lower inbreeding values detected in Lobos de Tierra Islands and Sechura Bay (Table 1)  Except for Flores-Valiente et al. (2019), no other study has assessed the potential role that marine currents in Peru play in larval transport and retention (Barahona, Vélez-Zuazo, Santa-Maria, & Pacheco, 2019). In any case, we would expect to detect population structure at least between Bahia Independencia and the rest of the sites. Previous genetic distinction for this population was observed by Marín et al. (2013), who suggested that the geographic and oceanographic characteristics of this bay strongly limited natural gene flow with other populations.
We do not overrule the homogenizing role of larval dispersal facilitated by currents and oceanographic events (Miller et al., 2013), in combination with the large population size and high fecundity of marine bivalves (Cano, Shikano, Kuparinen, & Merilä, 2008 (Barahona et al., 2019). The recolonization of extinct natural beds (Wolff & Mendo, 2000) and the formation of temporal intermediate aggregations (Ysla, 2009)  In any case, we detected an asymmetrical gene flow among the Peruvian and La Rinconada populations. The strong gene flow observed from the Chilean population agrees with the cold northward Humboldt Current (PCC). We found that mean northward gene flow is stronger than mean southward gene flow, which suggests that the effect of ENSO warm periods is weaker compared with the constant flow of the Peruvian Coastal Current for this species.
Among Peruvian natural banks, we found evidence indicating that Lobos de Tierra is a source of migrants, which is consistent with reports of that location being a common source of seeds for translocations (Mendo et al., 2008).
Regardless of the power of using thousands of genome-wide SNPs, we did not detect fine-scale genetic structure among natural banks. Instead, we found a seascape of genetic homogenization for wild populations in Peru and differentiation from the most distant population, in northern Chile, likely the result of long-distance dispersal limitations. The lack of population structure and outlier loci to distinguish wild populations limits the opportunity of using genomic tools to trace Peruvian scallop stocks back to their population of origin. However, the low heterozygosity observed in natural bank nearby hatcheries suggests that genetic population descriptors such as kinship or inbreeding index could be used as an alternative to trace products back to wild stocks or to aquaculture hatcheries (i.e., bred in captivity).

| CONCLUSION
We detected a genetic homogeneity among Peruvian populations and a strong differentiation with a population in the north of Chile. Our results suggest that the genetic similarity among Peruvian populations is a footprint of longterm man-made seed translocations, beyond what would be expected by larval dispersion and oceanographic factors.
Our study indicates that the traceability of individuals from wild or aquaculture populations using molecular markers would be difficult, but we do not overrule the use of population descriptors, like kinship or inbreeding index, as an alternative.

ACKNOWLEDGMENTS
We are grateful to Dr. Aldo Pacheco, who facilitated access to scallop samples from La Rinconada (

CONFLICT OF INTEREST
The authors declare no conflicts of interest of any sort in the production and publication of this study.

AUTHOR CONTRIBUTIONS
Ximena Velez-Zuazo: Conceived and designed the project, wrote the research proposal, coordinated field data collection, analyzed the data, and wrote the study. Sergio P. Barahona