Comparing Pool‐seq, Rapture, and GBS genotyping for inferring weak population structure: The American lobster (Homarus americanus) as a case study

Abstract Unraveling genetic population structure is challenging in species potentially characterized by large population size and high dispersal rates, often resulting in weak genetic differentiation. Genotyping a large number of samples can improve the detection of subtle genetic structure, but this may substantially increase sequencing cost and downstream bioinformatics computational time. To overcome this challenge, alternative, cost‐effective sequencing approaches, namely Pool‐seq and Rapture, have been developed. We empirically measured the power of resolution and congruence of these two methods in documenting weak population structure in nonmodel species with high gene flow comparatively to a conventional genotyping‐by‐sequencing (GBS) approach. For this, we used the American lobster (Homarus americanus) as a case study. First, we found that GBS, Rapture, and Pool‐seq approaches gave similar allele frequency estimates (i.e., correlation coefficient over 0.90) and all three revealed the same weak pattern of population structure. Yet, Pool‐seq data showed F ST estimates three to five times higher than GBS and Rapture, while the latter two methods returned similar F ST estimates, indicating that individual‐based approaches provided more congruent results than Pool‐seq. We conclude that despite higher costs, GBS and Rapture are more convenient approaches to use in the case of species exhibiting very weak differentiation. While both GBS and Rapture approaches provided similar results with regard to estimates of population genetic parameters, GBS remains more cost‐effective in project involving a relatively small numbers of genotyped individuals (e.g., <1,000). Overall, this study illustrates the complexity of estimating genetic differentiation and other summary statistics in complex biological systems characterized by large population size and migration rates.

Uncovering population genomic structure of species characterized by large effective population size and/or high migration rate may be challenging since this often translates into a lack or very weak genetic differentiation and spatial genomic structure (Gagnaire et al., 2015;Holliday et al., 2017;Neale & Kremer, 2011;Waples, 1998). Recent studies suggested that increasing the number of samples and markers genotyped can improve the detection of subtle genetic structure in nonmodel species such as the polar bear (Viengkone et al., 2016), the candlefish (Candy et al., 2015), the American lobster (Benestan et al., 2015), the silvery lightfish (Rodriguez-Ezpeleta, Álvarez, & Irigoien, 2017), the Tasmanian devil (Hendricks et al., 2017), or the sea cucumber (Xuereb et al., 2018).
Yet, the genotyping of a large number of samples and markers may vary widely depending on the selected NGS protocol. Therefore, choosing the most appropriate NGS genotyping approach sometimes remains challenging. Roughly speaking, the lower the extent of genetic differentiation is, the higher the number of required samples and markers is to obtain narrow confidence intervals (CI) around estimates of genetic differentiation (Patterson, Price, & Reich, 2006). This may substantially increase the analytical cost and computational time (Shendure & Aiden, 2012). To overcome this challenge, alternative protocols to the classic way of sequencing individuals separately (i.e., each individual is sequenced with a unique barcode) have been developed either by pooling DNA samples such as Pool sequencing (Futschik & Schlötterer, 2010;Lynch, Bost, & Wilson, 2014;Schlötterer, Tobler, Kofler, & Nolte, 2014) or by reducing genomic complexity using sequence capture methods (Ali et al., 2016;Boucher, Casazza, Szövényi, & Conti, 2016;Hoffberg et al., 2016;Jones & Good., 2016).
Despite these promising alternatives, each approach has its own strengths and weaknesses, related to the distribution of polymorphic loci, the cost of library preparation and sequencing, and the accuracy of variant calling and genotyping. All of these factors may ultimately affect demographic inferences (Cutler & Jensen, 2010;Harvey, Smith, & Glenn, 2016). For instance, Pool sequencing (hereafter Pool-seq) does not provide individual genotypes, whereas this information is essential for some applications such as assignment tests and linkage disequilibrium estimation (Cutler & Jensen, 2010). On the other hand, and pending on specific research objectives, quantifying genetic parameters where individual information is required may not always be necessary. In such cases, Pool-seq has already proven to be an effective and accurate approach to investigate genomewide variations of terrestrial and marine high gene flow species such as oaks (Quercus spp., Leroy et al., 2018), poplar (Populus alba, Stölting et al., 2015;Populus alba, Populus tremula, Christe et al., 2016), Chinese chestnut (Castanea mollissima, LaBonte, Zhao, & Woeste, 2018), sticklebacks (Gasterosteus aculeatus, Guo, DeFaveri, & Sotelo, 2015), Atlantic herring (Clupea harengus, Guo, Li, & Merilä, 2016;Lamichhaney et al., 2012;Martinez Barrio et al., 2016), Atlantic cod (Gadus morhua, Karlsen et al., 2013), and the copepod (Tigriopus californicus, Lima & Willett, 2018). This latter approach was also successful to detect selection in the model species Drosophila spp. characterized by very large effective population size (e.g., Barghi et al., 2018;Bastide et al., 2013;Kapun et al., 2014). Furthermore, Pool-seq offers the possibility to genotype a large number of individuals at a much lower cost than individual sequencing.
In contrast to Pool-seq, sequence capture approaches enable the sequencing of a large number of samples while preserving genotypic information at the individual level (Andrews et al., 2016). Originally, this latter method targets known genomic regions such as exons, which limits the number and diversity of DNA sequences being studied (Harvey et al., 2016). For the past few years, sequence capture approaches have undergone a new upswing, in particular with the concept of targeted sequence enrichment that couples the power of sequence capture with NGS technology (Grover, Salmon, & Wendel, 2012). More recently, methods combining sequence capture enrichment and reduced representation libraries have been proposed (Ali et al., 2016;Boucher et al., 2016;Hoffberg et al., 2016;Suchan et al., 2016).
In this study, due to its ease of use with GBS libraries, we focused our work on the so-called Rapture protocol, which represents a highly flexible genotyping method protocol allowing thousands of individuals to be sequenced simultaneously with a high sequencing depth (Ali et al., 2016). However, this method requires known genomic sequences of interest (e.g., reference genome or targeted sequence information) in order to design capture probes (Ali et al., 2016;Jones & Good, 2016). The design of capture probes is a critical step since it may influence the quality of all the genomic data collected. Furthermore, this step can be potentially costly in terms of probes development and synthesis. Although targeted loci should be selected according to the experimental needs of a given project, this selection step may be hampered by the occurrence of paralogous or highly polymorphic sequences (Ali et al., 2016).
In sum, each of these NGS protocols (GBS, Pool-seq, and Rapture) offers different benefits and limitations and all may be relevant to the field of population genomics in species exhibiting low genetic differentiation. To date, there is no study that have already explored and compared their relative efficiency in resolving weak population structure in nonmodel species. In this context, the goal of this study was to assess the consistency of two cost-effective sequencing approaches, Poolseq and Rapture, in documenting genetic population structure as well as estimating allele frequency and derived statistics in a high gene flow and nonmodel species, the American lobster (Homarus americanus), comparatively to a conventional GBS approach.

| Sampling material and DNA extraction
A total of 288 egg-bearing lobster females were collected between May and August 2012, from six locations (N = 48 individuals per location) across the Northeast Atlantic ( Figure 1; see Benestan et al., 2015 for details). Half of the second walking leg of each individual was collected and preserved in 95% EtOH until DNA extraction. A previous study on the American lobster using RAD-seq revealed the existence of both genetic structure and significant isolation by distance (IBD) (Benestan et al., 2015). The authors identified two main distinct units as (a) southern region (i.e., from USA Maine to midsouth of Nova Scotia shelf) and as (b) northern region (i.e., from midnorth of Nova Scotia shelf to the north of Newfoundland, including all the Gulf of St. Lawrence samples). Given this, we selected six sampling sites spread over each of these two pre-identified regions (i.e., three northern sites, Gaspé (GAS), Sidney Bight (SID), and Triton (TRI), and three southern sites, Lobster bay (LOB), Saint-John Harbour (SJH), and The Wolves/Deer island (THE); Figure 1). Genomic DNA was extracted using salt extraction (Aljanabi & Martinez, 1997) with an additional RNAse treatment following the manufacturer protocol.
Genomic DNA quality was checked on 1% agarose gel, and specimens with too many smears (i.e., indicating degradation of DNA) were excluded from the entire dataset. Genomic DNA was then quantified using a NanoDrop instrument, roughly diluted, and final DNA concentrations were normalized to 20 ng/µl based on fluorescence reads values (AccuClear™ Ultra High Sensitivity dsDNA Quantitation Solution).
F I G U R E 1 Map of lobster sampling locations. GAS, Gaspé; LOB, Lobster bay; SID, Sidney Bight; SJH, Saint-John Harbour; THE, The Wolves/Deer island; TRI, Triton

| Library preparation
Individual GBS library was prepared following Mascher, Wu, and Amand (2013) and detailed in Moore et al. (2017). Briefly, genomic DNA was double-digested using the PstI and MspI restriction enzymes followed by ligation to a unique barcoded adapter for each individual. For GBS-based libraries, each individual was labeled with a unique barcode and 96 individuals were pooled for size selection, PCR, and sequencing (see sequencing details below).
For Pool-seq library preparation, we used 2 µl DNA of equimolar concentrations for each individual of a given location. The 48 individuals of a given sampling site were pooled and barcoded using the same nucleotide sequence to identify sample origin.
Here, the sample size of pooled DNA samples is a critical parameter, which will ultimately influence the accuracy of allele frequency estimates (Anderson, Skaug, & Barshis, 2014;Fracassetti, Griffin, & Willi, 2015;Futschik & Schlötterer, 2010;Gautier et al., 2013;Lynch et al., 2014;Rode et al., 2017;Schlötterer et al., 2014). Therefore, maximizing the number of samples contributing to the pool  can minimize variance at the individual level potentially caused by technical errors (e.g., pipetting, DNA concentration estimations) or quality of samples (e.g., tissues or DNA quality). Moreover, several authors advocates that replication of pools may help to reduce the error rate in SNP calling (Gautier et al., 2013;Schlötterer et al., 2014). Here, our pool size (i.e., 48 samples) was selected according to Schlötterer et al. Pool-seq libraries were digested using the same restriction enzymes and protocol described above for GBS-based libraries. We sequenced 16 pool libraries in a first sequencing Ion Proton chip containing four sampling sites (i.e.,4xGAS,4xLOB,4xSID,4xTRI) and then six pool libraries in a second chip composed of the two remaining sampling sites (i.e., 3xSJH and 3xTHE).
First, custom probes were designed from a de novo reference catalog of 9,818 loci genotyped during the previous GBS library sequencing run (see details below). The probe library was purchased from Arbor Biosciences™, and we followed the Mybait protocol supplied with the capture kit. In order to explore the potential offered by this method that aims at reducing the sequencing costs relative to a conventional RAD-seq approach, we increased the multiplexing load from 96 individual barcodes for our GBS library setup to 384 individual barcodes on our Rapture experiment. For these Rapture libraries, a total of 288 individuals from the six sampling sites used in this study were coupled with 93 others samples (required for another project) and three free-DNA water blanks (used for sequencing/bioinformatic plate control) to sequence one Rapture library with similar sequencing efforts compared to the GBS and Pool-seq libraries.

| Library sequencing
All libraries were sequenced on the Ion Torrent p1v3 chip at the plateforme d'analyses génomiques of the Institute of Integrative and Systems Biology (IBIS, Université Laval, Québec, Canada) with a median target of 80 million single-end reads (50-220 pb) per chip.
Two rounds of sequencing (i.e., two separate chips) were conducted for all libraries. GBS libraries were normalized after the first round in order to reduce the unbalanced sequence representation of individuals by adjusting DNA volumes for each sample. Pool-seq libraries normalization was not possible because individual information was unavailable, and therefore, balanced contribution of each individual in each pool is assumed. Rapture protocol was also conducted without normalization as the adjustment of DNA volumes on a highly randomized and multiplexed Rapture setup (i.e., several hundreds to one thousand barcodes) is very time-consuming and could substantially increase the risk of inadvertent pipetting errors.

| Construction of a de novo reference catalog of individual genotyping-bysequencing (GBS) libraries
Genotyping-by-sequencing sequence data from four locations (Gaspé, Lobster Bay, Sidney Bight, Triton) were analyzed using the pipeline available at (https ://github.com/enorm andea u/stacks_workflow) according to Benestan et al. (2016). First, reads were trimmed to 80bp and shorter reads were discarded using cutadapt (Martin, 2011). Samples were then demultiplexed using process_radtags in STACKS V.1.38 (Catchen, Hohenlohe, & Bassham, 2013). A maximum of three nucleotide mismatches (M = 3), a minimum stack depth of three (m = 3), and a maximum distance for secondary reads N = 5 were allowed in ustacks. Then, reads were aligned de novo to create a catalog of putative loci (cstacks module in stacks, with default parameters) and the populations command was run requiring a locus to be present in at least one sampling location and in 50% of all individuals.
Finally, this dataset was postfiltered using a custom python script (available at https ://github.com/enorm andea u/stacks_workf low/00scrip ts/05_filter_vcf.py) where we kept SNPs for which a genotype was called in at least 70% of individuals in each sampling site with a Ho < 0.6 and a F IS between [−0.7: 0.7]. A minor allele frequency (MAF) threshold of at least 1% globally or 5% in each sampling locality was also applied and no more than eight SNPs per locus were allowed.
Based on this final individual GBS dataset, we then generated a targeted sequences catalog for Rapture. Ultimately, we removed highly similar sequences through a "self-blast" test and

| Individual variant calling (GBS and Rapture)
Both GBS and Rapture raw data were processed using the same workflow as indicated previously for trimming and demultiplexing.
Individual reads were aligned to the reference catalog with BWA-mem (Li & Durbin, 2009) using default settings values except for minimum seed length (−k 19), maximum seed occurrence (−c 500), gap open penalty (−O 0), gap extension penalty (−E 2), and the output alignment score option disabled (−T 0). The resulting SAM files were then filtered to remove unmapped reads and perform secondary alignment as well as supplementary alignment using SAMtools view . Then, reads with mapping quality less than 20 and reads containing soft clipping (i.e., the exclusion of terminal bases with mismatches) were removed. SNPs were identified using pstacks module specifying a 5× minimum depth coverage for each stack. This threshold was selected based on the reads distribution of each sample and in order to limit false-positive SNPs resulting from sequencing errors.
Then, a catalog was built using cstacks with three mismatches allowed between samples tags. We ran populations module requiring again a locus to be present in at least one population and at a frequency >50% in that population, with a minimal depth of five to be processed. The final dataset was obtained by keeping SNPs genotyped in at least 70% of the individuals in each sampling site, showing an observed heterozygosity < 0.6 within samples and a global MAF > 0.01. Since our de novo reference catalog was already filtered for F IS to remove ambiguous SNPs, we did not apply this filter again on the mapped datasets.
Genotype missing data threshold was set to 16% in order to retain more than 90% of individuals in both GBS and Rapture datasets.

| Pool-seq variant calling
Pool-seq sequences were trimmed, demultiplexed, and aligned across the reference catalog as previously described. Sequences Alignment/ Map files in binary format (BAM) were then filtered as above, removing all reads with soft clipping (Kofler, Orozco-terWengel, et al., 2011a).
Then, BAM files were combined to generate a synchronized multiple pileup file using SAMtools mpileup tool  and the Popoolation2 java script mpileup2sync.jar (Kofler, Pandey, & Schlötterer, 2011b) with default parameters. SNP calling was performed using the popsync2pooldata function in the R package poolftsat 0.0.1 (Hivert, Leblois, Petit, Gautier, & Vitalis, 2018). We considered only biallelic SNPs called with a minimal read count ≥ 4. We also required a minimal coverage of 30 and maximal coverage of 300 in order to remove poor-quality SNPs and potential sequencing artifacts (i.e., PCR duplicates), respectively. Finally, we fixed a MAF threshold of at least 0.01 in each pool.

| Testing for correlations among allele frequencies
We first identified shared SNPs between GBS, Rapture, and Poolseq datasets based on locus name information, locus read position, and SNP alleles. Allele frequencies for individual-based data were computed for each sampling site using vcftools v0.1.12b (Danecek et al., 2011), for both GBS and Rapture VCF datasets. Pool-seq allele frequency estimates were performed by dividing the read count of each SNP allele on the total locus coverage. Correlations of minor allele frequencies were calculated between each of the three methods for each shared SNP. We also computed the average MAF across all Pool-seq replicates for each SNP to further reduce the potential bias of MAF estimations and to accurately examine the relationship between Pool-seq, GBS, and Rapture methods using the Pearson correlation coefficient available in R under the cor.test function.

| Computing genetic differentiation for the three methods
First, the extent of genetic differentiation was computed and compared relatively to each method and to the entire dataset (i.e., using SNPs shared between all methods and the overall set of discovered SNPs in each method). We computed pairwise F ST values between each sampling site using the θ estimator of Weir and Cockerham (1984). In order to minimize the effects of linkage disequilibrium, downstream analyses were performed using only one SNP per locus, by keeping only SNP showing the higher MAF at each locus. This last filtering step is expected to reduce the number of low-frequency SNPs. Indeed, these rare variants are typically hard to distinguish from sequencing errors and mapping artifacts in low coverage NGS data without reference genome. Moreover, Guo et al. (2013) also demonstrated by simulations that Pool-seq is not ideal for estimating allele frequencies of rare SNPs.
For GBS and Rapture, F ST were computed using the stamppFst function from the R package StAMPP 1.5.1 (Pembleton, Cogan, & Forster, 2013) with 95% CI estimated on 1,000 bootstraps. Pool-seq F ST values were computed with the computeFST function available in the R package poolfstat 0.0.1, using the method of moments developed by Hivert et al. (2018). Briefly, this latter method is based on an analysis of variance derived from the Weir and Cockerham (1984) estimator and corrected for Pool-seq datasets. CI for Pool-seq F ST was obtained using a custom bash script over 1,000 bootstraps iterations. Pool technical replicates were used to compute the average of each pairwise F ST and CI values. Additionally, for all pairwise site comparisons, we performed standard Mantel tests to assess correlation between genetic distances (measured as F ST /(1 − F ST ); Rousset, 1997) and geographic distances. Seafloor distances were measured between each sites using the R package marmap 0.9.6 (Pante & Simon-Bouhet, 2013). This R toolbox enabled us to estimate marine distances along coast lines. Mantel test was performed with Ade4 1.7.10 (Dray & Dufour, 2007) using 1,000 permutations assuming a two-dimensional habitat in which geographic distance was log-transformed.
An additional analysis of allele frequency differentiation was conducted with BayPass v2.1 (Gautier, 2015). First, we ran BayPass to estimate the scaled variance-covariance matrix (Ω) under the neutral core model implemented in the software. For both GBS and Rapture datasets, 100 short pilot runs with 1,000 iterations each were set with a 5,000 burn-in period. We then ran BayPass with the same settings defined for individual-based data but accounting for the specificities of Pool-seq data using the "Pool-seq" options implemented in BayPass. As for F ST analysis, only the SNP with the highest MAF at each locus for each dataset was kept. To investigate population structure, we carried out a singular value decomposition on each Ω matrix. We used the resulting principal components coordinates to produce a two-dimensional visualization of the observed genetic variation. We compared the geographic position (i.e., latitude and longitude) of sampling sites with their PC-based genetic positions. The correlation between Ω matrices was then assessed using a Mantel test. The Pool-seq variance-covariance matrix was reduced by averaging over pool site replicates in order to perform this latter analysis with similar matrix sizes as the individual matrix (i.e., individual-based Ω matrices of size 6 × 6).

| Sequencing data statistics
The average number of reads per sample among sequenced libraries captured sequences). Hence, the useful genomic load (i.e., proportion of expected reads that are both present in the reference and in the raw data) between the two protocols would not be the same. From our catalog of reference containing 9,818 loci, the average proportion of targeted loci recovered (with at least one read) was comparable yet slightly lower for Rapture mapped data (95%) compared to GBS and Pool-seq (98% and 99%, respectively). After filtration, the number of SNPs discovered by GBS and Rapture was 16,986 and 13,931, respectively, while SNPs calling from Pool-seq discovered a total of 10,874 filtered SNPs (Table 1) Table S1.

| Consistency of estimated allele frequencies
In total, 4,664 SNPs were shared among the three methods which is referred to "overlapped dataset" hereafter (see Figure S1). Minor allele frequencies were highly correlated among the three meth-  Note: The last line (% targeted loci after filtering) indicates the proportion of loci kept at the end of the filtering steps and relative to the maximum of loci expected (i.e., the 9,818 loci from the reference catalog used for mapping and for sequence capture).
TA B L E 1 Summary statistics of data obtained using genotype by sequencing (GBS), Rapture, and Pool-seq approaches difference in coverage was observed for THE(1) compared to other pool replicates THE(2) and THE(3) (see Figure S3). Thus, we suspected that individual DNA contributions in THE(1) were strongly unbalanced, probably due to experimental errors when samples DNA were pooled together. Therefore, the pool replicate THE (1) was removed from all Pool-seq datasets in order to mitigate its effect in downstream analyses (both for overall and overlapped SNPs datasets).

| Measuring genetic differentiation
Overall, the genetic differentiation measured by the three methods was weak with an average pairwise F ST of 0.0028 (SD = 0.0027,    and Ω-PC2), and spatial distribution of sample sites (i.e., latitude and longitude) revealed a significant spatial structure (Table 4)  Here, we empirically explored the consistency of the genetic structure observed in a high gene flow species by comparing conventional GBS with Rapture and Pool-seq approaches. We found that individual-based methods (i.e., GBS and Rapture) provided more congruent results than Pool-seq. In the following sections, we discussed the consistency of these three methods in a context of a weak genetic differentiation and we also highlight the cost and benefits for each method tested.

| Level of congruence between GBS and alternatives methods
Our results showed that allele frequencies estimated from GBS and the two alternatives methods, Rapture and Pool-seq, were consistent although allele frequency estimates from Rapture were more highly correlated to GBS than Pool-seq. These observations are in agreement with other studies that also reported a strong correlation between pooled and individually measured allele frequencies   Futschik & Schlötterer, 2010;Gautier et al., 2013). Yet, the pool sample size is a critical parameter for characterizing genetic structure from Pool-seq data, particularly for F ST estimation since numerous computational approaches, such as maximum likelihood estimates Smadja et al., 2012) or model-based methods (Fariello et al., 2017), are conditioned by sample size. In practice, unequal contributions of each individual to the final pool of sequences may introduce biases in allele frequencies estimates (Gautier et al., 2013;Zhu, Bergland, González, & Petrov, 2012). The concept of effective pool size (i.e., number of diploid individuals with equimolar amounts of DNA in an idealized pool that was expected to show the same level of variance in allele frequency estimations) has been proposed by Gautier et al. (2013) to illustrate this latter source of errors. Using an empirical dataset, they showed that the effective pool size could be up to 30% lower than the experimental pool size. Here, using the program poolne_estim developed by Gautier et al. (2013), we estimated that the effective pool size ranged from 17 to 48 among all pool replicates (see details in Table S4). This latter results represented an experimental error (as defined by Gautier et al. (2013)) ranging from 0% to 133.6% (average = 55.1%, SD = 29.2).
Thus, like Gautier et al. (2013), we observed that the effective pool size differed from our experimental pool design. Nevertheless, allele frequency estimates remained similar between individual-based methods and Pool-seq, except for one pool. Unfortunately, Gautier et F I G U R E 4 Clustering analysis under Bayesian hierarchical model. (a, b, and c) represent the eigenvalue decomposition of the scaled variance-covariance matrices of population allele frequencies (Ω) for GBS, Rapture, and Pool-seq datasets, respectively. Left plots correspond to overall SNPs datasets and right plots correspond to overlapping SNPs datasets. Variance-covariance matrix (Ω) was estimated from the neutral core model proposed by Coop, Witonsky, Rienzo, & Pritchard (2010) and implemented in BAYPASS software (Gautier, 2015). studies. However, such work was beyond the goal of the present study that only took an empirical approach to measure consistency among methods. Moreover, as stipulated by Shafer et al. (2017), it is difficult to reproduce the important variation introduced during wet laboratory data generation using simulated data. Indeed, building proper algorithms simulating complex laboratory biases such as PCR duplicates remains difficult as well as similar statistic estimators for all sequencing methods. Hence, we think that our empirical data can still be a relevant approach to substantiate interpretations and test the consistency of the alternatives methods to GBS.
NGS approaches have some conceptual and methodological limitations that can introduce artifacts and affect estimates of population genetic parameters (Andrews et al., 2016;Cariou et al., 2016;Davey et al., 2011). For example, mutations in restriction sites, referred to allele dropout ("ADO"), may result in an underestimation of genetic diversity and false inference of population divergence (Arnold et al., 2013;Cariou et al., 2016;Gautier et al., 2013).
Moreover, restricted digested libraries can suffer from a high level of sequence clonality related to PCR amplification (i.e., PCR duplicates), which have the potential to bias allelic read depth and produce genotyping errors (Davey et al., 2011). These methodological limitations are known to generate missing data that can substantially cause mis-estimations of commonly used statistics (e.g., F ST , Tajima's D, nucleotide diversity) as well as bias in population genomic inferences (Arnold et al., 2013). In this study, we used Ion Proton™ systems, which were designed to produce single-end sequencing reads.
However, unlike paired-end sequences data, distinguishing PCR duplicates in single-end sequences remains difficult. To date, only one recent study provided estimate of the average PCR duplication rate of single-end high-throughput sequences datasets (Bansal, 2017).
Moreover, the identification of PCR duplicates is difficult in singleend Pool-seq since haplotype information is lost. Still, we mitigated potential experimental bias due to PCR duplication by maximizing genomic diversity in each DNA library (i.e., number of genomes), using 200 ng of genomic DNA per sample as recommended by several studies (Andrews et al., 2016;Casbon, Osborne, Brenner, & Lichtenstein, 2011;Davey et al., 2011). Considering that the genome size of the American lobster was estimated roughly 4.5Gb (Jimenez, Kinsey, Dillaman, & Kapraun, 2010), we expected that each DNA library was represented by nearly 40,000 American lobster genomes following the equation: where Q ng is the amount of DNA in nanograms and G bp is the length of DNA amplicon in base pairs (i.e., genome size). This calculation is based on the assumption that the average weight of Note: Values represent Pearson r correlation between Ω-PC space coordinates of each sampling site (i.e., PC1 and PC2, see Figure 4) versus geographic position (i.e., latitude and longitude).
base pair (bp) is 650 Daltons. We also used a sequence size selection (BluePippin™ prep-Sage Science) to minimize amplicon size variability and limit PCR cycling to 10 cycles during library preparation, two measures that should prevent efficiently the formation of PCR duplicates. Finally, the probability of obtaining PCR duplicates is negatively correlated to the number of targeted markers. Therefore, Ali et al. (2016) targeted 500 loci and observed a high proportion of PCR duplicates. Here, on the contrary, we targeted 9,818 loci, resulting in lower probability of generating PCR duplicates.

| Consistency of population structure patterns
Population genetic structure was further investigated using a hierarchical Bayesian model available in BayPass (Gautier, 2015). All the three methods showed a similar signal of clustering, uncovering the presence of two geospatial groups (Figure 4). This North/ South dichotomy was previously highlighted using 13 microsatellites (Kenchington, Harding, Jones, & Prodöhl, 2009) and 10,156 SNPs (Benestan et al., 2015), which then give support to our outcomes.
Nevertheless, caution is required to interpret this clustering signal.
Indeed, comparing observed genetic variation from Ω matrices with spatial distribution of samples sites (i.e., latitude and longitude) revealed that the latitude criterion was the most prominent pattern of clustering. Furthermore, comparing genetic differentiation (i.e., pairwise F ST ) with geographic distances depicted a positive signal of IBD, as also identified by Benestan et al. (2015). Importantly, the disjointed range of sampling sites (i.e., geographic gap between samples from the North vs. South) may impact our capability to accurately resolve population structure (Bradburd, Coop, & Ralph, 2018). Indeed, most currently available clustering methods are known to be easily confounded by the presence of IBD and tend to split continuous patterns of spatial variations in discrete groups (Frantz, Cellina, & Krier, 2009;Meirmans, 2012 Figure   S2). Consequently, it may be unnecessary and more expensive to use Rapture compared to conventional GBS depending on the scientific question, the scale of the research project, the sampling design, and the laboratory possibilities. Table 5 provides a summary of prominent advantages and disadvantages of each method that are briefly summarized below. The advantage of GBS approach is that it can be applied using a de novo assembled catalog (Etter, Preston, & Bassham, 2011), while both Rapture and Pool-seq approach require prior genomic reference in order to align sequencing reads (Ali et al., 2016;Schlötterer et al., 2014). Since no reference genome was available for the American lobster (or even for a closely related organism), we used prior GBS development to select a set of SNP markers and then build a de novo reference catalog, which was used to align raw reads obtained from each approach (e.g., GBS, Rapture, and Pool-seq

| CON CLUS ION
In conclusion, we found that Pool-seq and Rapture provided consistent allele frequency estimates and nearly identical patterns of population structure compared to conventional GBS approach.
However, despite increasingly accurate F ST estimator for Poolseq methods (Hivert et al., 2018), or the availability of individual effect of this experimental bias. We further advocate that future empirical Pool-seq projects would be reinforced with several pool replicates in order to control for experiment reproducibility and data robustness.

ACK N OWLED G M ENTS
This research was financially supported a Strategic Partnership Grants for Projects from the Natural Sciences and Engineering Research Council of Canada to LB and RR. We thank two anonymous reviewers for their comments on a previous version of this manuscript. We also thank scientists from the Department of Fisheries and Oceans and Canadian fishers who helped collecting the samples. We are grateful to Alison Devault and the Arbor Biosciences team for DNA probes synthesis and methodological advices. We also thank the personal of the IBIS sequencing platform for their assistance in developing the Rapture assay for highly multiplexed configuration. Finally, we thank Michael Miller's laboratory team for conceptual recommendations and technical communications about Rapture.

CO N FLI C T O F I NTE R E S T
None declared. 2. The following datasets generated for this study and available from the Dryad Digital Repository: https ://doi.org/10.5061/ dryad.64f7982

AUTH O R CO NTR I B UTI
• GBS VCF file of no filtered SNPs dataset.
• Rapture VCF file of no filtered SNPs dataset.