Indications of strong adaptive population genetic structure in albacore tuna (Thunnus alalunga) in the southwest and central Pacific Ocean

Abstract Albacore tuna (Thunnus alalunga) has a distinctly complex life history in which juveniles and adults separate geographically but at times inhabit the same spaces sequentially. The species also migrates long distances and presumably experiences varied regimes of physical stress over a lifetime. There are, therefore, many opportunities for population structure to arise based on stochastic differences or environmental factors that promote local adaptation. However, with the extent of mobility consistently demonstrated by tagged individuals, there is also a strong argument for panmixia within an ocean basin. It is important to confirm such assumptions from a population genetics standpoint for this species in particular because albacore is one of the principal market tuna species that sustains massive global fisheries and yet is also a slow‐growing temperate tuna. Consequently, we used 1,837 neutral SNP loci and 89 loci under potential selection to analyze population genetic structure among five sample groups collected from the western and central South Pacific. We found no evidence to challenge panmixia at neutral loci, but strong indications of structuring at adaptive loci. One population sample, from French Polynesia in 2004, was particularly differentiated. Unfortunately, the current study cannot infer whether the divergence is geographic or temporal, or possibly caused by sample distribution. We encourage future studies to include potentially adaptive loci and to continue fine‐scale observations within an ocean basin, and not to assume genome‐wide panmixia.

There is no evidence of current overfishing of albacore tuna in any ocean basin (ISSF, 2017), but there is also no comprehensive study that assesses all phases of the species' complex life history.
Fish in the Indian and South Pacific Ocean spawn in the tropics during the spring and summer months and make their first migration to high latitudes largely undetected, appearing around 40°S roughly a year later (Farley, Williams, Hoyle, Davies, & Nicol, 2013). Juveniles persist at high latitudes until they reach sexual maturity and in the meantime follow smaller summer migrations to feeding grounds at latitudes of 15-25°S (Chen, Lee, & Tzeng, 2005). Once they become sexually mature, fish move closer to the equator, now wintering in the same latitudes that become juvenile feeding grounds in the summer, and migrating in the spring into tropical waters <15°S in order to spawn (Chen et al., 2005). Apart from the failure to mark yearly recruits during their first migration to 30°S, most elements of the division between life phases have been well recorded through catch per unit effort and tagging data.
However, other aspects of the albacore life cycle are not completely understood (Hoyle & Davies, 2009;Nikolic et al., 2013) or described for all stocks. There is a much less well-documented longitudinal element to albacore migrations, with young fish wintering offshore and migrating east to feed in both the North Pacific (Childers, Snyder, & Kohin, 2011) and North Atlantic (Arrizabalaga, López-Rodas, Ortiz de Zárate, Costas, & Gonzaléz-Garcés, 2002), and mature fish going from the same longitudes west to spawn (Arrizabalaga et al., 2002;Chen et al., 2005). Another juvenile feeding ground off New Zealand in the southwest Pacific is well recognized, but the corresponding spawning ground for the hemisphere is not confidently identified. Inferences by various studies place the spawning ground anywhere from the west Pacific to as far east as French Polynesia (Farley et al., 2014;Hoyle, Hampton, & Davies, 2012). The actual route taken by a single fish can also vary between years, both in how far offshore or toward the continental shelf they travel and the latitude to which they return (Childers et al., 2011;Duncan et al., 2018). Likewise, the longitudinal boundaries of these wintering grounds are not mutually agreed upon; tagging studies in the northeastern Pacific describe endpoints of migrations extending to between 130 and 180°W (Childers et al., 2011).
The story is even less complete from a population genetics perspective. Many studies have been successful in recognizing the differentiation of the Pacific, Indian, Atlantic, and Mediterranean populations, but putatively neutral loci rarely demonstrate differentiation within an ocean basin (Chow & Ushiama, 1995;Davies, Gosling, Was, Brophy, & Tysklind, 2011;Laconcha et al., 2015). This contradicts tagging studies (Arrizabalaga et al., 2002) and current management boundaries , which suggest that there is no significant interaction between populations from different hemispheres within the same body of water. Loci flagged as possibly being under divergent selection (loci under potential selection, LUPS) are consistently more successful than neutral loci at demonstrating structure (e.g., Grewe et al., 2015) but are rarely employed and, like neutral loci, may be limited by study design. Specifically, studies' sample collection protocols rarely acknowledge the geographic separation of age groups or the different placement of age groups between seasons. Therefore, many forms of cryptic structure could exist that are not currently considered in the literature.  (Frankham et al., 2011). Studies of species without an annotated reference genome traditionally use potentially selective loci in place of adaptive loci. LUPS are identified based on deviation from normal patterns of allele frequencies across multiple populations (Refoyo-Martínez et al., 2019). More recently, Random Forest machine learning algorithm has been used to select loci that potentially interact and produce significant population discrimination despite unremarkable allele frequencies per individual locus (Brieuc, Waters, Drinan, & Naish, 2018;Jacobs et al., 2018). In both cases, panels of loci were specifically selected to produce differentiation patterns during population structure analyses that are otherwise overwhelmed by the signal of characteristically neutral loci.
However, the adaptive status of loci selected in either fashion is still unconfirmed, and the biological relevance of downstream trends is likewise putative. The confidence of LUPS identification is also impacted by the number of sample groups observed, the true number of demes under observation, extent of neutral differentiation between demes, and migration rate (Flanagan & Jones, 2017). Basing management decisions on such analyses in isolation would be unadvisable. Alternatively, when true adaptive loci are unavailable, patterns described using LUPS can suggest population dynamics that go unrecognized by neutral loci, such as sensitivity to external stressors that management needs to account for or control (Vitalis, Dawson, & Boursot, 2001).
In response to these various limitations in the literature, we sampled 188 individuals from across the western and central Pacific F I G U R E 1 Albacore tuna, Thunnus alalunga. Illustration: Les Hata, © Pacific Community Ocean (WCPO) in units that control for temporal and spatial variation, considering both neutral loci and LUPS. Our comparisons, while still not comprehensive, directly assess spatial and temporal population structure at neutral and adaptive loci in South Pacific albacore.

| ME THODS
Fish specimens were selected for analysis from tissue samples archived in a tissue bank collection managed by the Pacific Community Tonga also in 2010 (NZ10 and TO10, respectively), and the aforementioned group PF04 from French Polynesia in 2004. NC10 and NC14 were chosen to provide a temporal comparison that controls for both location and season. Similarly, NC10, NZ10, and TO10 consist of specimens collected within the same two-month period, in order to produce a concisely controlled spatial comparison. All samples were accompanied by metadata including catch location, date, and catch event number, and fish size (Table S1). A total of 188 individuals from five geographic locations were analyzed, representing four countries and spanning 10 years ( Figure 2). DNA extraction and sequencing were conducted by Diversity Arrays Technology (DArT PL). Its patented next-generation sequencing protocol, DArTseq, is a cost-effective option for generating high-quality, high-throughput SNP datasets for nonmodel species.
Although some steps are proprietary knowledge, a description of the DArTseq protocol is available in Kilian et al., (2012), Sansaloni et al. (2011), and Ren et al. (2015. Following automated DNA extraction, samples were digested using PstI and SphI restriction enzymes. Methylation-sensitive enzymes were chosen to avoid highly repetitive, methylated genomic regions that are minimally informative and tend to carry elevated risk of misinterpreting paralogs as a single locus during marker calling. Specialized adaptors were ligated to digested DNA. Both PstI and SphI adapters included a PCR primer sequence and Illumina flowcell attachment sequence, and the PstI adaptor also included a unique, varying length barcode sequence for sample recognition within pooled libraries. PCR only amplified fragments capped with both adaptors, using the following protocol: 1-min denaturation at 94°C, 30 cycles of 20 s at 94°C, 30 s and 58°C and 45 s at 72°C, and a final extension step of 7 min at 72°C. Libraries were then further amplified using bridge PCR on the Illumina HiSeq 2500 platform and sequenced on the same platform. The resulting data were submitted to an in-house software, DArTsoft, which interprets sequences from images of fluorescence taken during Illumina sequencing and produces FASTQ files. Files were quality controlled for sequences with 90% confidence at 50% of bases and split by barcode into individual specimens. Sequences were aligned de novo.
A separate algorithm, DArTsoft14, called SNPs and further qualityfiltered for singletons and other suspected sequencing errors. The final output produced by DArT was a genotype report of all identified SNPs, their global call rate, polymorphic information content, and their codominant status in each sequenced specimen.
The returned dataset of 27,295 SNPs was further filtered for locus quality. Loci were first culled by removing all but one SNP per sequenced DNA fragment. Remaining loci were selected based on a 99% call rate, a minimum read depth of 7×, and 5% minor allele frequency. F ST outlier analyses were conducted with LOSITAN v. 2.1 using the prior odds for neutral model and a 10% false discovery rate. Individuals were submitted to LOSITAN in their five original sample groups. Next, loci were extracted that showed deviation from Hardy-Weinberg Equilibrium (HWE) across all populations.
F I G U R E 2 Map of sample sight locations. Size of circle represents the number of fish caught inside blue line of each country's EEZ boundaries. Inset: albacore range in the Pacific, from http:// www.fao.org/figis/ geose rver/facts heets/ speci es.html HWE tests whether loci occur at frequencies that deviate from selectively neutral assumptions and were analyzed using Arlequin v.
3.5.2.1 (Excoffier & Lischer, 2010). The large number of loci used in this study complicates HWE testing significance because a standard p-value of .05 would prompt the unnecessary discard of 100 or more informative loci through Type I error if the threshold for significance was not adjusted accordingly (Waples & Allendorf, 2015).
Because Alrequin does not calculate a p-value small enough to reflect accurate correction for multiple testing, we used the most sensitive available p-value threshold of .0001. HWE results were also filtered for loci with a maximum observed heterozygosity of 0.5, as an independent control for the potential merging of paralogous loci in the DArTseq pipeline. Finally, pairwise linkage disequilibrium was assessed using PLINK2 (Chang et al., 2015) and a threshold of 70% linkage between loci. All filtering steps were first conducted on all five population samples and then repeated using just NC10 and NC14 to provide a maximally informative dataset for analyses that controlled for spatial distribution of samples when exploring temporal variation in population structure. Raw datasets are publicly avail- The k with the lowest associated CV was selected. ADMIXTURE outputs from the recommended k value were then visualized using the "barplot ()" command in R v. 3.3.1 and cross-checked using the R package stockr and the command "stockSTRUCTURE." Another visual assessment of each dataset was produced using a discriminant analysis of principal components (DAPC) in the R package adegenet after alpha optimization (via R commands "dapc ()" and "a.score.optim ()"). The adegenet package was also used to independently recommend the number of genetic clusters based on the Bayesian information criterion (BIC), although this always concurred with interpretations of ADMIXTURE.
Genodive v. 2.0b27 (Meirmans & van Tienderen, 2004) was used for an independent recommendation of population assignment using the original five sample groups and an analysis of molecular variance (AMOVA).
The neutral dataset was also used to calculate adjusted expected and observed heterozygosities (H n.b. and H o ), and the inbreeding coefficient F IS (at 1,000 permutations), using GENETIX v 4.05 (Belkir, Borsa, Chikhi, Raufaste, & Bonhomme, 2004). These values are only informative under assumptions that include neutrality and therefore were not calculated for the LUPS dataset. Basic statistical comparisons were made using the chi-square test for homogeneity in R, using the core command "chisq.test."

| RE SULTS
Sequencing of 188 individuals from five population samples using DArTseq identified 27,295 loci after Diversity Array Technology's inhouse quality filtering protocols using the DArTtoolbox. Forty-one individuals were discarded from reporting due to their inability to produce adequate quality sequencing data, including 22 from TO10, six from NC10, five from NC14 and PG04, and three from NZ10.
Secondary quality filtering of the remaining specimens from all five population samples produced a neutral dataset of 1,837 loci and a LUPS dataset of 89 loci. When the two New Caledonia samples, NC10 and NC14, were filtered independent of other samples, 1,925 neutral and 66 purportedly selected loci were identified (Table 1).
Neutral global analyses demonstrated underwhelming evidence of population structure. As the most fundamental measures of population structure, diversity assessments were unremarkable.
No chi-square comparisons of the expected and observed values per population samples were significantly different (χ 2 = .0004, p = 1).
Comparative assessments using neutral loci were likewise rarely significant. Of 10 possible pairwise F ST values, only two are TA B L E 1 Number of loci remaining after each quality filtering step for datasets using neutral loci (NL) and loci under potential selection (LUPS)  Figure 4). An alpha-optimized DAPC using eight principal components and 4 df produced similar results, with limited overlap of samples from PF04 with those from NC10, NC14, and NZ10. TO10 showed a similar degree of separation as PF04 (Figure 3). Improved distinction between sample groups also reflected in Genodive population assignments, where the global average rose to 70%, ranging from 56% (NZ10) to 76% (PF04). Likewise, an AMOVA of LUPS identified more variation between groups than the same analysis using NL data, with 90% of variation within individuals, 7% allocated between population samples (p = .001), and another 3% among individuals within populations (p = .003).
Using the New Caledonia-specific dataset, analysis of neutral loci again provided evidence of temporal stability in the allele frequencies. Pairwise F ST values were not significant and uninformative, at .0003 with p-value .76. ADMIXTURE recommended k = 1 and a DAPC using 23 principal components and one degree of freedom showed major overlap between sample groups ( Figure 5). An AMOVA produced similar results to the global neutral dataset, with 97.7% of variation found within individuals (p = .001), and population assignment was only correct 41% of the time. The evidence produced by LUPS is more complex. The pairwise F ST value was comparable to those produced by the global dataset (.08, p < .000), as were an AMOVA (90% variability within the individual and 8% between populations, p = .001). Population assignment was even more accurate than that of global analyses, with all but one individual correctly assigned (98% accuracy). However, although a DAPC visibly separated the two groups using one principal component (Figure 5), the recommended number of genetic clusters from ADMIXTURE and adegenet both dropped to 1.

| D ISCUSS I ON
Based on the current results, we infer that the central and western South Pacific supports a single, genetically healthy population of albacore tuna that is notably substructured in some genetic regions, potentially due to environmental factors that can produce local adaptation. Observations of neutral genetic diversity and robustness all follow precedent set by other studies of albacore and various tuna species and do not indicate any need for conservation concerns. Assessments using LUPS demonstrate genetic differentiation between population samples, although the overlapping impact of time, distance, and cohort sampling strategy cannot be definitively isolated using the current sample distribution. Although there is no simple rule about sample size, incorporating more samples invariable reduces the risk of sampling error (Nei, 1978), lending much greater confidence to the relevance of shallow population structure demonstrated in Knusten et al. (2011) than the current study. Furthermore, effective population size (N e ) is also much larger in cod than those characteristically found in mobile pelagic species (Knutsen et al., 2011), suggesting that coastal Atlantic cod do not follow the same sweepstakes reproduction model that is believed to produce extremely small N e values in pelagic species such as tuna (Waples, 2016 Our observations of neutral panmixia within the WCPO follow a long tradition of similar conclusions. Differentiation between ocean basins is well established; very few studies fail to find differentiation between water bodies, and those that do are often poorly sampled (Graves & Dizon, 1989), or use sample sites that are technically in different bodies of water, but in fact are very close to the border (Pujolar, Roldan, & Pla, 2003). However, many studies that successfully describe separation of albacore from separate oceans fail to sense differentiation between Pacific samples. These include Davies et al. (2011), which sampled two locations in the southwest Pacific; Albaina et al. (2013), which used three samples from the southwest, southeast, and North Pacific; Laconcha et al. (2015), which similarly sampled the southwest, south-central, and northeast Pacific; and Chow and Ushiama (1995)  acknowledged presence at the same latitudes in the West Pacific Warm Pool . Albacore are known to remain within a body of water while migrating, rather than cross thermoclines (FAO, 1985), which in most oceans would deter movement through equatorial waters. The unique oceanography of the western Pacific could create a corridor of habitability across the equator, providing at least one mechanism to homogenize the two hemispheres.

| Evidence of population genetic structure at potentially adaptive loci
The current study is one of the relatively few that identifies and conducts analyses on potentially adaptive loci in albacore. As with comparable studies, our assessments incorporating LUPS detected much greater potential for structure than those using neutral loci; Samoa and the Cook Islands (Harley, Peter, Nicol, Hampton, & Brouwer, 2015;Williams et al., 2012). Similarly, albacore in the eastern South Pacific have larger gonads relative to their somatic weight than specimens in the western Pacific (Farley et al., 2013) and reach first maturity at a slightly smaller size (Farley et al., 2014). Growth and reproductivity are often influenced by factors including oxygen concentration, temperature, and prey availability (Murua et al., 2017); however, differences in absolute size reported by Williams and Terawasi (2013) did not correlate with gradients in any abiotic factor. It is possible that there is a genetic element convoluting the relationship. All of these observations also concur with our analyses that distinguish PF04, and to a lesser degree TO10, from western Pacific samples.

| Difficulties in describing a convoluted life history
The overarching picture of neutral panmixia and strong local adapta-  (Lu et al., 1998). Any environmental event that significantly reduces the size of a cohort will also carry implications for the population genetic health and diversity of cohorts produced during that time through the increased influence of genetic drift and will change the selective pressure on various alleles and genomic regions. But, again, no genomic study has been designed to isolate the impact of ENSO events on albacore population genetic structure.
It is evident that Pacific albacore life history includes numerous

| CON CLUS ION
In sum, we have demonstrated panmixia in albacore tuna in the WCPO using 1,837 neutral SNP loci and indications of population genetic structure at potentially adaptive regions of the genome using 89 LUPS. Well-controlled specimen selection in four of five population samples to establish that, using loci that do not comply with neutral allele distribution, differentiation occurs between geographically distant locations sampled at the same time, and within one location when sampled in the same season but different years.
Although one group, TO10, is undersampled and potentially poorly represented, the pattern of differentiation persists with or without the suspect data.
The differentiation between any of these samples and the fifth group, PF04, is a magnitude higher than the already notable separation among the original four. It is unclear whether the dramatic genetic differentiation stems more from the geographic distance between sample groups, the larger number of intervening years, or the distinctly more dispersed specimen selection, which includes individuals caught in both the tropics and subtropics, and over a span of 10 months. Whatever combination of these factors proves to be relevant, the fundamental deviation from panmixia deserves further exploration.
Without understanding the environmental basis and interaction that prompts adaptive differentiation within the WCPO, it is not yet appropriate to recommend changes to current management practices. However, further exploration of the present observations must be prioritized in order to identify the driving factors, establish the relevance to long-standing observations of morphological differentiation, and potentially update management assumptions to reflect this new reality.

ACK N OWLED G M ENTS
We gratefully acknowledge all the observers, research staff and port samplers who collected tissue bank samples, and Simon Nicol and

CO N FLI C T O F I NTE R E S T
The authors identify no competing interests. Ciro Rico https://orcid.org/0000-0002-0822-336X