Finding the adaptive needles in a population‐structured haystack: A case study in a New Zealand mollusc

Abstract Genetic adaptation to future environmental conditions is crucial to help species persist as the climate changes. Genome scans are powerful tools to understand adaptive landscapes, enabling us to correlate genetic diversity with environmental gradients while disentangling neutral from adaptive variation. However, low gene flow can lead to both local adaptation and highly structured populations, and is a major confounding factor for genome scans, resulting in an inflated number of candidate loci. Here, we compared candidate locus detection in a marine mollusc (Onithochiton neglectus), taking advantage of a natural geographical contrast in the levels of genetic structure between its populations. O. neglectus is endemic to New Zealand and distributed throughout an environmental gradient from the subtropical north to the subantarctic south. Due to a brooding developmental mode, populations tend to be locally isolated. However, adult hitchhiking on rafting kelp increases connectivity among southern populations. We applied two genome scans for outliers (Bayescan and PCAdapt) and two genotype–environment association (GEA) tests (BayeScEnv and RDA). To limit issues with false positives, we combined results using the geometric mean of q‐values and performed association tests with random environmental variables. This novel approach is a compromise between stringent and relaxed approaches widely used before, and allowed us to classify candidate loci as low confidence or high confidence. Genome scans for outliers detected a large number of significant outliers in strong and moderately structured populations. No high‐confidence GEA loci were detected in the context of strong population structure. However, 86 high‐confidence loci were associated predominantly with latitudinally varying abiotic factors in the less structured southern populations. This suggests that the degree of connectivity driven by kelp rafting over the southern scale may be insufficient to counteract local adaptation in this species. Our study supports the expectation that genome scans may be prone to errors in highly structured populations. Nonetheless, it also empirically demonstrates that careful statistical controls enable the identification of candidate loci that invite more detailed investigations. Ultimately, genome scans are valuable tools to help guide further research aiming to determine the potential of non‐model species to adapt to future environments.


| INTRODUC TI ON
In the context of climate change, adaptive variation is a critical constraint on a species' distribution, and understanding this variation will help predict a species' potential to respond to future environmental conditions (Razgoura et al., 2019). Currently, the increasing ease of obtaining large panels of genome-wide molecular markers has been fuelling research on the role of adaptive genetic variation and the identification of genes under selection, even in non-model species (Nielsen et al., 2009;Pardo-Diaz et al., 2015).
Genome scans are based on the assumption that natural selection leaves a specific signature in adaptive loci (and in genomic regions in linkage disequilibrium with them) that differs from the genome-wide variation caused by neutral effects such as genetic drift, gene flow and inbreeding (Beaumont & Balding, 2004;Foll & Gaggiotti, 2008;Nicholson et al., 2002). Such adaptive loci (or their linked markers) are expected to have a distribution of allele or genotype frequencies among populations that do not fit the neutral expectation (Ahrens et al., 2018). The distribution of allele frequencies among populations can also be used as a response variable in tests of association with environmental factors relevant to the ecology of the organism: loci with significant associations are interpreted as being under natural selection due to the focal (or correlated) environmental factors (Ahrens et al., 2018;Bassitta et al., 2021;Dalongeville et al., 2018;Hoban et al., 2016;Liggins et al., 2020;Rellstab et al., 2015;Yadav et al., 2019).
An environment that is highly variable, or structured by a spatial gradient, can subject natural populations to habitat-specific selective pressures (Kawecki & Ebert, 2004). Permanent variation in these selection pressures across populations may lead to local adaptation via divergent selection (Hedrick, 1986;Hedrick et al., 1976;Sanford & Kelly, 2011). Local adaptation should be more likely when gene flow is lacking among populations in these different habitats. However, recent evidence suggests that local adaptation is not necessarily impeded by gene flow, and the balance between migration and selection also depends on factors such as the strength of selection, and the genetic architecture of traits (Cornwell, 2020;Lenormand, 2002;Pespeni & Palumbi, 2013;Tigano & Friesen, 2016;Yeaman, 2015).
Natural populations with strong population structure are interesting models to understand local adaptation, as they are likely to have low gene flow, and perhaps stronger local adaptation.
However, it is known and expected that strong population structure may affect the results of genome scans by increasing the number of false positives Liggins et al., 2020;Meirmans, 2012). As a consequence, applying genome scans to strongly structured natural populations requires strategies to mitigate this issue.
Here, we exploited a naturally occurring contrast in the strength of population structure in a New Zealand endemic mollusc, Onithochiton neglectus, comparing contexts of strong and moderate population divergence, and used a new strategy to combine the results of different genome scan methods. O. neglectus is a brooding chiton, lacking a long-lasting pelagic larval phase and with low-motile adults (Creese, 1986). With a strikingly broad distribution for its low capacity to move, O. neglectus populations are present over a latitudinal gradient of environmental factors from the subtropical northern New Zealand to the subantarctic southern islands (O'Neill, 1985;Salloum et al., 2020). Previously, single-locus markers (COI, 16s and ITS-1) detected three O. neglectus genetic clades, namely the North, Central and South clades (Salloum et al., 2020). Within the North and Central clades, populations are highly differentiated (Salloum et al., 2020). In contrast, within the South clade, O. neglectus populations are less genetically structured (Nikula et al., 2012;Salloum et al., 2020). This is likely because connectivity among populations in the South clade is frequently enabled by rafting in the holdfasts of buoyant kelp (Durvillaea sp.), which is rarer in the north of O. neglectus' distribution (Bussolini & Waters, 2015;Nikula et al., 2012;Salloum et al., 2020;Waters et al., 2018).
In this study, we used genome-wide SNPs to identify candidate loci (significant results from genome scans) in this mollusc. We compared four spatial scales: a broad New Zealand-wide scale with low overall gene flow, including 16 O. neglectus populations from three genetic clades ('NZ-wide'); a more restricted southern New Zealand regional scale (a subset of nine populations), experiencing higher levels of gene flow due to kelp rafting ('southern'); a North Island scale, including only the four populations of the North clade ('NI'); and a South Island scale, comparable with the North Island in geographical range and including only the four South Island populations of the South clade ('SI') ( Figure 1a; Table 1). We hypothesize that the strong structure of O. neglectus populations across a variable environment 5. Our study supports the expectation that genome scans may be prone to errors in highly structured populations. Nonetheless, it also empirically demonstrates that careful statistical controls enable the identification of candidate loci that invite more detailed investigations. Ultimately, genome scans are valuable tools to help guide further research aiming to determine the potential of non-model species to adapt to future environments.

K E Y W O R D S
candidate loci, environmental adaptation, gene flow, genome scans for outliers, genotypeenvironment associations, outlier loci, population genetic structure, q-values geometric mean may lead to populations becoming adapted to local conditions, and expect less evidence of local adaptation among southern populations, which experience higher levels of gene flow and encompass only a subset of the entire environmental variation. Within this framework, we aimed to (a) compare the detection of candidate loci in low versus moderate gene flow scenarios and (b) determine F I G U R E 1 Geographical and genetic relationships among sampled populations of Onithochiton neglectus. (a) Geographical location of populations asnd clades. Populations included in the NI dataset are indicated with *, and in the SI dataset with +; (b) relative migration network (estimated using divMigrate). Each circle represents a population (colour-coded), with the intensity of arrows proportional to migration rate. Rates smaller than 0.05 are not plotted; (c) NZ-wide PCA for all populations over all loci; (d) NZ-wide PCA for remaining loci (after removing combined 'outlier' markers); (e) NZ-wide PCA for combined 'outlier' markers (combined q-value between PCAdapt and Bayescan smaller than 0.05). Population key: see Table 1 Long 170 whether population connectivity driven by kelp rafting counteracts local adaptation.
To address these aims, we first undertook demographic analyses to explore how variation in the different datasets is structured. Second, we compared geographical patterns of variation from 'outlier' sets of markers (derived from significant results of two methods of genome scans for outliers [GSO]: PCAdapt and Bayescan) with those from the remaining loci, to identify potentially different evolutionary histories for these sets of loci.
Third, we tested for genotype-environment association (GEA) using two methods (RDA and BayeScEnv) to identify loci correlated with variable environmental factors across the species range (GEA loci hereafter). While false positives can be problematic with genome scans, combining methods can achieve better signal-to-noise ratio, at the cost of lower power (de Villemereuil et al., 2014). Thus, we combined the results across the two GSO methods and across the two GEA approaches. To do so, we implemented a novel conservative approach that uses geometric means of q-values between the methods. This aims to balance a mid-way point between taking all significant results from any method (a union of results, with high power but high false-positive rate) and only shared significant results across methods (an intersection of results, with low false-positive rate but low power). Furthermore, to identify GEA loci with a higher probability of reflecting a true association (high-confidence GEA loci hereafter), after combining q-values across methods we also excluded loci that showed associations with randomly generated variables. Finally, we compared the DNA sequences of high-confidence GEA loci with a related chiton genome (Acanthopleura granulata) (Varney et al., 2021), to identify functional genomic regions.

| MATERIAL S AND ME THODS
Samples for this study were available from previous work (Nikula et al., 2012;Salloum et al., 2020)

| Sampling, DNA extraction and genotypingby-sequencing
A total of 188 samples of O. neglectus from populations distributed across New Zealand and its subantarctic islands were available from previous work (Nikula et al., 2012;Salloum et al., 2020). In all, 16 populations were included in the NZ-wide dataset (spanning 17 latitudinal degrees, and 167 individuals after filtering, Figure 1a, Table 1, Table   S1). A subset of 10 of these populations was included in the southern dataset, representing the previously identified South clade (Salloum et al., 2020). One population (Christchurch) was excluded from the southern dataset due to excess missing genotype data (see below; final southern dataset spanning six latitudinal degrees, and 88 individuals,  Table S1).
DNA extraction, initial checks of concentration and purity were undertaken following the protocol described in Salloum et al. (2020).
DNA concentration was normalized and samples were submitted  (Catchen et al., 2011;Catchen et al., 2013) for four datasets: New Zealand-wide across all populations (NZ-wide), within the North Island (NI) and South Island (SI), and within the South clade (southern). The Central clade was not called separately as it had only two populations, and the North clade is the same as the NI dataset. SNPs with minor allele frequency <0.05 were filtered out. To exclude loci in strong linkage disequilibrium (LD), one SNP per rad-tag was chosen at random, decreasing the possibility of having SNPs across loci in LD. Finally, individuals with more than 90% missing data (two individuals from the Christchurch population) and rad-tags with more than 30% missing data were removed (full details of software and parameter settings are provided in Supporting Information 1, Figures S1-S3).

| Demographic patterns
Population differentiation and genomic diversity (observed and expected heterozygosity, overall F ST , and Nei's F IS ) were estimated for all datasets, with the r package hierfstat v. 0.5-7 (Goudet & Jombart, 2020). The same package was used to estimate overall observed heterozygosity within each population in each dataset, and to check heterozygosity distribution across loci within populations.
Principal component analyses were done with the r package lea v. 2.6.0 (Frichot & François, 2015) for all datasets (Table S2). Allele frequencies were compared for all datasets with a custom R script. In addition, Weir and Cockerham's F ST was estimated for all loci with VCFtools (Danecek et al., 2011) for pairwise comparisons between clades in the NZ-wide dataset (North-South, North-Central and Central-South). The allele frequency spectrum was derived for each population NZ-wide, using vcf2sfs (Liu, 2020). In addition, δaδi (Gutenkunst et al., 2009) was used to plot 2D allele frequency spectra between North-Central, North-South and Central-South clades. To further analyse population structure in the NZ-wide dataset, a population assignment plot was done with the r package Adegenet v. 2.1.1 (Jombart, 2008;Jombart & Ahmed, 2011), as this enables the optimal number of ancestral populations and the admixture proportions of these different ancestries within each individual to be inferred (Jombart & Ahmed, 2011). Finally, directional relative migration was estimated for the NZ-wide dataset with the DivMigrate function in the DiveRsity r package (Keenan et al., 2013), using the G ST method (Sundqvist et al., 2016). The relative migration rates were then used in a Pearson's correlation test with a matrix of geographical distance between populations (shortest straight distance, in kilometres, between populations calculated using https:// www.dista nce.to, Table S3).

| Genome scans for outliers
To identify 'outlier' markers, two GSO were run on each dataset separately, the PCA-based method PCAdapt (Luu et al., 2017) and the Bayesian method Bayescan (Foll & Gaggiotti, 2008 was used as a test statistic, and p-values were corrected using the genomic inflation factor. The proportion of variance explained by the principal components was checked ( Figure S4) and the analysis was re-run with three principal components for the NZ-wide dataset, two principal components for the NI dataset, one for the SI dataset, and six for the southern dataset, using the same settings as above. After checking their distribution ( Figures  The results of both methods were combined by calculating geometric means of the q-values of PCAdapt and Bayescan for each SNP, where qval A and qval B correspond to the locus q-value resulting from the two different methods. The geometric mean can be seen as a form of arithmetic mean on the logarithmic scale and, compared to the absolute scale, is therefore more closely linked to the natural interpretation of the q-value.  (Table S2).

| Genotype-environment association
The PCA-based method RDA and the Bayesian method BayeScEnv were used to search for loci in association with environmental variables (GEA-loci). Environmental variables (described below) were mean-centred and scaled to a variance of one. RDA implemented in the r package vegan v. 2.5.6 (Oksanen et al., 2019) does not allow missing data; therefore, missing loci were imputed (imputing with the most common genotype within the clade). In addition, the environmental variables used as predictors in the RDA should not be correlated (Dormann et al., 2013); thus, only variables with a squared correlation coefficient smaller than or equal to 0.7 with each other were included, leaving six semi-independent variables in the NZwide and southern datasets, and three semi-independent variables in the NI and SI datasets (Tables S4 and S5 For BayeScEnv , correlation between environmental variables is not of concern because they are tested independently in multiple runs of a univariate method; thus, all environmental variables were included. In addition, no imputation is required for BayeScEnv analyses. Runs were performed with the default configurations (5,000 iterations, thinning interval = 10, 20 pilot runs of 5,000 length, burn-in = 50,000, upper bound for the Uniform prior of parameter g = 10, mean alpha prior = −1, prior probability for non-neutral models = 0.1, prior preference for the locus-specific model = 0.5) in the NZ-wide, NI, SI and southern datasets.
The environmental variables available to test in the GEA analyses were latitude, longitude and long-term means of monthly sea-level pressure (Kanamitsu et al., 2002), net shortwave, precipitation rate, surface pressure, total cloud cover, zonal wind velocity, meridional wind velocity, air temperature (Kalnay et al., 1996), ocean temperature, ocean salinity (Monterey & Levitus, 1997), salinity, sea surface height relative to geoid (Behringer & Xue, 2004) and sea surface temperature (Ishii et al., 2005). With the exception of latitude and lon- In addition, RDA and BayeScEnv were also run using a set of 100 randomly generated variables, normally distributed with a mean of zero and standard deviation of one. Our rationale behind these runs was to identify loci that generated spurious genotype-environment association, suggesting that significant results with the true environmental data for these loci are more likely to be spurious association. Both methods were run using the same parameters as for the real environmental variables.
The results of the two GEA methods were combined using the same method as for GSO: calculating geometric means of the qvalues of RDA and BayeScEnv for each locus and environmental variable pair, in the real and random datasets. Loci identified as significantly associated with real variables (combined q-value <0.05) were then examined for association with the randomly generated variables. Those loci that also significantly associated with fewer than five random variables (i.e. 5% of the random runs) were treated as high-confidence associations. Since BayeScEnv runs included the environmental variables correlated with latitude (r 2 > 0.7), q-values of all latitude-correlated variables were combined using the geometric mean, and significant associations to latitude in the 'broad sense' are reported. High-confidence GEA loci were checked for observed heterozygosity and missing data across populations.
Sequences of high-confidence GEA loci were submitted in a BLAST search against a custom database containing the Acanthopleura granulata genome and transcriptome (Varney et al., 2021), using the blastn algorithm with default configurations (Altschul et al., 1990).

| SNP datasets
The SNP calling pipeline resulted in a NZ-wide dataset consisting of all 16 populations and 10,987 loci, with population-average 0.2% missing data ( Figure S3A, Table 1, Table S2). SNPs were called with a population map grouping the three clades because it resulted in a more complete dataset with less missing data than grouping by population only (average 11% missing data; Figure S3B). The NI dataset consists of 12,012 loci with 0.0% missing data ( Figure   S3C, Table 1, Table S2). The SI dataset consists of 7,476 loci with population-average 0.1% missing data ( Figure S3D, Table 1, Table   S2). The southern dataset consists of 9 populations and 13,004 loci, with population-average 8.7% missing data ( Figure S3E, Table 1, Table S2). One population (Christchurch; CR) was removed from the SI and southern datasets due to excess missing data; note that it had less missing data in the NZ-wide dataset and hence is included there.

| Demographic patterns
All population differentiation and genomic diversity analyses support the subdivision of O. neglectus into the three genetic clades previously identified by single-locus mtDNA and nDNA analyses (Salloum et al., 2020): North, Central and South. Sampled populations within the South clade (in SI and southern datasets) show much less genetic differentiation than that shown among populations within the other clades ( Table 2). The same pattern is recovered by all analyses of population structure (gene flow: Figure 1b; PCA: Figure 1c, Figures S17-S23; assignment plots: Figure S24), and allele frequencies NZ-wide show loci differentially fixed between clades ( Figure S17).
In addition, the SNP datasets subsampled from the NZ-wide dataset (NI, SI and southern subsets) show very similar patterns to those datasets with SNPs called separately (NI, SI and southern specific), although the latter all have a larger number of loci (Figures S18-S23,  Figure   S25). The allele frequency spectrum plots reveal a broadly different pattern for all populations in the South clade as compared to populations from other clades, as southern populations have a higher proportion of low-frequency alleles ( Figure S26). Note that the population allele frequency spectra are based on counts from the NZ-wide datasets, and the larger number of South populations means the south alleles are the majority, and thus taken as reference. The 2D allele frequency spectra between clades show no correlation pattern, with most of the highcount alleles in one clade missing or infrequent in the other, indicating no evidence of introgression ( Figure S27). The distribution of observed heterozygosity per population NZ-wide shows most loci with low heterozygosity (below 0.2, Figure S28, Table S6). Finally, relative migration estimates show much higher connectivity among populations of the South clade than among populations of other clades, with no support for migration between clades ( Figure 1b, Table S7). Relative migration rates and geographical distances between populations are negatively correlated (R = −0.41, p < 0.001, Figure S29, Tables S3 and S7).

| Genome scans for outliers
PCAdapt identified 27% of the 10,987 NZ-wide loci as 'outlier' markers, 3.8% of the 12,012 loci in the NI-specific dataset, 4.1% of the 7,476 loci in the SI-specific dataset and 9% of the 13,004 loci in the southern-specific dataset (Figure 2). In comparison, Bayescan identified 21% of the NZ-wide loci as 'outlier' markers, 0.2% of the loci in the NI dataset, 0.3% of the loci in the SI dataset and 1% in the southern dataset ( Figure 2). There was a much smaller number of 'outlier' markers found to be in common (intersection) between the two genome scan methods (Figure 2). Combining the two methods  (Table S6).

| Genotype-environment association
Many of the environmental variables tested are strongly correlated with latitude at all scales (NZ-wide, NI, SI and southern, Figure S32). The RDA analyses identified no genotype-environment associations in the NZ-wide, NI and SI datasets, but 708 associations in the southern dataset (5% of the loci). In comparison, BayeScEnv identified 3,093 associations from the New Zealand-wide dataset (28% of the loci), 15 from the NI dataset (0.13% of the loci), 10 from the SI dataset (0.13% of the loci) and 135 from the southern dataset (1% of the loci) ( Table 3, Tables S4 and S5). After combining qvalues between RDA and BayeScEnv, and removing loci associated with random variables, no high-confidence GEA loci remained at the NZ-wide, NI and SI scales, but 86 GEA loci remained at the southern scale, corresponding to 0.6% of the loci ( Table 3). Of these, 81 are associated with latitude in the broad sense, three with longitude, one with precipitation rate, and one with both longitude and sea surface height relative to geoid (Table S4). Within the 86 highconfidence GEA loci, one had observed heterozygosity above 0.5, 33 were found to be completely missing from one population, and 41 from two or three populations, and were not included in the BLAST searches.
BLAST analyses of the sequences flanking the high-confidence SNPs against A. granulata scaffolds returned two matches, one of which also matched the A. granulata transcriptome (Table S8) (Ahrens et al., 2018;de Villemereuil et al., 2014;Excoffier et al., 2009;Forester et al., 2016;Hoban et al., 2016;Liggins et al., 2020;Meirmans, 2012;Storfer et al., 2018), although other factors known to influence genome scans cannot be ruled out (see below). For three of the four methods employed, the total F I G U R E 2 Proportion of 'outlier' markers found with each method in the NZ-wide, NI-, SI-and southern-specific datasets. Combination represents the number of 'outlier' markers after combining the q-values of PCAdapt and Bayescan using the geometric mean; intersection represents the number of 'outlier' markers found in common between PCAdapt and Bayescan proportion of candidate loci detected NZ-wide is much larger than within the other scales. Considering that 2%-10% of the loci in a dataset are usually found to be significant in this type of analyses (Bierne et al., 2011), our numbers of NZ-wide candidate loci are extremely high and are likely to include many false positives.
However, our GEA analyses revealed 86 high-confidence loci (with an increased probability of being true positives) within the southern region. Thus, despite greater gene flow in this region, there are candidate loci suggesting potential for local adaptation.
Apart from population structure, there are other factors that can lead to an increase in false-positive rates of genome scans (GSO and GEA). Not all loci in the genome will follow the same evolutionary path, which results in some loci that are more divergent among populations than others (Roux et al., 2016). The higher divergence in such loci is not necessarily caused by non-neutral factors (Roux et al., 2016). For example, variation in mutation rates can produce genomic regions of low variability and hence mimic footprints of selection (Thornton & Jensen, 2007). Recent demographic events such as bottlenecks can lead to unusual genealogies for different loci so that they coalesce at different times in the past (Hermisson, 2009;Thornton & Jensen, 2007). Recent research highlights the impact of differences in recombination rate across the genome, which tends to be lower near centromeres (Booker et al., 2020;Stapley et al., 2017;Stevison & McGaugh, 2020). Genomic regions of low recombination rate exhibit higher F ST metrics, which leads to an increase in the rate of false positives (Booker et al., 2020). Introgressive hybridization or a recent history of shared ancestry between populations has also been linked to false-positive results (Excoffier et al., 2009;Pfeifer et al., 2020). In addition, linkage disequilibrium (LD) is of relevance (Price et al., 2008) but is challenging to detect without a genome assembly. For GEA methods in particular, the results also can be impacted by the specific environmental variables chosen, their resolution and the challenges to detect smaller-scale heterogeneity (Rellstab et al., 2015). All of these phenomena may produce significant 'outlier' loci in GSO or significant associations in GEA, but would not necessarily be 'adaptive' loci, highlighting the impor- do not support the existence of introgression among clades ( Figure   S27), but bottlenecks might be present ( Figure S30), further increasing the likelihood of outliers not due to selection. There is insufficient genomic information for this species to assess the impact of genome-wide variation in mutation and recombination. Ultimately, more genomic resources are necessary to completely disentangle natural selection from the confounding factors mentioned above.
O. neglectus is a valuable system for exploring the influence of gene flow and environmental variability on local adaptation due to several factors: the broad distribution of the species, spanning a wide latitudinal gradient of environmental factors (O'Neill, 1985); the species' brooding development (Creese, 1986); and the differential level of gene flow mediated by the presence or absence of kelp rafting (Salloum et al., 2020). The first challenge of studying a system with such a strong background population structure is dealing with missing data when calling variants (Graham et al., 2020). For this reason, it was important to test different parameter combinations and population maps when identifying SNPs, particularly in the NZ-wide scale. The resulting dataset for this broader scale might be more representative of conserved regions of the genome, as we aimed to reduce missing data among clades with the applied filters. By also analysing the North Island, South Island and southern populations separately, determining SNPs within each group (as opposed to simply splitting the original SNP dataset), we were able to reduce missing data, and identified a larger number of loci that are variable within clades, and not only between them (Figures S17-S23, Table S2). These within-clade datasets are presumably more representative of the full diversity of variable loci within each clade and hence were used at these scales in our genome scans.

| Genome scans for outliers
PCAdapt detected a larger number of 'outlier' markers than Bayescan in all spatial scales, although both methods detected an unusually high number of loci NZ-wide. Combining these methods

| Genotype-environment association
Genotype-environment association methods applied to the NZ-wide context either identified a large proportion of GEA loci (BayeScEnv method), or no associations at all (RDA method) (

| Controlling false positives
There has previously been much consideration of the possible ways to control false-positive rates in genome scans (Lotterhos et al., 2017). Our novel strategy proposes to first use geometric means to combine q-values across methods and then, for the GEA methods, to remove loci that are also associated with 5% or more of the random variables. This approach was useful to find candidate loci that are less likely to be false positives and have spurious associations, although we acknowledge that true-positive loci are likely to have been excluded in the process. Previous work has shown that considering the intersection of genome scan methods tends to focus the result on a true signal (de Villemereuil et al., 2014, Figure S12).
Here, the intersection of all four genome scan methods results in 58 candidate loci for the southern scale.

| Implications of findings for O. neglectus
Our analyses of population structure and relative migration support the subdivision of O. neglectus into three, strongly isolated clades.
There is higher connectivity within the South clade due to gene flow driven by kelp rafting, and minimal connectivity between clades.
Populations in the South clade have a greater proportion of lowfrequency alleles compared to populations in the North and Central clades. This may be consistent with a bottleneck in the North and Central clades, as previously indicated by a mitochondrial DNA marker (Salloum et al., 2020). Here, the allele frequency spectrabased stairway plots support a recent bottleneck, although we note that allele frequency spectra patterns can be driven by a variety of factors, including demographic changes, population structure and selection (Gattepaille et al., 2013;Keinan & Clark, 2012;Ronen et al., 2013).
In agreement with our initial hypothesis, we saw evidence for local adaptation in O. neglectus. Initially, our GSO and GEA analyses appeared to identify a very large proportion of candidate loci potentially linked to local adaptation in the NZ-wide and in the southern scale. However, this perspective was modified with more careful and conservative consideration of these candidate loci, by combining q-values and removing all but the high-confidence GEA results using a novel approach (Table 3). By focusing attention on these high-confidence GEA loci, it became apparent that these could not be strongly supported at the NZ-wide scale. However, and in contrast to our initial expectation that higher rates of migration might lead to lower evidence of local adaptation, there remained considerable evidence for potential local adaptation at the southern scale. An examination of the allele frequencies of these loci ( Figure S33) may help to explain this phenomenon, as it is apparent that most GEA loci at the broad scale derive from changes in fixation of alternate alleles, and they do not systematically vary in frequency across the environmental gradient from north to south.
At the point of fixation, it is impossible to distinguish if drift or local adaptation were responsible for fixing such loci. In contrast, at the southern scale, most associations reflect gradual changes in allele frequency, which provide a more favourable context for association with the environment. Ultimately, the strongest evidence from this study reveals that higher gene flow among southern populations (driven by kelp rafting) does not appear to have removed the opportunity for local adaptation, and has in fact made it easier to detect candidate loci potentially under selection.
Most of the environmental variables tested are correlated with latitude, or to factors in the marine environment that vary latitudinally. Interestingly, migration rate and geographical distance did not show strong correlation, not supporting isolation-by-distance and providing stronger evidence for the influence of the environment in driving some of the differentiation observed in these loci ( Figure   S29). There was a match of one O. neglectus high-confidence GEA locus to the A. granulata transcriptome, but unfortunately it was to a non-annotated region. A. granulata belongs to the same family as O. neglectus, but is not closely related, thus loci that match its transcriptome are likely to be conserved genes. As for many non-model species, O. neglectus currently has insufficient genomic resources to enable identification of the function of candidate loci. As more genomes are annotated and genomic resources become increasingly accessible, it might soon be possible to recover the exact function of candidate loci even in non-model species. Such resources are paramount for a more thorough understanding of the genetic basis of local adaptation in natural populations (Hoban et al., 2016), helping characterize species' responses to environmental variability.
Ultimately, this is required for efficient management of biodiversity in the upheaval of our changing climate.

| CON CLUS IONS
The use of genomic scans for outliers and genotype-environment association methods has been providing great insight into understanding the distribution of potentially adaptive variation in natural populations, which is crucial for appropriately managing biodiversity in a changing climate (Flanagan et al., 2018). These methods appear to generally perform well when assessing simple population scenarios, but much development is still needed to attain the same standards for less 'ideal', non-model populations (Benestan et al., 2016;Booker et al., 2020;Forester et al., 2016). In O. neglectus, a novel method was used to increase the probability of identifying a reliable set of candidate loci for selection, but also indicates that it is still a challenge to correctly identify true positives in this confounding scenario of population structure. However, our methods have provided powerful insight within the southern region, showing that the potential for local adaptation has not been eliminated by gene flow.
Furthermore, among the high-confidence GEA loci found, we identified a functional genomic region that could have an adaptive role in the evolution of these populations. More empirical assessments and comparisons of 'challenging' populations can help with outlining the variation expected under real scenarios, prompting further development to better accommodate such diversity of the natural world.
In addition to the ongoing growth in the availability of genomic resources, advances in methodological approaches will enable more comprehensive understanding of local adaptation and its underlying causes in the wild, leading to a better understanding of the complex responses of organisms to changes in their environment.

CO N FLI C T O F I NTE R E S T
The authors have no conflict of interest to declare.

AUTH O R S ' CO NTR I B UTI O N S
All authors helped plan the research and contributed to writing the manuscript; P.d.V. also helped with statistical analysis and designed the novel statistical approach used in this work; P.M.S. undertook sampling, laboratory work, statistical analyses and writing. All authors gave final approval for publication.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data and scripts used for the analyses are available from the figshare https://doi.org/10.17608/ k6.auckl and.c.59061 35.v1 .