Reconstructing the colonization history of Indo‐Pacific bottlenose dolphins (Tursiops aduncus) in Northwestern Australia

Bottlenose dolphins (Tursiops spp.) are found in waters around Australia, with T. truncatus typically occupying deeper, more oceanic habitat, while T. aduncus occur in shallower, coastal waters. Little is known about the colonization history of T. aduncus along the Western Australian coastline; however, it has been hypothesized that extant populations are the result of an expansion along the coastline originating from a source in the north of Australia. To investigate the history of coastal T. aduncus populations in the area, we generated a genomic SNP dataset using a double‐digest restriction‐site‐associated DNA (ddRAD) sequencing approach. The resulting dataset consisted of 103,201 biallelic SNPs for 112 individuals which were sampled from eleven coastal and two offshore sites between Shark Bay and Cygnet Bay, Western Australia. Our population genomic analyses showed a pattern consistent with the proposed source in the north with significant isolation by distance along the coastline, as well as a reduction in genomic diversity measures along the coastline with Shark Bay showing the most pronounced reduction. Our demographic analysis indicated that the expansion of T. aduncus along the coastline began around the last glacial maximum and progressed southwards with the Shark Bay population being founded only 13 kya. Our results are in line with coastal colonization histories inferred for Tursiops globally, highlighting the ability of delphinids to rapidly colonize novel coastal niches as habitat is released during glacial cycle‐related global sea level and temperature changes.


| INTRODUC TI ON
Studying the mechanisms which shaped contemporary genetic variation is a fundamental aspect of evolutionary biology. Patterns observed in contemporary genetic variation are the result of selection, demographic history and stochastic processes such as drift, or a combination thereof, thereby producing characteristic signatures in the genome which encode information about the nature and magnitude of past processes (Excoffier et al., 2009(Excoffier et al., , 2013. Comparison with expected patterns derived from theory or simulation thus allows researchers to make inferences about the level of genetic structuring (Kaeuffer et al., 2007), detect selection (Nielsen et al., 2005;Waples & Gaggiotti, 2006), and reconstruct the demographic history of a species or population (Excoffier et al., 2013).
Inferring specific events in the past from contemporary genetic variation is a challenging task as genomic signatures not only erode over time as populations move towards a novel equilibrium state, but signatures from multiple events may also overlap (Cutter & Payseur, 2013;Weigand & Leese, 2018). Analyses thus benefit from increased marker density throughout the genome, under the assumption that stochastic effects affect all regions of the genome equally, while selection primarily affects selected loci and their closely linked regions (Cutter & Payseur, 2013;Kelley et al., 2006;Luikart et al., 2003;Nielsen et al., 2005). The generation of genomic datasets appropriate for the testing of complex demographic scenarios has become feasible in recent years owing to next-generation sequencing methods (Ekblom & Galindo, 2011;Goodwin et al., 2016;Luikart et al., 2003). As a frequently used approach, reducedrepresentation methods allow for the generation of genomic datasets which are several orders of magnitude larger than what was previously feasible using classical markers (reviewed in Andrews et al., 2016).
In addition to stochastic and adaptive processes, local genetic diversity is typically also affected by the interactions with other populations in the metapopulation network of the species (Pannell & Charlesworth, 2000). In such a network, founding groups originating from other subpopulations may colonize novel habitat, or recolonize an area after local extinction, with the newly founded population typically undergoing periods of low genetic diversity and drift (Excoffier et al., 2009;Mona et al., 2014;Seehausen & Wagner, 2014;Slatkin & Excoffier, 2012). This founder effect pattern appears to be universal and has been shown across a wide range of taxa, including delphinids such as the vaquita (Taylor & Rojas-Bracho, 1999) and Risso's dolphins (Gaspari et al., 2007).
Marine species generally have larger population sizes, larger ranges and a higher dispersal ability than terrestrial species (Burgess et al., 2016). In combination with the lack of obvious geographical barriers known from terrestrial systems in the marine environment, theory predicts less divergence and a more panmictic lifestyle for marine organisms (Norris, 2000). However, many marine groups are surprisingly species-rich and show more pronounced genetic structuring than expected (Bierne et al., 2003). These global patterns have largely been attributed to more cryptic barriers to gene flow such as parapatry, range shifts in conjunction with fluctuating sea levels during glaciation periods (Steeman et al., 2009), oceanic currents or bathymetry (White et al., 2010) and selection driven by environmental variables such as productivity, thermoclines and salinity (Hilbish, 1996).
Many cetaceans (dolphins, whales and porpoises) show high degrees of genetic structuring despite a markedly reduced molecular substitution rate and long generation times compared to other vertebrates (McGowen et al., 2012;Tollis et al., 2019;Vachon et al., 2018).
Furthermore, local habitat type has been shown to influence levels of gene flow among coastal populations Wiszniewski et al., 2010). Even within populations, culturally transmitted niche specializations have been shown to quickly lead to detectable fine-scale genetic structure ('sponging'; Kopps et al., 2014;Krützen et al., 2014). Such gene-culture coevolution in a complex social system has also been associated with the formation of phenotypically distinct and highly specialized ecotypes in killer whales (Foote et al., 2016). This highlights the ability of cetaceans to rapidly expand into and adapt to novel niches as they become available.
Both currently recognized species of Tursiops occur in Australian waters: the common bottlenose dolphin (T. truncatus), which typically occupies oceanic waters >50 m deep, and the Indo-Pacific bottlenose dolphin (T. aduncus), which occurs in more shallow coastal waters <50 m deep (Allen et al., 2016;Committee on Taxonomy, 2022). Oceanic T. truncatus show little local genetic structure; however, the coastal T. aduncus populations in Australia show a strong pattern of isolation by distance (IBD) along the coastline and levels of gene flow between neighbouring populations appear to depend on local habitat type (Allen et al., 2016;Möller et al., 2007;Wiszniewski et al., 2010). Both species occur in parapatry in waters around the Australian continent except in coastal waters of Tasmania and Victoria, where T. aduncus appears to be absent Jedensjö et al., 2020;Krützen et al., 2004;Möller & Beheregaray, 2001). Instead, the coastal habitat in southeastern Australia harbours Tursiops populations whose phylogenetic relationship to the currently recognized Tursiops species is currently unclear (Charlton et al., 2006;Charlton-Robb et al., 2011Jedensjö et al., 2020;Möller et al., 2008).
Much of the shallow Australian continental shelf was exposed during the last glacial maximum (LGM) and was gradually flooded once water levels started rising roughly 15 kya (Brooke et al., 2017).
The exposed area ranged from a few kilometres along the New South Wales coast in the east to several hundred kilometres in the Sahul shelf area which connected the Australian landmass to Papua New Guinea (Brooke et al., 2017;Yokoyama et al., 2001) (Louis et al., 2021;Nykanen et al., 2019), but little is known about the colonization history of Indo-Pacific bottlenose dolphins along the Australian coastline. Based on mtDNA and autosomal microsatellite data, coastal populations showed a pattern of IBD and a decline in genetic diversity from north to south. Such a result is consistent with a scenario of sequential colonization originating from a single ancestral population in the north (Wiszniewski et al., 2010). Using microsatellite data and mtDNA control region sequences from populations along the northwestern Australian coastline, Allen et al. (2016) found a similar pattern of IBD and, additionally, described strong structure and a lack of gene flow between extant coastal (T. aduncus) and offshore (T. truncatus) populations despite their relative geographical proximity.
To date, detailed genomic and demographic analyses of Western Australian Tursiops populations are lacking and studies thus far were limited to microsatellite data and mtDNA control region sequences (Allen et al., , 2016. Although population genomic measures of diversity correlate between classical and genomic SNP markers, the massive increase in power due to increased dataset size aids in reducing uncertainty in estimates (Zimmerman et al., 2020).
In addition, the more tractable mutation models that can be applied to sequence or SNP data allow for sophisticated demographic modelling and the estimation of specific demographic parameters, such as effective population sizes, split times between populations, as well as migration rates between extant populations.
Currently, development of theory lags behind our rapidly increasing ability to produce large amounts of 'omics' data (Fan et al., 2014;Marx, 2013). Especially for the inference of demographic histories, most currently available full likelihood or Bayesian approaches do not scale well both with model complexity and dataset size and are, thus, limited to simple demographic models or a strongly reduced number of individuals (Excoffier et al., 2013). Approaches building upon the site-frequency spectrum (SFS), a method of summarizing the frequencies of derived and ancestral alleles in populations, or minor allele frequencies if the ancestral state is unknown, appear promising as they decouple the analysis duration from the amount of SNP data that was used to generate the SFS (Nielsen, 2000). Implemented in the software packages 'dadi' and 'fastsimcoal2', demographic inference based on the SFS has quickly become an essential and frequently used tool in the reconstruction of demographic histories of populations (Excoffier et al., 2013;Gutenkunst et al., 2009).
In this study, we examine the demographic history of coastal T. aduncus and offshore T. truncatus populations along the northwestern Australian coastline to expand on Allen et al. (2016). Using double-digest restriction-site-associated DNA (ddRAD)-based SNP data obtained for a total of 112 individuals from two offshore and eleven coastal sites, we tested this scenario of sequential colonization for coastal T. aduncus. To this end, we combined genomic diversity measures, population structure analyses and demographic reconstruction using a diffusion approximation approach with a genetic algorithm for model optimization. In particular, we tested whether the Western Australian coastal populations of T. aduncus may be the result of a series of rapid colonization events originating from a single coastal source and investigated whether the offshore population may have played any role in the colonization of the coastal habitat at all, for example through gene flow into the newly founded populations after the LGM. A scenario of colonization of T. aduncus from a single coastal source appears likely based on the following observations: (i) A pattern of IBD for T. aduncus along the coastline and reduction in genetic diversity towards southern coastal sampling sites in the study area (Allen et al., 2016;Krützen et al., 2004). Similar patterns of IBD and diversity reductions were observed by Wiszniewski et al. (2010) for populations along the eastern Australian coastline. (ii) Strong and long-standing genetic isolation between the two species in this region as well as almost complete absence of identified T. aduncus individuals in offshore, respectively, of T. truncatus individuals in recent coastal sampling efforts (Allen et al., 2016).

| Sample collection
Sampling took place along the northwestern Australian coastline in two sites beyond the 50 m depth contour, where bottlenose dolphins interact with a trawl fishery (Manlik et al., 2022), and at eleven coastal sampling sites of <50 m depth and within 10 km of the shoreline. All sampling locations, sampling site abbreviations and full names are shown in Figure 1. Samples from the two offshore sites (FWE/FEA) were collected during commercial fishing operations between 2008 and 2011 (Allen et al., 2016), using either a biopsy pole system for bow riding dolphins (Bilgmann et al., 2007) or a remote biopsy system (Krützen et al., 2002) from a small (~4 m) tender. All coastal samples (DHA to CYG) were collected using the remote biopsy system from small research vessels (~5 m) between 2008 and 2013, except for the two Shark Bay sites WSB and ESB, where biopsy sampling has been ongoing since 2007 and 1998, respectively.
All 112 samples included in this study, along with corresponding GPS coordinates and other metadata, are listed in Table S1. Species identification was performed using mtDNA HVR-I analyses and photographic identification during and after sampling (Allen et al., , 2016. Individuals from the offshore sites FWE/FEA were characterized as T. truncatus and all samples from the eleven coastal sampling sites as T. aduncus.

| Library preparation
Genomic DNA was extracted from small tissue biopsy samples using the Gentra Puregene Tissue Kit (Qiagen) following the manufacturer's protocol for mouse tails. After extraction, DNA was quantified using a Qubit 1.0 fluorometer with the Qubit dsDNA BR Assay Kit (ThermoFisher Scientific). We set up a restriction digest for each sample using 250 ng of genomic DNA and 20 units each of MseI and high-fidelity EcoRI (New England Biolabs). We then cleaned up and eluted the digested DNA twice in 18 μL buffer using MinElute PCR cleanup kits (Qiagen). Finally, DNA was quantified and subsequently normalized for equal representation using the Qubit dsDNA HS assay kit (ThermoFisher Scientific).
Prior to pooling, we ligated EcoRI adapters, which contained a 5 bp barcode (Table S2) as well as an MseI P2 adapter which included the illumina index sequence and four additional degenerate bases for PCR duplicate identification (Tin et al., 2015) to each sample. After pooling, we performed two AMPure bead size selections removing fragments >500 bp (0.65 volume of beads/sample) and <200 bp (0.16 volume) followed by a third AMPure bead selection (1.2 volume) to remove remaining adapter dimers. After AMPure bead size selection, we eluted the DNA in 28 μL ddH 2 O.
We amplified the size-selected library in 10 separate 30 μL PCRs containing primers for the corresponding P1 and P2 adapters. Finally, to increase the probability of sequencing homologous regions, we performed an additional size-selection step for a size range of 307 and 404 bp using a Spreadex EL600 gel (AL Diagnostics) following the protocol outlined in Greminger et al. (2014)

| Quality filtering and demultiplexing
All analyses were conducted in ubuntu 20.04 using python V3.6.9 and R V3.6.2 (R Development Core Team, 2018). In a first filtering step, we removed reads in which one or more bases in the 10 bp (illumina index + degenerate bases) had a Phred-scaled Q score of 20 or lower. As sequencing quality tends to decline towards the end of illumina reads, we then used fastx_trimmer from fastx toolkit V0.0.13.2 to trim 5 bp from the end of the reads and only retained reads if 95% or more base calls within each read had a Q score of 30 or higher. We then demultiplexed individuals from each library using the fastx_barcode_splitter.pl script allowing for a maximum of one mismatch within the 5 bp barcode sequence. After demultiplexing, we removed the barcode sequences to reach a uniform length of 140 bp across all reads and retained reads only if they contained an intact EcoRI restriction site (AATT) at the beginning of the read.
Finally, we identified and removed PCR duplicates based on duplicate combinations of the entire read sequences and their associated 4 bp degenerate nucleotide sequence with a custom python3 script following the approach outlined by Tin et al. (2015).
We aligned all filtered ddRAD reads against a chromosome-level T. truncatus reference genome (Genbank no. GCA_011762595.1, Maloney et al., 2020) using the 'very-sensitive' preset in bowtie2 V2.3.4 (Langmead & Salzberg, 2012). The resulting alignment files were further filtered using samtools V1.13 (Li et al., 2009). In a first step, we discarded all reads which aligned to sex chromosomes, mtDNA and unplaced scaffolds. In a second step, we filtered the remaining alignments for a minimal mapping quality (MAPQ) of 30 and removed alignments for which bowtie2 reported more than one F I G U R E 1 Sampling localities along the northwest Australian coastline. The depth contours correspond to 5, 10, 20, 50 and >100 m water depth. The coastal sampling sites are grouped according to the results of the clustering analyses into three populations 'Shark Bay', 'Middle' and 'Cygnet Bay'. possible position. We then used the reference-based stacks V2.62 (Catchen et al., 2011(Catchen et al., , 2013 pipeline to perform variant calling. We retained loci only if they were present in at least 75% of samples over all 13 sampling localities and had an overall minor allele count of three or more, thus ensuring that an observed variant occurs in at least two (diploid) individuals across all sampled individuals (Catchen et al., 2013).
We then calculated nucleotide diversity π per sampling site and H obs per individual using vcftools V0.1.15. To test for a significant difference in π between two major coastal genetic clusters (see Section 3), we used a Wilcoxon signed rank test between Shark Bay (DHA, WSB, ESB) and all other coastal sites using the base R 'stats' package (R Development Core Team, 2018). To investigate whether heterozygosity declined significantly along the coastline, we calculated H OBS per individual using vcftools and performed a binomial GLMM with the 'lmer' and 'mumin' packages (Bartoń, 2020;Bates et al., 2015). We used sampling site as a random factor (+1/site) and tested the following two models: H OBS ~ 1 as a null model and H OBS ~ 1/(distance to CYG).
To determine the most likely number of genetic clusters, we conducted a spatially explicit analysis using 'tess3r' V1.1.0 (Caye et al., 2016). We ran all analyses using 10 repeats for each K considered. The first Tess analysis contained all sampling sites and Ks considered were 1 to 13; the second analysis contained the 11 coastal sites (excluding FWE and FEA) with Ks considered from 1 to 11. We aligned clusters across multiple K using Distruct (http://clump ak.tau. ac.il; Kopelman et al., 2015). Finally, we plotted the cross-validation score obtained from the tess3r analysis and inspected the resulting graph for breakpoints across K values to determine the optimal number of clusters.

| Demographic analysis
For all demographic analyses, we used a diffusion approximation approach with the software 'dadi' (Gutenkunst et al., 2009) in combination with a genetic algorithm for the inference of demographic history as implemented in the software gadma (Noskova et al., 2020). gadma leverages the power of a global search genetic algorithm which simultaneously optimizes both model structure and demographic parameters within a model for up to three populations. Rather than formulating explicit demographic models to be tested a priori, only the overall model structure such as the number of populations and the number of distinct 'epochs' between splits must be specified. 'Epoch' in this context refers to a section within the model for which a separate set of demographic parameters is estimated during optimization. Across multiple repeats, GADMA then attempts to determine the best-fitting model and parameter combination for the observed data using AIC. To reduce linkage between SNPs used, we only retained a single SNP per ddRAD locus and thinned the dataset by 50 kb.
We inspected the resulting parameters in modelling runs for signs of parameter runaway behaviour and removed affected model replicates from consideration. Runaway behaviour is characterized by parameters in a model either degrading to zero or tracking upper parameter bounds during parameter optimization. Parameter runaway can happen in any SFS-based analysis depending on the data used, the specific model being tested and the shape of the expected SFS (Rosen et al., 2018). We selected the best-fitting model based on AIC scores. To scale parameter values, the L parameter is needed which describes the total sequence length in bp which has effectively been used to generate the SNP data for the SFS. We calculated L by multiplying the total sequence length as reported by the 'populations' module from STACKS with the fraction of SNPs remaining in the analysis after filtering and thinning (L = total bp × SNP remaining ∕ SNP total ). Furthermore, we used a mutation rate of 1.4 × 10 −9 site/year (McGowen et al., 2012) and a generation time of 21.5 years (Taylor et al., 2007) for parameter scaling.
As a first step, we assessed the role of offshore populations as a potential source of migrants for coastal populations during the colonization of the coastline. If the offshore population played a significant role in the coastal colonization history, we expect the demographic model to identify significant levels of ongoing or historic gene flow between them.
In a second series of demographic models, combined with genomic diversity and clustering analyses, we focused on the coastal populations and their colonization history. In these three-population models, we estimated at which point in time populations may have split from each other to test whether the coastal populations may be the result of a series of founder events originating from a single coastal source in the north of Australia.

| Assessing offshore populations as a potential source of migrants
For the two-population analysis, we assigned individuals from the two offshore sites (17 individuals from FWE and FEA) to the 'Offshore' population and individuals from the coastal sampling sites (91 individuals) to the 'Coastal' population. To reduce the impact of missing data on the SFS, we used a projection of [15,15] and, following recommendations from the developers regarding grid and projection size, set a grid size of [30,40,50]. We then ran 50 repeats of the model search pipeline in GADMA with a single split model with a maximum complexity of (1,2). Estimates of divergence times between T. truncatus and T. aduncus in the literature range from roughly 750 kya to 2 Mya (Moura et al., 2013(Moura et al., , 2020. Therefore, to limit the total parameter space to be searched, we set an upper bound for the split time between the 'Coastal' and 'Offshore' populations to 2 Mya. We allowed for migration in both directions and left all other GADMA parameters at default values. variant biallelic SNP sites. Missing data per individual ranged from 0.5% to 15.6% (mean 4.6%, SD 3.5%). Since the results from our clustering analyses appeared to be robust when applying different missing data thresholds for individuals (not shown), and missing data in SFS can be addressed by projecting the data down, we did not exclude individuals based on missing data for subsequent analyses.

| Genetic diversity and population structure
The relatedness analysis identified four dyads of individuals which were likely first-degree relatives. We therefore removed one individual from PHE, CBA, CYG and DMP from subsequent analyses F I G U R E 2 Principal components analyses (PCA). (a) PCA for the complete dataset including all 13 sampling sites. Notably, a single offshore individual from FWE clustered with the coastal samples (individual marked with a square). (b) PCA for offshore samples (FWE and FEA). The FWE individual which clustered with the coastal samples was excluded from this plot (see Figure S2) (c) PCA for the coastal dataset with a separation between Shark Bay (DHA, WSB and ESB) and the other coastal sampling sites (CBA to BEA) with individuals from CYG diverging along PC2.
( Table S3) (Table 1). Most sampling site pairs were significantly differentiated after Bonferroni correction (1000 bootstraps, adjusted p = .0038). Furthermore, we detected a significant pattern of IBD with an increase of pairwise F ST with distance in water across all coastal sampling sites (Mantel test with 10,000 permutations, p = .0025, Figure 4a).
Average nucleotide diversity π as determined by STACKS was 0.171 for the offshore and 0.107 for the 11 coastal sampling sites (Table 2). There was a significant difference in average π between the Shark Bay cluster (28 individuals, sites DHA, WSB, ESB, see K = 2 in Figure 3c) and the other coastal sites, despite a larger number of individuals sampled in Shark Bay (Wilcoxon signed rank test, df = 10, W = 24, p = .012, Figure 4b). Heterozygosity H obs declined significantly with distance to the northernmost sampling site CYG (binomial GLMM, n obs = 88, R 2 c = 0.84, p < .001, Figure 4c). However, this appeared to be driven mainly by the three Shark Bay sites and the same model is no longer significant if analysed without them (binomial GLMM, n obs = 61, R 2 c = 0.85, p = .25). Private allele counts were highest in the offshore sites. Private allele counts among coastal sites were highest in CYG (77), lower for BEA to CBA (3-43) and, surprisingly, again higher for Shark Bay sites DHA (31), WSB (26) and

F I G U R E 3 Tess barplots of membership probabilities and corresponding cross-validation scores (CVS). (a)
Offshore (FWE + FEA) and coastal sites. The offshore individual which clustered with the coastal individuals in the principal components analysis (PCA) is highlighted with an asterisk. (b) K = 2 appears to be the optimal number of K based on the CVS, separating offshore and coastal samples. (c) Analysis using coastal individuals only. For higher K, distinct genetic clusters for Cygnet Bay (CYG), Port Hedland (PHE) and Broome (BRO) are visible. However, the breakpoint for the CVS scores is less extreme; therefore, the optimal K likely lies between K = 2 and K = 5 (d).
ESB (45), respectively. This pattern remained consistent after scaling the number of private alleles by the average number of samples covered per ddRAD locus per site (Table 2).

| Demographic analysis
3.3.1 | Assessing offshore populations as a potential source of migrants  Figure S2). In this model, migration rates were similarly low (0-0.001) and the inferred split time (~269 kya) as well as the beginning of the second epoch (39.6 kya) were slightly more ancient.

| Coastal colonization history
The dataset used in the coastal three population analysis included a total of 91 individuals which were assigned to three coastal populations (28 'Shark Bay', 55 'Middle', 8 'Cygnet Bay'). In this dataset, 19,964 SNPs remained after thinning which resulted in L = 2,151,498.
Of the 10 GADMA repeats, we discarded two runs for which time parameters degraded towards zero during optimization and thus were affected by parameter runaway behaviour (Table 4).
Of the eight successful GADMA repeats, the best-scoring repeat based on AIC estimated a split of the ancestral population into a con- timing for the Shark Bay split at 13.3 kya. In this model, migration rates after the second split were similar but there was no notable migration between Middle+ Shark Bay and Cygnet Bay before the second split. These models differed mainly in population growth parameters; however, in both models, the Shark Bay population was founded by a very small population.

| DISCUSS ION
In this study, we generated a large ddRAD SNP dataset for 112 Tursiops individuals sampled from two offshore and eleven coastal sampling sites along the northwestern Australian coastline. We  and southeastern Australian waters have also been shown to carry a unique mitochondrial lineage which is paraphyletic to both T. aduncus and T. truncatus and, thus, likely the result of introgression from another delphinid (Charlton et al., 2006;Charlton-Robb et al., 2011;Moura et al., 2013).
Cases of interspecific gene flow and introgression among cetaceans are not uncommon (Miralles et al., 2013). The presence of one individual in one of our offshore sampling sites, which appeared to be of either coastal origin or mixed ancestry, supports the notion that extant T. truncatus and T. aduncus populations in this area may not be completely isolated from each other. It is thus conceivable that the offshore population may have facilitated the colonization of coastal habitat by providing a source of novel or potentially adaptive variants for newly founded T. aduncus populations during colonization (Fitzpatrick et al., 2020;Whiteley et al., 2015). However, our two-population demographic model indicated a deep split and negligible gene flow between the coastal and offshore populations throughout the entire time. It must be noted, however, that the inferred split time in our model is relatively recent compared to estimates of species divergence between the two Tursiops species. Our estimate for this split is thus likely biased to be too recent, which may be a consequence of the genetic structure among all coastal samples which were treated as a single population in this analysis. Genetic structure may lead to inflated counts of low-frequency SNPs in the corresponding SFS which, in turn, may result in a split time estimate which may be too recent.
Despite a likely bias, the inferred split between the coastal and offshore Tursiops was estimated to be substantially more ancient than the split times among coastal populations. Therefore, although coastal and offshore Tursiops may not be completely genetically iso-

TA B L E 3
Log-likelihood and AIC scores for the coastal-offshore two-population models. from north to south. However, this effect was mainly driven by the strong reduction in the three Shark Bay sites (DHA, WSB and ESB) at the southern edge of the study area. Although Shark Bay was affected by a marine heat wave in 2011, which adversely impacted survival and reproductive rates (Wild et al., 2019), no other such event in recent history is known to us and tissue samples used in the present study were collected prior to this 2011 event. Surprisingly, despite overall low genetic diversity, Shark Bay harboured a high number of private SNPs and did not show elevated levels of inbreeding based on F is . In combination with the presence of introgressed variants in Shark Bay (Krützen et al., 2004), it is thus very likely that the Shark Bay population received, or continues to receive, migrants from an unsampled source.

GADMA run log-LL AIC ΔAIC
On top of the same three major coastal genetic clusters as were identified by Allen et al. (2016) (Phelps et al., 2018).
It was previously found that T. aduncus in less protected coastal habitat may disperse more readily than those in sheltered embayments as the latter promotes philopatry Wiszniewski et al., 2010). The open habitat of the Eighty Mile Beach area may thus promote movement between these sites, while gene flow with more sheltered neighbouring sites may be restricted (Wiszniewski et al., 2010). However, the degree of bisexual philopatry in this and neighbouring areas is unknown. Future studies therefore should attempt to characterize levels of philopatry, for example through mark-recapture photo identification of individuals, as well as assess gene flow among these sites and with neighbouring sites in more sheltered habitats.
In a similar fashion to the clustering analyses, the PCA clearly to the formation of seasonal eddies as well as upwellings driven by the northward flowing Ningaloo Current (Phelps et al., 2018). It is likely that there are substantial differences between the selective regimes of both meso-scale bioregions, potentially promoting distinct local adaptation on either side of the transition, while the complex and sheltered local reef habitat may promote philopatry in the Coral Bay area, thereby potentially restricting dispersal and gene flow between the genetic clusters .
However, as with Port Hedland and Broome, data about levels of bisexual philopatry in Coral Bay are lacking.
To investigate the potential scenario of colonization along the coastline from a single source in the north, we performed a demographic analysis using the three major coastal clusters Shark Bay, Despite having significantly reduced genetic diversity and having been established roughly 3000 years after the neighbouring coastal clusters, the population inside Shark Bay has grown to one of the highest densities worldwide (Nicholson et al., 2012;Preen et al., 1997). It is currently unknown whether the private variation found in Shark Bay is indicative of a high degree of local adaptation or could be linked to introgression from an unknown source (Brown et al., 2014;Gridley et al., 2018;Miralles et al., 2013). Indeed, there are indications of incomplete lineage sorting or past introgression, as Krützen et al. (2004) found two mtDNA haplotypes at high frequencies in the eastern gulf of Shark Bay that differed from all others observed in the area by at least 15 mutational steps. This was hypothesized to either be the result of introgression, for example from offshore T. truncatus, or due to independent colonization of this area by at least two evolutionary distinct lineages.
The local habitat in Shark Bay is complex and features vast, shallow seagrass meadows as well as deeper channels with rocky reefs and sponge gardens. Shark Bay waters are further characterized by a salinity gradient throughout the embayment which is maintained through high evaporation, limited rainfall and limited tidal movement (Bell et al., 2019). This habitat thus offers high potential for divergent local adaptation. For example, foraging specializations, such as the vertically transmitted 'sponging' behaviour (Krützen et al., 2005, have been linked to significant, fine-scale genetic structuring within the population . This illustrates the ability of cetaceans for rapid local adaptation and ecotype formation, es- The northwestern Australian coastline spans a natural gradient in salinity, productivity, habitat complexity and sea surface temperature of the subtropical and tropical waters. Although we did not explicitly test for this, we found that the major genetic clusters along the coastline appear to correlate with different meso-scale bioregions. The northwestern Australian coastline thus provides an ideal 'natural laboratory' setting for the functional study of genomic variation in these populations. Future studies should endeavour to explicitly link genomic variation along the coastline with environmental variables by applying a seascape genomics approach, which will greatly improve our understanding of the degree of local adaptation and its functional significance in Australian T. aduncus.

ACK N O WLE D G E M ENTS
The authors thank all members of the Evolutionary Genetics Group at the University of Zurich, and the Genetic Diversity Centre (GDC) at ETH Zurich for their support during data generation and analysis,

CO N FLI C T O F I NTER E S T S TATEM ENT
All authors declare that they have no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Metadata such as read counts, GPS location and sex associated with all samples used in this study are available in Table S1. Raw shortread data in fastq format for all samples are available on the NCBI short-read archive (BioProject PRJNA966102).