Genomic divergence in sympatry indicates strong reproductive barriers and cryptic species within Eucalyptus salubris

Abstract Genetic studies are increasingly detecting cryptic taxa that likely represent a significant component of global biodiversity. However, cryptic taxa are often criticized because they are typically detected serendipitously and may not receive the follow‐up study required to verify their geographic or evolutionary limits. Here, we follow‐up a study of Eucalyptus salubris that unexpectedly detected two divergent lineages but was not sampled sufficiently to make clear interpretations. We undertook comprehensive sampling for an independent genomic analysis (3,605 SNPs) to investigate whether the two purported lineages remain discrete genetic entities or if they intergrade throughout the species’ range. We also assessed morphological and ecological traits, and sequenced chloroplast DNA. SNP results showed strong genome‐wide divergence (F ST = 0.252) between two discrete lineages: one dominated the north and one the southern regions of the species’ range. Within lineages, gene flow was high, with low differentiation (mean F ST = 0.056) spanning hundreds of kilometers. In the central region, the lineages were interspersed but maintained their genomic distinctiveness: an indirect demonstration of reproductive isolation. Populations of the southern lineage exhibited significantly lower specific leaf area and occurred on soils with lower phosphorus relative to the northern lineage. Finally, two major chloroplast haplotypes were associated with each lineage but were shared between lineages in the central distribution. Together, these results suggest that these lineages have non‐contemporary origins and that ecotypic adaptive processes strengthened their divergence more recently. We conclude that these lineages warrant taxonomic recognition as separate species and provide fascinating insight into eucalypt speciation.

divergence is necessary for gaining a more realistic appreciation of global biodiversity, informing appropriate conservation management and furthering our recognition of species boundaries beyond traditional morphological divergence (Fišer et al., 2018). However, in line with the broader difficulties surrounding species concepts and delineation approaches (e.g. de Quieroz, 2007;Galtier, 2019;Stanton et al., 2019), it remains a challenge to identify cryptic species and the processes that have led to their evolution (Fišer et al., 2018).
Speciation is typically a protracted process in which morphological, molecular, and reproductive divergence can occur to varying extents and at different rates in space and time (Queiroz, 2005(Queiroz, , 2007.
As a result of such heterogeneity, the literature abounds with numerous species concepts that cover all possible scenarios and no one concept can be widely applied to all organisms in all circumstances.
However, it is a common view that reproductive isolation is fundamental to speciation and, if demonstrable, represents the strongest evidence for delineating species boundaries (Frankham et al., 2012).
Such isolation can be demonstrated directly through reproductive studies (e.g. Campbell et al., 2003;Kay, 2006;Larcombe et al., 2015), indirectly by high genetic divergence under sympatric conditions (e.g. Michalski & Durka, 2015;Stuart et al., 2006) or demonstrating both factors under allopatric conditions (Frankham et al., 2012). In either case, these studies require well-planned sampling designs to provide clear evidence of reproductive barriers. However, in the case of cryptic species, the lack of morphological variation means that divergent lineages are typically detected unintentionally while in the pursuit of other evolutionary or taxonomic information (e.g. Barrett & Freudenstein, 2011;Derieg et al., 2013;Warner et al., 2015).
Consequently, cryptic taxa are often described based on limited evidence (e.g. a single genetic locus) or inappropriate sampling designs (e.g. small sample sizes) and this has led to criticism of the validity of many cryptic taxa. This is particularly true in concert with recent advances in genomic technology, where population differentiation can be mistaken for species boundaries in high-resolution phylogenomic analyses (Sukumaran & Knowles, 2017). In this sense, detailed population genomic studies, which are well aimed at characterizing genome-wide divergence along the population-species continuum (Allendorf et al., 2010), are valuable as follow-up studies to thoroughly investigate the spatial patterns of divergence across the distributions of putative cryptic taxa. This study provides such a follow-up to investigate cryptic lineages that were unexpectedly detected within a eucalypt species.
Eucalyptus salubris F. Muell is the most widely distributed of nine 'gimlet' species that are valued for their shiny, multi-colored fluted trunks and are endemic to the Wheatbelt and Goldfields regions of southwestern Australia (French, 2012;Johnson & Hill, 1991). Major diagnostic characters among the gimlets include the number of flowers, bud and fruit size, bud attachment and the presence or absence of glaucous branchlets (Johnson & Hill, 1991). These characters are not known to vary within E. salubris in any consistent manner. A previous study, designed to investigate genomic signals of adaptation in E. salubris along an aridity gradient, detected two highly divergent genetic lineages at opposite ends of the species' range . These lineages were differentiated by both neutral and adaptive genomic markers, as well as leaf characters and their occurrence in soils of differing phosphorus levels. While intriguing, the disjunct sampling design limited an understanding of the evolutionary significance of the strong genomic divergence detected at these range extremes. Indeed, given the extensive haplotype sharing and weak reproductive barriers that are typically seen among closely related eucalypts (Grattapaglia et al., 2012;Larcombe et al., 2015), including known hybridization among the gimlet taxa (Johnson & Hill, 1991), we hypothesized that the genomic divergence observed at range extremes within E. salubris would likely dissolve under sympatric conditions. However, until the spatial distribution of the genomic, morphological, and ecological variation in these lineages is known, the evolutionary and taxonomic implications of the study by Steane et al. (2015) remain unclear. Given that E. salubris is a key species for re-vegetation projects (French, 2012), such clarity is necessary to inform provenance sourcing for successful ecological restoration, as well as to enrich our knowledge of eucalypt diversity and speciation.
Here, we investigated the genomic, morphological, and ecological variation between the two putative lineages across the full geographic range of E. salubris. We had three aims based on questions stemming from the work of Steane et al. (2015). First, we assessed whether the two putative lineages represent divergence at range extremes with a genetic cline of admixture over the intervening space or whether the two lineages remain genetically distinct across the range of the species (even in sympatry), which would be indicative of reproductive isolation. To do this, we added 11 populations to the original nine sampled by Steane et al. (2015) and independently reassessed genome-wide variation across the species' full geographic range. Second, we assessed how the lineage-associated differences in leaf morphology and soil phosphorus found by Steane et al. (2015) vary across the species' full range. And finally, we sequenced three regions of the chloroplast genome to test for lineage-specific haplotypes, which would be indicative of ancient lineage divergence, or in the case of no lineage-associated haplotypes, that the two lineages result from more recent divergence within E. salubris. We collate these data to discuss their evolutionary implications and determine whether these lineages likely represent cryptic species or simply intraspecific variation across a wide geographic range.

| Study species
Eucalyptus salubris is an evergreen tree up to 25 m that can grow in either single-stemmed or multi-stemmed form (Figure 1; Johnson & Hill, 1991). It is distributed widely across the semi-arid to arid regions of southwestern Australia, where it forms a dominant feature of the vast, mixed eucalypt woodland that stretches across most of its range.
This remote region represents an ancient landscape that remained unglaciated through the Pleistocene such that the region's flora have persisted and evolved over this long timescale (Byrne, 2007); indeed, E. salubris has been dated back ~1 Ma (Thornhill et al. 2019). An exception to this long historical persistence is that more recent agricultural land clearing has impacted the western edge of the species' distribution, leaving small fragments of a once continuous population. As is typical of many large eucalypts, individuals of this species are long-lived and can reach 400+ years in age (Gosper et al., 2013); however, the species lacks a lignotuber and mature trees are killed by fire if the whole canopy is burnt (Nicolle, 2006). While age assessment of larger individuals is challenging, E. salubris trees with a trunk diameter (at the base) of 15 cm have been reliably estimated to be ~100 years old (Gosper et al., 2013). The reproductive biology of E. salubris has not been studied specifically but the small, white flowers are likely to be generalist insect pollinated (Potts & Gore, 1995), and the seed, as with most eucalypts, simply falls to the ground beneath the tree canopy (Booth, 2017).
The two purported genetic lineages of E. salubris were hypothesized to be morphologically cryptic. Steane et al. (2015) found no significant difference in the number of stems, tree height, leaf size, or density between lineages but did find significant differences in leaf thickness, and in turn, specific leaf area; Lineage 1 populations were found to have significantly thinner leaves and higher specific leaf area than Lineage 2 populations. In addition, populations of Lineage 1 were found to grow on soils containing significantly higher phosphorous levels than the soils supporting Lineage 2 . These features are not easily applied in the field to enable straightforward identification of each lineage.

| Sampling and DNA extraction
A total of 20 populations of E. salubris were sampled across the species' distribution in southwestern Australia. Nine populations were collected in 2012 and sampling details can be found in Steane et al. (2015). In 2016, an additional 11 populations were collected in a similar manner.
Given the lack of distinguishing morphological characters that could be used to identify the lineages in the field, the 11 additional populations were of unknown assignment and were selected to fill the geographic sampling gaps in the 2012 collection. Samples were taken from mature trees that were at least 10 cm in diameter at the base of the trunk (i.e. from individuals at least 50 years old; Gosper et al., 2013). Two populations of E. ravida were also sampled for use as an indicator of specieslevel differentiation within the gimlet complex. Eucalyptus ravida is morphologically similar to E. salubris but is distinguished by conspicuously glaucous branchlets and buds (Johnson & Hill, 1991). It shares a similar evolutionary history and general ecology with E. salubris but has a smaller geographic distribution that is limited to the more arid, protocol was used for both collections: a 2% CTAB method (Doyle & Doyle, 1987) was modified by adding 1% polyvinylpyrrolodine to the extraction buffer.

| DArTseq methods and analysis
All 176 DNA samples were sent to Diversity Arrays Technology Pty Ltd (DArT, Canberra, Australia) for DArTseq™ analysis as per F I G U R E 1 Eucalyptus salubris can grow as either (a) single-stemmed or (b) multistemmed trees. Note that variation in growth form is not a diagnostic character for the two cryptic lineages because both lineages can exhibit either growth form . Images: R Binks (a) (b) Sansaloni et al., (2010). Briefly, library preparation involved DNA digestion using two methylation-sensitive restriction enzymes, PstI and SphI, and fragments were ligated with uniquely barcoded adaptors. Following PCR and quantification, the samples were standardized and pooled for sequencing in a single HiSeq 2500 (Illumina) lane.
Read assembly, quality control, and SNP calling were undertaken by DArT using DArTsoft14 software, aided by alignment to the E. grandis genome. This pipeline uses technical replicates for a measure of genotyping reproducibility and modelled Mendelian behavior of DArTseq markers to filter sequencing errors and paralogous regions.
The DArTsoft14 pipeline produced 51,897 SNP loci with 23.75% missing data. We applied further quality control filtering using the 'dartR' package (Gruber et al., 2018) in R (R Core Development Team, 2020) to reduce the dataset to a single SNP per locus and retain only loci with reproducibility >0.95, call rate >0.90, minor allele frequency >0.02, and heterozygosity >0.02. This produced a dataset of 3,730 high-quality SNP loci.
As a final filtering step to produce a neutral dataset for population genomic analysis, we tested for and removed any loci that may be under selection. We used BAYPASS v.2.1 (Gautier, 2015) to identify loci that represented outliers from neutral expectations and may be influenced by selection. BAYPASS utilizes an improved version of the BAYENV2 algorithm (Günther & Coop, 2013) that generates a covariance matrix to account for demographic history before using a differentiation parameter (XtX) to identify loci that are much more (directional selection) or less (balancing selection) differentiated than expected under neutral expectations. The simulate.baypass function was applied in R to simulate allele count data and calibrate XtX to determine thresholds for identifying outlier loci in the empirical datasets. We extended the running parameters (-nval = 100,000, -burnin = 10,000, -npilot = 30, -pilotlength = 2,000) and across three independent runs, 125 loci were identified as outliers at the 1% threshold, representing both directional and balancing selections.
To investigate genetic structuring at the individual level, we first used principal coordinates analysis (PCoA) because this multivariate analysis does not rely on any particular evolutionary model (Jombart et al., 2009). PCoA was performed in R using the 'adegenet' package (Jombart, 2008). We then applied the Bayesian analysis implemented in FASTSTRUCTURE (Raj et al., 2014) to detect K genetic clusters, without any priors regarding population identity or geographic location. The simple prior was applied with the default convergence criterion, the upper K limit was set to 23 and the chooseK.
py function was used to infer the most likely value(s) of K. Because discrete clustering patterns can be confounded with isolation by distance (Meirmans, 2012), we also used a spatially explicit alternative to FASTSTRUCTURE, as implemented in TESS v.2.3.1 (Durand et al., 2009). TESS assumes spatial autocorrelation in the data and accounts for the geographic distribution of individual samples in the clustering model. The program was run using the CAR admixture model, performing 100 iterations for each K-value up to 23, with 50,000 sweeps, a burnin length of 10,000, and the default spatial interaction parameter (0.6). The most likely value(s) of K was determined by stabilization in the plot of the deviance information criterion (DIC) against K max .
Population differentiation (pairwise F ST ) was estimated using ARLEQUIN v.35.2.2 (Excoffier et al., 2005) and visualized using the program's R-lequin graphical functions. Analysis of molecular variance (AMOVA) was also performed using ARLEQUIN to partition the total genomic variation within and among the major genetic groupings detected in the above clustering analyses. Genetic diversity was assessed using the 'hierfstat' package (Goudet, 2005) in R to estimate allelic richness (A R ) and expected heterozygosity (H E ), while GENALEX v.6.501 (Peakall & Smouse, 2006) was used to calculate the percentage of polymorphic loci. To assess the relationship be- ) and geographic (ln(km)) distance across the whole sampled range, as well as within each of the major genetic clusters, we used mantel testing in IBDWS (available at http://ibdws.
Marginal likelihood estimates (MLE) were used to rank the alternate models, and BFs were used to determine the support for the models (Leaché et al., 2014). In this analysis, we tested three species models: (a) a two species model as per the current taxonomy, which considers E. salubris and E. ravida as distinct species; (b) a three species model in which we split E. salubris into Lineages 1 and 2, while keeping E. ravida a separate species; and (c) a two species model in which we split E. salubris into Lineages 1 and 2 but combined Lineage 2 with E. ravida. The last two scenarios were chosen based on the results of our population genomic analyses (Results). Because only two populations of E. ravida were sampled, we retained all SNP loci but randomly reduced the number of Lineage 1 and Lineage 2 samples to four to provide even representation of each potential lineage. A path sampling analysis was performed with 24 steps for each model, using one million iterations, sampling every 5,000, after a burn-in of 1,000.
The allele frequencies in the dataset were used to calculate mutation rates and default settings were applied for the priors. Convergence was assessed using Tracer v.1.7.1 (Rambaut et al., 2018) and ESS values above 200, and each model was run three times using different starting seeds to ensure consistency among runs. BFs were calculated from the MLEs following Leaché et al. (2014) to compare the support for Models 2 and 3, relative to the current taxonomy (Model 1).

| Morphological and ecological traits
Of the suite of morphological, climatic, and soil traits measured by Steane et al. (2015), only specific leaf area (SLA), leaf thickness, and soil phosphorus content exhibited significant differences between the two lineages. Given that SLA and leaf thickness are strongly correlated, in the current study we measured SLA and soil phosphorus in each of the 2016 populations to add to the 2012 data and test whether these differences were maintained in line with the genomic data across the full sampled range. We applied the same methods as those detailed by Steane et al. (2015). Briefly, for SLA, 10 leaves per tree were imaged using a flatbed scanner and the area of each leaf (cm 2 ) was determined in Matlab (Mathworks, USA). Leaves were dried at 55°C and weighed. Specific leaf area was calculated as the ratio of leaf area to dry mass (cm 2 /g), averaged across leaves to give a single value per tree for all E. salubris populations. For soil phosphorus measurements, five soil cores (0-10 cm depth) were collected across the area spanning our tree sampling within each population.
The cores for each population were pooled and analyzed for nutrient content by CSBP Analytical Laboratories (Bibra Lake, Western Australia) to obtain soil phosphorus levels per population.
To test for differences in SLA and soil phosphorus between the major genetic clusters, we used t-tests based on population means.
Following Shapiro-Wilk testing, the SLA dataset conformed to a normal distribution; however, the soil phosphorus data required log transformation to achieve normality. Levene's test was used to examine homogeneity of variance and the appropriate two-tailed t-test was performed for each dataset. If relevant, minor and major outliers in each dataset were identified by 1.5× interquartile range and 3× interquartile range, respectively.

| Chloroplast sequencing and analysis
Thirteen non-coding cpDNA regions that have been found to be useful for phylogeographic studies in Australian plants (Byrne & Hankinson, 2012) were trialed in E. salubris. We chose three regions for the full analysis based on sequence quality and nucleotide diver- sequences were aligned using ClustalW v.1.4 (Thompson et al., 1994) and concatenated in MESQUITE v.3.2 (http://mesqu itepr oject.org/).
Any indels arising from mononucleotide repeats were ignored, while more complex indels (only found when the above outgroups were added to the alignment) were coded as single binary transitions for a total sequence length of 1,676 bp excluding the two outgroups and 1,742 bp including them.
Identification of haplotypes and calculations of haplotype (H D ) and nucleotide (π) diversity were performed in DNASP v.5.10 (Librado & Rozas, 2009). We also used DNASP to calculate Tajima's D (Tajima, 1989) (Pons & Petit, 1996), where N ST > G ST indicates that haplotypes within populations are more closely related than haplotypes among populations.
To visualize the evolutionary relationships among haplotypes, we performed a maximum likelihood phylogenetic analysis using IQ-TREE (Nguyen et al., 2015), as implemented in W-IQ-TREE (Trifinopoulos et al., 2016). The F81 + F + I model was determined to be the best fit for the data based on the Bayesian Information Criterion and a majority-rule consensus tree was constructed with 1,000 ultrafast bootstrap replicates (Minh et al., 2013). We also constructed a maximum parsimony median-joining haplotype network using NETWORK v.4.6.1.2 (Bandelt et al., 1999) because networks can be more informative than bifurcating trees in cases of limited divergence (Templeton et al., 1992), as was the case here.

| DArTseq analysis
All three clustering analyses found strong structuring in the data- Interestingly, the E. ravida individuals were not identified as a third genetic cluster but instead were closely associated with Lineage 2 in all analyses, with some separation along PC2 in the PCoA.
The distinction of the two lineages was further supported by high differentiation between lineages (average F ST = 0.252), relative to minimal differentiation within lineages (Lineage 1 F ST = 0.046; Lineage 2 F ST = 0.090; Figure 3). Excluding the two E. ravida populations, the average F ST for Lineage 2 was reduced further to 0.067.
Moreover, high levels of differentiation between populations of the two lineages were maintained irrespective of geographic proximity.
AMOVA partitioned 19.67% of the total genetic variation between lineages, 6.21% among populations within lineages and 74.12% within populations. Genetic diversity was higher in Lineage 1 than in Lineage 2 across all diversity indices (Table 1). Finally, there was no correlation between genetic and geographic distance across the whole sampled range (p = .411) or within Lineage 2 (p = .450); however, there was a weak signal of isolation by distance across the range of Lineage 1 (r 2 = 0.16, p = .013).
BF delimitation decisively rejected the current taxonomy represented by Model 1 (

| Morphological and ecological diversification
Both SLA and soil phosphorus content exhibited significant differences between the two lineages across the sampled range and in a pattern consistent with Steane et al. (2015). Specific leaf area was relatively stable within lineages (Table 1) and, on average, was significantly higher in Lineage 1 populations than populations of Lineage 2 (t = 34.974, df = 18, p < .001). Phosphorus content was more variable across the sampled range due to inflated values in some populations of both lineages, likely due to nutrient run-off from nearby agricultural land (Table 1). Overall, soil phosphorus was significantly higher in Lineage 1 populations than in Lineage 2 populations based on the full dataset (t = 2.213, df = 19, p = .039), and this difference increased following the removal of major outliers (t = 3.524, df = 17, p = .003) and the removal of both major and minor outliers (t = 5.326, df = 13, p < .001).

| Chloroplast sequencing
All three chloroplast regions were variable and their combination respectively. Both Tajima's D and R 2 statistics were non-significant and thus chloroplast sequence variation is considered neutral (Table 3).
Population differentiation across the sampled range was high, and values of N ST were significantly higher than G ST , indicating phylogeographic structure (Table 3)

| D ISCUSS I ON
This study provides a comprehensive follow-up to a previous study that unexpectedly detected cryptic lineages within E. salubris . With range-wide sampling and an independent genomic analysis, we confirm the presence of two divergent lineages that remain strongly distinct even in geographic proximity. This presents robust, although indirect, evidence for reproductive isolation between these lineages, warranting their recognition as distinct species. In addition, we confirm that specific leaf area consistently delineates the two lineages and that their geographic distributions are associated with differing soil phosphorus levels.
Finally, we find some evidence that these lineages have historical vicariant origins but were not fully differentiated until more recently. Together, these data present an intriguing case of cryptic speciation in a genus that is typically known for weak reproductive barriers.

| Lineage divergence is maintained range wide
The additional sampling undertaken in this study was needed in order to determine whether the two lineages detected by Steane et al. (2015) represented (   Such a case for reproductive isolation between closely related eucalypt taxa is not typical. Eucalypts are known to readily hybridize with closely related congeners within taxonomic sections (Grattapaglia et al., 2012;Griffin et al., 1988;Larcombe et al., 2015) and, indeed, hybrids have been recorded among species within the gimlet complex (Johnson & Hill, 1991). Where reproductive barriers F I G U R E 4 Relationships among and distribution of 14 haplotypes of three chloroplast regions (rpl16, trnG-trnS, psbD-trnT) in the Eucalyptus salubris complex: (a) maximum likelihood tree using Corymbia gummifera and Eucalyptus salmonophloia as outgroups (all ultrafast bootstrap support was <95%), (b) median-joining maximum parsimony network and (c) map of haplotype distribution among populations. Lineage 1 populations are indicated with bold labels and Lineage 2 populations with non-bold labels. The grey dots indicate the known distribution of E. salubris based on records from the Western Australian Herbarium have been studied among more divergent eucalypt species, isolation tends to be a result of both pre-mating and post-mating barriers (Larcombe et al., 2015). The assignment of many E. salubris populations to each lineage, particularly where their distributions meet, now presents an opportunity to investigate intrinsic mechanisms that may be preventing gene exchange, such as asynchrony in flowering times, gamete incompatibilities or hybrid inviability (Larcombe et al., 2015;Lowry et al., 2008). Across natural stands, specimens in the Western Australian Herbarium record E. salubris as flowering between September and March (available at www.flora base.dpaw. wa.gov.au), a wide range that may allow for two, non-overlapping flowering seasons, one for each lineage. And while we found no genetic evidence of hybrids, our collections only sampled adult trees and post-zygotic barriers in eucalypts often present at the juvenile stage (Larcombe et al., 2015); so it would also be interesting to explore whether hybrids exist in seed crops within natural stands of either E. salubris lineage. Of course, comprehensive artificial hybridization experiments would be most effective in identifying the specific mechanisms of reproductive isolation between these lineages.

| Morphological and ecological diversification
Our range-wide analysis also confirmed both morphological and ecological differences between the two lineages, consistent with the patterns identified by Steane et al. (2015). Populations of Lineage 1 had higher SLA, thinner leaves and occurred in soil with higher phosphorus, relative to the lower SLA, thicker leaves and lower phosphorus soil content of Lineage 2 populations. While this variation in leaf traits between lineages cannot be explained as an adaptive response to climate , the presence of consistent morphological differentiation between lineages, albeit subtle, provides an additional line of evidence for their divergence. The lack of other morphological diversifications may simply reflect yet unidentified diagnostic characters, a lag in the accumulation of morphological differences during a phase of early speciation, or perhaps selection to maintain similar morphologies across the sampled range.
Furthermore, the association of each lineage with soil phosphorus levels may be indicative of ecotypic adaptation, with selection against hybrids driving divergence between the two lineages. While this association was adjusted for and remained significant following outlier removal, we acknowledge that soil phosphorus is artificially modified in some parts of the species range and, therefore, some non-outlier values may also not reflect natural ecological variation.
Nevertheless, it is worth noting that while populations of each lineage were found in close proximity, each population presented as a pure stand of a single lineage, with no detection of mixed stands.
This suggests that each lineage has differential, fine-scale environmental requirements that preclude complete sympatry. The SWAFR is known for its heterogeneous mosaic of soil profiles that are considered be a major contributor to habitat specialization and speciation of plant taxa within this biodiversity hotspot (Beard et al., 2000).

| Historical origins but more recent resolution
The phylogeographic structure and uneven distribution of haplotypes across the sampled range are indicative of a history of isolation and divergence within E. salubris. This is a common feature of widespread plant species in the SWAFR (e.g. Byrne & Hines, 2004;Byrne et al., 2002;Nistelberger et al., 2014;Wheeler & Byrne, 2006), where historical climate fluctuations are thought to have driven broad contraction to general refugia during periods of high aridity in the mid-Pleistocene resulting in highly divergent lineages (Byrne, 2007;Byrne et al., 2014). In our dataset, the north-south spatial segregation of the two main but divergent haplotypes (H3 + H5, respectively) and the occurrence of unique haplotypes in edge populations are consistent with a pattern of contraction to the range extremes with localized persistence. Moreover, the higher haplotype diversity in the north suggests that northern populations remained relatively large and stable, while the small hotspot of diversity in the southernmost populations around the Ravensthorpe Range may be indicative of a southern refugium to which the species contracted, followed by subsequent northward re-expansion of the H5 haplotype and secondary contact with the northern region. The Ravensthorpe Range is a center of diversity (Hopkins et al., 1983;Hopper et al., 1996) and is known to harbor divergent haplotypes in other species (Byrne et al., 2001

| Taxonomic implications
The substantial and geographically abrupt genomic differentiation observed between the two lineages of E. salubris is indicative of strong reproductive isolation. In conjunction with the clear rejection of the current taxonomic model by the BFD* analysis that showed the 'three species' model to be superior, we suggest that these two lineages warrant taxonomic recognition as separate species. This reasoning specifically aligns with the Biological Species Concept (de Queiroz, 2007) but follows the underlying theory of most species' concepts (Frankham et al., 2012). From a comparative perspective, the fact that the genomic differentiation between lineages is greater than that between Lineage 2 and E. ravida provides further evidence supporting the recognition of these lineages at the species level. Given that the type specimen of E. salubris was collected in the far north of the species' range (Johnson & Hill, 1991), it follows that Lineage 1 should remain E. salubris and Lineage 2 should be re-named. Unfortunately, these taxa remain cryptic because SLA is not readily observed or measured in the field. Our finding of significant morphological and ecological differences suggests that more diagnostic features may be exposed with further investigation. Now that a number of populations have been attributed to each lineage, it would be worthwhile revisiting these sites and thoroughly assessing each taxon for other morphological characters that may be more useful for field identification, as well as flowering time. Under both scenarios, our results strongly indicate that cryptic speciation has occurred within the gimlet complex. In the case that Lineage 2 and E. ravida are different species, as per the bestsupported BFD* model, Lineages 1 and 2 are clearly established as separate species that are morphologically cryptic from each other (other than SLA, which is not readily observable in the field).
Alternatively, the only recorded morphological difference between both lineages of E. salubris and E. ravida is glaucousness (Johnson & Hill, 1991) so in the case of the second best-supported BFD* model where Lineage 2 and E. ravida are the same species, glaucousness is no longer a diagnostic character and that leaves the morphological comparison of Lineage 1 versus Lineage2/E. ravida as still being cryptic. Given the complexity of divergence seen both within E. salubris and between E. salubris and E. ravida, any future investigation of this study system would benefit from the molecular examination of all nine taxa currently recognized within the gimlet complex. Furthermore, given that some of these taxa are known to hybridize (Johnson & Hill, 1991) and that eucalypt divergence in the nuclear genome is often not matched in the plastid genome due to haplotype sharing (e.g. Grattapaglia et al., 2012;Healey et al., 2018;Schuster et al., 2018), we suggest that such an investigation would yield greater clarity of evolutionary relationships using genomic data from the nuclear genome, rather than the chloroplast genome.

| CON CLUS IONS
In our comprehensive analysis of populations across the geographic range, we demonstrate that substantial genomic divergence is maintained between two previously detected lineages of E. salubris. In addition, these lineages are strongly associated with differences in specific leaf area and soil phosphorus content.
Finding genomic, morphological, and ecological evidence of divergence that is maintained across the extensive zone of geographic proximity is strong evidence for reproductive isolation and we suggest that these lineages warrant recognition at the species level.
Future work could focus on identifying the mechanism of reproductive isolation and a broader phylogenomic investigation of the whole gimlet complex. Finally, the existence of these cryptic taxa needs be made known to managers actively working with E. salubris in ecological restoration activities because unknowingly mixing or translocating these taxa outside of their preferred niches may limit success.

E TH I C A L A PR ROVA L
We have no ethical considerations to declare.

ACK N OWLED G EM ENTS
We thank Tara Hopley and Kym Ottewell for help with field collections and to Bronwyn Macdonald and Maggie Hankinson for laboratory assistance. We are also grateful to René Vaillancourt and Carl Gosper for their constructive comments on the manuscript.

CO N FLI C T O F I NTE R E S T
We have no conflicts of interest to declare.