Genetic divergence and ecological adaptation of an eastern North American spring ephemeral Sanguinaria canadensis

Description of the driving forces for genetic divergence is important for understanding spatial pattern of biodiversity and development of conservation plans. Paleo‐climate, geographical barriers and habitat heterogeneity are considered to be the main influential factors; however, an integrative study is still lacking to reveal their interactions.


| INTRODUC TI ON
Understanding the spatial pattern of biodiversity and its formation mechanism is an essential prerequisite for optimizing biodiversity conservation strategies (Gaston & Fuller, 2009;Xu et al., 2019).
Among the influential factors considered, paleo-climate, geographical barriers and habitat heterogeneity are the most influential drivers for the biodiversity patterns observed (Liu et al., 2018).Pliocene-Pleistocene glacial cycles, accompanied by strong geographic barriers, often caused periods of isolation followed by rapid range shifts and demographic changes, which easily gave rise to unique allopatric lineages or new species (Avise, 2000;Coyne & Orr, 2004;Endler, 1977;Jansson & Dynesius, 2002;Knowles, 2001;Weir & Schluter, 2004).Parapatric/sympatric speciation with gene flow is also common throughout the tree of life.In such situation, local adaptation to continuous environmental gradients or varied ecological habitats can overcome the homogenization of gene flow and benefit long-term persistence of populations, ultimately lead to reproductive isolation by divergent natural selection (Lowry et al., 2019;Payseur & Rieseberg, 2016;Rundle & Nosil, 2005;Savolainen et al., 2013;Schluter, 2009).Repeated interactions between adaptive divergence and gene flow can generate complex patterns of genetic variation, and the description of how these driving forces interact with features of the genome differentiation remains challenging but is important for understanding the formation of biodiversity and development of conservation plans.
With the advances in high-throughput DNA sequencing technologies, it is now possible to study thousands of loci in order to characterize genome-wide effects of accumulating genetic divergence and genomic characteristics that influence the process of local adaptation.Genome-wide single nucleotide polymorphisms (SNPs) are traditionally used in many phylogeographic studies, which are improving our understanding of patterns of genetic diversity, facilitating the assessment of complex patterns of genetic structure and the estimates of ancestral recombination with modern demographic inference tools (such as FASTSIMCOAL2; Excoffier et al., 2013).It is meaningful for our identification of the effects of paleo-climate change and biogeographic barriers on the formation of diversity.Moreover, the development of landscape genomics, which integrates SNPs and ecological factors, has provided exciting opportunities to identify population genomic variation associated with current local adaptation (Bay et al., 2018;Landguth et al., 2014;Ruegg et al., 2018).Studies in this field aim to reveal the relationships between population adaptation and environmental heterogeneity by scanning and identifying genomics signals responsible for the local divergent selection (Joost et al., 2007;Li et al., 2017;Tigano & Friesen, 2016).In recent years, landscape genomics researches have made considerable progress in distinguishing the relative role of adaptive and neutral processes in the formation of genomic variation patterns and the influence of environmental variables on adaptive differentiation (Cao et al., 2020;Geraldes et al., 2014;Liang et al., 2022;Zhang et al., 2020).
Eastern North America (ENA) is geologically and ecologically complex, which resulted in a variety of plant diversity patterns that reflect numerous evolutionary processes (Soltis et al., 2006).It has long been known that repeated glacial-interglacial cycling during the Pleistocene had a remarkable influence on biodiversity in ENA by reducing optimal habitats and isolating or compressing taxa into southern refugia, and then expanding populations northward when favourable climates returned (Burbrink et al., 2022;Lascoux et al., 2004).In addition, major biogeographic barriers here, especially the Appalachian Mountains and the Mississippi River, are typically associated with changes in habitat and the formation of genetically distinct lineages/species (Rissler & Smith, 2010;Soltis et al., 2006;Wieringa et al., 2020;Zellmer et al., 2012).Although a number of phylogeographic studies have examined the genetic structure and population dynamics of endemics in ENA (Beatty & Provan, 2010;Campbell-Staton et al., 2012;Prior et al., 2020;Wang et al., 2023;Westergaard et al., 2019), the effects of heterogeneous landscapes on genetic differentiation have been rarely investigated (but see Moreira et al., 2023;Sandercock et al., 2022;Shen et al., 2022).
Ultimately, the key to resolving the processes that shape genetic variation across spatial scales is an integrated description of paleoclimate, geographical and heterogeneous environmental factors in the evolutionary history.
To better understand the current genetic structure and the formation of genetic diversity in plants from ENA, we established a project that combined phylogeographic and landscape genomics methods to address questions about the formation of genetic variation in Sanguinaria canadensis L., a perennial herb species of the monotypic genus Sanguinaria L., Papaveraceae.The plant is commonly known as 'bloodroot' because its rhizomes exude a bloodred latex when bruised or cut; the latex contains a number of biologically active alkaloids that have been investigated as potential drug candidates (Croaker et al., 2016).Bloodroot widely occurs in temperate deciduous forests of ENA, ranging from southern Ontario to eastern Texas in the west, and from northern Florida to New England in the east (Liao et al., 2019).The feature of petals and stems varies greatly among individuals from different populations, which may indicate transitional phenotypes related to geographic and ecological gradients (Kiger, 1997).Such a distribution pattern and the close association with temperate deciduous forests make it a good model with which to explore the formation and evolution of the temperate deciduous forest biome in ENA.
Bloodroot has a mixed-mating system, being able to self-pollinate or cross-pollinate (Lyon, 1992).During the initial 1-3 day of the flowering period, bees are the main pollinators, and if pollination has not occurred after this time, the stamens bend down contacting the stigma which results in self-pollination (Lyon, 1992).The lipid-rich seeds of S. canadensis are dispersed by ants (i.e.myrmecochory), which causes a patchy clumping of the plant throughout its distribution and a close genetic relatedness of local populations (Croaker et al., 2016;Pudlo et al., 1980).Additionally, like most spring ephemerals, bloodroot emerges upon the melting of snow, and takes advantage of the high levels of sunlight in spring to grow and reproduce briefly (only a month or so) before the formation of a canopy by woody plants (Vellend et al., 2017).Spring ephemerals are sensitive to environmental challenges, especially the mismatch with pollinator phenology under climate change, which may result in restricted gene flow and thus affect their evolutionary trajectory (Koski, 2022;Kudo & Cooper, 2019;Rivest et al., 2021).
Illustrating the population demographic history of S. canadensis

| Sampling and sequencing
In total, 403 individuals were collected from 59 populations ranging from 1 to 16 individuals per population, which cover the entire distribution range of Sanguinaria canadensis (Figure 1a; Table S1).performed to amplify ISSR regions from genomic DNA with MIGseq primer set-1.The 2nd PCR was carried out to add a unique index sequence to each 1st PCR product.PCR products qualified by electrophoresis were mixed in an equal volume to make a library.Then, the library was applied to sequencing on an Illumina MiSeq Sequencer (Illumina, San Diego, CA, USA) using a MiSeq Reagent Kit v3 (150 cycle, Illumina).
In addition, genome skimming was performed on seven samples of S. canadensis from different lineages and an Eomecon chionantha Hance for plastome assembly (Table S2).The high-molecular-weight DNA was sheared using a Covaris LE220 Focused-ultrasonicator (Covaris, Woburn, MA, USA), then the library was prepared using a NEBNext DNA Library Prep Master Mix Set for Illumina (New England Biolabs, Ipswich, MA, USA), followed by a size selection of 300-350 bp using the Agencourt AMPure XP system (Beckman Coulter, Shanghai, China).Finally, sequencing was conducted on the Illumina HiSeq 2500 platform (Illumina, San Diego, CA, USA) for 10 Gb raw data with 150-bp paired-end reads at the Beijing Genomics Institute (Shenzhen, China).

| Quality control and SNP calling
Paired-end sequencing data were trimmed to remove Illumina adapter sequences, low-quality reads and extremely short reads using FASTX-Toolkit 0.0.14 (http:// hanno nlab.cshl.edu/ fastx_ toolkit) and TagDust 1.13 (Lassmann et al., 2009).The trimmed 80 bp-nucleotide sequences from both ends of 300-800 bp forward-reverse amplicons of various ISSR regions were used as input data for SNP detection with the de novo map pipeline in Stacks 2.4 (Catchen et al., 2013).There were three major parameters that must be optimized: (1) the minimum number of raw reads required to form a stack (m) and (2) the number of mismatches allowed between stacks to merge them into a putative locus (M) in ustacks program; and (3) the number of mismatches allowed during the construction of the catalogue (n) in cstacks program (Paris et al., 2017).Following the protocol of Rochette and Catchen (2017), we investigated the parameter values by varying M and n from 1 to 9 (fixing M = n) when m = 3.Also, the optimal parameters were determined by plotted the number of polymorphic loci shared across 80% of samples (the r80 loci) and the distribution of the number of SNPs per locus.
The catalogue of reads was filtered using populations to produce a Variant Call Format (VCF) file, in which loci must be present in at least one population (−p 1) and 80% of individuals (−r 0.8) in each population, as well as only one random SNP was written per locus (−-write-random-snp). Loci were also removing when minimum allele frequency greater than 0.01 (−-min-maf 0.01) and observed heterozygosity greater than 0.7 (−-max-obs-het 0.7) as suggested by Rochette and Catchen (2017).The obtained SNPs were filtered again in VCFTOOLS v0.1.14(Danecek et al., 2011) with the following criteria: (1) deleted the individuals with a proportion of missing data >50% and depth of coverage (DP) < 5; (2) removed variant sites with max missing >80%.

| Plastome analyses
The genome skimming raw data were trimmed by removing adapters and low-quality reads, and then used for plastomes assembling in GetOrganelle v1.7.0 (Jin et al., 2020).All parameters were set to default.Preliminary annotation of assembled plastomes was carried out in GENEIOUS v9 (Kearse et al., 2012) with Papaver somniferum L. as reference (GenBank accession: NC_029434).The annotations were further corrected for the start/stop codons and intron/exon boundaries manually.
We conducted a phylogenetic analysis based on the whole sequence of plastomes with E. chionantha as the outgroup.Multiple sequence alignments were performed using MAFFT v7 (Katoh & Standley, 2013) under standard parameters, then visually inspected and manually adjusted in GENEIOUS.The Maximum Likelihood (ML) phylogenetic tree was inferred in IQ-TREE v.1.6(Nguyen et al., 2014).
The best nucleotide substitution model and base frequencies were automatically tested and compared by Bayesian information criterion (BIC).The 'K3Pu + F + I' model was selected and 1000 bootstrap replicates were generated to estimate the confidence in the resulting phylogeny.The Bayesian Inference analysis was performed in MrBayes XSEDE v3.2.7 (Ronquist et al., 2012).Two independent Markov chain Monte Carlo (MCMC) simulations were run for 10 6 generations, and sampling every 1000 generations.The first 25% of calculated trees were discarded as burn-in and the remaining trees were used to construct a consensus tree and associated clade posterior probabilities (PPs).

| Analyses of genetic diversity and genetic structure
We calculated genetic diversity indices including the observed heterozygosity (H o ), expected heterozygosity under Hardy-Weinberg equilibrium (H e ) and nucleotide diversity (π) using the populations module in Stacks.Only populations containing ≥4 individuals were included in this analysis.In addition, we also calculated unbiased π in pixy v1.2.7 to avoid systematic bias generated by missing data of various types, with a window size of 100 bp (Korunes & Samuk, 2021).
The input AllSites VCF file, which contains both invariant and variant loci for population samples, was generated using bcftools v1.12 (Li, 2011).Then, we conducted univariate linear regressions and quadratic regression analyses on these genetic parameters against both latitude and longitude using 'lm' function in R.
Population structure was assessed through a Bayesian clustering method in STRUCTURE v2.3.4 (Pritchard et al., 2000).The number of cluster K was tested from 1 to 10 through 10 independent runs with 10 5 iterations following a 5 × 10 4 burn-in.The ΔK method (Evanno et al., 2005) implemented in STRUCTURE HARVESTER (Earl & Vonholdt, 2012) was used to determine the best number of genetic clusters.Then, we performed a principal component analysis (PCA) to study the relatedness and clustering of populations or samples with default settings in PLINK v1.90 (Chang et al., 2015).

| Demographic simulation
In order to understand the evolutionary history of S. canadensis, we used the coalescent simulation method in FASTSIMCOAL v2.6 (Excoffier et al., 2013) to test different hypotheses for divergence time, migration rates, as well as the effective population size.A total of 9 scenarios were designed (Figure S1).In models 1-3, we set North, South and West lineages diverged first respectively followed by the split of the rest lineages, and migration only presented in T 1 -T 2 , which indicated a scenario of ancestral gene flow.On this basis, all migrations were transferred after T 1 corresponding to the model of isolation with secondary contact, which was shown in Models 4-6.Also, Models 7-9 verified the hypothesis of divergence with continuous migrations from T2 to present.
A folded site frequency spectrum (SFS) dataset was obtained as an input file with the easySFS.pyscript (https:// github.com/ isaac overc ast/ easySFS).Because only variable sites (1412 SNPs) were included in the SFS, we fixed the effective population size for the South lineage (N south ) to enable the estimation of other parameters in FASTSIMCOAL2 following the method of Ortego et al. (2017).
N south was calculated from N e = (π/4 μ), in which π was estimated from AllSites VCF file using pixy, and μ was presented by the average mutation rate of Papaveraceae (6.98 × 10 −9 per site per generation) from Guo et al. (2018).We used an average generation time of 6 years to calibrate models.We ran each model for 100 replicated times with 10,000 coalescent simulations and 20 expectation/conditional maximization (ECM) cycles.The optimal model was chosen by comparison of the Akaike Information Criterion (AIC) and Akaike's weight of evidence (Akaike, 1974;Thome & Carstens, 2016).We selected the point estimates from the run with the highest maximum composite likelihood, and obtained 95% confidence intervals of parameter estimates by 100 parametric bootstrap replicates from simulated SFS of the point estimates.
To identify potential historical distribution changes, we also projected ENMs to different historical periods with Maxent.Historical climate layers were obtained from the WorldClim with the same resolution, including the Last Interglacial (LIG, 130 kyr BP), Last Glacial Maximum (LGM, 22 kyr BP), and mid-Holocene (6 kyr BP) under the CCSM4 model.
We further carried out climatic niche differentiation tests to evaluate the extent to which different lineages were differentiated in their climatic niches.The West lineage, which only contained 2 populations was excluded from this analysis, because at least 3 populations were required in each group.We performed the "niche equivalency test" and the "niche similarity test" in the R package ECOSPAT (Di Cola et al., 2017).Both analyses were based on Warren et al.'s (2008) I and Schoener's (1968) D with each metric ranging from 0 (no niche equivalency/similarity) to 1 (full niche equivalency/ similarity).Each of the tests was run for 100 replicates.Niche differentiation between the two lineages was considered significant when both values were significantly approaching 0. Climatic niches were visualized in two dimensions using PCA and species density along the two climatic gradients was plotted by using the "plot.niche.dyn"function in ECOSPAT.

| Isolation by distance and environment
To illustrate the contribution of geography and environment on shaping genetic structure, the isolation-by-distance (IBD) and isolation-by-environment (IBE) analyses were performed using the Mantel test.The pairwise F ST /(1 − F ST ) matrix represents genetic distance among populations.For the geographic matrix, we calculated the pairwise least cost path distance (LCD) among each population using the SDM toolbox in ArcGIS.The LCD matrix was estimated by searching for single, optimal routes across resistance surfaces, and thereby assuming that organisms have complete knowledge of landscapes before dispersal (MacDonald et al., 2022).For the environmental distance matrix, we obtained Euclidean distances of eight bioclimate variables (bio2, bio5, bio7, bio8, bio9, bio13, bio15, and bio18) from each locality using the R package cluster.Both simple and partial Mantel correlations were performed in the R package vegan with 9999 permutation tests (Oksanen et al., 2019).
Then, redundancy analyses (RDA) were performed using vegan to further evaluate the effect of eight bioclimate factors on the spatial patterns of genetic variation.The Hellinger-transformed minor allele frequency (MAF) matrix was used as the response variable of RDA.Missing allele frequency were removed using 'na.omit' function.A forward selection procedure ('ordiR2step' function) was performed with 9999 permutations to identify the most important explanatory environmental variables solely on adjusted R 2 and p-value (<.05).Finally, six selected variables were included as the explanatory variables for RDA after scaling.An ANOVA-like permutation test was used to determine the significance of the overall model, each variable and axis with 9999 permutations.

| Potential loci related to local adaptation
In order to investigate the genetic signatures of local adaptation, tests of associated outlier loci with environmental variation were carried out using a latent factor mixed model in R package LFMM (Caye et al., 2019;Frichot et al., 2013) and BayPass v.2.2 (Gautier, 2015).
For LFMM analysis, we fixed the number of clusters as K = 3, and only SNPs with false discovery rates (FDR) below 0.01 were used as outliers.For BayPass, we ran the standard covariate model using a Markov chain Monte Carlo (MCMC) algorithm.BayPass accounts for hierarchical population structure or other models of population structure before identifying loci whose allele frequencies deviate significantly from the multivariate normal distribution and proceeds in two steps.Firstly, a covariance matrix of population allele frequencies (Ω) caused by demographic history (including potential migration or ancient admixture events) was estimated under the core model.And it allows scanning for differentiation without covariate using the XtX statistics as introduced by Günther and Coop (2013).
Then, the standard covariate model introduces a regression coefficient β ik to evaluate the association of each SNP with each environmental variable (covariate).A Bayes Factor (BF) is derived by comparing the alternative model (a specific SNP is associated with an environmental variable) and the null model (not associated).SNPs under selection were chosen with BF > 20.Only loci detected by both methods were identified as environmental adaptive loci.Gene functions of candidate loci were identified by online NCBI ESTs databases (BLASTX, Johnson et al., 2008), using the locus consensus sequences obtained in cstacks.

| Population genetic diversity and genetic structure
The optimal parameters of Stacks analysis were M = 2, n = 2, m = 3 for our dataset (Figure S2).After MIG-seq data processing (Table S3), we obtained an unlinked SNP dataset of S. canadensis populations, which consisted of 1412 SNPs for 403 individuals.Genetic diversity for each population was presented in Table S1.All genetic diversity indices (H o , H e , π and unbiased π) showed significant quadratic (Figure S3a-d) and linear (Figure S3e-h) correlation with latitude, but the quadratic models were more explanatory.However, none of correlation was presented between genetic diversity indices and longitude.It is significant that the genetic diversity of midlatitude populations was higher than that of upper/lower-latitude populations.Furthermore, peaks of genetic diversity were present somewhat at a lower portion of our species' latitudinal range (Figure S2a-d).
STRUCTURE analysis of genomic admixture with two groups (K = 2) had the best likelihood (Figure S4a).The major division of population structure at K = 2 was between populations distributed in the north of the Appalachian Mountains and those in the south part (Figure 1a), with admixture clearly evident for populations in the mid-latitude region (S02, S03, S09, S11, S32, S34, S35, S55) (Figure S4b).Moreover, when K = 3, populations can further divide into two sub-lineages along the Mississippi River (Figure 1a).
Admixture of the South lineage and West lineage were found in populations S23 and S24, populations S09, S32, S34 and S35 consisting of genetic components from the North and West lineages, and S11 and S55 showed admixture among three lineages (Figure 1b).

| Phylogenetic relationships based on plastomes
A total of eight complete plastomes, representing seven S. canadensis and an E. chionantha sample, were newly de novo assembled in this study.Their length ranged from 159,849 to 161,083 bp (Table S2).All plastomes contained 132 genes in the same order, of which 112 were unique genes, including 78 protein-coding gene sequences (CDS), 30 tRNA genes and four rRNA genes.The ML and BI phylogenetic trees based on complete plastome data resulted in the same topology, and all nodes had strong support values (>70%) (Figure 1d).With E. chionantha as the outgroup, XGC2 from the South lineage was the first diverged sample in S. canadensis (ML bootstrap support value (BS)/ Bayesian posterior probability (PP) = 100/1).Then S03 plus the clade of S29 and S37 formed successive topologies (100/1).Samples of the North lineage (S09 and S46) formed a well-supported monophyletic clade (73/0.99)and were sister to XGC1 from the West lineage (95/1).

| Past population dynamics
We explored possible divergence scenarios within S. canadensis using FASTSIMCOAL2.Nucleotide diversity (unbiased π) of South lineage estimated from polymorphic and nonpolymorphic loci was 0.0026; therefore, we fixed N south as 93,123.The best-supported demographic model (model 5) supported a scenario of isolation with secondary contact.It assumed that the South lineage was the first diverged lineage, and gene flow among the three lineages only presented after the split of the North lineage and West lineage (Table S4).According to this model (Figure 2, Table S5), the South lineage split from its ancestor at c. 0.67 Ma (95% HPD: 0.61-0.68Ma).
The divergence of the North lineage and the West lineage occurred at c. 0.52 Ma (95% HPD: 0.03-0.53Ma).After this time, there were bidirectional migrations among three lineages, and the migration rate between the North and South lineages was much lower than that between the North/West or West/South lineages.The current N e of North lineage (152,886) was about 1.5 times larger than that of South lineage (93,123), and the N e of the West lineage (52,236) was only one-third that of the North lineage.

| Ecological niche variation and differentiation
ENMs with eight bioclimate variables obtain high predictive power (AUC = 0.938).The predicted result in present was generally congruent with the observed distribution of S. canadensis (Figure 3a).
Distribution projections for LIG confirmed that the optimal ecological niche for this period expanded to the area over 50° N, but the area northwest of the Appalachian Mountains and south of the Great Lakes seemed to be inhospitable (Figure 3b).A significant suitable habitat during the LGM appeared along the present-day Gulf and Atlantic coasts, as well as in the southern Appalachians, providing support for southern refugia (Figure 3c).Until the mid-Holocene, the suitable distribution area had expanded to a similar size to the present (Figure 3d).In general, paleodistribution models showed significant range shrinkage to lower latitude regions during the glacial period and northward expansion in post−/interglacial periods.
In addition, we implemented climatic niche differentiation tests for the North lineage and South lineage.For both equivalency and similarity test, the values observed for Warren's I (0.2682) and Schoener's D (0.1612) were close to zero and significantly different from the expectation based on the null hypothesis (t-test, p < .05)(Figure S5a).The densities of the North lineage and South lineage were partially separated along climatic gradients (i.e.PC1 and PC2), which also indicated that climatic niches have been partitioned to some extent (Figure S5b).

| IBD and IBE
The results of simple Mantel tests revealed the impact of environmental distance (r = .3144,p = 1.0e−4) was slightly greater than geographical distance (r = .3044,p = 1.0e−4) on genetic differentiation (figure not shown), which was also supported by partial Mantel tests (IBE: r = .3114,p = 2.0e−4; IBD: r = .1541,p = 2.2e−3) (Figure 4a,b).To evaluate the relative contribution of different bioclimate factors on genetic variation, we further performed RDA.After removing the missing allele frequency, 493 loci were included in this analysis.The forward selection of eight environmental variables identified six significant predictors (bio2, bio5, bio7, bio8, bio13 and bio18), which represented 42.59% genetic variation.The first/second RDA axis explained 76.11% and 7.83% of variation respectively and the three lineages can be clearly distinguished (Figure 5).Temperature annual range (bio7) and temperature annual range (bio8) were positively correlated with genetic variation in North lineage and negatively correlated with genetic variation in South and West lineages, while the rest environmental factors showed the opposite pattern.A permutation test of each significant variable indicated that temperature annual range (bio7) was the most significant predictor in both RDAs followed by mean diurnal range (bio2) (Table 1).

| Environmental adaptive loci
Totally 99 outliers were detected by LFMM with FDR < 0.01, while 68 outliers were discovered by BayPass with BF > 20.Among them, 23 loci were identified by both programs, and only two loci were successfully annotated by BLASTX (Table 2).Both of them were transposable elements (TEs) and were associated with temperature annual range (bio8).The locus 1256 was homologous to transposon Tf2-12 polyprotein (Tf2-12), a member of the retrotransposable   S5.

Sanguinaria canadensis
Geographical isolation followed by genetic drift and natural selection may have led to genetic divergence among populations (Hewitt, 1996).It may also facilitate the development of genetic incompatibility and the potential for reproductive isolation and speciation, thus clades identified in phylogeographic studies may correspond to incipient species (Avise et al., 1998;Carstens & Knowles, 2007).If so, the description of genetic variation patterns within a species will improve our understanding the speciation process, as well as the formation of diversity in biomes.In this study, on the basis of 1412 high-quality SNPs obtained from MIG-seq, we have detected three distinct genetic lineages within S. canadensis by STRUCTURE and PCA (Figure 1a-c).Specifically, this phylogeographical pattern of differentiation was coincident with the Appalachian Mountains and the Mississippi River, which has been identified in many animal and plant species (Austin et al., 2004;Barnard-Kubow et al., 2015;Brant & Orti, 2003;Li et al., 2013).
In these researches, such divergence pattern is explained by independent origin from multiple southern refugia close to the south Appalachian Mountains and on both sides of the Mississippi River (Barnard-Kubow et al., 2015;Prior et al., 2020;Soltis et al., 2006).
The suitable habitat projected by ENM provides evidence to support the presence of southern refugia for S. canadensis along the Gulf and Atlantic coasts, and also in the southern Appalachians (Figure 3).
It is no doubt that refugia in these regions sheltered the Southern and Western lineages.And the current southern distribution limit of Northern lineage appears at the edge of climatically suitable areas in LGM, which indicated that there might be also refugia for populations of Northern lineage.Moreover, the absence of gene flow in the early stages of lineage differentiation (Figure 2) indicates that isolation has already occurred in different refugial populations before the recolonization along separate routes in the post-glacial stage, which was consistent with the refugial speciation model (Moritz et al., 2000).
The divergence between the South lineage and the ancestor of North and West lineages was estimated to be during the Pleistocene (0.67 Ma, 95% HPD: 0.61-0.68Ma) on the basis of an optimal coalescent model from FASTSIMCOAL2, and North/West lineages divergence across the Mississippi River subsequently happened at 0.52 Ma (95% HPD: 0.03-0.53Ma) (Figure 2).Importantly, the divergence times correlated with the onset of full-scale North American glaciations, which caused many organisms in ENA to contract their distribution southward (Soltis et al., 2006;Zachos et al., 2001).
Combined with the ENM results (Figure 3), a classical glacial southward contraction and interglacial norward expansion pattern was also suitable to explain the distribution of genetic variation within S. canadensis.Interestingly, we found that genetic diversity was better revealed by a hump-shaped model than by a decreasing pattern from south to north (Figure S3).It provided genetic evidence for the central-marginal hypothesis along the latitudinal gradient, although those at the lower latitudinal limits still hold higher diversity than those at higher latitudinal limits (Guo, 2012).It is no doubt that interglacial south-to-north recolonization can shape a decline in genetic diversity (Pfeifer et al., 2009).On the other hand, the geographical disparity in current or past climatically suitable habitats and topographical barriers could also change the pattern of genetic variation (Keppel et al., 2012;Ortego et al., 2012).
Taken together, it is the Pleistocene climatic fluctuation along with geographical isolation driving habitat contraction and expansion that is likely to have initially shaped the current population structure, which set the groundwork for further divergence.

| Secondary contact caused by repeat range shifts
Despite the obvious isolation effect of these geographical barriers, admixture populations were also observed within S. canadensis (Figure 1).A common explanation is that secondary contact occurred between divergent lineages during the fluctuating Pleistocene climatic cycles (Finger et al., 2021).Biogeographical barriers were generally regarded as an antergic force to migration, although populations remain interconnected along the arrangement of mountain, woodland, and riparian ranges which provided the habitat connectivity necessary to maintain gene flow (Hernández et al., 2022;Sork et al., 2010).Remarkably, the emergence of gene flow was probably greatly influenced by its breeding system.The seed of S.
canadensis is spread by ants, making the offspring prone to patchy aggregation (Pudlo et al., 1980), however, the species can produce pollen flow among distant populations by bees within an obligate  cross-pollination period at the beginning of flowering (Lyon, 1992).
Therefore, S. canadensis is more likely to display genetic admixtures in contrast to some sympatric herbs that are pollinated by weak-flying flies and beetles or that exhibit higher levels of selfcompatibility, such as Trillium erectum (Sage et al., 2001).Both the plastome phylogenetic tree using E. chionantha as the outgroup and

| Ecological adaptive differentiation driven by heterogeneous environments
Recent studies have shown that soft ecological divides such as heterogeneous environmental factors may have a greater impact on diversification and speciation than the traditional hard allopatric geographical barriers (Finger et al., 2021;Myers et al., 2019).
Indeed, our study presented a greater effect of IBE than IBD within S. canadensis (Figure 4).Among the six bioclimate factors which had significant influence on genetic variation, temperature-related variables, especially Temperature Annual Range and Mean Diurnal Range, exhibited stronger contribution than precipitation-related variables (Figure 5 and Table 1).It is generally believed that spring ephemerals were impressible to the climate changes, because their growing season was only about a month after the snow melts (Vellend et al., 2017), and extending or contracting this brief window of opportunity by even just a week would, therefore, represented a substantial influence to their growing season (Lapointe, 2001).Thus, as an accelerant of snowmelt, temperature changes in different regions likely resulted in differences in flowering time among populations, which may act as an important hindering force for gene flow and further enhance population differentiation.
Our study confirmed that environmental divergence can not only promote genetic differentiation among populations but also reflect ecological adaptation.Two of 23 environmental adaptive loci, which were associated with maximum temperature of the warmest month, have been annotated to transposon related genes (Tf2-12 and TY3B-I; Table 2).TEs have previously been confirmed to play a key role in environmental adaptation because of their capacity to rapidly create genetic diversity required for survival in stressful situations (Casacuberta & González, 2013;Li et al., 2018;Oliver & Greene, 2009).They can act as vectors to promote the horizontal transfer of new genetic content (Frost et al., 2005) or jump into non-coding regions then regulate the expression levels of genes (Lisch, 2013;Niu et al., 2019).Whether they do or do not transfer genes, horizontal transfer of TEs is an important source of genetic variation, which can influence the ability of the organism to adapt to changing environment, and to colonize new ecological niches (Casacuberta & González, 2013;Schaack et al., 2010) compounding the effects of niche divergence, and these isolated populations may gradually occupy unique ecological niches (Rundle & Nosil, 2005).Actually, the climatic niche of the North lineage has partially diverged from that of the South lineage (Figure S5).
It is a proven fact that the environmental and climatic fluctuations triggered by mountains can always lead to the divergent selection and species adaptation associated with dramatic ecological niche changes (Ren et al., 2017).Nevertheless, due to the small population numbers of the West lineage, it is impossible for us to predict whether its niche diverged from the other two lineages.Future research with expanded sampling is necessary to verify if the ecological niche differentiation along rivers has the same effect as that of mountains.In a word, these results provide compelling evidence that S. canadensis currently experiences adaptive differentiation driven by heterogeneous environments.

| Implication for conservation
Currently, global biodiversity is under a serious threaten because of climate change, habitat fragmentation and overexploitation (Ackerly et al., 2010;Johnson et al., 2017).The importance of understanding the distribution of genetic variation and its formation mechanism across spatiotemporal scales is widely recognized for biodiversity conservation (Mamoozadeh et al., 2023;Willis & Bhagwat, 2009).
In management practices, conservation units are often defined based on the determination of population distinctiveness, estimation of genetic diversity, test of population bottlenecks, measuring landscape connectivity and consideration of adaptive differentiation (Flanagan et al., 2018;Hendricks et al., 2020;Samad-zada & Rehan, 2023).Benefitted from the use of genomic SNPs and plastome data, we have identified three genetic lineages (i.e.North, South and West lineages) within S. canadensis, which were separated by the Appalachian Mountains and the Mississippi River.Given that the three lineages are genetically quite different and exhibit adaptabilities to the local environment, conservation efforts should be focus on maintaining their respective genetic diversity, especially for the West lineage, which have the smallest effective population size comparing to the others (Figure 2).Moreover, environmental heterogeneity in most cases is a well-known factor supporting greater genetic and species diversity (Vellend & Geber, 2005).Therefore, particular attention should be paid to protect climate heterogeneous regions and their connectivity, which is expected to increase the opportunity of dispersal events and facilitate the timely occurrence of local adaptation to keep up with a rapidly changing climate (Ackerly et al., 2010).

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
under paleoclimate changes and its adaptation to heterogeneous environment is thus essential for future biological conservation and management work.In this study, on the basis of genome-wide SNPs generated by multiplexed inter-simple sequence repeats genotyping by sequencing (MIG-seq; Suyama & Matsuki, 2015) and chloroplast genome (plastome) sequences, we performed phylogeographic and landscape genomics analyses to understand the process and mechanism of genetic divergence within S. canadensis, the first study of its kind for this species.Our aims are to: (1) identify distinct lineages and their geographical distributions; (2) trace the demographic history of the species under paleoclimatic change; and (3) clarify the driving forces of local environmental selection on ecological adaptive differentiation.
Total genomic DNA was extracted from silica gel dried leaf samples using Plant DNAzol Reagent (Thermo Fisher Scientific, Waltham, MA, USA) following the manufacturer's protocol.Population sequencing was performed using the technology of MIG-seq, which can detect of SNPs in the absence of a reference genome.The DNA library was prepared through a two-round PCR experiment following the method of Suyama et al. (2022).The 1st PCR was F I G U R E 1 Geographic distribution and genetic structure of Sanguinaria canadensis.(a) Geographic distribution of 59 populations.Pie colours correspond to the STRUCTURE assignment result at K = 3; (b) Histograms of individual ancestry assignment result from STRUCTURE analyses when K = 3; (c) Plot of the first two dimensions of PCA.The oval dotted line represents 95% confidence interval; (d) Maximumlikelihood phylograms of S. canadensis and Eomecon chionantha generated from the whole length of plastome sequences with ML bootstrap support value (BS)/Bayesian posterior probability (PP) given at each node.Asterisks on the nodes mean BS/PP = 100/1.
duplicate records within 20 km to one another to reduce the effects of spatial autocorrelation, 129 records were retained.Climate data of these points were retrieved from the 19 bioclimate variables downloaded from the WorldClim at 2.5 arc-minutes.Pearson's correlation analysis in R was used to select bioclimate variables with a correlation coefficient < .8 to prevent potential overfitting.Eight variables were chosen, including bio2, Mean Diurnal Range; bio5, Max Temperature of Warmest Month; bio7, Temperature Annual Range; bio8, Max Temperature of Warmest Month; bio9, Mean Temperature of Driest Quarter; bio13, Precipitation of Wettest Month; bio15, precipitation seasonality; and bio18, Precipitation of Warmest Quarter.ENM was generated using 20 cross-validated replicate runs with a maximum number of background points of 10 4 , a maximum of 1500 iterations, and a convergence threshold of 10 −5 .The prediction capability of the model was assessed by calculating the areas under the curve (AUC) of the receiver operating the exception of several samples from S32, S34 and S35 clustered with West lineage, 95% individuals of North lineage can separate from South/West lineage along PC1.The separation of the South lineage and the West lineage was evident along PC2.Combining the results of STRUCTURE and PCA, we divided all samples into three lineages, that is North lineage, South lineage, and West lineage, which showed a pattern of isolation by the Appalachian Mountains and Mississippi River.

F
Schematic representation of the best hypothesized historical demography of Sanguinaria canadensis.T 1 refers to the divergence time of the North and West lineages.T 2 refers to the divergence time between the South lineage and the ancestor of the North and West lineages.Effective population sizes of present lineages (N south , N west , N north ) and hypothetical ancestral populations (N NW , N anc ) are marked in Polygons.Lines with arrows indicate the directions of gene flow, and the migration of individuals per generation (M NE , M EN , M NW , M WN , M WE , M EW ) were marked above/ below the lines.See estimates of each parameter with 95% CI in Table

F
I G U R E 4 Geographical (a) and environmental (b) effects on genetic variation of Sanguinaria canadensis populations revealed by partial Mantel tests.RDA biplot of six important bioclimate variables on RDA1 and RDA2 axes included in the best model of forward selection analysis.The dots represent population locations and are coloured based on lineages.
the ancestral recombination tree obtained from FASTSIMCOAL2 have inferred the same topology of lineages divergence and demonstrated a closer relationship between the North lineage and West lineage on either side of the Mississippi River than they were to the South lineage located in the Appalachians (Figures1 and 2).Besides, admixed populations between West/South lineages or North/ South lineages occur only in boundary areas of their geographical barrier, whereas S32, a population of North lineage distributed in Ann Arbor showed genetic admixture with the West lineage, which may be the result of long-distance transmission of gene flow.It seems that pollinators can carry pollen across rivers more easily than across high mountains, which may also be supported by unbalanced levels of gene flow among different lineages (Figure2).
This is the first research that integrated molecular phylogeography and landscape genomics to examine the formation mechanism of genetic variation within S. canadensis.Our findings suggested the genetic divergence within S. canadensis was primarily triggered by the combination of Pliocene climatic oscillation and geographical barriers, despite gene flow increased between pairs of lineages with secondary contacts mediated by repeat range shifts.Moreover, local adaptation due to heterogeneous environments enhanced such differentiation and promoted ecological niche divergence.In summary, our study highlighted the role of neutral and adaptive processes in shaping plant diversity in temperate deciduous forests of eastern North America.According to the genetic lineages found in this study, we recommend that S. canadensis should be managed as three separate conservation units.Still, a more detailed work of adaptive evolution based on reference genome would allow us to assess the impact of ecological processes on population genome differentiation, particularly the responding mechanisms of natural selection.ACK N O WLE D G E M ENTSWe sincerely thank Dr. Shen-Yi Wang, Dr. Zhe-Chen Qi, Dr. Chih-Chieh Yu, Dr. Shi-Chao Chen, Miss Ren-Yu Liao, Mr. Yi-Biao Zou, Mr.Xin Wang, Miss Su-Jing Xiao for their help with sample collection, and Prof. Douglas E. Soltis for valuable comments and suggestions.FU N D I N G I N FO R M ATI O N This research was supported by the National Science Foundation of China (31970225), the Science and Technology Research Program of Chongqing Municipal Education Commission (KJQN202301332) and the Foundation for High-level Talents of Chongqing University of Arts and Science (R2022YS09).
Annotation information of two environment adaptive loci.
. Because of the extensive distribution range of S. canadensis, favourable adaptations were likely to entrench in the population at a faster rate,