Genomic insights into historical population dynamics, local adaptation, and climate change vulnerability of the East Asian Tertiary relict Euptelea (Eupteleaceae)

Abstract The warm‐temperate and subtropical climate zones of East Asia are a hotspot of plant species richness and endemism, including a noticeable number of species‐poor Tertiary relict tree genera. However, little is understood about when East Asian Tertiary relict plants diversified, how they responded demographically to past environmental change, and to what extent their current genomic composition (and adaptive capacity) might mitigate the effects of global warming. Here, we obtained genomic (RAD‐SNP) data for 171 samples from two extant species of Euptelea in China (24 E. pleiosperma populations) and Japan (11 E. polyandra populations) to elucidate their divergence and demographic histories, genome‐wide associations with current environmental variables, and genomic vulnerability to future climate change. Our results indicate that Late Miocene changes in climate and/or sea level promoted species divergence, whereas Late Pliocene uplifting in southwest China likely fostered lineage divergence within E. pleiosperma. Its subsequent range expansion into central/east (CE) China bears genomic signatures of climate‐driven selection, yet extant CE populations are predicted to be most vulnerable to future climate change. For E. polyandra, geography was the only significant predictor of genomic variation. Our findings indicate a profound impact of Late Neogene geological and climate change on the evolutionary history of Euptelea, with much stronger signals of local adaptation left in China than in Japan. This study deepens our understanding of the complex evolutionary forces that influence the distribution of genetic variation of Tertiary relict trees, and provides insights into their susceptibility to global change and potential for adaptive responses. Our results lay the groundwork for future conservation and restoration programs for Euptelea.


| INTRODUC TI ON
The warm-temperate and subtropical climate zones of China and south/central Japan are a hotspot of plant biodiversity in East Asia (Qian & Ricklefs, 2000;Qiu, Fu, & Comes, 2011;Wang, 1992). In both regions, habitats associated with mountain riparian forests (MRFs) in particular feature high levels of species richness and endemism, including a noticeable number of species-poor Tertiary relict tree genera (e.g., Cercidiphyllum, Euptelea, Eurycorymbus; Tang & Ohsawa, 2002;Wei et al., 2009). The MRF is also among the most threatened of all forest types in East Asia because of its high susceptibility to natural or human-mediated erosion (Wei et al., 2009).
The evolutionary and population demographic history of East Asia's Tertiary relict flora (Cercidiphyllum: Qi et al., 2012), or components of its affiliated MRF community (e.g., Meng, Wang, & Wang, 2016;Sun et al., 2011;Xing & Ree, 2017), has previously been associated with environmental changes since the Late Miocene (including changes in climate, topography, drainage systems, and sea level) using traditional molecular markers (e.g., DNA Sanger sequencing and microsatellites). For instance, Cercidiphyllum japonicum  has been cited as "more dynamic in history than previously thought" (cf. Mao , and the same is true for other Tertiary relicts showing a similarly wide geographic distribution (e.g., Euptelea: Cao, Comes, Sakaguchi, Chen, & Qiu, 2016;see below). However, given their limited number of variable loci, previous studies using traditional markers might be inadequate for characterizing the current genomic composition of species with complex demographic histories Hancock et al., 2011). In consequence, further studies are required to disentangle the relative roles of historic-environmental (geographical, climatic) and contemporary factors (e.g., drift, gene flow) in shaping the genomic architecture of East Asia's Tertiary relict tree species.
Such Tertiary relicts apparently persisted over long periods of geological time (in line with the concept of "living fossils" ;Lidgard & Love, 2018). Moreover, in view of their constantly changing and often isolated MRF habitats (Wei et al., 2009), populations of these relicts are predicted to show signs of genetic impoverishment due to drift and/or limited gene exchange, thereby increasing their vulnerability to ongoing climate change Yannic et al., 2014). This raises the question of how population genetic diversity relates to their alleged potential to endure environmental changes through adaptation (see also Parmesan, 2006;Parmesan & Yohe, 2003;Wiens, 2016;Yannic et al., 2014). In this era of rapid anthropogenic climate change, exploring such relationships is crucial to improve predictions of species' climate vulnerability Hoffmann & Sgro, 2011) and inform future conservation and restoration programs (Landguth et al., 2014;Ruegg et al., 2018).
In recent years, high-throughput sequencing technologies, such as restriction site-associated DNA sequencing (RADseq), have made it possible to rapidly collect genomic data and abundant single nucleotide polymorphisms (SNPs) in nonmodel organisms with increasing reliability and without prior information of a reference genome (Savolainen, Lascoux, & Merilä, 2013). When combined with approximate Bayesian computation (ABC), the RADseq approach in particular has proven useful in facilitating assessments of complex genetic structures and key demographic parameters, such as times since population isolation, postdivergence admixture rates, or changes in effective population size through time (e.g., Parchman, Jahner, Uckele, Galland, & Eckert, 2018).
Moreover, the integration of RADseq and environmental (geographical, climatic) data provides exciting opportunities to identify population genomic diversity associated with current local adaptation or even vulnerability to future climate change Fitzpatrick & Keller, 2015;Landguth et al., 2014;Ruegg et al., 2018).
f. et Thoms. and E. polyandra Sieb. et Zucc. (Cao et al., 2016). The distribution of E. pleiosperma extends from the southeastern margins of the Qinghai-Tibetan Plateau (QTP)/Hengduan Mts. Region (HMR) to central/east China, with populations occurring in isolated stands of MRF across a wide range of altitudes (c. 700-3,600 m above sea level). By contrast, E. polyandra is restricted to south/central Japan where it occurs in similar habitats of lower altitude (c. 100-1,600 m above sea level) (Sakai et al., 1995). Surprisingly, both species are still classified as "Least Concern" by the IUCN (International Union for Conservation of Nature; https://www.iucnr edlist.org), although it has long been recognized that they are at risk of loss of their MRF habitats (Sakai et al., 1995;Wei et al., 2009).
Overall, this study aims to further clarify (a) when and how the two extant species of Euptelea diverged; (b) how they responded demographically to past environmental change; (c) to what extent historical, geographical, and/or climatic factors contribute to their current genomic variation; and (d) which populations of E. pleiosperma might be most vulnerable to future climate change. Hence, for comparison with our earlier study using plastid/nuclear DNA sequences and nSSR loci (Cao et al., 2016), our first objective was to use ABC simulations to determine the best model of population divergence and demographics that fits the patterns of RAD-SNP diversity observed in E. pleiosperma/E. polyandra. Our second objective was to use a generalized dissimilarity model (GDM) framework (Manion et al., 2018) to explore the relative importance of environmental (geographical, climatic) factors underlying the neutral genomic variation of each species. In addition, by means of "F ST outlier" tests, we scanned their genomes for signatures of climate-driven local adaptation. Finally, we adopted a gradient forest (GF) approach (Ellis, Smith, & Pitcher, 2012) to predict the genomic composition of Euptelea populations under current and future (2050) climate scenarios (Fitzpatrick & Keller, 2015). Together, our results provide robust inferences about the historical population dynamics and adaptive capacity of an emblematic East Asian Tertiary relict genus, along with predictions of its "genomic vulnerability" to future climate change (sensu Fitzpatrick & Keller, 2015).
Any RAD reads containing sequencing errors in the sample-specific barcodes and restriction cut sites were removed. Nucleotide bases with a Phred quality score (Q) below 33 were replaced with an ambiguous base ("N"), and reads with more than 10% "N"s were discarded. Filtered reads of each individual were first assembled de novo into putative loci. For within-sample clustering, sequences were clustered at 90% similarity. In order to ensure accurate base calls, only clusters that had a minimum depth of coverage ≥ 6 were retained. After clustering, error rate (E) and heterozygosity (H) were jointly estimated from the base counts in each site across all aligned clusters for each sampled individual (Lynch, 2008), and the average parameter values were used when calling consensus bases. Bases that could not be assigned with ≥ 95% probability in the consensus sequences were replaced with appropriate ambiguity code (N) in the consensus sequence. In addition, loci containing more than two alleles after error correction were excluded as potential paralogs since both Euptelea species are diploid. Consensus sequences were then clustered across samples at 90% similarity and aligned with muscle v3.8.31 (Edgar, 2010). A final filtering step excluded loci that contain any site appearing heterozygous across more than 25% of samples (Table S2), as this is more likely to represent a fixed difference among clustered paralogs than a true polymorphism (Hohenlohe, Amish, Catchen, Allendorf, & Luikart, 2011). The remaining clusters representing multiple alignments of putative orthologs were treated as RADseq loci and assembled into population genomic data matrices. We kept only one SNP per RADseq locus to create a dataset without closely linked loci.
To explore the effect of missing data (locus dropout or low coverage) and ensure enough SNPs for analyses, we assembled three data matrices with different minimums for sample coverage (the number of samples for which data must be recovered to include a RAD locus in the dataset): (a) the data matrix that includes all loci shared across at least 85 samples ("maximum" dataset); (b) the median data matrix that contains all loci shared across at least 120 samples ("median" dataset), and (c) the data matrix that includes all loci shared across at least 160 samples ("minimum" dataset). Following a previous study's suggestion (Paris, Stevens, & Catchen, 2017), a locus was kept only if it occurred in at least 60% of samples within each population to ensure a wide representation of each SNP across all sampling sites. We designated a 1% minor allele frequency (MAF) cutoff to the three datasets. To evaluate whether our inference of population structure is robust to missing data and rare alleles, we performed population structure analyses using datasets with and without filtering loci with MAFs < 0.01.

| Population genetic structure and diversity
Bayesian clustering of individuals was conducted for the three Euptelea datasets ("maximum," "median," and "minimum") with and without filtering loci with MAFs < 0.01 using faststructure v1.0 (Raj, Stephens, & Pritchard, 2014). The number of clusters (K) was set to vary depending on the dataset. The most probable values of K for explaining population structure were determined by estimating the minimum value of K that accounts for almost all of the ancestry in the three datasets and maximizes the log marginal likelihood lower bound. In addition, for each of the three datasets, population structure was also investigated by discriminant analysis of principal components (DAPC) for Euptelea and each species, respectively, using the r package adegenet (Jombart, 2008). The optimal number of clusters was chosen on the basis of the lowest associated Bayesian information criterion (BIC). Finally, we subjected each of the three SNP matrices to a maximum likelihood (ML) tree inference analysis in raxml-hpc v7.2.8 (Stamatakis & Ott, 2008) under the general time-reversible (GTR) substitution model and with 11 individuals randomly selected from each E. polyandra population as outgroups and missing data coded as "N"s.
Based on the putatively neutral RAD-SNP loci of the "minimum" dataset, that is, 8,733 SNPs (see Results), mean nucleotide diversity (π) and average expected and observed heterozygosities (H exp /H obs ) were calculated for each population with n ≥ 5 using arlequin v3.5 (Excoffier & Lischer, 2010). For E. pleiosperma, we also calculated π, H exp , and H obs from only the 49 outlier loci identified (see below and Results). Five Euptelea populations (BM, TM, DG, DQ, and SA) with small sample size (n < 5) were removed from all population-level analyses of genetic diversity (marked with asterisks in Table S1). For E. pleiosperma and E. polyandra, measures of genetic diversity (π and H exp /H obs ) were regressed against latitude and longitude, respectively, using the ggplot2, iswr, and scales packages implemented in r v3.3 (R Development Core Team, 2015) to test the hypothesis that suites of environmental conditions could promote or constrain different levels of genetic diversity. Analysis of molecular variance (AMOVA) in arlequin was used to quantify the genomic variance among species and populations, with significance of Φ-statistics tested using 10,000 permutations (Excoffier & Lischer, 2010).

| ABC modeling of divergence and demographic histories
We used the coalescent-based approximate Bayesian computation (ABC) implemented in diy-abc v2.0 (Cornuet et al., 2014) to infer the divergence and demographic histories of Euptelea. To avoid the impacts of missing data and removal of rare alleles on the inference of population history, we performed our ABC analysis on a high-quality data matrix without missing data and without filtering loci with MAFs < 0.01 (the "full" dataset, i.e., 1,383 SNPs) across all 171 samples, as the two extant Euptelea species apparently underwent climate-induced expansions (Cao et al., 2016;Wei, Sork, Meng, & Jiang., 2016) and were thus expected to have an excess of rare alleles (Excoffier, Dupanloup, Huerta-Sánchez, Sousa, & Foll., 2013). We tested five plausible divergence scenarios on the basis of the genetic structure identified by faststructure and DAPC (see Results): the simultaneous divergence of three regional groups (i.e., southwest China: SW; central/east China: CE; and Japan: JP) from a common ancestor ( Figure S1a, Scenario 1) against three alternative models, reflecting all possible relationships among these groups (see Scenarios 2-4 in Figure S1a), and an admixture model, in which the SW and JP groups diverged from an ancestral population at time t 1 , followed by an admixture event between them at time t 2 , which then gave rise to the CE group with admixture rate ra (Scenario 5 in Figure S1a). We selected all summary statistics (Table S3) of genetic variation to generate reference tables with 10 6 simulated datasets for the five models. The parameters describing each model (i.e., divergence times, admixture rate, and effective population sizes; Table 3) were treated as random variables drawn from uniform prior distributions with a restriction on temporal parameters (t 1 > t 2 ).
First, we performed leave-one-out cross-validation using neural network method for model selection via the "cv4postpr" function in the "abc" R package to evaluate whether model selection with ABC is able to distinguish between the five proposed models by making use of the existing simulations from diy-abc (Csilléry, François, & Blum., 2012). Next, we calculated the posterior probabilities of each demographic scenario using the multinomial logistic regression and neural network methods implemented with the function "postpr" across a range of tolerances (0.001, 0.005, 0.001, 0.05) (Csilléry et al., 2012). The goodness of fit of the scenario with the highest posterior probability was assessed using the "model checking" option with principal component analysis (PCA) in diy-abc, which evaluated the discrepancy between the model and the observed data (Tsuda, Nakao, Ide, & Tsumura., 2015).
We then used the function "cv4abc" to evaluate the accuracy of ABC parameter estimates and the robustness of the estimates to tolerance rates. The accuracy of parameter estimates was evaluated under tolerance rates of 0.001, 0.005, and 0.01 using the rejection, the local linear regression, and neural network methods. Finally, because of a lower prediction error rate (see results), a local linear regression was used to estimate the posterior distributions of parameters for the best-fitting scenario on 1% of the simulated datasets closest to the observed dataset and applying a logit transformation to parameter values (Beaumont, Zhang, & Balding, 2002). To convert estimated divergence times into millions of years ago, we assumed a conservative generation time of 10 years for Euptelea (Sakai et al., 1995).
Additionally, we used diy-abc to investigate past changes in population size in each regional lineage of E. pleiosperma (SW and CE) and E. polyandra. We tested three simple models of population size changes: (a) population growth following a constant population size ("expansion"); (b) expansion followed by shrinkage ("shrinkage"); and (c) expansion followed by shrinkage and a new expansion event ("expansion-shrinkage-expansion") ( Figure S1b; Wang et al., 2016).
We used the same strategies as detailed above to choose the demographic scenarios that best fit the data and estimated the parameters of interest.

| Generalized dissimilarity model (GDM) of genomic, geographic, and climatic data
To evaluate the effects of geographic distance and environmental dissimilarity on genetic differentiation, we fit generalized dissimilarity models (GDMs; Manion et al., 2018) to our "minimum" dataset (8,782 RAD-SNPs) (see Results) for the 22 E. pleiosperma populations (n ≥ 5) and the eight E. polyandra populations (n ≥ 5), respectively.
GDM is a nonlinear extension of matrix regression that models spatial patterns of pairwise genetic dissimilarity between sampling sites caused by pairwise site differences in environmental and geographic variables (Fitzpatrick & Keller, 2015). For each species, we constructed three site-by-environment predictor matrices from values of 19 bioclimatic variables (Table S4) extracted at each locality from GIS data layers at 30 arc-sec resolution ) that we downloaded from WorldClim (http://www.world clim.org). To retain only the predictors that significantly contributed to the model in each GDM analysis, we employed a backward elimination procedure (Ferrier, Manion, Elith, & Richardson, 2007). Starting with the full model, this process iteratively removes the variable with the lowest coefficient, recalculates the model fit, and uses a variable permutation procedure to assess significance. Under the permutation procedure, the significance of the model is tested by permuting all predictor variables, refitting the model under each permutation to generate a null distribution of deviance explained values, and then comparing the data-driven model to the null distribution. The significance of each predictor variable is tested by permuting each variable individually to generate a null distribution of the change in deviance explained for the model and comparing each variable's contribution to the model against the null distribution. The final outcome is a fitted model that retains only the statistically significant predictor variables (Manion et al., 2018).
Based on the 8,733 neutral RAD-SNPs of our "minimum" dataset, we generated two response matrices of AMOVA-derived Φ ST values between pairs of populations for the 22 E. pleiosperma populations (n ≥ 5) and the eight E. polyandra populations (n ≥ 5) (Table S1), respectively, using arlequin. Then, we fit GDMs to the response and predictor matrices and used the resulting models to explore the spatial and climatic drivers of differences in genetic turnover. To estimate the relative genetic importance of each predictor, we adjusted the maximum values of the fitted I-splines to a range from −1.5 to 1.5. We used the r package gdm (Manion et al., 2018) to fit models and assessed model performances by computing percent deviance explained.

| Detecting signatures of climate-driven local adaptation
We scanned the "minimum" dataset for outlier loci in E. pleiosperma and E. polyandra (populations with n ≥ 5; Table S1), respectively, using the Bayesian approach implemented in bayescan v2.1 (Foll & Gaggiotti, 2008) and the nonhierarchical model implemented in arlequin (Excoffier & Lischer, 2010). As a fully Bayesian approach, bayescan directly estimates the posterior probability that each locus is under selection by decomposing locus-population F ST coefficients into a locus-specific component (alpha) shared by all populations and a population-specific component (beta) shared by all loci. We ran the program bayescan with the following settings: 5,000 iterations; 20 thinning intervals; 20 pilot runs of length 5,000; 50,000 additional burn-in; uniform distribution between 0 and 1; and a prior odd of 10 for neutral model. Positive values of alpha indicate diversifying selection, whereas negative values indicate balancing or purifying selection (Foll & Gaggiotti, 2008). Q-values of the loci were also automatically calculated by the program, and those results (alpha > 0) were filtered to retain loci with q-values below 0.001. For the hierarchical model in arlequin, 20,000 simulations were conducted with 100 demes per population, with the false discovery rate (FDR) set at 0.01. Loci bearing signatures of diversifying selection identified by both methods were segregated into an outlier matrix, and the remaining loci without outliers constituted the neutral dataset.
For the 49 outlier loci detected in E. pleiosperma (see Results), we calculated population allele frequencies in arlequin and used multiple linear regressions (MLRs; Zulliger, Schnyder, & Gugerli, 2013) to test for their association with the six variables most important in explaining the observed genetic variation in the GDM (Table S4).
Bioclimatic values per site were extracted as in the GDM (see above).
All regressions were performed using the r package vegan v2.5.1 (Oksanen et al., 2018). Loci showing model fit (R 2 adj ) values > 0.5 and significant correlation with at least one variable were considered "adaptive" loci (Manel, Poncet, Legendre, Gugerli, & Holderegger, 2010). Lastly, we used the r package gradient forest (Ellis et al., 2012) to investigate patterns of allelic turnover at each of the 49 outlier loci with regard to the six variables.

| Gradient forest prediction of genomic vulnerability to future climate change
We further used gradient forest to predict Euptelea's "genomic vulnerability" using the method proposed by Fitzpatrick and Keller (2015). Here, "genomic vulnerability" is a measure of the mismatch between genotypes and future predicted environment using associations across contemporary climate gradients as a baseline. The current  and future (based on 2050 RCP2.6 projections, Van Vuuren et al., 2012) bioclimatic variables (Table S4) were downloaded from WorldClim. For the implementation of the gradient forest model, we first calculated population allele frequencies from the all 8,782 SNPs loci of our "minimum" dataset and extracted current bioclimatic variables for each population (parameter settings: 500 regression trees per SNP; maxLevel = log2(0.368n)/2; variable correlation threshold: 0.5). The fitted model was then used to predict genomes under current and future climate scenarios across the entire range of the genus by projecting the model onto the future climate layers. For each grid cell, "genomic vulnerability" was calculated as the Euclidian distance between current and predicted genomic compositions (Fitzpatrick & Keller, 2015). Lastly, we mapped this Euclidian distance metric at the genus' range-wide scale (using ecological niche distribution models of Cao et al., 2016) to visualize regions (and populations) predicted to experience greater impacts under future (2050) compared to current climate conditions.

| RADseq data and processing
A total of c. 1,380 million reads passed quality checking. After quality filtering, the number of reads per sample averaged 8.08 × 10 6 (minimum: 1.63 × 10 6 ; maximum: 19.35 × 10 6 ) with an average read depth of 28.78 (range: 11.17-60.68) (Table S5). For each individual, the assembled RAD clusters (or "stacks") with a sequence similarity threshold of 90% ranged from 0.86 × 10 5 to 3.26 × 10 5 . The number of consensus sequences called for each cluster averaged 1.93 × 10 5 (range: 0.72-2.75 × 10 5 ). Clustering of consensus sequences across all 171 samples by ipyrad yielded 29,494 informative sites (unlinked SNPs) for the "minimum" dataset, 89,158 for the "median" dataset, and 107,839 for the "maximum" dataset. After filtering loci with excess missing data within population, 18,182 ("minimum" dataset), 75,101 ("median" dataset), and 76,365 ("maximum" dataset) SNPs were retained, of which 51.70%, 48.71%, and 48.69% SNPs had a MAF ≤ 1% ( Figure S2), respectively. Our final MAF-filtered datasets retained 8,782 informative sites for the "minimum" dataset, 38,522 for the "median" dataset, and 39,180 for the "maximum" dataset. A total of 8,733 loci in the "minimum" dataset passed the two filtering steps for neutrality (see below).  Figure S5b). The topology of the ML tree based on the "maximum" dataset ( Figure S6b) was identical to that estimated from the "median" data and was quite similar to that based on the "minimum" data ( Figure S6a). In the rooted trees ( Figure S6), samples of E. polyandra formed a monophyletic clade (bootstrap percentage, BP = 100%), and those of E. pleiosperma were also monophyletic (BP = 100%), with two subclades (each 100%) representing the SW versus CE lineages, respectively.

| Population genetic structure and diversity
Within the latter group, all samples from east of the Sichuan Basin (except for HP) occupied a nested position relative to those from the northwest (BM/LX) and southeast (HP). The AMOVA based on the neutral "minimum" dataset (Table 1)

| ABC-based inference of divergence and demographic histories
Assessments of the performance of ABC model selection analyses show that the simulated model was correctly identified in between 68% and 96% of cross-validation replicates, demonstrating all scenarios were distinguished correctly by the calculated summary statistics for both divergence and demographic history inferences of Euptelea ( Figure S7). When divergence history was examined using ABC based on the "full" dataset (i.e., 1,383 SNPs), "Scenario 2" (i.e., ancient divergence of E. pleiosperma and E. polyandra and more recent origin of CE from within the SW lineage of E. pleiosperma; Figures 3a,   4a) was the best fit to the data, as it had significantly higher posterior probability than the other four scenarios tested under the both multinomial logistic regression and neural network methods ( Figure S1a and Table 2). Regarding demographic history, the bestfit scenarios for both the SW and CE lineages of E. pleiosperma were Scenario 1 ("expansion"). For E. polyandra, although Scenario 3 ("expansion-shrinkage-expansion") showed the highest posterior probability, Scenario 1 ("expansion") was also found to have a high posterior probability ( Figure S1b and Table 2). In the corresponding goodness-of-fit PCA graphs (divergence: Figure S8a; demography: Figure S8b-d), the observed data points were located within a large cluster of points for the simulated data from the prior and within a smaller cluster of data from the posterior predictive distribution, indicating good model performance.
Cross-validation for parameter estimation showed that local linear regression had a lower prediction error for most parameters when compared with the other two methods (i.e., rejection and neural network) (Table S6). Therefore, we calculated posterior distributions of all parameters using a local linear regression (Figure 3).  Figures 3 and 4b).

| Impact of geographical and climatic factors on genetic structure
For E. pleiosperma and E. polyandra, the GDM analysis explained, respectively, 63.90% and 81.35% of the deviance in spatial patterns of TA B L E 1 Analyses of molecular variance (AMOVAs) based on neutral RAD-SNPs of the "minimum" dataset for Euptelea, E. pleiosperma and its two lineages (SW and CE), and E. polyandra  Figure 5a). By contrast, for E. polyandra, GEO was the only significant predictor (Table S4; Figure 5b), indicating that its population genetic structure has a strong overall spatial component, driven by geographic isolation.

| Signatures of climate-driven adaptation in E. pleiosperma
Based on the entire filtered 8,782 SNPs of the "minimum" dataset,   Table 3 for identification of corresponding parameter codes. The time parameters are estimated in generations and converted into years by multiplying generation time, which was set to 10 years for Euptelea species  vulnerability for E. polyandra was low across its range ( Figure 6).

| D ISCUSS I ON
In this study, we applied stringent filtering methods to generate a high-quality SNP dataset for phylogeographic inference. Consistent with previous results using nuclear microsatellites (440 samples and 8 microsatellites, Cao et al., 2016; 678 samples and 7 microsatellites, Wei et al., 2016), our analyses of RADseq datasets identified two classic phylogeographic breaks across the East China Sea and between the Sino-Himalayan and Sino-Japanese Forest subkingdoms (Cao et al., 2016;Wei et al., 2016). However, we were also able to detect three subclusters (SW1-3) within the SW lineage of E. pleiosperma from our RADseq data ( Figures S3b, S4b), suggesting that RADseq can recover finer population structure than microsatellites, despite including only 17.7-38.6% of the individual samples in previous studies. In addition, we found that our inferences of population structure were relatively robust to missing data, at least when the percentage of missing data ranged from 49.7% ("maximum" dataset) to 93.6% ("minimum" dataset). Likewise, irrespective of whether or not we filtered out SNPs with MAF < 1%, our results recovered the same general phylogeographical pattern, suggesting that our inferences of population structure were less affected by minor allele frequency thresholds. Clearly, our RADseq datasets with fewer samples but many more loci recovered finer population structure and inferred a more detailed evolutionary history in Euptelea, when compared to microsatellites (Cao et al., 2016;Wei et al., 2016). Moreover, we were able to use the thousands of genomic SNPs contained in our RADseq datasets, along with environmental data, to provide insights into its local adaptation and genomic vulnerability to future climate change in this system.

| Late Miocene speciation and diversification of Euptelea
Using ABC simulations, we dated the split between E. pleiosperma of Euptelea (Cao et al., 2016). Hence, the present results support our earlier vicariant-speciation hypothesis for Euptelea (Cao et al., 2016).

According to this, a Late Miocene landbridge across the East China
Sea (ECS; c. 7.0-5.0 Ma; Kimura, 1996Kimura, , 2003 would have allowed the common ancestor of E. pleiosperma and E. polyandra to migrate from China to Japan, followed by range fragmentation either due to an increasingly cooler and drier global climate around that time (Cerling & Sharp, 1996) and/or a subsequent rise in sea level (see also Cao et al., 2016). Concomitantly, all of our divergence time estimates, as well as species-specific distribution models for the Last Glacial Maximum  Table 3 TA B L E 2 Model comparison in approximate Bayesian computation analysis. Posterior probability values and Bayes factors in brackets (of the best supported scenario against the respective model) are shown for the multinomial logistic regression and neural network methods

Neural network
Tolerance rate

| Contrasting demographic histories between E. pleiosperma and E. polyandra
For E. pleiosperma, our best-fitting ABC model (Scenario 2) identified the SW lineage as being ancestral to the CE group (Figure 3a) Table 3). This timing coincides with the latest phase of intense uplifting of the HMR since the Late Miocene (Xing & Ree, 2017, and references therein). It is feasible, therefore, that the origin of the CE group reflects uplift-driven population subdivision, divergence, and (incipient) speciation and associated changes in landscape and environmental conditions, as reported for numerous other plant and animal species from the HMR and adjacent areas (e.g., Fjeldså, Bowie, & Rahbek, 2012;He & Jiang, 2014;Xing & Ree, 2017;Yuan, Zhang, Peng, & Ge, 2008). Not unexpectedly, therefore, when compared with the CE group, the SW lineage features stronger genetic structure (AMOVA Table 1; DAPC Figure S3b), reflecting restricted dispersal in a topographically complex region of high mountains and deep river valleys (e.g., Fan et al., 2013 (Rüber, Britz, Kullander, & Zardoya, 2004), and the eastward flowing Middle/Lower Yangtze in the "Three Gorges Mt.
Region" (TGMR; Liu et al., 2018;Zhang et al., 2016). This river capture event could have promoted the eastward expansion of E. pleiosperma out of the HMR. In support of this, we found a significant decrease of within-population genetic diversity (in terms of π) with longitude ( Figure 2a), that is, along the species' presumed route of colonization and possibly as a result of serial founder events (Hewitt, 1999;Swaegers et al., 2014). Moreover, in the ML tree ( Figure S6), all CE samples from east of the Sichuan Basin occupied a nested, and thus potentially derived, position relative to those originating from northwest (BM/LX) and southeast (HP) of the basin (i.e., in the HMR). In sum, therefore, the present data suggest that the CE lineage of E. pleiosperma originated in the HMR from within the SW lineage but then expanded its range eastward, predominantly along the Yangtze River valley and its tributaries. Similar scenarios of southwest-to-east migration have also been invoked for other tree species from China (e.g., Sophora davidii, Fan et al., 2013;Myricaria laxiflora. Liu, Wang, & Huang, 2009). Nonetheless, the two main lineages of E.
pleiosperma (SW vs. CE) largely represent genetically cohesive units, with little evidence for admixture at RAD-SNPs (except BM/LX/HP; Figure 1). When combined with the species' overall strong population subdivision (Φ ST = 0.44; Table 1), as also revealed by maternally inherited cpDNA (Cao et al., 2016), this would further suggest that both SW and CE populations largely persisted in separate and multiple MRF refugia over periods of Quaternary climate change, with only limited inter-regional and population gene exchange via both pollen and seeds.  In contrast to E. pleiosperma, the genomic data of E. polyandra revealed markedly lower levels of population subdivision (Φ ST = 0.13; Table 1). This could reflect, at least in part, a historical signature of the species' latest range expansion. This was also confirmed by the fact that both Scenario 3 ("expansion-shrinkage-expansion") and Scenario 1 ("expansion") received substantial support when compared to Scenario 2 ("recent shrinkage") in the ABC model selection. be treated with caution, as our data could not provide reliable posterior distributions for these parameters due to lack of power ( Figure 3 and  (Han, Fang, & Berger, 2003). Notably, both RAD-SNP data of this study and cpDNA data (Cao et al., 2016) provide no support for any extensive postglacial range expansion from the refugia at the coast areas of Japans' Southeast Pacific Ocean, which was inferred by fossil pollen data (Gotanda & Yasuda, 2008)

| Species and lineage differences in climatedriven adaptation and genomic vulnerability
In addition to the history of divergence and demographic changes, one might expect that environmental factors have also contributed to the current genomic (RAD-SNP) structure of Euptelea. Indeed, for E. pleiosperma, the GDM analysis indicated that both geography and two temperature-related variables (BIO7, temperature annual range; BIO11, mean temperature of coldest quarter) shaped the spatial distribution of genomic variation in this species (Figure 5a).
However, for E. polyandra, our GDM analysis only supported geographic distance as the main driving force of spatial genomic variation ( Figure 5b). In addition, our population genomic data failed to detect outlier loci and strong population structure in E. polyandra ( Figure 1b). This pattern most likely reflects a balance between gene flow and genetic drift across geographic space (Orsini, Vanoverbeke, Swillen, Mergeay, & De Meester, 2013;Wright, 1943).
In E. pleiosperma, of the 49 outlier loci identified by both F ST outlier approaches, only six were inferred to be under diversifying selection using MLR analysis. They were significantly associated with four temperature-related variables (BIO4, temperature seasonality; BIO6, min temperature of coldest month; BIO7, temperature annual range; and BIO11, mean temperature of coldest quarter) (Table 4), suggesting these regions of the genome are likely adaptive and diverge to a greater extent than the rest of the genome (Salojarvi et al., 2017). Nevertheless, due to the limited genomic resources available for E. pleiosperma, we cannot further annotate these six outlier loci, so their more specific role in local adaptation remains unclear.
Nonetheless, across the species' range, diversity at the 49 outlier loci (e.g., π) significantly decreased with longitude (Figure 2d), suggesting a functional role in climate adaptation. In addition, most of the 49 outlier loci showed a pronounced pattern of allelic turnover along the west-to-east gradients of temperature-related variables ( Figure 6). Together, these results provide compelling evidence that E. pleiosperma currently experiences climate-driven diversifying selection.
One may further expect that any sudden change in climate or climate variability will further increase the magnitude of diversifying selection (Exposito-Alonso et al., 2018). In fact, by using climate projections for 2050, our metric of "genomic vulnerability" predicts that populations of the CE lineage are at the greatest risk of climate-in-

| CON CLUS IONS
To our knowledge, this study is the first that uses a multidisciplinary approach combining phylogenetics, phylogeography, and population-ecological genomics to unravel the historical demography and climate-related adaptation of an East Asian Tertiary relict genus, along with predictions about its "genomic vulnerability" to future climate change. The present data seem to suggest that the more complex geological history and greater environmental (e.g., physiographic, climatic) heterogeneity of subtropical China more readily promoted lineage diversification and a signal of adaptation in a Tertiary relict tree than was the case in Japan. This is evidenced by the deep intraspecific lineage divergence and the strong signal of local adaptation in E. pleiosperma, but not in E. polyandra. In fact, the major genetic groups identified in E. pleiosperma (CE vs. SW) that are adapted to different ecological niches and thus should be managed as two separate conservation units. As the locally adapted but F I G U R E 7 Map of genomic vulnerability across E. pleiosperma (China) and E. polyandra (Japan) distribution in the future (2050). Red = high genomic vulnerability; blue = low genomic vulnerability. The population locations (n ≥ 5) are represented by dots with color-coding corresponding to the genetic structure in Figure 1  genetically impoverished populations of the CE lineage are facing a particularly high risk of losing more genetic variation due to future climate change and habitat loss, conservation measures are urgently required. Broadly, therefore, our study highlights the significance of combining genomics with environmental data when assessing the impact of future warming on East Asia's Tertiary relict flora by quantifying the ecological factors that have produced genomic variation in this system.

DATA AVA I L A B I L I T Y S TAT E M E N T
Data for this study are available at Dryad Digital Repository: https:// datad ryad.org/revie w?doi=doi:10.5061/dryad.c77nm4g.