Phylogeographic analyses of a widely distributed Populus davidiana: Further evidence for the existence of glacial refugia of cool‐temperate deciduous trees in northern East Asia

Abstract Despite several phylogeographic studies had provided evidence to support the existence of glacial refugia of cool‐temperate deciduous trees in northeast China, the species used in these studies were limited by the species ranges, which could not exclude the possibility that northern populations were the colonists from southern refugial populations during the last glacial maximum (LGM). Here, we estimated the nucleotide variation in Populus davidiana, a widespread species distributed in Eurasia. Three groups in northeast, central, and southwest China were constructed according to the simulation results from SAMOVA, composition of chloroplast haplotypes and structure results. We revealed that the northeast China had endemic haplotypes, the haplotypes and nucleotide diversity in northern regions were not lower than that in southern China, and this species has not experienced population expansion base on the estimation of Bayesian skyline plots. Ecological niche modeling (ENM) indicated that the northeast China had a high suitability score during the last glacial maximum. The combined evidence clearly demonstrated that northeastern and southwestern refugia were maintained across the current distributional range of P. davidiana during the LGM. The genetic differentiation between these two refugia might be mainly caused by differences of climate among these areas. The phylogeographic analyses of a widely distributed P. davidiana provided robust evidence to clarify the issue of refugia in northeast China, and these results are of great importance for understanding the influence of Quaternary glaciations on the distribution and evolution of species in East Asia.


| INTRODUC TI ON
Climatic oscillations have profoundly affected the current distributions and genetic structures of many plant and animal species in the Northern Hemisphere in the past three million years (Hewitt, 2004;Qiu, Fu, & Comes, 2011). In Europe and North America, it is now well appreciated that the advance and retreat of ice sheets through multiple glacial cycles, especially the last glacial maximum (LGM, 21,000-18,000 years ago), strongly impacted the distribution and genetic diversity of many temperate species (Hewitt, 2004). In contrast, mainland China had never been directly affected by extensive and unified ice sheets although it experienced several climatic oscillations throughout the Quaternary (Harrison, Yu, Takahara, & Prentice, 2001;Qiu et al., 2011;Sun, Mu-Lin, & Pan, 1977). Moreover, mainland China stretches across tropical, subtropical, temperate, and cold temperate zones from south to north and contains a rich variety of terrains, being the region that harbors the most diverse temperate flora in the world and has a higher level of biodiversity than Europe and North America (Qian & Ricklefs, 1999). Some ice-age refugia have been confirmed in the southern regions of China (Chen, 2011), where the tropical areas and the presence of great mountainous regions contributed to the maintenance and possibly continuous diversification of the more tropical elements of the Chinese flora (Qian & Ricklefs, 1999). During the Quaternary glaciations, although most of regions in China were never covered by ice sheets, dramatic climatic oscillations led to the extinction of many plants in the north of China (Cheng, Hwang, & Lin, 2005;Lu, Peng, Cheng, Hong, & Chiang, 2001;Shen et al., 2005;Zhang, Chiang, George, & JQ & ABBOTT, R., 2005), and whether these areas have glacial refugium of cool-temperate deciduous trees is a continuing debate. This is of great importance for understanding the influence of Quaternary climate fluctuations on the distribution and evolution of species in China and in Asia.
Some authors have pointed out that during the LGM, temperate forests, steppe, and even desert vegetation which covered the areas currently dominated by coniferous and deciduous forests would have retreated southward below 30°N in the northwest and reaching 25°N in eastern China, and then at the warm and wet interglacials, they recolonized the previously uninhabitable northern regions (Cao, Herzschuh, Ni, Zhao, & Boehmer, 2015;Ni, Harrison, Prentice, Kutzbach, & Sitch, 2006;Yu et al., 2000), such as the thermophilous (Castanea, Castanopsis, Cyclobalanopsis, Fagus, Pterocarya) and eurythermal (Juglans, Quercus, Tilia, Ulmus) broad-leaved tree taxa in tropical or subtropical areas of China (Cao et al., 2015;Ni et al., 2006;Yu et al., 2000). Other researchers, however, rejected this hypothesis and suggested that the cold climate and ice sheets might have reduced the distribution of most forest species, providing evidence that multiple LGM refugia may have allowed species to persist across northern China. Bai, Liao, and Zhang (2010) examined the phylogeography of Juglans mandshurica, a temperate deciduous walnut tree distributed in northern and northeastern China, and found two independent refugia in northern China during the LGM. This was contrary to the inference of the southward retreat and northward expand of temperate forests. Phylogeographic studies of organisms from North China and Northeast China suggested that, during the LGM, cool-temperate deciduous tree species in East Asia persisted within their modern northern range (Tian et al., 2009;Zeng, Wang, Liao, Wang, & Zhang, 2015).
Despite several phylogeographic studies that provided evidence supporting the existence of glacial refugia in northeast China, some questions are still poorly solved. Hewitt pointed out that while species were expanding northwards, southern populations would die out as the southern edge of the species' tolerance range also moved north during the postglacial period (Hewitt, 1999). This phenomenon could also lead to the distribution patterns of those species in north and northeast China noted above. Nevertheless, the previous studies could not rule out this possibility because all the species they used were restricted to north and northeast China, the Korean Peninsula, Japan, and the Russian Far East. So, a species widely distributed across mainland China is needed to solve this problem through the phylogeographic study.
Populus davidiana Dode (Salicaceae), a temperate deciduous tree distributed continuously in mainland China, Mongolia, Korea, and the Far East of Russia, which is a keystone species in boreal forest communities. P. davidiana is highly resistant to cold, drought, and barren soils. Natural populations mainly grow on slopes, ridges, and gullies, often forming a small area of pure or mixed forest with other tree species, so the influence of human activities on their distribution is relatively small. Therefore, P. davidiana is an excellent model to shed light on the phylogeographical history of Asian trees.
To verify the two possible origins of cool-temperate deciduous trees in northeastern China, it is necessary to obtain a widely distributed species' population structure and genetic diversity distribution. In this research, we used both nuclear and cpDNA sequences and ecological niche models (ENMs) to identify the phylogeographic patterns of P. davidiana. We specifically aimed to address: (a) whether or not glacial refugia existed across northeast China, and (b) reconstruct the Quaternary history of the species through evaluating the distribution patterns of chloroplast haplotypes and nuclear gene polymorphisms within and among populations of P. davidiana.

| Population sampling
A total of 502 individuals of P. davidiana were sampled from 32 natural populations throughout its distribution range in mainland China ( Figure 1). These populations covered most of the species range. The population code, location, and sampling size are shown in Supporting information Table S1. The distance between any two sampled trees in the same population was at least 100 m to prevent repeated sampling because of the root sprout reproduction of P. davidiana. Leaf tissues were silica-dried and stored at room temperature until DNA extraction.
Primers used are listed in Supporting information Table S2. The polymerase chain reaction (PCR) and sequencing were based on the research of Du et al. (2015). We sequenced (ABI 3730 DNA ANALYZER) the PCR amplicons from both ends. For samples with sequence peaks or sequence failure, we sequenced after cloning. We used the software DNAsp 5.10 (Librado & Rozas, 2009)
A spatial analysis of molecular variance (SAMOVA) was performed for cpDNA sequence matrices to define the number of population groups using SAMOVA 2.0 (Dupanloup, Schneider, & Excoffier, 2010). Based on a simulated annealing procedure, the SAMOVA algorithm iteratively seeks the population composition of F I G U R E 1 Geographic distribution of the 32 sampled populations of P. davidiana. Distribution frequencies of cpDNA haplotypes, with networks of the cpDNA haplotypes constructed using NETWORK 5.0.0.0. Colored haplotypes are shared by two or more populations, and blank ones are private haplotypes. Haplotypes O1-O5 appear exclusively in outgroups. The sizes of circles in the network are proportional to the observed number of individuals in the haplotypes, and the sizes of the circles on the map are proportional to the population sizes of sampling locations, the number on the line show the number of mutations between haplotypes in the network, bottom right dots of different colors represent the three population groups identified by SAMOVA a user-defined number of groups (K) that aims to maximize the proportion of total genetic variance (FCT) due to differences between population groups (Dupanloup et al., 2010). The software was run with default parameters. The number of initial conditions was set to 500 with K = 2-10, and FCT was used to determine the most likely K.
Two measures of population differentiation, G ST and N ST , were calculated using the program PERMUT (Pons & Petit, 1996). We inferred phylogeographic structure by testing whether N ST was significantly larger than G ST using a permutation test with 1,000 random permutations of haplotypes across populations. A significantly higher N ST over G ST usually indicates the presence of phylogeographic structure and that the populations are strongly differentiated genetically. If N ST is equal to G ST , then it is likely the haplotypes are phylogenetically equivalent. Finally, if N ST is significantly smaller than G ST , then the relative geographic distribution of haplotypes is likely unrelated to their genetic distances.

| Molecular diversity of each locus in each region of P. davidiana
The molecular diversity indices, including a number of segregating sites (S) and haplotypes (H), haplotype diversity (H d ), nucleotide diversity parameters (π), Watterson's θ w (Watterson, 1975), and the minimum number of recombination events (R m ), were estimated for the regions (the composition of these regions are listed in the Table 2) at each locus using DnaSP (Librado & Rozas, 2009).
Tajima's D (Tajima, 1989) and Fu and Li's D* and F* (Fu & Li, 1993) were applied to determine whether a locus is evolving neutrally and is therefore appropriate for phylogeographic study (Caicedo & Schaal, 2004). The geographical distributions of haplotypes were labeled on a relief map of China using DIVA-GIS (Hijmans et al., 2004). Genetic differentiation among regions (F ST ) for each nuclear locus was also calculated in Arlequin 3.5.2.2 (Excoffier, Laval, & Schneider, 2005).

| Genetic structure
To identify the possible genetic structure of P. davidiana populations, the program STRUCTURE version 2.3.4 (Falush, Stephens, & Pritchard, 2003) was run, assuming a preassigned number of genetic clusters (K) ranging from 1 to 10 and 20 replicate runs were carried out for each K. We used the admixture model and the correlated allele frequencies model to estimate the genetic structure of P.
davidiana. All runs included a burn-in period of 100,000 iterations, following 1,000,000 Markov Chain Monte Carlo repetitions with the aim of ensuring the reproducibility of the STRUCTURE results (Gilbert et al., 2012). The final posterior possibility of K, Delta K (ΔK), Ln P (K), and the rate of change of Ln P (K) between successive K values (Evanno, Regnaut, & Goudet, 2005) were estimated using Structure Harvester (Earl & Vonholdt, 2012) to determine the most possible number of clusters.
To estimate the relative contribution to molecular variance within and among the defined three groups (the DB, HB and NF, excluding the GBJD population), which were obtained from the STRUCTURE results, an analysis of molecular variance (AMOVA) was performed in Arlequin 3.5.2.2 (Excoffier, Laval, & Schneider, 2007) based on the cpDNA and five nuclear single-copy genes.

| Coalescent inferences of demographic history
To reconstruct the demographic changes of three identified groups over time, the historical demographic dynamics of P. davidiana were inferred from Bayesian skyline analyses implemented in BEAST 1.2 (Drummond, Rambaut, Shapiro, & Pybus, 2005). This recently developed coalescence-based approach utilizes standard MCMC sampling procedures to evaluate the posterior probability distribution of effective population size during the intervals under a GTR substitution model (Pybus, 2006). We used both nuclear loci and cpDNA loci sequences for this analysis. Independent MCMC analyses were run for 1 × 10 8 steps, sampling every 105 steps and discarding 1,000 samples as burn-in. For each clade, multiple analyses were performed with different random seeds to test for convergence, and results from replicate runs were pooled using LogCombiner and visualized skyline plots with TRACER 1.4 (Drummond et al., 2005).
We used Bayesian analysis to estimate the divergence times between pairwise groups (Drummond et al., 2005). Simultaneously to estimate theta (θ, measured as 2N e μ, where N e is the effective population size and μ is the mutation rate per sequence per generation), population divergence time (t pop , measured as t/N e , where t is the time since population divergence). Markov chain simulation for 5 × 10 7 steps, where the first 5 × 10 6 were discarded as burn-in; and uniform prior distributions from 0 to 5 for t pop . Convergence was determined by assessing the consistency of model values for each of the parameters across three runs. The modes of the posterior distribution for both T and θ were used to estimate divergence times between pairwise clades. We assumed a mutation rate of 2.5 × 10 −9 per site per year and a generation time of 15 years in Populus (Koch, Haubold, & Mitchell-Olds, 2000).

| Pleistocene and present ecological predictive models
The global climate database from WorldClim (Hijmans, Cameron, Parra, Jones, & Jarvis, 2005) was used to infer the potential distribution of P. davidiana in the present, the last inter glacial (LIG), and the last glacial maximum (LGM) using MAXENT v.3.3.335 (Phillips & Dudík, 2008). For this species, the distribution model was generated using 19 bioclimatic parameters (Table S3) for the current, LGM, and LIG climate of the collection localities. To reduce over-fitting of ecological niche modeling, we conducted Pearson's correlation for environmental variables using the methods of Sheppard (Zheng, Fan, et al., 2017). A total of 230 distribution records were used for modeling, which included the sampling sites in this study supplemented by the Chinese Virtual Herbarium (http://www.cvh.org.cn/). Climate estimates for the LGM were provided by the Community Climate System Model (CCSM) and the Model for Interdisciplinary Research on Climate (MIROC) 3.2 (http://www.pmip2.cnrs-gif.fr) at 2.5 arc-min resolution, as well as climate estimates for the current time (http://www.worldclim. com). In addition, we projected the model to the LIG (~120,000 to 140,000 years BP) using the climate model of Otto-Bliesner et al. (Otto-Bliesner, 2006). Binomial tests of omission were carried out by randomly selecting 25% of the occurrence locations as test data and using 10,000 randomly chosen pixels from the study region as random instances.

| cpDNA haplotype variation and distribution
The total alignment length of the cpDNA sequence was 3,263 bp. Simulation results from SAMOVA suggested that FCT increased greatly from K = 2 to K = 5 and then reached a plateau at K > 5.
However, if single-population clusters were left out, as is usual in SAMOVA, we ended up with K = 3 as the optimal number of population groups (Figure 1).
According to the simulation results from SAMOVA and composition of chloroplast haplotypes in these populations, three regions were structured: we used DB, HB, and NF to represent the northeast of China, the central China, and the southwest of China, respectively ( Figure 1). Haplotypes H3 appeared in the DB and HB region and its frequency in DB (20.0%) was higher than in HB (3.2%). This phenomenon was also found for H1 (46.3% in DB and 11.3% in HB).
According to the network of chloroplast haplotypes, H5 located at the center of the gene tree and was geographically widespread. Both  Table 1).

| Genetic diversity and differentiation
The nucleotide diversity of each locus is shown in Table 2. The average nucleotide diversities of the three groups across loci were π = 0.0031-0.0040, θ w = 0.0038-0.0053. As expected, the sequence variation of cpDNA was lower than that of the nuclear loci, reflecting the lower mutation rate observed in the chloroplast genome of plants. The mean haplotype diversity (H d ) ranged from 0.571 to 0.656 for the six nuclear loci. The mean diversity levels of nucleotides and haplotypes of the nuclear loci in the DB and HB were slightly higher than in the NF region (Table 2). For single locus tests, most loci displayed negative, but not significant Tajima's D. However, the results for the DSH21 locus deviated significantly from neutrality both in the species and most of the regions, suggesting that this locus may have been subject to selection and will be excluded from further analysis. The minimum number of recombination events (R m ) of the nuclear loci ranged from 1 to 12 in the species, implying high levels of outcrossing. As expected, the sequence variation of cpDNA was lower than the nuclear loci, which was reflected by the lower mutation rate in the chloroplast genome of plants and the lower effective population size of the cpDNA.
The average haplotype diversity ranged from 0.375 to 0.406 and nucleotide diversity varied between 0.0002 and 0.0005. The mean diversity levels in the DB and HB were slightly higher than those in the NF region ( Table 2).
The F ST analysis indicated 93.3% of the pairwise F ST values were significantly different (p < 0.01) and ranged from 0.0069 to 0.5820 for the five nuclear loci (Table 3).

| Genetic structure
The structure result indicated that the mean estimated logarithm of probability of the data, L (K), increased linearly from K = 1 to K = 2, and then increased slowly up to K = 8, while the highest ΔK occurred at K = 2 (Figure 2a AMOVA indicated that variations among groups were significant (p < 0.001) and ranged from 14.75% at DSH 7% to 24.51% at DSH12 (

| Test of expansion and demographic history
Bayesian skyline plots (BSP) revealed effective population size of the three groups (Figure 3)

| Present and past distribution of P. davidiana
The

| D ISCUSS I ON
A variety of forest types are distributed in northeast China, comprising coniferous, broad-leaved, subalpine, and alpine tundra forest zones at different altitudes (Zheng, Xiao, Guo, & Howard, 2017).
Glaciers only occurred at elevations above 2,000 m in this region and adjunct areas in eastern Asia during the glacial cycle in the Pleistocene, while the comprehensive and potentially much larger unglaciated areas may have allowed for the maintenance of smaller refugia (Wei, Niu, Ling, & Cui, 2008).

| Glacial refugia in northeastern China
Recently, several phylogeographic studies had proved that there were many glacial refugia in northeast China (Bai et al., 2010;Li-Jiang et al., 2008;Liu & Harada, 2014). However, northern populations were likely to migrate from the south due to environmental changes during the postglacial period. To verify the two possible origins of cool-temperate deciduous trees in northeastern China, it is necessary to obtain a widely distributed species' population structure and genetic diversity distribution. P. davidiana, a widely distributed in China from south to north, can be used to verify the two possibilities.
According to the results of cpDNA haplotype variation and distribution, haplotypes H1 and H3 only existed in the DB and HB region and mainly appeared in the DB region ( Figure 1) and they were all high-frequency haplotypes (frequency > 5%). In the haplotype network, the most ancient haplotypes should be located at the center of the gene tree and be geographically widespread, while the most recent haplotypes should be at the tips of the gene tree and be geographically localized (Schaal, Hayworth, Olsen, Rauscher, & Smith, 2010). Based on the network of cpDNA haplotypes, it was clear that H1 and H3 were located at interior nodes of the gene tree and indicate that these two haplotypes were ancient (Schaal et al., 2010), which may indicated that there was a glacial refugium in northeast China. A recent study based on several nuclear and chloroplast loci suggested that the ancestor of P. davidiana originated in Northern America and diffused into Eurasia through the Bering land bridge (Du et al., 2015). Geographically, after P. davidiana spreading to Asia, it may migrate from northeast China to the south. In particular, the connection between the outgroup and northern haplotypes, while southern haplotypes tend to be derived, is highly suggestive of this.
Therefore, it is reasonable that the ancient haplotypes existed in northeast China. This result also supported that the northeast populations were the descendant of the local ancestral refugial populations rather than the colonist from the south populations.
In addition, the number of cpDNA haplotypes in the DB region (16) was not lower than other regions, and this was supported by the mean higher level of haplotype and nucleotide diversity of the cpDNA locus (π = 0.0004, θ w = 0.00200-0.00472, H d = 0.352, Table 2). If the species re-colonized the northern regions from southern areas in the process of postglacial expansion, the nuclear gene polymorphism of the northern populations would be lower as alleles and heterozygosity are lost in the process of migration (Hewitt, 2000). In Europe and North America, many studies have reported lower genetic diversity in northern populations that expanded from southern refugial populations (Hewitt, 2000;Soltis, Gitzendanner, Strenge, & Soltis, 1997;Zink, 1996). However, according to the results of genetic diversity, the DB region had a higher mean level of haplotype and nucleotide diversity of nuclear loci (π = 0.0034, θ w = 0.0047, H d = 0.6681). Analysis of the differences in genetic diversity between the extent geographic distributions is the major method for reconstructing species histories during the Quaternary (Jakob, Martinezmeyer, & Blattner, 2009).
As found in northern Europe, the recently colonized territories from multiple glacial refugia show rather increased genetic diversity due to lineage admixture. Groups with high levels of genetic diversity provide strong evidence for the existence of multiple glacial refugia for cool-temperate forest trees (Widmer & Lexer, 2001), this result did not support the possibility that the northern populations migrated from the south during the LGM.
Furthermore, a significantly higher N ST over G ST was found between DB and NF based on the cpDNA data, which indicated the presence of phylogeographic structure and that the populations are strongly differentiated genetically and there was no extensive migration between the DB and NF groups after divergence (Pons & Petit, 1996).
Geographically, the DB group mainly occurred north of 40°N.
The ENM estimates based on the CCSM and MIROC models showed

| Glacial refugia in southwestern China
From the results of haplotype variation, distribution, and the network of cpDNA, the older distinctive haplotype (H32) only existed in the NF region and the ancient haplotype H5 mainly appeared in southwest China (Figure 1). The diversity of haplotypes and nucleotides for the cpDNA sequence and nuclear locus in the NF region was also high (  also found possible refugia of Ginkgo biloba in southwestern China (Hwang et al., 2003;Shen et al., 2005). In addition, the Hengduan range and the Lingnan region were normally considered to be the most likely refugia for plant species in southern China, as this region had high levels of plant diversity and endemism in China and many ancient plant species were also found in the region (Shen et al., 2005;Ying, 2001). Moreover, the NF region is rich in topography, climate, and ecological conditions and was spared from the direct effects of the repeated Pleistocene continental glaciation, providing many appropriate environment conditions for the species in the LGM (Liu & Basinger, 2000). Thus, the Hengduan range was most likely to be a refugium for P. davidiana according to the results of our research.

| Possible reasons for the higher genetic diversity level of P. davidiana populations in central China
According to the results, the frequencies and compositions of the HB regions were much different from those in the DB and NF regions. They had their own private cpDNA haplotypes, and all the region had higher average levels of haplotype and nucleotide diversity ( Figure 1 and Table 2). As pointed out by Wright & Gaut (Wright & Gaut, 2005) and Wen et al. (Wen, Han, & Shun, 2010), several factors may contribute to nucleotide diversity in plant species and populations, such as natural selection, demographic population history, and mode of reproduction. Natural selection can be ruled out based on the neutrality test of nuclear loci apart from DSH21. Therefore, we speculated that there might be one or more refugia in these areas and these regions have abundant species and a large number of endemic genera of plants (Ying, 2001), maintaining high effective population sizes that result in high levels of nucleotide diversity (Ismail et al., 2012 of population genetic structure and the variation and distribution of nuclear haplotypes based on the fragment from the DSH3, 5, 6, 7, 12, 21 loci. The pairwise F ST values showed strong genetic differentiation among groups (0.192-0.362). These results together with the AMOVA analysis indicated high genetic differentiation between pairwise groups The pattern of genetic differentiation during ecological speciation is shaped by a combination of evolutionary forces. Processes such as genetic drift, local reduction of gene flow around genes causing reproductive isolation, hitchhiking around selected variants, variation in recombination and mutation rates are factors that can contribute to the heterogeneity of genetic differentiation (Feulner et al., 2015). For the widely distributed species, genetic variation among populations mainly caused by geographic isolation and different climate conditions. Gene flow can increase genetic variation within populations and reduce divergence among populations, leading to populations tend to be consistent. However, P. davidiana is a wind-pollinated species and there were several mixed zones of populations in DB, HB, and NF groups according to our study. It was surprise that genetic variation still significantly existed between these regions, indicating that geographic isolation might not be the main reason to produce genetic differentiation among the three groups. On the other hand, the populations of P. davidiana in SOUTH (excluding GBJD population) group mainly distributed in south of Sichuan Province, west of Guizhou Province and Yunnan Province (Table S1), which belong to the southern subtropical areas. The climatic characteristics in these areas such as altitude, temperature, humidity, illumination, and accumulated temperature are different from those in northern subtropical. It was probably because of the differences of climate among these areas, and different adaptabilities lead to the high differentiation of the three groups.

CO N FLI C T O F I NTE R E S T
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.