Pleistocene climate change and phylogeographic structure of the Gymnocarpos przewalskii (Caryophyllaceae) in the northwest China: Evidence from plastid DNA, ITS sequences, and Microsatellite

Abstract Northwestern China has a wealth of endemic species, which has been hypothesized to be affected by the complex paleoclimatic and paleogeographic history during Quaternary. In this paper, we used Gymnocarpos przewalskii as a model to address the evolutionary history and current population genetic structure of species in northwestern China. We employed two chloroplast DNA fragments (rps16 and psbB‐psbI), one nuclear DNA fragment (ITS), and simple sequence repeat (SSRs) to investigate the spatial genetic pattern of G. przewalskii. High genetic diversity (cpDNA: h S = 0.330, h T = 0.866; ITS: h S = 0.458, h T = 0.872) was identified in almost all populations, and most of the population have private haplotypes. Moreover, multimodal mismatch distributions were observed and estimates of Tajima's D and Fu's FS tests did not identify significantly departures from neutrality, indicating that recent expansion of G. przewalskii was rejected. Thus, we inferred that G. przewalskii survived generally in northwestern China during the Pleistocene. All data together support the genotypes of G. przewalskii into three groups, consistent with their respective geographical distributions in the western regions—Tarim Basin, the central regions—Hami Basin and Hexi Corridor, and the eastern regions—Alxa Desert and Wulate Prairie. Divergence among most lineages of G. przewalskii occurred in the Pleistocene, and the range of potential distributions is associated with glacial cycles. We concluded that climate oscillation during Pleistocene significantly affected the distribution of the species.

Northwestern China has experienced complex orogenesis and climate change. Following the tectonic uplift of the Qinghai-Tibet Plateau, the Paratethys Ocean was compelled to retreat from Central Asia, and the beginnings of an arid climate emerged (Guo et al., 2002;van Hinsbergen et al., 2012;Zhang, Dong, Yu, & He, 2013). Over a lengthy period of successive growth of the Qinghai-Tibet Plateau from mid-Tertiary onwards, the climate of northwestern China was increasingly dry, because moist sea breezes from the Indian Ocean were obstructed, and the Pacific monsoon could not reach inland (Chen, Liu, Zhou, & Wang, 1999).
In addition, glacial-interglacial cycle had further impacts on the climate of the region during the Quaternary. Both climatic oscillations and historical orogenesis have caused complex landscapes and drier climates in northwestern China. Duo to aridification and orogenic, a large number of deserts and mountains of northwestern China have emerged as effective geographical barriers, resulting in fragmentation of species' distributions and limited gene flow between fragmentations. Thus, in the populations of many desert plants have higher total genetic diversity but lower within populations genetic diversity, and allopatric divergence has generally been found in desert plants, for example, Ribes meyeri (Xie & Zhang, 2013), Hexinia polydichotoma (Su, Zhang, & Cohen, 2012), Helianthemum songaricum (Su, Zhang, & Sanderson, 2011), Nitraria sphaerocarpa (Su & Zhang, 2013), and Atraphaxis frutescens . Due to dry glacial-humid interglacial cycle in the Quaternary in these regions, many species experienced glacial contraction and postglacial expansion with corresponding climate cycle. Several glacial refugia have been found in Ili Valley, Tianshan Mountains, and Helan Mountains (Meng & Zhang, 2011;Shi & Zhang, 2015;Zhang & Zhang, 2012b;Zhang, Zhang, & Sanderson, 2013). Previous study also found that mountain ranges surrounding the Dzungarian Basin probably served as migration corridors for Clematis sibirica and the Loess Plateau was a dispersal corridor for Lagochilus ilicifolius during postglacial time (Meng & Zhang, 2011;Zhang & Zhang, 2012a). In the past few decades, although the phylogeography has developed rapidly in northwestern China, the evolutionary history of most plants in these regions is still poorly understood.
Gymnocarpos przewalskii (Figure 1) is a small shrublet distributed throughout the semi-arid regions of northwestern China, from the westernmost end of the Tarim Basin, through the Hami Basin and the Hexi Corridor, to the Wulate Prairie at the easternmost.
The plant has succulent-mucronate leaves and a well-developed root system, and shows an outstanding adaptation to arid regions.
It has low dispersal capabilities, because both seeding and germination rates are low, and although it can propagate clonally, the dispersal of cloned offspring is very limited in scope and concentrated around the mother plant (Wang & Ma, 2007). Therefore, G.
przewalskii can serve as a good model for spatial and temporal scale phylogeographic analysis in the region. Moreover, it is a protected plant of first conservation priority (Fu, 1992). Thus, the phylogeographic study of the species has important values, for both the origin and evolution of flora of the Chinese northwestern deserts, and plant protection. Previous phylogeographic studies of G. przewalskii (Ma, Zhang, & Sanderson, 2012) based on chloroplast genetic data alone showed the species to have high levels of genetic variation, especially in the western Tarim Basin, the Hami Basin, the Liuyuan region in western Gansu, and at the easternmost end of the distribution of the species. To explain the above genetic structure, a scenario that the four regions were glacial refugia for G. przewalskii was hypothesized . However, that study used only chloroplast DNA fragments, and therefore, the ability to investigate population dynamics and genetic structure of the species was limited.
In the present study, we employed both nuclear and chloroplast markers and simple sequence repeat (SSRs) to investigate the spatial genetic pattern of G. przewalskii. In angiosperm species, the chloroplast genome is maternally inherited from the ovary, while the nuclear genome is biparentally inherited from both pollen and ovary. The different patterns of inheritance of markers may reveal the complexity of gene flow because the distribution of genetic F I G U R E 1 Gymnocarpos przewalskii | 5221 JIA And ZHAnG diversity estimated from different patterns of inheritance of markers may be tremendous difference (Petit et al., 2005). Thus, the two modes of inheritance allow us to examine the population structure from different perspectives. In this work, we aimed to address the following questions: (1) to reveal the evolutionary history and (2) explore current population genetic structure of G. przewalskii from in northwestern China.

| Sample collection
For the analysis of nuclear and chloroplast markers, samples of G. przewalskii were collected during field surveys from June to  Table S1).

| DNA extraction, amplification, and sequencing
Genomic DNA from a single G. przewalskii individual was extracted according to a modified CTAB protocol (Doyle & Doyle, 1987), and the quality of the DNA was tested using electrophoresis in 1% agarose gel.
For the analysis of SSRs, microsatellite enrichment library was constructed according to the protocols reported by Jia, Liu, Han, Li, and Pan (2011), with minor modifications.
Microsatellite loci were screened by the SSRHunter software (Li & Wan, 2005). The redundant sequences were discarded manually.
Mantel tests were used to investigate patterns of isolation by distance (IBD), which test the correlation between the matrix of pairwise genetic distance (F ST ) values and the matrix of geographical distances. The analyses were carried out in the Alleles In Space software (Miller, 2005).
Phylogenetic analyses were performed using maximum parsimony (MP) and Bayesian inference (BI) methods. Outgroups for the cpDNA analysis were Silene noctiflora (JF715056, complete genome) and Silene atocioides (EU314655, EU308518); outgroups for ITS were Silene noctiflora (FN821141), Paronychia canariensis (AJ310959), and Gymnocarpos decandrus (KF850591). Outgroup sequences were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/). The haplotypes from unphased ITS genotypes of Silene noctiflora were obtained using methods similar to those with G. przewalskii. Optimal models of DNA evolution for the data were inferred using the MrModelTest 2.3 program (Nylander, 2008). The best substitution model was GTR + G for both cpDNA and ITS according to the Akaike information criterion, and was applied in BI, and BEAST analyses.
The MP analyses were conducted in PAUP 4.0b10 (Swofford, 2002), with 1,000 additional sequence replicates and branch-swapping using TBR. Parsimony bootstrapping (PB) was calculated from 1,000 replicates using fast and stepwise addition of taxa. BI analysis was carried out using MrBayes version 3.0b4 (Huelsenbeck, Ronquist, & Hall, 2003), with 2,000,000 generations in two parallel runs, sampling trees at every 1,000th generation. The first 10% of sampled trees were treated as burn-in and discarded.
To investigate approximate divergence times of G. przewalskii, we employed a Bayesian approach in the software BEAST version 1.8.1.
An uncorrelated lognormal relaxed clock was used for clock models, and a constant size process was used for modeling speciation. The MCMC search was run for 70,000,000 and 80,000,000 generations for ITS and cpDNA, respectively, and sampled every 1,000 generations. Four independent Markov chains were used in this process.
TRACER 1.5 (Rambaut & Drummond, 2009) was used to check the convergence of the MCMC chains. The maximum clade credibility (MCC) tree was generated in TreeAnnotator using the product method with a burn-in of the first 10% of sampled trees. Due to the lack of specific substitution rates in G. przewalskii, we estimated the divergence time using published substitution rates. Based on the synonymous substitution rates for most angiosperm species of cpDNA genes (1.0 × 10 −9 to 3.0 × 10 −9 s/s/y) (Wolfe, Li, & Sharp, 1987), and substitution rates of plant ITS (3.46 × 10 −9 to 10 × 10 −9 s/s/y) (Richardson, Pennington, Pennington, & Hollingsworth, 2001), we followed  in using a normal prior distribution and set a mean of 2 × 10 −9 s/s/y and a standard deviation of 6.08 × 10 −10 s/s/y for cpDNA, and a mean of 6.73 × 10 −9 s/s/y and a standard deviation of 1.99 × 10 −9 s/s/y for ITS.
To investigate the demographic history of G. przewalskii, we employed mismatch distribution analysis (MDA) implemented in ARLEQUIN 3.1 (Excoffier et al., 2005). MDA represents the frequency distribution of pairwise differences among haplotypes; multimodal distributions are related to demographic decline or equilibrium, while unimodal pairwise mismatch distributions indicate that populations have experienced recent demographic expansion (Rogers & Harpending, 1992;Slatkin & Hudson, 1991). The Harpending raggedness index (HRag) and the sum of squared deviations (SSD) between observed and expected mismatch distributions were estimated with 1,000 parametric bootstrap replicates.  (Kalinowski, 2005).

| Analysis of SSRs
To indicate the genetic relationship among populations, an UPGMA tree based on pairwise Nei's genetic distances was constructed by POPGENE 1.32 (Yeh et al., 1997). Population genetic relationship was further assessed by the Bayesian clustering analysis using STRUCTURE version 2.3.4 (Pritchard, Stephens, & Donnelly, 2000). The optimal number of subgroups was set from 2 to 16 (K = 2-16). For each K value, ten independent runs were conducted with a burn-in period of 10,000 and 10,000 Markov Chain Monte To compare the genetic differentiation within individuals, within populations, among populations within the groups and among the groups, a hierarchical analysis of molecular variance (AMOVA) was carried using Arlequin 3.1 (Excoffier et al., 2005). Mantel tests were also carried out in the Alleles In Space software (Miller, 2005).

| Species distribution modeling
We performed species distribution modeling (SDM) to reconstruct the paleo-and current distributions for G. przewalskii through the maximum entropy algorithm as implemented in MAXENT v3.3 (Phillips, Anderson, & Schapire, 2006). MAXENT employs environmental data in conjunction with presence-only data to estimate the probability distribution. Species occurrence data were taken from our field surveys and the online database of the Chinese Virtual Herbarium (CVH; http://www.cvh.org.cn/cms/). In total, 27 points were obtained to be used for SDM analyses (Supporting information Table S3). SDM predicted current distribution using climatic variables at 2.5-min resolution, downloaded from the WorldClim database (www.worldclim. org). The nineteen bioclimatic variables were strongly correlated with each other, which could result in model overfitting (Graham, 2003

| Recombination analysis of ITS data set
Recombination was not detected in the ITS alignment by the pairwise homoplasy index (p = 0.738). Lack of recombination allowed direct analysis of the ITS sequences.

| Genetic diversity
For cpDNA, the data set comprised 176 cpDNA sequences, which collapsed into 25 haplotypes. Mean h and π estimates were 0.8442 and 0.008417, respectively. For ITS, the data set comprised 370 ITS sequences yielding 27 haplotypes; mean h and π estimates were 0.8477 and 0.008825, respectively. The detail results of sequence diversity for each population are shown in Table 2.
Distribution of cpDNA and ITS haplotypes is shown in Figure 2 and Detailed polymorphic information is shown in Table 3.

| Population structure
Estimates of average within population diversity and total diversity for both cpDNA and ITS were very high in G. przewalskii In analysis of SSRs, according to the Nei's genetic similarity coefficients and genetic distance, the relationship between populations SB, LY, CM, ZW, and AZQ is closest (Supporting information Table   S6). The UPGMA tree showed that the 17 populations could be di-

| Phylogenetic analyses and estimate of divergence time
The MP tree and Bayesian tree are highly similar, only differing in minor aspects, so we here employed the MP tree ( Figure 5)  The results of BEAST for both ITS and cpDNA data showed that divergence between most lineages of G. przewalskii occurred in the Pleistocene ( Figure 6).
Estimates of neutrality tests did not identify significantly departures from neutrality, with p-values of the whole data above 0.05 (Table 8). In the mismatch distribution analysis (Figure 7), multimodal mismatch distribution was observed in both cpDNA data and ITS data set. The LGM potential distribution of the species was predicted by CCSM and MIROC models. Compared with the present potential distribution, the LGM distribution showed a general range contraction.

| Species distribution modeling
Both CCSM and MIROC models showed that all these suitable areas were isolated from each other. Overall, the range of potential distributions is associated with glacial cycles; the distribution range contracted under glacial conditions and expanded during interglacials.

| Genetic diversity in G. przewalskii
The cpDNA haplotype diversity (Tables 2, 3) Wen, Xu, Zhang, & Feng, 2015). Considering that the divergence of haplotypes of these species occurred during the Pleistocene, the high total genetic diversity detected in these taxa might reflect the fragmenta-   (Wu et al., 2010), indicating that climate and vegetation types in two subregions are differences.
The differences in these environments may have an impact on the genetic diversity of G. przewalskii. Moreover, G. przewalski represents the relicts of Tethyan ancestor. The high genetic diversity of the species might reflect the accumulation of mutations over time . for other plants (Nybom, 2004). The breeding methods and plant life of the species may contribute to high diversity within population. G. przewalskii is facultative clones that have sexual reproduction and cloning (Wang & Ma, 2007). The genetic diversity of these plants is closely related to the proportion of sexual reproduction in their reproductive systems, since sexual reproduction can increase genetic diversity within the population and reduce genetic differences among populations (Kirsten, Dawes, & Cochrane, 1998). The results of HWE show that most of the population of G. przewalskii did not deviate significantly from HWE, indicating random mating population (Table 3). In addition, the genetic diversity among the populations within group was the smallest, and the genetic variation among individuals was the largest, indicating that individuals within the group may be free to mate. Previous physiology studies have shown that G. przewalskii is mainly sexually propagated as crosspollination plants (Li, Tang, & Fu, 2016 The plant life also has an important effect on h T and h s of plants. Long-lived plants may be less affected by genetic drift and are more likely to retain genotypes (Lowe, Boshier, Ward, Bacles, & Navarro, 2005). Aparicio, Hampe, Fernández-Carrillo, and Albaladejo (2012) also show that the diversity of plants with high longevity is high.
As the G. przewalskii is a small shrublet, long life may maintain high genetic diversity.  Table 2). Therefore, based on the available data, the genotypes of this species were roughly divided into three groups. The lack of shared cpDNA or ITS haplotypes among these three discrete groups indicated that there has been little gene flow among the groups since their formation, leading to the accumulation of haplotypes differences. Moreover, each of the three groups has its own distribution and habitat (Figure 2), which could also be responsible for the high genetic differentiation among groups. The Tarim (Sun, 2002;Zhang & Men, 2002). The initial Gurbantunggut Desert was present at the mid-Pleistocene (Fang et al., 2002), the Badain Jaran-Tengger Desert greatly enlarged during the mid-Pleistocene (Yang, Fang, Dong, Peng, & Li, 2006), and the sand desert landscape developed in the Ulan Buh Desert in late-Pleistocene (Li et al., 2014).

| Phylogeographic history of G. przewalskii
Although G. przewalskii is an arid-adapted species, moderate moisture is essential for its growth and reproduction. The edge of oases provides many habitats with moderate moisture .
Because of this reason, when the desert expanded, the gene flow caused by pollen and seeds was limited, resulting in the increase in genetic variation.
Coalescent theory predicts that ancient haplotypes will occupy interior nodes of a haplotype network and be geographically widespread, whereas the recent haplotypes will occupy at the tips of the haplotype network and be geographically local distribution (Schaal, Hayworth, Oseni, Rauscher, & Smith, 1998 Prairie. However, due to the limited data, further research is needed. Biological refugia are relatively stable areas where a species is predicted to have had persistent survival throughout climatic fluctuations during the Pleistocene. According to Petit et al. (2003), plant populations that have high levels of genetic diversity and unique haplotypes indicate that their associated areas may have served as glacial refugia. Previous studies of G. przewalskii based on cpDNA data alone have suggested four independent glacial refugia for the species (the western Tarim Basin, Hami Basin, western Gansu, and the easternmost part of this species distribution area), because of high levels of genetic variation, unique haplotypes, and ancestral haplotypes in these four regions .
In present study, combining all data sets, we observed that almost all populations have high levels of genetic diversity, and 14 populations distributed in each group have unique haplotypes (Tables 2,   3). Thus, we observed neither hot spots nor cold spots, and could not find clines of genetic diversity from hypothetical refugial groups, or hypothetical recolonized groups. Furthermore, mismatch distribution analysis and the neutrality tests do not agree with a recent expansion ( Figure 7, Populations of the species are distributed in scattered patchiness and mainly grow along foothills. Areas of complicated topography might provide suitable microenvironment in relative stability and survived there during severe climatic oscillations (Rull, 2009). In addition, there were mountains that do not exceed 2000 meters in northwestern China without ice cover during the Pleistocene.
The foothills that locate between the desert and the mountains were characterized by a relatively mild Pleistocene climate. Thus, we hypothesized that there are many patch-like stable microenvironment in the distribution of G. przewalskii so that the species can survive during climatic oscillations (Rull, Schubert, & Aravena, 1988). Thus, plant grown in microenvironment did not undergo large-scale recolonization during interglacial, but rather a process of increasing the sizes of occupancy of its geographical distribution (Rull, 2009;Ye et al., 2014).

ACK N OWLED G M ENTS
We thank Hong-Xiang Zhang and Xiao-Long Jiang for help on data analysis and comments on the manuscript. We thank Yuanming Zhang for supporting the fund for the publication of the manuscript. This study was funded by Biodiversity Conservation Strategy

Program of Chinese Academy of Sciences (ZSSD-012) and China
National Key Basic Research Program (2014CB954201).

CO N FLI C T O F I NTE R E S T
None declared.