Phylogeography and the population genetic structure of flowering cherry Cerasus serrulata (Rosaceae) in subtropical and temperate China

Abstract Cerasus serrulata (Rosaceae) is an important flowering cherry resource which is valuable for developing new cultivars of flowering cherries. It is broadly distributed and possesses abundant variations. In this study, phylogeographic analysis was conducted to reveal the evolutionary history to better understand the genetic diversity and genetic structure of C. serrulata so as to provide more accurate molecular insights into better conservation and utilization of the germplasm resources. A total of 327 individuals from 18 wild populations were collected. Three chloroplast DNA (cpDNA) fragments (matK, trnD‐E, and trnS‐G) and the nuclear internal transcribed spacer (ITS) were utilized. The results showed a high genetic diversity at both species level and population level of C. serrulata. High genetic differentiation and the existence of the phylogeographic structure were detected. No significant expansion events were discovered. Two geographic lineages were inferred. One was confined to the Qinling Mountains and the Taihang Mountains. The other was from the Wuling Mountains to the Jiangnan Hilly Regions and then went northeast to the coast of Asia. In addition, some taxonomic treatments of the C. serrulata complex are discussed and reconsidered. Conservation and utilization strategies of wild C. serrulata germplasm resources were recommended.

use of this C. serrulata resource will contribute significantly to the development of new flowering cherry cultivars.
C. serrulata (Lindl.) G. Don ex London (Rosaceae) has a wide distribution range from west to east in China and is also distributed in the Korean Peninsula and Japan (Li & Bartholomew, 2003).
It also covers a broad latitude range from about 24°N to 45°N (Li et al., 2014). Various environments are included in this broad area, including mountains, hilly regions, plains, islands, and inland and coastal areas of Asia. This distribution range is also mainly inside the subtropical and temperate area. Thus, such diverse habitats and different climates should have generated ample genetic resources in C.
serrulata, which is of great value to the breeding of new flowering cherry cultivars (Yi et al., 2018). However, along with the high variations, there are many transition morphological characteristics that result in taxonomic controversies and which cause further trouble in records of the correct breeding procedure and making full use of the species resource. Although simple sequence repeats (SSRs) were used to study the genetic diversity and structure of C. serrulata (Chen, 2016;Yi et al., 2018), measurement at the base level, which is believed to provide deeper and better precision, is lacking.
A phylogeographic study is therefore a good choice for discovering more in-depth and clearer information about the germplasm resources of C. serrulata. Phylogeography focuses on the gene genealogy spatiotemporal pattern of related species or intraspecies as well as how the pattern was formed (Bai & Zhang, 2014;Hickerson et al., 2010). Chloroplast DNA (cpDNA) sequences are generally used as it is uniparentally inherited (cpDNA in angiosperm is usually paternally inherited); this can provide a clearer evolutionary history without the effects of genetic recombination. Nuclear sequences (especially ITS) are also effective and can provide additional information. Thus, phylogeographic analysis using sequence data can provide information about the genetic diversity and structure at the base level, rather than a fragment length of traditional molecular markers, which ought to provide more abundant and accurate insights. In addition, divergence time and expansion time of subsets of a germplasm resource can be estimated based on the base substitution rate of nuclear sequences. Combining geological events at a corresponding time, the evolutionary history of geographic groups of a species can be inferred (Hickerson et al., 2010). Therefore, a more efficiently explained genetic structure and genetic diversity of a species can be revealed.
In phylogeographic analysis, glacial refugia play an important role, since they harbored species, allowing them to survive the harsh environment during the glacial period, and were the origin places of population colonization after glaciation; thus, they have eminent effects on shaping the contemporary distribution pattern of species and their genetic diversity and genetic structure (Chung, López-Pujol, & Chung, 2017 TA B L E 1 (Continued) Zhou, 1997). The main Korean mountain range (the Baekdudaegan (BDDG)) is also thought to be a glacial refugium, mainly for the boreal and temperate flora of northeastern Asia (Chung et al., 2017).
In addition to those large mountains, microrefugia such as smaller massifs or lowland sites are also of great importance for sustaining species. It was hypothesized that multiple microrefugia were located in the northern part of the southern refuge (24°N to 33°N; Wang & Ge, 2006).
Phylogeographic studies can also provide new clues about taxonomy, as genetic relationships are analyzed using sequence data.
There are taxonomic controversies regarding C. serrulata as mentioned above. C. serrulata is a complex consisting of three varieties recorded in Flora of China (Li & Bartholomew, 2003) Province, China). These individuals are similar to C. serrulata, but are predominantly short, and the leaves are red and small. There are also controversial taxonomic relationships between C. serrulata and some other species, such as C. xueluoensis (Nan, Wang, Tang, & Yi, 2013).
In this study, a total of 18 wild populations distributed in subtropical and temperate China and the Korean peninsula were collected in order to conduct a phylogeographic analysis of C. serrulata. The sampling has covered the majority of C. serrulata distributions. For the first time, three paternally inherited cpDNA fragments and the biparentally inherited nuclear ITS sequences were used together for phylogeographic analysis of C. serrulata to reveal the spatiotemporal distribution history, so as to better understand its genetic diversity and genetic structure. Suggestions on conservation and utilization of the germplasm resources are recommended. In addition, in this study, individuals described as C. laoshanensis (Zang, 2017) in Laoshan Mountain (Qingdao, Shandong, China) were collected as population LaoS, and individuals recorded by Yi (2007) in the Huanggangshan Mountain (Wuyishan, Fujian, China) were sampled as population HGS to allow discussion of the taxonomic problems, in order to provide more proof for further clarifying their relationships with C. serrulata.

| Population sampling
We conducted field investigations of C. serrulata according to specimen and literature records since 2010. A total of 327 individuals were collected from 18 wild populations, 17 in China and one in Korea (Table 1, Figure 1a). Our collections roughly covered the distributions of C. serrulata. Green, young, and healthy leaves were sampled from individuals at least 30 m apart and were quickly stored in silica gel. All samples were used for cpDNA analysis, while 174 were used for ITS analysis with a maximum accession number of 10 for each population (Table 1).

| DNA extraction, PCR amplification, and sequencing
The genomic DNA was extracted by the CTAB-SiO 2 method (Nan, 2012;Zhang, Zhang, Zhu, Wang, & Huang, 2004). Three cpDNA fragments selected after screening, matK, trnD-E, trnS-G, and the nuclear ITS sequence, were amplified. Polymerase chain reaction (PCR) amplifications were carried out in a system of reaction mixtures of 25 μl, containing 12.5 μl of 2 × Taq PCR Master Mix (Tiangen, Beijing, China), 1 μl of each primer, 2 μl of DNA template, and 8.5 μl of ddH 2 O. The PCR procedure began at an initial denaturation of 5 min at 95°C, followed by 35 cycles of 45-s denaturation at 94°C, 45-s annealing at 56°C, 45-s extension at 72°C, and ended at an extension of 8 min at 72°C. The PCR products were checked by a 1.0% agarose gel and were sent for sequencing on ABI Prism 3,730 Genetic Analyzer (Sangon, Shanghai, China).
The three cpDNA sequences were concatenated for subsequent analysis. Identification of cpDNA haplotypes (H) and ITS ribotypes (R), and calculation of their diversities (H d /R d ) as well as nucleotide diversity (P i ) were performed using DnaSP v6.0.01 (Rozas et al., 2017). Analysis of molecular variance (AMOVA) was conducted to estimate the genetic variances within and between populations and regions using Arlequin version 3.5 (Excoffier & Lischer, 2010).
G ST , representing haplotype frequency, and N ST , representing haplotype difference, were calculated to examine the existence of population structure by PERMUT v. 2.0 with 10,000 permutations (Pons & Petit, 1996). N ST > G ST , p < .05 indicates the presence of a phylogeographic structure. Genetic differentiation coefficient (GammaSt) between regions clustered using BAPS 6.0 was calculated using DnaSP v6.0.01 (Rozas et al., 2017). Mismatch distribution analyses of each group and all were conducted using analysis of population size changes in DnaSP v6.0.01 to test whether they were constant in size or experienced growth or decline (Rozas et al., 2017). Neutrality tests of Tajima's D and Fu's F S were conducted to test population expansion using Arlequin ver. 3.5 with the number of simulated samples set to be 10,000 (Excoffier & Lischer, 2010).
These analyses were only conducted with cpDNA sequences.

| Divergence time dating
CpDNA haplotypes were used for analyzing the divergence time dating. PartitionFinder 2 (Lanfear, Frandsen, Wright, Senfeld, & Calcott, 2017) was utilized using greedy algorithm (Lanfear, Calcott, Ho, & Guindon, 2012) with PhyML (Guindon et al., 2010) to find the best partitioning scheme for the sequences and the best-fit models for each partitioned subset. The branch lengths were set to be linked.
Models were selected from all available substitution models in BEAST V1.10.4 . Model selection was set to AICC.
The divergence times were estimated by BEAST v1.10.4 . Lacking fossil records, secondary calibration points according to the diversification of Rosaceae were applied to calibrate node ages. Divergence time dating was calculated in two steps with two trees, respectively. Five secondary calibration points were ap-

| Genetic diversity
The total length of aligned sequences for the three chloroplast fragments (matK, trnD-E, and trnS-G) was 2,161 bp with 26 variable sites,

| Genetic structure
Analysis of molecular variance (AMOVA) revealed a high differentiation level of both cpDNA (F st = 0.72461) and ITS (F st = 0.88730) in C. serrulata (Table 3). The most variation was among regions, indicated by the percentage of 67.78% cpDNA variation and 87.6% ITS variation. Variation among populations within regions was very small (cpDNA: 4.68%; ITS: 1.13%). AMOVA indicated a genetic structure in the C. serrulata population. The phylogeographic structure was also affirmed by comparing the haplotype frequency and haplotype dif- Analysis using BAPS identified six groups of regions of the 18 populations of C. serrulata of both DNA fragments, but there were some differences in the groupings (Table 2,  and is the largest group. Region VI was composed of populations FHS and K-NS, ranged from the Korean Peninsula to almost the junction between China and North Korea. As for ITS sequences, population HeS (grouped in the southwest region, Region II by cpDNA analysis) was set to Region III, along with the Jiangnan Hilly Region; population TTZ was clustered into Region V on the plain on the east coast of Asia, rather than to be set to Region III in the Jiangnan Hilly Region. This seems to indicate a geographic lineage from the southwest region to the east coast of Asia, and the Jiangnan Hilly Region is the transition area; thus, populations at junctions of its two sides (HeS and TTZ) showed an ambiguous station in grouping.
In both sequence analyses, population LaoS, representing C.

| Distribution pattern and phylogenetic relationship of ITS ribotypes
A total of nine ITS ribotypes were identified in the 174 individuals ( Figure 2a, Tables 1, 2). Of all the ribotypes, five were unique only

TA B L E 5
Pairwise values of genetic differentiation coefficient (GammaSt, above diagonal) and gene flow (Nm, below diagonal) of ITS sequences among six regions of C. serrulata F I G U R E 3 BEAST-derived chronogram of C. serrulata based on cpDNA sequences. Numbers at each node indicate the node age (million years ago, mya). Ages of node 1 and node 2 have been calibrated. The blue bars illustrate the extent of the 95% highest posterior density (HPD) credibility intervals for each divergence time. Ages of nodes marked in purple spots and pointed by purple arrows marked from A to D correspond to the divergence time of Group 1 to Group 5, respectively to one population, and ribotype R4 was the most standard ribotype which covered 12 populations, and was the unique ribotype for seven populations.
The distribution pattern (Figure 2a), the phylogenetic tree

| Population dynamics
Mismatch distribution is the distribution of the observed pairwise nucleotide site differences. Mismatch distribution analysis in BAPS shows the observed mismatch distribution and the expected values in growing and declining populations (Rogers & Harpending, 1992, equation 4). Under the assumption of neutrality, a significantly negative value for Tajima's D and Fu's F S , along with a unimodal mismatch distribution curve, indicates a population expansion (Fu & Li, 1993;Tajima, 1989Tajima, , 1996.
In the neutrality test, the value of Tajima's D and Fu's F S for all and the six C. serrulata groups was mainly positive or unremarkable negative (Region II), but p > .05 (

| Genetic diversity
Cerasus serrulata, of the genus Cerasus, is a widely distributed species in China. It is also distributed in the subtropical and temperate monsoon climate zones. With such a broad distribution, it is expected to have a higher genetic diversity than other Cerasus species. Both cpDNA (H d = 0.829, P i = 1.500) and ITS (R d = 0.673, P i = 5.210) analysis proved a high genetic diversity of C. serrulata. This is higher than wild C. pseudocerasus (cpDNA: h = 0.557, π = 2.09), which is also a widely distributed Cerasus species (Chen et al., 2015). The diversity based on ITS sequences was lower than C. dielsiana (ITS: H d = 0.879, π = 3.56), which is relatively widely distributed (Zhu et al., 2019). This may be a result of less sampling, rather than a real reflection of a lower diversity level. The broad spreading pattern must be a significant reason for the high diversity since diverse topographies within the distribution pattern of C. serrulata such as alps, mountains, and hills, as well as plains, have provided varying habitats for C. serrulata.
In addition, the different climate zones have further increased the variances. All these multifarious environments must have impelled the variations and high diversity during adaptation in the diffusion of C. serrulata. The large distribution also indicates that C. serrulata has a strong ability to adapt. Besides the broad distribution range, large sampling, which covered most of the natural distribution records, was also a fundamental factor. Another reason may be the long evo-  Xu et al., 2015). This long existence allowed a more significant accumulation of variants, especially strengthened by a wide and strong adaptation.

| Genetic differentiation and structure
Genetic differentiation is related to gene flow, natural selection, and mutation. Gene flow can promote genetic exchanges between different populations and thus decrease population differentiation (Slatkin, 1987). Plant gene flow mainly consists of pollen flow and seed flow (Dick, Hardy, Jones, & Rémy 2008). Therefore, the dispersal of seeds and pollen plays an important role in gene flow. Generally, cross-pollinated plants and plants whose seed dispersal assisted by animals have higher gene flows (Loveless & Hamrick, 1984), while geographic obstacles such as mountains, seas, and rivers will impede gene flow (Slatkin, 1987). Natural selection usually happens in a specific environment which is distinct, and which puts high pressure on a species to evolve to adapt to the special environment. Thus, natural selection will strengthen specialization and increase genetic differentiation (Ehrlich & Raven, 1969;Endler, 1997). Mutation is the ultimate origin of all genetic variation (Barton, 2010); therefore, it provides the basic source of genetic differentiation.
In this study, AMOVA revealed a high genetic differentiation level among C. serrulata populations (cpDNA: F st = 0.72461; ITS:  Table 4). This phenomenon was also observed in analyses using nuclear SSRs (Chen, 2016;Yi et al., 2018).
But the ribotypes of nuclear ITS were clustered with Region I in the northwest region and shared little gene flow with Region V (Figure 2, Table 5). This confusion may be because of the less sampling of ITS sequences. Furthermore, actually, distributions of C. serrulata were also recorded in Hebei Province and Heilongjiang Province in China, as well as in Japan. But we failed to obtain samples in these regions. Thus, a lack of comprehensive sampling also caused this confusion.

| Demographic dynamic and potential glacial refugia
The most recent common ancestor of C. serrulata appeared around 8.84 mya in the mid-to late Miocene, as calculated in the molecular dating analysis (Figure 3). Of the two lineages of C. serrulata, the crown time of the first lineage, in Region I in the northwest, was around 3.93 mya; the crown time of the other lineage, distributed from the southwest to northeast, is dated to around 7.27 mya. In other species, a similar time was also detected for the C. dielsiana ancestor (6.28 mya; Zhu et al., 2019), the Polystichum glaciale ancestor (6.89 mya; Dong, Xu, Rana, Li, & Sun, 2018), and the divergence of Ixiolirion tataricum and I. songaricum (~7 mya; Li, Song, Zhang, & Lv, 2019). It seems that the divergence event of many species was concentrated during such a period of time, which overlapped with Himalayan movement, the beginning of the development of the Arctic ice sheet, and when the global climate becoming cooler (Liu, Zheng, & Guo, 1998) and drier (Ma & Tian, 2015;Zhao, Lu, & Tang, 2018). These strong landform and climate changes might lead to such a concentration of species divergence. As such, the uplifting of the Qinling Mountains might have taken place accompanied by Himalayan movement (Xue et al., 2004) and might have gradually isolated the north groups from others, finally resulting in the divergence of the two geographic lineages.
No dominant expansion events were recognized either in the neutrality test or in the mismatch distribution analysis of different regions at the species level, indicating that C. serrulata might have reached neutrality in recent times. The genus Cerasus is reported to have formed its distribution center and pattern before the Last Glaciation Maximum (LGM, from 0.0265 mya to 0.019-0.02 mya) (Cao, 2006;Li, 2009;Wang, 2014). Thus, we would like to infer that C. serrulata also finished its distribution center and pattern before the LGM, experienced no significant colonization, and stayed relatively stable thereafter.
Glacial refugia, which protected species and enabled them to survive, are usually considered to be large mountains. Nevertheless, more and more attention is being given to the importance of microrefugia, such as smaller massifs or lowland sites (Suchan, Malicki, & Ronikier 2019;Zhu et al., 2019). It is believed that the broad land of the southeast of Asia was linked together by the Bohai Sea and the coast of Asia during glaciation, when sea levels declined (Guo et al., 2014). This area provided corridors for plants and animals, allowing them to travel between areas that are now broken up by the Bohai Sea. As such, the genetic link of populations in the Korean Peninsula and populations in the coast of east Asia can be explained.
Valleys in the east-west mountains, such as the Qinling Mountains, could provide refugia and travel access for many species (Wulufu, 1964;Zhou, 1997

| Taxonomic reconsideration
Cerasus serrulata is a complex with ample variations and transition morphological characters which have brought great taxonomic controversies. Here, we sampled two controversial groups of this complex to reveal the relationship. C. laoshanensis, published in 2017, was described to resemble C. serrulata (Zang, 2017). In this study, population LaoS was sampled to represent C. laoshanensis. But it showed no specificity from C. serrulata individuals in any relationship analysis (Figures 1, 2 and Tables 1, 2). Thus, it was endorsed to merge C. laoshanensis into C. serrulata. As for C. huangangensis, a record in 2007 (Yi, 2007), represented by population HGS in this study, showed distinctiveness as it exclusively possessed only one kind of haplotype or ribotype of both cpDNA and ITS sequences (Figures 1,   2 and Tables 1, 2). But it was still well mixed within the big clade in both phylogenetic analyses of NJ trees (Figures 1b, 2b) and ML tree ( Figure 3) of C. serrulata. Therefore, better treatment of it would be a variety: C. serrulata var. huangangensis.

| Implications for conservation and utilization
Flowering cherry possesses good ornamental value and is popular around the world. Nowadays, cultivars of flowering cherries are mainly developed from Japan. According to our accumulated observation and study on the cherry resources of China and Japan, we believe many of the good cultivars have some origin from C. serrulata.
There The unique environment should have pressed strong selection on this population, and the high altitude has prevented gene flow from other populations; thus, the analysis showed no genetic diversity of this population. Therefore, protection combining in situ conservation and ex situ conservation is suggested for this resource. Phenotypic characteristics of this population are also unique as the leaves are red and the plant is short and shaped like a shrub. Resources in this region have potential value for developing new cultivars with wet resistance, lower shape, and colored leaves.

| CON CLUS ION
We found high genetic diversity and the existence of phylogeographic structure in Cerasus serrulata. No dominant expansion events were detected. Finally, two geographic lineages were inferred. One was confined to the Qinling Mountain and the Taihang Mountains. The other was from the Wuling Mountains to the Jiangnan Hilly Regions, and then went northeast to the east coast of Asia. Taxonomic treatments of several related controversial species were reconsidered. C.
laoshanensis was suggested to be merged into C. serrulata. And a new variety, C. serrulata var. huangangensis, was preferred. Feng for suggestions on polishing up the manuscript.

CO N FLI C T O F I NTE R E S T
None declared.