Population genetic structures of two ecologically distinct species Betula platyphylla and B. ermanii inferred based on nuclear and chloroplast DNA markers

Abstract Climatic oscillations during the last glacial maximum (LGM) significantly affected the distribution patterns and genetic structure of extant plants. Northeast China (NEC) is a major biodiversity center in East Asia, and the influence of historical climate change on NEC populations is critical for understanding species responses to future climate change. However, only a few phylogeographic studies of cool temperate deciduous tree species have been conducted in the area, and results are inconsistent for species with different niches or distribution areas. We employed multiple chloroplast and nuclear markers to investigate the genetic structure of two ecologically contrasting species, Betula platyphylla and B. ermanii, in NEC. Rare haplotypes were identified in the chloroplast genome of these species, and both exhibited high levels of nucleotide diversity based on a fragment of the nuclear gene G3PDH and microsatellites. Moreover, significant phylogeographic structure was detected for B. platyphylla, suggesting that these populations had recolonized from independent glacial refuges, whereas no genetic structure was found for B. ermanii. OPEN RESEARCH BADGES The nSSR datasets used in the current study and the table of pairwise FST (below diagonal) and its standardized F'ST (above diagonal) among 25 populations based on seven SSRs are available from the Dryad (DOI: https://doi.org/10.5061/dryad.230d176). Sequences generated from this study were deposited in GenBank under Accession nos. KY199568–KY200162 and MK819541–MK819970.

Indeed, recent population genetic analyses from several independent studies have identified multiple refuges for angiosperms, such as the Hengduan, Tianmu, Qinling, Dabashan, and Taihang mountains in China (Chiang et al., 2006;Chou, Thomas, Ge, LePage, & Wang, 2011;Huang, Hwang, & Lin, 2002;Huang et al., 2010;Qiu, Fu, & Comes, 2011;Zhang, Yan, Zhang, & Zhou, 2008;Zhang, Zeng, Shan, & Ma, 2012;Zhang, Chiang, George, Liu, & Abbott, 2005) and the Baekdudaegan in North Korea and South Korea (Chung, López-Pujol, & Chung, 2017;Chung, Vu, et al., 2018). Although tremendous progress has been achieved in recent decades, these studies have been concentrated in southern, southwestern, and northern China. In addition, studies on some widespread species, such as Acer mono var. mono, Phellodendron amurense, Allium neriniflorum, and A. tubiflorum, have indicated clear genetic differentiation between taxa from Northeast China (NEC) and other regions; thus, we speculate that NEC taxa constitute a relatively independent genetic clade (Liu et al., 2014;Yang et al., 2016;Yang, Zhou, Huang, & He, 2017). Northeast China is a megadiversity area with a complex topography including the Northeast Plain and major mountain ranges, such as the Changbai Mountains, Xiaoxing'anling Mountains, and Daxing'anling Mountains (China, 1998;Xu, Wang, & Xue, 1999) ( Figure 1a). Controlled by the high-latitude East Asia monsoon, the warm temperate climate of the south becomes a cooler temperate climate in the north. With these complex topographic and climatic gradients, NEC is a major biodiversity center in East Asia (Chen, 1998). The area has been divided into four distinct terrestrial ecosystems according to vegetation types, namely northern China flora, Changbai flora, Greater Khingan flora, and Mongolian grassland flora ( Figure 1b) (Fu, 2003). The major Quaternary glaciation appears not to have occurred in NEC; however, the climate was at least 6-8°C cooler and more arid during the LGM than it is today (Sun & Chen, 1991).
Among other congeners in NEC,B. platyphylla Sukaczev (2x = 28) has almost the same distribution as B. ermanii (4x = 56), though they have different ploidy levels. B. platyphylla is a pioneer species that is widely distributed in low-elevation regions and is usually a dominant species in forest systems. B. ermanii and B. platyphylla are all distributed from Siberia to Far East Asia, and NEC comprises the main and central range of the two species according Tsuda et al. (2016). These species belong to different subsets based on morphological features, but some sharing of cpDNA haplotypes has been detected (Tsuda & Ide, 2010). Moreover, STRUCTURE results indicate some admixture for the two species based on seven nuclear simple sequence repeats (nSSRs) (figure 1 in Tsuda et al., 2016), and the pairwise F ST or F' ST between them is low (table 2 in Tsuda et al., 2016). Therefore, the two birch species are ideal for addressing how differences in ecological tolerance or niche can potentially lead to divergent demographic histories and differences in population structure.
To obtain the genetic structures of the two congeneric species, we surveyed nucleotide variation patterns of chloroplast genomes for both species. As the chloroplast genome is maternally inherited and usually shows a low mutation rate, we also sequenced a fragment of the single-copy nuclear gene glycerol-3-phosphate dehydrogenase (G3PDH) and 10 nSSRs. The aims of the study were to (a) evaluate the population genetic structure of the two birch species and (b) assess whether these species in NEC survived in multiple refuges during the LGM. The findings of our study may provide insight into the demographic history of ecological systems in NEC.  (Table S1). We sampled 3, 1, and 2 populations of B. ermanii from the Changbai Mountains, Xiaoxing'anling Mountains, and Daxing'anling Mountains, respectively (Table S1). To avoid collecting clones, all these individuals were at least 100 m apart from each other; the material was dried with silica gel. In China, no specific permission is required for the collection of specimens of B. platyphylla or B. ermanii, which are widely distributed in NC and NEC, respectively. Fourteen samples of B. platyphylla were kindly provided by Herbarium Hallym University (HHU).

| Sample collection and DNA extraction
We undertook the formal identification of the plant material used in the study, and voucher specimens were collected for each population and deposited with Northeast Normal University Herbarium (NENU4081-NENU4101 for B. platyphylla and NENU4102-NENU4107 for B. ermanii). Total genomic DNA was extracted from dried leaves for each individual of B. ermanii using the Plant Genomic DNA kit (TianGen). In contrast, as the quality of B. platyphylla DNA extracted by the Plant Genomic DNA kit was poor, we used a modified CTAB method to extract genomic DNA from this species (Doyle, 1987).
Seven of these primer pairs (BP4, BP12, BP15, BP16, BP17, BE6, and BE12) were employed to analyze the genotypes of both species. Of the remaining three microsatellites, one (BE8) was scored for B. platyphylla, and two (BP10 and BE7) were genotyped for B. ermanii. PCR amplifications using these nSSR primers were performed in a 20-μL reaction volume including 50 ng of genomic DNA, 1 × PCR buffer (plus Mg 2+ ), 0.2 mM of each dNTP, and 0.5 μM of each primer, with each forward primer labeled with fluorescent dye (FAM, TAMRA, or HEX) (Invitrogen) and 1 unit (U) of Taq polymerase (Takara). The PCR amplification reactions were carried out using an ABI2720 Thermocycler (Applied Biosystems) according to the following procedure: an initial denaturation step at 95°C for 5 min, followed by 35 cycles of 30 s at 94°C, 30 s at an optimal annealing temperature, and 30 s at 72°C, and a final elongation step at 72°C for 8 min. The amplified fragments were mixed and resolved using an ABI 3730 DNA sequence (Applied Biosystems) capillary electrophoresis instrument.
We then determined genotypes based on the dosage of PCR product peaks (Tsuda et al., 2016) and carefully checked each locus to avoid errors. However, for some individuals, especially tetraploids, of B. ermanii, the peaks were insufficiently clear to determine allele copy number in this way.

| Chloroplast and nuclear gene sequencing
We examined polymorphisms in the chloroplast genomes of the two Betula species using 40 universal primer pairs (Table S2) (Table   S2), and extension for 90 s at 72°C, and a final extension at 72°C for 8 min. PCR was performed using an ABI2720 Thermocycler (Applied Biosystems). The PCR products were verified by 1.5% agarose gel electrophoresis and sequenced using an ABI-3730 analyzer (Applied Biosystems).
Because low nucleotide polymorphisms were observed across the chloroplast regions, we designed a primer targeting the nu-  a total of 179 and 351 positive clones were sequenced for B. platyphylla and B. ermanii using an ABI 3730 DNA analyzer (ABI), respectively (Table S1).

| Microsatellite data analysis
The number of alleles (N A ), genetic diversity (h) (Nei, 1987a), proportion of private alleles (N P ), allelic richness (Ar) (El Mousadik & Petit, 1996), fixation index (FIS), deviation from zero indicated significant deviations from Hardy-Weinberg equilibrium (HWE), were calculated based on SSR markers using FSTAT software (Goudet, 1995). To compare genetic diversity and differentiation (F ST ), tetraploid individuals were regarded as two diploid individuals using FSTAT software. This approach was employed in previous studies (Tsuda et al., 2016), and it demonstrated that the biases caused by duplication of the sample size in tetraploid species can be ignored.
Moreover, observed heterozygosity (Ho), total heterozygosity (Ht), the standardized value of F ST (F' ST ), and principal component analysis (PCA) were performed using GenoDive (Meirmans & Van Tienderen, 2004). Analysis of molecular variance (AMOVA, Excoffier, Smouse, & Quattro, 1992) was employed to evaluate how genetic variability is structured in different groups of these two species. When analysis was performed on a total of 315 individuals, according to Tsuda et al. (2016), the genotypes of B. platyphylla individuals were duplicated to show tetraploid-like genotypes using GenoDive software, which is not able to analyze samples of different ploidy levels.
The population genetic structure of the total samples and each species was analyzed using the Bayesian clustering program STRUCTURE version 2.3.3 (Pritchard, Stephens, & Donnelly, 2000). This program estimates the number of genetic clusters (K) within each species without consideration of sampling location. In genetic structure analysis of total samples, the genotypes of the B. platyphylla individuals were duplicated to show tetraploid-like genotypes. As ambiguous locus was found for a few individuals of B. ermani, we created another dataset of B. ermanii for STRUCTURE analysis according Hu et al. (2019). In the new dataset, we used missing data (−9) to replace the repeated genotypes when a tetraploid has two or three alleles. For example, if a tetraploid has three alleles, such as AABC, ABBC, or ABCC, we used ABC-9; if it has two alleles, such as ABBB, AAAB, or AABB, we encoded it as AB-9-9. To obtain the best K value for each species, we preassigned the number of K ranging from one to the number of populations for each subset with ten independent replicates. Each run was performed with 100,000 MCMC iterations and an initial burn-in of 100,000, as based on the admixture model and the correlated allele frequencies model. The final posterior probability of K, ln p(K), and Delta K (ΔK) was calculated using STRUCTURE HARVESTER (Earl & vonHoldt, 2012) to determine the most likely K value, as suggested by Pritchard et al. (2000) and Evanno, Regnaut, and Goudet (2005). Replicate runs were summarized using CLUMPP (Jakobsson & Rosenberg, 2007) software and visualized using DISTRUCT (Rosenberg, 2004) software.

| Nucleotide polymorphism and phylogenetic analysis
DNA sequences of both chloroplasts and nuclear genes were aligned using the software Clustal X version 1.81 (Thompson, Gibson, Plewniak, Jeanmougin, & Higgins, 1997) and adjusted and edited using BioEdit v7.1.3.0 if necessary (Hall, 1999). The genetic diversities of the two species were calculated using the program DnaSP v5 (Librado & Rozas, 2009), including the haplotype diversity (Hd), nucleotide diversity (π) (Nei, 1987b) and (θ) (Watterson, 1975), segregating sites (S), and number of haplotypes (h), with p values based on the chi-squared text or t test (two-tailed). In addition, the neutrality test statistics Tajima (1983) and Fu and Li (1993) were estimated for cpDNA and nuclear genes using DnaSP (Librado & Rozas, 2009). Relationships of the haplotypes of chloroplast genes obtained were joined and analyzed with statistical parsimony networks using TCS 2.1 (Clement, Posada, & Crandall, 2000), with a 95% connection limit for cpDNA, and indels were coded as the 5th state. The unprocessed networks produced by TCS were redrawn using the vector drawing software Adobe Illustrator CS5 (Adobe Systems Inc.). In addition, the population differentiation of the two species was evaluated in terms of both unordered alleles (GST) and ordered alleles (NST) using Permut software (Pons & Petit, 1996) with 1,000 random haplotype permutations. ML trees based on G3PDH were generated using RAxML (Stamatakis, 2014) with the GTRGAMMA model and 1,000 bootstrap replicates.
We grouped bioclimatic variables with strong collinearity (i.e., Pearson's correlation coefficient r ≥ |.80|) Chung, Vu, et al., 2018). The variables from these groups were then selected based on their relative contribution to the model (percent contribution, jackknife tests of variable importance) and the shape of their response curves. Four variables were selected: annual mean temperature (bio1), minimum temperature of the coldest month (bio6), mean temperature of the coldest quarter (bio11), precipitation of the warm-

| Population genetic structure based on nSSR markers
Both individual-based and population-based PCA revealed a separate pattern of genetic structure between the two species
The ML tree topologies obtained for each species based on the D3PDH gene were highly similar, with both being divided into two main clades. For B. platyphylla, although the populations from South Korea and the Qingling Mountains are slightly different from the NEC populations, no obvious genetic breaks were found ( Figure S2).
Interestingly, in the two clades of B. ermanii, 64 of 97 individuals with multiple clones were polyphyletic ( Figure S3).

| Genetic diversity
Genetic diversity was estimated based on microsatellite data at both the population and locus level (Tables S1 and S5 (Table S1).

| Current and past distribution of the two Betula species
We employed ecological niche models to infer the historical and po-

| D ISCUSS I ON
Previous studies have provided a framework for understanding the impacts of the LGM on current terrestrial ecosystems (Clark et al., 2009;Comes & Kadereit, 1998;Hewitt, 1999Hewitt, , 2004. In the present study, we investigated the genetic structure of two Betula species based on multiple nuclear and chloroplast markers. The distinct genetic structures of the two species indicated different demographic histories between them.

| Genetic diversity in B. platyphylla versus B. ermanii
In our study, only two nucleotide substitutions were detected in cpDNA sequences (Table S4). This may be because the chloroplast genome has a lower mutation rate than does the nuclear genome.
We propose that only a few chloroplast haplotypes survived in the refuge together with repeated expansions and contractions during the Quaternary climatic oscillation. Of course, the low genetic diversity in the two species could simply be a result of low sampling, with an average of five samples from each population, though with random sampling within populations. However, 1 and 7 haplotypes were detected among B. platyphylla and B. ermanii populations sampled in Japan, and the B. ermanii population was mostly dominated by three haplotypes (Tsuda & Ide, 2010). Extremely low regional genetic variation is commonly found, particularly in the northern portion of the range of temperate species. For example, Acer mono, Juglans mandshurica, and Eleutherococcus sessiliflorus also show low cpDNA diversity in NEC (Bai, Wang, & Zhang, 2016;Guo et al., 2014;Wang, Bao, Wang, Wang, & Ge, 2016). We therefore believe that the result is not simply an artifact but due to historical events. Indeed, simultaneous cpDNA haplotype extinction events in NEC provide evidence that many of these northern populations may have experienced a bottleneck during the LGM (Hewitt, 2004(Hewitt, , 2011. Ar and Ht values, representing genetic diversity, were higher in B. ermanii than in B. platyphylla populations (Table S1). The former was expected to have higher genetic diversity than the latter because it is a tetraploid species. If B. ermanii were found to be an allotetraploid species, supported by clones of an individual, distributed in two clades ( Figure S3), the level of allelic richness would be expected to be higher than that of a diploid species (B. platyphylla) because new alleles would derive by allopolyploidization. However, we cannot rule out the possibility of combined effects of autopolyploidy and admixture. We also suggest that the relatively high Ar and Ht of B. ermanii may be attributed to relatively broad LGM distributions and greater population connectivity in glacial and postglacial landscapes. Both gene diversity and allelic richness are generally higher in tetraploid birch species than in diploids (Tsuda et al., 2016).
Accordingly, it was not beyond the expectation that the B. ermanii populations have higher FIS values.

| Different genetic structures in B. platyphylla versus B. ermanii
Our study showed a slight range contraction during the LGM for both species (Figure 4). Ostryopsis davidiana (Tian et al., 2009), Mongolian oak (Zeng et al., 2015), and Juglans mandshurica (Bai et al., 2010). Although introgression may also explain this pattern and hybrid zones between B. pendula and B. platyphylla were detected in Siberia, we can still ignore this situation because no introgression in NEC was detected based on eighteen nSSRs or seven nSSRs in the study of Tsuda et al. (2016).
Although this study was not designed to collect samples from Japan, South Korea, and North Korea, the results do exclude the possibility of recolonization from one refuge because no difference in genetic diversity among the populations and low gene flow due to the isolation of distribution were found. In previous studies, no evidence of phylogeographic structure was detected in B. ermanii from other regions (Tsuda et al., 2016(Tsuda et al., , 2009).

| Multiple glacial refuges for B. platyphylla in NEC
In NEC, a total of 2,621 species were recorded, belonging to 128 families and 74 genera (Fu, 2003). In our study, the Changbai flora (northern Changbai Mountains) exhibited the greatest number of plant species (1,798) among the four floras in NEC (Figure 1). It is worth noting that the Greater Khingan flora also contains many plant species (1,497). The highest endemic species richness was also found in this area (19.9% for the Changbai flora and 15% for the Greater Khingan flora). High species richness and differences in species richness allowed us to propose a distinct demographic history in NEC. As Svenning (Svenning & Skov, 2007) (Provan & Bennett, 2008).

| CON CLUS IONS
Climatic oscillations during the Quaternary have led to the establishment of current terrestrial ecosystems. The high species richness and endemic species richness in the Changbai and Greater Khingan mountains suggest that these two regions may have different demographic histories. Here, phylogenetic analyses based on multiple chloroplast and nuclear genome markers demonstrate that populations of B. platyphylla in NEC had recolonized from two independent glacial refuges. More cold-tolerant species were likely able to survive at higher latitudes, leading to high levels of gene flow between geographically separated populations during the LGM, with no geographic structure detected for B. ermanii.

ACK N OWLED G M ENTS
We thank the three anonymous reviewers for their constructive comments that greatly improved our manuscript. We also thank Dr.
Lin-Feng Li for his suggestions on the data analysis and critical comments on the previous version of this manuscript. We thank Mingzhou Sun for sample collection. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CO N FLI C T S O F I NTE R E S T
The authors declare that they have no competing interests.

AUTH O R S CO NTR I B UTI O N
HYW, XY, DXY, and LL conducted the experimental work and performed the analyses. HYW and XY drafted the manuscript. HXX participated in the design and coordination of this study, collected the samples, and helped draft the manuscript. All authors read and approved the final manuscript.

O PE N R E S E A RCH BA D G E S
The nSSR datasets used in the current study and the