A comparative plastomics approach reveals available molecular markers for the phylogeographic study of Dendrobium huoshanense, an endangered orchid with extremely small populations

Abstract Comparative plastomics approaches have been used to identify available molecular markers for different taxonomic level studies of orchid species. However, the adoption of such methods has been largely limited in phylogeographic studies. Therefore, in this study, Dendrobium huoshanense, an endangered species with extremely small populations, was used as a model system to test whether the comparative plastomic approaches could screen available molecular markers for the phylogeographic study. We sequenced two more plastomes of D. huoshanense and compared them with our previously published one. A total of 27 mutational hotspot regions and six polymorphic cpSSRs have been screened for the phylogeographic studies of D. huoshanense. The cpDNA haplotype data revealed that the existence of haplotype distribution center was located in Dabieshan Mts. (Huoshan). The genetic diversity and phylogenetic analyses showed that the populations of D. huoshanense have been isolated and evolved independently for long period. On the contrary, based on cpSSR data, the genetic structure analysis revealed a mixed structure among the populations in Anhui and Jiangxi province, which suggested that the hybridization or introgression events have occurred among the populations of D. huoshanense. These results indicated that human activities have played key roles in shaping the genetic diversity and distributional patterns of D. huoshanense. According to our results, both two markers showed a high resolution for the phylogeographic studies of D. huoshanense. Therefore, we put forth that comparative plastomic approaches could revealed available molecular markers for phylogeographic study, especially for the species with extremely small populations.

huoshanense has a long-life cycle and require a warm and humid habitat; however, due to their low germination rate, slow growing, habitat deterioration, and being overexploited, the wild populations of D. huoshanense are extremely in danger of extinction. As documented in Tsi (1999) and Wood (2006), D. huoshanense has a narrow geographical distribution, only distributed in southwest Anhui province, southwest Henan province, and a small part of Jiangxi province ( Figure S1). It has been listed in the "Conservation Program for Wild Plants with Extremely Small Population in China" (State Forestry Administration of China, 2012). So, it is urgent to assess the genetic diversity and distributional patterns of D. huoshanense. However, the narrow geographical distribution and small population size of D. huoshanense have resulted in great challenges for its phylogeographic study.
The noncoding sequences in chloroplast genomes (i.e., plastomes) are the most important genetic tools for plant studies at low taxonomic levels, especially for phylogeographic analysis (Beheregaray, 2008;Prince, 2015). As reviewed in Morris and Shaw (2018), more than two-thirds of the publications in the past 10 years employed noncoding chloroplast regions alone or in combination with nuclear DNA sequences or microsatellites. For instance, in Dendrobium orchids, Ye et al. (2016) evaluated the ecological and genetic processes of D. moniliforme by using two noncoding chloroplast (cp) DNA sequence (trnC-petN and trnT-trnE). Hou et al. (2017) elucidated the iteration expansion and regional evolution history of D. officinale and four related taxa by employing the sequence combination of nuclear ITS regions and three cpDNA regions (accD-psaI, trnC-petN, and rps15-ycf1). Unfortunately, though cpDNA sequence data use increase, most researchers continue rely on regions of relatively low variability. For example, the commonly used cpDNA regions, psbA-trnH, trnL-trnF, and trnL intron present lower variability than most of other regions (Morris & Shaw, 2018). Besides, most of cpDNA sequences do not provide sufficient resolution for the phylogeographic studies of plant species with extremely small populations.
Advances in next-generation sequencing (NGS) approach over the last decade has largely facilitated gathering of plastome sequence data in public databases, which led to the use of comparative plastomics approaches (i.e., comparisons of sequences variability among different regions at whole-plastomic level) to identify rapidly evolving regions appropriate for low taxonomic level studies use (e.g., Ahmed et al., 2013;Niu, Xue, et al., 2017;Shaw et al., 2014). Recently, numerous mutational hotspot regions have been screened in different orchid genera, for example, Cymbidium (Yang, Tang, Li, Zhang, & Li, 2013), Phalaenopsis (Shaw et al., 2014), Holcoglossum  and especially in the genus of Dendrobium. For instance, the five informative hotspot regions trnT-trnL, rpl32-trnL, clpP-psbB, trnL intron, and rps16-trnQ were selected for identifying species of Dendrobium (Niu, Zhu, et al., 2017). On the basis of comprehensive plastome-wide comparison, a total of 19 SNPs were used to design RT-ARMS primers for the authentication study of Dendrobium species (Niu et al., 2018). While the potential benefit of comparative plastomics approaches to low taxonomic level study has become increasingly clear, the adoption of such methods has been largely limited in phylogeographic studies. At least for now, plant phylogeographic studies in Dendrobium orchids remain dependent on the noncoding cpDNA sequences (e.g., Hou et al., 2017;Ye et al., 2016). Therefore, in this study, the plastome sequences of D. huoshanense were used as a model system to address two questions, as follows: (a) Could the comparative plastomic approaches screen available molecular markers? (b) If so, could these markers use to infer the phylogeographic relationships among the populations of D.
huoshanense. To address these questions, we sequenced two more complete plastome sequences of D. huoshanense and compared them with our previously published one. Based on the comparative plastomic approaches, a total of 27 mutational hotspot regions and six polymorphic cpSSRs have been selected to assess the genetic diversity and distributional patterns of D. huoshanense.

| Plant materials and DNA extraction
A total of 28 samples from five populations of D. huoshanense were used in this study (Table 1) The quality of obtained DNA was measured on NanoDrop 8,000 Spectrophotometer (Thermo Scientific). DNA samples that met the quality requirements (concentration >50 ng/μl, A260/A280 = 1.8-2.0, and A260/A230 >1.8) were used for sequencing. The genomic DNA of other samples were isolated from about 0.5 g of leaves using a standard CTAB DNA extraction protocol (Doyle & Doyle, 1987) modified by adding 2% PVP-40 to the buffer.

| Plastome sequencing, assembly, and annotation
The total genomic DNA of two individuals was sequenced using an Illumina Hiseq4000 sequencer with the pair-end strategy of 150 bp reads with an average 400 bp insert size. Approximately, 5.5 Gb raw reads were yielded for each species. The raw reads were trimmed with an error probability < .05 and by removing one nucleotide at both terminal ends, and then assembled on CLC Genomics Workbench 8.5.1 (CLC Bio) with the plastome of D. moniliforme (AB893950) as reference. The gaps and junctions between inverted repeat (IR) regions and single copy (SC) regions were confirmed by PCR amplification. Genes were annotated using DOGMA (Wyman, Jansen, & Boore, 2004) and tRNAscan-SE 1.21 (Schattner, Brooks, & Lowe, 2005). The exact boundaries of annotated genes were manually checked by aligning them with homologous genes of other plastomes in the genus of Dendrobium.

| Comparative plastomics analyses
The comparative plastomics approaches were employed to select the most informative regions, which could be used for the phylogeographic analysis of D. huoshanense. Three plastomes of D. huoshanense were compared by using D. moniliforme (AB893950) as reference. The complete plastome sequences were aligned with MAFFT program (Katoh & Standley, 2013) and adjusted manually in MEGA 5.2 (Tamura, 2011). The ambiguous sites were removed.
Then, we divide the aligned sequences into 372 bins with 400 bp per length using D. moniliforme as outgroup. DnaSP v5 was employed to count the numbers of single nucleotide polymorphism (SNP) sites and insertion-deletion (InDel) events for each bin (Librado & Rozas, 2009). Chloroplast simple sequence repeat (cpSSR) elements were also detected using GMATo according to the criteria that the "Mini-length" for mono-nucleotide and multi-nucleotide SSRs were set to be 8 and 5 units, respectively (Wang, Lu, & Luo, 2013).

| Population genetic analyses
A total of 27 mutational hotspot regions were sequenced from 28 samples of five populations of D. huoshanense (Table S1). The amplified sequences were aligned using MUSCLE (Edgar, 2004) with the "Refining" option implemented in MEGA 5.2. Gaps located at the 5′-and 3′-ends of alignments and all mononucleotide repeats were excluded. Indels were treated as single mutation events. We used DnaSP v5 to identify haplotypes, and calculated the haplotype diversity (H d ) and nucleotide diversity (P i ) for each population and for all samples. Within-population diversity (H S ) and total diversity (H T ) were determined using PERMUT version 1.0 (Pons & Petit, 1996).  Note: "Mountains" were abbreviated to "Mts.".

| Phylogeographic analyses
Phylogenetic analyses of the 11 haplotypes were assessed via Bayesian inference (BI) analyses. The best substitution model was "GTR + I+ G", which was determined according to the Akaike Information Criterion (AIC) in Modeltest 3.7 (Posada, 2008). The BI tree was constructed using MrBayes 3.2 (Ronquist et al., 2012). Two simultaneous runs were conducted, each consisting of four chains.
In total, chains were run for 2,000,000 generations, with topologies sampled every 100 generations. The first 25% of our sampled trees were discarded. The remaining trees were used to construct a majority-rule consensus tree and calculate posterior probabilities (PPs) for each node. In addition, we used NETWORK version 4.5 to build a maximum-parsimony median-joining network to visualize the phylogenetic relationships among all the 11 haplotypes (Bandelt, Forster, & Rohl, 1999).

| Structure analysis
The model-based program STRUCTURE was used to determine the genetic structure of five populations of D. huoshanense by using cpSSR data (Falush, Stephens, & Pritchard, 2003). We used a burnin of 500,000 and MCMC iteration number = 1,200,000, assumed an admixture model and correlated allele frequencies, and included no prior information on taxon identity; default values were used for all other parameters. The number of groups (K) was varied from 1 to 7 with 10 independent runs. The most probable number of clusters was estimated by calculating the natural logarithm of the likelihood function, and the ΔK statistic was estimated using Structure Harvester (Earl, Vonholdt, Earl, & VonHoldt, 2012).

| Genome features
The two newly sequenced plastomes of D. huoshanense were 148,786 and 149,186 bp in size, which were similar to the already reported plastome sizes of related Dendrobium species (e.g., Niu, Zhu, et al., 2017). The plastomes were circular and possessed typical quadripartite structure, which included a pair of inverted repeats (i.e., IR A and IR B ) (25,984 and 25,966 bp) and separated SSC (12,099 and 12,087 bp) and LSC (84,719 and 85,167 bp) regions ( Figure S2).
The overall AT content were 37.53 and 37.66%, respectively, indicating nearly identical levels among the Dendrobium plastomes.
Moreover, the overall genomic structure including gene number and gene order were also well-conserved. A total of 103 functional genes were encoded in the plastome of D. huoshanense, consisting of 69 unique protein-coding genes, 30 unique tRNA genes, and four unique rRNA genes. The sequence of eleven plastid NDH genes was compared with Apostasia odorata (NC_030722) which contains full set of functional NDH genes in orchids. Like other Dendrobium species, D. huoshanense also experienced the loss of plastid NDH genes. Among them, only ndhB genes in IR regions contain full reading frames, whereas other ten plastid NDH genes were truncated or completely lost.

| Comparative plastomics analyses of D. huoshanense
A total of 372 bins with 400 bp per length were achieved from three plastomes. The relationship between GC content and distribution of SNP and InDel in pairwise comparison of each bins were showed in Figure 1 and  (Table S3). Besides, six polymorphic cpSSRs were identified in those regions, which could be used to evaluate genetic structure of each D. huoshanense populations (Table S4).

| Sequence variation and genetic diversity
These 27 mutational hotspot regions were sequenced from 28 samples of five populations of D. huoshanense, with aligned length of 10,415 bp. Based on sequence variation analysis, a total of 11 haplotypes were identified, which contains 14 polymorphic sites, including 9 SNPs and 5 InDel mutations (Table S5, Table 2).
The AMOVA analysis showed that: (a) The genetic variation within populations (88.39%) was significantly higher than that among populations (11.61%). The estimated value of genetic differentiation (Φ ST ) was 0.1161. (ii) Among three groups, the total genetic variation was partitioned into 9.10% among groups, 4.85% among populations within groups, and 86.05% within populations (Table 3).

| Phylogenetic analysis
We performed the Bayesian inference (BI) analyses to infer the  Figure S3). and geographically wide distribution, which suggested its nature   huoshanense.

| Comparative plastomic approaches could revealed available molecular markers for the phylogeographic study of endangered species with extremely small populations
The term "extremely small populations" refers to a population having two essential factors: (a) with a narrow geographical distribution which has resulted from some negative external factors over a long time; (b) contains less individuals than the minimal number that required to prevent extinction (Chen et al., 2014). Indeed, most of the endangered species with extremely small populations in China are qualified as the biological resources important to the country's ecology, science, culture, and economics (Volis, 2016;Yang, Xiang, Zhang, Kang, & Shi, 2015). However, the two essential factors for species  (Niu et al., 2018). Moreover, to protect the endangered species and meet the great market demand, plant breeders have also raised the artificial cultivation size of D. huoshanense. However, until now, the phylogeographic relationship among D. huoshanense populations still remains unclearly due to the lack of available molecular markers.
In order to identify the most informative markers for orchid species, comparative plastomic approaches have been already used to assess sequence variabilities at different level studies, for instance, Niu, Xue, et al. (2017), Niu, Zhu, et al. (2017)  were located in the four regions, ccsA to ndhF, matK to 3′trnG, rpoB to psbD, and trnT to rbcL, which were identified as the most informative hotspot regions in Dendrobium plastomes (Niu et al., 2018).
Moreover, both molecular markers showed high resolution for the phylogeographic studies of D. huoshanense. These results indicated that comparative plastomic approaches could revealed available molecular markers for the phylogeographic study of endangered species with extremely small populations.

| Human activities have played key roles in shaping the genetic diversity and distributional patterns of D. huoshanense
In the current research, a total of 11 haplotypes were identified among 28 samples of five natural populations of D. huoshanense.
With carefully calculation and statistical analysis, two groups of opposite results have been revealed. (a) We have observed a high level of total genetic diversity (H t = 0.815), which implicated that D.
huoshanense has a long evolutionary history and wide distribution.
While the documented distribution of D. huoshanense was limited in Tsi (1999). Moreover, the haplotype diversity (H d ) was diversi- than other seed plants in China (e.g., Qiu, Fu, & Comes, 2011;Wang et al., 2009;Xu et al., 2010), which is in line with a low level of genetic differentiation in other Dendrobium orchids (e.g., Hou et al., 2017;Ye et al., 2016). These opposite results might be caused by two factors: the unique characters of orchid species and the influence of human activities. Generally, dust-like orchid seeds have the capacity to travel long distance through typhoons or tornadoes, promoting gene flow among populations (Adams, 2011;Li et al., 2008). However, due to the lack of endosperm and the requirement of mycorrhizal fungi for germination, orchid seeds falling close to their parental plants may have more chance to germinate (Chung & Chung, 2008;Fay, Pailler, & Dixon, 2015). Actually, only 5% seeds can be successfully germinated in the wild, contributing to limited gene flow (Phillips, Dixon, & Peakall, 2012;Swarts & Dixon, 2009). Dendrobium orchids commonly inhabit warm and moist habitat, being highly susceptible to environmental deterioration (Pinheiro, Cafasso, Cozzolino, & Scopece, 2015). However, Human activities, for example, destruction or alteration of the natural environment and the overex-

| Implications for phylogeographic study
The cpDNA sequences have been widely used in plant phylogeographic studies. However, with extensive comparative analyses of noncoding cpDNA sequences, Shaw et al. (2014) concluded that the most variable regions were diverse among different plant lineages.
This conclusion was confirmed by our previous plastomics studies in orchid species (Niu et al., 2018;Niu, Xue, et al., 2017;Niu, Zhu, et al., 2017), which indicated that marker screening is necessary before subsequent phylogeographic or other low taxonomic studies. Nevertheless, as reviewed in Morris and Shaw (2018), it is common for authors to indicated that they had screened several markers in phylogeographic studies, but they neither showed what markers were compared and why they chose the markers that they have used. Furthermore, phylogeographic studies to data have, however, faced extremely difficult to reveal the phylogeograpy of the species with extremely small populations (Gaos et al., 2016;Yang, Feng, & Gong, 2017). In this study, based on comparative plastomic approaches, two classes of molecular markers have been selected to inferring the genetic diversity or phylogeographic structure of D.
huoshanense. According to our results, the phylogeograpy of D. huoshanense has been well resolved. Therefore, we put forth two implications for future phylogeographic studies: (a) Molecular marker screening is necessary before phylogeographic studies, (b) comparative plastomices approaches could revealed available molecular markers for phylogeographic study, especially for the species with extremely small populations.

ACK N OWLED G M ENTS
We thank Mr. Xianglin He, the inheritor of intangible cultural her-

DATA AVA I L A B I L I T Y S TAT E M E N T
The DDBJ accession numbers for the two plastomes of D. huoshanense were LC493898 and LC493899, and DDBJ accession numbers for all chloroplast sequences LC494015-LC494038.