Plastome sequencing reveals phylogenetic relationships among Comastoma and related taxa (Gentianaceae) from the Qinghai–Tibetan Plateau

Abstract Genus Comastoma (subt. Swertiinae, Gentianaceae) contains species, such as “Zangyinchen,” that are important herbs in Tibetan medicine. The phylogenetic relationship of this within Gentianaceae and the circumscriptions of its species have long been controversial with conflicting morphological and molecular data reported. Here, we used whole chloroplast genome sequences for Comastoma species and related taxa to reconstruct their phylogeny and clarify their taxonomic relationships. The results revealed that the length of all plastome sequenced varied from 149 to 151 kb and have high similarity in structure and gene content. Phylogenomic analysis showed that Comastoma is a monophyletic group, closely related to the genus Lomatogonium. The divergence time estimation showed that Gentianaceae diverged at about 21.81 Ma, while the split of Comastoma occurred at 7.70 Ma. However, the results suggested the crown age of species formation in this genus is after 4.19 Ma. Our results suggest that QTP uplift, the alternation of Quaternary glaciation and interglaciation, and monsoon changes might have acted as drivers of speciation in Comastoma.


| INTRODUC TI ON
As one of the most important global biodiversity hotspots, the Qinghai-Tibetan Plateau (QTP) and its adjacent regions harbor the world's richest recently diverged flora with high endemicity (Khan et al., 2018;Wu, 1988). The QTP is an important center of origin for many alpine taxa (Liu et al., 2002;Ren et al., 2015;Zhang et al., 2012) where lineages exhibit accelerated evolution as a consequence of the region's complex geological history (Muellner-Riehl, 2019;Spicer, 2017). Gentianaceae is mainly distributed in cold temperate regions and comprises ~80 genera and ~700 species worldwide (Ho & James, 1995). As the biogeographical source area for several large alpine lineages, the QTP mountains host 22 genera and about 419 species (Ebersbach et al., 2017;Favre et al., 2016;Ho & Liu, 2001). Of these, the genus Comastoma which are important traditional Tibetan medicine, widely used to treat hepatitis, liver fibrosis, and cholecystitis (Tang Li et al., 2008, 2020. Comastoma is named for the hairy bases of its corolla lobes, which has adaptive significance for reproductive success in the harsh environment of alpine regions . While there have been many taxonomic and systematic treatments of Gentianaceae, the origin and phylogenetic position of Comastoma remains controversial (Hagen & Kadereit, 2002;Kissling et al., 2009;Schonswetter et al., 2004;Yuan & Kupfer, 1995). For example, Yuan and Kupfer (1995) divided the subtribe Gentianinae into two independent evolutionary branches as Gentiana and Gentianella, placing Comastoma in the second branch and suggested that Comastoma is monophyletic (Yuan & Kupfer, 1995).

Furthermore, recent molecular phylogenetic surveys revealed
Comastoma, Lomatogonium, and Gentianella are not monophyletic and are located in one more derived clades (Xi et al., 2014).
Further, results from Xi et al. showed Lomatogonium and Gentianella are on the same evolutionary branch which is corroborated by evidence that these genera can cross with each other (2014). Xi et al.'s (2014) results were parallel to Liu and Ho (1996), who hypothesized that Comastoma has the closest relationship with Gentianella based on the embryological characteristics (Xi et al., 2014). Toyokuni, however, considered Comastoma as a genus most closely related to Lomatogonium and distant from Gentiana and Gentianella (1961). This hypothesized relationship between Comastoma and Lomatogonium has been supported by molecular phylogenetic investigations (Chassot et al., 2001). Similarly, based on flower morphological characteristics, Wu et al. (2003) treated Comastoma as a separate genus and suggested that it is more primitive than Gentianella in phylogenetic position and started a new debate about its evolutionary history and position.
Most of the phylogenetic hypotheses about Comastoma evolution have been based on a few chloroplast markers as well as the ITS region and some morphological characters (Toyokuni, 1961;Wu et al., 2003;Xi et al., 2014). These molecular phylogenies used noncoding plastid regions with uniparental inheritance to infer the true species tree for groups with complex evolutionary histories (Hung et al., 2009;Valcárcel et al., 2003).
Unfortunately, the most variable regions of the chloroplast genome were not known and even the most informative plastid regions did not have the resolving power of low-copy nuclear markers (Shaw et al., 2007). To compensate for the comparably low variation in plastid genetic markers, whole plastome sequences are required to construct robust species trees (Hollingsworth et al., 2011;McCormack et al., 2013). Studies have proven that whole chloroplast genome sequencing can resolve phylogenetic relationships at various taxonomic levels and while also elucidating the molecular evolution of plastome structure and function (Jansen et al., 2007;Moore et al., 2007Moore et al., , 2010. Such valuable information has shown the effectiveness of full plastome data to resolve broader level questions at family, order, tribal, generic, and species levels (Barrett et al., 2016;Givnish et al., 2016).
Utilizing the high-throughput sequencing technology and advanced statistical tools, we report for the first time a robust phylogeny and evolutionary history of genus Comastoma based on whole chloroplast sequences. The monophyly of Comastoma was tested in a phylogenetic context; if monophyletic, Comastoma species would form their cloud to the exclusion of species from all other genera.
To this end, we sequenced the whole chloroplast genomes of five species of Comastoma and 11 other species representing four genera from Gentianaceae. Also, we included 19 complete chloroplast sequences from the NCBI: as Swertia (4 spps.), Helenia conrniculata, and Gentiana (14 spps.). The final aim is to (1) investigate the functional and structural differences in plastome of Comastoma and its allied taxa and (2) provide a robust phylogeny and evolutionary history of Comastoma.

| DNA extraction, sequencing, and bioinformatics
Total genomic DNA was extracted following a modified CTAB protocol (Englen & Kelley, 2000) and was used to prepare Illumina sequencing libraries as described Thomson et al. (2018) and processed on an Illumina NovaSeq 6000 platform (Novogene) with 150 PE chemistry.

| Chloroplast phylogenomics and diversification analyses
All sequence alignments were performed with MAFFT v7.471 (Katoh et al., 2019). Two different datasets were used to reconstruct Comastoma phylogeny: The first one included the complete genome of chloroplast while the second dataset included only protein-coding sequences. Both the datasets included 35 ingroup and five outgroup species (details below) and were analyzed with maximum likelihood (ML) with IQ-TREE v1.610 (Nguyen et al., 2014) and Bayesian inferences (BI) statistics MRBAYES v.3.2 (Ronquist & Huelsenbeck, 2003). Optimal substitution models were assessed with jMODELTEST using Akaike information criterion (Nguyen et al., 2014;Posada & Buckley, 2004 To assess the evolutionary history, we calibrated the divergence time of Comastoma and its related groups as implemented in BEAST (Drummond & Rambaut, 2007). For this analysis, we only used the whole chloroplast dataset in a concatenated fashion setting GTR as a substitution model with lognormal relaxed clock (Thomas et al., 2007

| Plastome comparison
All 16 sequenced chloroplast genomes displayed a typical quadripartite structure, including a large single-copy region (LSC), a small single-copy region (SSC), and two reverse repeat regions (IRa and IRb; Figure 1; Figure 2). Plastome size varied from 149,001 to 151,699 bp.
The details of protein-coding genes were available in Appendix S1.
The number of PCGs in other species ranged from 131 to 133, where the gene content has very subtle variation. We found the highest GC contents (38.27%) in Comastoma polycladum and the lowest (37.65%) in T. volubile (Table 2). We also found gene loss in other species, for example, rps16 has been lost in nine species (except Comastoma) and rpl33 in three species (Gentianella azurea, Gentianella arenaria, and T. volubile). Similarly, the number of rRNAs is the same in all species; however, the tRNAs are slightly different, for example, trnS GCU was lost in Gentianopsis paludosa and Gentianopsis barbata; and trnV UAC was lost in T. volubile. Our results found that the variation of plastome length is mainly due to the expansion and extraction of LSC and SSC regions. The IR regions were relatively conserved (Figure 3).

| Phylogenomics and diversification time
Both the ML and BI statistics recovered trees with the same topologies based on the whole chloroplast genome. Similarly, we recovered congruent trees based on only the protein-coding sequence data. In all results, Comastoma species formed a clade sister to a couple of Lomatogonium species; together these taxa were sister to a clade containing other Lomatogonium species with Gentianella ( Figure 4). The

| Plastome comparison
Genome structure, gene order, and gene content were highly con- suggested to be the best strategy for genetic diversity analysis and molecular identification (Ni et al., 2015). In our study, we found the loss of trnS GCU in G. paludosa and G. barbata plastomes, but not G. paludosa var. aipina and G. barbata var. stenocalyx. We can explore more

| Phylogenomics of Comastoma
We found phylogenies based on whole chloroplast sequence data was more consistent and had higher bootstrap support values than using only CDS data. However, all the results showed Comastoma as one monophyletic group. Our result supports previous studies based on ITS sequences and embryological characteristics (Liu & Ho, 1996;Yuan & Kupfer, 1995). Our results are in contrast with Xi et al.'s results (Xi et al., 2014), where they found Comastoma as polyphyletic.
In (Xi et al., 2014), ITS-based phylogeny showed C. pedunculatum in one cluster but matK phylogeny recovered C. pedunculatum and C. polycladum with other groups as polyphyletic.
Despite our results substantiated the initial hypothesis that the genus is monophyletic by clustering all their five species, the interspecific relationship of species within the genus Comastoma showed inconsistencies based on different datasets. For example, C. pedunculatum, C. falcatum, and C. polycladum from a clade, but it is not clear whether C. falcatum or C. polycladum clustered with C. pedunculatum to the exclusion of the other. This distinction can be explained for two reasons. Firstly, evolutionary rates of coding and noncoding sequences vary and may contribute to discordant topologies (Tian & Li, 2002). Although some species Lomatogonium and Swertia for a clade in this study, more complete sampling of these genera are needed to fully understand the polyphyly of these genera. Secondly, we have included only one individual per species, which might lead to the inconsistency of results. This study supports that Comastoma is monophyletic, but there are not enough Comastoma species in this study, the clear phylogeny of Comastoma and Gentiana needs more in-depth study in the future.

F I G U R E 4
Phylogeny of Comastoma based on the maximum likelihood and Bayesian inferred trees (a) ML/BI tree based on complete chloroplast genomes of a concatenated data matrix. (b) ML/BI tree including only the protein-coding sequences. A support rate less than 100 has been shown on the branches, where the first value represents the support rate of ML-based statistics and the second value that of BI inferences The overall phylogeny of different groups of Gentianaceae revealed Comastoma clustering with two species in Lomatogonium (L. perenne and L. macranthum) in a more recently evolved branch.
Four species in Gentianopsis diverged earlier, so they have a more distant relationship with Comastoma than Lomatogonium. Interestingly, L. gamosepalum and L. carinthiacum clustered together with two species of Gentianella in the ML/BI tree (Figure 4a,b). Similarly, four species of Swertia not clustered in one group, for example, S. bimaculata has a closer sister relationship with H. corniculata rather than with the other three Swertia species. Although some species in Lomatogonium and Swertia for a clade in this study, more complete sampling of these genera is needed to fully understand the polyphyly of these genera. The 14 species of Gentiana formed a well-supported clade.
The overall topology of sect. Kudoa and sect. Cruciata is almost consistent with Sun's (2018) and  results. Support values for our topology were greater than those previously published (Sun et al., 2018;Xi et al., 2014;Zhou et al., 2018) because our whole plastome dataset includes vastly more phylogenetically informative characters.

| Calibrated divergence time of Comastoma
The origin, distribution, and differentiation of many species have a close association with the change in the geology of the QTP (Ebersbach et al., 2017;Ren et al., 2015;Wang et al., 2009).
Comastoma and its related groups are mainly distributed on the QTP. The uplift of the QTP is mainly divided into three stages.
Based on the molecular clock hypothesis, the differentiation time of Gentianaceae has been estimated to be about 21.87 Ma and the F I G U R E 5 Molecular dating based on the complete chloroplast genome sequence. "P" represents Pliocene, "Q" represents Quaternary divergence time between Comastoma and its related groups about 7.71 Ma. This time of differentiation is close to that of tectonic change of the QTP (Mosbrugger et al., 2018;Su et al., 2019;Xing & Ree, 2017 (Spicer, 2017;Xing & Ree, 2017). With its rich and diverse ecosystem, the QTP has become a natural place for the convergence and fusion of various biota. In this area, there are multiple biological types and complex floristic components, which are ecosystems that are sensitive to climate change, but it is also the diversification hotspot of many natural populations and the cradle of speciation (Mosbrugger et al., 2018;Muellner-Riehl, 2019 These factors led to the rapid radiation of the Comastoma. Changes in the monsoon cycle also presented new selective pressures asynchronous with the QTP uplift (Clift et al., 2008;Ji et al., 2005;Tada et al., 2016); such monsoon effects are often overlooked in biogeographic studies of this region. Therefore, we believe that the transformation of the monsoon climate may also be another essential driving force for the evolution of Comastoma and allied taxa.

| CON CLUS IONS
The whole chloroplast genomes of Comastoma and its related taxa were sequenced for the first time to establish a robust hypothesis about the monophyly of this genus and its evolutionary relationships with other Gentianaceae. Analyzing our plastomes with 19 others from NCBI revealed the similarity of structure and content for our 16 species, underscoring the recent diversification of this lineage.
Our results suggested that Comastoma is monophyletic and has the closest relationship with Lomatogonium through both ML and BI tree reconstruction. We further concluded that combined with the relevant geological and historical events, the uplift of the QTP, the alternation of Quaternary glaciation and interglaciation, and the change of monsoon may have created conditions inductive for speciation in the common ancestors of extant Comastoma. Additionally, we found that the whole chloroplast genome sequencing is an excellent strategy to better resolve the phylogeny of historically enigmatic groups at deep and recent divergences. We have substantiated the hypothesis of monophyly in Comastoma and suggest that such types of studies will provide helpful insight into solving similar problems in other groups.