Patterns of genetic variation and morphology support the recognition of five species in the Gaultheria leucocarpa Blume (Ericaceae) group from mainland China

Abstract Gaultheria leucocarpa and its varieties form a clade of aromatic shrubs that is widely distributed in subtropical and East Asian tropical regions. The group is taxonomically difficult and in need of thorough taxonomic investigation. This study focused on taxonomic delimitation within the G. leucocarpa group from mainland China. Field surveys covering the distributional range of G. leucocarpa in mainland China were conducted, wherein four populations from Yunnan and one from Hunan were found bearing morphological and habitat differences. A 63‐species phylogenetic tree of Gaultheria based on one nuclear and three chloroplast markers that included samples from the G. leucocarpa group was reconstructed with maximum likelihood to clarify the monophyly of the G. leucocarpa group. Taxonomic relationships among populations were investigated with morphology and population genetics, the latter by using two chloroplast genes and two low‐copy nuclear genes. Based on the sum of morphological and genetic analyses, we described three species of Gaultheria as new to science, clarified the taxonomic status of G. leucocarpa var. pingbienensis, elevating it to the species level, and resurrected G. crenulata and treated the varieties G. leucocarpa var. crenulata, and G. leucocarpa var. yunnanensis as synonyms of this species. We provide a key to the five species now recognized, along with descriptions and photographs.

Fang is the most widespread variety within this species, occurring nearly throughout the area of China south of the Yangtze River Basin and Taiwan (Fang & Stevens, 2005). It is characterized by glabrous branchlets and leaves, filaments with various trichomes, a pubescent ovary, and deep bluish black fruits at maturity (Fang & Stevens, 2005; Table S1). Because of high morphological variation, its taxonomy has been long debated. Taxonomic revisions have either elevated it to the species level or reduced it to a variety of G. leucocarpa. Hsu (1981)  is, G. leucocarpa var. yunnanensis and G. leucocarpa var. crenulata. The latter occurs south of the Yangtze River Basin and was distinguished from G. leucocarpa var. yunnanensis only by its glandular-hirsute twigs, petioles, leaf margins, and inflorescences (Fritsch et al., 2008).
However, analysis of plastid data indicated that samples of G. leucocarpa var. crenulata nested within those of G. leucocarpa var. yunnanensis; the analysis did not support G. leucocarpa var. crenulata as a variety (Li et al., 2020).
Another variety in mainland China with taxonomic controversy is G. leucocarpa var. pingbienensis, collected from Pingbian County, Yunnan Province (barcode KUN 1208603). This variety was differentiated by coriaceous elliptical leaves but treated as a synonym of G. leucocarpa var. yunnanensis by Fang and Stevens (2005) . Fritsch et al. (2008) placed G. leucocarpa var. yunnanensis in the synonymy of G. leucocarpa var. pingbienensis because they considered these to be the same variety and the varietal epithet pingbienensis has nomenclatural priority over the varietal epithet yunnanensis. In the description of Hsu (1981), no diagnostic characters for G. leucocarpa var. pingbienensis were mentioned except for what appears to be taxonomically trivial leaf morphology.
During 2017-2021, we conducted six field surveys wherein we collected 84 populations of G. leucocarpa throughout mainland China. In addition to representative populations of G. leucocarpa var. yunnanensis, G. leucocarpa var. crenulata, and G. leucocarpa var. pingbienensis, we found three more populations whose characters appeared not to match well those of the three named mainland varieties and also differed from one other; nor did they appear to resemble the other varieties of G. leucocarpa outside of mainland China. We collected these unusual populations from Wuliang Mountain of Jingdong county, Fenshuiling Divide of Luchun in Yunnan Province, and Mang Mountain of Yizhang County in Hunan Province, respectively.
In this study, we focus on the delimitation of the taxa of the G. leucocarpa group from mainland China. With phylogenetics and principal component analysis (PCA) analyses on both morphological and population genetic data, we investigate the taxonomic relationships among G. leucocarpa var. yunnanensis, G. leucocarpa var.
crenulata, G. leucocarpa var. pingbienensis, and the three unusual populations mentioned above, together with samples of two forms, that is, G. leucocarpa var. leucocarpa f. leucocarpa and G. leucocarpa var. leucocarpa f. var. cumingiana (Vidal) Sleumer, both of which were recognized at the varietal level in the classification of Middleton (1991Middleton ( , 1993

| Taxon sampling
Morphological, genetic, biological, and ecological characteristics may not always agree. A broader definition of a species should incorporate several criteria, especially in considering that different lines of evidence can be available for separating a given species from others (Dantas-Torres, 2018;Liu, 2016). In this context, we selected 11 populations from a larger pool of 84 populations of G. leucocarpa that were sampled across mainland China in a cpDNA phylogenetic study by Li et al. (2020). We integrated both molecular and morphological data to support the recognition of the number of species and their delimitation, with habitat information also considered. We To assess the monophyly of the G. leucocarpa group and reconstruct the phylogenetic relationships among its mainland members, we conducted phylogenetic analysis on all 11 populations above plus three adjacent-region populations within the G. leucocarpa group | 3 of 16

| DNA extraction, gene amplification and sequencing
Four DNA genic regions (nrDNA ITS and three cpDNA regions rpl16, matK, and trnL-trnF) were used for phylogenetic reconstruction of Gaultheria as in Fritsch et al. (2011). In addition, the cpDNA regions rpl33-psaJ and rpl32-trnL (Li et al., 2020), and the low-copy nrDNA regions AAT (aspartate aminotransferase, Gong & Gong, 2016) and LOC (an intergenic region newly screened) were employed for a population genetic analysis. Total DNA from leaf samples was extracted with the cetyl trimethyl ammonium bromide (CTAB) method (Doyle & Doyle, 1987). The PCR reaction mixture contained 10 μL of PCR Master Mix (2×) (Thermo Scientific), 9 μL of nuclease-free water, 0.5 μL of each pair of primers (10 ng/μL) (Table S3), and 1 μL of template DNA. The methods of PCR amplifications and procedures were performed as in Li et al. (2020); for primer annealing temperatures, see Table S3. The amplified products were directly sequenced with the Sanger method by using amplification primers and BigDye on an ABI 3730 capillary sequencer (Applied Biosystems).

| Phylogenetic analysis and population genetic clusters
We downloaded 260 DNA sequences of ITS, rpl16, matK and trnL-trnF from 64 species (two species of Eubotrys Nutt. as outgroup) based on Lu et al. (2019) from GenBank (Table S4). We generated 112 new DNA sequences from 28 samples among 14 populations of the G. leucocarpa group for a 92-terminal phylogenetic analysis (Appendix S1, Table S5). Sequences were aligned with the webbased version of MAFFT (https://mafft.cbrc.jp/align ment/serve r/) and manually adjusted. We reconstructed the phylogeny with maximum likelihood in RAxML 7.0.4 (Stamatakis et al., 2008), simultaneously generating 1000 bootstrap replicates under the GTRGAMMA model. To evaluate taxon/entity boundaries within G. leucocarpa, we compared the genetic divergence among populations of G. leucocarpa with the interspecific divergence among the sampled Gaultheria species using K2P-distance and p-distance, both of which were calculated by MEGA 6.0 with the default parameters and 10,000 bootstrap replicates.
The genetic structure of 110 individuals from 11 populations was analyzed with the cpDNA combined data from rpl33-psaJ and rpl32-trnL (Appendix S2) and the nrDNA combined data of AAT and LOC (Appendix S3, Table S6), both with a Bayesian clustering method implemented in STRUCTURE ver. 2.3.4 (Hubisz, 2010). This program assigns individuals to K subpopulations (clusters) based on an admixture model and a correlated allele frequencies model. To infer the "best" value of K, we first ran the analysis at each value of K from 1 to 10 with 10 replicate runs per value of K with a burn-in of 10,000 and 100,000 sampled MCMC generations. The results of preliminary runs were processed with Structure Harvester (Earl & Vonholdt, 2012). These runs suggested deltaK peaks for each of the nuclear and plastid datasets at K = 2. Results based on the nuclear data discriminated one genetic cluster in the MS population and another in all the remaining sampled populations. We then performed a STRUCTURE analysis on the nuclear dataset with the MS population excluded to obtain high resolution of the clusters within the remaining sampled populations, with deltaK peaks at K = 2 and 4 both suggested.

| Principal component analysis with morphological and genetic data
To clarify the taxonomy of the 11 populations of G. leucocarpa group from mainland China, we examined our collections from the field and other herbarium specimens. Morphological characters based on habit, vegetative organs, flowers, and fruit were measured and compared. From each population, data were taken on 10 individuals randomly selected in the field and three other herbarium specimens.
From these characters, 19 quantitative and six qualitative characters were used for principal component analysis. Most characters were selected on the basis of our 18-year taxonomic investigation and other work, which found these characters to be diagnostic for species delimitation of Gaultheria Lu et al., 2010;Middleton, 1991) absence of indumentum on calyx lobe margin, (5) calyx color (green vs. green flushed with red), and (6) corolla color (light whitish green vs. light whitish green flushed with red; Appendix S4). PCA analyses were performed with two datasets, that is, morphological data, and morphological data combined with genetic mutation sites. Genetic mutation sites were generated from the data of the STRUCTURE analysis that concatenated the two chloroplast regions rpl33-psaJ and rpl32-trnL and the two low-copy nrDNA regions AAT and LOC (Appendix S4) with the FactoMineR package in R 4.0.2.

| Sequence alignment, phylogenetic analysis, and genetic divergence
For the four-genic dataset of Gaultheria, the aligned DNA sequence matrix consisted of 4593 bp (with 35.5% GC content), of which 14.93% were variable sites and 8.99% were parsimony-informative characters (PICs). The most variable genic region was ITS, with 24.26% variable sites and 15.53% PICs, whereas the least variable region was rpl16, with 12.46% variable sites and 7.18% PICs. For the datasets of the 11 populations of G. leucocarpa from mainland China, the aligned DNA sequence matrix of the two-cpDNA combined dataset (rpl33-psaJ and rpl32-trnL) consisted of 1268 bp (with 30.7% GC content), of which 1.18% were variable sites and 1.02% were PICs, and that of two low-copy nrDNA dataset (AAT and LOC) consisted of 1407 bp (with 48.1% GC content), of which 8.52% were variable sites and 6.89% were PICs. In addition, a 31-bp insertion was found in the trnL-rpl32 region of the YLC samples, and a 65-bp insertion was found in the AAT region of the MS samples.
The best phylogenetic tree of Gaultheria from ML recovered a topology in which the samples of G. leucocarpa form a monophyletic group (BP = 100%, Figure S1, Appendix S5). This group is sister to the Diplycosia clade (see Kron et al., 2020). Phylogenetic relationships among most populations were resolved with bootstrap support greater than 80%. The G. leucocarpa group was divided into a clade comprising samples of the MS population, G. leucocarpa var.  yunnanensis, that is, DHY, DL, JWS, PB, and YGN, were poorly resolved, but they all form a clade with high support (BP = 93%).
The K2P-and P-distances were compared at the intraspecific level within G. leucocarpa and interspecific level within Gaultheria and found to be generally the same, and some were greater than or equal to those found between species of some Gaultheria (Appendix S6).

| Population genetic structure
Bayesian clustering analysis with STRUCTURE yielded a best-fit model where the highest ΔK of the two-gene nrDNA dataset is  2.9-4.8 mm long), and margins of bracts, bracteoles, and calyx lobe apices glabrous or rarely ciliolate (vs. ciliolate). MS resembles populations of G. leucocarpa var. yunnanensis in its flexuous stem, inflorescence position, calyx shape and color, and fruit indumentum but differs by leaf blade base deeply cordate and apex acute (vs. base shallowly cordateovate and apex acuminate), secondary veins 2 or 3 pairs (vs. 3 or 4), F I G U R E 1 STRUCTURE analyses on the low-copy nuclear and plastid gene datasets of the Gaultheria leucocarpa group from mainland China. (a-1) The best-fit model (ΔK) for the dataset based on concatenated two nuclear genes; (a-2) the best-fit model (ΔK) for the dataset based on concatenated two nuclear genes with the data of MS population removed; (a-3) the best-fit model (ΔK) for the dataset based on concatenated two-plastid-gene regions. (b-1) The plot of genetic clusters of the two-nuclear-gene dataset of all populations at K = 2 (each column/grid represents a population); (b-2) plots of genetic clusters based on the two-nuclear-gene dataset of all populations with MS removed at K = 2 and 4; (b-3) plot of genetic clusters based on the two-plastid-gene dataset of all populations at K = 2.  shows that G. leucocarpa var. pingbienensis (DWS), JD, and YLC tend to segregate from G. leucocarpa var. crenulata (WD). When we excluded the data of MS, we found higher resolution among the other populations (40.6% of the observed variables in the first two principal components; PC1 = 29.4%, PC2 = 11.2%; Figure 8b). PC1 is highly influenced by the variation in genetic data of JD and YLC, whereas PC2 is mainly influenced by the morphometric parameters with the highest coefficient values (i.e., corolla size, leaf blade texture, F I G U R E 2 Gaultheria crenulata (L. Lu et al. LL-2019-52 Figure 8c). Thus, MS segregates well along PC1.

| DISCUSS ION
The populations of the G. leucocarpa group from mainland China were found not only to be morphologically variable but also to form morphological and genetic clusters, suggesting that a taxonomic revision is warranted. The morphological differences were found in both vegetative and reproductive characters, in particular in the morphology of the bracts, bracteoles, and calyces ( Figure S2).
Moreover, the infraspecific K2P-and P-distances were found to be equal to or higher than interspecific distances of several clades within Gaultheria, suggesting that species-level recognition, rather than the current varietal recognition, is justified. G. leucocarpa var.  Another population from Yunnan, JD, is here recognized as the newly described species Gaultheria wuliangshanensis (Figure 7).
Although the monophyly of G. wuliangshanensis is not strongly supported (BP = 77%, Figure S1), both the population genetic structure and PCA analyses based on the concatenated morphologicalgenetic data support its separation from all other studied pop-

MS Others
Col.

Diagnosis:
Gaultheria wuliangshanensis is similar to G. crenulata in plant height and leaf size; they both also have ovate to ovatelanceolate leaves blades with auriculate-cordate base, and secondary veins with 3 or 4 pairs. However, G. wuliangshanensis differs in its thickly coriaceous leaf blade texture and the persistent style is longer. Moreover, G. wuliangshanensis has dense brownish glandular trichomes on branchlets and its leaf margins are sparsely hispid and serrulate. Notably, it has the largest corolla among all taxa of G. leucocarpa from mainland China (7.2-9.2 × 7-9.3 mm). The broadly ovate calyx lobes and the glabrous or rarely ciliolate margins of bracts, bracteoles, and calyces are also unique.
Current-year branchlets pale green or red, glabrous or densely The additional specimens examined of the five species described above are provided in Appendix S7.