Uncovering the genetic diversity of yams (Dioscorea spp.) in China by combining phenotypic trait and molecular marker analyses

Abstract Yam is an important edible tuber and root crop worldwide, and China, as part of the native range of yams, has many diverse local resources. The goal of this study was to clarify the genetic diversity of commonly cultivated yam landraces and the genetic relationships among the main yam species in China. We evaluated 26 phenotypic traits of 112 yam accessions from 21 provinces in China and used 24 simple sequence repeat (SSR) and 29 sequence-related amplified polymorphism (SRAP) markers for genetic diversity analysis. Phenotypic traits revealed that Dioscorea opposita had the highest genetic diversity, followed by D. alata, D. persimilis, D. fordii, and D. esculenta. Among the 26 phenotypic traits, leaf shape, petiole color, and stem color had high Shannon diversity indexes, and the range of variation in tuber-related traits in the underground part was greater than that in the aboveground part. Clustering on phenotypic traits divided all accessions into six groups, which was supported by principal component analysis (PCA). SSR and SRAP markers amplified well and could effectively and accurately evaluate the genetic variation of yam. Unweighted pair-group method with arithmetic means (UPGMA) analysis of the combined SSR-SRAP data also divided the 112 accessions into six groups, similar to the phenotypic results, and PCA and population structure analysis of the SSR-SRAP data gave consistent results. In addition, analysis of the origin and genetic relationships of yam indicated that D. opposita may have originated in China. These results demonstrate the genetic diversity and distinctness of the widely cultivated species of Chinese yam and provide a theoretical reference for the classification, breeding, germplasm innovation, utilization, and variety protection of Chinese yam resources.

is one of the most popular D. opposita cultivars and has been used for more than 2000 years to treat conditions such as diarrhea, diabetes, and asthma (Peng et al., 2017). In China, yam resources are extremely rich, with a total of 65 species (Guo & Liu, 1994). D. opposita, D. alata, D. persimilis, D. fordii, and D. quinquelaba are widely cultivated (Huang et al., 2011). Yams are cultivated in all provinces, except Qinghai and Tibet, and include a large number of landraces.
However, despite their considerable edible and medicinal value, yams have long been regarded as "orphan" or "neglected" crops and have received little attention or investment from researchers (Tamiru et al., 2017). Moreover, Dioscorea species are mainly dioecious, rarely flower, and seldom form mature seeds (Bressan et al., 2011).
The development of medicinal ingredients from a few yam species has long been emphasized in China (Cheng et al., 2020; Lebot et al., 2018; Li et al., 2018), but the resource types and genetic diversity of yam remain insufficiently analyzed. In addition, most studies on yam genetic diversity have focused on D. alata (Arnau et al., 2017; Siqueira et al., 2014; Wu et al., 2019), whereas studies on D. opposita, the most popular species and the one with the largest cultivated area in China, are rarely reported. Over long-term cultivation and domestication, yam varieties have become complex and are difficult to identify with any single classification method, which has caused confusion in records and nomenclature and even in the classification of some species. These factors seriously hinder the conservation and further utilization of yam resources. Therefore, studying the genetic diversity, genetic variation, and population structure of yam is essential for understanding its origin and distribution and for resource utilization, parental selection, and germplasm development (Mignouna et al., 2003). To date, phenotypic traits, karyotype analysis, and DNA diversity have been used to describe the genetic diversity of yam germplasm (Cao et al., 2020; Kouam et al., 2018; Nemorin et al., 2013; Sartie et al., 2012). Phenotypic traits are important for the identification and effective utilization of germplasm resources (Daley et al., 2020). Although morphological traits are easy to measure, they have many limitations and are particularly dependent on the environment. In contrast, molecular markers are not affected by environmental factors and have been applied effectively in plant systematics, breeding, and genetic resource assessment (Naval et al., 2010).
Simple sequence repeat (SSR) markers are widely used in ecology, biology, and genetics because they are codominant, abundant in genomes, and highly polymorphic (Chapman et al., 2009). Sequence-related amplified polymorphism (SRAP) markers target open reading frames (ORFs) and combine simplicity, reliability, moderate throughput, and easy sequencing of selected bands. In addition, SRAP targets coding sequences in the genome and generates a moderate number of codominant markers (Li & Quiros, 2001).
SSR and SRAP markers are highly polymorphic, which makes them useful for identifying germplasm resources, and their application has produced large amounts of reliable genetic data (Dong et al., 2019; György et al., 2016). Both marker systems have been widely used to assess the genetic diversity and genetic relationships of yam germplasm resources (Anokye et al., 2014; Mignouna, Dansi, et al., 2002; Mignouna, Mank, et al., 2002; Silva et al., 2017). There are some reports on the genetic diversity of D. alata in China (Wu et al., 2009, 2019). However, few studies have examined the genetic variation and structure of Chinese yam by combining molecular and morphological markers across a wide range of germplasm sources.
The goal of this study was to clarify the genetic diversity of commonly cultivated yam landraces and the genetic relationships among the main yam species in China. We collected 106 yam landraces and 6 wild accessions of 5 widely cultivated species from 21 provinces and comprehensively evaluated their genetic diversity, genetic relationships, population structure, and interspecific relationships by combining phenotypic traits with SRAP and SSR molecular markers. This study provides a basis for the identification, classification, and breeding of Chinese yam landraces and a theoretical reference for exploring the origin and domestication of yam.
Landraces (106) were collected from farmers' fields, institutions, and markets in China, and wild accessions (6) were collected from mountainous regions. All accessions were planted in the yam germplasm resource garden of Jiangxi Agricultural University (Nanchang City, Jiangxi Province). Planting was carried out in ridges on 10 April 2019, with 20 cm between individual plants and 1.2 m between ridges. Tuber segments (80-120 g) were used as propagules, and individual plants were supported by bamboo stakes. Standard weeding and agronomic practices were applied regularly to ensure adequate plant growth. Each accession was planted in three replicates of 10 plants each.
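The planting design above implies a few simple quantities that are easy to check. The sketch below works them out under stated assumptions: the density is a nominal value computed only from the 20 cm × 1.2 m spacing (paths and borders ignored), and the propagule masses are bounds implied by the 80-120 g segment range, not amounts reported by the authors. The helper names are hypothetical.

```python
# Values from the planting description; the helper functions are illustrative.
N_ACCESSIONS = 112            # 106 landraces + 6 wild accessions
N_REPLICATES = 3              # replicates per accession
PLANTS_PER_REPLICATE = 10     # plants per replicate
IN_ROW_SPACING_M = 0.20       # 20 cm between plants on a ridge
RIDGE_SPACING_M = 1.20        # 1.2 m between ridges
PROPAGULE_MASS_G = (80, 120)  # tuber-segment mass range (g)


def total_plants(accessions: int, replicates: int, plants_per_rep: int) -> int:
    """Plants needed to establish every replicate of every accession."""
    return accessions * replicates * plants_per_rep


def nominal_density_per_ha(in_row_m: float, between_ridges_m: float) -> float:
    """Plants per hectare implied by the spacing alone (no paths or borders)."""
    return 10_000 / (in_row_m * between_ridges_m)


def propagule_mass_kg(n_plants: int, mass_range_g: tuple) -> tuple:
    """Lower and upper bounds (kg) on the tuber mass needed for planting."""
    low_g, high_g = mass_range_g
    return n_plants * low_g / 1000, n_plants * high_g / 1000


if __name__ == "__main__":
    plants = total_plants(N_ACCESSIONS, N_REPLICATES, PLANTS_PER_REPLICATE)
    print(plants)  # 3360
    print(round(nominal_density_per_ha(IN_ROW_SPACING_M, RIDGE_SPACING_M)))  # 41667
    print(propagule_mass_kg(plants, PROPAGULE_MASS_G))  # (268.8, 403.2)
```

In other words, the trial involved 3,360 plants at a nominal spacing equivalent to roughly 41,700 plants per hectare.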

| Phenotype assessment
A total of 26 phenotypic traits of the leaves, stems, flowers, aerial tubers, tubers, and roots of yam were evaluated (Table 1), comprising 20 qualitative and 6 quantitative traits. Aboveground traits were recorded 60-90 days after planting: stem- and leaf-related traits at about 60 days, and traits such as flowers and aerial tubers at about 90 days. Traits related to underground tubers were recorded after harvest in October and November. Phenotypic traits were observed in the field, and data were recorded as described previously (Huang & Huang, 2013; IPGRI/IITA, 1997; Wang & Shen, 2014).

| SSR and SRAP genotyping
FIGURE 1 The geographical distribution of the different yam species and the number of yam accessions in different provinces (a) and images of leaves (b), stems (c), and tubers (d) of five yam species in China. Numbers indicate the total number of accessions collected in each province, and solid circles of different colors indicate different yam species. Bar = 1 cm

Twenty-four SSR markers (Table S2) that produced polymorphic bands across all accessions were selected for further analysis from an initial set of 53 SSR markers that produced amplicons (Loko et al., 2016; Narina et al., 2011; Nemorin et al., 2013). The primers were synthesized by Sangon Biotech (Shanghai) Co., Ltd. PCR amplification was performed in a 10 μl reaction mixture containing 5 μl of 2× Master Mix Blue (TSINGKE, China), 0.25 μl of each primer (10 mM), and 0.75 μl of template DNA (20 ng/μl), made up to volume with ddH2O. The cycling parameters were as follows: initial denaturation at 94°C for 5 min; 40 cycles of 94°C for 30 s, annealing at 54°C for 30 s, and 72°C for 30 s; and a final extension at 72°C for 10 min. The amplified PCR products were separated on 8% nondenaturing polyacrylamide gels, stained with silver nitrate, and imaged for analysis.
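The SSR reaction is specified by component volumes with water "making up the remaining volume", so the amount of ddH2O and the template mass per reaction follow by simple arithmetic. The sketch below is a hedged illustration of that bookkeeping; the `Component` class and helper names are hypothetical, and only the volumes and concentrations come from the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    volume_ul: float


# 10 ul SSR reaction as described in the text; ddH2O is computed, not listed.
SSR_TOTAL_UL = 10.0
SSR_COMPONENTS: List[Component] = [
    Component("2x Master Mix Blue", 5.0),
    Component("forward primer", 0.25),
    Component("reverse primer", 0.25),
    Component("template DNA (20 ng/ul)", 0.75),
]


def water_volume_ul(total_ul: float, components: List[Component]) -> float:
    """ddH2O needed to bring the listed components up to the final volume."""
    used = sum(c.volume_ul for c in components)
    if used > total_ul:
        raise ValueError(f"components ({used} ul) exceed reaction volume ({total_ul} ul)")
    return total_ul - used


def template_ng(volume_ul: float, conc_ng_per_ul: float) -> float:
    """Mass of template DNA delivered to one reaction."""
    return volume_ul * conc_ng_per_ul


def final_fold(stock_fold: float, volume_ul: float, total_ul: float) -> float:
    """Working strength of a concentrated stock after dilution (e.g. 2x -> 1x)."""
    return stock_fold * volume_ul / total_ul


if __name__ == "__main__":
    print(water_volume_ul(SSR_TOTAL_UL, SSR_COMPONENTS))  # 3.75
    print(template_ng(0.75, 20))                          # 15.0
    print(final_fold(2, 5.0, SSR_TOTAL_UL))               # 1.0
```

The same helpers apply to the 14 μl SRAP mixture (7 μl master mix, 0.35 μl of each primer, 1.4 μl of template), which works out to roughly 4.9 μl of ddH2O and 28 ng of template DNA per reaction.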
Forty-nine SRAP primer combinations were obtained by pairing seven forward primers with seven reverse primers (Li & Quiros, 2001; Table S3), of which 29 combinations with good repeatability and high polymorphism were selected for this study.
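The 49 combinations arise from a full factorial pairing of the seven forward and seven reverse primers. The sketch below enumerates such a design; the "me"/"em" labels are hypothetical placeholders in the style commonly used for SRAP primers, not the names in Table S3, and the selection of the 29 retained combinations (based on repeatability and polymorphism) is not modeled.

```python
from itertools import product
from typing import List

# Hypothetical primer labels; the actual names and sequences are in Table S3.
FORWARD_PRIMERS = [f"me{i}" for i in range(1, 8)]  # seven forward primers
REVERSE_PRIMERS = [f"em{i}" for i in range(1, 8)]  # seven reverse primers


def primer_combinations(forward: List[str], reverse: List[str]) -> List[str]:
    """Every forward/reverse pairing in a full factorial screen."""
    return [f"{f}/{r}" for f, r in product(forward, reverse)]


def retained_percent(n_selected: int, n_screened: int) -> float:
    """Share of screened combinations kept for genotyping, in percent."""
    return 100 * n_selected / n_screened


if __name__ == "__main__":
    combos = primer_combinations(FORWARD_PRIMERS, REVERSE_PRIMERS)
    print(len(combos))                         # 49
    print(combos[:3])                          # ['me1/em1', 'me1/em2', 'me1/em3']
    print(round(retained_percent(29, 49), 1))  # 59.2
```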
Each 14 μl PCR mixture consisted of 7 μl of 2× Master Mix Blue, 0.35 μl of each primer (10 mM), and 1.4 μl of template DNA (20 ng/μl), made up to volume with ddH2O. PCR amplification was performed as follows: initial denaturation at 94°C for 5 min; five cycles of denaturation at 94°C for 1 min, annealing at 35°C for 1 min, and extension at 72°C for 1 min; 30 further cycles with the annealing temperature raised to 56°C; and a final extension at 72°C for 10 min. The amplified products were separated by electrophoresis on 3% agarose gels prepared in 1× TBE buffer, visualized on a UV transilluminator (Bio-Rad Gel Doc XR+, USA), and photographed.
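The SRAP program uses two cycling stages, a low-stringency stage (35°C annealing) followed by a higher-stringency stage (56°C annealing). Writing the program as data makes its structure and length explicit. This is a minimal sketch assuming that, in the 30-cycle stage, only the annealing temperature changes and denaturation and extension remain at 94°C and 72°C for 1 min each; ramp times are ignored, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    seconds: int


@dataclass(frozen=True)
class Stage:
    cycles: int
    steps: Tuple[Step, ...]


# SRAP program from the text (assumption: only annealing changes in stage 3).
SRAP_PROGRAM: Tuple[Stage, ...] = (
    Stage(1, (Step("initial denaturation", 94, 300),)),
    Stage(5, (Step("denaturation", 94, 60),
              Step("annealing", 35, 60),
              Step("extension", 72, 60))),
    Stage(30, (Step("denaturation", 94, 60),
               Step("annealing", 56, 60),
               Step("extension", 72, 60))),
    Stage(1, (Step("final extension", 72, 600),)),
)


def amplification_cycles(program: Tuple[Stage, ...]) -> int:
    """Cycles in the multi-step (denature/anneal/extend) stages."""
    return sum(s.cycles for s in program if len(s.steps) > 1)


def hold_time_minutes(program: Tuple[Stage, ...]) -> float:
    """Total time at set temperatures, excluding ramping between steps."""
    return sum(s.cycles * sum(st.seconds for st in s.steps) for s in program) / 60


def annealing_temperatures(program: Tuple[Stage, ...]) -> List[float]:
    """Annealing temperature of each cycling stage, in program order."""
    return [st.temp_c for s in program for st in s.steps if st.label == "annealing"]


if __name__ == "__main__":
    print(amplification_cycles(SRAP_PROGRAM))    # 35
    print(hold_time_minutes(SRAP_PROGRAM))       # 120.0
    print(annealing_temperatures(SRAP_PROGRAM))  # [35, 56]
```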

TABLE 1 Descriptors used for the phenotypic assessment of yam accessions in this study

| No. | Trait | Descriptor |
|---|---|---|
| 17 | Place of roots on the tuber (PRT) | 1 = All, 2 = Upper and middle |
| 21 | Leaf length (LL) | Average leaf length of six mature leaves (cm) |
| 22 | Leaf width (LW) | Average leaf width of six mature leaves (cm) |
| 23 | Length-to-width ratio (L/W) | Average leaf length/average leaf width |

| Data analysis
The survey results for the 20 qualitative traits were classified and assigned values according to Table 1, and the distribution frequency of each class was calculated. The Shannon diversity index (I) was then calculated from these frequencies as follows:

$$I = -\sum_{i} p_i \ln p_i$$

where p_i represents the relative frequency of the ith phenotypic class of a trait (Kouam et al., 2018).
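As an illustration of the formula above, the sketch below computes I for one qualitative trait from the class code recorded for each accession. It assumes the natural logarithm, as in the equation, and the stem-colour codes in the example are hypothetical, not study data.

```python
import math
from collections import Counter
from typing import Hashable, Iterable


def shannon_index(scores: Iterable[Hashable]) -> float:
    """Shannon diversity index I = -sum(p_i * ln p_i) for one qualitative trait.

    `scores` holds the class code assigned to each accession (e.g. 1, 2, 3
    from Table 1); p_i is the relative frequency of class i.
    """
    counts = Counter(scores)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("no observations")
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical stem-colour codes for eight accessions (not study data).
    stem_colour = [1, 1, 2, 2, 2, 3, 3, 3]
    print(round(shannon_index(stem_colour), 3))  # 1.082
```

A trait with a single observed class gives I = 0, and I grows with both the number of classes and the evenness of their frequencies; three equally frequent classes give ln 3 ≈ 1.099, which is why values slightly above 1 indicate high diversity for three- or four-class traits.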
The maximum, minimum, mean, standard deviation (SD), and coefficient of variation (CV = SD/mean × 100%) of the six quantitative traits were calculated using SPSS 25.0 software. Based on the overall mean (x̄) and SD (σ), the quantitative trait data were then divided into 10 levels in increments of 0.5σ, from level 1 (Xi < x̄ − 2σ) to level 10 (Xi ≥ x̄ + 2σ). A binary (1, 0) matrix was then constructed from the phenotypic survey data: an accession was scored 1 for the level of a trait into which it fell and 0 otherwise.
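The level coding can be sketched in a few lines. This is a minimal illustration, assuming the sample SD (n − 1 denominator, as reported by SPSS descriptives) and that each boundary value belongs to the higher level; the function names and the leaf-length values are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence

# Cut points in SD units: -2.0, -1.5, ..., +2.0 (nine cut points -> ten levels).
BOUNDARIES_SD = [-2.0 + 0.5 * k for k in range(9)]


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Min, max, mean, sample SD, and CV (%) of one quantitative trait."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return {"min": min(values), "max": max(values), "mean": mean,
            "sd": sd, "cv_percent": 100 * sd / mean}


def level(x: float, mean: float, sd: float) -> int:
    """Level 1 if x < mean - 2sd, level 10 if x >= mean + 2sd, 0.5-sd bins between."""
    z = (x - mean) / sd
    return 1 + sum(1 for b in BOUNDARIES_SD if z >= b)


def one_hot(lvl: int, n_levels: int = 10) -> List[int]:
    """Binary row with 1 at the accession's level and 0 elsewhere."""
    return [1 if i == lvl else 0 for i in range(1, n_levels + 1)]


def encode_trait(values: Sequence[float]) -> List[List[int]]:
    """Encode every accession's value as a 10-column 1/0 row."""
    stats = describe(values)
    return [one_hot(level(v, stats["mean"], stats["sd"])) for v in values]


if __name__ == "__main__":
    # Hypothetical leaf lengths (cm) for five accessions (not study data).
    leaf_length = [10.0, 12.0, 14.0, 16.0, 18.0]
    print(describe(leaf_length))
    for row in encode_trait(leaf_length):
        print(row)
```

Concatenating these rows with the 1/0 codes of the qualitative traits gives a single binary matrix that can be clustered in the same way as the marker data.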
The polymorphic bands of SRAP and SSR markers were scored as present (1) or absent (0). … of phenotypic traits and molecular markers was performed using OmicShare, a free online platform for data analysis (www.omicshare.com/tools). Unweighted pair-group method with arithmetic means (UPGMA) cluster analysis of the phenotypic traits and molecular markers was performed using MEGA software version 4.1 (Tamura et al., 2007). Based on the combined SRAP and SSR data, the population structure of all accessions, of D. opposita alone, and of D. alata alone was analyzed with the Bayesian model implemented in STRUCTURE software version 2.3.1 (Pritchard et al., 2000).
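To show what UPGMA does with a 1/0 band matrix, here is a small stand-alone sketch. It is not MEGA's implementation: the text does not state which distance measure was used, so Jaccard distance (which ignores shared band absences) is assumed here, ties are broken by whichever pair is found first, and the returned Newick string gives topology only. The accession names and band scores are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def jaccard_distance(a: Sequence[int], b: Sequence[int]) -> float:
    """1 - Jaccard similarity of two 1/0 band profiles (shared absences ignored)."""
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return 0.0 if either == 0 else 1.0 - shared / either


def upgma(labels: List[str], profiles: List[Sequence[int]]) -> str:
    """UPGMA clustering; returns the tree topology in Newick form (no branch lengths)."""
    # cluster id -> (Newick subtree, number of accessions in the cluster)
    clusters: Dict[int, Tuple[str, int]] = {i: (lab, 1) for i, lab in enumerate(labels)}
    # pairwise distances keyed by (smaller id, larger id)
    dist: Dict[Tuple[int, int], float] = {
        (i, j): jaccard_distance(profiles[i], profiles[j])
        for i, j in combinations(range(len(labels)), 2)
    }
    next_id = len(labels)
    while len(clusters) > 1:
        (i, j), _ = min(dist.items(), key=lambda kv: kv[1])  # closest pair
        tree_i, n_i = clusters.pop(i)
        tree_j, n_j = clusters.pop(j)
        # Average linkage: size-weighted mean of the two old distances.
        merged = {
            k: (n_i * dist[tuple(sorted((i, k)))] + n_j * dist[tuple(sorted((j, k)))])
            / (n_i + n_j)
            for k in clusters
        }
        dist = {p: d for p, d in dist.items() if i not in p and j not in p}
        for k, d in merged.items():
            dist[(k, next_id)] = d  # next_id exceeds every existing id
        clusters[next_id] = (f"({tree_i},{tree_j})", n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


if __name__ == "__main__":
    # Hypothetical band scores (not study data): two SSR and two SRAP loci.
    ssr = {"A": [1, 1], "B": [1, 1], "C": [0, 0], "D": [0, 1]}
    srap = {"A": [0, 0], "B": [0, 1], "C": [1, 1], "D": [1, 1]}
    names = list(ssr)
    combined = [ssr[n] + srap[n] for n in names]  # joint SSR-SRAP matrix
    print(upgma(names, combined))
```

The joint SSR-SRAP analysis corresponds to concatenating each accession's SSR and SRAP columns before computing distances, as in the usage example.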
The number of clusters (K) was tested over the range 2-10, and the software was run ten times for each K. Each run used a burn-in period of 500,000 iterations followed by 100,000 Markov chain Monte Carlo (MCMC) iterations.
STRUCTURE HARVESTER (Earl & vonHoldt, 2012), which identifies the best K from the probability of the data given K and from ΔK (Evanno et al., 2005), was used to estimate the most likely number of clusters (K).
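The ΔK statistic of Evanno et al. (2005) measures how sharply the mean log probability of the data, ln P(D|K), changes around each K, scaled by the spread among replicate runs. The sketch below follows that published formula using the mean ln P(D|K) across runs; it is an illustration, not STRUCTURE HARVESTER's code, and the ln P(D|K) values are hypothetical. Note that ΔK at a given K needs runs at both K − 1 and K + 1, so it is undefined at the smallest and largest K tested.

```python
import statistics
from typing import Dict, List


def evanno_delta_k(lnp_by_k: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K = |mean L(K+1) - 2 mean L(K) + mean L(K-1)| / sd L(K).

    `lnp_by_k` maps each K to its replicate ln P(D|K) values (>= 2 runs).
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    delta = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # undefined at the ends of the tested range
        sd = statistics.stdev(lnp_by_k[k])  # sample SD across replicate runs
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


def best_k(lnp_by_k: Dict[int, List[float]]) -> int:
    """K with the largest Delta K."""
    delta = evanno_delta_k(lnp_by_k)
    return max(delta, key=delta.get)


if __name__ == "__main__":
    # Hypothetical ln P(D|K) values from three runs per K (not study data).
    runs = {
        1: [-1000.0, -1002.0, -998.0],
        2: [-800.0, -801.0, -799.0],
        3: [-780.0, -790.0, -770.0],
        4: [-770.0, -790.0, -750.0],
    }
    print(evanno_delta_k(runs))  # {2: 180.0, 3: 1.0}
    print(best_k(runs))          # 2
```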

| Analysis of qualitative and quantitative traits
The 20 qualitative traits showed high variability across all accessions, with I values ranging from 0.09 to 1.03 and averaging 0.650 (Table 2). Among the five species, D. opposita had the highest I value, followed by D. alata, D. persimilis, D. fordii, and D. esculenta (Table 2). The I values for leaf shape, petiole color, and stem color were greater than 1 (Table 2). The most diverse trait was stem color (I = 1.030), whereas stem thorn and twining direction had the lowest I values (0.090, Table 2).
The distribution frequencies gave similar results (Table 3). Among the five species, D. fordii had the highest CV for leaf length (23.97%) and D. esculenta the lowest (4.39%).
D. opposita had the highest CV for leaf width (21.17%) and D. persimilis the lowest (3.55%). The CV of the leaf length-to-width ratio was highest in D. alata and lowest in D. esculenta. D. persimilis had the highest CV for tuber length, whereas D. opposita had the highest CVs for both tuber diameter and tuber fresh weight; D. esculenta had the lowest CVs for tuber length, tuber diameter, and tuber fresh weight (Table 3). Table 3 also shows that the range of variation in tuber-related traits in the underground part was greater than that in the aboveground part.

Principal component analysis (PCA) of the phenotypic traits (Figure 3b) showed that the first two principal components (PCs) accounted for 28.1% (PC1) and 11.4% (PC2) of the total variance. PC1 was dominated by flowering, aerial tubers, stem wing, tuber skin color, tuber skin color under bark, leaf length, leaf width, tuber length, and tuber diameter, whereas PC2 combined leaf shape, distance between lobes, leaf margin color, stem spine, twining direction, length-to-width ratio, and tuber fresh weight (Table S4). Similar to the cluster dendrogram, the 112 … (Table 7).

| Cluster analysis based on SSR and SRAP markers
Cluster dendrograms were constructed from the polymorphic band data of SSR, SRAP, and combined SSR-SRAP markers, and the SSR and SRAP clustering results were relatively consistent (Figure S1 and Figure 4a). Cluster analysis of the SSR data divided the 112 accessions into four groups (Figure S1A). In general, the accessions of each species were distinguished, although a few accessions of the same species did not cluster together. Similarly, cluster analysis of the SRAP data divided the 112 accessions into five groups, and some accessions were again not separated from those of other species (e.g., CY-256 and CY-257, which belong to D. esculenta; Figure S1B). Therefore, the SSR and SRAP polymorphic band data were combined for a joint cluster dendrogram analysis. As shown in Figure

| Population structure analysis
According to the STRUCTURE HARVESTER output, ΔK reached its maximum at K = 2, indicating that the optimal K value was 2 (Figure S2A) … (Table S6).

| Interspecific and intraspecific genetic differences in yam landraces in China revealed by combining phenotypic trait and molecular marker analyses
Phenotypic diversity is the external manifestation of genetic diversity and the most basic tool for germplasm selection and genetic background research (Mignouna, Dansi, et al., 2002; Mignouna, Mank, et al., 2002; Sartie et al., 2012; Zhang et al., 2019).
In this study, 26 phenotypic traits of 112 yam accessions from five species were analyzed, and all five species showed high diversity. D. opposita had the highest genetic diversity, followed by D. alata, D. persimilis, D. fordii, and D. esculenta.
The five yam species were highly differentiated in different organs (leaf, stem, flower, aerial tuber, tuber, and root), and some phenotypic traits could be used for species identification. For instance, stem wings effectively identified D. alata, and stem spines and counterclockwise twining of the stem effectively identified D. esculenta, which is consistent with previous reports (Bressan et al., 2011). Based on phenotypic variation, the 112 accessions were clustered into six groups, which was largely consistent with classical taxonomic classification (Pei & Ding, 1985).
Additionally, flowering is a very important breeding requirement in any crop, but the entire genus Dioscorea is characterized by dioecy, and most important yam varieties are cultivated for their edible tubers and do not flower (Girma Tessema et al., 2017; Renner, 2014). Previous research has shown that the flower sex of yams is related to their yield. Tamiru et al. (2011) reported that female yams mature early and produce tubers of excellent quality but grow less vigorously than male yams and yield poorly under suboptimal conditions. One possibility is that male flowers wither easily and have little influence on the underground tubers, so that male plants achieve higher yield and quality than female plants. This may explain why most of the flowers observed in this study were male.
Aerial tubers are an important organ of yam; they are also an effective means of vegetative propagation and have been widely used in food and pharmaceutical applications (Asiedu & Sartie, 2010; Main et al., 2006). In this study, aerial tubers were found in two accessions … (Sewall, 1978). For SRAP marker analysis, the detected … Thus, combining the two methods can effectively identify yam landraces (Denwar et al., 2019; Siqueira et al., 2014).
For instance, CY-3 (ZhuGaoShu) is a native variety that has been cultivated for 500 years in Jiangxi Province; its leaves resemble those of D. persimilis, but its tuber grows similarly to that of … (Maurin et al., 2016). Dioscorea is considered a monophyletic group originating from a common ancestor (Wu et al., 2014) and represents an early-diverging lineage of monocots just internal to Acorus (Hansen et al., 2007). However, the origin, evolution, and domestication of Dioscorea remain debated. In this study, a total of 112 cultivars, landraces, and wild accessions of yam were collected in 21 provinces (cities) … Wild D. persimilis was previously reported to be distributed in Hunan, Guangdong, Guangxi, Guizhou, and Yunnan provinces (Pei & Ding, 1985). We also collected accessions of D. persimilis in Fujian and Jiangxi provinces, which have a long history of yam cultivation. In addition, D. persimilis and D. opposita were closely related, and some accessions were highly similar in phenotypic traits. Sequences of the rDNA internal transcribed spacer (ITS) also showed that D. persimilis and D. opposita are closely related (Liu et al., 2001; Wu et al., 2013), and it has been speculated that D. persimilis is a mutant form of D. opposita (Liu et al., 2001). Wild D. fordii is distributed in Zhejiang, Guangdong, Guangxi, Fujian, and Hunan provinces. This species has been widely cultivated for more than 200 years because of its high yield and good stress resistance, and it may have arisen through long-term domestication of wild species. In the current study, … and CY-209 from D. fordii were grouped with D. alata (Figure 3a,

| CONCLUSION
The germplasm of yam species widely used in cultivation shows high intraspecific and interspecific diversity in China. Phenotypic and molecular markers are very effective tools to detect the diversity of yam. The best method to identify genetic differences is combining molecular and phenotypic data to obtain more information for ge-

CONFLICT OF INTEREST
The authors declare that they have no conflicts of interest.

DATA AVAILABILITY STATEMENT
Supplementary data sets are available in the associated Dryad repository: https://doi.org/10.5061/dryad.gmsbcc2kw.