Genome-wide development of intra- and inter-specific transferable SSR markers and construction of a dynamic web resource for yam molecular breeding: Y2MD

Background Microsatellite markers represent a low-cost and efficient tool for rapid genotyping as compared to single nucleotide polymorphism markers in laboratories with limited resources. For the economically important yam species widely cultivated in developing countries, very few microsatellite markers are available and no marker database has been developed to date. Herein, we conducted a genome-wide microsatellite marker development among four yam species, identified cross-species transferable markers, and designed an easy-to-use web portal for the yam breeder community. Results The screening of yam genomes resulted in 318,713; 322,501; 307,040 and 253,856 microsatellites in Dioscorea alata, D. rotundata, D. dumetorum, and D. zingiberensis, respectively. Mono-, di- and tri-nucleotides were the most important types of repeats in the different species and a total of 864,128 primer pairs were designed. Furthermore, we identified 1170 cross-species transferable microsatellite markers. Among them, a subset of 17 markers were experimentally validated with good discriminatory power regarding the species and the ploidy levels. Ultimately, we created and deployed a dynamic Yam Microsatellite Markers Database (Y2MD) available at http://yamdb.42web.io/. Y2MD is embedded with various useful tools such as JBrowse, Blast, insilicoPCR, and SSR Finder to facilitate the exploitation of microsatellite markers in yams. Conclusions The present work is the first comprehensive microsatellite marker mining across several yam species and will contribute to advance yam genetic research and marker-assisted breeding. The released user-friendly database constitutes a valuable platform for yam breeders, especially those in developing countries.

Considering the low level of difficulty regarding the technical aspect of setting an SSR-based genotyping core facility, the relatively inexpensive cost of reagents, and the level of throughput, SSRs markers are quite more affordable and implementable in a standard laboratory (Herrera and Ghislain, 2013), especially those from low-income countries. Besides, SSRs can exhibit good discriminatory power, are randomly present in the genome, and can be located in the genic region, enabling the genetic dissection of traits of interest (Ellegren, 2004). Microsatellites also exhibit co-dominance with Mendelian inheritance, facilitating their traceability within different breeding populations (Normann et al., 2018). Therefore, owing to their simple identification and high reproducibility, breeders have preferentially relied on SSRs because they are easier to use than single nucleotide polymorphism (SNP) markers, which demand expensive charges and facilities (Wang and Wang, 2016).
Since the usage is based on a graphical user interface, the database offers the breeders, a userfriendly path for rapid identification of molecular markers without a pre-requisite expertise in command line scripting. A web database concentrates all information in one platform and assists the breeders to design markers related to a trait of interest.
Yams (Dioscorea spp.) represent an important staple food crop, feeding over 300 million people in tropical and subtropical regions (Mignouna et al., 2008). The biggest producer country is Nigeria with 70% of the global production, followed by Ghana, and Côte d'Ivoire (https://www.fao.org/faostat/en/FAOSTAT). The genus contains approximately 650 species (Virouel et al., 2018). As a cash crop, yams contribute to the income of more than 60 million people (Asiedu et al., 2010), with a production of 74 million tons and an economic market value of USD 21 billion estimated in 2020 (United Nations Statistics Division 2022). Besides, in West Africa where yam is the most cultivated in the so-called "yam belt" (from Western Cameroon to central Côte d'Ivoire) (Sarcelli et al., 2019), more than five million people directly benefit from yam culture (Mignouna et al., 2020); making this crop a valuable resource for food security and poverty alleviation in this part of the African continent (Darkwa et al., 2020).
To accelerate yam breeding programs, attention has been paid to the development of polymorphic DNA markers. Restriction Fragment Length Polymorphisms (RFLPs) were first developed to investigate the origin and phylogeny of cultivated Guinea yams and wild relatives by using chloroplast DNA (cpDNA) and nuclear ribosomal (rDNA) (Terauchi et al., 1992). Later on, Amplified Fragment Length Polymorphism (AFLP) and Random Amplified Fragment DNA (RAPD) were employed for genetic diversity assessment (Malapa et al., 2003, Egesi et al., 2007, Zannou et al., 2009), yam mosaic virus resistance markers identification (Mignouna et al., 2002a), and genetic linkage map construction (Mignouna et al., 2002b).
Furthermore, due to their high level of polymorphism and cost-effectiveness, SSRs were developed from cultivated and wild yam species (Terauchi and Konuma, 1994, Mignouma et al., 2003, Mizuki et al., 2005, Hochu et al., 2006, Tostain et al., 2006, Siqueira et al., 2011, Silva et al., 2014, Tamiru et al., 2015. However, certain yam species including D. batatas and D. opposita have not yet available SSR markers. More importantly, less than 100 validated SSR markers are available in yams to date and no information on their genome coverage is available. Knowing that yam breeding programs are mostly based in countries of the global South where SSRs are widely used (Tostain et al., 2007, Otto et al., 2015, Zawedee et al., 2014, Loko et al., 2016, Korsa et al., 2022, generating SSR markers will surely empower the breeders in their routine tasks such as paternity testing, hybrid identification, QTL detection, marker-assisted selection, genome-wide association study, etc. Meanwhile, with the help of short-, long-reads and chromosome conformation sequencing technologies, good-quality genomes were recently released in yams. These data enable the identification of genomic regions associated with agronomically important traits such as sex determination systems (Tamiru et al., 2017, Cormier et al., 2019, Sugihara et al., 2020, anthracnose resistance, tuber oxidative browning, tuber starch content (Bredeson et al., 2022), post-harvest hardening (Siadjeu et al., 2021).
To date, the genomes of four species v.i.z D. alata (Bredeson et al., 2022), D. rotundata (Tamiru et al., 2017), D. dumetorum (Siadjeu et al., 2020), and D. zingiberensis (Cheng et al., 2021) are publicly available. The genomes presented a high level of collinearity (Bredeson et al., 2022), indicating that SSRs might be highly transferable from one species to another, even likely, to those wild species that have not yet been sequenced. Therefore, having a comprehensive SSR database for the yam breeders is fundamental for pre-breeding and breeding activities.
The present study was designed to (i) mine the genome of four yam species for the identification single-species and cross-species transferable SSR markers, (ii) create primer sets for the markers, (iii) evaluate the discriminatory power of selected markers, and (iv) build a digital web resource to support yam breeding and genetic research.

Genomic sequence resources
A total of four genome sequences were retrieved from the National Center for Biotechnology

Identification of microsatellites and primer design
SSRs were searched using the Genome-wide Microsatellites Analyzing Tool Package (GMATA) (Wang and Wang, 2016) and primers were designed from the flanking sequences (200 bp on each side) to probe the genomic sequences. The genome assembly was imported into the GMATA 'SSR identification' module in FASTA format. The microsatellite search parameters applied for SSR detection are: minimum length (nt) : 1, maximum length (nt) : 6, minimum number of repeats : 5.
Using custom python scripts, only sequences with a size greater than 10 bp were retained. The SSR locus information files generated (.ssr) contain the start and end positions of the SSRs on the chromosomal sequence, the SSR template and the number of repeat units.
The genome assemblies in FASTA format and the output files (.ssr) generated by the "SSR identification" module were imported into the "Marker Designing" module of GMATA. Primer design parameters were as follows: minimum amplicon size: 100 bp, maximum amplicon size: 400 bp, optimal annealing temperature: 60 °C, flanking sequence length: 200 bp, maximum model length The identified SSRs were classified into genic and non-genic SSRs for D. alata and D. rotuntada for which genome annotations are available. Subsequently, functional enrichment of genes containing SSRs was performed with KOBAS-i webserver (Bu et al., 2021) in order to find out their biological significance.

In silico Polymerase Chain Reaction
The e-Mapping module was used to run the e-PCR algorithm (Schuler, 1997), to validate markers, generate amplicons and assign marker positions across the Dioscorea genomes. This module was also used for intraspecific (the two D. alata genomes) and interspecific marker mapping at the genome scale. In e-PCR, FASTA files containing all genome assemblies and newly developed markers created with the "Marker designing" module (.sts files) were imported as "Sequence File" and "Marker File '', respectively. The gap (-n) and max. indel (-g) parameters were set to 2 and 1, which means that two mismatches and one insertion are allowed. The output file (.emap) provides detailed marker amplification information with calculated amplification sizes and chromosomal targets and identifies single locus and multilocus markers, ultimately producing those that are potentially polymorphic (different sizes of PCR products).

SSR primer validation Plant materials, DNA isolation and PCR experiment
Leaf

SSR genotyping data analysis
The genotyping data served to perform an hierarchical clustering to depict the genetic relationships between the 16 accessions (Supplementary Table 2). The clustering was conducted in R v4.2.2 (R Core Team, 2022) following the Unweighted Pair Group Method with Arithmetic mean method using the function hclust() of the package base. The resulting tree was rendered with the function fviz_dend() of the package factoextra v1.0.7 (Kassambara and Mundt, 2020).

Construction and deployment of Yam Microsatellite Markers Database
The Yam Microsatellite Markers Database (Y2MD) database conception relied on three major steps detailed in Figure 1: (i) data compilation in SQL database, (ii) back and front-end web construction, and (iii) tools integration.
In brief, using the generated SSR markers datasets, the Y2MD SQL database was built. Thus, the Y2MD server was implemented using Linux, Apache, MySQL, and PHP (LAMP) web application platform. The user-friendly web interface was then developed with the aid of JavaScript, and the HyperText Preprocessor (PHP). To enable the users to conduct their own analyses conveniently, additional functionalities such as JBrowse (Skinner et al., 2009), Blast (Altschul et al., 1990)

Detection, distribution and characteristics of microsatellites detected on yam genomes
The genome sequences of four yam species were used for the identification of simple sequence repeats (SSRs

Development of genome-wide SSR primers
A total of 864,128 unique primer pairs were created from the 1,201,570 SSRs extracted from the four zingiberensis) followed by the consecutive numbers of the primers in the genomes.

Cross-species transferability of SSRs
First, we performed a pairwise comparison of the microsatellite markers between the four yam species (Supplementary  Figure 4a). Collectively we identified 1170 core SSR markers able to amplify all the four species and potentially more yam species.
At the intraspecific level, we compared the SSR maker data identified in the genomes of the two D.
alata cultivars (TDa95/00328 and Kabusa). The results showed that both cultivars had in common 77% of the total SSR markers (Figure 4b).

Validation of selected SSRs and genotyping analysis
Firstly, to validate the newly developed markers in yam species, we extracted 92 SSR markers from the literature routinely used for genotyping D. alata and D. rotundata. This validation was done using the e-PCR method. Of the 92 markers, 84 (91%) yielded PCR products on their respective reference genomes.
Next, a set of 18 interspecific markers were selected from 18 chromosomes and used for a wet-lab PCR validation in ten yam species (Supplementary Table 2; Supplementary Table 3). Out of the 18 selected markers, 17 produced amplicons (except the marker YMar128). Sixteen markers produced amplicons in at least six species over the ten (Table 1). Interestingly, the marker amplifications in diploid and tetraploid D. alata cultivars were in the range of maximum two or four alleles, respectively, suggesting the ability of those markers to potentially discriminate cultivars following the ploidy criteria.
For instance the marker YMar79 was able to clearly delineate diploid from tetraploid forms of D. alata by the expression of the number of allele inferior or equal to the number of ploidy (Table 1).
nummularia (marker YMar79), indicating that the cultivars of these species used in this study are polyploids (4x or more). Besides, a maximum of two amplified alleles were found for both cultivars of D. rotundata, suggesting that they are diploid types.

Development of Yam Microsatellite Markers Database
We built up a comprehensive database by integrating SSR markers and whole genome assemblies and annotations datasets (Figure 1). The Y2MD page contains data of four Dioscorea species and over 864,128 primer pairs. Also all SSRs data are downloadable in Excel, PDF, text, CSV format. The graphical user interface (GUI) menu has six tabs " Home", " Species", " Tools", " SSRs", " Download " and " About ".
The "Species" tab displays information on each species with the possibility to choose the SSR of choice according to the type of patterns, the number of repeats, the position, etc. Information on experimentally validated SSRs and cross-species transferable SSRs are available in the "SSRs" tab.
On the "Download" page, the links to the different genomes are available.
To enable the database user to perform some fundamental analyses, several functionalities were embedded in Y2MD among which JBrowse (Figure 6a), insilicoPCR (Figure 6b), SSR Finder (Figure   6c), and Y2MD Blast (Figure 6d). Thus, Y2MD empowered the user to find an SSR with SSR Finder, and proceed to a quick amplification check with insilicoPCR. Besides, the user can also blast his own sequence of interest onto a genome with Y2MD Blast tool. With JBrowse, the user is able to dive into the genome annotation map to find out potential genes related to a targeted SSR. Altogether, Y2MD offers a dynamic platform to the yam breeder to remotely use the available genomic resources in a GUI user-friendly manner. The website is freely accessible at http://yamdb.42web.io/.

Developing genome-wide SSR markers in Dioscoreaceae
Microsatellite markers proved to be effective to characterise yam germplasms worldwide (Obidiegwu et al., 2009a;Obidiegwu et al., 2009b;Siqueira et al., 2011;Tostain et al., 2006;Tamiru et al, 2015;Arnau et al., 2017;Cao et al., 2021). The present study enabled the extraction of quite high numbers of microsatellites in different yam genomes. The extracted SSRs were more concentrated in the telomeres as reported by Jian et al. (2021). Dioscorea zingiberensis (253,856 microsatellites) had fewer microsatellites as compared to the other yam species. Nonetheless, we observed a high correlation (R 2 =0.99) between the SSR numbers and chromosome lengths (Figure 3). Our findings are similar to the report of Dossa et al. (2017) in sesame where the relationship between chromosome length and the number of SSRs on each chromosome showed a strong correlation (R 2 = 0.94).
Overall, the SSR densities observed in yam species (from 529 to 662 SSR/Mbp) are similar to that of sesame (507 SSR/Mbp) while, lower than other species such as Arabidopsis (Arabidopsis thaliana) and rice (Oryza sativa) (875 and 807 SSRs/Mbp, respectively) (Lawson and Zhang, 2006;Dossa et al., 2017). Comparing two genome assemblies of D. alata (Kabusa and TDa95/00328) highlighted major differences in terms of SSR numbers and densities. We speculate that the completeness of the genome assemblies (Figure 2) and the genetic distances between the two accessions may explain these differences. Among the observed motifs, mononucleotides that account for 50% of SSRs are the most represented (Figure 3). However, they are not ideal targets for PCR markers (Clarke et al., 2001). Arabidopsis (Cavagnaro et al., 2010), sesame (Uncu et al., 2015), sorghum (Sorghum bicolor) (Yonemaru et al., 2009) and rice (Mccouch, 2002).

Cross-species transferability of the SSR markers
The identification of polymorphic SSR markers will support research on yam genetics and breeding, especially in developing countries. At the genome scale, we identified 1170 cross-species transferable markers in the studied four Dioscorea species. Previously, Tamiru et al. (2015) showed that crossamplification of SSR markers from D. cayenensis, revealed that 94.4% and 56.7% could be successfully transferred to D. rotundata and D. alata, respectively. Transferring markers between species is often considered a better investment than developing and testing new markers, especially when available funding is limited (Fan et al., 2013). Because in most breeding programs, different yam species are grown and evaluated together, therefore having SSR marker sets applicable on different species is highly useful. We also noticed that the number of cross-species transferable markers decreased significantly when D. zingiberensis was included (Figure 4) When several genome assemblies are available within the same species, in silico detection of polymorphic markers is resource-efficient in terms of time and cost. This approach has been successfully applied in sesame, spinach, and Perca fluviatilis (Dossa et al., 2017;Bhattarai et al., 2021;Xu et al., 2022). We compared the common SSR markers from the genomes of D. alata (TDa95/00328 and Kabusa) and obtained 93,92% polymorphic SSR markers. In other species such as potato, tomato and bell pepper, the levels of intraspecific polymorphisms appear to be rather limited (Stàgel et al., 2008), therefore much efforts were needed to develop informative SSR markers in these species. In D. alata, the high SSR polymorphic rates could be attributed to the dioecy nature of the species.
rotundata. In other species such as rice (McCouch et al., 2002), sesame (Dossa et al., 2017), and cowpea (Vigna unguiculata) (Jasrotia et al., 2019), a large number of markers have been identified, experimentally characterised and available for genetic studies and breeding. In this study, we extracted 92 markers from the literature (wet-lab validated markers) and most of them were found among the markers created in this study. As the e-PCR parameters do allow only two mismatches and one insertion for the primers, this could be partly responsible for the absence of PCR products for the missing wet-lab validated markers. This analysis also provided the exact locations of these SSRs in the yam genomes, which will enable researchers to select markers from different chromosomes in order to have a good discriminatory power.
Out of 18 selected SSR markers, 17 were able to amplify by PCR several yam genomes in this study.
The experimentally tested markers were from different chromosomes. They demonstrated a robust discriminatory power by clearly delineating the studied yam species and ploidy levels ( Figure 5). In yam species, the basic chromosome number is 20 and most species harbour different cytotypes (Scarcelli et al., 2005;Arnau et al., 2009;Abraham et al., 2013;Sughihara et al., 2021). Prior to our work, D. rotundata cultivars were reported to be mainly diploid using flow cytometry, isoenzymes, DArTseq SNP and a set of six microsatellite markers (Gatarira et al., 2021;Scarcelli et al., 2005).
Both D. rotundata cultivars investigated in this study were found to be diploid. Besides, the tested SSR markers also exhibited a promising accuracy even at intra-species level with a clear classification of diploid and tetraploid forms of D. alata cultivars. Based on the allele numbers, we report that yam species including D. transversa, D. abyssinica, D. cayenensis, and D. nummularia have cultivars with tetra-or higher ploidy levels ( Table 1) (Gatarira et al., 2021;Sughihara et al., 2021). Therefore, we provided in the present study, SSR markers that can be used at low cost for ploidy check and species differentiation in yams.

Deployment of Yam Microsatellite Markers Database for yam breeders
Many databases dealing with SSR marker data from various species have been made available to the public. However, no database is available to study yam microsatellites. A yam microsatellites marker database will be a useful platform for advancing genetic evaluation, genomic research, and yam breeding. The Pan Species Microsatellite Database (Du et al., 2020) containing SSR data from 18,408 organisms is limited by the lack of features such as ePCR, Jbrowse (Buels et al., 2016) and Blast (Altschul et al., 1990). This is also the case for the Tea Microsatellite Database (Dubey et al., 2020) and SSRome (Mokhtar and Atia, 2019). In this study, we compiled all generated microsatellite data into Yam Microsatellite Markers Database (Y2MD) (Figure 6). Y2MD is embedded with various useful tools such as JBrowse, Blast, SSR Finder and insilicoPCR. These tools will facilitate the localization, SSR detection and polymorphism analysis for existing and new experimental yam SSR markers/nucleotide sequences. Y2MD has been developed to be a very user-friendly database enabling all users, especially those with limited knowledge or resources in bioinformatics to analyse their SSR related data.

Conclusion
In this study, a graphical interface containing the SSR markers of four commercially important Dioscorea species was constructed and made available to the public via http://yamdb.42web.io/. To perform the database construction, microsatellites of four Dioscorea species were first extracted and primers were designed on the flanking sequences of these microsatellites. In total 1,201,570 microsatellites were extracted from all species with 864,128 designed primer pairs. Then, additional analyses were performed such as cross-species transferability and determination of polymorphic markers which are determining factors in plant genotyping. Experimental validation of selected markers were also performed to provide credence to our datasets. Overall, the SSRs developed in this work will be useful for genetic characterization of yam germplasms, quantitative trait loci (QTL) mapping and marker assisted breeding.
All of this information has been compiled into Y2MD and will be of great use for the yam breeding community, especially in developing countries. With the rapid development in genome sequencing projects, we expect more yam genomes to come. Hence, we will expand Y2MD to include new Dioscorea species as soon as their genome sequences are available. In addition, Y2MD will be enhanced by adding new tools and functionalities such as the integration of Primer3, introducing molecular markers associated with QTLs and new wet-lab validated SSRs.

Availability of Data and Materials
All SSR data generated and the genomes used are available online at http://yamdb.42web.io/. The genome sequence of Dioscorea alata cultivar Kabusa has not been made available publicly yet, but it can be shared upon request to the corresponding author.

Acknowledgment
Not applicable.

Funding
This research was financially supported by the AfricaYam Project (Grant OPP1052998-Bill and Melinda Gates Foundation).

Conflicts of Interest
The authors declare no competing financial interests.

Figure 5.
Hierarchical clustering tree depicting the relationships between ten yam species using 17 SSR markers. The tree was constructed based on the Unweighted Pair Group Method with Arithmetic mean method. Taxa labels were arranged as follows: scientific name, genotype name and ploidy.
Ploidy information was not mentioned for the species with undetermined ploidy. The scale bar represents the length of the tree branch.