A digital catalog of high-density markers for banana germplasm collections

most important food crop in the least developed countries (LDCs) as defined by the United Nations, ranked by total production and food consumption. A digital catalog of high-density markers for germplasm conserved at the access to subsets of diversity the potential to maximize conservation and use of climate- and to optimize strategies. The catalog is extendable with data from any banana collection and the software is easily deployable in other genebanks.


| INTRODUCTION
Crop plant diversity managed by genebanks is of great value in the context of the changing needs of agriculture (Smale & Jamora, 2020), but genetic and phenotypic information on this diversity is insufficiently available for most genebanks (McCouch et al., 2013). The advent of Next Generation Sequencing has enabled, at an ever-decreasing cost, the sequencing of reference genomes of many crops as well as high-density genotyping for large numbers of samples per crop. Genotyping is a powerful tool to help identify gaps or redundancies in germplasm collections and, when combined with phenotyping data, can be used to detect correlations between genome regions and agronomic traits. For some crops, massive sequencing and data processing have been undertaken, as shown for the rice, wheat and barley germplasm collections (Milner et al., 2019; Sansaloni et al., 2020; Wang et al., 2018). These approaches represent increasingly reachable targets for many genebanks worldwide, including the CGIAR international collections (Halewood, Lopez Noriega et al., 2018).
Bananas are arguably the world's most important fresh fruit and are a major staple food for hundreds of millions of people in low-income countries. World production is estimated at 158 million tons annually, and gross banana exports are worth US$12.8 billion to exporting countries (FAOSTAT 2019). Furthermore, most of the global production is by smallholders for their own consumption or for local trade, making banana the fourth-most important food crop in the least developed countries (LDCs) as defined by the United Nations, ranked by total production and food consumption.
To increase understanding of its complex genetics and thereby boost crop improvement, the first whole banana genome sequence was released in 2012, for an accession belonging to the Musa acuminata species (D'Hont et al., 2012) (Table 1). This original reference has recently been supplemented with genome sequences of a number of other Musa species (Wang et al., 2019; Wu et al., 2016).
While genetic variant information is being produced at a fast pace through various projects and is increasingly processed via standardized bioinformatics workflows, one of the main challenges is the management of an increasing volume of raw and intermediate files that are difficult to handle for many applications. Bioinformatics workflows can produce millions of markers, but these need to be filtered in multiple ways according to analysis type or user perspective, and working with such data often presents challenges to those without capacity in bioinformatics. Online information systems coping with big data linked to germplasm collections are scarce (König et al., 2020; Mansueto et al., 2017; Raubach et al., 2020; Ruas et al., 2017). Moreover, lack of access to phenotypic information continues to be an additional factor limiting the use of plant genetic resources. Phenotypic data are complex: information on the context under which they were collected is indispensable, and the domain is continuously evolving (Germeier & Unger, 2019). Recognizing these challenges, the availability of easy-to-use, interoperable and flexible solutions to navigate high-density genotyping and phenotyping data online continues to be a key aim for genebanks in delivering their mission of germplasm documentation and utilization.
In this study, we present an approach used to generate, store and disseminate a catalog of genetic variants of banana and plantain maintained in the ITC, which is available at https://www.crop-diversity.org/mgis/gigwa and is embedded in the genebank information system through which users can order available germplasm.

FIGURE 1 Diversity of banana bunches at a germplasm collection exhibited at the National Research Centre for Banana (NRCB) in Trichy, India (with genebank curators at the back).

TABLE 1 An overview of banana

| MATERIALS AND METHODS
Material used to create the catalog mostly originates from lyophilized leaf tissues of young banana plants distributed by the ITC. Such tissue is the most convenient source of DNA of acceptable quality and quantity for high-throughput restriction enzyme-associated DNA sequencing methods, as for other omics techniques (Carpentier et al., 2007). Another advantage is that once in stock, the tissues are readily available for distribution, whereas in vitro material takes longer to obtain (i.e., an average of 2 months for proliferating tissues and 4 months for in vitro rooted plantlets).
The generated sequence-short reads from Illumina sequenc-

| RESULTS AND DISCUSSION
The diversity of edible bananas has been classified using genome  for selected subsets of accessions (Table 2). It offers access to datasets with sizes ranging from 245,285 to more than 7 million SNPs depending on the study.
While the system is optimized to explore a large volume of data, it offers efficient filtering options based on a full range of parameters, mostly but not exclusively genetic (e.g., chromosome location, missing data percentage, minor allele frequency, gene mutation effect). Accession details can be enriched with metadata such as passport data or agronomic traits (e.g., control vs. stress on gene expression analyses), which then become filterable elements. The interface is designed to work with one or two groups of samples; with two groups, used in conjunction with genotype pattern filters, it becomes straightforward to identify SNPs discriminating the groups (Figure 2). This is particularly useful to filter by taxonomy or a certain trait between contrasted genotypes to reveal unique alleles.
The concern that such high levels of genotypic and phenotypic information, associated with germplasm accessions, would enable new breeding techniques (NBTs) that would bypass the access and benefit-sharing (ABS) arrangements currently linked to the distribution of physical material has generated much recent attention (Aubry, 2019; Halewood, Chiurugwi et al., 2018; Smyth et al., 2020). At the moment, any genebank user (e.g., researcher, breeder) can order plants and sequence them without further obligations, and many organizations have already made such datasets publicly available for a wide range of crops. As potential solutions are elaborated, an important and challenging crop to breed such as banana should not be ignored, as access to its genetic and phenotypic data may contribute significantly to its progress as a crop (Gaffney et al., 2020).
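The filters described earlier in this section (missing data percentage, minor allele frequency, and SNPs discriminating two groups of samples) can be illustrated with a minimal Python sketch. It is not the implementation used by the web application: the function names, the SNP and accession identifiers, the thresholds, and the encoding of genotypes as alternate-allele counts at biallelic sites are all assumptions made for illustration. Ploidy is left as a parameter because many cultivated bananas are triploid.

```python
from typing import Dict, List, Optional, Set

# Assumed encoding: number of alternate alleles at a biallelic SNP
# (0..ploidy); None marks a missing call. All names here are hypothetical.
Genotype = Optional[int]


def missing_rate(calls: List[Genotype]) -> float:
    """Fraction of samples with no genotype call at this SNP."""
    return sum(c is None for c in calls) / len(calls)


def minor_allele_frequency(calls: List[Genotype], ploidy: int = 2) -> float:
    """Minor allele frequency computed over non-missing calls only."""
    observed = [c for c in calls if c is not None]
    if not observed:
        return 0.0
    alt_freq = sum(observed) / (ploidy * len(observed))
    return min(alt_freq, 1.0 - alt_freq)


def filter_snps(matrix: Dict[str, List[Genotype]],
                max_missing: float = 0.2,
                min_maf: float = 0.05,
                ploidy: int = 2) -> List[str]:
    """Keep SNP ids passing both the missing-data and the MAF threshold."""
    return [snp_id for snp_id, calls in matrix.items()
            if missing_rate(calls) <= max_missing
            and minor_allele_frequency(calls, ploidy) >= min_maf]


def discriminating_snps(matrix: Dict[str, List[Genotype]],
                        samples: List[str],
                        group_a: Set[str],
                        group_b: Set[str]) -> List[str]:
    """SNPs where each group shows a single genotype and the groups differ."""
    hits = []
    for snp_id, calls in matrix.items():
        by_sample = dict(zip(samples, calls))
        a = {by_sample[s] for s in group_a if by_sample[s] is not None}
        b = {by_sample[s] for s in group_b if by_sample[s] is not None}
        if len(a) == 1 and len(b) == 1 and a != b:
            hits.append(snp_id)
    return hits


# Toy data: four hypothetical accessions, four hypothetical SNPs.
samples = ["acc1", "acc2", "acc3", "acc4"]
matrix = {
    "chr01_1000": [0, 0, 2, 2],
    "chr01_2000": [0, 1, 1, 2],
    "chr02_500": [0, 0, 0, 0],           # monomorphic, fails the MAF filter
    "chr02_900": [2, None, None, None],  # mostly missing
}

kept = filter_snps(matrix, max_missing=0.25, min_maf=0.05)
unique = discriminating_snps(matrix, samples, {"acc1", "acc2"}, {"acc3", "acc4"})
print(kept, unique)
```

In this toy example only `chr01_1000` separates the two groups, since each group is uniform at that site and the groups carry different genotypes. A production system would apply the same logic to millions of sites through indexed database queries instead of in-memory lists.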
This catalog intends to provide open access to genomic resources in an equitable way, ultimately benefiting all, including those in low-income countries (Halewood et al., 2017). It should be noted that it does not include gene functions, but is linked to a genome browser from the Banana Genome Hub which contains gene annotation for reference banana genomes (Droc including fruit quality, are also still missing, which may inhibit adoption of improved hybrids (Thiele et al., 2020). Furthermore, new plant breeding techniques such as gene editing will have to be fine-tuned for banana, even if some encouraging perspectives have been recently published (Tripathi et al., 2019; Zorrilla-Fontanesi et al., 2020). Finally, regulatory frameworks for edited crops have yet to be legislated in many countries (Schmidt et al., 2020). While waiting for future policy options, training on the use of such catalogs should be strengthened, particularly for breeders in national programs in those low-income countries with supportive funding schemes.

| CONCLUSIONS
A digital catalog of genetic variants is available for banana and is directly linked to the diversity held in the ITC genebank. It is accessible online as a proof of concept for exploration and export of SNP datasets. We adapted the system with the objective of keeping the genetic information connected to the physical material maintained in the genebank. Users can browse genetic information, identify interesting material and order it online for further investigation and use in breeding programs. While many genebanks are wondering whether managing high-density markers is within their scope, the GIGWA web application offers a simple and elegant solution. At a reasonable transaction cost, its framework can be extended to any germplasm collection.
Challenges still need to be addressed. First, on a technical side,