Metabolite database for root, tuber, and banana crops to facilitate modern breeding in understudied crops

Summary
Roots, tubers, and bananas (RTB) are vital staples for food security in the world's poorest nations. A major constraint to current RTB breeding programmes is limited knowledge of the available diversity due to a lack of efficient germplasm characterization and structure. In recent years large-scale efforts have begun to elucidate the genetic and phenotypic diversity of germplasm collections and populations, yet biochemical measurements have often been overlooked despite metabolite composition being directly associated with agronomic and consumer traits. Here we present a compound database and concentration range for metabolites detected in the major RTB crops: banana (Musa spp.), cassava (Manihot esculenta), potato (Solanum tuberosum), sweet potato (Ipomoea batatas), and yam (Dioscorea spp.), following metabolomics-based diversity screening of global collections held within the CGIAR institutes. The dataset, which includes 711 chemical features, provides a valuable resource regarding the comparative biochemical composition of each RTB crop and highlights the potential diversity available for incorporation into crop improvement programmes. In particular, the tropical crops cassava, sweet potato and banana displayed more complex metabolite profiles, representing up to 22 chemical classes (unknowns excluded), than potato, for which only metabolites from 10 chemical classes were detected. Additionally, over 20% of biochemical signatures remained unidentified for every crop analyzed. Integration of metabolomics with the on-going genomic and phenotypic studies will enhance ’omics-wide associations of molecular signatures with agronomic and consumer traits via easily quantifiable biochemical markers to aid gene discovery and functional characterization.


Importance of RTB crops
The annual global production of root, tuber, and banana (RTB) crops exceeds 1000 million tonnes (Food and Agriculture Organization of the United Nations, 2019) and feeds over 2 billion people worldwide (Scott et al., 2000) (Figure 1). RTBs are especially vital in the least developed countries where they provide ≥15% of daily calories and are a source of economic subsistence to over 750 million people (Kennedy et al., 2019). In Africa, the production of RTBs exceeds that of all other staples combined (Sanginga, 2015), and RTBs are the most important crops for direct human consumption. Over 30 000 RTB crop accessions are currently held in the genebanks of four CGIAR institutes with many further accessions in national and regional collections, representing the diversity currently available for breeding (Tay, 2013). Whilst the RTB crops are cited to have high yield potential (especially regarding calories per hectare production) when compared with other staples (cereals), the extent of diversity available for breeding cannot be capitalized upon due to limited knowledge on the biological potential of these accessions. In addition to the dearth of genetic resources, basic characterization data, such as phenotypic and agronomic traits including growth and yield parameters, are scarce for a large proportion of accessions. Consequently, insufficient germplasm characterization and evaluation have hindered the exploitation of the available diversity within breeding programmes (Jansky et al., 2015). Depending on the RTB crop, three factors have contributed, to a varying degree, to the current situation: (i) poor or under-representation of crop wild relatives in germplasm collections (Castañeda-Alvarez et al., 2016); (ii) high levels of accession duplication and misidentification in the collections, particularly prevalent in clonal crop collections (yam up to 30% (Girma et al., 2012), potato varies from c. 4.5% (Ellis et al., 2018) to c. 75% (Huamán et al., 2000) across different subsets); and (iii) the poorly recorded assessment of germplasm diversity, which is especially complex in RTB crops due to crop wild gene flow via ennoblement, hybridization from overlapping natural and cultivation habitats, and genetic assimilation from vegetative propagation (Scarcelli et al., 2017).

Why metabolomics in breeding?
Agronomic and consumer traits can often be directly associated with metabolite composition (Bino et al., 2004), which favours the use of metabolomics to generate measurable biochemical signatures for characterization. Metabolomics approaches can provide a standalone technique when genetic mechanisms are not well understood (Price et al., 2017), as evident in RTB crops. Phenotypic evaluation of materials is required multiple times along the breeding pipeline and integration of metabolomics into current practices is advocated to greatly shorten the development time of new varieties, reduce costs, and provide unbiased phenotypic profiles for validation of genetic parameters (Fernie and Schauer, 2009), and has the potential of being a powerful approach for future precision breeding (Zivy et al., 2015).
Various metabolomics approaches can be undertaken, generally encompassing untargeted metabolite profiling, including broad-scale relative quantification of known and unknown metabolites, and targeted profiling with absolute quantification of identified metabolites. As the accuracy of identification and quantification increases, so does the time required for analysis. Through integration with other 'omics to associate genotype with phenotype, the regulation of agronomic/phenotypic traits (phenomics) at the genetic (genomics, epigenomics), transcriptional (transcriptomics), translational (proteomics) and metabolic level (metabolomics) can be dissected in a holistic systems biology manner to enhance the understanding of crop development and its responses to biotic and abiotic changes. The development of bioinformatics tools and resources has rapidly progressed alongside 'omics technologies to facilitate the integration and management of these large and complex datasets. However, the interpretation of integrated datasets is complex, requiring expertise and collaboration across many scientific fields, and remains the major challenge for multiomics investigations (Pinu et al., 2019; Misra et al., 2019). This systems biology approach has already been applied to model crops such as tomato, rice, and wheat, in which metabolomics analyses have provided a richness of resources (Grennan, 2009; Perez-Fons et al., 2014) available to integrate with genetic breeding approaches. These resources rapidly accelerated progress in identifying trait markers (Schwahn et al., 2014; Li et al., 2016a; Sprenger et al., 2018), elucidating biosynthetic pathways contributing to traits (Schwahn et al., 2014; Daygon et al., 2017), and validating genetic/metabolic prediction (Wei et al., 2018). For example, integrating genetic and metabolite markers for phenotypic traits of wheat has provided more robust signatures than either alone (Ward et al., 2015), and both were equally predictive for complex traits (Riedelsheimer et al., 2012).
Furthermore, metabolite markers are inherently affected by environmental factors and can provide more precise measures for crop trait variation compared with genetic markers. Metabolite markers can be stably inherited (Chan et al., 2010) and, as such, the metabolome can be viewed in an analogous manner to the epigenome, acting as a dynamic yet conserved network shaped by genetic and environmental influence. Consequently, when performing comparative analyses of crop growth under different environments, quantifying the contributions of biochemical signatures towards phenotype is often simpler than for genetic markers, especially in highly heterozygous crops like RTBs. This gives rise to the potential to generate chemotype core collections (CCC) for use in breeding, in which material selection is based on fixation of a complement of biochemical signatures that could confer the desired characteristics and be more robust to environmental variation. This is contrary to genotypic core collections, in which breeding tries to fix gene variants that can then often harbour different traits under different environments. Furthermore, increased trait stability of CCCs would provide a suitable base for comparative GxE (Genotype × Environment) studies to elucidate environmental effects on crop production (Xu, 2016). CCCs would therefore complement genotypic core collections to facilitate localized precision breeding in the future. Despite these advantages, the deployment of enhanced cultivars directly from metabolomics-directed breeding is still limited, largely owing to the slow uptake by breeders and the limited access to this technology, with the field still being listed as prospective but with the potential to be game-changing for future agricultural practice (Kumar et al., 2017).

Prospective societal impact
Given the role that RTB crops play in the livelihoods of millions of people in the least developed nations, improvement is paramount. On the whole, RTBs are primarily grown through small-holder farms with a large proportion of child and female labour and, therefore, the crops hold extreme importance for the most vulnerable portions of society.
Increasing the precision and speed of phenotyping during the breeding ladder (Figure 2) would enable faster crop improvements and, therefore, a multitude of benefits: (i) enhanced agronomic, breeding efficiency and consumer traits (e.g. increased yields, increased flowering, reduced dormancy and bio-fortification) to tackle food insecurity and malnutrition, which are more prevalent in RTB growing regions; (ii) decreased fertilizer inputs and improved pest and disease resistance to lower production costs and increase incomes; (iii) increased abiotic stress tolerance to improve climate change adaptation and yields on marginal, saline or drought-prone soils; and (iv) a better understanding of basic phenomena such as crop evolution/domestication, ploidy, and inheritance mechanisms for understudied clonal crops.

RESULTS AND DISCUSSION
Metabolomics approach: general screening
The metabolomics workflow implemented and optimized for each crop was based on a general concept (Figure 2). All plant materials collected were flash-frozen, lyophilized, and ground to a homogeneous powder before entering the metabolite profiling workflow, to ensure reproducibility. A common two-phase solvent extraction method was implemented to extract a broad range of metabolites from each type of sample. This standardized and widely used method also allowed rapid optimization for different tissue types. Furthermore, the partition into aqueous and organic phases allowed the independent analysis of polar and non-polar extracts, which simplified sample handling, chromatographic method development, and metabolite identification. During analysis, extraction blanks, quality controls and internal standards were included to maintain consistency and good laboratory practice and to enable normalization and batch correction (Fernie and Klee, 2011).
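As an illustration of the normalization step mentioned above, the following is a minimal Python sketch, not the pipeline actually used in this work. It assumes one internal standard per run and a known dry weight per extraction; the class `Peak`, the functions `blank_subtract` and `relative_abundance`, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Peak:
    """One integrated feature from a single chromatographic run (hypothetical layout)."""
    metabolite: str
    area: float       # integrated peak area of the metabolite
    istd_area: float  # peak area of the internal standard in the same run
    sample_mg: float  # dry weight of powder extracted, in mg


def blank_subtract(sample_area: float, blank_area: float) -> float:
    """Remove the signal seen in the extraction blank, never going below zero."""
    return max(sample_area - blank_area, 0.0)


def relative_abundance(peak: Peak) -> float:
    """Peak area relative to the internal standard, per mg of sample."""
    if peak.istd_area <= 0 or peak.sample_mg <= 0:
        raise ValueError("internal standard area and sample weight must be positive")
    return peak.area / peak.istd_area / peak.sample_mg


# Illustrative values only, not taken from the database.
example = Peak("sucrose", area=blank_subtract(5200.0, 200.0),
               istd_area=1000.0, sample_mg=10.0)
print(example.metabolite, relative_abundance(example))
```

Dividing by the internal standard compensates for run-to-run differences in injection and detector response, while dividing by sample weight makes values comparable between extractions of different size.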

Database curation
The data generated can be deposited in public repositories addressing metabolomics in general (Metabolights, Dataverse, Metabolomics Workbench, Metexplore or Metabolonote) and/or crop-specific databases such as CassavaBase and MusaBase or PlantCyc. Initial fingerprinting via LC-MS was conducted on materials to enable a rapid screen of biochemical diversity, especially focussed on secondary metabolism as this is typically where the largest proportion of chemical diversity resides (De Luca et al., 2012). The bottleneck in many LC-MS based metabolomics studies is compound identification, and use of the same chromatographic method meant that the data generated could also be used to guide the purchase of metabolite standards for LC-MS library generation. Typical fingerprinting screens were performed on methanol extracts and measured only one biological replicate for speed. A minimum of three biological replicates and at least two analytical platforms were used for untargeted studies, including study of both aqueous and organic extracts for more comprehensive coverage of the metabolome. For the identification of features/compounds detected during the untargeted analysis, quality controls representing a pool of samples for each species were used. Peaks detected during GC-MS and LC-MS analyses were identified using published libraries (e.g. NIST, GMD (Kopka et al., 2005), MassBank (Horai et al., 2010) etc.) and confirmed by authentic commercial standards to build a crop-specific library. After database curation, automated analysis was possible for the whole dataset of each species and the identification process was integrated as an element of the metabolomics data analysis pipeline. Nevertheless, manual curation was undertaken for each dataset to reduce matching errors. The analysis of isoprenoid-derived metabolites, such as carotenoids and chlorophylls, was carried out using ultra high or high performance liquid chromatography coupled with a diode array detector (U/HPLC-DAD). As the composition of leaf and tuber materials has been reported extensively (Burns et al., 2003; Drapal et al., 2017; Price et al., 2018; Drapal et al., 2019b; Drapal et al., 2019c) and methods previously validated (Fraser et al., 2000; Nogueira et al., 2013), this was performed in a semitargeted mode in which the majority of compounds were quantified absolutely. This approach remains essential because the photosynthetic pigments, owing to their intrinsic chemical nature, are poorly amenable to MS.

Current progress in defining the metabolome of RTB crops
The database curated for banana, cassava, potato, sweet potato, and yam currently includes over 300 identified metabolites (Table S1). Additionally, significant numbers of recurring unidentified features, summarized as 'unknowns', were measured (Figure 3 and Table S2). The metabolites identified in each crop represent a broad range of the plant metabolome including amino acids, organic acids, compounds of the tricarboxylic acid (TCA) cycle, isoprenoid-derived compounds, phenylpropanoids, sugars, fatty acids, sterols, and corresponding subfamilies. The metabolite libraries have been implemented in the current projects of the RTB programme, facilitating the assessment of biochemical diversity, with future intentions to aid the identification of trait biomarkers in the RTB crops. The limits of metabolite concentrations have been reported to include all the available quantitative range for use in targeted breeding. This is exploitable because extremes are often favoured in crop breeding to achieve the maximum gains and enhancements above the average range, and it contrasts with other databases that report the average and/or standard deviation.
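The choice to report limits rather than averages can be sketched as follows. This is a hedged Python illustration under assumed inputs: the function name `concentration_ranges`, the record layout (crop, metabolite, value) and the numbers are hypothetical and are not drawn from Table S1.

```python
from collections import defaultdict


def concentration_ranges(records):
    """Summarize measurements as (minimum, maximum, n) per crop and metabolite.

    `records` is an iterable of (crop, metabolite, value) tuples; the layout
    is an assumption made for this sketch.
    """
    grouped = defaultdict(list)
    for crop, metabolite, value in records:
        grouped[(crop, metabolite)].append(value)
    return {key: (min(vals), max(vals), len(vals)) for key, vals in grouped.items()}


# Made-up measurements for illustration only.
records = [
    ("cassava", "sucrose", 2.1),
    ("cassava", "sucrose", 7.4),
    ("cassava", "sucrose", 3.3),
    ("yam", "dopamine", 0.2),
    ("yam", "dopamine", 0.9),
]

for (crop, metabolite), (low, high, n) in sorted(concentration_ranges(records).items()):
    print(f"{crop}\t{metabolite}\t{low}-{high}\t(n={n})")
```

Keeping the number of measurements alongside each range lets a reader judge how well an extreme value is supported before selecting material on it.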
Potato had the simplest biochemical profile with the presence of just 10 chemical classes (excluding unknowns); four of these related to primary metabolism. Sweet potato and banana comprised 13 and 16 chemical families, respectively, whilst the cassava and yam chemo-libraries sum up over 20 families of compounds (Figure 3a).
Sugars were the largest annotated chemical class in all crops. This is expected for sink/storage organs such as the tissues analyzed in the collection. Similarly, chemical classes related to primary metabolism (namely amino acids, organic acids and components of the TCA cycle) were also well annotated in all species. Potato's chemical composition presented the largest proportion of these primary metabolite sectors, with sugars comprising a larger share than in the other crops, reflecting the higher starch quantity.
The divergence between crop compositions resided mostly in components related to secondary metabolism. For example, yams had a greater proportion of odd-chain fatty acids, which are rare in plants. Also characteristic of yam was the higher content and diversity of nitrogen-containing compounds such as amines, nucleobases, and catecholamines. Nevertheless, the catecholamine dopamine was vastly more abundant (up to one order of magnitude) in Musa. Triterpenoids also constituted a source of chemical diversity within the RTB crops, with a more complex composition found in both cassava and yam. Whilst typically these compounds were detected in the leaf tissue of the accessions, yam tubers also presented significant amounts of sterols. Crude extracts of yam presented a range of triterpenoids, including cholesterol, reflecting the production of glycosylated steroidal saponins within this crop (Sautour et al., 2007). Similarly, cassava leaves showed an accumulation of amyrins and isomers, which are likely to represent the glycosylated pentacyclic saponins. High levels of β-carotene and xanthophylls were also observed for orange-fleshed lines of sweet potato and yam tubers, cassava roots, and Musa fruit, as expected. The largest diversity of phenolic compounds such as phenylpropanoids, coumarins, flavonoids and lignin/lignin oligomers was encountered in cassava and sweet potato, although for sweet potato many phenolics remain structurally elusive (level 3 unknown). Unknowns comprised over half of all metabolites measured (Figure 3b) and ranged from approximately one-quarter to one-third of features recorded for each individual crop following the analysis of crude extracts (Figure 3a). Distinguishing the chemical features detected via LC-MS and turning these into distinct compounds was challenging and will require further work to determine whether each peak is of biological origin. Given that in typical LC-MS screening over 90% of features detected are not true metabolites (Mahieu and Patti, 2017; Aksenov et al., 2017), a conservative approach to limit false positives was chosen in which only unknowns that are well characterized (e.g. via MS/MS, clear UV-vis spectra) were included in the database. The drawback to this is that the true level of unknowns may be greatly underestimated in the current database. As expected, the unknowns that could be assigned to a compound class were predominantly secondary metabolites (Table S2). Unknowns have been given unique identifiers to allow on-going annotation of compounds for libraries and the curation and updating of the database (Table S2).
The diversity of compound classes recorded was highest in yam and cassava, then banana, sweet potato, and lowest in potato (Figure 3a). This finding is not surprising, given that cassava was most intensively studied (most accessions and on all platforms) and yam is a multispecies crop for which large biochemical diversity has previously been evidenced across the genus (Price et al., 2016). In line with this, yam presented the highest proportion of unknowns (c. 50%, Figure 3a), despite not undergoing LC-MS study as per the other crops. Sweet potato also had a comparably large proportion of unknowns (c. 45%), mostly comprising phenolic-derived compounds, which are likely to be conjugates (Drapal et al., 2019c). Accurate identification of such compounds has been shown to require comprehensive MS³ fragmentation and is therefore beyond that typically conducted in current metabolite screening practices (Akimoto et al., 2017). Interestingly, even with the relatively extensive application of metabolomics to potato (Puzanskiy et al., 2017), a large number of unknowns still exists (Table S2). Carbohydrate analysis is particularly complex, with high numbers of isomers and complex polymers that are likely to contribute to the lack of conclusive annotation. Level 3 unknowns detected in banana extracts were mostly sugars and phenolics. Furthermore, cassava had the lowest proportion of unidentified metabolites. Cassava material has been the most intensively studied (subjected to all three analytical platforms and with the largest number of tissues and accessions analyzed). This highlights that extensive analysis via diverse methods can elucidate unknowns and slowly conquer the challenge of identification, commonly touted as metabolomics' biggest hurdle.
Overall, the observed differences between the crops' metabolite databases may be the result of the application of different analytical platforms to each crop within the modular pipeline. However, current observations match those expected from the literature. Dominance of particular classes of compounds in each crop reflected the plasticity of plant metabolism to develop physiological features that can be linked to particular phenotypes.

Future developments
Presenting the ranges of metabolites recorded in a simple spreadsheet format enables the easy use of information regarding the comparative biochemical diversity of these under-characterized crops. All compounds detected represent a portion of the steady-state metabolome of the plant samples and can be used for untargeted data analysis to unravel the great amount of variation that can be used to guide breeding decisions. The system has proven to be robust over datasets even when measured months apart. Therefore, it is possible for future work to extend the platform from relative to approximate absolute quantification for many compounds through the generation of relative response factors to the internal standard (Cifkova et al., 2012) and subsequent correction following testing of extraction recovery. The next step will thus represent the transition of the untargeted pipeline to a holistic semitargeted system. From this, data can be more informative for use in flux modelling and genome-wide reconstructions, which are essential for understanding the fundamental processes governing plant physiology (Kruger and Ratcliffe, 2015).
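The proposed move from relative to approximate absolute quantification can be sketched in Python as below. This is an assumption-laden illustration rather than an established protocol of this work: it takes the common definition RRF = (A_analyte / C_analyte) / (A_IS / C_IS), assumes a linear detector response, and treats extraction recovery as a single fraction. The function names and all numbers are hypothetical.

```python
def relative_response_factor(analyte_area, analyte_conc, istd_area, istd_conc):
    """RRF from a standard mixture of known concentrations.

    RRF = (A_analyte / C_analyte) / (A_istd / C_istd)
    """
    return (analyte_area / analyte_conc) / (istd_area / istd_conc)


def estimate_concentration(analyte_area, istd_area, istd_conc, rrf, recovery=1.0):
    """Approximate analyte concentration in a sample.

    C_analyte = (A_analyte / A_istd) * C_istd / RRF, then divided by the
    fractional extraction recovery (1.0 means complete recovery).
    """
    if rrf <= 0 or recovery <= 0:
        raise ValueError("rrf and recovery must be positive")
    return (analyte_area / istd_area) * istd_conc / rrf / recovery


# Illustrative numbers only: a standard mix with both compounds at 10 units.
rrf = relative_response_factor(analyte_area=2000.0, analyte_conc=10.0,
                               istd_area=1000.0, istd_conc=10.0)
conc = estimate_concentration(analyte_area=4000.0, istd_area=1000.0,
                              istd_conc=10.0, rrf=rrf, recovery=0.5)
print(rrf, conc)
```

Because the response factor is determined once per compound and reused, such an approach depends on the platform remaining stable over time, which is why the robustness across datasets noted above matters.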
More elaborate sample preparations, such as solid phase extraction (SPE) and molecular recognition, via immunoaffinity, or imprinting, can be used to extend the breadth of metabolites captured and increase metabolome coverage. However, this would concurrently increase the number of unidentified compounds, which already represent a considerable proportion of the dataset (Figure 3b). Extensive structural elucidation via multistage MS fragmentation (MSⁿ) and/or coupling of LC to NMR platforms (e.g. LC-SPE-MS/NMR) or ion mobility (e.g. LC-IMS-MS) has not yet become routine, largely hindered by the high capital costs at outset, and expert knowledge required for data interpretation, which is labour intensive. That said, in recent years a great deal of progress has been made towards the accessibility of tools for computational interpretation of such data (Spicer et al., 2017; Tsugawa, 2018). Investments in automated structural elucidation of unidentified compounds have the potential to revolutionize metabolomics workflows by overcoming the current bottleneck of structural elucidation.
However, knowing the structure of a compound does not allow one to fully assess biological relevance. Recent years have seen a shift towards increased spatial resolution via mass spectrometry imaging and localization through cell sorting and laser microdissection etc., alongside flux-omics and longitudinal (time-series/developmental) applications. These applications evidence that contextualizing metabolomics data requires a detailed understanding of metabolic network dynamics and functional activity, which will become the next hurdle for the field.
Screening of complete germplasm collections will allow the establishment of a CCC that comprises the majority of biochemical diversity available. CCCs would therefore represent an advance in precision over morphological core collections and can be overlaid with genotypic collections to reduce and focus the selection on accessions with the highest prospects for successful transfer of desired traits, that is through overcoming genetic differences that do not translate through to phenotype and by encompassing biochemical traits not observed at the morphological level.

Outlook for metabolomics in breeding of RTBs
Future work appears set to capitalize on the synergy of pursuing a multiple 'omics platform for rapid progress during crop improvement and breeding. At the forefront of this pursuit is the combination of genomics and transcriptomics for breeding and trait understanding. Moreover, recently, metabolomics has been favoured to enhance precision during molecular phenotyping, and the utilization of such methods looks set to increase. Metabolomics can prove especially useful when tackling complex traits, that is those with many determinants, as the metabolome inherently reflects environmental factors and other stimuli such as chemical interactions. This is evidenced by the preference for elucidation of 'interactomes' such as the rhizosphere and volatile-ome of plants by incorporating deep sequencing of the microbiome (Hu et al., 2018; Jacoby and Kopriva, 2019) or atmospheric transformation of volatiles (Blande et al., 2014; Li et al., 2016b), respectively. Combining these measurements expands the biological system to the complete local environment and therefore characterization occurs at the ecosystem level. Improvement of RTB crops is vital for the attainment of the UN Sustainable Development Goals and improving livelihoods in the most deprived regions of the globe. In addition, the RTB crops show potential as scientific models for the analysis of complex genetic architectures, revealing the interplay between evolution and domestication in clonal crops.
Breeding and development for each of the RTB crops shows unique pitfalls and problems, yet each is widely grown due to the unique traits they present. The complexities that have hindered crop improvement and agronomic development for production of RTBs to date may also be the crops' largest saviours. In light of climate change, the large morphological plasticity, limited genetic assimilation, and resilience of these crops to extreme conditions and low technology agricultural systems provide the potential to adapt and overcome the impacts of global warming and, therefore, provide the incentive to increase research efforts towards these critically important understudied RTB crops. To ensure this, the breeding community needs to move beyond viewing metabolomics and other 'omics as a hypothesis-free service science to techniques that can be integrated to solve complex biological questions in a rapid, large-scale manner. Ironically, the initial characterization of plant genetic resources and diversity available is crucial to pose the biological questions for investigation and, as such, metabolomics can progress on both fronts.

EXPERIMENTAL PROCEDURES
Samples from in vitro cultures and plants grown in the field were harvested, flash-frozen with liquid nitrogen, and lyophilized to remove all water content. The samples comprised a collection of different tissues, for example leaf, root, tuber, stem, and fruit from each crop. The tissue samples were then ground to a fine powder and metabolites extracted. Sample preparation, extraction and the profiling procedure of the extracts were based on previously published protocols and optimized for each crop to account for the matrix effects of the respective tissue (Perez-Fons et al., 2014; Price et al., 2016; Drapal et al., 2017; Price et al., 2017; Price et al., 2018; Drapal et al., 2019a; Drapal et al., 2019b; Drapal et al., 2019c). To account for the difference in chemical properties of the metabolites, three different platforms were utilized in a modular manner for the screening process: ultra/high performance liquid chromatography with diode array detector (U/HPLC-DAD), liquid chromatography-mass spectrometry (LC-MS) and gas chromatography-mass spectrometry (GC-MS). The yam materials underwent GC-MS of both polar and non-polar extracts alongside HPLC-DAD of the non-polar phase. All other crops underwent GC-MS and LC-MS analysis on polar extracts and UPLC-DAD of non-polar extracts. Non-polar extracts from cassava and sweet potato were also subjected to GC-MS analysis.
The curation of crop-specific libraries with identified metabolites followed the same workflow for both the GC-MS and LC-MS analytical platforms (Figure 3), whereas an established UPLC-DAD library was used for all crops (Fraser et al., 2000; Burns et al., 2003) with an extended version used for yam and sweet potato (Price et al., 2018). All features detected in the generated sample set were aligned and, following statistical analysis, significant features were identified and confirmed with standards (Fernie and Klee, 2011). GC-MS data were processed via AMDIS (v2.71, NIST) whereas the alignment and filtering of chromatograms for LC-MS was achieved via metaMS (Wehrens et al., 2014; Franceschi et al., 2014). U/HPLC-PDA data were analyzed via Empower 2™ software (Waters Corp.). Manual confirmation of the identified compounds was carried out (Table S1) and recurrent unidentified features that represent hypothetical compounds have been reported with unique identifiers per species (Table S2) (Bino et al., 2004). Normalization to internal standards and sample weight allowed relative quantification, concatenation of data from the platforms, and subsequent comparison between tissue types and species. For the UPLC, absolute quantification for the major photosynthetic compounds (β-carotene, violaxanthin, neoxanthin, phytoene, phytofluene, chlorophyll a, chlorophyll b, β-cryptoxanthin, lutein, antheraxanthin, and zeaxanthin) was achieved via comparison with dose-response curves of authentic commercially available standards. For carotenoids for which an authentic standard was not available, quantification was based on standard curves of the carotenoids with the most similar chemical structure and spectral properties. When compounds were detected on more than one analytical platform, the values reported in the database represent the maxima recorded, and the analytical technique that proved to be more amenable was cited first. The database and pie-charts were created in Microsoft Excel 2013.
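Quantification against a dose-response curve of an authentic standard can be illustrated with the short Python sketch below. It assumes a linear response over the calibrated range, fitted by ordinary least squares; the software used in this work (Empower 2) handles this internally, so the functions `fit_line` and `quantify` and all values here are hypothetical.

```python
def fit_line(concs, areas):
    """Ordinary least-squares fit of area = slope * conc + intercept."""
    n = len(concs)
    if n < 2 or n != len(areas):
        raise ValueError("need at least two paired calibration points")
    mean_x = sum(concs) / n
    mean_y = sum(areas) / n
    sxx = sum((x - mean_x) ** 2 for x in concs)
    if sxx == 0:
        raise ValueError("calibration concentrations must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(concs, areas))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(area, slope, intercept):
    """Invert the calibration line to obtain a concentration from a peak area."""
    return (area - intercept) / slope


# Made-up calibration points for a pigment standard (amount vs. peak area).
standard_amounts = [1.0, 2.0, 3.0, 4.0]
standard_areas = [10.0, 20.0, 30.0, 40.0]
slope, intercept = fit_line(standard_amounts, standard_areas)
print(quantify(25.0, slope, intercept))
```

For a carotenoid without an authentic standard, the same inversion would be applied using the curve of the structurally and spectrally closest standard, as described above, which makes the resulting value an estimate rather than a true absolute amount.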
As the compiled dataset comprised numerous independent analyses undertaken over a three-year time-frame, the metabolite ranges reported for each crop differed in the number of samples analyzed and replicate measurements made. However, for each metabolite reported per crop a minimum of 12 measurements were taken and the validity and repeatability of measures were controlled within each independent study. Furthermore, analytical drift and different response factors were controlled platform-to-platform, batch-to-batch and study-to-study via the analysis of both reference sample (quality control) and reference metabolite (internal standard) to ensure robustness.

ACKNOWLEDGEMENTS
[...] discussion of data, Harriet Berry (RHUL) for validation of a subset of data sets and Michael Friedmann (CIP) for critical reading and suggestions.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest in accordance with journal policy.

AUTHOR CONTRIBUTIONS
EP, MD and LP-F generated the datasets, assembled the figures, compiled supplementary tables, and drafted the manuscript and devised the concept. DA, RB, BH, MR and RS selected plant materials, aided interpretation of results, and elaborated the manuscript. LABL-L selected plant materials, aided interpretation of results, coordinated across centres, and elaborated the manuscript. PDF aided interpretation of results, drafted and edited the manuscript, secured funding and devised the concept.

DATA AVAILABILITY STATEMENT
All data compiled for this resource paper are included in this published article (and its supplementary information files) and references to the original publications/data sets are cited.

SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article.
Table S1. Database of metabolite concentration range per crop.
Table S2. Lists of recurrent unknowns identified per crop.