A curated list of genes that affect the plant ionome

Abstract Understanding the mechanisms underlying plants’ adaptation to their environment will require knowledge of the genes and alleles underlying elemental composition. Modern genetics can quickly and cheaply indicate which regions of DNA are associated with particular phenotypes, but most genes remain poorly annotated, hindering the identification of candidate genes. To help identify candidate genes underlying elemental accumulations, we have created the known ionome gene (KIG) list: a curated collection of genes experimentally shown to change uptake, accumulation, and distribution of elements. We have also created an automated computational pipeline to generate lists of KIG orthologs in other plant species using the PhytoMine database. The current version of KIG consists of 176 known genes covering 5 species, 23 elements, and their 1588 orthologs in 10 species. Analysis of the known genes demonstrated that most were identified in the model plant Arabidopsis thaliana, and that transporter coding genes and genes altering the accumulation of iron and zinc are overrepresented in the current list.


| INTRODUCTION
Understanding the complex relationships that determine plant adaptation will require detailed knowledge of the action of individual genes, the environment, and their interactions. One of the fundamental processes that plants must accomplish is to manage the uptake, distribution, and storage of elements from the environment.
Many different physiological, chemical, biochemical, and cell biology processes are involved in moving elements, implicating thousands of genes in every plant species. Modern genetic techniques have made it easy and inexpensive to identify hundreds to thousands of loci for traits, such as the elemental composition (or ionome) of plant tissues. However, moving from loci to genes is still difficult as the number of possible candidates is often extremely large and the ability of researchers to identify a candidate gene from its functional annotations is limited by our current knowledge and inherent biases about what is worth studying (Stoeger et al., 2018; Baxter, 2020).
The most obvious candidates for genes affecting the ionome in a species are orthologs of genes that have been shown to affect elemental accumulation in another species. Indeed, there are multiple examples of orthologs affecting elemental accumulation in distantly related species, such as Arabidopsis thaliana and rice (Oryza sativa), including Na+ transporters from the HKT family (Ren et al., 2005; Baxter et al., 2010); the heavy metal transporters AtHMA3 and OsHMA3 (Chao et al., 2012; Yan et al., 2016); the E3 ubiquitin ligases BRUTUS and OsHRZs, which regulate the degradation of iron uptake factors (Selote et al., 2015; Hindt et al., 2017; Kobayashi et al., 2013); and the K+ channel AKT1 (Lagarde et al., 1996). To our knowledge, no comprehensive list of genes known to affect elemental accumulation in plants exists. To remedy this deficiency, we sought to create a curated list of genes based on peer-reviewed literature, along with a pipeline to identify orthologs of the genes in any plant species and a method for continuously updating the list. Here we present version 1.0 of the known ionome gene (KIG) list.

| MATERIALS AND METHODS
The list includes all functionally characterized genes from the literature that are linked to changes in the ionome. Criteria for inclusion in the primary KIG list were as follows:
1. The function or levels of the gene are unambiguously altered (i.e., a confirmed knockout, knockdown, or overexpressor). For double mutants, both genes are listed.
2. The levels of at least one element are significantly altered in plant tissue.
3. Publication in the form of a peer-reviewed manuscript.
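The three inclusion criteria above can be expressed as a simple filter over submitted records. The following is a minimal sketch only, not the authors' curation procedure (which was manual): the `Submission` record, its field names, and the use of a non-empty DOI as a stand-in for peer-reviewed publication are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """Hypothetical record mirroring the fields requested from submitters."""
    species: str
    gene_ids: list          # one ID, or two IDs for a double mutant
    elements_altered: list  # elements with significantly altered levels
    function_altered: bool  # confirmed knockout, knockdown, or overexpressor
    doi: str                # assumed proxy for a peer-reviewed publication


def meets_criteria(sub):
    """Apply the three inclusion criteria to one submission."""
    return bool(sub.function_altered and sub.elements_altered and sub.doi.strip())


def primary_entries(submissions):
    """Return one (species, gene_id, elements) row per accepted gene.

    For double mutants, both genes are listed, as in criterion 1.
    """
    rows = []
    for sub in submissions:
        if meets_criteria(sub):
            for gene_id in sub.gene_ids:
                rows.append((sub.species, gene_id, tuple(sub.elements_altered)))
    return rows
```

A submission that reports only metal tolerance, with no altered element, would be rejected by this filter, matching the exclusion noted in the text.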
Note that our definition excludes genes that are linked to metal tolerance or sensitivity but do not alter the ionome, and genes for which only the transcript levels are correlated with elemental accumulation. To identify the KIG genes, we created a Google survey that was distributed to members of the Ionomicshub research coordination network (NSF DBI-0953433) and advertised on Twitter and in oral presentations by the authors. We asked submitters to provide the species, gene name (or names where alleles of two genes were required for a phenotype), gene ID(s), tissue(s), element(s) altered, and a DOI link for the primary literature support. Subsequently, authors FKR and LW performed an extensive literature search.

| Creating the inferred orthologs list
The known ionome gene list contains known genes from the primary list and their orthologous genes inferred by InParanoid (v4.1) pairwise species comparisons (Remm et al., 2001). The InParanoid files were downloaded from Phytozome for each organism-to-organism combination of species in the primary list, plus Glycine max, Sorghum bicolor, Setaria italica, Setaria viridis, and Populus trichocarpa.
Orthologs of the primary genes were labeled as "inferred" genes.
If a primary gene was also found as an ortholog to a primary gene in another species, the status was changed to "Primary/Inferred" in both species. It is important to note that only primary genes can infer genes; inferred genes cannot infer other genes. The pipeline for transforming the primary list into the known ionome gene list can be found at https://github.com/baxterlab/KIG.
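The labeling rules described above can be sketched in a few lines. This is an illustrative sketch and not the code in the KIG repository: the function name `label_kig`, the `(species, gene)` tuple representation, and the pair-list input format are assumptions, and parsing of the actual InParanoid files is omitted.

```python
def label_kig(primary, ortholog_pairs):
    """Assign a status to every gene reachable from the primary list.

    primary: set of (species, gene) tuples from the curated primary list.
    ortholog_pairs: iterable of ((species_a, gene_a), (species_b, gene_b))
        pairs taken from pairwise ortholog tables (hypothetical format).
    Returns a dict mapping (species, gene) to "Primary", "Inferred",
    or "Primary/Inferred".
    """
    status = {gene: "Primary" for gene in primary}
    for a, b in ortholog_pairs:
        # Each pair is checked in both directions.
        for source, target in ((a, b), (b, a)):
            if source not in primary:
                # Only primary genes can infer genes.
                continue
            if target in primary:
                # Two primary genes that are orthologs of each other.
                status[source] = "Primary/Inferred"
                status[target] = "Primary/Inferred"
            else:
                status[target] = "Inferred"
    return status
```

Because the check is made against the fixed primary set rather than the growing status table, an ortholog of an inferred gene is never added, which is the "inferred genes cannot infer other genes" rule.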

| Gene Enrichment analysis
Overrepresentation analysis (released July 11, 2019) was performed on the primary and inferred genes in A. thaliana using the GO Consortium's web-based GO Enrichment Analysis tool powered by the PANTHER (v14) classification system (Ashburner et al., 2000; Mi et al., 2017; The Gene Ontology Consortium, 2017). We restricted overrepresentation analysis to A. thaliana because of its dominance in the KIG list and our lack of confidence in the functional annotation of the other species on the list. An analysis performed by Wimalanathan et al. (2018) found that maize gene annotations in databases like Gramene and Phytozome lacked GO annotations outside of automatically assigned, electronic annotations (IEA). IEA annotations are not curated and have the least amount of support out of all the evidence codes (Harris et al., 2004). A. thaliana annotations come from a variety of evidence types, showing a higher degree of curation compared to maize (Wimalanathan et al., 2018). The whole-genome Arabidopsis thaliana gene list from the PANTHER database was used as the reference list.
We tested both the PANTHER GO-slim and the GO complete datasets for biological processes, molecular function, and cellular component. GO-Slim datasets contain a selected subset of terms that give a broad summary of the gene list, whereas the complete dataset contains all the terms returned for a more detailed analysis.
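For readers unfamiliar with overrepresentation analysis, the quantities involved can be sketched as follows. The analysis in this study was run through the PANTHER web tool; the code below is not that tool. It assumes a one-sided hypergeometric test and Benjamini-Hochberg adjustment as the statistical model, and the function names are hypothetical. Here `k` is the number of query genes carrying a term, `n` the query list size, `K` the number of reference genes carrying the term, and `N` the reference list size.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """Observed count divided by the count expected from the reference list."""
    expected = n * K / N
    return k / expected


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be reported as overrepresented when its fold enrichment exceeds 1 and its adjusted p-value falls below the chosen false discovery rate.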
The enriched terms (fold enrichment >1 and with a false discovery rate <0.05) from the complete dataset were sorted into five specific categories relating to the ionome, based on annotation terms:
1. Ion homeostasis: terms include homeostasis, stress, detoxification, or regulation of an ion.
2. Ion transport: terms specifically state transport, export, import, or localization of ion(s). Does not include hydrogen ion transport.
3. Metal ion chelation: terms relating to phytochelatins, or to other chemical reactions or pathways of metal chelator synthesis.
4. Response to ions: terms vaguely state response to ions, but do not have any parent annotation terms that offer more clarification (e.g., stress response). Broadly, this refers to any change in the state or activity of a cell in terms of secretion, expression, movement, or enzyme production (Carbon et al., 2009).
5. Other transport: annotations stating the transfer of anything that is not an ion (glucose, peptides, etc.).
Genes may belong to more than one category, but if they belong to a parent and child term in the same category, they are only counted once.
... (Kim et al., 2006; Zhang et al., 2012), and the vacuolar Mn transporters AtMTP8 and OsMTP8 (Eroglu et al., 2016; Chen et al., 2013). Thus, we can reliably generate inferred genes and create a species-specific KIG list for any species in PhytoMine.
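Returning to the five GO-term categories used in the enrichment analysis, the filtering thresholds and the "counted once per category" rule can be sketched with sets. This is an illustrative sketch, not the authors' code: the assignment of terms to categories is taken as a given input (it was a curation decision), and the function names, dictionary layouts, and example gene labels are assumptions.

```python
from collections import defaultdict


def enriched_terms(results, max_fdr=0.05):
    """Keep terms with fold enrichment > 1 and FDR below the threshold.

    results: dict mapping term -> (fold_enrichment, fdr).
    """
    return {t for t, (fold, fdr) in results.items() if fold > 1 and fdr < max_fdr}


def genes_per_category(term_category, term_genes, keep_terms):
    """Count distinct genes per category.

    term_category: dict mapping term -> category name (assigned by a curator).
    term_genes: dict mapping term -> iterable of gene identifiers.
    keep_terms: set of terms that passed the enrichment filter.
    A gene annotated with both a parent and a child term in the same
    category is counted once, because the genes are pooled in a set.
    """
    members = defaultdict(set)
    for term in keep_terms:
        category = term_category.get(term)
        if category is not None:
            members[category].update(term_genes.get(term, ()))
    return {category: len(genes) for category, genes in members.items()}
```

For example, a gene annotated with both "metal ion transport" and its child term "iron ion transport" contributes one count to the ion transport category, while still counting separately toward ion homeostasis if it also carries a homeostasis term.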

| RESULTS
The primary list covers 23 elements (Figure 2) according to the elements reported by the authors of the studies in the primary list, which is more elements than predicted by the GO term annotations for those genes. Some GO annotations for these genes mention only a portion of the elements reported in the literature supporting the primary list. This may be due to GO annotation evidence codes lacking curation or biological data (IEA, ND, NAS) (Wimalanathan et al., 2018), or it may be due to alterations in one element leading to alterations in other elements (Baxter, Vitek, et al., 2008). There is a bias toward manganese, zinc, and iron, which have two, three, and four times more associated genes, respectively, than the average of 13 ± 12 genes for the other elements. Iron is the only element with genes from all five species in the primary list. In addition to biases toward certain elements, our primary list is also skewed toward ionome genes identified in studies of above-ground tissues (Figure 3). This is likely due to the difficulties in studying the elemental content of below-ground tissues. All M. truncatula genes come from studies of the nodule in this model legume species.
Querying the manually curated PANTHER GO-slim biological process database (PANTHER v14.1, released March 12, 2019) and the GO complete biological process database (GO Ontology database, released October 08, 2019) with the A. thaliana KIG genes returned significantly (FDR < 0.05) overrepresented annotation terms related to the transport, response, and homeostasis of iron, zinc, copper, and manganese ions. Additionally, the GO complete results had terms for cadmium, nickel, cobalt, sulfur, arsenic, lead, selenium, boron, magnesium, phosphorus, sodium, potassium, and calcium, covering most of the elements in the KIG list (Figure 4).
Even though some genes fell into the "other transport" category, with annotations for glycoside, glucose, oligopeptide, or phloem transport, the publications that led to their inclusion in our primary list show that their mutant alleles altered elemental accumulation.
AtABCC1 is annotated as encoding a glycoside transporter protein, but Park et al. (2012) found that overexpression of AtABCC1 increased cadmium concentrations in shoot tissue. The YSL genes and OPT3 are annotated as encoding oligopeptide transporters, but more specifically they encode predicted phloem-localized metal-nicotianamine complex transporters and iron/cadmium transporters, respectively (Waters et al., 2006; Zhai et al., 2014). Last, NRT1.5/NPF7.3 is also annotated as encoding an oligopeptide transporter, but Li et al. (2017) identified it as a xylem-loading potassium ion antiporter.
The PANTHER GO-slim molecular function annotation database showed a significant overrepresentation of iron and potassium cation transmembrane transporter activity in the A. thaliana genes. The results using the GO complete molecular function database supported this and additionally included terms for arsenic, cadmium, zinc, boron, manganese, phosphate, sulfur, and magnesium ion transmembrane transporter activity. The GO complete molecular function database also returned overrepresented terms for metal ion-binding and cyclic nucleotide-binding annotations. The genes with cyclic nucleotide-binding annotations were, more specifically, cyclic nucleotide-gated ion channel genes (Gobert et al., 2006). The PANTHER GO-slim cellular component and GO complete cellular component annotation databases both returned significant overrepresentation for vacuoles and the plasma membrane, both known to be critical for elemental movement and storage (Barkla & Pantoja, 1996

| DISCUSSION
Here we have produced a curated list of genes known to alter the elemental composition of plant tissues. We envision several possible uses for this list: Most entries on this list are derived from model organisms, suggesting that most of our knowledge about genes that affect elemental accumulation comes from these species. A. thaliana and M. truncatula account for 64% of the primary genes list, which is probably a lower bound for the influence of knowledge generated in model organisms. Several of the genes in crop plants were found because they are orthologs of genes in the model organisms (Xu et al., 2017), and on closer inspection of the 50 papers identifying primary genes in rice, 38 cited a gene in Arabidopsis (not necessarily the direct ortholog) as the reason the gene was investigated. The higher quality of the GO terms in Arabidopsis, when compared to other species, is another reflection of this disparity of knowledge and a significant hindrance when trying to clone genes in other organisms.

| Call for more submissions
While we have done our best to ensure that the current list is useful and thorough, it is possible we are still missing genes. We ask readers who know of genes that we are missing to contribute by submitting them here: https://docs.google.com/forms/d/e/1FAIpQLSdmS_zeOlxTOLmq2wB45BuSQml1LMKtKnWSatmFRGR2Q1o0Ew/viewform?c=0&w=1 or by emailing the corresponding author. KIG lists v1.0 for each of the species can be viewed in Table S1, and future updates to the list can be found at https://docs.google.com/spreadsheets/d/1XI2l1vtVJiHrlXLeOS5yTQQnLYq7BOHpmjuC-kUejUU/edit?usp=sharing.

ACKNOWLEDGMENTS
The authors thank the editors and reviewers for their consideration and comments.