KnetMiner: a comprehensive approach for supporting evidence‐based gene discovery and complex trait analysis across species

Abstract The generation of new ideas and scientific hypotheses is often the result of extensive literature and database searches. With the growing wealth of public and private knowledge, however, the process of searching diverse and interconnected data to generate new insights into genes, gene networks, traits and diseases is becoming both more complex and more time‐consuming. To guide this technically challenging data integration task and to make gene discovery and hypothesis generation easier for researchers, we have developed a comprehensive software package called KnetMiner, which is open‐source and containerized for easy use. KnetMiner is an integrated, intelligent, interactive gene and gene network discovery platform that helps scientists explore and understand the biological stories of complex traits and diseases across species. It features fast algorithms for generating rich interactive gene networks and prioritizing candidate genes based on knowledge mining approaches. KnetMiner is used in many plant science institutions and has been adopted by several plant breeding organizations to accelerate gene discovery. The software is generic and customizable and can therefore be readily applied to new species and data types; for example, it has been applied to pest insects and fungal pathogens, and was most recently repurposed to support COVID‐19 research. Here, we give an overview of the main approaches behind KnetMiner and we report plant‐centric case studies for identifying genes, gene networks and trait relationships in Triticum aestivum (bread wheat), as well as an evidence‐based approach to rank candidate genes under a large Arabidopsis thaliana QTL. KnetMiner is available at: https://knetminer.org.


Searching with keyword, gene list and/or genome regions
You can search KnetMiner with keywords, a gene list or genome regions (or any combination of these). KnetMiner returns different types of responses depending on the inputs given. A keyword search covers the whole genome, whereas the other two search modes are restricted to the specified gene list or genomic regions, respectively.
The gene list search allows users to enter a list of gene names or accessions, with one entry per line and a maximum of 100 gene IDs. The names or accession IDs must exactly match the gene names or IDs stored in the knowledge network; partial matches are not supported. Tools like the Ensembl ID converter can be used to convert old gene IDs to those supported by KnetMiner.
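The input rules above (one entry per line, at most 100 entries, exact matching) can be illustrated with a minimal Python sketch. This is not KnetMiner code: the function name, the `known_ids` set and the gene IDs are hypothetical, and treating matches as case-sensitive is an assumption.

```python
MAX_GENES = 100  # limit stated in the text above


def parse_gene_list(text, known_ids):
    """Split a pasted gene list into matched and unmatched entries.

    `known_ids` stands in for the gene names/accessions stored in the
    knowledge network (hypothetical; not a KnetMiner API).
    """
    # One entry per line; blank lines and surrounding whitespace are ignored.
    entries = [line.strip() for line in text.splitlines() if line.strip()]
    if len(entries) > MAX_GENES:
        raise ValueError(f"{len(entries)} entries given; the limit is {MAX_GENES}")
    # Exact matching only (assumed case-sensitive): a prefix such as
    # "GENE00" does not match "GENE0001".
    matched = [entry for entry in entries if entry in known_ids]
    unmatched = [entry for entry in entries if entry not in known_ids]
    return matched, unmatched


# Made-up identifiers, for illustration only.
known_ids = {"GENE0001", "GENE0002"}
matched, unmatched = parse_gene_list("GENE0001\n\nGENE00\n", known_ids)
```

Reporting the unmatched entries separately makes it easy to spot IDs that need converting before the search is repeated.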
The genome region mode restricts the search to genes that fall within the specified region. Entering the start and end position of a region will display the number of genes within those boundaries. Since KnetMiner v3.0, entering search keywords is no longer mandatory. If you have a list of genes and no clue about what they do, paste your gene IDs or names into the Gene List box (without any keyword) and let KnetMiner provide a summary of all the information it has for your genes, their location, and enriched linked terms. You can then view their individual knowledge networks. A search of this kind does not rank genes; initially, only paths from gene to trait and phenotype nodes are shown. However, if you combine your gene list with keywords, KnetMiner can rank your gene list by relevance and highlight the most interesting paths of the knowledge network.
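The genome region restriction described at the start of the paragraph above can be sketched as a simple coordinate filter. The `Gene` record, the function name and the coordinates are hypothetical, and the sketch assumes that a gene "falls within" a region only when both its start and end lie inside the boundaries; KnetMiner may treat overlapping genes differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chromosome: str
    start: int
    end: int


def genes_in_region(genes, chromosome, start, end):
    """Return genes lying entirely within [start, end] on `chromosome`."""
    if start > end:
        raise ValueError("region start must not exceed region end")
    return [
        gene for gene in genes
        if gene.chromosome == chromosome and gene.start >= start and gene.end <= end
    ]


# Made-up genes and coordinates, for illustration only.
genes = [
    Gene("GENE0001", "3A", 1_000, 2_500),
    Gene("GENE0002", "3A", 9_000, 12_000),  # extends past the region end
    Gene("GENE0003", "3B", 1_200, 1_900),   # different chromosome
]
hits = genes_in_region(genes, "3A", 500, 10_000)
gene_count = len(hits)  # the number of genes reported for the region
```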

KnetMiner results views
The result of a search is a list of candidate genes along with supporting evidence. KnetMiner provides different views that help to explore the search results and drill down into interesting candidate gene networks.

Gene View
The Gene View displays the identified candidate genes in a table sorted by KnetScore. The node types (GO, TO, phenotype, pathway, gene, publication, etc.) that match the search terms are summarised in the legend (below the dropdown for the maximum number of genes). The legend is interactive and can be used as a filter: clicking on one or more concepts (shapes) in the legend filters the table to genes with matching concepts in the EVIDENCE column, e.g. genes with pathway AND phenotype information. The concepts in the EVIDENCE column are expandable and provide a short description of the evidence. If the evidence is a publication, the PubMed ID is shown and linked to PubMed.
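The sorting and the AND-style legend filter described above can be sketched as follows. The row structure, function names, scores and evidence entries are all hypothetical and only illustrate the behaviour; they do not reflect how KnetMiner stores or computes its results.

```python
def sort_by_score(rows):
    """Order Gene View rows from highest to lowest score."""
    return sorted(rows, key=lambda row: row["score"], reverse=True)


def filter_by_evidence(rows, selected_types):
    """Keep rows that have evidence for every selected concept type (AND)."""
    selected = set(selected_types)
    return [row for row in rows if selected <= set(row["evidence"])]


# Made-up rows: gene IDs, scores and evidence descriptions are illustrative.
rows = [
    {"gene": "GENE0001", "score": 3.2,
     "evidence": {"pathway": ["example pathway"]}},
    {"gene": "GENE0002", "score": 7.5,
     "evidence": {"pathway": ["example pathway"],
                  "phenotype": ["example phenotype"]}},
    {"gene": "GENE0003", "score": 5.1,
     "evidence": {"publication": ["example PubMed record"]}},
]

# Selecting the pathway and phenotype shapes keeps only genes with both.
table = filter_by_evidence(sort_by_score(rows), ["pathway", "phenotype"])
```

With no concept selected, the filter keeps every row, which mirrors an unfiltered table.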
Genes supplied by a user that are associated with the search terms are referred to as known targets, whereas user genes that are not associated with any search term (no evidence) are referred to as novel targets. A checkbox at the top of the Gene View table allows a user to select all known targets or all novel targets at once. Clicking on a single gene, or clicking the 'View Network' button below for a selection of genes, opens the Network View.
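The distinction between known and novel targets amounts to partitioning the user's genes by whether any evidence links them to the search terms. A minimal sketch, with a hypothetical function name and made-up data:

```python
def split_targets(user_genes, evidence_by_gene):
    """Partition user genes into known targets (some evidence linked to
    the search terms) and novel targets (no evidence)."""
    known = [gene for gene in user_genes if evidence_by_gene.get(gene)]
    novel = [gene for gene in user_genes if not evidence_by_gene.get(gene)]
    return known, novel


# Made-up gene IDs and evidence, for illustration only.
user_genes = ["GENE0001", "GENE0002", "GENE0003"]
evidence_by_gene = {
    "GENE0001": ["example trait"],
    "GENE0002": [],  # present in the results but without evidence
}
known, novel = split_targets(user_genes, evidence_by_gene)
```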

Map View
The Map View is a chromosome-based display. To the right of each chromosome, it shows all the genes that are related to the given search term(s). Colour coding is used to distinguish genes, with green for high scores, orange for medium, and red for low. SNPs are shown to the left as highlighted squares, colour coded according to the study shown in the SNP legend, relating to evidence found for the SNP. The user can also specify a region search so that only genes and SNPs within this region are shown. The view can be exported as a PNG, and the user can zoom in and out and move across the display. The p-value (how significant the association of the gene is with the search term) can be altered via the settings option (cog icon). Right clicking a SNP provides further information in a pop-up box, where the selected SNP/QTL can be hidden or displayed, as shown below. Genes can be selected in the Map View and opened in the Network View by clicking the network icon at the top (far left).

Evidence View
The Evidence View provides a table-based view of the node types (concepts) linked to the search results. The results are sorted by query-relevance score. The number of genes linked to each concept in the knowledge network is also displayed (in the 'GENEs' column). Clicking this value opens the Network View, which shows the selected concepts in the centre of the network together with the shortest paths that connect the evidence documents to the linked genes. Clicking a node (concept) icon in the interactive legend above the table filters the table by evidence type.
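To illustrate what a shortest path between an evidence document and a linked gene means, the sketch below runs a breadth-first search over a tiny, made-up knowledge graph. It is a generic textbook technique used here for illustration; the graph, node names and function are hypothetical and are not taken from KnetMiner's implementation.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Breadth-first search over an adjacency dict.

    Returns the list of nodes on a shortest path, or None if the target
    cannot be reached from the source.
    """
    if source == target:
        return [source]
    previous = {source: None}  # node -> node it was reached from
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour in previous:
                continue  # already visited
            previous[neighbour] = node
            if neighbour == target:
                # Walk back from the target to the source.
                path = [target]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbour)
    return None


# Made-up undirected graph: one publication linked to two genes
# through a protein and a trait.
graph = {
    "pub1": ["protein1", "trait1"],
    "protein1": ["pub1", "gene1"],
    "trait1": ["pub1", "gene2"],
    "gene1": ["protein1"],
    "gene2": ["trait1"],
}
path = shortest_path(graph, "pub1", "gene2")
```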

Network View
The Network View displays the knowledge networks of one or more genes selected from the previous views. The entry gene is displayed as a blue triangle with a double border. Each path from the entry gene to another node represents a relationship; initially, only the relationships most relevant to the search term are shown. The maximize button (far left) in the top menu renders the network in a maximised viewport. Clicking the binoculars shows the whole network, although this can slow the application down when too many concepts are loaded. To view information about a concept (node) or a relationship (edge), press and hold the right mouse button on the node or edge and a wheel (context menu) will appear. Click 'show info' to see an information table on the right of the network viewport. You can close this by clicking the 'X' button in the info-box. To reset the orientation (zoom) of the graph, click the reset button. The info button next to the 'CoSE layout' drop-down menu can also be used to show the info box (you must then click a concept to show information for it).
You can move concepts and edges to make them easier to view. If you have signed up, you can also save the Knetwork to KnetSpace by pressing the save Knetwork button (you will be prompted to log in if you have not already done so). KnetSpace enables you to store, edit, and collaborate on your Knetworks. To sign in, click the 'Sign in' button at the top right of the header.
You can also hide concepts, show their labels, or hide all concepts of a specific type using the same wheel. Alternatively, use the interactive legend, where double clicking a concept removes it and single clicking adds it. On a touchscreen device, gently flick up on the legend concept to remove the concept and tap to add it. The numbers of concepts and relationships currently visible are shown below the interactive legend, with the total numbers shown next to them in brackets; these counts update as concepts are added or removed.

KnetMiner Plant Use Case
This application case shows the utility of KnetMiner for the functional analysis of a transcriptomics (RNA-Seq) experiment in bread wheat (Triticum aestivum). Wheat is the third most-grown cereal crop in the world after maize and rice and has a hexaploid genome 5 times the size of the human genome.
The red colour of the grain is due to the presence of coloured compounds, called flavonoids, in the seed coat (bran). These flavonoids give wholemeal bread not only its colour, but also a slightly bitter taste which is disliked by many people. White-grained wheat varieties lack the red compounds of the seed coat and are milder in flavour. However, white grains are prone to pre-harvest sprouting (PHS), which causes the grain to germinate before harvest and results in a loss of grain quality. It has been known for some time that PHS is associated with grain colour and that the red pigmentation of wheat grain is controlled by R genes on the long arms of chromosomes 3A, 3B, and 3D. In the last decade, the genetic basis of the relationship between grain colour and PHS has been studied, and molecular characterisation showed that the R gene is a Myb-type transcription factor responsible for transcriptional activation of genes (CHS, CHI, F3H and DFR) in the flavonoid biosynthesis pathway. However, the link between the R (Myb) gene and PHS is still unclear.
Here we demonstrate the utility of KnetMiner for analysing candidate genes from reverse genetics or transcriptomics studies and answering questions such as:
1. Do any of these genes contribute to the expression of trait A (e.g. grain colour)?
2. Do any of these genes contribute to the expression of trait B (e.g. PHS trait)?
3. Which biological processes and pathways are underlying these traits?
4. Are there common genes or mechanisms that influence both traits?
5. Which other processes and traits will be affected by loss-of-function mutants?
Exercise 1 - Choosing the right search terms
Seed dormancy and germination are the underlying developmental processes that activate or prevent pre-harvest sprouting in many grains and other seeds. The user can enter this knowledge as a list of keywords in the search box. The Query Suggester provides alternative synonyms or more specific keywords. It also highlights key concept types that match the keywords.
Task: Type the keyword dormancy into the search box. Try to replace it with a more specific keyword.
A: The Query Suggester shows that the keyword dormancy matches Gene Ontology (GO), Trait Ontology (TO), Plant Ontology (PO), gene, and protein concepts from the wheat knowledge network. The TO concepts include terms specific to seed or bud dormancy. It can be assumed that the processes involved in grain dormancy are similar to those involved in seed dormancy but different from those involved in bud dormancy. Clicking replace changes the keyword to "seed dormancy", making it more precise.
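The idea of grouping keyword matches by concept type can be illustrated with a small sketch. It assumes simple case-insensitive substring matching over a hand-written list of concept names; the function name and the concept list are hypothetical, and the actual Query Suggester may work quite differently.

```python
def suggest_terms(keyword, concepts):
    """Group concept names containing `keyword` by concept type.

    `concepts` is a list of (concept_type, name) pairs. Matching is a
    case-insensitive substring test, which is an assumption made for
    this illustration.
    """
    needle = keyword.lower()
    suggestions = {}
    for concept_type, name in concepts:
        if needle in name.lower():
            suggestions.setdefault(concept_type, []).append(name)
    return suggestions


# Illustrative concept names; not an extract of the wheat knowledge network.
concepts = [
    ("TO", "seed dormancy"),
    ("TO", "bud dormancy"),
    ("GO", "seed dormancy process"),
    ("GO", "seed germination"),
]
suggestions = suggest_terms("dormancy", concepts)
```

Listing matches by type makes it easier to see that a broad keyword such as dormancy covers both seed and bud terms, and to pick the more specific one.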