Ppomicsdb: A Multi‐Omics Database for Genetic and Molecular Breeding Applications in Pigeonpea

Cajanus cajan, commonly known as arhar or tur in India, is a highly valued plant species belonging to the Fabaceae family. Pigeonpea is a drought‐tolerant legume crop grown in the tropics and subtropics and is a rich source of protein, carbohydrates, fiber, and minerals. It is regarded as “meat for vegetarian people” and helps address malnutrition globally. Despite its nutritional and economic importance, the lack of comprehensive knowledge about its genomic resources limits its effective use in molecular breeding programs and biotechnological interventions. Several genomic repositories for pigeonpea are available; however, there is no cohesive, integrated multi‐omics database for C. cajan. Here, we present the first comprehensive pigeonpea omics database, named the Ppomics database (db) and available at https://ppomics.multiwebx.com/, which provides up‐to‐date multi‐omics information covering phenomics (both qualitative and quantitative), genomics, transcriptomics, and proteomics data. Ppomics db is an integrated multi‐omics platform for discovering important regulators of qualitative and quantitative traits in pigeonpea, which can be used to develop superior varieties. Ppomics db has been made available to researchers so that they can retrieve the related omics information and perform multi‐omics data analysis.


| Introduction
Global eradication of hunger and malnutrition is the second Sustainable Development Goal (SDG-2) of the United Nations. The Covid-19 pandemic, which has had a devastating impact on agriculture, has made the goal even more difficult to achieve. By 2030, there may be more than 840 million hungry people worldwide, with the majority (above 381 million) coming from the Asian region (https://www.un.org/sustainabledevelopment/hunger/) (Bankar et al. 2015). However, in addition to food security, nutrition must also be prioritized in order to meet SDG-2 requirements and commitments (N. Singh, Rai, and Singh 2020; Aman and Masood 2020). In this context, legumes (pulses) are key players in the fight against malnutrition because they are an essential component of the human diet (Zhang et al. 2017). Furthermore, legumes can improve and maintain soil health through symbiotic nitrogen fixation (http://www.fao.org/resources/infographics/) (Barman et al. 2018; Kumar et al. 2022).
Among legumes, pigeonpea (Cajanus cajan (L.) Millsp.) (Sanskrit: Adhaki, Hindi: Arhar, English: pigeonpea, Bengali: Tur) is an important source of protein (21%-28%), minerals, and vitamins and plays a vital role in securing the supply of food and nutrition in developing countries (Gomezulu and Mongi 2022; Varshney et al. 2012). Pigeonpea was domesticated from its wild progenitor Cajanus cajanifolius in central India over 3500 years ago (N. K. Singh et al. 2012). It is a diploid plant (2n = 2x = 22) with an estimated genome size of ~852 Mb (Dutta, Kumawat, and Singh 2011; Srinivasarao et al. 2020). The pigeonpea genome has been available for a decade and provides a foundation for analyzing the assembled genome sequence of "Asha" (Dutta, Kumawat, and Singh 2011) and for developing genic simple sequence repeat (SSR) markers by transcriptome sequencing of Asha and UPAS 120 (Srinivasarao et al. 2020). Importantly, pigeonpea is a highly drought-tolerant crop that can also withstand poor soils and other abiotic stresses. Beyond its nutritional value, it has several uses, including food, feed, fuel, soil enrichment or soil binding, fencing, roofing, and basket making, as well as pharmaceutical applications (Fatokimi and Tanimonure 2021; A. Singh et al. 2017). To realize its potential and understand its genetic architecture, an integrated multi-omics approach including phenomic (qualitative and quantitative), genomic, transcriptomic, and proteomic data is essential for exploring the biochemical, physiological, and molecular interactions underlying seed and nutritional composition in pigeonpea.
A few databases have been developed for pigeonpea. PpTFDB (http://14.139.229.199/PpTFDB/Home.aspx) is a simple web interface that provides public-domain data on pigeonpea transcription factors (TFs) (Marla, Mishra, and Maurya 2020). The International Initiative for Pigeonpea Genomics (IIPG) provides genomic information as well as SSR, single nucleotide polymorphism (SNP), and quantitative trait locus (QTL) data for pigeonpea (https://www.pulsedb.org/analysis/136). PulseDB provides extensive information on various aspects of legumes and serves as a central resource for genome-assisted breeding (GAB), while PIPEMicroDB (Pigeonpea Microsatellite Database) allows searches for a specific number of markers at a specific position on a chromosome (http://webapp.cabgrid.res.in/pigeonpea/).
It is crucial to understand the interrelationships among traits, using correlation and path coefficients for yield-contributing characters in a single genotype (Pandey, Kumar, and Pandey 2015). High-throughput omics technologies have been widely used to investigate plant responses and to develop better pigeonpea varieties that possess several agronomically important quality traits (Rathinam, Mishra, and Vasudevan 2019; Tai, Martin, and Heald 2014). Therefore, to develop nutrient-dense varieties with added value, we must first understand the genetic architecture of qualitative and quantitative pigeonpea traits. It is thus important to collect, conserve, and characterize the germplasm. To share and use this valuable genetic resource effectively, a database consolidating the available knowledge on pigeonpea is essential.
In this study, we developed the first comprehensive database of its kind, named the Pigeonpea Omics Database (Ppomics database). The database offers information on a wide range of phenotypic characters that could be used to identify novel candidate genes for improving the pigeonpea crop. The mini-core (MC) collection and varieties of C. cajan collected from various places provide a rich library of phenotypic data. The database aims to use omics data for pigeonpea improvement, drawing on high-throughput genotyping technologies such as Illumina BeadXpress, genotyping by sequencing (GBS), the 50K Axiom Cajanus SNP Array, and the 62K SNP chip array "CcSNPnks". The literature was searched for genome assembly, trait-associated genes, genotyping data, QTLs, and gene function. The database contains transcriptomics data that have been manually curated from the literature, as well as peer-reviewed articles relevant to omics research. To the best of our knowledge, the pigeonpea reference set in the Ppomics database, the inferences drawn from its analysis, and the agronomically relevant traits collectively provide valuable resources to accelerate genetic gains in pigeonpea crop improvement programs for the benefit of smallholder farmers in the developing world who grow this multipurpose food security crop (Figure 1).

| Phenomics and Genomics
For phenotypic data retrieval, a set of 250 pigeonpea genotypes was used, of which 73 are MC accessions and 177 are varieties. These genotypes were collected from the National Institute of Plant Biotechnology (NIPB), Indian Agricultural Research Institute (ICAR), New Delhi. The data were collected over two consecutive seasons (2019-2020 and 2020-2021) and again in 2022-2023. In total, 26 traits were recorded, of which 12 are qualitative and 14 quantitative. Data on each trait were collected from plants chosen at random from each plot. Table S1 describes the qualitative and quantitative traits and their attributes. We also provide a brief overview of the genomic resources available for pigeonpea genetic enhancement. To integrate pigeonpea genomic data, we created sub-tabs for trait-associated genes; QTLs with functional annotation; genotyping data obtained with the 62K genic SNP chip of pigeonpea; the genotypes used for construction and validation of the chip, with their original source; and DQC values, call rates, and heterozygosity from the 62K SNP chip. Genotyping information on 30,426 SNPs with 100% call rates in 95 pigeonpea genotypes and on 20 trait-associated genes was mined from existing research. To produce comprehensive information on the responsive candidate genes, functional annotation was studied at the gene and QTL levels. The pigeonpea genome has been made available in the genomic tab. All of this genomic information was derived from a literature survey.
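The call rate and heterozygosity values mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical encoding in which each genotype call is a two-letter string such as "AA", "AB", or "BB", with None standing for a no-call; the function names and the toy SNP identifiers are illustrative only and are not taken from the Ppomics database or from the SNP chip software.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical encoding: "AA", "AB", "BB" for called genotypes, None for a no-call.
Call = Optional[str]


def call_rate(calls: Sequence[Call]) -> float:
    """Return the fraction of calls that are not missing."""
    if not calls:
        raise ValueError("no calls supplied")
    return sum(c is not None for c in calls) / len(calls)


def heterozygosity(calls: Sequence[Call]) -> float:
    """Return the fraction of heterozygous calls among the non-missing calls."""
    called = [c for c in calls if c is not None]
    if not called:
        raise ValueError("all calls are missing")
    return sum(c[0] != c[1] for c in called) / len(called)


def fully_called_snps(matrix: Dict[str, Sequence[Call]]) -> List[str]:
    """Return the SNP identifiers whose call rate is exactly 100%."""
    return [snp for snp, calls in matrix.items() if call_rate(calls) == 1.0]


if __name__ == "__main__":
    # Toy data (not real pigeonpea genotypes): three SNPs scored on four samples.
    toy = {
        "snp_1": ["AA", "AB", "BB", "AA"],
        "snp_2": ["AA", None, "BB", "AB"],
        "snp_3": ["BB", "BB", "AB", "AB"],
    }
    for snp, calls in toy.items():
        print(snp, call_rate(calls), heterozygosity(calls))
    print(fully_called_snps(toy))
```

The same two functions can be applied either to one SNP across samples or to one sample across SNPs, depending on which axis of the genotype matrix is passed in.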

| Proteomics and Transcriptomics
The proteome tab is arranged into three sub-tabs: total protein content, abiotic stress-related genes, and biotic stress-related genes. From the literature, we added protein structures, with their gene IDs, that are associated with biotic and abiotic stress. For the transcriptomics data, transcript IDs and their functions were retrieved through a literature survey.

| Publication, Gallery, and Video
We chose peer-reviewed publications that focus mainly on multi-omics technologies and advanced crop enhancement methods. We gathered a broad panel of images of plant structure, as well as the various patterns of flower color, seed color, and leaf form. The gallery page displays the structural variation of the pigeonpea plant, along with a video showing the development of the plant from germination to its tall, broad mature form.

| Architecture and Database Design
The pigeonpea omics database (Ppomics database) was constructed using a three-tier architecture comprising a client tier, a server tier, and a database tier. The database interface was developed in PHP version 7+ (https://www.php.net/), a server-side web programming language, on the Windows 11 platform. A Linux shared hosting server serves as the database server. To build the database, we used a theme for WordPress, a free and open-source content management system written in PHP that works with a MySQL or MariaDB database and supports HTTPS (Figure 2). The database is accessible online through Chrome, Internet Explorer, and Microsoft Edge. All illustrations and figures were prepared using Microsoft Office tools or Adobe Photoshop 2022.
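As an illustration of how the database tier of such an architecture might hold phenotypic records in relational form, the following minimal sketch uses Python's built-in sqlite3 module as a stand-in for MySQL. The table names, column names, and sample rows are assumptions made for this example; they are not the actual Ppomics schema or data.

```python
import sqlite3
from typing import List, Tuple


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical three-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE genotype (
            id       INTEGER PRIMARY KEY,
            name     TEXT NOT NULL UNIQUE,
            category TEXT NOT NULL CHECK (category IN ('mini-core', 'variety'))
        );
        CREATE TABLE trait (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE,
            kind TEXT NOT NULL CHECK (kind IN ('qualitative', 'quantitative'))
        );
        CREATE TABLE observation (
            genotype_id INTEGER NOT NULL REFERENCES genotype(id),
            trait_id    INTEGER NOT NULL REFERENCES trait(id),
            value       TEXT NOT NULL,
            PRIMARY KEY (genotype_id, trait_id)
        );
        """
    )
    # Placeholder rows for illustration only; these are not real records.
    conn.executemany(
        "INSERT INTO genotype (id, name, category) VALUES (?, ?, ?)",
        [(1, "demo_genotype_A", "mini-core"), (2, "demo_genotype_B", "variety")],
    )
    conn.executemany(
        "INSERT INTO trait (id, name, kind) VALUES (?, ?, ?)",
        [(1, "seed shape", "qualitative"), (2, "plant height", "quantitative")],
    )
    conn.executemany(
        "INSERT INTO observation (genotype_id, trait_id, value) VALUES (?, ?, ?)",
        [(1, 1, "oval"), (2, 1, "globular"), (1, 2, "150")],
    )
    conn.commit()
    return conn


def values_for_trait(conn: sqlite3.Connection, trait_name: str) -> List[Tuple[str, str]]:
    """Return (genotype name, value) pairs for one trait, ordered by genotype name."""
    return conn.execute(
        """
        SELECT g.name, o.value
        FROM observation AS o
        JOIN genotype AS g ON g.id = o.genotype_id
        JOIN trait AS t ON t.id = o.trait_id
        WHERE t.name = ?
        ORDER BY g.name
        """,
        (trait_name,),
    ).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    print(values_for_trait(connection, "seed shape"))
```

Keeping genotypes, traits, and observations in separate tables means a custom search by trait or by genotype reduces to a single parameterized join, which is the kind of retrieval a relational backend is suited to.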
The Ppomics database comprises more than 250 C. cajan germplasm accessions across the respective omics platforms. It provides information on the genes and QTLs responsible for numerous phenotypes across genotypes, as well as important genes associated with abiotic and biotic stress. Moreover, the database will assist researchers and geneticists in learning about morphological information and in improving the production, quality, and disease resistance of pigeonpea.

| User Interface
The architecture of this online database is three-tiered. The data are stored in the web-enabled "Ppomics database" at Gujarat Biotechnology University, Gandhinagar, Gujarat, India. The database is divided into eight tabs that offer information about pigeonpea. The "Phenomics" tab offers qualitative and quantitative information on seeds and plants, such as geographical distribution and specific trait values. The "Genomic" tab provides genomic data, divided into seven sub-tabs. The "62K SNP chip (CcSNPnks)" sub-tab is further divided into five categories. The "Transcriptomics" tab contains data on transcript IDs in various C. cajan genotypes. The "Proteomics" tab provides details on the structure and function of genes responsible for biotic and abiotic stresses. All relevant research articles and references about omics and high-throughput technologies can be found in the "Publication" tab.

| Results and Discussion
The Ppomics database is a multi-omics database that comprises eight portals, including the Phenomics, Genomics, Transcriptomics, and Proteomics datasets (Figure 3). With the exception of phenomics, where we recorded every trait using the quantifiable methods given in the pigeonpea descriptor by Sameer et al. (2017), all the information was manually collected from the literature. These datasets provide abundant and convenient tools for users to browse phenotypic data, genome sequence, and protein structure. High-throughput experimental techniques and the exponential growth of omics data are pertinent to studies of pigeonpea genomics (Hussain et al. 2016; Krishnan et al. 2017; K. B. Saxena and Sawargaonkar 2015). Accordingly, various databases have been constructed that use cutting-edge bioinformatics analysis tools for effective data management, storage, retrieval, integration, and analysis, in order to identify the genetic basis of agronomically important traits and to overcome the malnutrition and climatic challenges that the world is facing (Brozynska, Furtado, and Henry 2016; Varshney, Penmetsa, and Dutta 2010).
Moreover, the Ppomics database will provide knowledge and insight toward developing better pigeonpea varieties that possess several agronomically important quality traits and exhibit high yield potential under adverse climatic conditions (Henry and Nevo 2014). Phenomics databases remain sparsely developed. By comparison, the Pulse Crop Database (PCD), which offers pertinent genomics, genetics, and breeding information and analysis, is a powerful tool for linking genes to features of interest in order to maximize the efficiency of plant breeding and research (Tai, Martin, and Heald 2014). In contrast, the Ppomics database is an online resource that combines manually collected phenome data with literature-curated information on the seeds and plants of the treasured legume C. cajan. Precise phenotyping of the germplasm is essential for mapping the genes, QTLs, and/or alleles responsible for a trait of interest. Additionally, we ensured that standard terminology was used and that the data were evaluated and processed consistently, resulting in a tool that is simple to understand and can be used to evaluate the information in the database.

| Multi-Omics Centric Portal and Data Access
Each omics portal offers data for high-throughput analysis. Integration of multi-omics information provides great opportunities for mapping candidate genes in loci associated with important traits and for interpreting complex relationships across multiple genes and traits (Mishra et al. 2017; Varshney, Penmetsa, and Dutta 2010). To make the Ppomics database easy to use, the portals were organized to follow the flow of genetic information within a biological system. The Phenomics portal provides information on quantitative and qualitative variation that can be used for genome-wide association studies (GWAS) and for phenotypic statistical analysis among varieties. In the Genomics portal, sequences, trait-associated genes, and QTLs were aligned and annotated. Users can browse the genome alignment, and the SNP chip data can be used to develop high-throughput genotyping assays for genetic studies and molecular breeding applications (Liu et al. 2022; N. K. Singh et al. 2012). Most importantly, these SNP data cover AGCP, CSCSP, MCP, SCP, and DRDRP genes responsible for particular variation, which can be used for association, gene expression, and structural variation analyses (Figure 4). In addition, the Transcriptomics portal includes various transcripts encoding signaling and transporter proteins, which indicate the specific molecular functions involved in resisting different biotic and abiotic stresses. In-depth analysis of these transcripts would give a better understanding of gene expression, stress response, signaling, and the function of secondary metabolites in pigeonpea (R. K. Saxena, Singh, and Kale 2017). Furthermore, the Proteomics dataset provides the structure and function of proteins responsible for biotic and abiotic stress.
The information available in the web resource can be accessed by selecting any omics portal (for instance, Qualitative trait --> Seed shape). After clicking, a page opens presenting a table of details for the phenotypic trait, including origin, genotype name, and trait-related information. Similarly, the Genomics tab allows users to access SNP chip data and trait- and QTL-related information, including genotype name, associated traits, and function, with references. In addition, users can quickly look up genes related to agronomy, conserved single-copy genes, and disease resistance genes in pigeonpea with a single click. Each gene has a corresponding sequence, SNP ID, and Affymetrix probe ID. Abiotic and biotic stress-associated genes with their structure and function, as well as transcript IDs of C. cajan with functional annotation, are available to users in the transcriptomic and proteomic portals.
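Once trait values have been retrieved from a portal, simple phenotypic statistics of the kind referred to above can be computed. The following is a minimal sketch, assuming the values for two quantitative traits have already been copied into Python lists; the function names and the numbers in the demo are illustrative placeholders, not records from the Ppomics database.

```python
import math
from statistics import mean, stdev
from typing import Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Return the sample coefficient of variation as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    return stdev(values) / m * 100.0


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient between two traits."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least two values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder values for two quantitative traits measured on four genotypes.
    pod_length = [4.0, 5.0, 6.0, 7.0]
    seeds_per_pod = [3.0, 4.0, 4.0, 5.0]
    print(coefficient_of_variation(pod_length))
    print(pearson_r(pod_length, seeds_per_pod))
```

The coefficient of variation gives a unit-free measure of how variable a trait is across genotypes, while the correlation coefficient is a first step toward the correlation and path analyses commonly used for yield-contributing traits.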

| Statistics of Number of Entries
The Ppomics database is an intuitive, freely available database (https://ppomics.multiwebx.com/) that provides phenotypic data on a total of 240 genotypes for seed size, shape, and coat color, as well as 237 genotypes for phenolic content (%) and antioxidant content (%). There are 250 genotypes for flower color, flower color-streaked pattern, pod color, and susceptibility to sterility mosaic virus (SMV) disease. Likewise, there are 250 genotypes for the number of seeds per pod, branch angle, pod length and width, plant height, number of primary branches, leaf length and width, stem width, specific gravity, and milling quality. The genomics section includes 79 QTLs and 72 genes, each responsible for a number of phenotypes in 20 genotypes. For diversity analysis, we provide genotyping data on 30,426 SNPs with 100% call rates in 95 pigeonpea genotypes. The genotyping platform for pigeonpea is the high-density 62K genic-SNP array "CcSNPnks," for which the NIPB SNP ID, Affymetrix probe ID, and SNP flanking DNA sequence are provided. It has 65,534 SNPs that fall into five categories, including 600 multi-copy pigeonpea genes (MCP) and 18,350 single-copy genes (SCP) unique to pigeonpea. It also has 27,548 single-copy genes conserved between soybean and pigeonpea (CSCSP), 30,633 homologs of agronomically important cloned genes (AGCP), and 15,973 disease resistance and defense response genes (DRDRP) (N. Singh, Rai, and Singh 2020). The database also lists the 95 diverse pigeonpea genotypes that were used in the development and validation of the 62K SNP chip array. Significantly, these data could be used for population structure analysis, phylogenetic studies, QTL mapping, and gene haplotype analysis. From this website, users can also find comprehensive information about the pigeonpea genome assembly.
Analysis of transcriptomic data unravels processes such as defense responses to pathogens, metabolism, and responses to biotic and abiotic stresses (Brozynska, Furtado, and Henry 2016; Varshney, Penmetsa, and Dutta 2010). Therefore, we provide 73 transcript IDs for various cellular, molecular, and metabolic pathways from recent studies. Given the significance of pigeonpea as a primary protein source for people in underdeveloped countries, it is essential to integrate genome information with proteomic data. Hence, we provide the structures and functions of proteins encoded by genes responsible for biotic and abiotic stress, comprising 24 accessions for genes associated with abiotic stress and seven for genes associated with biotic stress. In addition to omics information, we provide a publications portal so that users can quickly draw on it for their own studies and applications. The Ppomics database offers not only phenotypic data but also external links to articles on omics research, genomic and transcriptomic data on associated traits or QTLs, and proteomic data on stress-related genes. Although the database covers omics research broadly, it focuses mainly on phenomics data; it is comprehensive and the first of its kind to provide phenotypic and genotypic variation data, which are important for mining candidate variants or genes in pigeonpea. Among the qualitative traits, seed color was the most diverse, followed by flower color. Great variability was found in quantitative traits such as total protein content, phenolic content, and antioxidant content. We also gathered functional and structural information related to biotic and abiotic stress.

| Conclusions
In this study, we mined and integrated phenomics, genomics, transcriptomics, and proteomics data in the Ppomics database using the PHP scripting language on a Linux shared hosting server. The database provides datasets that can help researchers quickly acquire omics information on pigeonpea. The phenotypic data indicated that the collection has significant morphological heterogeneity. When creating mapping populations for QTL analysis, the phenotypic variation recorded in this database can provide important information on the genomic architecture and genetic control of key traits. Ppomics db can provide valuable resources to help with the functional validation and identification of candidate genes in a locus and with selecting the best breeding strategy. The database will be updated regularly as newer versions of the data become available. Furthermore, it will incorporate more data on different traits of pigeonpea, such as gene expression data, including patterns of expression in various cultivars, and genetic variants.

FIGURE 1 | Graphical summary depicting the content and attributes of the Ppomics database.

FIGURE 2 | Entity-relationship (ER) representation of the flow of data from the backend (MySQL) database to the frontend (server page) using WordPress. All of the data tables were kept in a relational format to facilitate custom searches and data retrieval.

FIGURE 3 | Home page of the database, displaying the various tabs that contain brief information on omics studies of pigeonpea.

FIGURE 4 | Example of accessing data from the Ppomics database. (a) Different tabs and search modules for omics data; a search is performed by clicking on the provided options. (b) Selection of a single trait to access the available data. (c) Sub-tabs for quantitative and qualitative traits. (d) Format of the data provided in the search module.