Global gene expression analysis of pigeonpea with male sterility conditioned by A2 cytoplasm

Cytoplasmic male sterility(CMS), a maternally inherited trait, provides a promising means to harness yield gains associated with hybrid vigor. In pigeonpea [Cajanus cajan (L.) Huth], nine types of sterility‐inducing cytoplasm have been reported, of which A2 and A4 have been successfully deployed in hybrid breeding. Unfortunately, molecular mechanism of the CMS trait is poorly understood because of limited research invested. More recently, an association between a mitochondrial gene (nad7) and A4‐CMS has been demonstrated in pigeonpea; however, the mechanism underlying A2‐CMS still remains obscure. The current investigation aimed to analyze the differences in A2‐CMS line (ICPL 88039A) and its isogenic maintainer line (ICPL 88039B) at transcriptome level using next‐generation sequencing. Gene expression profiling uncovered a set of 505 genes that showed altered expression in response to CMS, of which, 412 genes were upregulated while 93 were downregulated in the fertile maintainer line vs. the CMS line. Further, gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and protein–protein interaction (PPI) network analyses revealed association of CMS in pigeonpea with four major pathways: glucose and lipid metabolism, ATP production, pollen development and pollen tube growth, and reactive oxygen species (ROS) scavenging. Patterns of digital gene expression were confirmed by quantitative real‐time polymerase chain reaction (qRT‐PCR) of six candidate genes. This study elucidates candidate genes and metabolic pathways having potential associations with pollen development and male sterility in pigeonpea A2‐CMS. New insights on molecular mechanism of CMS trait in pigeonpea will be helpful to accelerate heterosis utilization for enhancing productivity gains in pigeonpea.

in pigeonpea will be helpful to accelerate heterosis utilization for enhancing productivity gains in pigeonpea.

INTRODUCTION
Male sterility systems that circumvent the need of manual emasculation has given a great impetus to crop improvement in agricultural industry via exploitation of hybrid vigour (Dong et al., 2012). In pigeonpea [Cajanus cajan (L.) Huth], two types of male sterility systems were discovered: genic male sterility (GMS) and cytoplasmic male sterility (CMS). However, the practical applicability of the GMS system was greatly hampered by its inherent drawback of segregation of a large proportion of fertile plants (Bohra et al., 2020). On the other hand, CMS became popular with the increasing availability of stable male sterile lines, a robust fertility restoration system, and a notable (up to 47%) yield advantage over the check varieties (Saxena et al., 2013). Equally important, availability of male sterility in plants also contributes toward ensuring the seed purity that otherwise remains extremely difficult to achieve through manual emasculation (Liu et al., 2013). To date, nine sterility-inducing cytoplasms have been reported in pigeonpea that are designated as A 1 -A 9 , a classification relying upon the cytoplasmic donors (Bohra et al., 2016;Saxena et al., 2010). Of these, two cytoplasms, A 2 and A 4 derived from C. scarabaeoides (L.) Thouars and C. cajanifolius (Haines) Maesen, respectively, have been successfully deployed for development of high-yielding hybrids (Bohra et al., 2020).
Understanding the molecular mechanism of the CMS trait is central to accelerate the progress of hybrid breeding (Mishra & Bohra, 2018). To this end, comparative analysis of four mitochondrial genomes (ICPA 2039, ICPB 2039, ICPH 2433 revealed a set of 31 chimeric open reading frames, of which 13 could be associated with A 4 -CMS (Tuteja et al., 2013). Based on structural variations in mitochondrial genes of CMS and cognate maintainer lines, a 10-bp deletion in the nad7a region was used to develop a diagnostic marker for the detection of A 4 -CMS in pigeonpea (Sinha et al., 2015a). The study demonstrated that this 10-bp deletion is specific to A 4 -CMS and, hence, cannot explain the CMS induced by other cytoplasms. In other words, the molecular mechanism underlying A 2 -CMS remains elusive, and limited progress has been made, to date, with respect to elucidation of crucial bioprocesses that eventually lead to the generation of nonfunctional pollen grains.
With ever-increasing throughput and cost-efficiency, the whole-transcriptome sequencing, or RNA sequencing (RNAseq) has emerged as a promising tool to analyse the expression levels of the genes that participate in regulation of male steril-ity and fertility restoration in different crops . The digital gene expression (DGE) approach is increasingly replacing the conventional assays like micro array to study qualitative and quantitative gene expression (Yang et al., 2014). Researchers have conducted comparative transcriptome analysis of flower buds and anthers in CMS and fertile lines in different crops using next-generation sequencing platforms (Bohra et al., 2016). For instance, differential expression of 365 genes as elucidated by the analysis of the flower bud transcriptomes of the CMS vs. fertile lines with Illumina offered evidence in support of energy deficiency model for CMS induction in soybean [Glycine max (L.) Merr.] . Similarly, DGE profiling of male sterile and fertile lines of garlic (Allium sativum L.) revealed differential expression for >16,000 genes between the CMS line and fertile line. In this study, the pollen abortion could be credited to impaired respiration and programmed cell death (PCD) in tapetum. In rape (Brassica napus L.), the DGE analysis of the self-pollinating progenies of F 1 hybrid revealed altered expression of a number of genes, the majority of which were associated with pollen wall formation and carbon and energy metabolism (Yang et al., 2014).
The objective of this study was to examine differential gene expression associated with A 2 -derived CMS in pigeonpea. We used a pair of alloplasmic isogenic lines to elucidate the molecular mechanism of CMS in pigeonpea. The two lines differ only for the fertility trait, which renders them perfectly suitable for transcriptome analysis. Based on comparative transcriptome analysis of flower buds of ICPL 88039A and its cognate maintainer line, ICPL 88039B, we uncovered a set of genes showing altered expression (between the fertile and sterile lines) and the metabolic pathways that are potentially associated with anther and pollen development and CMS induction. Genome-wide expression patterns are the first to report between isogenic male sterile and fertile lines.

Sample preparation and sequencing
Total RNA was extracted from the flower bud tissue of ICPL 88039A and ICPL 88039B using the TRIzol kit (Invitrogen). The CMS line ICPL 88039A was developed by Bohra et al. (2017) through standard backcross procedure. A popular early pigeonpea variety, ICPL 88039, was crossed as pollen parent with CMS line GT 288A (seed parent). The male sterile F 1 was crossed with the recurrent parent ICPL 88039 to generate BC 1 F 1 plants. The male sterile BC 1 F 1 plant was again crossed with ICPL 88039. The scheme was followed till BC 6 F 1 to obtain a stable CMS line with 99% recovery of the nuclear genome of ICPL 88039. During the backcrosses, the self-pollinated plant of ICPL 88039 (the same that was used for crossing) served as maintainer. The resulting CMS and the corresponding maintainer line were designated as ICPL 88039A and ICPL 88039B, respectively. The CMS line ICPL 88039A and maintainer line ICPL 88039B were isogenic except for anther color, which reflected the formation of nonfunctional or functional pollen. Both lines are semispreading and indeterminate in growth habit with plants having narrowly oblong leaves, yellow flowers, immature pods with green color, and purple streaks and pods devoid of pubescence (Bohra et al., 2017). The pollen fertility assay of CMS and fertile lines are shown in Figure 1. The integrity of total RNA was checked by 1% agarose gel electrophoresis; the concentration and purity were determined using NanoDrop (Thermo Scientific) and Agilent 2100 Bioanalyzer. Three biological replicates per sample were used for sequencing. Pair-end libraries were prepared using Illumina TruSeq stranded mRNA library preparation kit as per its described protocol. Cluster generation and sequencing were performed using Illumina platform following Das and Majumder (2019). The read length of the sequencing was 2 by 150 bp.

Variant calling and identification of DEGs from the RNA-seq data
Sequencing reads in the fastq format were quality checked using FastQC v0.11.8 tool (Andrews, 2010). Then Illumina TruSeq adapters were trimmed and low-quality reads were discarded using Trimmomatic v0.36 by specifying parameters SLIDINGWINDOW:4:15 and minimum read length cutoff of 35 bp (Bolger et al., 2014). The clean reads in FASTQ format were aligned to the reference genome using STAR v2.7.1a aligner (Dobin et al., 2012). The SNPs were called using GATK pipeline and SNPs were subsequently filtered based on minimum base quality (minQ < 30) and minimum depth (minDP < 5) (McKenna et al., 2010). All the identified SNPs were annotated, and unique SNPs in each sample were listed. The SNP density ratio per 1000 bp per gene was calculated and plotted on 11 chromosomes using Circos tool (Krzywinski et al., 2009). The high-quality reads were de novo assembled using Trinity v2.8.4 (Grabherr et al., 2011). Each of the six high-quality fastq paired-read files (three biological replicates of each ICPL 88039A and ICPL 88039B) were aligned to the de novo transcriptome assembly using Bowtie v2.3.5.1 (Langmead & Salzberg, 2012) and RSEM was used to estimate expression values based on the align-

Core Ideas
• RNA-seq was performed on unopened floral buds of isogenic pigeonpea lines. • A total of 505 differentially expressed genes were identified. • GO and pathway-level analysis associated CMS occurrence with four major pathways. • PPI network analysis revealed a set of 15 hub genes. • The study provides a functional genomics resource to understand male sterility trait in plants.
ment (Li & Dewey, 2011). The differentially expressed genes (DEGs) between CMS line and maintainer line were identified using DESeq2 R package (Love et al., 2014). The significant DEGs for further analysis were filtered with adjusted p value < .1 (observed highest p value is .003) and log 2 fold change (LFC) ≥ 1. EnhancedVolcano R package was used to generate volcano plots (Blighe et al., 2019).

Annotation and functional characterization of DEGs
To find the gene annotations, DEGs were handled as queries and BLASTx search was performed against pigeonpea protein sequences with a significant threshold e-value of <0.00001 (Camacho et al., 2009). Gene ontology (GO) annotations for the identified DEGs were taken from the annotations given in the original pigeonpea genome assembly files and LegumeIPv2 database (Li & Dewey, 2011). Further, DEGs were subjected to GO enrichment analysis to find enriched molecular functions, biological process, and cellular component using topGo R package (Alexa & Rahnenfuhrer, 2019) and BiNGO v3.0.3 (Maere et al., 2005) plugin in Cytoscape v3.7.1 (Shannon et al., 2003). Two statistical tests, namely Kolmogorov-Smirnov test and Fisher's exact test, were used to select the top 15 enriched pathways on the basis of q value (the lesser the q value the more significant is the pathway) for both up-and downregulated DEGs. The Kyoto Encyclopedia of Genes and Genomes (KEGG) Orthology-Based Annotation System (KOBAS v3.0) was used to calculate the enriched KEGG pathways from the DEGs (Xie et al., 2011). Furthermore, high-throughput data visualization software MapMan v3.6.0RC1 was used for pathway visualization (Schwacke et al., 2019). To generate the mapping file, pigeonpea CDS sequences were assigned to the most recent MapMan ontology using Mercator4 v2.0.

Construction of protein-protein interaction networks
The protein-protein interaction (PPI) network was built using web-based tool STRING v11.1 based on the orthologs of identified DEGs (http://string-db.org; Szklarczyk et al., 2018). This database generates a PPI network based on coexpression, interaction sources, experiments, and literature mining of query proteins and genes. An interaction score of >0.2 was applied. The PPI network was visualized on Cytoscape v3.7.1.

Quantitative real-time PCR
The quantitative real-time polymerase chain reaction (qRT-PCR) analysis was employed to validate the gene expression patterns inferred from RNA-seq data. Total RNA from flower bud tissue of ICPL 88039A and ICPL 88039B was isolated using the method of Jorge and Omar (2011) followed by DNase I (Thermo Fisher) treatment. The integrity of isolated RNA was checked on 1% agarose formaldehyde (FA) gel electrophoresis. The concentration of each sample was checked on the BioPhotometer plus (Eppendorf) and three micrograms of RNA was used for first-strand cDNA synthesis using the high-capacity cDNA reverse transcription kit (Thermo Fisher,) following the manufacturer's protocol. The qRT-PCR was carried out using PowerUpTM SYBR Green Master Mix (Applied Biosystems) reaction on a QuantStudioTM 5 Real-Time PCR System (Applied Biosystems) according to the manufacturer's instructions. The PCR conditions for all the qRT-PCR reactions were used as follows: 2 min at 50˚C, 2 min at 95˚C, and 40 cycles of 15 s at 95˚C, 15 sec at 60˚C, and 1 min at 72˚C. Actin 1 (Sinha et al., 2015b) was used as internal control. Each reaction was performed in three biological and three technical replicates along with no template control. Relative quantification (RQ) was determined by the following formula: RQ = 2 −ΔΔCt (Livak & Schmittgen, 2001).

F I G U R E 2
Single nucleotide polymorphism (SNP) density ratios of genes from ICPL 88039A and ICPL 88039B. The SNP density ratios of genes from ICPL 88039A and ICPL 88039B are shown as red circles and blue triangles, respectively based on their SNP density ratio score (0 to ≥10). Genes with higher SNP density ratio from ICPL88039A and ICPL88039B were labeled as red and blue, respectively, whereas genes having higher SNP density in both lines were labeled as black in the center of the plot

Transcriptome sequencing and de novo assembly
To obtain genes responsible for CMS induction in ICPL 88039A, total RNA was isolated from CMS line and cognate fertile line (ICPL 88039B) for the generation of cDNA library. We obtained 50949127 and 69870800 clean pairedend reads in the case of ICPL 88039A and ICPL 88039B, respectively, following trimming of raw reads. Average length of clean reads was 130.48 bp. The clean reads from each sample were aligned to the de novo transcriptome assembly with an overall alignment rate ranging from 75.28 to 81.47%. In the de novo assembly, we found an average contig length of 566.57 bp and an N50 of 994 bp. The average percentage of GC content in the assembly was 40.56%. This transcriptome shotgun assembly project has been deposited at NCBI Gen-Bank under the accession GIME01000000.

Identification of SNPs, DEGs, and GO annotations
Variant calling using GATK identified a total of 5,331 SNPs, of which 1,112 and 2,007 were unique to ICPL 88039A (three replicates merged) and ICPL88039 B (three replicates merged) lines, respectively and their annotations were also obtained. Figure 2 illustrates the SNP density ratios plotted on 11 chromosomes. The SNPs were found in 1,528 and 1,860 genes in ICPL 88039A and ICPL 88039B, respectively (Supplemental Table S1). Transcript quantification from the RNA-seq data was performed with RSEM, and DESeq2 was used for differential expression analysis of the estimated counts (Figures 3 and 4). Using CMS line as a reference, a total of 505 DEGs were obtained using p value of <.003 and absolute LFC value of ≥1 as the threshold. Of these DEGs, 412 showed higher expression in male fertile line with LFC ranging from 1.20 to 12.8, whereas 93 had lower expression with LFC varying between −1.10 and −8.6 (Supplemental F I G U R E 3 Volcano plots showing expression abundance of transcripts between ICPL 88039A and ICPL 88039B. Transcripts with upregulated expression in ICPL 88039B are shown toward the right, while downregulated transcripts in ICPL 88039B are toward the left. The most statistically significant transcripts are toward the top represented by red dots and slightly transcripts genes are shown by green dots Table S2). To better analyze variability among replicates, all six samples were plotted in a two-dimensional space using PCA, and the resulting PC1 and PC2 explained 34 and 25% of the variation, respectively (Supplemental Figure S1). Similarly, a high degree of biological replicate reproducibility in the present study could be visualized from sample distance matrix (Supplemental Figure S2).
Gene ontology terms enriched with p value <.05 were taken into consideration and the DEGs were annotated to functional categories. In case of the 93 downregulated DEGs in ICPL 88039B, 127 GO terms were enriched. The 127 GO terms belonged to 71 biological process (BP), 24 molecular function (MF), and 32 cellular component (CC) categories. Similarly, 242 GO terms corresponding to 116 BP, 90 MF, and 36 CC were enriched in case of upregulated DEGs.

TopGO annotations and network representations of DEGs
TopGO analysis was used to analyze DEGs for their association with the dominant pathways (Supplemental Table S3). Figure 5a depicts topmost enriched terms regulation of molecular function (GO:0065009), molecular function regulator (GO:0098772), and cell projection (GO:0042995) in BP, MF, and CC, respectively, in the case of upregulated genes in fertile line. The other functional groups among BP category were regulation of catalytic activity, pollen tube growth, developmental cell growth, and pollen tube development. Similarly, enzyme regulator activity and enzyme inhibitor activity were significantly enriched among MF category, whereas pollen tube, apical plasma membrane, membrane region, and cytoskeleton were significantly enriched among CC category.

F I G U R E 4
Hierarchical cluster analysis of differentially expressed genes of the genotypes ICPL 88039A and ICPL 88039B Among the downregulated genes, photosynthesis (GO:0015979), ribulose-bisphosphate carboxylase activity (GO:0016984), and photosynthetic membrane (GO:0034357) were the most dominant terms in BP, MF and CC categories, respectively, followed by electron transport chain (GO:0022900), electron transporter (GO:0045156), and thylakoid (GO:0009579) (Figure 5b). The network between the associated pathways was constructed and the major top GO terms were identified (Supplemental Table S4). Representative images of the major GO terms of upregulated genes associated with CC category are presented in Supplemental  Figures 3a and 3b.

Pathway analysis and functional classification of DEGs
Analysis using KEGG pathway enrichment was conducted to elucidated the involvement of DEGs to various pathways (Supplemental Table S5). In total, 261 and 67 DEGs from upregulated and downregulated categories were assigned to 59 and 19 KEGG pathways, respectively. As shown in Figures 6a and 6b, the pathways with the highest representation from the upregulated genes were metabolic pathways (KEGG ID: ccaj01100, 67) followed by biosynthesis of secondary metabolites (KEGG ID: ccaj01110, 22), pentose and glucuronate interconversions (KEGG ID: ccaj00040, 18), and plant-pathogen interaction (KEGG ID: ccaj04626, 12). A representative image shows DEGs affecting the KEGG pathway pentose and glucuronate interconversions (Supplemental Figure S4). Other enriched pathways included glycolysis and gluconeogenesis (KEGG ID: ccaj00010, 8), carbon metabolism (KEGG ID: ccaj01200, 6), starch and sucrose metabolism (KEGG ID: ccaj00500, 5), glycerolipid metabolism (KEGG ID: ccaj00561, 5), carbon fixation in photosynthetic organisms (KEGG ID: ccaj00710, 5), amino sugar and nucleotide sugar metabolism (KEGG ID: ccaj00520, 5), protein processing in endoplasmic reticulum (KEGG ID: ccaj04141, 5), etc. Metabolic pathways was also the most representative pathway in the case of downregulated genes followed by photosynthesis (KEGG ID: ccaj00195, 14), Biosynthesis of secondary metabolites (KEGG ID: ccaj01110, 6), and carbon metabolism" (KEGG ID: ccaj01200, 4). Pathway analysis using MapMan showed that several DEGs were involved in cell wall organization, lipid, and metabolism (Supplemental Figure S5). Furthermore, the PPI network analysis was performed on DEGs in order to find potential functional interactions between them. The PPI network analysis revealed a set of 15 hub genes having high degree and number of directed edges (Figure 7). These hub genes are involved in various biological processes including ATP synthesis, pectin catabolic process, carbohydrate metabolic process, and photorespiration (Supplemental Table S6).
Three and five DEGs involved in ascorbate and aldarate metabolism and starch and sucrose biosynthesis, respectively, showed higher expression levels in male fertile line, and these genes are known to control the cell division and maintain the carbon flux in the sink tissues during pollen development. SUC2, a sucrose transporter protein (C.cajan_35396) responsible for the transport of sucrose into the cell and tissues, LEA D-7, a late embryogenesis protein (C.cajan_06049), LEA-14 (C.cajan_34939), and LEA-D 34 (C.cajan_06906) expressed low in CMS lines vs. its isogenic fertile line.
We obtained 33 DEGs that participate in pollen-related processes (Supplemental Table S2). Six DEGs underexpressed in the CMS line including receptor-like protein kinase (C.cajan_10753) that control pollen tube behavior through timely rupturing to release germ cells. Similarly, low expression levels were recorded in the CMS line for two genes responsible for pollen tip growth and three genes involved in pollen wall assembly. Lower expression was also recorded in the CMS line for Rho-guanine exchange factor (C.cajan_00354), which acts downstream to PRK2 in order to control the polarization of pollen tube growth and the gene C.cajan_04138 implicated in glucan synthase activity associated with normal development of male gametophyte. These genes showing downregulation in ICPL 88039A are known to control several functions including formation of pollen exine, callose wall, and pollen tube growth.
Differentially expressed genes also included C.cajan_ 44741, C.cajan_04312, C.cajan_27022, C.cajan_18474, and C.cajan_32517 coding for pectate lyase 3, polygalactouronase, glycan metabolism, pollen-specific SF3, and formin-like protein 5, respectively, that are known to play a significant role in pollen maturation and pollen-pistil interaction (Table 1). Similarly, C.cajan_04911 (a borate transporter) that maintains boron homeostasis for pollen viability had low expression in CMS line. One gene, C.cajan_29311, that was downregulated in CMS line encodes cytochrome b5, which is involved in the reduction of cytochrome P-450, a key player in flavonoid biosynthesis in the petals and seed development.
The gene C.cajan_19641 coding for cytochrome c oxidase assembly protein subunit 11 (COX 11) showed abundant expression in the CMS line. Cytochrome c oxidase is the marker enzyme attached to mitochondrial inner membrane (Liu et al., 2013). Early cytochrome c release, ROS burst, and abnormal tapetal PCD have been reported to cause pollen abortion in Welsh onion CMS lines (Liu et al., 2016). In this context, we also observed higher expression level of cysteine proteinase (C.cajan_06109) in the flower buds of fertile plants, leading to timely degeneration of tapetum and production of functional pollen grains. Phosphatidylinositol 3, 4, 5 trisphosphate 3-phosphatase (PTEN) protein and cAMP binding protein (also known as calmodulin-binding protein) directly involved in Ca +2 ions mediated signaling controlling The Plant Genome T A B L E 2 Details of primer pairs selected for quantitative polymerase chain reaction verification of the RNA-sequencing data pollen tube maturation were also found to be upregulated in fertile line.

Validation of the DEGs using qRT-PCR
Six DEGs were chosen for qRT-PCR in order to verify the expression patterns established from RNA-Seq data ( Table 2). The results of qRT-PCR of the selected DEGs were highly consistent with the RNA-seq results (Figure 8). Among these DEGs, C.cajan_26704, C.cajan_13635, C.cajan_19226, and C.cajan_28973 encode pollen-coat-like protein, arabinogalactin protein 16, L-ascorbate oxidase, photosystem I/P700 in chloroplast, and electron acceptor A 0 , A 1 , and F x , respectively. The qRT-PCR results confirmed the RNA-seq data. Selection of DEGs for quantitative PCR validation relied upon higher number of transcripts corresponding to a single DEG having higher fold change and possible association with the trait of interest based on literature survey.

DISCUSSION
Whole transcriptomic data were used to decipher molecular mechanism of CMS in the study. Anther development is a complex process involving communication between nuclear and mitochondrial genes. Four models that describe CMS occurrence in plants are energy deficiency, aberrant PCD, retrograde regulation, and cytotoxicity (Chen & Liu, 2014). The energy deficiency model associating impaired ATPase genes with the CMS trait in various crops registers concordance with our findings. ATPases, plasma membrane-type, exhibited reduced expression in the CMS line in the current study.
A similar downregulation was also observed in CMS line for the genes that participate in oxidative phosphorylation and electron transport chain like H+ transporting ATPases. Our observation finds close agreement with the transcriptome profiling of CMS and its maintainer line in Welsh onion (Allium fistulosum L. ) (Liu et al., 2016). According to the candidate genes reported in our study and in soybean , carbohydrates are not only responsible to provide energy but also control pollen development as a signaling molecule and cater the higher energy demand during reproductive phase of crop plants. Rhoads et al. (2006) have suggested an increase in ROS concentration under hypoxic condition or low expression of mitochondrial complexes, influencing the normal PCD.
In the present study, sucrose transporter genes SUC 8, SUC 10, and SUC 14 showed higher expression in fertile line. Similarly, a sucrose transporter gene (SUC 1) suppressed by the miRNA was found in Chinese cabbage [Brassica rapa L. subsp. pekinensis (Lour.) Hanelt] ogura-CMS sterile line (Tyms) (Wei et al., 2015). In the fertile line, we also noticed enhanced expression for the genes that are related to cell wall loosening, such as polygalactouronase, expansins, pectate lyase, beta 1-3 glucosidase, endoglucanase, which is in accordance with the findings reported earlier by Rhee et al. (2015) based on differential gene expression between the sterile line and its near isogenic fertile line. Likewise, significant differences in transcript abundance were monitored between CMS and its fertile line in soybean for genes participating in pollen wall development like plant invertase and pectin methylesterase inhibitor and pectate lyase family proteins . Our observations gather strong support from gene expression atlas (CcGEA) in pigeonpea (Pazhamala et al., 2017). Most importantly, the three highly connected hub genessucrose-proton symporter 2 (C.cajan_35396), a pollen specific SF3 protein (C.cajan_07765), and an uncharacterized protein (C.cajan_28171) of the expression network for flower-related genes-showed higher expres-sion in the fertile line. Interestingly, expression pattern of C.cajan_28171 was further validated by qRT-PCR analysis in the present study. Similarly, pentose and glucuronate interconversions was represented by 18 genes in our analysis, and the genes such as C.cajan_ 44741, C.cajan_04312, and C.cajan_27022 had lower expression in CMS lines. In CMS line, we also noted low expression of pollen-specific genes like L-ascorbate oxidase (C.cajan_19226), β-galactosidase (C.cajan_32927), polygalacturonase (C.cajan_04312), etc. Pazhamala et al. (2017) also associated pollen viability in pigeonpea with impairments in boron homeostasis, which concurs with our observation on differential expression of boron transporters (C.cajan_04911, C.cajan_39524) between the two lines. Consistent with earlier reports, we observed lower expression of arabinogalactan glycoproteins (AGPs), such as GPIanchored proteins and Fasciclin-like arabinogalactan protein 3, in the CMS line, suggesting their participation in impaired microspore development (Rhee et al., 2015;Saxena et al., 2020).
Specific expression of BoPMEI1, a pollen-specific pectin methylesterase inhibitor, in mature pollen grains and tubes confirms the participation of pectin methylesterase inhibitors as candidate genes in pollen tube growth (Zhang et al., 2010). Other genes, such as those coding for pollen Ole e 3 allergen and expansin family protein, which have been established as developmental regulators, also showed downregulation in CMS line vs. fertile line. This observation also remains in close agreement with the findings reported earlier in soybean based on DEG profiling of CMS line and its fertile counterpart .
Enhanced expression of genes encoding late embryogenesis abundant (LEA) proteins in fertile line is also noteworthy as LEA-like proteins, such as LP28, participate in pollen maturation in several crops [e.g., lily (Lilium longiflorum Thunb.)] and the presence of LP 28 in pollen tube wall implied toward its contribution in the growth of the pollen tube (Mogami et al., 2002). Importantly, the antioxidant activity of LEA protein in eliminating ROS and protecting cell wall was elucidated in soybean (Liu et al., 2011). For that reason, inhibition of ROS scavenging enzymes is considered as a potential reason for mitochondrial damage and subsequent sterility trait in soybean CMS line .
Greater abundance of calmodulins (CaM) and calmodulinlike (CML 11) proteins in fertile line also supported their function in pollen development. Evidence suggests a connection between CaM and signal transduction mechanisms that regulate growth and orientation of pollen tube (Rato et al., 2004). We also observed enhanced expression of Ca ++ -dependent protein kinase (CPK) in fertile line in this study. In a similar manner, prevalence of 12 of the 34 Arabidopsis thaliana (L.) Heynh. CPK genes was demonstrated in pollen, and importantly, cpk17 and cpk34 could be conclusively associated with perturbation in pollen tube growth and their Ca ++ -dependent activation was also illustrated (Myers et al., 2009). Golovkin and Reddy (2003) ruled out the role of calmodulin-binding protein (No Pollen Germination1: NPG1) in microsporogenesis; however, their implications to pollen germination were established. Two more DEGs corresponding to calcium-binding EF-hand family showed upregulation in fertile line. Another DEG, relevant to plant fertility, is glutamate decarboxylase 1 (GAD), which generates gamma-aminobutyric acid from L glutamic acid. Various research groups have generated evidences that confirm the role of gamma-aminobutyric acid in guidance of pollen tube growth (Biancucci et al., 2015).
We also observed differential expression of genes associated with various metabolic events, in particular, glucose and lipid metabolism. Genes related to glycolysis and gluconeogenesis-like phosphoglycerate kinase, phosphoenolpyruvate carboxykinase, and fructokinase showed downregulation in CMS line. Similarly, glycerol-3-phosphate dehydrogenase, connecting carbohydrate metabolism with lipid metabolism leading to the formation of glycerol and electrons in mitochondria, also showed downregulation in CMS line. Based on histological and RNA-seq data analysis, defects in pollen wall formation resulting from disrupted fatty acid synthesis were shown to be causative for male sterility in cotton (Gossypium spp.) GMS line (Wu et al., 2015).

CONCLUSION
We performed high-throughput transcriptome analysis using Illumina sequencing technology to study the differential gene expression patterns between the two isogenic alloplasmic lines of pigeonpea. Consequently, a set of 505 genes showed marked differences in their expression level between the sterile and fertile lines. Based on GO, BiNGO, and KEGG analysis we could establish a possible connection of CMS trait of pigeonpea with perturbations in glucose and lipid metabolism, energy deficiency, pollen development and pollen tube growth, and ROS generation and PCD. Our findings are congruent with a previous report in pigeonpea  and similar mechanisms reported in other crops. Our findings offer a comprehensive foundation to accelerate functional genomics of CMS trait of pigeonpea. However, future research will be required to validate the specific functions of genes associated with pigeonpea CMS.

C O N F L I C T O F I N T E R E S T
The authors declare that they have no conflict of interest.

AU T H O R C O N T R I B U T I O N S
AB conceptualized study and planned experiments; AB, SNSJ, SP, RJ, LP, AKM, KRS, MA, GP carried out the experiments; AB, PG, AR, performed analysis and interpretation; AB prepared the original draft with support from PG, AT and RKV; RKS, RKV, DD, NPS review and edited the manuscript. All authors have read and approved the final manuscript.

A C K N O W L E D G M E N T S
AB acknowledges financial support Indian Council of Agricultural Research (ICAR), New Delhi (Grant No. AGENIAS-RICOP201501000047). RKV thanks Bill and Melinda Gates Foundation for support.