Exploring gene regulatory interaction networks and predicting therapeutic molecules for hypopharyngeal cancer and EGFR‐mutated lung adenocarcinoma

Hypopharyngeal cancer is a disease that is associated with EGFR‐mutated lung adenocarcinoma. Here we utilized a bioinformatics approach to identify genetic commonalities between these two diseases. To this end, we examined microarray datasets from GEO (Gene Expression Omnibus) to identify differentially expressed genes, common genes, and hub genes between the selected two diseases. Our analyses identified potential therapeutic molecules for the selected diseases based on 10 hub genes with the highest interactions according to the degree topology method and the maximum clique centrality (MCC). These therapeutic molecules may have the potential for simultaneous treatment of these diseases.

With the advent of information technology, bioinformatics is becoming increasingly attractive to researchers and academicians. The recent development of various bioinformatics toolkits has enabled vast quantities of biological data to be processed and analyzed rapidly and presented in a form that people can interpret. Most studies focus on identifying two connected diseases and using observations about them to construct gene regulatory interaction networks, a precursor to drug design. For instance, hypopharyngeal cancer is a disease that is associated with EGFR-mutated lung adenocarcinoma. In this study, we select EGFR-mutated lung adenocarcinoma and hypopharyngeal cancer on the basis of reported lung metastases in hypopharyngeal cancer.
To conduct this study, we collect microarray datasets from GEO (Gene Expression Omnibus), an online database maintained by NCBI. Differentially expressed genes, common genes, and hub genes between the two selected diseases are identified for the subsequent analyses. Our findings suggest common therapeutic molecules for the selected diseases based on the 10 hub genes with the highest interactions according to the degree topology method and the maximum clique centrality (MCC). The suggested therapeutic molecules may benefit patients who have both diseases simultaneously.
arXiv:2402.17807v1 [q-bio.GN] 27 Feb 2024

KEYWORDS
GEO (Gene Expression Omnibus), Hypopharyngeal cancer (HC), EGFR-mutated lung adenocarcinoma, Microarray Datasets, Hub Gene, Degree Topology, Therapeutic Molecule

| INTRODUCTION
Bioinformatics, which combines computer science with biology, has expanded significantly in recent years [1]. Several bioinformatics toolkits can be leveraged to obtain the desired result from an experiment. Bioinformatics can be used to study the molecular causes of disease, to describe a disease from the perspective of its genes, and to reduce the time and money an investigation requires by using computation to narrow its scope and improve the quality of its results [1]. Among the 100 trillion cells in the human body, 200 different cell types and 100 different cancers have been found [2]. Cancer is a group of diseases characterized by abnormal cell proliferation; it attacks like a crab that uses its numerous claws to try to kill its target [2,3,4,5,6,7,8].
Hypopharyngeal cancer is a less frequent type of head and neck cancer, and its prognosis is worse than that of tumors originating from other head and neck sites [9]. Hypopharyngeal carcinoma makes up around 5% of all head and neck malignancies and has a poor prognosis, with only 15-30% of patients living for more than five years [10] [11]. Two common risk factors for hypopharyngeal carcinoma are alcohol consumption and smoking [12]. According to the American Cancer Society, the human papillomavirus also causes hypopharyngeal carcinoma.
The epidermal growth factor receptor (EGFR) gene is the most commonly mutated gene in lung cancer [13].
Lung squamous cell carcinoma (SCC), which has an estimated frequency of 3% to 18%, is less common than lung adenocarcinoma (1-10) [13]. Lung cancer, which comprises both small cell and non-small cell types, is the leading cause of cancer-related death globally [14,15]. The world's highest incidence and fatality rates are associated with the most prevalent kinds of cancer [16]. Risk factors for lung cancer include smoking, passive smoking, age, gender, family history, chronic lung disease, chest radiotherapy, diet, obesity, physical activity, alcohol consumption, employment, education, and income [17]. The human papillomavirus might also increase the risk of developing lung cancer [18].
Head and neck cancers and lung cancers both pose significant challenges to global health [19]. Head and neck cancer is among the malignancies that most frequently spread to the lungs [20] [21] [22]. Following bone and soft tissue sarcomas, [23] recognized head and neck cancer as the third most frequent reason for pulmonary metastasectomy. [24] reported in 2012 that there were 686,000 new cases of head and neck cancer and 1,825,000 new cases of lung cancer, with mortality rates of 5% and 19%, respectively. Because hypopharyngeal cancer is a type of head and neck cancer and EGFR-mutated lung adenocarcinoma is a form of lung cancer, these statistics suggest that patients with EGFR-mutated lung adenocarcinoma may be at risk of developing hypopharyngeal cancer. Hypopharyngeal cancer may also spread to EGFR-mutated lung adenocarcinoma, and EGFR-mutated lung adenocarcinoma may metastasize to hypopharyngeal cancer. The authors of [25] carried out a population-based analysis of lung metastases in newly diagnosed hypopharyngeal cancer. According to the Canadian Cancer Society, lung cancer may develop if hypopharyngeal cancer progresses [26]. This suggests that the two diseases are genetically related through shared genes. This set of shared genes is governed by regulatory interaction networks and pathways.
In this research, we aim to investigate the common DEGs (differentially expressed genes), hub genes, gene regulatory networks, and therapeutic molecules for hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma using bioinformatics techniques.
We used two datasets, one for EGFR-mutated lung adenocarcinoma and one for hypopharyngeal cancer. Each dataset has eight samples. The DEGs shared by the two datasets are extracted using the R programming language. These shared DEGs are used to identify GO terms, pathways, PPI networks, and TF-miRNA interactions. Candidate therapeutic compounds are then proposed based on the hub genes, with patients who have hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma concurrently in mind. The two diseases can be associated with each other directly or indirectly, and they have some interrelated genes in common. Gene regulatory interaction networks are built using several bioinformatics tools. The PPI network is visualized, and common drugs are suggested for the two associated diseases. A PPI network describes the connections between proteins in a biological system. Visualizing it usually means producing a map or graph of the interactions between proteins, which helps scientists understand the intricate biological mechanisms involved and pinpoint hubs or nodes that are essential to cellular operations or disease processes. In addition, a single common drug for two associated diseases could reduce the amount of medication a patient would otherwise take for each disease separately.
Microarray data analysis is among the best-known techniques for large-scale investigation of gene expression, and high-throughput technologies are becoming more and more important in biomedical research [27]. Microarray studies let researchers in genetics analyze the expression of many genes simultaneously [28]. This research attempts to discover the relationship between the two selected diseases. The datasets GSE212398 for hypopharyngeal cancer and GSE198672 for EGFR-mutated lung adenocarcinoma are used for the investigation. Both datasets were obtained from the NCBI GEO database, and the shared DEGs are collected from them. Figure 1 presents the flow diagram of the proposed methodology. This work aims to identify targeted therapeutic molecules for these two diseases. Targeted therapy very often has a remarkable effect against cancer. Drug compounds can serve multiple purposes in cancer treatment, including reducing the size of tumors before surgery, removing any remaining cancer cells after surgery, or acting as a last resort when other treatments are ineffective or the cancer recurs.
Our contributions are summarized as follows.
• To propose a bioinformatics framework for integratively analyzing expression profiles of lung adenocarcinoma and hypopharyngeal cancer samples to find common biomarkers.
• To conduct detailed downstream analyses based on the common biomarkers.
• To propose therapeutic agents for those biomarkers via drug-target analyses.
FIGURE 1 Diagram representing the proposed methodology of the current research. Two datasets are used, one for hypopharyngeal cancer and one for EGFR-mutated lung adenocarcinoma. Each dataset has eight samples. The DEGs (differentially expressed genes) are retrieved from the two datasets using the R programming language. The VENNY tool is used to find the genes common to the two diseases. From these shared DEGs, GO terms, pathways, PPI networks, TF-miRNA interactions, and hub genes are identified. Functional association, TF-gene, gene-miRNA, and gene-disease networks are built, and therapeutic compounds are proposed, based on the hub genes of individuals who have hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma concurrently.
The rest of the paper is organized as follows. We review the related literature in Section 2. In Section 3, the methodology is presented. The results are then analyzed before the paper concludes.

| RELATED LITERATURE
In this section, we discuss earlier works that applied various bioinformatics approaches to different diseases.
[29] identified molecular biomarkers to suggest therapeutic targets for the development of medicines to treat esophageal cancer. The authors collected the GSE93756, GSE94012, GSE104958, and GSE143822 datasets from the National Center for Biotechnology Information's (NCBI) Gene Expression Omnibus. Using the limma package in R, DEGs were collected by applying an adjusted P-value cutoff (< 0.05). GO and pathway enrichment analyses of the DEGs were then performed, along with PPI and clustering analyses. [30] discussed how people with SARS-CoV-2 infections are genetically more likely to develop idiopathic pulmonary fibrosis (IPF). The GSE147507 and GSE35145 datasets were collected from the NCBI GEO database. DEGs for GSE35145 were retrieved using the GEO2R tool that GEO provides by default with the dataset, while DEGs for GSE147507 were obtained using the R programming language. An adjusted P-value < 0.05 and an absolute log2 fold change > 1.0 were used as the cut-off criteria. The genes common to the two diseases were used for gene set enrichment analysis, PPI network construction, hub gene identification and module analysis, TF-miRNA identification, and candidate drug suggestion.
[31] identified the SARS-CoV-2 biomarkers and pathways that complicate the condition of patients with pulmonary arterial hypertension (PAH). The limma and DESeq2 packages in R were used to obtain the DEGs of GSE147507 for SARS-CoV-2 infection in human lung epithelial cells. The DEGs of GSE117261 for PAH lung were found with the GEO2R tool of the NCBI GEO database. An adjusted P-value < 0.05 and an absolute log2 fold change > 1.0 were used as the cut-off criteria. The Benjamini-Hochberg approach was applied to both datasets to control the false discovery rate. The DEGs common to the two datasets were used for further analysis. [32] employed bioinformatics analysis to screen for and identify potential target genes in head and neck cancer.
The author used the dataset GSE58911 from GEO. GEO2R, an interactive web tool available by default on GEO, was used to identify the DEGs. The cut-off criteria were an adjusted P-value < 0.05 and a log fold change (FC) ≥ 1 or ≤ −1. KEGG and GO enrichment analyses were performed on the DEGs. Additionally, a PPI network was constructed and hub genes were identified from the DEGs. [33] used bioinformatics analysis to identify relevant head and neck squamous cell carcinoma (HNSCC) genes from public databases. In that work, DEGs were deemed significant if logFC ≥ 1 or ≤ −1 with an adjusted P-value < 0.05. After retrieving the DEGs, several analyses were conducted, including GO, KEGG, PPI, and DEG survival analyses, and verification of key genes via Oncomine, specimens, and real-time PCR. [34] used integrated bioinformatics techniques to identify and analyze genes linked to head and neck squamous cell carcinoma. The GSE13601, GSE31056, and GSE30784 datasets were downloaded from GEO and analyzed in R. To identify the DEGs from the three datasets, P < 0.01 and |log(FC)| > 1 were chosen as the cut-off. The genes common to the three datasets were identified using the Venn Diagram package in R. Further analysis was based on the common DEGs: KEGG pathway and gene ontology analysis, identification of the top modules and hub genes in a PPI network, validation of the relative mRNA expression levels of the hub genes, examination of the hub genes' protein levels in the Human Protein Atlas database, hub gene survival analysis using the TCGA database, RNA extraction and real-time quantitative PCR, and statistical analysis. [35] used integrated bioinformatics analysis to find hub genes associated with the development of head and neck squamous cell carcinoma. The gene expression profiles for HNSCC were retrieved from TCGA and GEO. Key co-expression modules were identified using WGCNA, and DEGs were defined as genes meeting the cut-off criteria of |logFC| ≥ 1.0 and adjusted P < 0.05. Functional analysis of the genes of interest, PPI construction and hub gene screening, validation of hub gene expression patterns and prognostic values, and validation of survival-related hub gene protein expression with the HPA database were also conducted in that study. [36] used TCGA and GEO datasets to identify potential biomarkers and analyze survival data for head and neck squamous cell carcinoma. The GSE6631 dataset for head and neck squamous cell carcinoma was analyzed in R. An adjusted P-value < 0.05 was applied in differential gene screening to control the number of false positives. Heat maps and volcano plots were constructed for the corresponding DEGs. Enrichment analysis, PPI analysis, hub gene survival analysis, key gene verification, immunohistochemical analysis, and a search for potential small molecules were also carried out in that paper.
[37] used bioinformatics to investigate prognostic biomarkers of lung adenocarcinoma. The GSE31210, GSE32665, GSE32863, GSE43458, and GSE72094 datasets for lung adenocarcinoma were used. The cut-off criteria for retrieving DEGs from these datasets were |log2 FC| ≥ 1.5 and P < 0.05. Enrichment analysis, identification and verification of the prognostic gene signature, interactive analysis of gene expression, analysis of the prognostic model's independence, and nomogram construction were all conducted in that paper. [38] used bioinformatics to identify important biomarkers in patients with lung adenocarcinoma. The GSE10072 dataset from the GEO database was used. The cut-off criteria for retrieving DEGs were an adjusted P-value < 0.05 and |log2 FC| ≥ 1.
The steps carried out there were KEGG pathway and gene ontology analysis, a comparison of the top 5 upregulated and top 5 downregulated genes, overall survival (OS) analysis of the top 5 upregulated and top 5 downregulated genes, and analysis of the PPI network and its modules. [39] analyzed microarray data with bioinformatics to find potential lung adenocarcinoma biomarkers. The datasets for lung adenocarcinoma (GSE118370, GSE32863, GSE85841, and GSE43458) were downloaded from the GEO database. DEGs were defined by |log2 fold change| ≥ 1 and FDR < 0.05. GO and KEGG enrichment analysis, module analysis and PPI network construction, and hub gene analysis were performed. [40] used bioinformatics analysis to show that elevated mRNA levels of the genes AURKA, CDC20, and TPX2 are linked to a poor prognosis for lung adenocarcinoma caused by smoking. The GSE31210, GSE32863, GSE40791, GSE43458, and GSE75037 datasets from the GEO database were analyzed. The DEGs were retrieved using the cut-off criteria of P < 0.05 and absolute fold change > 1.5. Functional enrichment analysis was done for 58 DEGs, followed by data validation and statistical analysis.

| METHODOLOGY
In this section, we present the methodology of our experiments. We introduce a process for designing gene regulatory interaction networks for hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma, including PPI networks, TF-gene interaction networks, gene-miRNA interaction networks, and gene-disease networks, and for suggesting common drug compounds for these two associated diseases.
The steps in the proposed methodology are described below.

| Dataset Selection
NCBI (National Center for Biotechnology Information) is an online platform that provides many forms of biological data in a variety of computer-readable formats. The datasets used in this research were gathered from the GEO (Gene Expression Omnibus) database on the NCBI platform.
The GEO database for high-throughput gene expression analysis can be accessed through the NCBI platform [41]. The GSE212398 dataset concerns the relationship between RELA, CD271 expression, and stem-like features in hypopharyngeal cancer. The GSE198672 dataset contains EGFR-mutated lung adenocarcinomas that develop from pre-existing tumor cells and persist in a specialized stromal milieu as drug-tolerant persisters after erlotinib treatment. The expression profiles in GSE198672 and GSE212398 were generated on the GPL10558 (Illumina HumanHT-12 V4.0 expression beadchip) and GPL20844 (Agilent-072363 SurePrint G3 Human GE v3 8x60K Microarray 039494 [Feature Number Version]) platforms, respectively. GSE212398 is a subseries of GSE212399. We chose GSE212398 for our investigation because it contains 8 samples: 4 control samples and 4 knockout (KO) samples. The GSE198672 dataset also has 8 samples.

| Differential Expression analysis
Finding differentially expressed genes (DEGs) in the microarray datasets is the initial step of this research. The GEOquery [42] R package was used to retrieve the gene expression datasets for both diseases from the NCBI GEO [43] database. Next, the limma [44] R package with empirical Bayes statistics was used for differential expression (DE) analysis. For each disease dataset, the DE output was saved as a comma-separated values (CSV) file containing the gene symbol, logFC, P-value, and adjusted P-value. The false discovery rate of both datasets was controlled using the Benjamini-Hochberg [45] method. An adjusted P-value < 0.05 and an absolute log2 fold change > 1 were the cut-off criteria for obtaining DEGs from both datasets. The DEGs shared by the two diseases were visualized using the Venny tool [46].
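The thresholding step described above can be sketched in a few lines of Python. This is only an illustration of Benjamini-Hochberg adjustment followed by the two cut-offs; it does not reproduce limma's empirical Bayes moderated statistics, and the gene names and values in `toy_rows` are hypothetical placeholders, not data from either GEO series.

```python
from typing import List, Tuple

def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values stay monotone (each one is capped by the next larger rank).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(rows: List[Tuple[str, float, float]],
                adj_p_cutoff: float = 0.05,
                lfc_cutoff: float = 1.0) -> List[str]:
    """rows holds (gene, log2FC, raw p-value); return genes passing both cut-offs."""
    adj = benjamini_hochberg([p for _, _, p in rows])
    return [gene for (gene, lfc, _), q in zip(rows, adj)
            if q < adj_p_cutoff and abs(lfc) > lfc_cutoff]

# Hypothetical toy input, for illustration only.
toy_rows = [
    ("GENE_A", 2.3, 0.0001),
    ("GENE_B", -1.8, 0.004),
    ("GENE_C", 0.4, 0.0005),   # significant but small fold change
    ("GENE_D", 1.2, 0.30),     # large fold change but not significant
]
toy_degs = select_degs(toy_rows)
```

In this toy input only GENE_A and GENE_B pass both filters, which shows why the fold-change and significance criteria are applied together.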

| Enrichment analysis of shared DEGs
Gene set enrichment analysis studies sets of genes that share chromosomal locations, molecular activities, or biological functions [47]. Gene Ontology (GO), which is divided into the three categories of biological process, molecular function, and cellular component, is used for gene product annotation [48]. GO terms are chosen mainly on the basis of the molecular activity of the genes, their cellular function, and the location in the cell where they perform their functions [30]. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway, which has a significant advantage over gene annotation, is frequently used to study metabolic pathways [49]. For a broader pathway analysis, the Reactome [50], BioCarta [51], and WikiPathways [52] databases were used in addition to KEGG.
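Over-representation tools commonly score a term by asking how likely the observed overlap between the query genes and the term's gene set would be by chance. The sketch below computes that right-tailed hypergeometric probability (the one-sided Fisher exact test) with the standard library only. The function name and the numbers in the usage line are illustrative assumptions and are not taken from the paper's results.

```python
from math import comb

def enrichment_p_value(overlap: int, set_size: int,
                       query_size: int, background: int) -> float:
    """Right-tailed hypergeometric probability P(X >= overlap).

    background : number of genes in the assumed universe
    set_size   : number of genes annotated to the term
    query_size : number of query genes (for example, the shared DEGs)
    overlap    : query genes that are annotated to the term
    """
    upper = min(set_size, query_size)
    total = comb(background, query_size)
    # comb(n, k) returns 0 when k > n, so impossible splits contribute nothing.
    tail = sum(
        comb(set_size, k) * comb(background - set_size, query_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical usage: 4 of 32 query genes fall in a 50-gene term,
# with an assumed background of 20,000 genes.
p_example = enrichment_p_value(overlap=4, set_size=50,
                               query_size=32, background=20000)
```

A small value indicates that the overlap is larger than expected for a random gene list of the same size.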

| TF-miRNA coregulatory network
To determine which transcription factors (TFs) bind to the regulatory regions of the shared DEGs, TF-target gene interactions were examined [53]. miRNAs that bind to a gene transcript and negatively affect protein expression were identified from miRNA-target gene interactions [54]. The RegNetwork repository [55] provided the TF-miRNA coregulatory interactions, which make it easier to identify the TFs and miRNAs that regulate the DEGs of interest at the transcriptional and post-transcriptional levels [30]. We constructed the TF-miRNA coregulatory network on the NetworkAnalyst platform [56]. NetworkAnalyst helps researchers browse complex datasets to find biological traits and functions that can be used to generate useful biological hypotheses [57]. Among the available network formats, the minimum network option was selected to construct the TF-miRNA coregulatory network.

| PPI Network
PPI activity is considered a main area of interest in cellular biology research and is essential to systems biology [58]. Cutting-edge research on PPI networks has helped to identify a number of complex biological processes [59] [60]. Proteins operate inside cells through interactions with other proteins, and the information produced by a PPI network aids our understanding of how proteins function [61]. Using the STRING [62] database (https://string-db.org/), the shared DEGs of hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma were used to create a PPI network and to discover which of the common genes are directly associated with each other. The following settings were used in STRING: the network type was set to full STRING network, the meaning of network edges was set to evidence, and the active interaction sources were Textmining, Experiments, Databases, Co-expression, Neighborhood, Gene Fusion, and Co-occurrence. In Advanced Settings, the interactive SVG (scalable vector graphic) option was selected as the network display mode. The information provided by STRING is based on predicted and experimental interactions, and the interactions generated by the web tool are annotated with 3D structures, supplementary data, and evidence scores [63]. The STRING PPI network was then reconstructed in Cytoscape to keep only the interconnected genes and remove the disconnected genes among the 32 shared genes. Using the web-based NetworkAnalyst [56] software (https://www.networkanalyst.ca/), the directly interconnected genes identified in the Cytoscape PPI network were entered into InnateDB [64] to build additional PPIs for the interconnected genes. Auto Layout was selected in the Layout option to build this network.
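The filtering step, in which disconnected genes are dropped and only interconnected ones are kept, amounts to removing isolated nodes from the interaction graph. A minimal sketch of that idea follows; it is not the Cytoscape procedure itself, and the gene labels and edges are hypothetical placeholders rather than interactions reported by STRING.

```python
from typing import Iterable, List, Tuple

def connected_genes(genes: Iterable[str],
                    edges: Iterable[Tuple[str, str]]) -> List[str]:
    """Keep only genes that take part in at least one interaction."""
    linked = {gene for edge in edges for gene in edge}
    return [gene for gene in genes if gene in linked]

# Hypothetical placeholders: G3 has no interaction partner and is dropped.
toy_genes = ["G1", "G2", "G3", "G4"]
toy_interactions = [("G1", "G2"), ("G2", "G4")]
kept = connected_genes(toy_genes, toy_interactions)
```

The input order of the gene list is preserved, so the filtered list can be passed directly to a later step such as hub gene scoring.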

| Retrieving Hub Genes
Hub nodes are the highly connected nodes in a large-scale PPI network [65]. The cytoHubba plugin for Cytoscape is used to locate hub nodes. Its user-friendly interface makes cytoHubba the most popular hub identification plugin for Cytoscape, and it comes with 11 topological analysis methods [66]. Among these 11 methods, the degree method and the maximal clique centrality (MCC) were chosen to identify the hub genes. In the degree topology method, the degree of a gene is the number of interactions it has with other genes, so the most highly connected of the input genes are easily identified. The gene with the highest degree score ranks first among all genes. The most important candidate genes among the shared DEGs, which may be crucial in physiological regulatory functions, were found using MCC, which has demonstrated better accuracy in predicting essential proteins in a PPI network.
MCC was found to be the most efficient way to locate hub nodes in a PPI network [67], and the authors of [68] likewise regarded it as the most efficient technique for identifying hub nodes. For these reasons, two of the 11 available methods were chosen to identify hub genes: the degree topology method, one of the most popular topological methods [69], and MCC, the most efficient of the available methods [66].
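To make the two scores concrete, the sketch below computes degree and MCC on a toy undirected graph using only the standard library. It assumes the usual MCC definition, in which a node's score is the sum of (|C| - 1)! over the maximal cliques C that contain it, and it is not the cytoHubba implementation. The node labels and edges are hypothetical.

```python
from math import factorial
from typing import Dict, Iterable, List, Set, Tuple

def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; self-loops are ignored."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def maximal_cliques(adj: Dict[str, Set[str]]) -> List[Set[str]]:
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques: List[Set[str]] = []

    def expand(r: Set[str], p: Set[str], x: Set[str]) -> None:
        if not p and not x:
            cliques.append(r)
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques

def degree_scores(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    return {node: len(neighbors) for node, neighbors in adj.items()}

def mcc_scores(adj: Dict[str, Set[str]]) -> Dict[str, int]:
    scores = {node: 0 for node in adj}
    for clique in maximal_cliques(adj):
        for node in clique:
            scores[node] += factorial(len(clique) - 1)
    return scores

def top_hubs(scores: Dict[str, int], n: int) -> List[str]:
    """Highest scores first; ties are broken alphabetically."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:n]

# Hypothetical toy graph: A, B, C, D are fully connected, and D also links to E.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"),
             ("B", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
adj = build_adjacency(toy_edges)
degree = degree_scores(adj)
mcc = mcc_scores(adj)
```

In this toy graph node D has the highest degree (4) and the highest MCC (3! + 1! = 7), while A, B, and C each score 6 under MCC. The clique term is what lets MCC reward membership in densely connected groups rather than raw neighbor counts.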

| Functional association Network
GeneMANIA is a bioinformatics program that displays functional association information, genetic relationships, pathways, and co-expression for a given set of input genes [70]. GeneMANIA also helps to predict the functions of gene sets [71]. The functional association network readily shows the percentage of co-expression, physical interactions, predicted interactions, pathways, co-localization, shared protein domains, and genetic interactions for the input genes. Physical interactions between two or more proteins can produce binary interactions and protein complexes [72]. Genes are associated in gene co-expression networks, which are transcription factor-transcription factor association networks that are typically presented as undirected graphs [73]. Unlike most co-expression networks, which are undirected graphs, this network showed a close relationship between two genes [74]. In studies of protein-protein interactions, two genes are connected if they are found to interact. Ligand-based protein networks, which predict the ability of neighboring proteins to bind related compounds indirectly, may be used to enhance genetically oriented gene networks, which predict the significance of a process or a disease [75]. Gene co-expression analysis is a data analysis technique that finds groups of genes with comparable expression patterns under various conditions [76]. A genetic interaction is a link between the functions of genes [77]. The top 10 hub genes were used to build a functional network in GeneMANIA [78].

| TF-gene interactions
Analyzing TF-gene interactions for the 10 identified hub genes makes it possible to evaluate the impact of TFs on functional pathways and gene expression levels [79] [81]. The control of gene transcription, as well as the establishment of cellular identity and activity, is thought to depend on transcription factors (TFs) [80]. Finding the important TF-gene interactions is crucial for understanding the roles of pleiotropic global regulators [82]. Through direct or indirect interactions with other TFs, specific TFs help regulate the expression of a variety of target genes [80]. Several transcription factors interact to control life activities [83]. NetworkAnalyst lets users perform meta-analysis and analyze gene expression for numerous species [56], and it is the platform used here to find TF-gene interactions for the hub genes [56]. NetworkAnalyst provides functions for typical network topology analysis and can be used to analyze biological modules [84]. The TF-gene interaction network was built from the ChEA [85] database available on the NetworkAnalyst platform [30].

| Gene-miRNA interactions
Analyzing gene-miRNA interactions for the 10 identified hub genes makes it possible to evaluate the impact of miRNAs on functional pathways and gene expression levels. MicroRNAs (miRNAs) are a class of short, non-coding RNA molecules 21-25 nucleotides long that regulate gene expression by base-pairing with their target mRNAs, primarily silencing or down-regulating the target genes [86]. These natural single-stranded small RNA molecules attach to specific mRNAs and either prevent translation of the target mRNA or promote its degradation [87]. Evidence is mounting that miRNAs have a role in carcinogenesis and cancer metastasis [88]. More and more varieties and uses of small non-coding RNAs are being discovered. This implies that there may be regulatory mechanisms far more complex than those currently used in the analysis and construction of gene regulatory networks. The analysis of inter-pathway regulatory factors can help in finding new therapeutic targets. Because non-coding miRNAs are important for activating pathways, their activity is crucial in this regulatory environment. MicroRNAs are crucial in the control of transcriptome processes [89], and as post-transcriptional mediators of gene expression they are essential to many biological processes in both plants and animals [90]. To fully understand the biological function of a miRNA, it is crucial to pinpoint the genes that it regulates [77]. miRNAs can be retrieved from the TarBase database. TarBase is a comprehensive repository of animal microRNA targets supported by experimental data. The database is also functionally connected to several other helpful resources, including Gene Ontology (GO) and the UCSC Genome Browser.
TarBase provides a rich dataset for evaluating characteristics of miRNA targeting that will be helpful for the next generation of target prediction tools. TarBase reveals substantially more experimentally supported targets than even recent reviews indicate [91]. The gene-miRNA interaction network for the 10 hub genes (JUN, ERBB2, HLA-DMB, HBEGF, HLA-B, HLA-DRA, DUSP5, ARHGDIB, MUC4, CLEC2D) was created using TarBase within the web-based tool NetworkAnalyst.

| Gene-disease interactions
Gene-disease interaction networks focus mostly on the most recent understanding of human genetic illnesses, including complex, Mendelian, and environmental diseases [92] [93]. Such a network helps to identify the diseases that can arise from the input genes and, in turn, the conditions that therapeutic molecules should address. DisGeNET is a large database of gene-disease associations that integrates information from several sources and covers a range of biological traits linked to diseases [92] [53]. Using NetworkAnalyst, the network analysis of gene-disease associations linked the hub genes (JUN, ERBB2, HLA-DMB, HBEGF, HLA-B, HLA-DRA, DUSP5, ARHGDIB, MUC4, CLEC2D) to related diseases and their chronic conditions [57].

| Therapeutic molecule suggestion for selected diseases
Therapeutic molecule suggestion is the key step of this research, and DSigDB is used for drug suggestion. The Drug Signatures Database (DSigDB) is a gene set resource that connects drugs and compounds with their target genes. It contains 19,531 genes, 22,527 gene sets, and 17,389 unique compounds [95]. To predict drugs, DSigDB largely employs gene expression-based datasets, and each gene set is regarded as the set of genes targeted by a given compound [95]. Users can access the DSigDB database via the Enrichr platform (https://amp.pharm.mssm.edu/Enrichr/) [30]. Enrichr is mostly used as an enrichment analysis tool that offers substantial graphical data on the combined functions of the input genes [94].
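The idea of matching hub genes against compound target gene sets can be sketched as a simple overlap ranking. This is only an illustration under stated assumptions: the compound names and target sets below are invented placeholders (not DSigDB entries), the function name `rank_compounds` is hypothetical, and Enrichr itself ranks by statistical scores rather than by a raw overlap count.

```python
# The 10 hub genes reported in this study.
HUB_GENES = {
    "JUN", "ERBB2", "HLA-DMB", "HBEGF", "HLA-B",
    "HLA-DRA", "DUSP5", "ARHGDIB", "MUC4", "CLEC2D",
}


def rank_compounds(query_genes, compound_targets):
    """Rank compounds by how many query genes fall in their target gene set."""
    rows = []
    for compound, targets in compound_targets.items():
        hits = sorted(set(query_genes) & set(targets))
        rows.append((compound, len(hits), hits))
    # Most overlapping genes first; ties are broken alphabetically by name.
    return sorted(rows, key=lambda row: (-row[1], row[0]))


# Invented target sets, for illustration only (NOT real drug-gene data).
toy_targets = {
    "compound_A": {"JUN", "ERBB2", "HBEGF", "TP53"},
    "compound_B": {"MUC4"},
    "compound_C": {"JUN", "DUSP5"},
}

ranking = rank_compounds(HUB_GENES, toy_targets)
for compound, n_hits, hits in ranking:
    print(compound, n_hits, hits)
```

A count-based ranking is the simplest possible design; a real analysis would also account for gene set size and background, which is what an enrichment test provides.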

| Differential Expression analysis identifies common DEGs between Hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma
We identified 605 DEGs for Hypopharyngeal cancer (GSE212398) and 1062 DEGs for EGFR-mutated lung adenocarcinoma (GSE198672) using the R programming language. Among those DEGs, 32 common genes were identified between Hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma using the Venny tool. The Venn diagram of shared DEGs between the two diseases is shown in Figure 2.
FIGURE 2 Venn diagram of shared DEGs. 32 common genes were found between HC and EGFR-mutated LC. Common DEGs were 2% among 1667 DEGs.
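The overlap step can be illustrated with plain set operations. The sketch below is not the R/Venny workflow used in the study: the gene identifiers are synthetic placeholders sized like the reported lists, the function name `shared_degs` is hypothetical, and the percentage is assumed to be taken relative to the summed DEG counts, following the wording of the Figure 2 caption.

```python
def shared_degs(degs_a, degs_b):
    """Return the common genes and their share (%) of the summed DEG counts.

    Assumption: the percentage is computed against len(a) + len(b),
    mirroring the "2% among 1667 DEGs" wording of the Figure 2 caption.
    """
    set_a, set_b = set(degs_a), set(degs_b)
    common = set_a & set_b
    total = len(set_a) + len(set_b)
    percent = 100.0 * len(common) / total if total else 0.0
    return common, percent


# Synthetic placeholder identifiers (not real gene symbols), sized like the
# reported lists: 605 and 1062 DEGs with 32 genes in common.
hc_degs = {f"gene_{i}" for i in range(605)}
luad_degs = {f"gene_{i}" for i in range(573, 573 + 1062)}

common, percent = shared_degs(hc_degs, luad_degs)
print(len(common), round(percent, 2))
```

With these counts, 32 shared genes out of 605 + 1062 = 1667 give roughly 1.92%, which rounds to the 2% reported in the caption.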

| Enrichment of functional pathways and Gene ontology terms
Gene set enrichment analysis was performed using the online tool Enrichr [30]. Several databases, including GO [96], Reactome [50], KEGG [97], WikiPathways [98], and BioCarta [51], were used [31] to find GO terms and cell signaling pathways. The GO database was used to find the biological process, molecular function, and cellular component terms. Analysis of biological process, molecular function, and cellular component data revealed notable involvement of the shared DEGs in Peptide antigen assembly with MHC protein complex, ErbB-3 class receptor binding, and MHC protein complex, respectively. MAPK Family Signaling Cascades, Allograft rejection, Allograft Rejection, and D4-GDI Signaling Pathway were the most highly enriched pathways in the Reactome, KEGG, WikiPathways, and BioCarta databases, respectively. Table 1 shows the top 5 biological process, cellular component, and molecular function terms, and Table 2 shows the top 5 pathways from Reactome, KEGG, WikiPathways, and BioCarta with the corresponding P-values and genes. Figure 3 shows the top 10 GO terms for biological process, molecular function, and cellular component, ranked by combined score, and Figure 4 shows the top 10 pathways from Reactome, KEGG, WikiPathways, and BioCarta, also ranked by combined score. The combined score is obtained by multiplying the z-score, which represents the deviation from the expected rank, by the log of the p-value from the Fisher exact test. The combined scores for Figures 3 and 4 are calculated automatically by the Enrichr platform.
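The combined score described above can be written out in a few lines. The following is a minimal sketch rather than Enrichr's implementation: it assumes a one-sided Fisher exact test (the hypergeometric upper tail) and the natural logarithm, the function names are hypothetical, and the z-score is passed in as a given value because Enrichr derives it from its own precomputed expected ranks. The term size, background size, and z-score in the usage lines are made-up inputs.

```python
from math import comb, log


def fisher_upper_tail(overlap, list_size, term_size, background_size):
    """One-sided Fisher exact p-value: P(X >= overlap) under the
    hypergeometric distribution for a gene list drawn from the background."""
    total = comb(background_size, list_size)
    upper = min(list_size, term_size)
    hits = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(overlap, upper + 1)
    )
    return hits / total


def combined_score(p_value, z_score):
    """Combined score as described in the text: log(p) multiplied by z."""
    return log(p_value) * z_score


# Made-up inputs: 3 of 32 query genes in a 50-gene term, 20,000-gene background.
p = fisher_upper_tail(overlap=3, list_size=32, term_size=50, background_size=20000)
score = combined_score(p, z_score=-1.8)  # hypothetical z-score
print(p, score)
```

Because log(p) is negative for p < 1, a negative z-score yields a positive combined score, so terms that are both significant and rank better than expected end up with large positive values.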
In the biological process category, peptide antigen assembly with MHC protein complex (GO:0002501) denotes the binding of a peptide to the antigen-binding groove of an MHC protein complex. The interferon-gamma-mediated signaling pathway (GO:0060333) is the cascade of molecular signals that begins when interferon-gamma binds to its receptor on the surface of a target cell and ends with the regulation of a downstream cellular process, such as transcription. Interferon-gamma is the only type II interferon discovered so far. Antigen processing and presentation of exogenous peptide antigen via MHC class II (GO:0019886) is the process by which an antigen-presenting cell expresses a peptide antigen of exogenous origin on its cell surface in association with an MHC class II protein complex. The peptide antigen is typically, but not always, processed from a whole protein. Negative regulation of reproductive process (GO:2000242) denotes any process that stops, prevents, or reduces the frequency, rate, or extent of a reproductive process. Antigen processing and presentation of peptide antigen via MHC class II (GO:0002495) is the process by which an antigen-presenting cell expresses a peptide antigen on its cell surface in association with an MHC class II protein complex. Here too, the peptide antigen is typically, but not always, processed from a whole protein.

In the cellular component category, the MHC protein complex (GO:0042611) is a transmembrane protein complex that includes an MHC class II beta chain or an invariant beta2-microglobulin chain, with or without a bound peptide, lipid, or polysaccharide antigen. The endosome membrane (GO:0010008) is the lipid bilayer that surrounds an endosome. The lumenal side of the endoplasmic reticulum membrane (GO:0098553) is the side (leaflet) of the membrane that faces the lumen. The integral component of the lumenal side of the endoplasmic reticulum membrane (GO:0071556) is the portion of the endoplasmic reticulum membrane made up of gene products that penetrate only the lumenal side of the membrane. The cytoplasmic vesicle membrane (GO:0030659) is the lipid bilayer that surrounds a cytoplasmic vesicle.

In the molecular function category, ErbB-3 class receptor binding (GO:0043125) denotes binding to the protein-tyrosine kinase receptor ErbB-3/HER3. MHC class II protein complex binding (GO:0023026) denotes binding to the class II major histocompatibility complex. Phosphatidic acid transfer activity (GO:1990050) means that phosphatidic acid is removed from a membrane or a monolayer lipid particle, transported through the aqueous phase while protected in a hydrophobic pocket, and delivered to an acceptor membrane or lipid particle. Phosphatidic acid is a glycerophospholipid that typically has a saturated fatty acid attached to carbon-1, an unsaturated fatty acid attached to carbon-2, and a phosphate group attached to carbon-3. CCR6 chemokine receptor binding (GO:0031731) denotes binding to the CCR6 chemokine receptor. Oxidoreductase activity, acting on NAD(P)H, heme protein as acceptor (GO:0016653) denotes catalysis of an oxidation-reduction (redox) reaction in which NADH or NADPH acts as a hydrogen or electron donor and reduces a heme protein.

| TF-miRNA coregulatory network construction
To understand how TFs and miRNAs co-regulate the shared DEGs, a TF-miRNA coregulatory network was constructed.
A white background and the Circular Bi/Tripartite layout were chosen to better visualize the network. Opacity, thickness, color, label, and size were also customized from the Edge and Node options. From the Global Node Styles, red was chosen for TFs, green-black highlighting for seeds, and blue for miRNAs. The network, shown in Figure 5, has 93 nodes, 223 edges, and 28 seeds.

| PPI Network
The protein-protein interaction network was built using STRING (Figure 6). Interconnected genes and disconnected genes are easily distinguished in this figure. The network was then exported to Cytoscape for better visualization. The resulting PPI network, shown in Figure 7, contains only 17 connected genes. Another PPI network was constructed with the IMEx Interactome of NetworkAnalyst, using those connected genes, to understand the infection state associated with them. This network, shown in Figure 8, contains 972 nodes, 1110 edges, and 16 seeds. These 16 seeds are the 17 interconnected genes except CLEC2D, and they have a higher degree of interaction with proteins. Because CLEC2D had no significant interactions in the network, the IMEx Interactome automatically removed it as a network seed.

| Pinpointing Hub Genes
For this research, the top 10 hub genes were selected because they are considered to be the most responsible genes among all. Because these hub genes interact with other genes, if they are cured by a therapeutic molecule, all other affected genes may also have the possibility to recover. The 10 hub genes were pinpointed from the reconstructed PPI network (Figure 7) in Cytoscape using the degree topology method and the maximal clique centrality (MCC) method. Tables 3 and 4 show the top 10 hub genes according to the degree topology method and the MCC method, respectively. The same hub genes were retrieved by both methods, and JUN and CLEC2D have the highest and lowest scores, respectively, in both. The TF-gene interaction network and the gene-miRNA network show that even the hub genes with a score of 1 contribute to those networks. MicroRNAs (miRNAs) promote mRNA degradation or prevent post-transcriptional translation, and understanding the functions of pleiotropic global regulators requires identifying the significant TF-gene interactions. Two hub-gene networks were then drawn in Cytoscape, and the Grid Layout was chosen to produce Figures 9 and 10.
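The two ranking methods can be illustrated on a toy graph. The sketch below is not the cytoHubba implementation: it assumes the commonly cited definition of MCC, in which the score of a node is the sum of (|C| - 1)! over all maximal cliques C that contain it, and it uses a plain Bron-Kerbosch enumeration. The node labels `g1` to `g5` and the edges are invented placeholders, not interactions from the PPI network of this study.

```python
from math import factorial


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch, without pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(adj), set())


def degree_and_mcc(edges):
    """Return (degree, mcc) dictionaries for an undirected edge list.

    Assumed MCC definition: sum of (|C| - 1)! over maximal cliques C
    that contain the node.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {node: len(nbrs) for node, nbrs in adj.items()}
    mcc = {node: 0 for node in adj}
    for clique in maximal_cliques(adj):
        for node in clique:
            mcc[node] += factorial(len(clique) - 1)
    return degree, mcc


# Toy graph with placeholder labels: a 4-clique (g1..g4) plus the edge g4-g5.
toy_edges = [
    ("g1", "g2"), ("g1", "g3"), ("g1", "g4"),
    ("g2", "g3"), ("g2", "g4"), ("g3", "g4"),
    ("g4", "g5"),
]
degree, mcc = degree_and_mcc(toy_edges)
for node in sorted(degree, key=lambda n: (-mcc[n], n)):
    print(node, degree[node], mcc[node])
```

In this toy graph the two methods agree on the top and bottom nodes, but MCC separates members of the dense clique from sparsely connected nodes much more sharply than degree does, which is why both rankings are worth comparing.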

| Functional association Network
The 10 hub genes were used in the GeneMANIA functional network. Figure 11 helped us to predict how certain gene sets will behave. Utilizing a massive collection of functional association data, GeneMANIA discovers additional genes that are connected to a set of input genes. Protein and genetic interactions, pathways, co-expression, co-localization, and protein domain similarity are all examples of association data. GeneMANIA can be used to discover new components of a pathway or complex, discover extra genes that may have escaped existing screens, or discover novel genes that have a particular function, such as protein kinases. If the input gene list has five or more genes, GeneMANIA uses the "assigned based on query gene" technique to assign weights that enhance connectivity between all of the given input genes. The weights are selected automatically using linear regression to maximize the interaction between genes on the list and minimize the interaction with genes not on the list. As our input gene list had more than five genes (10 hub genes), this default method was used here. The network has 30 nodes and displays functional categories such as shared protein domains, co-expression, physical interactions, predicted, pathways, and genetic interactions [99], together with the percentage of each category for our gene set of interest. The functional association network comprises 36.76% co-expression, 31.14% physical interactions, 15.88% predicted, 6.48% pathway, 4.72% co-localization, 4.51% shared protein domains, and 0.50% genetic interactions. The high level of co-expression (36.76%) between the transcripts associated with the two selected diseases, together with a consistent 31.14% physical interaction, implies a more pronounced connection at the genetic and molecular levels between the diseases. These strong associations support the idea that it is relatively straightforward to customize or modify a generic medication to treat both illnesses effectively. The thesis here is that a single medicine can target the biological basis shared by the two diseases because of the significant co-expression and physical interaction between the genes and proteins linked to them. Because of this similar foundation, it might be easier to create a drug that treats both illnesses at once.

[...] affected and responsible genes among all. If we cure these genes with any therapeutic molecules, other genes that are connected to these 10 genes will also be cured by the same therapeutic molecules. If these 10 genes are not cured by molecules, other associated diseases can occur due to these genes. Of our 8 suggested drugs, retinoic acid (CTD 00006918) can affect 8 of the 10 hub genes. Patients having both diseases (Hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma) concurrently may have a higher possibility of cure by using our suggested drug compounds. Our suggested drugs may be approved in the future after further chemical experiments and testing. If further related illnesses are found for Hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma, future research in this area aims to create a single generic drug to treat those related illnesses, offering a fresh perspective.

| CONCLUSION
This study indicated that the two selected diseases may have the possibility to metastasize to one another. Analyzing any disease means analyzing the disease genes. The constructed PPI networks displayed all the directly associated genes, general genes, and a channel that ensures the route to a general remedy map. The cytoHubba module was used to identify 10 hub genes using the degree topology approach and the maximal clique centrality (MCC). Only those genes that are interconnected with each other and have the highest interaction among all were taken for this research. If we can recover the directly affected, highly interconnected genes of a disease, we may be able to get rid of the selected diseases (Hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma). The next step was to employ GeneMANIA to develop a new network for the 32 shared genes to learn more about their physical interactions, shared protein domains, shared pathways, and genetic interactions. TF-gene, gene-miRNA, and gene-disease association networks were designed using the same 10 hub genes. After analyzing those networks, some well-known therapeutic molecules were suggested for Hypopharyngeal cancer and EGFR-mutated lung adenocarcinoma by using the 10 hub genes as input. A common drug for the two selected associated diseases aims to reduce the amount of drug a patient should take and also to reduce cost. Future research in this area aims to create a single generic drug to treat several related illnesses, offering a fresh perspective.

FIGURE 5 Visualization of the TF-miRNA coregulatory network through NetworkAnalyst. Green-black highlighted nodes indicate seeds, red diamond-shaped nodes indicate TFs, and blue box-shaped nodes indicate miRNAs.

FIGURE 6 Protein-protein interaction network through STRING.

FIGURE 7 PPI network through Cytoscape using the directly interconnected genes.

FIGURE 8 PPI network obtained through the IMEx Interactome of the InnateDB database, containing 972 nodes, 1110 edges, and 16 seeds.

TABLE 3 The 10 hub genes, ordered by degree of importance.

TABLE 4 The 10 hub genes, ordered by the maximal clique centrality (MCC) method.

FIGURE 9 The top 10 hub genes network according to the degree topology method through Cytoscape.

FIGURE 10 The top 10 hub genes network according to the maximal clique centrality (MCC) method through Cytoscape.

FIGURE 11 Functional association network through GeneMANIA. 36.76% co-expression, 31.14% physical interactions, 15.88% predicted, 6.48% pathway, 4.72% co-localization, 4.51% shared protein domains, and 0.50% genetic interactions are found here.

FIGURE 12 Visualization of the TF-gene association network through NetworkAnalyst. Red diamond-shaped nodes indicate TF-genes, and green-black highlighted circle-shaped nodes indicate seeds.

FIGURE 13 Visualization of the gene-miRNA network through NetworkAnalyst. Green-black highlighted nodes indicate 17 seeds, blue box-shaped nodes indicate miRNAs, and edges connect the genes and miRNAs.

FIGURE 14 The gene-disease network is divided into 3 subnetworks. Subnetwork 1 represents genes (HLA-B, DUSP5, ARHGDIB, JUN, HBEGF) and their associated diseases, Subnetwork 2 represents the HLA-DRA gene with its associated diseases, and Subnetwork 3 focuses on the MUC4 gene with its corresponding genes. Green-black highlighted nodes indicate seeds, and red box-shaped nodes indicate associated diseases.