Identification of Hub genes associated with infection of three lung cell lines by SARS‐CoV‐2 with integrated bioinformatics analysis

1Department of Clinical Medicine, The Third Clinical School of Guangzhou Medical University, Guangzhou, China 2Department of Clinical Laboratory Medicine, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China 3Key Laboratory for Major Obstetric Diseases of Guangdong Province, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China 4Key Laboratory of Reproduction and Genetics of Guangdong Higher Education Institutes, The Third Affiliated Hospital of Guangzhou Medical University, Guangzhou, China


| INTRODUC TI ON
were obtained from Gene Expression Omnibus (GEO). The study was designed to identify key biomarker candidates for SARS-CoV-2 and improve the diagnosis and prognosis based on functional and molecular analyses by evaluating DEGs in three groups.

| Data inclusion and DEG screening
The gene expression profile of GSE147507 (https://www.ncbi.nlm. nih.gov/geo/query/ acc.cgi?acc=GSE14 7507) in this study was ob- series of matrix file data sets were downloaded, the gene probes were converted into gene names on the GPL18573 platform, and the matrix of data counts was convert to tpm format. Then, the limma software package in R software was used to standardize and screen each set of data, with the screening criteria set as: |log2FC| > 1 and P < .01 for the purpose of identifying genes with significant changes.

| Protein-protein interaction (PPI) network construction, modular analysis and Hub genes identification
To analyse protein interactions, the PPI network was established with the help of the STRING online database (version 11.0; http:// string-db.org/). The minimum required interaction score was set as medium confidence >0.4. The initial PPI network created with the online tool was to some extent complicated, so the Cytoscape software (version 3.7.2) was utilized to visualize and draw the interactions between proteins. In addition, the MCODE plug-in in Cytoscape was adopted to explore the important modules in PPI network (the default parameters). The genes with top-ten node degrees are defined as hub genes.

| Verification of hub genes in intersection results
After identifying the intersection hub genes from three groups of data, the data of GSE150316 (https://www.ncbi.nlm.nih.gov/geo/ query/ acc.cgi?acc=GSE15 0316) obtained after bioinformatics analysis were used for verification. We found out the expression matrices of the hub genes corresponding to our research in this data set that contains enough samples of patients. And we imported the data of the infection group and the control group of them into GraphPad Prism (version 8.0.2) for t tests and non-parametric tests. Finally, we choose P < .05 as the standard to screen the hub genes.

| Identification of DEGs in infected SARS-CoV-2 cell lines
The original microarray data of GSE147507 related to SARS-CoV-2 were obtained at Gene Expression Omnibus (GEO). The data of GSE147507 were divided into three groups according to the differences of cell lines, namely Calu-3, A549 and NHBE. With P < .01 and |logFC| > 1 as the screening criteria, a total of 1286, 747 and 80 DEGs were extracted from Calu-3, A549 and NHBE groups, respectively (Table S1). In addition, the DEGs of the three groups were analysed through intersection, and finally 29 DEGs that were all continuously up-regulated in the three groups were obtained ( Figure 1A). Based on the data of GSE147507, three groups of volcano maps ( Figure 1B

| GO function enrichment analysis of the DEGs
Composed of the biological pathway (BP), the CC, and the MF, GO enrichment analysis for the DEGs in three groups of Calu-3, A549 and NHBE were shown in Table S2 and Figure 1H-J,L-N,P-R.

| KEGG pathway analysis
KEGG enrichment analysis was conducted on all DEGs and corresponding P-value and P-adjust values of each pathway were obtained. Subsequently, the top 10 channels of KEGG significance in each group were sorted out after the processing of a large amount of data. Taking more details into consideration, the gene names corresponding to each pathway were also marked in the Table S3. Dot plots were mapped for each set of data to give a more intuitive description of the results of KEGG analysis ( Figure 1K,O,S).
The analysis of the results displayed that although the data derive from different types of samples, some pathways, such as TNF signalling pathway, NF-kappa B signalling pathway, IL-17 signalling pathway, NOD-like receptor signalling pathway, and DEG, all showed upward trends.

| PPI network analysis
The STRING online tool was applied to construct PPI networks for the three groups independently, together with the intersected PPI network of DEGs data of the three groups for the sake of better understanding the interaction between proteins. It turned out that the PPI network of the Calu-3 group has 234 nodes and 265 edges (Figure 2A), that of A549 group 220 nodes and 224 edges ( Figure 2B), that of NHBE group 60 nodes and 364 edges ( Figure 2C), and the intersected PPI network 29 nodes and 129 edges ( Figure 2D). In addition, this study identified the hub genes with top-ten node degrees in the intersected PPI network: CXCL1, CXCL2, TNF, NFKBIA, CSF2, TNFAIP3, IL6, CXCL3, CCL20 and ICAM1 ( Figure 2E).

| Verification of hub genes
In consideration of the rigorousness of this study, data from the GSE150316 gene data set were used to verify the 10 hub genes obtained. With P < .05 as the standard, it was found that only the analytic results of CXCL2, IL6 and CCL20 genes were statistically significant ( Figure 2F).

| D ISCUSS I ON
This study obtained gene expression profiles of SARS-CoV-2 from GEO database and performed DEGs screening, GO and KEGG analysis, so as to understand the biological functions of these DEGs and report meaningful enrichment pathways. Subsequently, PPI analysis was conducted to identify the hub genes that play a key regulatory role in the pathologic process of infection.
Based on GO enrichment analyses of the DEGs among three groups, it was found that the response to virus, defence response to virus, and response to type I interferon all have high enrichment scores in the BP. These findings were with those from previously published studies which documented that the occurrence of coronavirus infection causes the body to initiate an innate immune response and trigger IFN gene up-regulation to achieve the antiviral status. 6 Besides, the CC category of Calu-3 and A549 in enrichment analyses was I-kappaB/NF-kappaB complex and transcription factor AP-1 complex. Transcription factors NF-kappaB and AP-1 make a big different in T cell activation processes. 7 Interestingly, the CC is associated with high-density lipoprotein particle in NHBE cell, implying that SARS-CoV-2 may regulate the lipid composition, lipid synthesis and signalling of host cell. 8 According to KEGG analysis, after SARS-CoV-2 infection, there were four signalling pathways in the three groups changing jointly, among which the TNF signalling pathway transform is the most significant. This study analysed the significantly altered signalling pathways after SARS-CoV-2 infection, found out the possible pathogenic mechanisms and organisms of antiviral mechanisms, to provide new ideas for its treatment. In the NF-kappa B signalling pathway, NF-kB as a key transcription factor is crucial for innate and adaptive immunity. Studies have shown that the M protein of SARS-CoV interacts with IKKb, inhibits the degradation of IkBa protein and the expression of NF-kB-dependent Cox-2, so it is reasonable to believe that SARS-CoV can evade immune responses by changing the gene expression of key inflammatory molecules. 9 In TNF signalling pathway, Penicillium marneffei is a human pathogen that exists in macrophages and threatens immunocompromised patients. After infection with Penicillium marneffei, the body produces an important defence mechanism that induces TNF-α production via extracellular signal-regulated kinase (ERK) 1/2 to resist Pseudomonas marneffei. 10 In addition, HCV-infected cells will affect IFN-α/β induction and response, which may inhibit IFN-α/β induction by viral protease-mediated cleavage of MAVS and TRIF, thereby inhibiting its antiviral effect against HCV. 11 These studies indicate that the up-regulation of TNF pathway may be beneficial for the inhibition of SARS-CoV-2.
After the above research and analysis, PPI networks were constructed by STRING, and from the intersection, 10 central genes were obtained, which were verified with data from GSE150316 database for the sake of rigorousness of scientific studies. The results showed that the analytic results of IL-6, CXCL2 and ICAM-1 were statistically significant, which suggested that these three genes possibly play a key regulatory role in the course of SARS-CoV-2 infection. And there are studies to back that up. For example, studies have discovered a significant increase in IL-6 expression in patients with COVID-19. In line with the principle that inhibited expression of IL-6 can produce an obvious anti-inflammatory effect, 12 it is expected by some researchers that IL-6 blockers be used to treat cytokine release syndrome caused by COVID-19, 13 thus saving patients' lives. On the other hand, CXCL2 is also a cytokine highly expressed in infections cause by various viruses, such as Zika Virus, 14 which will promote its expression and mediate an inflammatory response. ICAM-1 was encoded by Group 2 innate lymphoid cells to reduce lung inflammation by destroying the homoeostasis and function of ILC2s. 15 At the same time, the overexpression of ICAM-1 and knockdown can also promote and block the production of rhinovirus, indicating that they also have certain regulatory effects on virus transfection.
In spite that this study included data from multi-type samples, it had certain limitations. For one thing, the data studied is relatively small in size and may not be universal enough. For another, as the samples from which the data were extracted were mostly artificially cultured, this study lacked live samples, which would also compromise the reliability of this study.

ACK N OWLED G EM ENTS
We sincerely thank the third clinical college of guangzhou medical university for its technical support. Our heartfelt gratitude also goes to Miss Sun Jingjing from Southern Medical University for revising and polishing this article in language.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

S U PP O RTI N G I N FO R M ATI O N
Additional supporting information may be found online in the Supporting Information section.