Identification of immune‐related genes contributing to head and neck squamous cell carcinoma development using weighted gene co‐expression network analysis

Abstract Background This study aimed to identify genes related to the degree of immune cell infiltration in head and neck squamous cell carcinoma (HNSCC), explore their new biological functions, and evaluate their diagnostic and prognostic value in HNSCC. Methods Transcriptomic data from The Cancer Genome Atlas (TCGA) HNSCC dataset was used to screen differentially expressed genes between tumors and normal tissues, followed by weighted correlation network analysis (WGCNA) to identify immune‐related modules. Differential gene expression, immune cell infiltration, and survival analyses were performed to screen key genes. The expression of these key genes was validated in Oncomine and gene expression omnibus (GEO) datasets and by immunohistochemistry (IHC). Results 1869 and 1578 genes were significantly upregulated and downregulated in HNSCC. WGCNA showed that the brown module was associated with the most significant number of immune‐related genes. PPI network analysis demonstrated that PPL, SCEL, KRT4, KRT24, KRT78, KRT13, SPRR3, TGM3, CRCT1, and CRNN were key components in the brown module. Furthermore, the expression levels of KRT4, KRT78, KRT13, and SPRR3 in HNSCC correlated with infiltration levels of CD8+ T cells and macrophages. Survival analyses revealed that the expression of KRT78, KRT13, and SPRR3 in HNSCC correlated with overall survival (OS). The IHC assay indicated that KRT13 (p = .042), KRT78 (p < .001), and SPRR3 (p = .022) protein expression levels in HNSCC were significantly lower than in normal tissues. Analysis of GSE65858 and GSE41613 datasets showed that a worse OS was associated with low expression of KRT78 (p = .0086, and p = .005) and SPRR3 (p = .017, and p = .02). Conclusions Our findings suggest that KRT4, KRT78, KRT13, and SPRR3 are related to the occurrence and development of HNSCC. Importantly, KRT78 and SPRR3 might serve as diagnostic and prognostic biomarkers of HNSCC.


| INTRODUCTION
Head and neck squamous cell carcinoma (HNSCC) is a common malignancy, with an annual incidence of 500 000 cases worldwide. 1,2 It is well-established that most patients with HNSCC are diagnosed with advanced or metastatic disease at diagnosis, with a 5-year overall survival (OS) rate of only 50% and a recurrence or metastasis rate of 30%. 3,4 Indeed, once recurrence and metastasis occur, the prognosis is poor, and the median survival time is only 6 months. These findings emphasize the need to identify novel approaches to reduce recurrence and metastasis rates and improve the survival of patients with recurrent and metastatic HNSCC.
In recent years, tumor immunity research has gathered momentum given the efficacy of immune checkpoint inhibitor (ICI) treatment in various tumors. 5 An increasing body of evidence suggests that immune cell infiltration is essential for the efficacy of ICIs [6][7][8] and the cause and predictor of cancer recurrence and metastasis. 6,7 It is well-established that ICIs are efficient in HNSCC 8 ; however, predicting the efficacy of ICIs and the mechanism underlying ICI resistance remain unclear. Current evidence suggests that the tumor immunosuppressive microenvironment can inhibit immune cell function and infiltration levels. 9 Accordingly, it is necessary to explore the mechanisms underlying the inhibition of immune cell infiltration in HNSCC.
F I G U R E 1 Flow chart of overall analysis The rapid advent of bioinformatics has enhanced our understanding of the biological functions of tumors, providing the foothold for the discovery and verification of tumor biomarkers. There is ample evidence showing that immune-related genes in HNSCC can predict tumor prognosis and are potential biomarkers. 10,11 However, these genes are known to be inversely related to immunity. Importantly, bioinformatic analysis provides a novel approach to uncover genes related to immune cell infiltration and explore their biological functions and their potential clinical translational value as diagnostic and prognostic markers.
Herein, weighted correlation network analysis (WGCNA) was conducted to identify highly correlated gene clusters using module clustering. 12 We sought to identify specific immune-related genes associated with the development or prognosis of HNSCC from The Cancer Genome Atlas (TCGA). Furthermore, we explored key genes related to the occurrence of HNSCC, the biological process involved, and their prognostic value in this patient population.
We used DESeq2 13 to analyze the differential expression of genes in the transcriptomic data of TCGA HNSCC database. The screening criteria for differentially expressed genes (DEGs) included: jlog2 (fold change)j>2 and p < .05.
In addition, the expression levels of key genes of two HNSCC datasets from the Oncomine database (Ye Head-Neck, 14

and Peng
Head-Neck), 15 were included in our analyses.

| Weighted correlation network analysis
WGCNA 16 was used to identify co-expressed gene modules in DEGs and convert the adjacency matrix into a topological overlap matrix (TOM). According to the TOM-based dissimilarity, DEGs were finally divided into 13 different modules. Using the Inna-teDB 17 database, a database of genes involved in innate immunity, we identified 824 immune-related genes, of which 72 exhibited significant changes in HNSCC. Furthermore, genes from the brown module were selected as the gene set for subsequent analysis since it contained the most significant number of immune-related DEGs (n = 223).

| Gene ontology function enrichment analysis
Metascape 18 (https://metascape.org/), an online analysis tool, was used to perform Gene ontology (GO) function enrichment analysis (Biological Process, Molecular Function, Cellular Component) on DEGs, with the parameters set to min overlap = 3, p-value cutoff = .01, and min enrichment = 1.5. 19 Specifically, the immunological signature (Immunologic Signature) was selected for functional enrichment analysis, and the parameters remain unchanged.

| Gene set enrichment analysis
The Gene set enrichment analysis (GSEA) 20

| Protein-protein interaction network construction
We obtained the protein-protein interactions (PPI) (score >0.4) of 223 immune-related DEGs using STRING database 21 (https://string-db.org/), F I G U R E 3 Gene set enrichment analysis (GSEA) of differentially expressed genes (DEGs) related to head and neck squamous cell carcinoma and the main interaction network was visualized using Cytoscape. 22 The top 10 key genes were obtained using CytoHubba, and the interaction network was drawn.

| Tumor immune infiltration analysis
We analyzed the gene expression data of HNSCC samples from TCGA using TIMER2.0 23 and analyzed the correlation between KRT4, KRT78, KRT13, and SPRR3 expression and immune cell (CD8+ T cells and macrophages) infiltration levels.

| Statistical analysis
The Wilcoxon rank-sum test and Wilcoxon signed-rank test were used to analyze the expression of key genes in tumor and normal tissues.   Table 1). First, differential expression analysis showed that 1869 and 1578 genes were significantly upregulated and downregulated in HNSCC, respectively ( Figure 2A, Table S1).
GO annotation demonstrated that these DEGs were mainly enriched in biological processes, including muscle contraction, myofibril assembly, and muscle structural development ( Figure 2B). GO terms for molecular functions consisted of structural molecular activity, muscle structural components, and receptor ligand activity ( Figure 2C), and the related cellular components included contractile fibers, extracellular matrix, and A-bands ( Figure 2D). Meanwhile, enrichment analysis of the immune characteristics of these DEGs showed that they were involved in the regulation of CD4+ T cells, macrophages, and CD8+ T cells ( Figure 2E, Table S2). Furthermore, GSEA showed that biological processes that were significantly enriched in normal tissues included oxidative phosphorylation  Figure 3N), DNA repair ( Figure 3O) and mitotic spindle ( Figure 3P; Table S3).

| Co-expression analysis of DEGs and PPI network in HNSCC
To identify gene sets related to the occurrence of HNSCC, we performed WGCNA based on transcriptomic data. Using a soft-threshold of 4 ( Figure 4A,B), WGCNA identified 13 modules ( Figure 4C-E).
Compared with the other modules, the brown module contained the most significant number of immune-related genes (n = 7; Table S4). Two hundred and twenty-three genes were enriched in the brown module, and many interactions were identified between the proteins encoded by these genes (Figure 5A,B). Proteins encoded by PPL, SCEL, KRT4, KRT24, KRT78, KRT13, SPRR3, TGM3, CRCT1, and CRNN were core components in the PPI network ( Figure 5C).

| Differential expression, immune cell infiltration, and survival analysis of key genes
The expression of key DEGs screened from normal and HNSCC tissues was analyzed. We found that their expression levels in the HNSCC group were significantly lower than in the normal group and SPRR3 expression levels were significantly related to DFI F I G U R E 9 Correlation between the expression level of candidate genes and the clinical stage of head and neck squamous cell carcinoma ( Figure 8C) and OS ( Figure 8D), respectively. Finally, KRT4, KRT78, and SPRR3 were selected as our candidate genes.

| Correlation analysis of key gene expression and clinical characteristics
A low correlation was found between KRT78 and SPRR3 expression levels and clinical T stage. However, the expression levels of these two genes in T4 HNSCC patients were significantly lower than in T3 HNSCC patients ( Figure 9A,B), while no correlation was found between KRT13 expression and T stage ( Figure 9C). The expression levels of these three candidate genes showed a decreasing trend as the N stage increased ( Figure 9D-F). Similarly, KRT78, KRT13, and SPRR3 expression levels showed a downward trend with increasing HNSCC clinical stage ( Figure 9G-I). These results suggested that the three candidate genes were closely related to HNSCC development ( Table 2). To further analyze differences in the expression of KRT13, KRT78, and SPRR3 genes in HNSCC and normal tissues, we used a microarray of 10 normal tissues and 70 HNSCC tissues to verify the protein expression of the above genes. The results showed that the protein expression levels of KRT13 (p = .042, Figure 11A), KRT78
After verifying the expression differences of KRT13, KRT78, and SPRR3 in HNSCC and normal tissues, we used GEO datasets (GSE65858 and GSE41613) to analyze their prognostic value (

| DISCUSSION
The present study explored the relationship between HNSCC development and immune infiltration via a comprehensive integrated analysis of transcriptomic and clinical data of HNSCC patients in public databases. To conclude, we identified four hitherto unrecognized key genes, writingoriginal draft (lead); writingreview and editing (lead).