Identification of a glycolysis‐related gene signature associated with clinical outcome for patients with lung squamous cell carcinoma

Abstract Background Lung squamous cell carcinoma (LUSC), one of the main types of lung cancer, has caused a huge social burden. There has been no significant progress in its therapy in recent years, Resulting in a poor prognosis. This study aims to develop a glycolysis‐related gene signature to predict patients’ survival with LUSC and explore new therapeutic targets. Methods We obtained the mRNA expression and clinical information of 550 patients with LUSC from the Cancer Genome Atlas (TCGA) database. Glycolysis genes were identified by Gene Set Enrichment Analysis (GSEA). The glycolysis‐related gene signature was established using the Cox regression analysis. Results We developed five glycolysis‐related genes signature (HKDC1, AGL, ALDH7A1, SLC16A3, and MIOX) to calculate each patient's risk score. According to the risk score, patients were divided into high‐ and low‐risk groups and exhibited significant differences in overall survival (OS) between the two groups. The ROC curves showed that the AUC was 0.707 for the training cohort and 0.651 for the validation cohort. Additionally, the risk score was confirmed as an independent risk factor for LUSC patients by Cox regression analysis. Conclusion We built a gene signature to clarify the connection between glycolysis and LUSC. This model performs well in evaluating patients’ survival with LUSC and provides new biomarkers for targeted therapy.

molecular targeted therapy's progress. However, only a small proportion of LUSC molecules have been identified, leading to its current treatment plan limited to platinum-based chemotherapy. 2 The limited therapeutic effect causes a 5-year survival rate lower than 5% for patients with LUSC. Hence, it is required further to clarify the underlying mechanism of the occurrence of LUSC and design new treatment strategies.
With the development of sequencing technology, many studies have confirmed that some molecular markers play a role in LUSC. These biomarkers were not only involved in transcription level but also a posttranscription level. For instance, circTP63 was reported to be upregulated in LUSC and associated with patients' tumor stage and size. It attenuated the inhibition of FOXM1 by competitively binding to miR-873-3p, thus promoting tumor cell cycle progression. 3 Another study highlighted the critical role of miRNA in LUSC. They found six genes and 12 miRNAs closely associated with the OS of patients with LUSC, which may be considered new therapeutic targets. 4 Additionally, lncRNA HULC was confirmed to promote LUSC by regulating the PTPRO/NF-κB pathway. 5 These studies have deepened our understanding of the pathogenesis of LUSC and guided prognosis and targeted therapy. However, a single biomarker's efficacy in predicting the prognosis of patients with LUSC may be insufficient. Some researchers suggested that the establishment of multigene signatures may be a suitable option to evaluate cancer patients' prognosis. 6 One of the "hallmarks of cancer" is disrupted energy metabolism, characterized by mainly relying on glycolysis. 7 The "Warburg effect" revealed that the rapid growth of tumor cells depended on efficient glycolysis to produce energy. 8 A large number of studies have found the association between glycolysis and various tumors. Altenberg et al. 9 found that glycolysis genes were ubiquitously overexpressed in 24 types of tumors. Eight out of ten glycolytic enzymes were upregulated in lung cancer. Another study 10 proposed a nine glycolysis-related genes signature to evaluate the metastasis and prognosis for lung adenocarcinoma. It is convincing that glycolysis-related genes may be a potential mechanism in the occurrence of lung cancer. But there is currently a lack of study linking LUSC with glycolysis genes.
In this study, we used the expression profile of 550 patients with LUSC obtained from TCGA. GSEA was used to find glycolysis genes, and a 5-gene signature was constructed. We further tested the performance of the gene signature in the training and validation set, respectively. These findings reveal a close association between LUSC and glycolysis and demonstrate the possibility of using glycolysisrelated gene signatures to assess patients' prognosis with LUSC.

| Data sources
The overall design of this study was exhibited in the flowchart ( Figure 1). Data of 501 patients with LUSC and 49 normal samples, including RNA-sequencing data (FPKM value) and clinical features, were extracted from the Genomic Data Commons (https://portal.gdc.cancer.gov/), which linked to The Cancer Genome Atlas (TCGA) database. The clinical features recorded were mainly age, gender, tumor stage, and TNM classification. Also, each case is accompanied by a detailed follow-up time and overall survival. Table 1 describes the detailed clinical information of all patients.

| Gene set enrichment analysis (GSEA)
GSEA software 4.0.3 was used to define significantly different glycolysis-related gene sets between patients with LUSC and normal samples. We downloaded six gene sets related to glycolysis from the Molecular Signatures Database (https://www.gsea-msigdb.org/gsea/msigd b/index.jsp). Each gene set had 1000 permutations to get a normalized enrichment score (NES). Normalized p-value <0.05, FDR <0.1, and NES >1.6 were used to determine whether the gene set was selected for subsequent analysis. We integrated gene sets with significant differences and finally got 311 genes related to glycolysis.

| Development and validation of the glycolysis-related gene signature
We performed a differential analysis of these 311 genes in normal samples and patients with LUSC by the Mann-Whitney-Wilcoxon test (p < 0.05). Univariate Cox regression analysis was used to identify genes significantly related to OS, and genes with p < 0.05 were selected for subsequent multivariate Cox regression analysis. We randomly divided the tumor samples into 7:3 groups and used them as the training set and the validation set, respectively. We performed a multivariate Cox regression analysis in the training cohort and built a five glycolysis-related genes signature. Next, a linear combination of the expression of five genes weighted with the regression coefficient formed a risk scoring system. The formula was as follows: Risk score = expression of the gene 1 × 1 +expression of the gene 2 × 2 + … + expression of the gene n × n β2βn.
Patients with LUSC in the training cohort were divided into high-or low-risk groups based on the median risk score.
The 5-gene signature and the median risk score were also used in the validation cohort.

| GO and KEGG pathway enrichment analyses
To investigate the specific function of differentially expressed genes (DEGs), we conducted Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. We visualized the top 20 significant terms for GO analysis and top 10 significant terms for KEGG analysis, respectively, using the clusterProfiler R packages. We identified the terms based on the cutoff of p-value < 0.01 and Benjamin-Hochberg adjusted pvalue < 0.05 as significant terms.

| Immunohistochemistry and lung cancer cell lines
Immunohistochemistry staining results were extracted from the Human Protein Atlas (https://www.prote inatl as.org/). The expression of glycolysis-related genes were compared in lung squamous cell carcinoma samples and normal lung tissue. The expression data of glycolysisrelated genes in cell lines were obtained from the Cancer Cell Line Encyclopedia (CCLE) database (https://porta ls.broad insti tute.org/ccle). It mainly included 136 nonsmall-cell lung cancer cell lines and 54 small-cell lung cancer cell lines.

| Statistics
The chi-square test was used for categorical variables. Kaplan-Meier curves and log-rank tests were used to compare survival differences between the two groups. ROC curves were used to access the effectiveness of the gene signature in predicting OS. Independent prognostic factors were identified by univariate and multivariate Cox regression analyses. CBioPortal (http://www.cbiop ortal.org/) provided genetic variation of five glycolysis-related genes. All statistical analyses and figures were mainly done by SPSS 24.0 and R 4.0.2 software. p < 0.05 was considered to be significant.

| Initial screening of glycolysis-related genes
Six glycolysis-related gene sets were analyzed by the GSEA method to detect whether there were significant differences between samples with LUSC and normal samples. Four gene sets were enriched in samples with LUSC in our study. (

| Identification of glycolysis-related genes relevant to OS of LUSC patients
We first conducted a differential analysis of these 311 genes. There were 265 genes differentially expressed in samples with LUSC and normal samples (p < 0.05). Next, univariate Cox regression analysis was performed to acquire DEG, among which 13 genes were relevant to OS of patients with LUSC ( Figure 3A). The association between these 13 genes and patients' survival was further validated via multivariate Cox regression analysis in the training cohort. We found that HKDC1 (HR: 1.273, 95% CI: 1.069-1.516) and MIOX (HR: 0.619, 95% CI: 0.407-0.940) were confirmed as independent predictors for patients with LUSC ( Figure 3B). Additionally, three filtered mRNA (HKDC1, ALDH7A1, and SLC16A3) appeared as risk factors with HR >1, whereas the other two filtered mRNA (AGL and MIOX) emerged as protective factors with HR <1 (Table 2).

| GO and KEGG enrichment analyses of DEGs
To investigate the biological function of DEGs, we carried out GO and KEGG enrichment analyses exhibited the top 20 terms of GO and the top 10 terms of KEGG enrichment results. We found that the top GO-BP terms ( Figure 4A) were associated with energy metabolism, such as "ATP biosynthetic process" (gene count = 89, p = 1.69E−126), "glycolytic process" (gene count = 86, p = 1.19E−139), and "carbohydrate catabolic process" (gene count = 95, p = 8.77E−127). The results of KEGG enrichment analysis ( Figure 4B) were mainly about metabolic reprogramming of tumor which were similar to that of GO-BP, such as "citrate cycle (TCA cycle)" (gene count = 18, p = 2.37E−21), and some pathway which was reported to play important role in cancer progression, such as "carbon metabolism" (gene count = 51, p = 8.62E−50), "HIF-1 signaling pathway" (gene count = 25, p = 8.10E−17), and "biosynthesis of amino acids" (gene count = 29, p = 1.26E−26).

| Development of the 5-genes signature to evaluate OS of the patient with LUSC
A predictive risk scoring system was built to calculate each patient's risk score based on the previous steps' results. The formula was as follows: Risk score = (expression of HKDC1 × 0.241) + (expression of AGL × −0.272) + (expression of ALDH7A1 × 0.111) + (expression of SLC16A3 × 0.165) + (expression of MIOS × −0.480). There was only one risk score for each patient with LUSC. We divided patients into high-and low-risk groups based on the training Cohort's median risk score. Kaplan-Meier curves showed that high-risk patients had poor OS than lowrisk patients in the training cohort (p < 0.001) ( Figure 3C). Survival analysis of the validation cohort also showed similar results (p = 0.006) ( Figure 3D). Furthermore, the ROC curves exhibited that the AUC was 0.707 for the training set ( Figure 3E) and 0.651 for the validation set, respectively ( Figure 3F), which indicated good sensitivity and specificity of the 5-genes signature in predicting OS for patients with LUSC. The distribution of risk score and survival information for each patient in the training cohort was shown in Figure 5 ( Figure 5A for the training cohort and Figure 5B for the validation cohort), revealing that high-risk patients had higher mortality and shorter survival time. The heatmap, respectively, displayed the differences in the expression profiles of the five F I G U R E 2 Enrichment plots of four glycolysis-related gene set that were significantly differentiated between LUSC patients and normal samples. The GO_GLYCOLYTIC_PROCESS gene set had an NES of 1.741 and a p value = 0.010; the HALLMARK_GLYCOLYSIS gene set had an NES of 2.313 and a p value < 0.001; the MODULE_306 gene set had an NES of 1.888 and a p value = 0.021; the REACTOME_ GLYCOLYSIS gene set had an NES of 2.095 and a p value < 0.001. Gene set data were from the Molecular Signatures Database (https://www. gsea-msigdb.org/gsea/msigd b/index.jsp). NES: normalized enrichment score genes between high-risk and low-risk groups in the training cohort and the validation cohort ( Figure 5). As the risk score of patients with LUSC increased, the expression levels of risk genes (HKDC1, ALDH7A1, and SLC16A3) were obviously upregulated. In contrast, the expression levels of protective genes (AGL and MIOX) were downregulated.   Figure 6C,D). These results indicated that the five glycolysis-related gene signature was an independent prognostic factor for patients with LUSC.

| Genetic alterations and expression levels of the five glycolysis-related genes in LUSC
The cBioPortal database covers the genomic data of 487 patients with LUSC, providing genetic alterations in different genes. Among the 487 patients with LUSC, 7 had mutations and 2 had deep deletions in HKDC1; 14 had mutations, 1 had amplifications, and 2 had deep deletions in AGL; 2 had mutations and 3 had deep deletions in ALDH7A1; 9 had amplifications and 2 had deep deletions in SLC16A3; and 2 had mutations, 3 had amplifications, and 7 had deep deletions in MIOX ( Figure 7A). The expression levels of the five glycolysis-related genes in samples with LUSC and normal samples were also analyzed. The results showed that AGL, SLC16A3, and MIOX were significantly high expressed in tumor samples, whereas HKDC1 and ALDH7A1 were significantly low expressed in tumor samples ( Figure 7B). subgroups except for the female, stage III-IV, and M1 subgroups ( Figure 8). It revealed that the five glycolysis-related gene signature was suitable for multiple categories of patients with LUSC.

| Validation of the glycolysis-related genes in tissue samples and lung cancer cell lines
To confirm the gene signature's reliability, we used immunohistochemical data to detect protein levels in five gene in normal lung tissue samples and lung cancer cell lines. The results showed that AGL and SLC16A3 were significantly overexpressed in tumor samples compared to normal samples ( Figure 9). HKCD1 and SLC16A3 were significantly overexpressed in non-small-cell lung cancer cell lines than small-cell lung cancer cell lines ( Figure 10).

| DISCUSSION
Previous studies had confirmed that some clinical characteristics, such as age, histological type, tumor size, tumor stage, and treatment, played a pivotal role in the prognosis of patients with LUSC and constructed predictive models. 11,12 With the popularization of genomics, researchers paid more attention to the impact of differences at the molecular level on patients' prognosis with LUSC. They identified many molecular markers that were believed to be at the core of most cases of LUSC, involving protein, mRNA, lncRNA, circRNA, miRNA, and DNA methylation, etc. However, these studies often focused on the impact of a single molecule on LUSC, which were insufficient for monitoring the prognosis of patients with LUSC because the molecule was involved in multiple pathways and regulatory processes. A reasonable solution is to establish a gene signature, combining various genes' expression to construct a prediction model. Several gene signatures for LUSC have also been proposed. For example, Li et al. 6 developed four methylation-driven genes signature (GCSAM, GPR75, NHLRC1, and TRIM58) through multivariate Cox regression analysis and verified it in external data. Another study identified seven IncRNAs as potential prognostic factors for LUSC and constructed a prognostic signature using five of them (AC022148.1, HCG9, LINC00460, C5orf17, and LINC00261). 13 In this study, we focused on glycolysis-related genes, which had a crucial role in tumors. In our study, GSEA analysis was carried out based on the mRNA data of 550 LUSC patients. We showed four glycolysis gene sets with significant differences (p < 0.05) between tumor samples and normal samples. We extracted the genes in these four glycolysis gene sets for subsequent analysis. Then, a 5-gene signature was built to evaluate each patient's risk score. Subsequent verification confirmed that the risk score derived from the five glycolysis-related genes signature could be used to classify patients' risk with LUSC and predict patients' prognosis.
As for the five glycolysis-related genes we identified, HKDC1, a putative fifth hexokinase, was one of the rate-limiting enzymes regulating glucose metabolism in F I G U R E 5 Risk score distribution, survival time, and heatmap of 5 genes' expression profile for each LUSC patient. A, High-risk patients had higher mortality and shorter survival time in the training cohort. HKDC1, ALDH7A1, and SLC16A3 were high expressed in high-risk patients, whereas AGL and MIOX were low expressed in high-risk patients; B, Similar results were observed in the validation cohort several organisms. 14 There was growing evidence indicating the association between HKDC1 and cancer susceptibility. Furthermore, it was plausible that HKDC1 could be a promising potential therapeutic target for numerous kinds of carcinomas. Li et al. 15 reported that high levels of HKDC1 was a risk factor for patients with lung squamous cell carcinoma who tended to exhibit a worse prognosis. Chen et al. 16 found that HKDC1 could promote breast tumor growth and transfer through the PGC1β/SREBP1 pathway.
ALDH7A1, an aldehyde dehydrogenase, functioned in the detoxification of aldehydes via lipid peroxidation and alcohol metabolism. Previous studies have identified the relationship between its role and the occurrence of non-small-cell lung carcinoma (NSCLC). ALDH7A1 was abundant in cancer stem cells, and knockdown of ALDH7A1 enhanced NSCLC sensitivity to cisplatin. What is more, Giacalone et al. 17 found that high expression of ALDH7A1 led to a high recurrence rate in surgically treated patients.
SLC16A3, also known as MCT4, one member of solute carriers transporting monocarboxylate molecules, was reported to regulate tumor cell migration, invasion, and proliferation in numerous kinds of carcinomas. [18][19][20] Moreover, SLC16A3 was regarded as a critical regulator for lactate metabolism in NSCLC cells based on aerobic glycolysis. Thus, SLC16A3 was believed to be a treatment site for glycolysis-preference cancer cells. 21 AGL was primarily responsible for breaking down glycogen and was suggested to be closely connected with bladder F I G U R E 6 Univariable and multivariable Cox regression analyses for each clinical feature and risk score. A, Stage, T staging, M staging, and riskScore were significant prognostic factors for LUSC patients by univariable analysis in the training cohort; B, T staging and riskScore were significant prognostic factors for LUSC patients by multivariable analysis in the training cohort; C, Only riskScore was significant prognostic factor for LUSC patients by univariable analysis in the validation cohort; D, Stage and riskScore were significant prognostic factors for LUSC patients by multivariable analysis in the validation cohort cancer. AGL was regarded as a biomarker that suppressed tumor growth in bladder cancer. 22 AGL silencing promoted bladder tumor cell growth via different mechanisms such as promoting the synthesis of glycine 22 and HAS2-mediated hyaluronic acid (HA) synthesis. 23 Furthermore, a recent study reported the function of AGL in NSCLC and suggested that the silencing of AGL enhanced NSCLC cells' growth, which was mediated by HAS2. 24 Myo-inositol oxygenase (MIOX) was one member of the family consisting of different Aldo-Keto reductases and participated in starting the myo-inositol metabolism.
Its overexpression was reported to induce ROS production. 25,26 The imbalance between the production and clearance of ROS caused oxidative stress, which ultimately led to tumorigenesis. 27 Thus, although there was no study identifying the association between MIOX and cancer susceptibility, it was plausible that MIOX contributed to tumor development. The above studies revealed that these five glycolysis-related genes are closely related to tumors, even lung cancer, supporting our gene signature's reliability for providing clues in the prognosis of patients with LUSC. Although many biomarkers have been confirmed to be strongly associated with the occurrence and progression of LUSC, no signatures composed of the glycolysis-related gene have been built yet. This study reports a gene signature for the first time based on glycolysis genes and is used for predicting the prognosis of patients with LUSC. Its predictive performance applies to patients with LUSC of various classifications and has good stability.

F I G U R E 8
Kaplan-Meier curves for the predictive value of the risk score for LUSC patients divided by each clinical feature. High-risk patients had a significantly worse survival than low-risk patients in most subgroups except for the female (p = 0.075), stage III-IV (p = 0.070), and M1 subgroups (p = 1.000) F I G U R E 9 Validation of the gene signature by immunohistochemistry between LUSC and normal samples. AGL and SLC16A3 were significantly overexpressed in tumor samples compared to normal samples (data from the Human Protein Atlas: https://www.prote inatl as.org/)

| CONCLUSION
We developed a five glycolysis-related genes signature (HKDC1, AGL, ALDH7A1, SLC16A3, and MIOX) to predict patients' prognosis with LUSC. And we used internal data to verify its feasibility. The risk score derived from this gene signature was an independent predictor of patients with LUSC. We hope this signature can be used for clinical work and develop a new target treatment.