A risk signature with four autophagy‐related genes for predicting survival of glioblastoma multiforme

Abstract Glioblastoma multiforme (GBM) is a devastating brain tumour without effective treatment. Recent studies have shown that autophagy is a promising therapeutic strategy for GBM. Therefore, it is necessary to identify novel biomarkers associated with autophagy in GBM. In this study, we downloaded autophagy‐related genes from Human Autophagy Database (HADb) and Gene Set Enrichment Analysis (GSEA) website. Least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox regression analysis were performed to identify genes for constructing a risk signature. A nomogram was developed by integrating the risk signature with clinicopathological factors. Time‐dependent receiver operating characteristic (ROC) curve and calibration plot were used to evaluate the efficiency of the prognostic model. Finally, four autophagy‐related genes (DIRAS3, LGALS8, MAPK8 and STAM) were identified and were used for constructing a risk signature, which proved to be an independent risk factor for GBM patients. Furthermore, a nomogram was developed based on the risk signature and clinicopathological factors (IDH1 status, age and history of radiotherapy or chemotherapy). ROC curve and calibration plot suggested the nomogram could accurately predict 1‐, 3‐ and 5‐year survival rate of GBM patients. For function analysis, the risk signature was associated with apoptosis, necrosis, immunity, inflammation response and MAPK signalling pathway. In conclusion, the risk signature with 4 autophagy‐related genes could serve as an independent prognostic factor for GBM patients. Moreover, we developed a nomogram based on the risk signature and clinical traits which was validated to perform better for predicting 1‐, 3‐ and 5‐year survival rate of GBM.


Abstract
Glioblastoma multiforme (GBM) is a devastating brain tumour without effective treatment. Recent studies have shown that autophagy is a promising therapeutic strategy for GBM. Therefore, it is necessary to identify novel biomarkers associated with autophagy in GBM. In this study, we downloaded autophagy-related genes from Human Autophagy Database (HADb) and Gene Set Enrichment Analysis (GSEA) website. Least absolute shrinkage and selection operator (LASSO) regression and multivariate Cox regression analysis were performed to identify genes for constructing a risk signature. A nomogram was developed by integrating the risk signature with clinicopathological factors. Time-dependent receiver operating characteristic (ROC) curve and calibration plot were used to evaluate the efficiency of the prognostic model. Finally, four autophagy-related genes (DIRAS3, LGALS8, MAPK8 and STAM) were identified and were used for constructing a risk signature, which proved to be an independent risk factor for GBM patients. Furthermore, a nomogram was developed based on the risk signature and clinicopathological factors (IDH1 status, age and history of radiotherapy or chemotherapy). ROC curve and calibration plot suggested the nomogram could accurately predict 1-, 3-and 5-year survival rate of GBM patients.
For function analysis, the risk signature was associated with apoptosis, necrosis, immunity, inflammation response and MAPK signalling pathway. In conclusion, the risk signature with 4 autophagy-related genes could serve as an independent prognostic factor for GBM patients. Moreover, we developed a nomogram based on the risk signature and clinical traits which was validated to perform better for predicting 1-, 3-and 5-year survival rate of GBM.

K E Y W O R D S
autophagy, glioblastoma multiform, nomogram, prognosis, risk signature

| INTRODUC TI ON
Glioblastoma multiforme (GBM) is one of the most aggressive types in glioma with 5-year survival rate of 5%. 1 Although current treatment approaches, including maximum safe resection and adjuvant chemoradiotherapy, have been adopted, the median overall survival is only about 15 months. 2 Despite the progress of experimental technologies and therapeutic regimens in this field, such as inhibition of oncogenic signal transduction, anti-angiogenesis and immunotherapy, GBM remains incurable. 3 It is thus necessary to explore novel biomarkers or targets for GBM treatment.
Recent study has showed that autophagy is involved in tumorigenesis and development of GBM. 4 Autophagy is a cellular self-digestive process that protects cells via eliminating damaged or abandoned intracellular components under the conditions of hypoxia, oxidative stress or nutrient starvation. 5 Autophagy can govern the fate of cancer cells via initiating pro-survival or prodeath mechanisms. 6 As a first-line chemotherapeutic agent, temozolomide (TMZ) has shown benefit for prolonging the survival of GBM patients. 7 TMZ preferentially induces autophagic death in GBM cells rather than apoptosis through PI3K/AKT/mTOR signalling pathway. 8 However, further study demonstrates that persistent inhibition of PI3K/AKT/mTOR by TMZ can only induce autophagy transiently and promote drug resistance of GBM. 9,10 In the meantime, combination with autophagy inhibitors or regulators can interfere with the therapeutic effects of TMZ for GBM. 11 These researches reveal that autophagy-targeted therapy is a promising approach to potentiate the efficacy of conventional therapies in GBM.
In this study, we identified four autophagy-related genes associated with the prognosis of GBM patients from the TCGA, REMBRANDT and Gravendeel data sets. A risk signature was established based on the four genes and proved to be an independent risk factor for GBM patients. Furthermore, we developed a nomogram that integrated the risk signature with clinicopathological factors (IDH1 status, age and experience of radiotherapy or chemotherapy) and validated its better performance for predicting 1-, 3-and 5-year survival rate of GBM patients. Gravendeel data sets (microarray) were downloaded from GlioVis (http://gliov is.bioin fo.cnio.es/). 12 In the GlioVis online data set, RNA-seq data processing is based on the normalized count reads from the pre-processed data (sequence alignment and transcript abundance estimation) with log2 transformation after adding a 0.5 pseudocount. For microarray data, the "affy" package was used for robust multi-array average normalization followed by quantile normalization.

| Construction of a risk signature associated with survival of GBM patients
To screen genes for constructing risk signature, univariate Cox regression models were performed to select genes that are associated with overall survival of GBM patients in the TCGA (HG-UG133A platform), REMBRANDT and Gravendeel data sets. P < .05 was considered statistical significance. Overlapping autophagy-related genes were extracted from the three data sets and visualized in a venn diagram. Least absolute shrinkage and selection operator (LASSO) regression was used to screen out the optimal gene combination for constructing the risk signature. Multivariate Cox regression model was carried out to further identify the selected genes using "step" function in R language. The data in the TCGA database (HG-UG133A) were used as the training cohort, and data in the Gravendeel and REMBRANDT data sets were used for the validation cohorts. Subsequently, a risk signature was established based on a linear combination of the regression coefficient derived from the multivariate Cox regression model coefficients and expression level of the genes. The risk score formula was calculated as follows: Risk score = (expr gene1 × Coef gene1 ) + (expr gene2 × Coef gene2 ) + … + (expr genen × Coef genen ). 13 The GBM patients were classified into low-risk group and high-risk group according to the median value of the risk scores. The Kaplan-Meier (K-M) method and time-dependent receiver operating characteristic (ROC) curve were used to assess the efficiency of the risk signature.

| Establishment and assessment of the nomogram
For better clinical application of the risk signature, patients in the TCGA data set (HG-UG133A) with detailed information about age, IDH status and experience of radiotherapy or chemotherapy were included. Univariate and multivariate Cox regression analyses were performed to determine the association between these factors (risk score, age, IDH status and history of radiotherapy or chemotherapy) and patients' overall survival. The included patients were divided into training cohort (60%) and validation cohort (40%) randomly using R package "caret." The training cohort was used to establish a nomogram for predicting 1-, 3-and 5-year survival rate of GBM patients via R programming language. The validation cohort was used for internal validation. ROC curve and calibration plot were carried out to evaluate the efficiency of the nomogram.

| Functional enrichment analysis
Gene set enrichment analysis (GSEA, https ://www.broad insti tute. org/gsea/index.jsp) was performed to identify the autophagyrelated gene sets between LGG and GBM. 14 Normalized enrichment score (NES) and false discovery rate (FDR) were applied to determine the statistical differences. The gene set variation analysis (GSVA) was used to explore biological processes and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways associated with the risk signature. 15 Gene sets with differences between high-risk group and low-risk group in TCGA data set (GBM HG-UG133A) were selected using the R package "limma," and adjust P value < .05 was considered statistically significant. Several representative gene sets were presented in heatmaps. To confirm the KEGG pathways associated with the signature, R package "clus-terProfiler" was performed on the differentially expressed genes (DEGs) between low-risk group and high-risk group which were selected via "limma" package in R with adjust P value < .05 and |log2(fold change)| > 0.5. 16 The KEGG pathway map was presented by "pathview" package.

| Statistical analyses
All the statistical analyses including principal component analysis (PCA), univariate and multivariate Cox regression models, LASSO regression, ROC curve analysis and K-M survival analyses were performed using Rstudio (version 3.5.2). Quantitative data were exhibited as the mean ± standard deviation (SD). Statistical differences were compared by Wilcoxon test between two groups and Kruskal-Wallis H for multigroup comparison. P < .05 was considered statistically significant. The venn, heatmaps, boxplots, pie charts, forest plots and calibration plots were drawn using R language.

| Four autophagy-related genes were screened out for constructing a risk signature
A total of 531 autophagy-related genes were integrated from HADb database and the GO_AUTOPHAGY gene set in GSEA website (Table S1). PCA based on these autophagy-related genes confirmed the distribution difference between low-grade glioma (LGG) and glioblastoma multiforme (GBM) in the TCGA (GBMLGG RNA-seq) data set. As shown in Figure 1A, GBM samples were located on the left side, and LGG samples were on the other side. To identify autophagy-related biological processes between LGG and GBM, GSEA was performed and the results showed that autophagy-related genes were highly enriched in GBM ( Figure 1B), suggesting that autophagy played an essential role in GBM. Our study just focused on GBM based on these results of PCA and GSEA. Ninetyone genes in the TCGA HG-UG133A, 73 genes in the REMBRANDT and 129 genes in the Gravendeel data set were found to be correlated with GBM survival using univariate Cox regression analysis (Table S2, P < .05). Sixteen overlapping genes in the three databases were screened out and visualized in a venn diagram ( Figure 1C).
LASSO regression analysis was performed on the overlapping genes so as to avoid overfitting problems in risk signature, and 7 genes (CTSB, DIRAS3, HK2, LGALS8, MAPK8, PPP1R15A and STAM) were retained according to the optimal lambda value ( Figure 1D LGALS8 had poor prognosis in GBM (Figure S1C,D, P < .05). The patients were divided into high-risk and low-risk groups according to the median cut-off value of the scores. To explore the difference between low-risk and high-risk groups, PCA was implemented based on genome expression data and the results demonstrated the distribution difference between the two groups ( Figure 2A). In addition, patients in the high-risk group had significantly worse overall survival than those in the low-risk group  Figure 2F). In the meantime, the number of alive patients reduced ( Figure 2F).

| Association between the risk signature and clinical characteristics
To explore the association between the risk signature and clinical characteristics, we firstly developed a heatmap to present the distribution trends of age, gender, molecular subtypes, MGMT promoter status and IDH1 status between low-risk and high-risk groups in the TCGA database. As shown in Figure 4A, high-risk group inclined to contain more elder patients, whereas samples with IDH1 mutant were all in low-risk group. To be more intuitive, pie charts were used to display the proportion of each factor (age, gender, molecular subtypes, MGMT promoter status and IDH1 status) in low-risk and high-risk groups. Similar results were shown in Figure S2A, elder patients and samples with mesenchymal subtype or classical subtype accounted for a large proportion in the highrisk group. And there were no significant differences between lowrisk and high-risk groups for gender and MGMT promoter status.
Subsequently, we compared the risk scores in different cohorts stratified by molecular subtypes, age, IDH status, MGMT promoter status and gender separately. The risk scores in the mesenchymal subtype were obviously higher than those in neural and proneural subtypes ( Figure 4B, P < .0001). For IDH status, the risk scores decreased in patients with IDH1 mutant type compared with IDH1 wild-type ( Figure 4C, P < .0001). The risk scores of patients with MGMT promoter methylation were lower than MGMT promoter unmethylation ( Figure 4D, P = .033). Patients above 60 years old inclined to have higher risk scores compared with those in the younger age group ( Figure 4E, P = .002). However, there was no difference in risk scores between male and female ( Figure 4F, P = .468).
Afterwards, we also explored the prognostic value of the risk signature in different cohorts stratified by molecular subtypes, MGMT promoter status, IDH1 status and history of radiotherapy or chemotherapy. In the four different molecular subtypes, higher risk scores indicated poor prognosis in the mesenchymal ( Figure 5A, P < .001) and proneural ( Figure 5B, P < .0001) subtypes, but there were no prognostic differences between low-risk group and high-risk group in the classical ( Figure S3A, P = .9447) and neural ( Figure S3B, P = .5887) subtypes. The patients in high-risk groups had adverse outcomes in the IDH1 wild-type group ( Figure 5C, P < .01), MGMT promoter methylation group ( Figure 5D, P < .0001) and MGMT promoter unmethylation group ( Figure 5E, P < .001).
Furthermore, higher risk scores were also found to be clinically associated with poor prognosis in patients receiving radiotherapy ( Figure 5F, P < .0001) or chemotherapy ( Figure 5G, P < .0001).

| Construction of a nomogram for predicting 1-, 3-and 5-year survival rate of GBM
In order to better apply the risk signature, we collected 401 GBM patients in the TCGA HG-UG133A platform with detailed clinical information including IDH1 status, age and history of radiotherapy or chemotherapy. The 401 samples were randomly divided into a training cohort (n = 241) and a validation cohort (n = 160) ( Table 1).
The training cohort was used for constructing a nomogram to predict survival of GBM, and the validation cohort was used for further assessing the efficiency of the nomogram. Firstly, we performed univariate and multivariate Cox regression analyses in the training cohort, indicating that the risk signature was an independent risk factor for GBM patients ( Figure 6A,B, P < .05). Subsequently, a nomogram integrating the five factors was constructed for predict-

| Functional analysis of the risk signature
GSVA was used to explore biological processes and KEGG pathways associated with the risk signature. As shown in Figure 7A, several biological processes relevant to apoptosis, necrosis, cell adhesion, immune and inflammatory response were enriched in the high-risk group. These biological processes such as apoptosis, necrosis, immune and inflammatory response were closely related to autophagy. 19,20 For KEGG pathway, high-risk group was positively correlated with focal adhesion, MAPK signalling pathway, Toll-like receptor signalling pathway, apoptosis, ECM receptor interaction and so on ( Figure 7B). To confirm the KEGG pathways associated with F I G U R E 3 Evaluating the efficiencies of the risk signature in the REMBRANDT and Gravendeel data sets. A, B, Kaplan-Meier survival curves showed the prognostic value of the risk signature in Gravendeel data set (A. low-risk group, n = 78; high-risk group, n = 77; P < .01) and REMBRANDT database (B. low-risk group, n = 91; high-risk group, n = 90; P < .01). C, D, ROC curves evaluated the efficiency of the risk signature for predicting 1-, 3-and 5-y survival in Gravendeel data set (C) and REMBRANDT database (D). E, F, The four genes expression profiles, the risk scores distribution and patients' survival status in the Gravendeel data set (E) and REMBRANDT database (F)

F I G U R E 4
Associations between the signature risk scores and clinical features. A, The heatmap showed the associations between the risk signature and the clinical characteristics (age, gender, molecular subtypes, MGMT promoter status, IDH1 status and survival status) in the TCGA database. B-F, Distribution of the risk scores in different cohorts stratified by the molecular subtype (B, classical, n = 144; mesenchymal, n = 156; neural, n = 88; proneural, n = 137; P < .0001), IDH1 status (C, mutant, n = 30; wild-type, n = 372; P < .0001), MGMT promoter status (D, methylated, n = 170; unmethylated, n = 176; P = .033), age (E, age > 60, n = 248; age ≤ 60, n = 271; P = .002) and gender (F, female, n = 203; male, n = 314; P = .469) the risk signature, we screened out DEGs between low-risk group and high-risk group, and 558 genes were obtained (Table S3). R package "clusterProfiler" was performed on the DEGs. Consistent with the result in GSVA, KEGG pathways including focal adhesion, MAPK signalling pathway and ECM receptor interaction were also obtained ( Figure 7C). Considering the close relationship between MAPK signalling pathway and autophagy, MAPK signalling pathway was downloaded from the KEGG database and marked according to DEGs such as HSP27, FAS and CD14 ( Figure 7D). In this pathway map, a majority of red-labelled genes were involved in triggering MAPK signalling pathway that can induce cell proliferation, differentiation, inflammation, cell cycle and apoptosis. In brief, these results revealed that the risk signature was correlated with apoptosis, necrosis, immunity and inflammation response and MAPK signalling pathway.

| D ISCUSS I ON
GBM is a refractory central nervous system tumour without effective treatment strategy. Emerging evidence has indicated that autophagy plays essential roles in GBM and serves as a therapeutic target. The level of autophagy in GBM is declined, 21 and TMZ can promote autophagy in malignant glioma cells rather than apoptosis. 22,23 Furthermore, combination of TMZ with bafilomycin A1, an autophagy inhibitor, enhances the chemotherapy effect of TMZ for malignant gliomas. 22 However, another study shows that TGF-β2 induces autophagy to promote invasion of glioma. 24 Despite the controversial roles of autophagy in GBM, these investigations suggest autophagy is a promising target for GBM treatment and needs further explorations.
In our study, we identified four autophagy-related genes (DIRAS3, LGALS8, MAPK8 and STAM) which were associated with GBM survival. DIRAS family GTPase 3 (DIRAS3), also known as ARHI, is a member of RAS superfamily and locates on human chromosome 1p31. Its encoding protein shares 60% homology with RAS or Rap, but unlike other members, it has a long N-terminal extension, low GTPase activity and constitutive GTP binding state in resting cells. 25 DIRAS3 is involved in the occurrence and progression of a variety of cancers and considered as an anti-oncogene. For example, DIRAS3 inhibits gastric cancer and epithelial ovarian cancer cells proliferation and promotes apoptosis by inducing autophagy. 26,27 Up-regulating the expression of DIRAS3 can enhance the chemosensitivity in both breast cancer and ovarian cancer. 28,29 However, the functional roles of DIRAS3 in glioma were controversial. It has been reported that DIRAS3 is down-regulated in glioma samples, and DIRAS3 overexpression inhibits the proliferation, migration and invasion of glioma cells. 30 Instead, another study has shown that DIRAS3 is overexpressed in glioma and is positively associated with adverse outcome of glioma patients. In the meantime, overexpression of DIRAS3 promotes glioma cell proliferation and invasion via EGFR-AKT signalling pathway. 31 Our study was identical to the opinion that DIRAS3 was correlated with poor prognosis of GBM and was a risk factor for GBM survival. As a member of lectin family, LGALS8 (galectin-8) plays key roles in various cellular processes, such as autophagy, cytoskeletal rearrangement, immunity and inflammation, [32][33][34][35] as well as tumour progression. 36 LGALS8 can recognize lysosome damage and promote autophagy through inhibiting mTOR activity. 37 Remarkably, LGALS8 may serve as prognostic biomarkers in cancers. The expression level of LGALS8 is associated with recurrence of gastric cancer, urothelial carcinoma of the bladder and clear cell renal cell carcinoma. [38][39][40] In glioma, LGALS8 enhances the capacities of proliferation and migration and inhibits apoptosis of GBM cells. 41    with the risk signature. C, The risk signature-related KEGG signalling pathway based on DEGs between low-risk and high-risk groups using R package "clusterProfiler". D, MAPK signalling pathway map downloaded from the KEGG database was marked according to DEGs can trigger apoptosis, promote necroptosis, involve in non-canonical pyroptotic signalling and induce autophagy by different molecular mechanisms. 46 It has been reported that JNK1 affects chemosensitivity of colon cancer cells by regulating autophagy. 47 In glioma cells, JNK1/c-JUN signalling pathway enhances the transcription of CHAC1 which induces cell death and serves as a target of TMZ. 48 Conformed to a DNA microarray data, 49 our study also suggested that MAPK8 was a protective factor for GBM patients' survival. The fourth gene in our study is signal transducing adap-  [50][51][52] STAM is reported to be overexpressed in locally advanced cervical cancer. 53 And another report shows that STAM may be responsible for regulating proliferation of gastric cancer. 54 The research on STAM in GBM has not been reported yet, and it needs to be further explored.
Previously, there was a report about an autophagy-related gene signature in glioma. 55 According to the PCA and GSEA results that autophagy-related genes were differentially distributed between GBM and LGG and preferred to enrich in GBM, we purposefully developed a prognostic signature with the four autophagy-related genes in GBM. The results showing in ROC curves that the AUCs of the risk signature for 1-, 3-and 5-year survival prediction were larger than those of IDH1, MGMT promoter and CIMP status indicated the risk signature was more accurate for predicting the survival of GBM patients. In molecular function, we found the risk signature was correlated with apoptosis, necrosis, immunity and inflammatory response that were closely related to autophagy. Correspondingly, KEGG pathway analysis suggested that the risk signature was involved in MAPK signalling pathway. MAPK signalling pathway is essential for cell proliferation, differentiation and programmed cell death and regulates the balance of apoptosis and autophagy. 56,57 These results indicated the risk signature might serve as a therapeutic target for GBM. Another innovation in our study was that a nomogram that combined the risk signature and clinicopathological factors (IDH1 status, age and experience of radiotherapy or chemotherapy) was established for predicting 1-, 3-and 5-year survival rate of GBM patients. ROC curves and calibration plots validated an efficient performance of the nomogram. However, there were some deficiencies in our study. Firstly, the data came from several databases with limited sample size. Secondly, the prognostic factors in the nomogram did not contain the extent of GBM surgical resection which is a cardinal factor associated with GBM prognosis. 58 Therefore, more samples with detailed clinical information will be collected to make up for these shortcomings and assess the efficiency of the nomogram in future study.
In summary, our study identified four autophagy-related genes that were associated with GBM survival. Based on the four genes, a risk signature was established and proved to be an independent prognostic factor for GBM patients. On molecular function, the risk signature was found to be correlated with apoptosis, necrosis, immunity and inflammation response and MAPK signalling pathway.
Furthermore, we developed a nomogram integrating the risk signature with several clinicopathological factors (IDH1 status, age and experience of radiotherapy or chemotherapy), which was validated to perform better for predicting 1-, 3-and 5-year survival.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

AUTH O R CO NTR I B UTI O N S
WY, LX and GG conceived the concept and implemented the scheme. WY, ZM and ZW wrote the original draft. ZM and XZ reviewed and edited the manuscript. All authors read and approved the final manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are available from the additional supporting information.