A novel DNA damage and repair‐related gene signature to improve predictive capacity of overall survival for patients with gliomas

Abstract Gliomas, as the most lethal and malignant brain tumours in adults, remain a major challenge worldwide. DNA damage and repair‐related genes (DDRRGs) appear to play a significant role in gliomas, but the studies of DDRRGs are still insufficient. Herein, we systematically explored and analysed 1547 DDRRGs in 938 glioma samples from TCGA and CGGA datasets. Using least absolute shrinkage and selection operator (LASSO) Cox regression analysis, we identified a 16‐DDRRG signature, characterized by high‐risk and low‐risk patterns. This risk model harbours robust predictive capability for overall survival of glioma patients. We found the high‐risk score is strongly associated with well‐known malignant features of gliomas, such as the mesenchymal subtype, IDH‐wildtype, 1p/19q non‐codeletion and MGMT promoter unmethylated status. In addition, we found that the high‐risk score is also linked with multiple oncogenic pathways and therapeutic resistance. Significantly, we found the high‐risk group has higher enrichment of immunosuppressive cells (M2‐type macrophages, Tregs and MDSCs) and immune inhibition biomarkers (PD‐1, PD‐L1 and CTLA‐4). Lastly, we proved that SMC4, which has the highest positive regression coefficient in our risk model, is strongly linked with malignant progression and TMZ resistance of gliomas in a E2F1‐dependent manner.

overall survival (OS), from 1 year to 15 years, 6 while the average survival time of GBM is less than 20 months after diagnosis. 5 In 2016, the World Health Organization (WHO) classification of gliomas used molecular parameters including isocitrate dehydrogenase 1 and 2 (IDH1 and IDH2) mutations and 1p/19q codeletion status in addition to histopathological criteria. 7 In addition, a study indicated that methylation status of the O 6 -methylguanine-DNA methyltransferase (MGMT) promoter is a molecular biomarker for chemotherapy. 8 In 2010, GBM was divided into proneural, neural, classical and mesenchymal subtypes, 9 but recent study has not classified neural as major subtype due to lack of tumour-intrinsic patterns. 10 However, existing molecular classification has not led to improvement of outcomes for glioma patients. 11 Therefore, more comprehensive studies are urgently needed to provide predictive models for gliomas.
DNA damage response (DDR) is the pathway that cells recognize and repair DNA damage, which is required to maintain the genomic integrity and stability. 12 Increasing studies found that DNA damage induced by chemicals and physical agents can promote carcinogenic mutations that led to human cancers. 13 Meanwhile, accumulating evidence showed deficiency in DDR facilitates progression of multiple cancers, such as brain metastases of colorectal cancer 14 and pancreatic cancer. 15 Interestingly, studies also showed that the overactivated DDR can induce the therapy resistance of glioma stem cells, and targeting DDR pathway can overcome this resistance. 16 Consistently, a report suggested that Rad51, a DNA double-strand break repair gene, accounts for resistance to DNAdamaging reagents such as chemotherapy. 17 In addition, a study showed that purine metabolites could induce radioresistance by enhancing the repair of DNA double-stranded breaks. 18 Another study also reported that p53, the key molecule of DDR, can regulate M2 polarization of microglia to remodel immunosuppressive microenvironment of gliomas. 19 Thus, the DDR is a potent candidate for prognosis prediction in patients with gliomas. Previous studies have identified a variety of DNA damage and repair-related genes that are involved in the DDR process, providing additional potential choices for therapeutic strategies. 20 Thus, our studies focused on the role of DDRRGs in gliomas.
In this study, we explored DDRRGs features from the Cancer Genome Atlas (TCGA) and the Chinese Glioma Genome Atlas (CGGA) datasets. The results indicated that patients with gliomas can be divided into 2 clusters with distinct gene features and clinical outcome based on consensus clustering. Next, the DDRRG signature was established using least absolute shrinkage and selection operator (LASSO) Cox regression analysis. We found that the DDRRG signature, composed of FBXO18, MMS19, SMC4, HEXB, UBQLN4, VAV3, E2F7, EFNB1, WEE1, SAA1, SHISA5, WAC, PSMC2, PTGFRN, EIF3L and HMGA2, can independently predict the outcome of glioma patients. Moreover, we found that this risk signature is strongly linked with multiple oncogenic pathways, immunosuppressive tumour microenvironment and therapeutic response. In addition to bioinformatics analyses, we functionally confirmed the oncogenic role of SMC4 in gliomas in vivo and in vitro.

| Included patients and datasets
In total, 938 glioma samples from two cohorts have been involved in this study. The data of mRNA expression was downloaded from TCGA RNA sequencing (RNA-seq) dataset 21 and CGGA RNA-seq dataset. 22,23 Detailed information was provided in the Data S1. The corresponding clinical information was obtained from TCGA dataset (https://portal. gdc.cancer.gov/) and CGGA dataset (http://www.cgga.org.cn), respectively. The clinicopathological features for 938 patients were shown in Tables 1 and S1. For TCGA dataset, appropriate consents were obtained from relevant institutional review boards, which coordinated the consent process at each tissue-source site. For the CGGA dataset, written informed consents were obtained from all patients. 22 ). (Exp represents the expression level of each selected gene). (Table S2). Based on this formula, the risk score (RS) for each sample was calculated in TCGA and CGGA datasets, and the median value was manually defined as the threshold for high-risk

| Bioinformatics analyses and experiments in vitro and in vivo
The details for bioinformatics analyses and experiments in vitro and in vivo in this study were described in the Data S1.

| Statistics
The statistical analyses were performed by using the R software (ver- analyses). The role of each analysis in this study was provided in the Data S1. p < 0.05 was considered as statistical significance.

| Using the DDRRGs for the consensus clustering of gliomas
To explore the potential oncogenic role of DDRRGs in gliomas, consensus clustering method was applied to analyse in TCGA and CGGA datasets. Cumulative distribution function (CDF) was carried out to determine the optimum cluster number. The results showed that clustering outcoming is stable when k = 2 (Figures S1A and S2A-C).
Heatmaps showed distinct DDRRGs distribution between cluster 1 and cluster 2 (Figures S1B and S2D,  Figures 2B and S3B). Further analysis in TCGA dataset suggested that the risk score is positively correlated with WHO grade of gliomas ( Figure 2C). In addition, patients with IDH-wildtype status, 1p/19q non-codeletion, non-methylation of MGMT promoter or higher age had elevated risk score (Figures 2D-G). However, we observed no statistical difference between male and female groups in TCGA dataset ( Figure 2H). Additionally, the results suggested that the classical and mesenchymal subtypes have highest risk score ( Figure 2I). ROC analysis was performed to evaluate the predictive capability of risk model for the mesenchymal subtype.
The results revealed that the area under curve (AUC) is 87.9% in predicting the mesenchymal subtype in TCGA dataset ( Figure 2J).
Next, we assessed whether the risk signature matches the identified cluster. The results indicated that glioma patients in cluster1 have higher risk score ( Figure 2K). Consistently, the AUC in predicting the cluster was 99.6% in TCGA dataset ( Figure 2L). Same analyses were performed in the CGGA dataset, and we obtained similar results ( Figure S3C-J). These results imply that the high-risk score may be linked with aggressive progression in gliomas.

| The prognostic value of the DDRRG signature
To comprehensively understand the prognosis prediction ability of the DDRRG signature, we evaluated the association between expression level of DDRRGs, risk score and patients' prognosis. The multivariate Cox regression analyses confirmed that the risk score is an independent prognosis factor for gliomas (Tables 2 and S4).   (Figures S9A,B).
Given the above findings that glycolysis and cholesterol homeostasis are enriched in high-risk group, we further explored the association of our signature DDRRGs in regulation of metabolic alternations based on 114 metabolic pathways obtained from the previous study. 30 GSVA was performed to compute the meta-score of each patient for metabolic pathways. The heatmaps showed 10 glioma-related metabolic pathways that are enriched in the high-risk group ( Figure S9C,E). In addition, we observed that several pathways that were reported to be tightly linked with malignant behaviour for gliomas have elevated enrichment in the high-risk group, such as pyrimidine metabolism 31 and purine metabolism 18 ( Figure S9D,F).
These above results suggest that our risk signature is tightly associated with malignant progression of gliomas.

| High-risk score is closely linked with immunosuppressive microenvironment of gliomas
Given the promising role of immune therapy in glioma treatment 32 and the significant immune alternation difference between the risk groups, we carried out further investigation on this aspect. We firstly Interestingly, previous evidence has proved that M2-type macrophages can promote immunosuppressive microenvironment and progression of gliomas. 33 In addition, we found that T cell regulatory (Tregs) have higher expression in the high-risk group ( Figure 6A,B).    S10F). These analyses indicate that high-risk score is strongly linked with immunosuppressive role for gliomas.

| High-risk score is potentially associated with therapy resistance in gliomas
Based on aforementioned results, we next explored the association between the risk model and therapeutic response in gliomas using clinical information from the CGGA dataset. K-M survival analysis was performed to compare prognosis of glioma patients treated with or without irradiation/TMZ. The results showed no statistical difference between untreated and treated groups for irradiation therapy in the high-risk group ( Figure 7A), while we observed that untreated group suffered worse outcome for irradiation therapy in the lowrisk group ( Figure 7B), suggesting that high-risk score was positively linked with radioresistance. However, there was no statistical difference between untreated and treated group for TMZ therapy in the high-risk or low-risk groups ( Figure 7C,D).
The above results showed that high-risk score is likely linked with immunosuppressive tumour microenvironment. Thus, in addition to conventionally therapy, we next decided to explore the relationship between risk score and immunotherapy responsiveness using online TIDE database. The results revealed that the high-risk score was indicative for a resistant phenotype to ICB therapy in gliomas ( Figure 7E,F). In addition, we used subclass mapping algorithm to further validate the findings in TIDE. The expression data of 28 patients that underwent anti-PD-1 therapy was obtained from previous study. 35 The results showed that glioma samples with high-risk score are more potentially resistant to ICB therapy ( Figure 7G,H).
Accumulating evidence indicated that mutational load is closely linked with immunotherapy. 35,36 Interestingly, previous study showed that DNA damage was strongly linked with carcinogenic mutations. 37 Therefore, we evaluated the variation of somatic mutation between high-and low-risk groups in our risk model. The results revealed that high-risk group is marked by malignant biomarkers, such as TP53 mutation (38%), EGFR mutation (20%) and PTEN mutation (20%) 38 ( Figure S11A). In addition, we observed that IDH1 mutation (92%) indicating better OS of patients 38 was the feature of low-risk group ( Figure S11B). Subsequently, TMB of each patient was calculated and the results indicated that high-risk sore has significantly higher TMB ( Figure S11C). Above results indicated that high-risk score is a phenotype of therapy resistance.

| Functional verification of carcinogenic effect of SMC4 in gliomas
In the DDRRG model, we found that SMC4 had the highest coefficient among the identified biomarkers. To further explore its oncogenic role in gliomas, shRNAs targeting SMC4 (shSMC4 #1 and shSMC4 #2) were was used as control. The qRT-PCR analysis was carried out to assess the silencing efficacy of the shSMC4 lentivirus transfection. The results indicated that SMC4 mRNA expression was significantly reduced after transfection and shSMC4 #2 had relatively higher silencing efficacy compared with shSMC4 #1 (Figure 8A,B). Next, cell viability assays were performed to explore the role of SMC4 knockdown on tumour proliferation of U87 and U373 cells. The results demonstrated that the proliferative capacity of tumour cells was significantly attenuated after SMC4 knockdown ( Figure 8C,D). To comprehensively understand the aggressive roles of SMC4 on gliomas, shSMC4-and shNT-bearing U87 glioma cells were intracranially injected into the F I G U R E 7 High-risk score is potentially associated with therapy resistance in gliomas. (A, B) K-M curves showing OS for glioma patients in the high-risk group or low-risk group that treated with or without irradiation. (C, D) K-M curves showing OS for glioma patients in the high-risk group or low-risk group that treated with or without TMZ. (E, F) Stacked column charts comparing the response of gliomas to ICB therapy between high-and low-risk groups in TCGA and CGGA datasets using TIDE dataset. (G, H) The response of gliomas to ICB therapy between high-and low-risk groups in TCGA and CGGA datasets using expression data of GSE78220 brains of SCID mice. The results revealed that the SCID mice intracranially injected with shSMC4-bearing U87 glioma cells had much better OS in comparison with those injected with shNT-transfected cells ( Figure 8E). Bioluminescent imaging further showed that SMC4 knockdown attenuated the tumorigenicity and progression of glioma cells ( Figure 8F). Next, shNT-and shSMC4-transfected U87 or U373 glioma cells were treated with TMZ. Cell viability assays showed that SMC4 silencing apparently improve TMZ's capability to kill glioma cells in both U87 and U373 cell lines ( Figure 8G,H). In addition, to validate these results, we performed in vitro cell viability assay to detect proliferative ability of U87 tumour cells at multiple TMZ concentrations. The results suggested that silencing of SMC4 can attenuate TMZ resistance ( Figure S12A). Currently, the regulation mechanism of SMC4 in gliomas is still unclear. Thus, we sought to further investigate this point. Glioma

| DISCUSS ION
Multiple studies have already demonstrated the critical links between molecular subtypes and clinical prognosis for patients with gliomas. 7-10 Nevertheless, molecular subtypes of gliomas have not significantly improved patients' OS. 11 Accumulating evidence showed that disorder of DDR plays significant role in glioma progression. 39 However, analyses focusing on DDRRGs in gliomas are still insufficient. Thus, in this study, we aimed to construct a DDRRG signature and explore the possibility of clinical application for our signature.  40 Another study showed that SNHG12 can induce increased expression of E2F7 to promote TMZ resistance by G1/S cell cycle transition. 41 In addition, study found that targeting G2 checkpoint kinase WEE1 can attenuate TMZ resistance in gliomas. 42 Elevated expression of SAA1 was found to be strongly linked with TMZ resistance in a AKT dependent manner. 43 However, more in-depth investigations are warranted to explore the associations between DDRRGs and TMZ-mediated therapeutic efficacy, as well as the detailed underlying mechanisms.
SMC4, which is the member of SMC gene family, plays the vital role in chromosome assembly and segregation. 44 In addition, the complex containing SMC4, named condensin, is essential for this role. 45 A study showed that condensin I, containing SMC4 and SMC2, is recruited to interact with base excision repair (BER) factors (PARP-1-XRCC1 complex), at damage sites to play in role in DNA singlestrand break (SSB) repair. 46 Indeed, we performed GSEA analyses and found that BER activity is enriched in high-SMC4 expression group.
In addition, a study showed that TMZ can generate a series of DNA lesions, including O6-methylguanine (Ome6G), N3-methyladenine and N7-methylguanine. 47 However, N3-methyladenine and N7methylguanine lesions are rapidly repair by BER pathway. 48  itself could play an inhibitory role by triggering inhibitory signals. 55 In accordance with previous findings, we found that high-risk score is negatively associated with activated CD8 + T cells. These results show that high-risk group is strongly linked with immunosuppressive microenvironment caused by interaction of multiple immunosuppressive factors. Based on this point, we analysed the differences of therapeutic response between high-risk and low-risk groups. The results showed that high-risk score is resistant to ICB therapy. Accumulating evidences have illustrated that higher mutational load is linked with satisfactory objective response to immunotherapy in non-small cell lung cancer (NSCLS) 36 and metastatic melanoma. 35 However, recent study showed that clinical response to anti-PD-1 immunotherapy in GBM is linked with lower TMB. 32 In addition, recent evidence found that low mutation burden is linked with response to immunotherapy in recurrent GBM. 56 Moreover, a study reported that PTEN mutation is associated with immunotherapy resistance in gliomas. 32,57 In our study, we found that the high-risk group had higher TMB and more frequent PTEN mutation. Meanwhile, we observed that high-risk score is a resistant phenotype for immunotherapy in gliomas via TIDE database and GenePattern database. This consistency between our results and previous findings further confirms the value of applying our model in the clinical.
We analysed the DDRRG signature from multiple aspects and observed more satisfactory value in comparison with previous glioma signature. [58][59][60] Nevertheless, we must point out certain shortcomings and limitations in our study. Firstly, the main source of this study was downloaded from available public databases. Although software (equal); supervision (lead); validation (lead).

CO N FLI C T O F I NTE R E S T
The authors confirm that there are no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data used in this study can be downloaded from TCGA dataset (https://xenab rowser.net/datap ages/) and the CGGA dataset (http://www.cgga.org.cn).