Construction of a SUMOylation regulator‐based prognostic model in low‐grade glioma

Abstract Low‐grade glioma (LGG) is an intracranial malignant tumour that mainly originates from astrocytes and oligodendrocytes. SUMOylation is a post‐translational modification, but studies of SUMOylation in LGG are quite limited. Transcriptome data, single nucleotide variant (SNV) data and clinical data of LGG were derived from public databases. Differences in the expression of SUMOylation regulators between LGG and normal brain tissue were analysed. Cox regression was used to construct a prognostic model in the training cohort. Kaplan‐Meier survival curves and ROC curves were plotted in the training and validation cohorts to evaluate the effectiveness of the prognostic model. GO and KEGG enrichment analyses were applied to preliminarily analyse the biological functions. Compared with normal brain tissue, SENP1 and SENP7 were up‐regulated and SENP5 was down‐regulated in LGG. SUMOylation regulators may be involved in functions such as mRNA splicing, DNA replication, ATPase activity and the spliceosome. A prognostic model was established based on four SUMOylation regulator‐related signatures (RFWD3, MPHOSPH9, WRN and NUP155), which showed good predictive ability for overall survival. This study is expected to provide targets for the diagnosis and treatment of low‐grade glioma.

accompanied by activation of SUMOylation. 9 However, studies of SUMOylation in LGG remain quite limited.
In this study, the expression of SUMOylation regulators in LGG was analysed using high-throughput sequencing data from public databases, and molecules related to SUMOylation regulators were screened to construct a prognostic model for LGG. This work aimed to provide targets for research on the pathogenesis of low-grade glioma.

| Acquisition of data
The single nucleotide variant (SNV) data of LGG were derived from the TCGA database (https://cancergenome.nih.gov/). One transcriptome data set of LGG was also derived from TCGA, and the corresponding clinical information for TCGA-LGG was acquired from the cBioPortal database (https://www.cbioportal.org/). Independent transcriptome and clinical data of LGG were acquired from the CGGA database (http://www.cgga.org.cn/), comprising the mRNAseq_325 and mRNAseq_693 data sets. The R package 'sva' was applied to integrate the two CGGA data sets and remove batch effects. Genes shared by the TCGA and CGGA data sets were used for subsequent analysis. Samples with incomplete survival data were excluded from the survival analysis.
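
The gene-intersection and sample-filtering steps can be sketched as follows. This is a minimal Python illustration rather than the R workflow used in the study; the dictionary layouts (gene to per-sample values, sample to a `(time, event)` pair) and the function names are hypothetical, and the batch correction performed with 'sva' is not reproduced here.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layouts: expression as gene -> {sample_id: value};
# survival as sample_id -> (follow-up time, event flag: 1 = death, 0 = censored).
Expression = Dict[str, Dict[str, float]]
Survival = Dict[str, Tuple[Optional[float], Optional[int]]]


def shared_genes(tcga: Expression, cgga: Expression) -> List[str]:
    """Genes measured in both cohorts, sorted so downstream order is reproducible."""
    return sorted(set(tcga) & set(cgga))


def complete_survival_samples(survival: Survival) -> List[str]:
    """Samples whose follow-up time and event status are both recorded."""
    return sorted(
        sample
        for sample, (time, event) in survival.items()
        if time is not None and event is not None
    )


def restrict(expr: Expression, genes: List[str], samples: List[str]) -> Expression:
    """Keep only the given genes and, within each gene, only the given samples."""
    keep = set(samples)
    return {g: {s: v for s, v in expr[g].items() if s in keep} for g in genes}
```

In this sketch the shared-gene filter would be applied to every cohort, while the survival filter is applied only before survival analysis, mirroring the order described in the text.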

| The expression of SUMOylation regulators in LGG
The R package 'maftools' was used to analyse and visualise the SNV data. The expression of SUMOylation regulators in LGG was extracted from the TCGA data set. Expression differences between LGG and normal brain tissue were assessed with the Wilcoxon rank-sum test, using |log2FC| > 0.5 and P <.05 as thresholds, and a heatmap of SUMOylation regulators was drawn accordingly. The expression of SUMOylation regulators was further compared between the IDH wild-type and IDH mutant subgroups.
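
The differential-expression rule above can be sketched in Python. This is an illustration under stated assumptions, not the study's R code: input values are assumed to be already on a log2 scale (so log2FC is a difference of group means), and the two-sided rank-sum P value uses the large-sample normal approximation with a tie correction, which can differ slightly from R's `wilcox.test` (exact P for small tie-free samples, continuity correction by default). Function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def rank_sum_pvalue(x, y):
    """Two-sided Wilcoxon rank-sum P value (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # every value identical: no evidence of a shift
        return 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_differential(tumour, normal, lfc_cut=0.5, p_cut=0.05):
    """Apply |log2FC| > lfc_cut and P < p_cut; inputs assumed log2-scaled."""
    log2fc = fmean(tumour) - fmean(normal)
    p = rank_sum_pvalue(tumour, normal)
    return abs(log2fc) > lfc_cut and p < p_cut, log2fc, p
```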

| Construction and verification of a prognostic model based on SUMOylation regulator-related signatures
In this study, the TCGA-LGG cohort was used as the training group and the CGGA-LGG cohort as the validation group for constructing and verifying the risk model. For the CGGA data sets, the risk-score formula derived from the TCGA data set was applied unchanged during validation. First, Pearson correlations between differentially expressed genes and differentially expressed SUMOylation regulators were calculated in the training cohort. Genes with a Pearson correlation coefficient greater than 0.8 and P <.05 were defined as SUMOylation regulator-related signatures. Next, univariate Cox regression was applied to select prognosis-related genes, and the prognostic model was built by stepwise multivariate Cox regression.
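
The correlation screen can be sketched as below. It is a hedged Python illustration, not the authors' code: R's `cor.test` derives the Pearson P value from a t statistic with n - 2 degrees of freedom, whereas this sketch uses Fisher's z approximation so that only the standard library is needed. The function names and the candidate-gene data layout are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def fisher_z_pvalue(r, n):
    """Approximate two-sided P for H0: rho = 0 using Fisher's z transform."""
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| = 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def related_genes(regulator, candidates, r_cut=0.8, p_cut=0.05):
    """Candidate genes with r > r_cut and P < p_cut against one regulator.

    regulator: expression values across samples; candidates: gene -> values
    in the same sample order. Returns gene -> r for the genes that pass.
    """
    n = len(regulator)
    hits = {}
    for gene, values in candidates.items():
        r = pearson_r(regulator, values)
        if r > r_cut and fisher_z_pvalue(r, n) < p_cut:
            hits[gene] = r
    return hits
```

Following the text, only positive correlations above 0.8 pass this filter; strongly negative correlations are not retained.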
The combination of signatures with the lowest Akaike Information Criterion (AIC) value was retained to avoid over-fitting. Finally, Kaplan-Meier survival curves and ROC curves were plotted in the two independent data sets (TCGA-LGG and CGGA-LGG) to evaluate the effectiveness of the prognostic model. Briefly, patients were divided into high-risk and low-risk groups by the median risk score, and the R package 'survival' was used to compare overall survival between the groups. ROC curves were plotted with the R package 'timeROC'.
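
The risk-score and grouping steps can be sketched as follows, assuming the usual Cox linear predictor, risk score = Σ β_g × x_g over the model genes (here RFWD3, MPHOSPH9, WRN and NUP155). The fitted coefficients are not given in this section, so the sketch takes them as an input. Helper names are hypothetical, samples tied at the median are placed in the low-risk group by choice, and the Kaplan-Meier estimator is shown without the log-rank comparison that the R 'survival' package would provide.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Cox linear predictor per sample: sum of beta_g * x_g over model genes.

    expression: sample -> {gene: value}; coefficients: gene -> beta (supplied
    by the caller; the study's fitted values are not reproduced here).
    """
    return {
        sample: sum(beta * genes[gene] for gene, beta in coefficients.items())
        for sample, genes in expression.items()
    }


def median_split(scores):
    """Label samples 'high' above the median score, otherwise 'low'."""
    cut = median(scores.values())
    return {sample: ("high" if s > cut else "low") for sample, s in scores.items()}


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate at each distinct event time.

    events: 1 = death observed, 0 = censored. Returns [(time, S(time)), ...].
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censorings leave the risk set after t
    return curve
```

Because the same coefficients are passed for both cohorts, the validation scores are computed with the training formula, as the methods describe; in this sketch the median is recomputed within each cohort.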

| Functional enrichment analysis
To preliminarily analyse the biological functions of the molecules associated with SUMOylation regulators, GO and KEGG enrichment analyses were performed using the R package 'clusterProfiler'.
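
Over-representation analysis of the kind run by 'clusterProfiler' rests on a one-sided hypergeometric test followed by multiple-testing adjustment (Benjamini-Hochberg by default). The Python sketch below shows that calculation for individual GO or KEGG terms; gene identifiers, the choice of background universe and the function names are illustrative assumptions, not the study's settings.

```python
from math import comb


def hypergeom_upper_p(k, term_size, query_size, universe_size):
    """P(X >= k) for X ~ Hypergeometric(universe_size, term_size, query_size)."""
    total = comb(universe_size, query_size)
    hits = sum(
        comb(term_size, i) * comb(universe_size - term_size, query_size - i)
        for i in range(k, min(term_size, query_size) + 1)
    )
    return hits / total


def enrich_term(query, term_genes, universe):
    """Overlap count and one-sided P for one term, within a shared universe."""
    universe = set(universe)
    query = set(query) & universe
    term = set(term_genes) & universe
    k = len(query & term)
    return k, hypergeom_upper_p(k, len(term), len(query), len(universe))


def benjamini_hochberg(pvalues):
    """BH-adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P downwards
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```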

| Statistical analysis
All statistical analyses and graphics were performed in R (version 4.0.2). P <.05 was considered statistically significant.

| SNV of the SUMOylation regulators
In this study, mutations of 15 SUMOylation regulators in LGG were examined using TCGA data. A total of 12 (2.372%) of 506 LGG samples carried mutations in SUMOylation regulators (Figure 1). Missense mutations, SNPs and C > T transitions were the most common forms of mutation (Figure 1A-C). Figure 1D shows the overall distribution of mutations in each sample. Missense mutations were present in all 12 mutated samples (Figure 1E). Of the 15 SUMOylation regulators, eight (RANBP2, SENP6, SENP2, SENP7, SENP3, UBA2, SENP5 and PIAS4) carried SNVs, and RANBP2 was the most frequently mutated (Figure 1F). Figure 1G displays the distribution of mutated genes across LGG samples as a waterfall plot.

| Differentially expressed SUMOylation regulators
The clinical characteristics of LGG samples with complete survival data are shown in Table S1. Compared with normal brain tissue, SENP1 and SENP7 were up-regulated and SENP5 was down-regulated in LGG (|log2FC| > 0.5 and P <.05), as shown in Figure S1. The heatmap of SUMOylation regulator expression in LGG is shown in Figure 2. In addition, the expression distributions of SUMOylation regulators in the IDH wild-type and IDH mutant subgroups were statistically analysed in this [...] Figure 5C). The ROC curve of the training group showed that the 1-year AUC was 0.796, the 3-year AUC was 0.710, and the 5-year AUC was 0.601. In the validation group, patients with high risk scores also had worse survival than patients with low risk scores (P <.001, Figure 6A). The ROC curve of the validation group indicated that the 1-year AUC was 0.688, the 3-year AUC was 0.655, and the 5-year AUC was 0.655. Moreover, a nomogram was drawn based on the prognostic model, as shown in Figure 6C.
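
As a reading aid for the 1-, 3- and 5-year AUC values, a cumulative/dynamic AUC at horizon t is the probability that a patient who died by t has a higher risk score than a patient still alive after t. The Python sketch below computes that quantity naively by dropping patients censored before t; 'timeROC' instead corrects for censoring with inverse-probability-of-censoring weights, so its estimates will generally differ. The function name and the time units (for example, days with 1 year = 365 days) are assumptions.

```python
def naive_auc_at(horizon, times, events, scores):
    """Unweighted cumulative/dynamic AUC at `horizon`.

    Cases: death observed at or before the horizon. Controls: still under
    follow-up after the horizon. Patients censored before the horizon are
    dropped (no censoring weights). Tied scores count as half a concordant pair.
    """
    cases = [s for t, e, s in zip(times, events, scores) if t <= horizon and e == 1]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for case in cases:
        for control in controls:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))
```

Under this reading, an AUC of 0.5 means the risk score ranks cases and controls no better than chance, and 1.0 means every case outranks every control.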

| Independent prognostic value of the risk score
Next, we conducted univariate and multivariate Cox analyses to evaluate whether the risk score derived from the prognostic model had independent prognostic value. The results showed that the risk score, in addition to age and radiotherapy, was an independent prognostic factor for LGG, as shown in Figure S2.
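
The univariate Cox step can be illustrated with a single covariate. The Python sketch below maximises the Cox partial likelihood by Newton-Raphson, using the Breslow handling of tied event times, and reports a Wald P value; it is an illustration only. R's `coxph` defaults to the Efron tie approximation, and the multivariate model used to test independence from age and radiotherapy fits all covariates jointly, which this one-variable sketch does not do. Function names are hypothetical.

```python
import math
from statistics import NormalDist


def _score_info(beta, times, events, x):
    """Score and observed information of the Breslow partial log-likelihood."""
    score = info = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue
        risk = [j for j in range(n) if times[j] >= times[i]]  # risk set at t_i
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
        mean_x = s1 / s0
        score += x[i] - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info


def univariate_cox(times, events, x, max_iter=50, tol=1e-9):
    """Fit h(t|x) = h0(t) * exp(beta * x); return (beta, hazard ratio, Wald P)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(beta, times, events, x)
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, times, events, x)
    z = beta * math.sqrt(info)  # beta / standard error
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return beta, math.exp(beta), p
```

A hazard ratio above 1 indicates that higher values of the covariate (for example, a higher risk score) are associated with shorter survival.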

| DISCUSSION
The treatment of LGG is comprehensive and centred on maximal surgical resection, but disease progression is still inevitable, and patients often experience adverse outcomes such as multicentric recurrence and meningeal dissemination. [10][11][12] SUMOylation, a covalent modification that occurs mainly on lysine residues, 13 plays an important role in maintaining protein stability and in the stress response. 14 The dynamic regulation of SUMOylation and deSUMOylation is also involved in tumour molecular regulation. [15][16][17] For example, SUMOylation of P53 at K386 can prevent its acetylation by p300, whereas P53 already acetylated by p300 can still be SUMOylated, and this acetylation reduces the inhibition of DNA binding caused by SUMOylation. 18 In esophageal squamous cell carcinoma, HSP27 can directly interact with SUMO2/3, and SUMOylation of HSP27 enhances tumour cell proliferation, migration and invasion. 19 However, research on the regulatory mechanisms of SUMOylation in glioma, especially LGG, is quite limited.

FIGURE 2 Heatmap of SUMOylation regulators between LGG and normal brain tissue
This study first evaluated the distribution of SNVs in SUMOylation regulators in LGG; 2.372% of all samples carried a SUMOylation regulator mutation. We then analysed the differential expression of SUMOylation regulators between LGG and normal brain tissue and found that SENP1 and SENP7 were up-regulated, whereas SENP5 was down-regulated. [...] with UBE2I to bind to SUMO1. 22 However, the molecular mechanisms of these molecules in LGG have rarely been studied and remain to be explored.
This study still has certain limitations. For example, additional experiments would be needed to validate the conclusions and to explore the specific molecular mechanisms by which these molecules participate in SUMOylation.

This study demonstrated the expression of SUMOylation regulators in LGG. SUMOylation regulators might be involved in biological functions such as mRNA splicing, DNA replication, ATPase activity and the spliceosome. A prognostic model was established based on four SUMOylation regulator-related signatures (RFWD3, MPHOSPH9, WRN and NUP155), which showed good predictive ability for overall survival. This study is expected to provide targets for the diagnosis and treatment of low-grade glioma.

ACKNOWLEDGEMENTS
Not applicable.

CONFLICT OF INTEREST
Not applicable.

ETHICAL APPROVAL
The data were obtained from public databases, so no additional ethical approval was required.