Protein disulphide isomerase can predict the clinical prognostic value and contribute to malignant progression in gliomas

Abstract Increasing evidence from structural and functional studies has indicated that protein disulphide isomerase (PDI) has a critical role in the proliferation, survival and metastasis of several types of cancer. However, the molecular mechanisms through which PDI contributes to glioma remain unclear. Here, we aimed to investigate whether the differential expression of 17 PDI family members was closely related to the different clinicopathological features in gliomas from The Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas data sets. Additionally, four subgroups of gliomas (cluster 1/2/3/4) were identified based on consensus clustering of the PDI gene family. These findings not only demonstrated that a poorer prognosis, higher WHO grade, lower frequency of isocitrate dehydrogenase mutation and higher 1p/19q non‐codeletion status were significantly correlated with cluster 4 compared with the other clusters, but also indicated that the malignant progression of glioma was closely correlated with the expression of PDI family members. Moreover, we also constructed an independent prognostic marker that can predict the clinicopathological features of gliomas. Overall, the results indicated that PDI family members may serve as possible diagnostic markers in gliomas.

shown that PDI is strongly expressed in invasive glioma cells, in both human xenografts and invasive glioma fronts. 12 Inhibition of PDIA1 activity was shown to increase the apoptosis of melanoma cells and suppress tumour growth in human ovarian cancer mouse xenografts. 13,14 Moreover, down-regulation of PDIA6 was reported to inhibit the proliferation and invasion of bladder cancer cells. 15 Additionally, knockdown of PDIA4 was indicated to contribute to a reduction in hepatic tumorigenesis and increase the survival of mice with spontaneous hepatoma, 16 while PDI was also reported to be closely correlated with the resistance of HeLa cells to the chemotherapeutic agent plitidepsin. 17 This suggests that the expression levels of PDI family members in cancer can potentially be used as prognostic markers in glioma. However, no comprehensive investigation has been undertaken on the function and prognostic value of PDIs in the malignant glioma progression.

| Data sets for glioma samples
The RNA-seq data and corresponding clinicopathological information of 689 and 508 glioma samples were downloaded from TCGA (http:// cance rgeno me.nih.gov/) and CGGA (http://www.cgga.org.cn/), respectively. The somatic mutation profiles were downloaded from the Genomic Data Commons data portal (https://portal.gdc.cancer.gov/) and analysed by the R package 'maftools'. All of the RNA-seq data were normalized by log 2(n + 1) transformation. The clinicopathological information for the CGGA and TCGA data sets is summarized in Table S1.

| Selection of PDI family members
A total of seventeen PDI members were obtained by combining the two selection methods, namely the selection of 21 PDI family members based on the published literature 1,18 and the RNA expression data available in TCGA and CGGA data sets.

| Oncomine and human protein atlas databases
The ONCOMINE database (www.oncom ine.org) is an integrated online cancer microarray database based on DNA or RNA sequence analysis, and a convenient tool for the analysis of genome-wide gene expression. According to the ONCOMINE database, the mRNA expression levels of 17 PDI family members in different cancer samples were compared to their corresponding normal control samples.
Differences in expression levels were analysed by Student's t test.

| Bioinformatic analysis
To investigate the function of PDI family members in glioma, the R package 'limma' was used to correlate the expression of the 17 PDIs with the different clinicopathological features of glioma patients.
The correlation between PDI family members was analysed using Spearman's correlation test. Next, 689 glioma samples in TCGA data were grouped using the R package 'ConsensusClusterPlus', and principal component analysis (PCA) was used to verify the grouping results.
The Kaplan-Meier method was used for survival analysis. In addition, GO and KEGG analyses were conducted for genes with differential expression between the C1 (cluster 1/2/3) and C2 (cluster 4) clusters using the Database for Annotation, Visualization, and Integrated Discovery (DAVID, http://david.abcc.ncifc rf.gov/home.jsp). Gene set enrichment analysis (GSEA) was also performed to validate the functions of genes differentially expressed between C1 and C2.
To determine the prognostic value of PDI family members, univariate Cox regression analysis of their expression in TCGA data set was performed. From this, we identified nine genes significantly associated with survival (P < .05), and these genes were selected for further functional analysis and development of a potential risk signature using the LASSO Cox regression algorithm. Subsequently, the nine genes and their coefficients were selected by the optimal value for penalization coefficient lambda, which was analysed by the smallest 10-fold cross-validation within the training (TCGA) data set. The measure of survival risk was calculated by the risk score for each patient in both the training (TCGA) and validation (CGGA) data sets using the following formula: Risk score = expression of gene 1 × β1 + expression of gene 2 × β2 + … expression of gene n × βn.
The median cut-off value for the prognostic risk score was utilized to categorize patients into low-and high-risk subgroups. A time-dependent receiver operating characteristic (ROC) curve analysis within 3 and 5 years was used to evaluate the predictive accuracy and sensitivity of our prognostic model. Tumour mutation burden (TMB) was calculated using Perl scripts (https://www.perl.org/), and the algorithm to calculate the TMB was the total number of somatic mutations (including non-synonymous point mutations, insertions and deletions in the coding region)/size of the target region, in units of mutations/Mb.

| Statistical analysis
One-way ANOVA was used to compare the expression levels of PDI family members and TMBs in gliomas of different WHO grades. The expression levels of PDI family members according to isocitrate dehydrogenase (IDH) status and 1p/19q codeletion status in gliomas were analysed by t tests, which were also used to compare the association between TMB and risk score in somatic mutation profiles of gliomas. Chi-squared tests were used to compare the distribution of gender, WHO grade, TCGA subtype, IDH status and 1p/19q codeletion status between the low-and high-risk groups using the median risk score (derived from the risk signature) as the cut-off value. The prognostic value of the risk score and various clinical and molecular-pathological characteristics were compared by univariate and multivariate Cox regression analyses. Receiver operating characteristic (ROC) curves were generated to test the prediction were used to conduct the statistical analyses. P < .05 was considered significant.

| Expression of PDI family members was significantly associated with clinicopathological features in glioma
Accumulating evidence has indicated that the PDI enzyme family is actively involved in the occurrence and development of various cancers. To clarify the biological function of PDI members in glioma occurrence and development, we comprehensively analysed the relationship between 17 PDI family members and glioma based on different clinicopathological features, such as WHO grade, IDH status and 1p/19q codeletion status. The 689 glioma samples obtained from TCGA were used as the training set, and the 508 glioma samples obtained from the CGGA were used for verification (Table S1).
A heat map was generated to visualize the correlation between the mRNA levels of the 17 PDI members and glioma WHO grades based on TCGA data set ( Figure 1A,B), with the results indicating that most of the 17 PDIs were significantly correlated with glioma WHO grade.
Furthermore, quantitative analysis of the mRNA levels from both data sets showed that, compared with those in low-grade glioma (LGG) samples (WHO grade II and III), the expression levels of P4HB, TXNDC12 and PDIA5 were significantly up-regulated in glioblastoma multiforme (GBM) samples (WHO grade IV), whereas that of CASQ1 were down-regulated ( Figure 1C,D). Additionally, the 17 PDIs were closely related to the IDH and 1p/19q status. Here, we only considered the IDH status of LGG samples as only 11 IDH-mutant highgrade glioma (HGG) samples were identified. As shown in Figure 1E, most of the 17 PDIs were correlated with IDH status. We also investigated the relationship between the 17 members and 1p/19q status in IDH-mutant LGG samples ( Figure 1G). The results showed that 13 of the PDIs were significantly correlated with 1p/19q status. The genes validated in the CGGA data set presented an expression pattern consistent with that of TCGA data set, that is the expression levels of PDIA5, ERP27, P4HB, PDIA4, TXNDC5 and TXNDC12 were higher in the LGG samples with wild-type IDH than in those with mutated IDH. In contrast, the levels of TMX2 and CASQ1 were significantly up-regulated in the mutant IDH state compared with that in the wild-type state ( Figure 1F). In the IDH-mutant LGG samples of the CGGA data set, the mRNA levels of TXNDC12, TMX1, PDIA5, CASQ2 and DNACJ10 were significantly higher in the LGG samples without 1p/19q codeletion, while those of TMX2 and CASQ1 were increased in LGG samples presenting with 1p/19q codeletion ( Figure 1H).

| Identification of PDI family members as potential biomarkers based on oncomine and human protein atlas database analyses
Changes in the transcript levels of the 17 PDI family members in different types of brain cancer and normal brain tissue were analysed by ONCOMINE data mining, while the relationship between PDI expression and different glioma pathological grades was analysed via the Human Protein Atlas database. ONCOMINE data mining indicated that the expression of DNAJC10, TMX1, PDIA4, PDIA5, PDIA6, P4HB and TXNDC5 was significantly up-regulated in brain and CNS cancers when compared with normal tissue; however, the expression levels of TMX4 and PDIA2 were higher in normal brain samples than in brain cancer tissues ( Figure 2A).
The Human Protein Atlas database was used to investigate the protein expression patterns of the 17 PDI family members in glioma.
Immunohistochemistry staining data suggested that the expression of P4HB, PDIA5, TMX1, PDIA4, PDIA6, DNAJC10, TMX3, ERP44, ERP29, ERP27 and TXNDC5 was positively correlated with glioma grade ( Figure 2B and Figure S1), whereas the expression levels of TMX4, PDIA2, TMX2 and CASQ1 were negatively correlated with glioma grade. No immunohistochemistry data were available for TXNDC12 and CASQ2 in the Human Protein Atlas database. Taken together, these results indicated that the data for the relationships between the expression of the 17 PDIs and the pathological grade of glioma extrapolated from the TCGA and CGGA data sets were reliable.

| Stratification of the clinical outcomes and clinicopathological characteristics of gliomas based on consensus clustering analysis of the 17 PDI family members
To investigate the relationship between the 17 PDIs, their RNAseq profiles were extracted from TCGA for correlation analysis.
The results showed that a significant positive correlation ex-  Figure S2). The result showed that the four clusters correlated with different clinicopathological features of gliomas ( Figure 3D).
We found that IDH-mutant status, 1p/19q codeletion status and lower WHO grade of gliomas presented higher level in the cluster 2 compared with the other clusters (Table S2). In contrast, the cluster 4 was significantly correlated with wild-type IDH status (P < .0001), 1p/19q non-codeletion status (P < .0001) and higher WHO grade of gliomas (P < .0001) compared with the other clusters. PCA was used to compare transcripts between the four subgroups in TCGA ( Figure 3E). The results indicated that significant differences existed between these subgroups. Furthermore,

| Consensus clustering analysis revealed the potential mechanisms underlying the malignant progression of gliomas
Because cluster 4 presented the lowest OS, the malignancy-related mechanisms were further investigated in this cluster. All the samples were classified into two groups, C1 and C2, representing clusters 1/2/3 and cluster 4, respectively. We performed a differential gene expression analysis for all the transcriptional data, setting a P < .05 and log 2|fold change| >1. Compared with C1, gene functional en-  (n = 508) data sets were categorized into low-and high-risk groups according to the median risk score. As shown in Figure 5B,C, Kaplan-Meier survival analysis indicated that significant differences in OS existed between the two groups.

| The prognostic scores of the risk model showed a strong correlation with the clinicopathological features of gliomas
The chi-squared test was used to investigate the relationship between the prognostic risk scores and pathological characteristics of gliomas from the TCGA ( Figure 6A). The result suggested that significant differences existed between clusters (P < .001), 1p/19q status (P < .001), IDH status (P < .001), age (P < .001) and WHO grade (P < .001; Table S3). Subsequently, the predictive value of the prognostic risk scores was evaluated by ROC curve analysis based on 3-and 5-year survival rates. The area under the ROC curve (AUC) for OS was 0.889 at 3 years and 0.821 at 5 years, showing a reliable predictive ability in the CGGA validation data set ( Figure 6B-E).
Overall, our study indicated that patient outcomes and clinicopathological features of glioma could be accurately predicted through the prognostic scores of the risk model.
To determine whether the risk scores could represent an independent prognostic index, we performed univariate and multivariate Cox regression analyses using TCGA data set. Univariate analysis showed that WHO grade, age, IDH status, 1p/19q status and risk score were closely correlated with OS ( Figure 6F,G), while WHO grade (P < .001), age (P < .001), IDH (P = .001) and risk score (P = .012) remained a significant relationship with OS in the multivariate Cox regression analysis. The same analyses were also performed for the CCGA validation data set to analyse whether the risk score model constructed by the TCGA data set still has independent indicators. Multivariate regression analysis showed that WHO grade (P < .001), IDH status (P = .004), 1p/19q status (P = .004) and risk score (P = .016) were significantly associated with OS. Consequently, our results suggested that the prognostic value of PDIs in gliomas can be independently predicted by the risk score based on the nine selected PDI family members.
To investigate whether the prognostic value of the different WHO grades was significantly related to the risk signature, the Kaplan-Meier method was used to compare the OS associated with different WHO grades between the low-and high-risk group in the TCGA data set. The OS in the high-risk group was shorter compared with that in the low-risk group for WHO grade II and III gliomas (Figure S4A,B) and GBM ( Figure S4C). These results were consistent with those for the CGGA validation data set, which showed a poor prognosis for the high-risk group when compared with that in the low-risk group for gliomas of differing WHO grades ( Figure S4D-F).

| Analysis of the association between single nucleotide polymorphisms and tumour mutational burden with the different risk scores
The single nucleotide polymorphism (SNP) frequency in the TCGA data set was evaluated to determine the SNP status with different risk scores in GBM. The result suggested that SNPs in IDH1 and ATRX exerted a higher somatic mutation burden in the low-risk group than in the high-risk group ( Figure 7A). In contrast, the somatic mutation burden associated with PTEN and EGFR was higher in the high-risk group than in the low-risk group. Notably, the TMB was significantly different between the two groups ( Figure 7B). The relationship between glioma WHO grade and TMB was also analysed ( Figure 7C), and a positive correlation was found between the two.
Subsequently, Kaplan-Meier survival analysis suggested that OS with a high TMB was significantly shorter than that with a low TMB ( Figure 7D).

| D ISCUSS I ON
Malignant glioma is the most common malignant primary central nervous system tumour in adults and is characterized by poor prognosis. 19 The last few decades have witnessed the development of different treatments for glioma patients, such as precise surgical resection with pre-operative imaging, 20,21 as well as radiation therapy 22 and chemotherapy based on glioma histological identification. 23 However, therapeutic targeting has always been necessary because of the short-term survival of glioma patients after routine treatment. 23,24 Accumulating evidence has indicated that PDI has potential as a therapeutic target in cancer treatment. 13 In addition, significantly elevated expression of PDI has been detected in various cancer types, including gliomas. 25,26 Therefore, in this study, we systematically analysed the potential function and prognostic value of PDI family members based on the different pathological features of gliomas. and chemoresistance. 28,29 In this study, we have shown that the expression of PDIA5 and TXNDC12 was positively correlated with high-grade gliomas, as well as with wild-type IDH status and 1p/19q non-codeletion. Interestingly, CASQ1, a gene related to myopathy, 30 vacuolar myopathy 31 and tubular aggregate myopathy, 32 showed increased expression in samples of LGGs, as well as in those with mutated IDH and 1p/19q codeletion. However, no study has reported on the correlation between CASQ1 and cancer, a topic that requires further study. Taken together, the results of our study have highlighted the close correlation existing between the differential expression of genes of the PDI family and the malignancy of gliomas.
Analyses based on the ONCOMINE and Human Protein Atlas databases suggested that the 17 PDI members identified in the TCGA data set have potential as diagnostic biomarkers in glioma.
Consensus clustering analysis of the 17 PDI family members was utilized to classify the clinical outcomes and clinicopathological characteristics of glioma in the TCGA data set, resulting in the identification of four glioma subgroups. We found that cluster 4 was significantly correlated with poor prognosis as well as with wild-type IDH status, 1p/19q non-codeletion and higher WHO grade of gliomas. Our results also suggested that the PDI family was correlated with biological processes such as neutrophil chemotaxis, cellular protein metabolic process, positive regulation of cell proliferation, immune response, cellular response tumour necrosis factor and angiogenesis. Moreover, cancer-related signalling pathways, such as transcriptional misregulation in cancer, the Jak-STAT signalling pathway and viral carcinogenesis, were significantly correlated with PDI-related gene expression. GSEA also identified hallmarks of malignant tumours, including apoptosis, E2F targets, epithelial-mesenchymal transition, TNF signalling via NF-κB and angiogenesis.
Taking the effect of the 17 PDI family members in the prognosis and malignant biological process of gliomas into account, we constructed a prognostic model with nine PDI members selected by LASSO Cox regression analysis to stratify glioma patients into highand low-risk categories based on the median risk score. Our result indicated that lower OS was significantly correlated with the highrisk category in TCGA and CGGA data sets. Chi-squared tests also highlighted that a strong correlation existed between the prognostic scores of the risk model and clinicopathological features of gliomas.
In addition, our results suggested that the predictive value of the prognostic risk scores was robust based on ROC curve analysis, while univariate and multivariate Cox regression analysis revealed that the risk scores could be regarded as an independent prognostic index in TCGA and CGGA data sets. Subsequently, we used gliomas of differing WHO grades to evaluate the prognostic value of the risk signature in TCGA and CGGA data sets, and found that the prognostic value of the risk signature based on the nine selected PDI gene members was robust in WHO grade II, III or IV gliomas in both TCGA and CGGA data set.
Owing to the close relationship between SNPs and risk of tumour progression, [33][34][35] we investigated the relationship between SNP and TMB status and the different risk scores. Mutational analysis of TCGA data set revealed that SNPs in IDH1, ATRX, PTEN and EGFR were significantly associated with risk status in glioma. Consistent with the fact that the mutation of IDH1 and ATRX is associated with prolonged survival in glioma, 36,37 our result showed that the SNPs in IDH1 and ATRX exerted a higher somatic mutation burden in the low-risk group than in the high-risk group. The loss of PTEN function was reported to promote tumour insensitivity to multiple therapeutic approaches, and the polymorphisms of EGFR were reported to increase the risk of gliomas [38][39][40] ; similar results were found in our result that the SNPs with PTEN and EGFR were higher in the high-risk group than in the low-risk group. The TMB is an emerging biomarker for immunotherapy responses and the latest marker for the evaluation of PD-1 antibody treatment efficacy in colorectal cancer. We found that TMB was higher in the high-risk group than in the lowrisk group, and there was a positive correlation between TMB and glioma WHO grade. Kaplan-Meier curve analysis further revealed that a high TMB was associated with a lower OS compared with that with a low TMB.
In summary, our study revealed the differential expression of the PDI gene family members in gliomas, as well as their potential function and prognostic values. These results may be useful for the development of clinically relevant treatments for gliomas.

ACK N OWLED G EM ENTS
The authors gratefully acknowledge contributions from the CGGA network and the TCGA Network.

CO N FLI C T O F I NTE R E S T
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

AUTH O R CO NTR I B UTI O N S
Xingen Zhu designed the study; Qing hu and Chuming Tao performed the data analysis; Kai huang was responsible for the writing and critical reading of the manuscript; All authors read and approved the final manuscript.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data sets analysed for this study can be found in the TCGA (http:// cance rgeno me.nih.gov/) and CGGA (http://www.cgga.org.cn/).