Collagen Type III Alpha 1 chain regulated by GATA‐Binding Protein 6 affects Type II IFN response and propanoate metabolism in the recurrence of lower grade glioma

Abstract Some studies suggested the prognosis value of immune gene in lower grade glioma (LGG). Recurrence in LGG is a tough clinical problem for many LGG patients. Therefore, prognosis biomarker is required. Multivariate prognosis Cox model was constructed and then calculated the risk score. And differential expressed transcription factors (TFs) and differential expressed immune genes (DEIGs) were co‐analysed. Besides, significant immune cells/pathways were identified by single sample gene set enrichment analysis (ssGSEA). Moreover, gene set variation analysis (GSVA) and univariate Cox regression were applied to filter prognostic signalling pathways. Additionally, significant DEIG and immune cells/pathways, and significant DEIG and pathways were co‐analysed. Further, differential enriched pathways were identified by GSEA. In sum, a scientific hypothesis for recurrence LGG including TF, immune gene and immune cell/pathway was established. In our study, a total of 536 primary LGG samples, 2,498 immune genes and 318 TFs were acquired. Based on edgeR method, 2,164 DEGs, 2,498 DEIGs and 31 differentials expressed TFs were identified. A total of 106 DEIGs were integrated into multivariate prognostic model. Additionally, the AUC of the ROC curve was 0.860, and P value of Kaplan‐Meier curve < 0.001. GATA6 (TF) and COL3A1 (DEIG) were selected (R = 0.900, P < 0.001, positive) as significant TF‐immune gene links. Type II IFN response (P < 0.001) was the significant immune pathway. Propanoate metabolism (P < 0.001) was the significant KEGG pathway. We proposed that COL3A1 was positively regulated by GATA6, and by effecting type II IFN response and propanoate metabolism, COL3A1 involved in LGG recurrence.


| INTRODUC TI ON
Immune genes involve in tumour genesis, development and prognosis, 1,2 and these genes can be applied in mechanism exploring, diagnosis, therapy target identification and prognosis. 2,3 Some studies indicated that immune-related lncRNA showed its prognosis value in lower grade glioma (LGG). 2 LGG is a serial common centre nervous system tumour. According to classification of WHO, glioma can be divided into Ⅰ, Ⅱ, Ⅲ and Ⅳ grade, and grade Ⅰ and Ⅱ are included in LGG. And estimated by the American Cancer Society, it will be 23,890 new cases and 18,020 death in America. 4 Generally, the prognosis of primary LGG is better than higher grade glioma, and the median overall survival is 5-10 years. 5 However, the recurrence, progression and deterioration make the therapy tough. And the recurrence rates of grade Ⅰ and Ⅱ are 84.5% 6 and 57.6%, 6 respectively. Besides, 50%-75% LGG patients died of progression and deterioration. 7 Therefore, for optimizing treatment, the biomarker for recurrence is required in the early stage. And we proposed immune gene might participate in LGG recurrence.
In our study, differential expressed immune genes (DEIGs) were identified, and univariate Cox regression was utilized to access the prognostic value. Additionally, significant prognosis DEIGs were integrated into multivariate prognosis Cox model and then calculated the risk score. Univariate and multivariate Cox regression were applied to validate risk score as the independent prognosis factor. Similarly, differential expressed transcription factors (TFs) were identified. Then, co-analysis for differential expressed TFs and DEIGs to identify the significant TF-DEIG relationship was performed. Based on CIBERSORT, immune cells/pathways were identified, and significant immune cells/ pathways were identified by ssGSEA. And significant DEIG and immune cells/pathways to identify the significant immune cell/pathway were co-analysed. Moreover, GSVA was utilized to identify the expression level in clinical LGG sample, and univariate Cox regression evaluated the prognosis value of Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Further, differential expressed KEGG pathways were identified by GSEA. Additionally, significant DEIG and KEGG pathways to identify the significant KEGG pathway were co-analysed. In sum, a scientific hypothesis for LGG recurrence based on TF, immune gene, immune cell/pathway and KEGG pathway was established.

| Differential expressed genes identification
Gene expression level of clinical samples in TCGA database was performed by edgeR method, differential expressed genes (DEGs) based on whether recurrence or not were identified by log2 FC (Fold charge) and false discovery rate (FDR) P value. When log2 FC >1.0 or <−1.0, and FDR P value < 0.05, DEGs were defined. Further, Gene Oncology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis were utilized to explore clinical significance of DEGs. Similarly, immune genes were performed by edgeR method, when log2 FC >1.0 or <−1.0, and FDR P value < 0.05, DEIGs were defined. In addition, to access the prognostic value, DEIGs were screened by univariate Cox regression. Also, TFs were screened by edgeR method, differential expressed TFs were identified as log2 >1.0 or <−1.0, and FDR P value < 0.05.

| Multivariate model construction and independent prognosis analysis
To predict the prognosis of LGG, differential expressed prognostic significance immune genes were integrated into multivariate model.
Additionally, the accuracy of the model was tested by receiver operator characteristic (ROC) curve. And for each LGG sample, risk score was calculated by this following formula: For each single sample, 'n' represented the number of DEIG in the multivariate model, 'β' represented regression coefficient of each DEIG, and DEIGn represented the expression level of the Nth DEIG in the corresponding sample. Then, median of risk score was selected to distinguish the high and low-risk score, and samples were divided into the high-and low-risk group, respectively. Besides, Kaplan-Meier curve was utilized to predict the prognosis value of the multivariate model based on risk score.
Further, the value of independently prognostic for risk score was evaluated by univariate and multivariate Cox regression analysis.
And other demographic information like age, gender and race was applied for correction.
proposed that COL3A1 was positively regulated by GATA6, and by effecting type II IFN response and propanoate metabolism, COL3A1 involved in LGG recurrence.

K E Y W O R D S
low-grade glioma, prognosis, propanoate metabolism, recurrence, type II IFN response

| Regulatory network for immune genes and TFs
To validate the clinical relationship with recurrence/ new tumour event, non-parametric test for DEIGs was utilized. The Pearson correlation analysis was utilized to co-analyse differential TFs and DEIGs. When correlation coefficient more than 0.300 and P value < 0.001, the corresponding regulatory relationships based on TFs and DEIGs were considered to be statistically significant.
Besides, the most significant regulatory relationship was selected, and the most significant TF and DEIG were selected, respectively.

| Co-analysis the relationship of immune gene with immune cells/pathways and KEGG pathway
To identify the characteristic of immune cells and pathways in the downstream of DEIGs in clinical samples, single sample gene set enrichment analysis (ssGSEA) was utilized to filter significant immune cells and pathways. Besides, the Pearson analysis was used to identify the correlation between the most significant DEIG and immune cells/pathways. When P value < 0.05, the interplay was regarded as significant. And the most significant immune cell or pathway was decided by the largest correlation coefficient.
Aiming to filter prognostic KEGG pathway, gene set variation analysis (GSVA) was applied to access the expression level of KEGG pathways, and all KEGG pathways were in the downstream of the most significant DEIG. In addition, univariate Cox regression was utilized to calculate the prognostic value of KEGG pathways. And survival-related KEGG pathways were identified. What's more, to co-analyse the most significant DEIG and survival-related KEGG pathways, the Pearson analysis was utilized, and P value < 0.05 was defined as significant.

| Regulatory network for TFs, immune genes, immune cell and KEGG pathways
Gene set enrichment analysis (GSEA) was utilized to identify significant KEGG pathway between primary and recurrence LGG samples in clinical data. According to the result of GSVA and GSEA, the most significant survival-related and recurrent-related KEGG pathway was selected.
The regulatory network was constructed based on the interaction of the most significant TF, DEIG, top 9 immune cells/pathways and top three KEGG pathways.

| Statistics analysis
For all analysis process, tow-sided P value < 0.05 was defined as statistically significant. And R version 3.6.1 was utilized for analysis in silico

| Differential immune gene identification
All analysis process was illustrated in Figure S1. And all baseline information of LGG samples were summarized in Table 1 In addition, to evaluate the prognosis value of DEIGs, univariate Cox regression was utilized and 99 DEIGs were significant as risk factor and 7 DEIGs were significant as protective factor ( Figure 2C).  Figure 3E), green and red represented the high-and low-risk group samples, and 21 DEIGs lower than low-risk group, 32 DEIGs higher than the high-risk group and 53 DEGs were not significant.
Moreover, plots of propanoate metabolism ( Figure 6F), systemic lupus erythematosus ( Figure 6G) and proteasome ( Figure 6H) based on GSEA were also demonstrated. Additionally, according to co-analysis with COL3A1 in heatmap, propanoate metabolism was  Therefore, the regulatory network ( Figure 7A Ultimately, the scientific hypothesis ( Figure 7C): COL3A1 was positively regulated by GATA6 and affect type II IFN response and propanoate metabolism in recurrent LGG.  Table S1, all expression levels were summarized in Table S2, and all prognosis information were summarized in Table S3. All these figures and tables were mentioned in supplement materials.

Primary
LGG develops slowly and has a well prognosis. However, the prognosis becomes poor when LGG recurs, progresses and deteriorates. 7 And the progression-free survival of 5-, 10-and 15-years were 38%, 18% and 1%, respectively. 6 Besides, the rate of malignant conversion at the first recurrence was 36.4%. 6 Thus, a biomarker for GATA6 is a TF named GATA-binding protein 6, which belongs to the GATA family, and it expressed in neuron, astrocyte, choroid plexus epithelial cells and endothelial cells. 23 GATA6 plays an F I G U R E 4 Heatmap for TFs (A), blue and red represent primary and recurrent samples. Volcano plot for TFs (B), green and red dots represent significant differential expression. Venn plot for co-expressed, recurrent and new tumour event immune genes (C). Result for co-analysis immune genes and TFs (D). Non-parametric test beeswarm plot for new tumour event (E) and recurrent (F) and COL3A1. TF, transcription factor important role in differentiation during the process of embryogenesis, in proliferation during the process of development, and in inhibition apoptosis during the process tumour genesis. 24 In addition, the frequency of methylation in GATA6 was 68.4%, and hypermethylation in GATA6 was positively related to the prognosis of glioblastoma multiforme (GBM). 25 Besides, based on expression, loss of heterozygosity and loss function analysis, GATA6 low-expressed in tumour, and researcher indicated that GATA6 was a new tumour suppressor gene in astroglioma. 26 However, GATA6 also plays a different role in promoting recurrence in cholangiocarcinoma and related to poor prognosis. 27 COL3A1 is an immune gene named collagen type III alpha 1 chain, and it codes the pro-alpha 1 chains of type III collagen in the vascular system and other tissue. 28 COL3A1 highly expressed in glioma. 1 Some study indicated that it promoted tumour genesis by regulating the angiogenesis process, and it was defined as an oncogene. 28,29 By validation based on qPCR, COL3A1 correlated with prognosis of GBM, and the expression level was related to the grade of glioma (normal vs LGG, P < 0.001; normal vs higher grade glioma (HGG), P = 0.002; LGG vs HGG, P = 0.005). 1 And some studies showed it might relate to recurrence in LGG by genome-scale integrated analysis. 29 However, there was no study reported the direct interaction between GATA6 and COL3A1. Based on bioinformation analysis and other studies, 27,29 we speculated GATA6 played a role in eliciting the transcription of COL3A1, and high-expressed COL3A1 coding more pro-alpha 1 chains of type III collagen in vessel, then activated angiogenesis process which participate in recurrence of LGG.
Type II IFN response is a kind of defence mechanism, and the main molecule was IFN-γ. However, IFN-γ can positively and negatively regulate immune system; thus, it is an antitumour cytokine and also a promoter for tumour genesis. 30 And in centre nervous system, IFN-γ mediated neurogenesis and differentiation through ERK1/ 2, and it also plays a role in apoptosis in brain cells and blastoma. 31 In tumour genesis, propanoate metabolism plays a role in lipogenesis, and by down-regulating propanoate metabolism and lipogenesis, leading to inhibition of angiogenesis in vitro/in vivo. 35 However, COL3A1 elicits the process of angiogenesis. 28 Besides, ICAM1 is induced by IFN-γ, and IFN-γ inhibits angiogenesis. 33  and biomarker of type II IFN response and propanoate metabolism will be detected by immunohistochemistry (IHC) in normal brain tissue, adjacent tissue and LGG tissue. Secondly, gain/loss of function experiment was utilized to explore the regulatory relationship between these genes and phenotype of recurrence.
Thirdly, to explore direct interplay of COL3A1 and GATA6, COL3A1 and type II IFN response, COL3A1 and propanoate metabolism, chromatin immunoprecipitation, luciferase reporter assay, RNA immunoprecipitation and co-immunoprecipitation will be applied.
Besides, to explore the regulatory relationship, gain/loss of function experiment was utilized. Ultimately, the indirect regulatory relationship including GATA6-COL3A1-type II IFN response and GATA6-COL3A1-propanoate metabolism was validated by rescue assays. All our effort might offer new biomarkers to prognosis the recurrence in LGG.
Based on analysis in silico and multidimensional validation, we proposed that COL3A1 was positively regulated by GATA6, and by affecting type II IFN response and propanoate metabolism, COL3A1 involved in recurrence LGG. And the prognosis value of GATA6, COL3A1, type II IFN response and propanoate metabolism might play a role in recurrence mechanism.

ACK N OWLED G EM ENTS
We thank the The Cancer Genome Atlas (TCGA) team for using their data.

CO N FLI C T O F I NTE R E S T
The authors declare that there is no potential conflict of interest and no commercial or financial relationships in this study.

E TH I C A L A PPROVA L
The study was approved by the Ethics Committee of First Affiliated Hospital of Zhengzhou University. The authors declare that there is no conflict of interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The datasets generated and/or analysed during the current study are available in the in the Supplementary Material and TCGA-LGG program (https://portal.gdc.cancer.gov).