Prognostic value and immune cell infiltration of hypoxic phenotype‐related gene signatures in glioblastoma microenvironment

Abstract Glioblastoma (GBM) is a malignant intracranial tumour with the highest proportion and lethality. It is characterized by invasiveness and heterogeneity. However, the currently available therapies are not curative. As an essential environmental cue that maintains glioma stem cells, hypoxia is considered the cause of tumour resistance to chemotherapy and radiation. Growing evidence shows that immunotherapy focusing on the tumour microenvironment is an effective treatment for GBM; however, the current clinicopathological features cannot predict the response to immunotherapy and provide accurate guidance for immunotherapy. Based on the ESTIMATE algorithm, GBM cases of The Cancer Genome Atlas (TCGA) data set were classified into high‐ and low‐immune/stromal score groups, and a four‐gene tumour environment‐related model was constructed. This model exhibited good efficiency at forecasting short‐ and long‐term prognosis and could also act as an independent prognostic biomarker. Additionally, this model and four of its genes (CLECL5A, SERPING1, CHI3L1 and C1R) were found to be associated with immune cell infiltration, and further study demonstrated that these four genes might drive the hypoxic phenotype of perinecrotic GBM, which affects hypoxia‐induced glioma stemness. Therefore, these might be important candidates for immunotherapy of GBM and deserve further exploration.

26.5% and 9.7%, respectively. 2 Immunotherapeutic strategies investigated by different approaches have shown promising results. 3 For these therapies, it is necessary to explore the molecular signatures and candidate targets in GBM patients that affect the prognosis and immune response. The tumour microenvironment (TME), where tumour cells grow and develop, includes a series of non-neoplastic cells and molecules, containing infiltrating and resident immune cells, vascular cells and other glial cells, with numerous cytokines and chemokines. 1,4 The dynamic changes in the TME reflect the evolutionary nature of the tumour, which involves the tumour immune escape, tumour growth and metastasis. For example, additional gene mutations induced by the low expression of DNA mismatch repair genes in a low oxygen environment could promote a more aggressive tumour phenotype. 5 Tumour progression and therapeutic responses are significantly correlated with TME heterogeneity, thereby dictating the success of immunotherapeutic programmes. 6 Thus, a deeper understanding of the TME and the mechanisms involved is crucial.
Innovative computational methods and genomics have given us a preliminary understanding of TME. For example, CIBERSORTx,

Tumor IMmune Estimation Resource (TIMER) and Estimation of STromal and Immune cells in MAlignant Tumor tissues using
Expression data (ESTIMATE) are innovative algorithms used to calculate the infiltration of tumour-related normal cells in tumour tissues according to genomics. [7][8][9] Depending on specific molecular biomarkers expressed in immune and stromal cells, the ESTIMATE algorithm computes the immune/stromal/ESTIMATE scores to reflect the TME. Researchers have successfully applied this algorithm to explore the TME of various cancers, including clear cell renal cell carcinoma, non-small cell lung cancer, acute myeloid leukaemia and colon cancer. [10][11][12][13] However, this algorithm is still not extensively applied to explore the GBM microenvironment in the literature.
In this study, we analysed the potential TME-related prognostic signature for GBM patients by integrating of TCGA data and ESTIMATE algorithm, identifying differentially expressed genes based on their immune and stromal scores. The common genes were screened out and followed with a robust likelihood-based survival model to identify prognostic genes associated with the GBM TME.
Furthermore, using the LASSO method, we constructed an independent prognostic model containing four genes aberrantly expressed in GBM; these genes and the model were significantly associated with the immune infiltration levels and the hypoxic phenotype in GBM patients. The findings of this study will help elucidate the TME effect on GBM and provide alternative targets for immunotherapy.

| Database
The genomic expression and clinical data of GBM patients in TCGA and Gravendeel databases were retrieved from GlioVis (http://gliov is.bioin fo.cnio.es/), which is a user-friendly website for data visualization and analysis of glioma data sets. The CGGA RNA sequencing data were downloaded from CGGA data portal (http://www.cgga. org.cn/). The stromal and immune scores of TCGA GBM data set originated from the ESTIMATE algorithm, which was downloaded from the ESTIMATE website (https://bioin forma tics.mdand erson.  Table 1. A single-cell RNA-seq data set comprising 3,589 cells collected from both the tumour core and the peritumoral brain was obtained from a user-friendly website (http://www.gbmseq.org/). 14 The single-cell RNA-seq data set was used to explore the expression of identified genes within the model in different cell clusters. Our workflow for this study is shown in Figure S1.

| Estimating infiltration of 22 types of immune cells
CIBERSORTx is an online analytical tool based on gene expression profiles and signature matrix files that allow us to determine cell type abundance and expression from bulk tissue. 15 The TCGA expression profile was uploaded to the online tool, and a reference LM22 signature matrix with 100 permutations was used for the tool.

| Distinguishing and elucidating differentially expressed TME-related genes
The median values of immune and stromal scores were selected as the cut-off to divide the high-and low-score groups. Differentially expressed genes were screened out between the high-and lowscore groups using the limma package. 16 The absolute value of logFC > 1 and adjusted P-value < 0.05 were fixed as the thresholds to identify differentially expressed TME-related genes.
Functional enrichment analysis, including biological processes (BP), molecular functions (MF) and cellular components (CC), and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis were conducted via the clusterProfiler package in R language for differentially expressed TME-related genes. 17 Benjamini-Hochberg (BH)-adjusted P-values < 0.05 were considered statistically significant.

| Establishing and confirming the prognostic model and building the co-expression network of the model
Using the rbsurv, glmnet and survival packages, the robust likelihoodbased survival model and the least absolute shrinkage and selection operator regression (LASSO) analyses were conducted in the TCGA training set to construct the TME-related model. Consequently, four prognostic TME-related genes and their regression coefficients were obtained. Univariate and multivariate Cox hazard regression analysis showed that this gene signature model was an independent prognostic biomarker, verified using the Gravendeel validation set.
Weighted gene correlation network analysis (WGCNA), an innovative method used to describe the correlation patterns among genes across microarray samples, was conducted for the training set to build the co-expression network of the gene signature model. 18 The variance of each gene was calculated and sorted by number. The top 50% genes ranked based on variance were chosen for WGCNA. It means that 416 samples and 6,350 genes were included in the WGCNA. First, outliers were screened out, and a co-expression similarity matrix was established according to the absolute value of the correlation between the expression levels of transcriptome data. Second, an adjacency matrix converted from the abovementioned matrix by choosing 5 as a soft threshold, applying the topological overlap measure, allowed co-expression gene modules to be classified. Gene significance (GS) used to distinguish the importance of each module was computed to assess the relationship between genes and immune/stromal scores.
Defined as the mean of GS within modules, module significance (MS) was estimated to measure the relationship between modules and the immune/stromal scores. Lastly, the immune/stromal score-related gene modules containing the four genes were screened out. After filtering the genes that interplayed with the four genes, the co-expression network of this signature was built

| Immunohistochemistry
These experiments were approved by the Human Ethics Committee of Xiangya Hospital, and informed consent was obtained from all patients. Based on polyformalin-fixed and paraffin-embedded tissues obtained from GBM patients, immunohistochemistry analysis was conducted as previously described. 20 Tissue sections were incubated with indicated primary antibodies against C1R(17346-1-AP) and CHI3L1(12036-1-AP).

| Statistical analysis
Statistical analyses were performed with R software. Using the logrank test, the relationship between the critical factors (gene expression level and immune/stromal scores) and patients' overall survival was analysed based on the survival and survminer R package, with P < .05 regarded as statistically significant. To assess the predictive efficiency of the signature, the time-dependent receiver operating characteristic curves were depicted.

| Clustering single cells and identifying the role of the signature in cell clusters
Dimensionality reduction and cell clustering for the single-cell RNAseq data were performed based on 'tsne' package in R. As described by a previous study, 14 the cell-to-cell distance matrix of all cells was established based on the top 500 over-dispersed genes. The matrix was reduced to two dimensions with tSNE. Singe cells were coloured on a dimensional reduction plot according to the genes within the model.

F I G U R E 1
Immune score and stromal score are associated with GBM subtypes and the overall survival. (A) Distribution of immune/ stromal score of TCGA GBM transcriptome subtypes. The violin plot shows that both immune score and stromal score are significantly correlated with transcriptome subtypes of GBM (n = 416, *: P < .05, **: P < .01, ***: P < .001). (B) Distribution of immune/stromal score of TCGA GBM non-G-CIMP subtype and G-CIMP subtype. The violin plot shows that both immune score and stromal score are significantly correlated with CIMP subtypes of GBM. (C) TCGA GBM cases were divided into two groups based on their immune score; median overall survival of cases with the elevated immune score is shorter than that of cases with the lower immune scores, although it was not statistically significant (P = .082). (D) Similarly, TCGA GBM cases were divided into two groups based on their stromal score; the median overall survival of cases with the elevated stromal score is shorter than that of the cases with the lower stromal scores, although it was not statistically significant (P = .062) The immune/stromal scores of the mesenchymal subtype were the highest, whereas the scores of pro-neural subtype were the lowest ( Figure 1A). Immune/stromal scores in the non-G-CIMP cluster were remarkably higher than those in the G-CIMP cluster ( Figure 1B). To study the potential relationship between immune/ stromal scores and survival time of GBM patients, the median of immune and stromal scores was set as the cut-off, and we dichotomized 416 GBM patients into low-and high-score groups.

| Immune/stromal scores remarkably correlated with molecular subtypes and prognosis of GBM
The survival rate between the two clusters was compared by were significantly associated with the GBM subtypes and higher scores were related to shorter survival times in GBM patients.
In accordance with the results from a previous study, 21 we also found that transcriptional subtypes of IDH wild-type tumours, except the neural subtype which was identified as a sampling artefact, differentially activated the immune microenvironment.
Mesenchymal GBM exhibits a higher fraction of M2 macrophages and neutrophils and lower proportion of activated nature killer cell compared with classical and pro-neural GBM. Meanwhile, the fraction of the resting memory CD4 + T cell was reduced in the pro-neural GBM ( Figure S2A).

| Differentially expressed genes related to the GBM TME and their functional annotation
Comparing the transcriptome data from 416 GBM patients after their separation into high-and low-score groups, we attempted to uncover the correlation of gene expression profiles with immune and stromal scores. The differential gene expression profiles of patients with highimmune/stromal scores and those with low-immune/stromal scores can be observed in the generated heat maps (Figure 2A Figure 2C).
Thus, we focused on these 152 genes, named differentially expressed genes related to tumour environment (DEGRTME), for the subsequent analysis. Functional enrichment analysis was conducted to clarify the mechanism underlying the functions of these 152 genes. The top 10 functional annotations of Gene Ontology (GO) terms and the KEGG terms are listed in Figure 2D-G. The results revealed that DEGRTME were strongly associated with immune-related terms/pathways, such as extracellular matrix, neutrophil-mediated immunity, humoural immune response, regulation of inflammatory response, acute inflammatory response, IgG binding, signalling pattern recognition receptor activity, and complement and coagulation cascades.

| Establishing and confirming the TMErelated model for GBMs and building the coexpression network
Through the robust likelihood-based survival model and the LASSO method, four genes (CLEC5A, SERPING1, CHI3L1 and C1R) were screened out to construct the TME-related model. These four genes were also validated to be differentially expressed genes in the Gravendeel data set and another independent data set in the Chinese Glioma Genome Atlas (Table S1 and Figure S2B,C). A prognostic risk score for each patient was obtained by the regression coefficients and the transcriptome data of the genes. After calculating the risk scores, the TCGA GBM patients were separated into high-and low-risk clusters according to the optimal cut-off point of the risk scores. Figure 3A shows the distribution of the four-gene signature risk scores. The survival status and survival time of the GBMs between the two risk groups are listed in Figure 3B. Notably, the expression of the four genes was remarkably higher in the high-risk group than that in the low-risk group ( Figure 3C).
Moreover, compared with the low-risk group, the overall survival time of the high-risk group was worse (P < 0.0001) ( Figure 3D). The area  Figure 3J). The co-expression network was constructed after performing WGCNA of the training set using the R package of WGCNA, which elucidates the mechanism underlying these four genes. 18 The correlations between gene modules and immune/stromal scores were measured by module eigengene (ME), which represents the expression level of the whole genes in corresponding modules. Twenty-one gene modules were generated containing genes varying from 50 to 1085.  Figure S3A-D). The results indicated that the yellow and brown modules contain the four genes, and genes belonging to these two modules were screened out. Based on genes contained in these two modules, the co-expression network of the TME-related model was built ( Figure 4A). The number of genes co-expressed with C1R, SERPING1, CLEC5A and CHI3L1 was 124, 48, 199 and 2, respectively.
Containing 209 nodes, the co-expression network of this model was highly connected by 373 edges ( Figure 4A). GEPIA (http://gepia.cance r-pku.cn/), a web server for analysing the RNA sequencing expression data of tumour and normal samples, revealed that the expression of these filtered four genes was expressed at a higher level in GBM than in non-GBM tissues ( Figure 4B-E).

| Independent prognostic factor and hierarchic marker of TME-related signatures
To identify whether the TME-related signature is an independent predictive factor for GBM patients, the probable predictors, age group and risk level (HR = 0.64, P = .05) were independent predictors in the Gravendeel data set ( Table 2). Our signature remained a remarkable prognostic biomarker after being adjusted by other predictive factors.
When considering the combined effects of risk level and other clinicopathological factors (age, CIMP status, MGMT status), the younger patients in the low-risk group exhibited the longest median survival, whereas the older patients in the high-risk group showed the shortest median survival time ( Figure 4F). Likewise, the high-risk group patients harbouring non-G-CIMP had the shortest median survival. In contrast, the G-CIMP patients belonging to the low-risk group showed the longest median survival time ( Figure 4G). This outcome was verified in the validation data set ( Figure S3E,F). Our model also stratified both MGMT-methylated and MGMT-unmethylated patients' median survival remarkably well ( Figure 4H). Therefore, these results suggest that the TME-related prognostic model constructed is an independent prognostic biomarker and valuable marker for stratification in GBM.

| The TME-related prognostic model correlates with immune cell infiltration in GBM
After constructing the TME-related prognostic model using the training set and verifying its efficiency using the validation set, the correlation between this model and the infiltration of immune cells for GBM was calculated ( Figure 5 and Figure S4). Scatter plots were generated using Pearson's correlation analysis, and statistical significance was determined. The risk score and the four identified genes in the model exhibited moderate correlation with the infiltra-  Figure S4E). This outcome reveals that our model system accounts for the immune cell infiltration in the GBM TME, especially the DCs.

| The TME-related genes might contribute to the hypoxic phenotype of perinecrotic GBM
To study the expression of the TME-related genes in the GBM tissue, we explored the RNA-seq data set of the Ivy Glioblastoma Atlas Project (http://gliob lasto ma.allen insti tute.org/stati c/home). This web server collected GBM samples laser-dissected from various sites including cellular tumour, perinecrotic zone, pseudopalisading cells around necrosis, hyperplastic blood vessels in cellular tumours and proliferating microvasculature samples. 22 Results showed that the GBM perinecrotic zone highly expressed TME-related genes ( Figure 5F). To further confirm this, we investigated the correlation between the model and perinecrosis-related genes, such as CD44, which are activated under hypoxia and interact with HIF-2α to modulate the hypoxic phenotype of perinecrotic and perivascular glioma cells. 23 Scatter plots show that the expression of the four genes belonging to the TME-related model exhibited a strong correlation with CD44 and HIF-2α (known as EPAS1) ( Figure 5G-H). Additionally, the outcome of single-cell clustering showed that these four genes were highly expressed either in three of the largest immune cell clusters defined by the pan-immune marker PTPCR (CD45) or neoplastic cells that were collected from the tumour core, which is the location at which necrosis occurred ( Figure S5A-C). Immunohistochemistry results showed that C1R and CHI3L1 were highly expressed in the GBM perinecrotic zone compared with the non-necrotic zone and peritumour tissue ( Figure S6A,B). These indicate that the TME-related genes might contribute to the hypoxic phenotype of perinecrotic GBM.

| CON CLUS ION
In conclusion, we constructed a four-gene TME-related predictive signature as an independent prognostic biomarker and compared it with other probable clinical features.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data sets of this study were generated from the GlioVis

R E FE R E N C E S
F I G U R E 5 The correlation between model genes and risk score and dendritic cells and the TME-related model might play an important role in the hypoxic phenotype of the perinecrotic GBM. The levels of risk score(A), C1R(B), CLEC5A(C), SERPING1(D) and CHI3L1(E) were significantly correlated with the infiltration levels of dendritic cells. (F) IVY GAP (http://gliob lasto ma.allen insti tute.org/stati c/home) analysis indicated that the expression of the TME-related model was highly related to the perinecrotic zone in glioblastoma anatomic structures. (G) The expression level of this model was tightly correlated with the level of CD44. (H) The expression level of this model showed moderate correlation with the level of HIF-2α