Classification of diffuse lower‐grade glioma based on immunological profiling

Transcriptomic data derived from bulk sequencing have been applied to delineate the tumor microenvironment (TME) and define immune subtypes in various cancers, which may facilitate the design of immunotherapy treatment strategies. We herein gathered published gene expression data from diffuse lower‐grade glioma (LGG) patients to identify immune subtypes. Based on the immune gene profiles of 402 LGG patients from The Cancer Genome Atlas, we performed consensus clustering to determine robust clusters of patients, and evaluated their reproducibility in three Chinese Glioma Genome Atlas cohorts. We further integrated immunogenomics methods to characterize the immune environment of each subtype. Our analysis identified and validated three immune subtypes—Im1, Im2, and Im3—characterized by differences in lymphocyte signatures, somatic DNA alterations, and clinical outcomes. Im1 had a higher infiltration of CD8+ T cells, Th17, and mast cells. Im2 was defined by elevated cytolytic activity, exhausted CD8+ T cells, macrophages, higher levels of aneuploidy, and tumor mutation burden, and these patients had worst outcome. Im3 displayed more prominent T helper cell and APC coinhibition signatures, with elevated pDCs and macrophages. Each subtype was associated with distinct somatic alterations. Moreover, we applied graph structure learning‐based dimensionality reduction to the immune landscape and revealed significant intracluster heterogeneity with Im2 subtype. Finally, we developed and validated an immune signature with better performance of prognosis prediction. Our results demonstrated the immunological heterogeneity within diffuse LGG and provided valuable stratification for the design of future immunotherapy.

Transcriptomic data derived from bulk sequencing have been applied to delineate the tumor microenvironment (TME) and define immune subtypes in various cancers, which may facilitate the design of immunotherapy treatment strategies. We herein gathered published gene expression data from diffuse lower-grade glioma (LGG) patients to identify immune subtypes. Based on the immune gene profiles of 402 LGG patients from The Cancer Genome Atlas, we performed consensus clustering to determine robust clusters of patients, and evaluated their reproducibility in three Chinese Glioma Genome Atlas cohorts. We further integrated immunogenomics methods to characterize the immune environment of each subtype. Our analysis identified and validated three immune subtypes-Im1, Im2, and Im3-characterized by differences in lymphocyte signatures, somatic DNA alterations, and clinical outcomes. Im1 had a higher infiltration of CD8+ T cells, Th17, and mast cells. Im2 was defined by elevated cytolytic activity, exhausted CD8+ T cells, macrophages, higher levels of aneuploidy, and tumor mutation burden, and these patients had worst outcome. Im3 displayed more prominent T helper cell and APC coinhibition signatures, with elevated pDCs and macrophages. Each subtype was associated with distinct somatic alterations. Moreover, we applied graph structure learning-based dimensionality reduction to the immune landscape and revealed significant intracluster heterogeneity with Im2 subtype. Finally, we developed and validated an immune signature with better performance of prognosis prediction. Our results demonstrated the immunological heterogeneity within diffuse LGG and provided valuable stratification for the design of future immunotherapy.

Introduction
Diffuse lower-grade gliomas (LGGs) consisting of World Health Organization (WHO) grade II and III gliomas are infiltrative brain tumors that arise from glial or precursor cells, showing a more indolent course compared with glioblastoma (GBM, grade IV) (Jiang et al., 2016;Lapointe et al., 2018). The survival of this tumor ranges widely (from 1 to 15 years) and varies considerably when stratified by tumor type. Based on the IDH mutation and 1p/19q codeletion status, LGGs are classified into three diagnostic and prognostic subtypes in the 2016 WHO classification: IDH wild-type (holding the worst outcome), IDH mutant and 1p/19q intact, and IDH mutant and 1p/19q codeleted tumors (oligodendrogliomas) (Louis et al., 2016). Despite multimodal treatment, including neurosurgical resection, radiotherapy, and chemotherapy, tumor recurrence and malignant progression are inevitable because of their highly invasive nature and treatment resistance (Cancer Genome Atlas Research et al., 2015).
More than four decades of efforts with variety of immunotherapeutic modalities yield limited successes in glioma. Glioma imbues numerous obstacles to successful immunotherapy, including tumor heterogeneity, low mutational burden, T-cell dysfunction, and poor immune access (Fecci and Sampson, 2019). Moreover, new finding reveals that tumor-directed sequestration of T cells in bone marrow severely restricts the access of T cells to tumors in the CNS (Chongsathidkiet et al., 2018). Despite successes with checkpoint blockade in many cancers, CTLA-4 and PD-1 inhibitors have failed in clinical trials of glioma (Schalper et al., 2019). CAR T cells targeting EGFRvIII, IL12Rａ and HER2 are proved to be safe in preclinical studies of GBM, and the clinical antitumor capabilities remain to be seen (Ahmed et al., 2017;Brown et al., 2015;Johnson et al., 2015). Despite excellent preclinical efficacy, EGFRvIII-targeted peptide vaccine (Rindopepimut) exhibits no significant improvement in median overall survival (OS) of GBM patients in phase III clinical trial (Weller et al., 2017).
For successful immunotherapeutic approaches, it will require better understanding of glioma-specific immune microenvironment. A growing number of studies have demonstrated that the TME remains highly divergent in different subtypes of glioma. The TME of GBM has been delineated by the proportion and gene expression signatures of immune cells, but the clinical implications of these immune cells are still disputed (Berghoff et al., 2015;Lohr et al., 2011). Michael and Luoto found that transcriptional and mutational subtypes were characterized by distinct TME in high-grade gliomas and GBM, respectively (Bockmayr et al., 2019;Luoto et al., 2018). Reduced immune infiltrates were observed in IDH mutant glioma (Amankulor et al., 2017), while Wang found increased macrophage infiltration in NF1 mutant tumors . However, the immune microenvironment of diffuse LGG has not been fully characterized.
In this study, we classified diffuse LGG into three distinct subtypes based on consensus clustering of immune-related gene expression profiles. We demonstrated the stability and reproducibility of this classification in an independent cohort. Each of the three immune subtypes was associated with distinct molecular and cellular features, and clinical outcomes. The identification of immune-related subtypes may facilitate the optimal selection of LGG patients responsive to immunotherapy.

Patients and datasets
This study collected 1018 diffuse LGGs from two databases: The Cancer Genome Atlas (TCGA) and Chinese Glioma Genome Atlas (CGGA). For the TCGA discovery cohort (402 LGG patients; Table 1), the RNA-seq data, somatic mutation, copy-number alterations (CNAs), and corresponding clinical information were obtained from TCGA database (http://ca ncergenome.nih.gov/) (Ceccarelli et al., 2016). The molecular and immune-related features, including neoantigens, immune cell composition, TCR/BCR status, and other signatures, were also retrieved ( Thorsson et al., 2018). Three CGGA validation cohorts were involved in this study: two RNA-seq cohorts (cohort1 and cohort2) and a microarray cohort (cohort3) ( Table 1). The RNA-seq and microarray data, and clinical and survival information were downloaded from CGGA database (http://www.cgga.org.cn) (Hu et al., 2018;Zhao et al., 2017). Patient informed consent was existing in these two public datasets, and this study was carried out in accordance with the Helsinki Declaration and approved by the ethics committee of Tiantan Hospital.

Identification and validation of immune subtypes
Based on the expression of 782 immune-related genes listed in the literature (Charoentong et al., 2017;Gentles et al., 2015), we applied consensus clustering to identify robust clusters of TCGA patients (Wilkerson and Hayes, 2010;Wu et al., 2019a). 80% item resampling, 50 resamplings and a maximum evaluated K of 10 were selected for clustering. The cumulative distribution function and consensus heatmap were used to assess the optimal K.
To validate the immune subtypes in CGGA cohort, we first identified the immune-related genes shared by the training and validation cohorts (751 genes). Then, we trained a partition around medoids (PAM) classifier in the discovery cohort to predict the immune subtype for patients in the validation cohort (package pamr), and each sample in the validation cohort was assigned to an immune subtype whose centroid had the highest Pearson correlation with the sample (Tibshirani et al., 2002). Finally, the in-group proportion (IGP) statistic (package clusterRepro) was performed to evaluate the similarity and reproducibility of the acquired immune subtypes between discovery and validation cohorts (Kapp and Tibshirani, 2007).

Computation of molecular signatures and immune cellular fraction
ESTIMATE was performed to evaluate the immune cell infiltration level (immune score) and stromal content (stromal score) for each sample (Yoshihara et al., 2013). The enrichment levels of proliferation and 23 immune signatures were quantified by the single-sample gene set enrichment analysis (ssGSEA), as implemented in the GSVA R package (Hanzelmann et al., 2013;He et al., 2018). The relative fraction of 22 immune cell types in tumor tissue was estimated using CIBERSORT algorithm .

Immune landscape analysis
To further uncover the intrinsic structure and distribution of individual patients, we extended a graph learning-based dimensionality reduction analysis to the immune gene expression profiles. The discriminative dimensionality reduction with trees (DDRTree) was used, and the immune landscape was visualized with the plot cell trajectory function (package monocle) (Li et al., 2019;Trapnell et al., 2014).

Identification of an immune-related signature
First, we applied significance analysis of microarray (SAM) to identify differentially expressed immune genes between subtypes. Univariate Cox regression analysis was performed to determine the immune genes with prognostic significance. Then, the Cox proportional hazards model, which was suitable for high-dimensional regression analysis, was used to construct an optimal and prognostic gene set (package glmnet) (Hughey and Butte, 2015;Wu et al., 2019b). The linear combination of gene expression weighted by regression coefficients (Coeffs) was used to calculate the risk scores of patients.

Statistical analysis
The Kaplan-Meier with log-rank test was used to assess survival difference between groups. The univariate and multivariate Cox regression analyses were performed to detect the prognostic factors. All statistical analyses were conducted using R software, GRAPHPAD PRISM 6.0 (GraphPad Inc, San Diego, CA, USA) and SPSS 16.0 (IBM, CHICAGO, IL, USA). P < 0.05 was considered statistically significant.
To validate our findings in TCGA cohort, we evaluated reproducibility of the immune subtypes in CGGA cohorts. Each sample was assigned to an immune subtype according to the Pearson correlation of centroid ( Fig. 1B-D). IGP statistic showed high consistency between TCGA and CGGA cohorts (Table S1). Moreover, the obtained immune subtypes displayed similar pattern of prognostic and pathological features with TCGA cohort ( Fig. 2A-C).

Cellular and molecular characteristics of immune subtypes
To uncover the immune heterogeneity among these three subtypes, we resorted to several immune-related tools. We computed stromal and immune scores based on the ESTIMATE method (Yoshihara et al., 2013). Compared with Im1 and Im3, tumors in Im2 had higher immune and stromal scores but lower purity (Fig. 3A). CIBERSORT  analysis also revealed Im1 had an increased percentage of lymphocytes, whereas Im2 and Im3 displayed higher level of M2 macrophages (Fig. 3B). We also used the ssGSEA (Hanzelmann et al., 2013) score to quantify the enrichment levels of immune cells and functions. Im2 subtype was associated with higher levels of CD8+ T cell, cytolytic activity, exhausted CD8+ T cells, macrophages, and elevated T-cell costimulation. Most immune cells tended to be increased in Im1 and Im3 subtypes, such as B cells, neutrophils, Treg, Th1, and Th2. Notably, Im1 tumors showed higher enrichment of CD8+ T cell, Th17, and mast cells, whereas Im3 had relatively higher levels of T helper cells,  pDCs, and macrophages, along with increased functions of APC and T-cell coinhibition (Fig. 3C). In addition, Im2 had higher TCR diversity and more neoantigen load (Fig. S4). Most HLA and checkpoint genes showed significantly higher expression levels in Im2 subtype (Fig. S5). We next sought to dissect the immune infiltrations of each subtype in three CGGA cohorts and obtained consistent results (Figs 3A-C and S5). Collectively, these results suggested that the TME of Im2 was immune-hot but highly immune-suppressive, whereas Im1 and Im3 tumors showed relatively moderate immune microenvironment.

Prognostic associations of tumor immune response measures
Since infiltrating immune cells have been shown to be prognostic in many human cancers , we determined the effects of immune cell signatures and functions on the outcome of diffuse LGG. Out of these immune signatures, Th1, mast cells, neutrophils, and Treg were associated with better OS. When stratified by immune subtypes, improved survival was observed in association with cytolytic activity in Im2, and Im1 patients with high scores of neutrophils or Treg tended to have better outcome (Fig. 4A,B). Likewise, we assessed the prognostic value of major immune checkpoints, and found that high expression of PDCD1, HAVCR2, and IDO1 implied worse outcome (Fig. S6A,B). When stratified by IDH and 1p/19q status, these immune signatures and checkpoints showed no significant prognostic correlation (Fig. S7). Additionally, strong correlations were observed between cytolytic activity and T-cell costimulation/inflammation-promoting, neutrophils and Treg, T-cell coinhibition and T helper cells/pDCs/APC coinhibition, PDCD1 and IDO1/LAG3 (Fig. S8A,B). We next interrogated the validation cohort and obtained similar results (Figs 4, S6 and S8). These results suggested that infiltrating immune cells might impact patients' outcome, and provided valuable targets of immune treatment for diffuse LGG.

Somatic variation of immune subtypes
Recent studies have linked the tumoral genomic alterations with immune infiltrate (Charoentong et al., 2017;Fig. 4. Immune cell signatures were associated with clinical outcome of diffuse LGG. (A) Heatmaps show hazard ratios for immune expression signature scores in relation to v within immune subtypes. Correlation of OS was assessed by Cox regression analysis. *P < 0.05; **P < 0.01; and ***P < 0.001. (B) The Kaplan-Meier analyses of tumors stratified by Th1, mast cell, neutrophil, and Treg scores in TCGA and CGGA cohorts. P value was calculated by the log-rank test.  Thorsson et al., 2018). We further explored the difference in genomic alterations among these three immune subtypes. Im2 tumors showed high aneuploidy, homologous recombination deficiency, and copy-number burden scores, as well as increased tumor mutation burden (Fig. 5A). We correlated gene mutations with immune subtypes and found significant associations. Im1 was enriched in IDH1, 1p/19q codeletion, CIC, FUBP1, and NOTCH1 mutations. Im2 was enriched in mutations in driver genes, such as PTEN, EGFR, and NF1, a finding of note since tumors with NF1 mutation had increased macrophage infiltration . Im3 was enriched in IDH1, ATRX, and TP53 mutations (Fig. 5B). GISTIC2.0 analysis revealed distinct CNAs among immune subtypes. Im2 showed more frequently deleted or amplified regions, such as CDKN2A/ CDKN2B, EGFR1, CDK4, KIT, and PDGFRA (Fig. 5B). These findings indicated that tumors with high immune infiltration might have higher levels of genomic alterations.

Dimension reduction analysis identifies two distinct subgroups in Im2
For defining the immune landscape of diffuse LGG, we further applied the graph learning-based dimensionality reduction analysis (Li et al., 2019;Trapnell et al., 2014) to the immune gene expression profiles. We observed that Im2 could be further divided into two subgroups (Im2A and Im2B) based on their location in the immune landscape (Fig. 6A,B). Interestingly, the two subgroups were associated with distinct prognosis. Compared with Im2B, Im2A had better outcome (Fig. 6C). Of note, ssGSVA found Im2A was associated with higher levels of immune cells and functions, such as T helper cells, CD8+ T cells, cytolytic activity, exhausted CD8+ T cells, neutrophils, macrophages, T-cell coinhibitions, APC costimulation, and pDCs, suggesting a hot and suppressive immune microenvironment in Im2A. In contrast, Im2B showed a relatively cold immune state and enhanced  proliferation rate (Fig. 6D). In addition, CIBERSORT  analysis also revealed Im2A had an increased percentage of M2 macrophages (Fig. 6E). Similar results were obtained in Im2 subtype of CGGA cohort2 (Fig. 6). There data further indicated a significant intracluster heterogeneity within immune subtypes.
3.6. Development and validation of an immunerelated signature using Cox proportional hazards model We obtained and validated a survival model using elastic-net Cox proportional hazards modeling with cross-validation. SAM analysis identified 421 differentially expressed immune genes between Im2 and Im1/ m3 subtypes, wherein 359 genes were significantly correlated with patients' OS in univariate Cox regression analysis (Fig. 7A). Then, Cox proportional hazards modeling was performed for identifying a model with best prognostic value (Fig. 7B). An eight-gene immune signature was obtained, and the scores were computed with the regression coefficients (Fig. 7C). High scores were enriched in tumors of Im2, classical and mesenchymal, grade III, or IDH wild-type (Fig. 7D). The Kaplan-Meier analysis revealed that high scores implied significantly poorer outcome in patients of diffuse LGG or each immune subtype (Fig. 7E,F). In addition, the acquired signature had higher area under the curve (AUC) compared with other factors (age and grade) (Fig. 7G). Multivariate Cox regression analysis also confirmed the independent prognostic value of this immune signature (Tables S2 and S3). We further applied this signature into validation cohorts and found consistent results (Fig. S9, Tables S2 and S3). These data demonstrated the superior performance of immune signature for prognosis prediction, highlighting the importance of the immune TME in determining survival.

Discussion
Recently, early studies from clinical practice or ongoing trials with immunotherapies show limited effectiveness in treatment of glioma (Ahmed et al., 2017;Schalper et al., 2019;Weller et al., 2017). A better understanding of the tumor immune microenvironment is critical for improving the efficacy of current immunotherapies. Here, we presented a comprehensive characterization of immunological profile of lowergrade diffuse glioma. Our results showed that diffuse LGG could be classified into three stable subtypes, and the reproducibility of this classification was demonstrated in validation cohort. Each of the immune subtypes was associated with distinct immune cell fractions, functions (from deconvolution of gene expression), somatic alterations, and clinical outcomes. Our works deepened the understanding of immune microenvironment of diffuse LGG, and provided valuable information for personalized immunotherapy.
Im2 subtype conferred the worst outcome on their constituent tumors and showed composite signatures reflecting a high lymphocytic infiltrate, with high M2 macrophage content and checkpoint gene expression, indicating an immune-hot but immune-suppressive TME. Im1 had favorable prognosis and demonstrated the most pronounced Th17 signature, consistent with a recent study suggesting that Th17 expression is associated with improved survival (Punt et al., 2015). Im3 was T helper cell-dominant and showed a better survival despite having high M2 macrophage content. Given that the discrete subtype information did not provide the intrinsic structure and distribution of individual patients, we further applied the graph learning-based dimensionality reduction analysis to the immune gene expression profiles (Trapnell et al., 2014). We revealed that Im2 subtype could be further divided into two subgroups, which had significant difference in immune infiltration and prognosis. Compared with Im2A, Im2B displayed lower lymphocytic infiltrate and worse outcome, in agreement with a cold TME for which a poor outcome would be expected. These implied the complexity of immune landscape of diffuse LGG.
Recently, Amankulor and colleagues reported that the IDH1 mutation is associated with a decreased number of immune cells in the glioma tumor microenvironment (Amankulor et al., 2017). Consistently, we showed that Im1 and Im2, which were enriched in IDH1 mutation, had lower immune infiltrate. We also observed that Im2, which was enriched in NF1 mutation, was characterized by high macrophages, in agreement with the finding that tumors with NF1 mutation had increased macrophage infiltration in glioma . These indicated that the somatic alterations might shape the immune subset composition, and further work is needed to determine the functional aspects of these correlations.
Tumor neoantigens are associated with improved survival and thought to be key targets of antitumor immunity in many tumors (Brown et al., 2014). As expected, Im2 subtype, which had more mutation burden, was associated with increased neoantigen load. However, we observed conflicting result that Im2 subtype showed worse prognosis, suggesting that the role of neoantigens might vary in LGG, or inaccurate method for neoantigen identification. When we associated immune subtype with IDH or 1p/19q status in TCGA cohort, 95% (130/137) of IDH mutant and 1p/19q codeleted LGG were enriched in Im1. 85% (62/72) IDH wild-type LGG were enriched in Im2. 23% (45/192), 23% (44/192), and 54% (103/ 192) of IDH mutant and 1p/19q non-codeleted LGG were enriched in Im1, Im2, and Im3, respectively. These indicated that immune gene expression profiles could partially reproduced the WHO LGG classification based on IDH mutation and 1p/19q codeletion status. This consistency was meaningful for advancing our view of genomic and immune landscapes in LGG.
Considering the insufficiency of univariate Cox model for variable selection, we adopted an elastic-net regression Cox modeling (Thorsson et al., 2018) to develop an immune signature that had better performance for prognosis prediction in diffuse LGG. Three scores in cases stratified by immune subtype, grade, TCGA, and WHO subtype. CL, classical; ME, mesenchymal; NE, neural; PN, proneural. Sub1: IDH mutant and 1p/ 19q codeleted, Sub2: IDH mutant and 1p/19q non-codeleted, Sub3: IDH wild-type. *P < 0.05; **P < 0.01; ***P < 0.001; ****P < 0.0001. (E, F) Survival analysis of the immune signature in diffuse LGG or immune subtypes. P value was calculated by the log-rank test. (G) ROC curve analysis of age, grade, and immune score. hundred and fifty-nine differentially expressed immune genes between Im2 and Im1/m3 subtypes were employed to develop a prognostic indictor. Low-and high-score tumors displayed significant survival differences in both cohorts, with good prediction accuracy. However, more prospective studies were needed for further validation of this immune signature.

.Conclusions
In summary, three stable and reproducible immune subtypes were identified in lower-grade diffuse glioma. These subtypes were associated with prognosis, genetic, molecular, and cellular characteristics that may shape the specific immune environment we have observed. The definition of the immune subtype of diffuse LGG may play a critical role in predicting clinical outcome, as well as rational design of immunotherapy.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Fig. S1. Consensus clustering based on immune gene expression of 402 diffuse LGGs in TCGA cohort. Fig. S2. GO analysis of GM3 and GM5. Fig. S3. Distribution of DNA methylation clusters within immune subtypes in TCGA cohort. Fig. S4. Comparison of neoantigen and TCR diversity between immune subtypes in TCGA cohort. Fig. S5. Heatmaps show the expression levels of HLA and checkpoint genes between immune subtypes in TCGA and CGGA cohorts.     Table S1. IGP was estimated for each immune subtype in the validation cohorts.