Characterization and prognostic significance of alternative splicing events in lower‐grade diffuse gliomas

Abstract Alternative splicing (AS) is assumed to play important roles in the progression and prognosis of cancer. Currently, the comprehensive analysis and clinical relevance of AS in lower‐grade diffuse gliomas have not been systematically addressed. Here, we gathered alternative splicing data of lower‐grade diffuse gliomas from SpliceSeq. Based on the Percent Spliced In (PSI) values of 515 lower‐grade diffuse glioma patients from the Cancer Genome Atlas (TCGA), we performed subtype‐differential AS analysis and consensus clustering to determine robust clusters of patients. A total of 48 050 AS events in 10 787 genes in lower‐grade diffuse gliomas were profiled. Subtype‐differential splicing analysis and functional annotation revealed that spliced genes were significantly enriched in numerous cancer‐related biological phenotypes and signalling pathways. Consensus clustering using AS events identified three robust clusters of patients with distinguished pathological and prognostic features. Moreover, each cluster was also associated with distinct genomic alterations. Finally, we developed and validated an AS‐related signature with Cox proportional hazards model. The signature, significantly associated with clinical and molecular features, could serve as an independent prognostic factor for lower‐grade diffuse gliomas. Thus, our results indicated that AS events could discriminate molecular subtypes and have prognostic impact in lower‐grade diffuse gliomas.

IDH-wt; GBM, IDH-mut; and GBM, IDH-wt. 7 Although increasing studies have found several genetic mutations and deregulated signalling pathways affecting the malignant phenotype of gliomas, the current findings are still insufficient to reveal the molecular basis of gliomagenesis. 8 To further uncover the transcriptional dissimilarity among subgroups, we undertook an analysis of alternative splicing (AS) in lower-grade diffuse gliomas.
Alternative splicing (AS) is a post-transcriptional modification generating multiple mRNA and protein isoforms from a single gene.
The vast majority of human genes (approximately 95%) are alternatively spliced. 9,10 Growing evidence has shown that AS is closely associated with tumorigenesis and progression, especially growth, invasion and metastasis of tumour cells. [11][12][13] Moreover, AS can serve as valuable biomarkers for cancer classification, diagnosis and prognosis evaluation. 14-16 TCGA SpliceSeq is a web-based resource that provides an overview of alternative splicing form 33 different tumour types, including available adjacent normal samples. This data set contains seven AS patterns, namely, alternate acceptor site (AA), alternate donor site (AD), alternate promoter (AP), alternate terminator (AT), exon skip (ES), mutually exclusive exons (ME) and retained intron (RI). 17 Percent Spliced In (PSI) value, a common and intuitive ratio, was introduced to quantify splicing events. 10 In gliomas, the role of AS remains largely unexplored. Previous studies have mainly focused on alternative spliced isoforms of individual genes. For instance, β splicing of human telomerase reverse transcriptase (hTERT) was correlated with clinical parameters and could serve as a prognostic maker in gliomas. 18 BAF45d splicing mediated by polyprimidine tract-binding protein 1 (PTBP1) maintained the undifferentiated cellular state in glioblastoma. 19 Lineage-specific ANXA7 splicing in a constituent of an oncogenic pathway eliminated tumour suppressor functions and promoted glioblastoma progression. 20 However, comprehensive alterations of AS and their clinical implications in lower-grade diffuse gliomas have yet to be elucidated.
In this study, we undertook a comprehensive investigation of alternative splicing across lower-grade diffuse glioma subgroups in the TCGA cohort (n = 515). Using data from SpliceSeq, we identified subtype-differential AS events and genes involved in different biological functions. Furthermore, we identified three clusters of cases with significant difference in prognosis based on the consensus clustering of AS events. We also developed a 17-AS event signature for prognostic evaluation using differential AS events associated with overall survival (OS). Our results further highlight the transcriptional difference among subtypes and provide additional biomarker for subtype assignment in lower-grade diffuse gliomas.

| Patients and tissues
515 lower-grade diffuse glioma samples from the TCGA database were included in this study. All these tissue samples and clinicopathologic information were collected with informed consent. The study was conducted in accordance with the Declaration of Helsinki. Study protocols were approved by the ethics committees of the participating institutions.

| Data obtaining and processing
The RNA sequencing data, somatic mutation and copy number alterations (CNAs) data, and corresponding clinical information of all collected patients, such as age, gender, histology, isocitrate dehydrogenase (IDH) mutation status, 1p/19q status, methylguanine methyltransferase (MGMT) promoter status, TERT promoter status and survival information, were downloaded from TCGA database (http:// cance rgemo me.nih.gov/). Patient characteristics were summarized in Table S1. The corresponding alternative splicing data were obtained from the SpliceSeq database (http://bioin forma tics.mdand erson.org/ TCGA SpliceSeq). 10 The primary PSI data with no more than 25% missing values were analysed, and the missing values were filled up with average PSI value of each event. The samples were then randomly divided into two parts (Table S1), namely training and validation groups, to performing further analyses. All data supporting this study were openly available from TCGA database and SpliceSeq database.

| Consensus clustering
Using R package 'ConsensusClusterPlus', consensus clustering was performed for class discovery based on the comparison of alternative splicing profile. [21][22][23] Measured by median absolute deviation (MAD > 0.16), the most variable AS events were retained for subsequent clustering. PSI of each AS event was median-centred and the similarity between AS profile was quantified using Pearson correlation. The cumulative distribution function (CDF) was constructed for a range from 2 to 10 consensus clusters. The optimal number of clusters was evaluated using CDF and consensus matrices.

| AS-related signature identification
Univariate Cox regression analysis was performed to identify prognosis-associated AS events (P < .05). Then, the Cox proportional hazards model was applied for selecting optimal prognostic AS event set using R package 'glmnet', which was suitable for regression analysis of high-dimensional data. [24][25][26] After that, risk score of each patient was calculated with the linear combination of the PSI value weighted by their regression coefficients (Coeffs). Risk score = (PSI

| Bioinformatic analysis
Gene ontology (GO) analysis was applied for the functional annotation of differential spliced genes among subgroups. Kyoto encyclopedia of genes and genomes (KEGG) analysis was performed to analyse the pathway enrichment (http://david.ncifc rf.gov/). 27 A t test with P-values adjusted by Benjamini-Hochberg method was employed to identify differential AS events based on the threshold of foldchange more than 2 and adjusted P-value <.05. ROC curve analysis was used to predict OS with R package 'pROC'. Gene set enrichment analysis (GSEA) was performed to identify gene sets of statistical difference between two groups by using GSEA v3 software (http://www.broad insti tute.org/gsea/index.jsp). 28 GISTIC2.0 analysis was adopted to assess CNAs associated with clusters. Locus with GISTIC value more than 1 or less than −1 was defined as an amplification or deletion, respectively. 29

| Statistical analysis
Univariate and multivariate Cox regression analyses were conducted to identify independent prognostic factors. Chi-square test was performed to detect the difference of pathological features between groups. Kaplan-Meier with 2-sided log-rank test was used to assess the OS difference between groups. Cases with missing information were excluded from the corresponding analysis. All statistical analyses were conducted using SPSS, GraphPad Prism 6.0, and R software. P < .05 was considered statistically significant.

| Overview of AS events in lower-grade diffuse gliomas of TCGA cohort
To uncover the alternative splicing features of lower-grade diffuse gliomas, we extensively analysed the AS events data obtained from Interestingly, ES was the most frequently observed AS events, followed by AP and AT events. While ME events were fewest, with only 273 events. In addition, we analysed the detection frequency of AS events with varying PSI levels in all samples. The results showed that events with PSI levels (0-0.2 and 0.8-1) constituted the majority of all types of AS events ( Figure 1D). Although the number of detected AS events varied in all samples, the pattern of AS types with distinct PSI levels was similar.

| The landscape of differentially expressed AS events between subtypes in lower-grade diffuse gliomas
To portray the full landscape of aberrant AS in lower-grade diffuse gliomas, we next identified differential AS events across molecular subtypes. We used t test (P-values adjusted by Benjamini-Hochberg method) to compare the PSI value distributions within the 3 molecular groups (foldchange > 2, P-value < .05). As shown in Figure 2A, volcano plots represented the differentially spliced AS events of each subtype. The differential AS events identified by comparative analysis were comprehensively displayed in heatmap using hierarchical clustering ( Figure 2B and Table S2). Specifically, 134 differential AS events involved in 107 genes were detected in cases of subtype 1 (IDH-mut and 1p/19q codeleted), 54 differential AS events were observed in subtype 2 (IDH-mut and 1p/19q non-codeleted), and 478 differential AS events occurred in subtype 3 (IDH-wt). The proportional composition of differential AS types is similar, and AP was the most frequently observed differential AS type ( Figure 2C). In addition, some differential AS events within subtypes could be identified in the same genes (for example, IL11RA and KCNIP4). The Venn diagram and heatmap showed distribution of genes of differential AS events ( Figure S1A,B). Single type of differential AS events was observed in most of genes, whereas some genes harboured distinct types of differential AS events (EEF1D, MRPL55, SLC14A2, ZSCAN18, AKAP7, ERBB2, Figure S1C). Some genes were also reported in previous studies in gliomas, such as SGK1, TCF4 and FGFR1. [30][31][32] These results indicated that differential AS events varied in genes and patient groups in lower-grade diffuse gliomas.

| Functional enrichment of differential AS events
To further define the biological functions potentially affected by AS in lower-grade diffuse gliomas, we performed GO analysis based on genes with differential AS events (foldchange > 1.5, Pvalue < .05). In patients of subtype 1, we identified a high incidence of GTPase-mediated signal transduction process affected by AS.
Genes with differential AS events in subtype 2 were mainly enriched in mRNA splicing processing. Instead, genes in subtype 3 patients were annotated to translation and signal transduction ( Figure 2D). In addition, nervous system development and regulation of GTPase activity were observed in all groups of patients.
We further performed KEGG analysis to identify pathways in which genes of differential AS events might be involved. Pathways such as Rap1 signalling pathway, 33,34 pathway in cancer and insulin signalling pathway 35,36 which were previously implicated in gliomas were significantly enriched in differentially spliced genes in subtype 1 ( Figure S2A). Similarly, AS events in subtype 3 comprised crucial pathways reported in gliomas, including Ras signalling pathway 37,38 and cAMP signalling pathway 39,40 ( Figure S2B). Our analyses revealed that differential AS events participated in many important biological process and pathways involved in glioma pathogenesis.

| Consensus clustering of AS events identifies three distinct tumour subgroups
Considering the significant variation of AS events in lower-grade diffuse gliomas, we further performed tumour classification based on PSI values through consensus clustering. Tumour samples were randomly distributed into two parts, namely training and validation set. 4000 AS events with highly variable PSI values across samples F I G U R E 2 The landscape of differentially expressed AS events in lower-grade diffuse gliomas. A, Volcano plots represent the differentially spliced AS events of each subtype. Sub1 stands for cases with IDH mutation and 1p/19q codeletion, Sub2 for cases with IDH mutation and 1p/19q non-codeletion, Sub3 for IDH wild type. B, Heatmap shows the differential AS events identified by comparative analysis (t test, Benjamini-Hochberg method). C, Differentially spliced events in each subtype, bars indicate the proportion of each AS event type. D, GO analyses of genes with differential AS events for each subtype (MAD > 0.16) were used for subsequent clustering. Among them, AP and ES were the most frequently observed AS events (27.5% and 45.4%, respectively) ( Figure S3). Assessing by CDF and consensus matrices ( Figure S4), we identified three robust clusters (C1, C2 and C3) produced by AS events data in training set ( Figure 3A), because the shape of the CDF curves did not change much beyond this number. The cluster membership of these three groups was associated with distinct prognostic and pathological features ( Figure 3B,D). C1, with significantly poor outcome, contained the majority of tumours with classical or mesenchymal subtype. 77% (44/57) IDH wild-type tumours (subtype 3) were enriched in this cluster. In contrast, C2 and C3 showed better outcome and contained more tumours with IDH mutation and 1p/19q codeletion (subtype 1). Within them, C2 was mainly comprised of proneural tumours, whereas neural tumours were enriched in C3 cluster. When it came to the classification reported by Ceccarelli et al, 8 tumours of LGm5 and LGm4 were enriched in C1 cluster, whereas LGm3 was prevalent in C2 and C3 clusters.
To validate our findings in training set, we evaluated the reproducibility of AS-related clusters in validation set. Applying the same AS event ordering from the training set in the validation set clearly recapitulated the clusters identified in the training set ( Figure 3A).
Moreover, the obtained clusters displayed similar pattern of pathological and prognostic features with training set ( Figure 3C,E). These results suggested that AS events could serve as valuable biomarkers for classification of lower-grade diffuse gliomas.

| Genomic alterations of AS-related clusters
To explore the association between AS-related cluster and genomic alterations, we analysed the somatic mutations and copy number variations (CNVs) data form TCGA database. First, we compared the frequency of mutations across three AS clusters of training set.
Mutations in NF1, EGFR, TP53 and PTEN occurred more frequently in C1. In contrast, IDH, CIC, FUBP1 mutation and 1p/19q codeletion were significantly enriched in C2 and C3 clusters ( Figure 4A). Additionally, we observed higher frequency of TP53 mutation in C2 compared with C3 cluster. Moreover, CNV analysis revealed distinct chromosomal alteration patterns among three clusters. As shown in Figure 4A, C1 showed more frequently deleted or amplified genes, such as PTEN, EGFR, MET, CDKN2A, CDKNA2B, RB1, PIK3CA and PIK3R1. We next sought to dissect the genomic alterations of each cluster in validation set and obtained consistent results ( Figure 4B).
These findings confirmed the association between AS-related cluster and genomic alterations.

| Identification of an AS-related prognostic signature in lower-grade diffuse gliomas
Considering the strong link between patients' prognosis and AS events, we proposed to construct an AS-related signature for out- were favourable (hazard ratio < 1) and 5275 AS events were unfavourable (Table S3). t Test (Benjamini-Hochberg method) identified 15 546 differential AS events between C1 and C2/C3, wherein 6401 AS events were associated with patients' outcome ( Figure 5A).
Additionally, ES events were major components of these AS events correlated with OS, and most of these events showed a range of PSI level (0-0.2 and 0.8-1) ( Figure S5). We next applied a Cox proportional hazards model for selecting AS events with best prognostic value ( Figure 5B). Consequently, a 17-AS event signature was identified and the risk score was calculated with their PSI values and regression coefficients ( Figure 5C and Table 1). The relative influence of each AS event was showed in terms of the absolute value of coefficients ( Figure 5D). Multivariate Cox regression analyses found that most of these 17 AS events were independent factors of prognosis (Table S4). High scores were enriched in tumours of grade III, C1, classical and mesenchymal or IDH wild type ( Figure 5E). Kaplan-Meier analysis showed that high scores implied significantly poorer outcome in patients of lower-grade diffuse gliomas or stratified tumours ( Figure 5F,G and Figure S6). In addition, the acquired signature had higher AUC compared with other factors (age) ( Figure 5H).
Multivariate Cox regression analysis also confirmed the independent prognostic value of this AS signature (Table 2). We further applied this signature into validation set and found consistent results ( Figure S7 and Table 2). These data demonstrated the superior performance of AS signature for prognosis prediction, highlighting the importance of AS event in determining survival.

F I G U R E 5
Identification of an AS-related signature by Cox proportional hazards model in training set. A, Venn diagraph shows 6401 differential AS events significantly correlated with patients' overall survival in univariate Cox regression analysis. B, Cross-validation for tuning parameter selection in the proportional hazards model. C, Heatmap exhibits the 17 AS events of the identified signature. D, Relative influence of each AS events of the identified signature. E, Distribution of risk scores in cases stratified by AS cluster, grade, TCGA and WHO subtype. CL, classical; ME, mesenchymal; NE, neural; PN, proneural; Sub1, IDH mutant and 1p/19q codeleted; Sub2, IDH mutant and 1p/19q non codeleted; Sub3, IDH wild type. *P < .05; **P < .01; ***P < .001. F and G, Survival analysis of the signature in diffuse LGG and tumours stratified by grade. HR, hazard ratio. P-value was calculated by the log-rank test. H, ROC curve analysis of age and risk score. AUC, area under the curve In this study, we presented the integrated portrait of alternative splicing in lower-grade diffuse gliomas using SpliceSeq data.

| D ISCUSS I ON
Seven types of AS events, namely alternate acceptor site, alternate donor site, alternate promoter, alternate terminator, exon skip, mutually exclusive exons and retained intron, were analysed, and events with PSI levels (0-0.2, 0.8-1) formed the majority of all AS types. We further identified differential AS events across each lower-grade glioma subgroup based on the distribution of PSI values. Functional enrichment analysis of genes with differential AS events identified numerous cancer-related biological phenotypes and signalling pathways. In future work, we propose to select some differential splicing events for further expression and functional validation (including knockdown and over-expression assays), for uncovering the possible roles of these isoforms on glioma progression.
Growing evidence suggests that alternative splicing events could discriminate molecular subtypes in various cancers.

Leivonen et al reported that AS can differentiate subtypes in dif-
fuse large B-cell lymphoma. 14  We further compared gene expression between high-and lowrisk cases. Based on the top genes of differential expression (fold change > 1.5, P < .05) identified by SAM, GO analysis revealed that cell division, DNA replication and immune response were significantly enriched in high-risk gliomas, whereas the low-risk cases showed enrichment of oxidation-reduction process and chemical synaptic transmission ( Figure S8A,B). Meanwhile, GSEA confirmed these findings ( Figure S8C,D).
In summary, we systematically characterized the dysregulation of AS events and its biological and clinical significance in lower-grade diffuse gliomas. Our data demonstrated that differentially expressed AS events could act as risk indicator of patients' survival and further highlighted the transcriptional diversity within subgroups of gliomas.

CO N FLI C T O F I NTE R E S T
The authors declare that they have on conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data supporting this study were openly available from TCGA database (http://cance rgemo me.nih.gov/) and SpliceSeq database (http://bioin forma tics.mdand erson.org/TCGA SpliceSeq).