Identification and validation of critical alternative splicing events and splicing factors in gastric cancer progression

Abstract Gene expression and alternative splicing (AS) interact in complex ways to regulate biological process which is associated with cancer development. Here, by integrated analysis of gene expression and AS events, we aimed to identify the hub AS events and splicing factors relevant in gastric cancer development (GC). RNA‐seq data, clinical data and AS events of 348 GC samples were obtained from the TCGA and TCGASpliceSeq databases. Cox univariable and multivariable analyses, KEGG and GO pathway analyses were performed to identify hub AS events and splicing factor/spliceosome genes, which were further validated in 53 GCs. By bioinformatics methods, we found that gene AS event‐ and gene expression‐mediated GC progression shared the same mechanisms, such as PI3K/AKT pathway, but the involved genes were different. Though expression of 17 hub AS events were confirmed in 53 GC tissues, only 10 AS events in seven genes were identified as critical candidates related to GC progression, notably the AS events (Exon Skip) in CLSTN1 and SEC16A. Expression of these AS events in GC correlated with activation of the PI3K/AKT pathway. Genes with AS events associated with clinical parameters and prognosis were different from the genes whose mRNA levels were related to clinical parameters and prognosis. Besides, we further revealed that QKI and NOVA1 were the crucial splicing factors regulating expression of AS events in GC, but not spliceosome genes. Our integrated analysis revealed hub AS events in GC development, which might be the potential therapeutic targets for GC.


| INTRODUC TI ON
Gastric cancer is one of the most common cancers worldwide, especially in developing countries, and causes 783 000 cancer-related deaths globally. 1 Cancer growth, invasion and metastasis are critical steps in GC progression [2][3][4] and involve complex regulatory mechanisms among protein-coding genes, lncRNAs and microR-NAs. Alternative splicing (AS) is an important post-transcriptional regulatory mechanism involved in protein diversity and is involved in cancer progression. 5 Distinct pre-mRNA AS events can produce protein isoforms with diverse structures and functions, 6 which can affect mechanisms regulating cancer progression.
Investigating the role of AS in GC progression may clarify mechanisms of tumour progression and may uncover novel therapeutic opportunities.
There are seven basic splicing patterns, including exon skipping (ES), alternate acceptor sites (AA), mutually exclusive exons (ME), alternate donor sites (AD), alternate terminator (AT), alternate promoter (AP) and retained intron (RI). Approximately 95% of genes in the human genome undergo AS. 7 Expression of aberrant splicing can promote cancer progression by activating cancer-related pathways. 8 Li et al revealed an aberrantly spliced transcript of FGFR3, termed FGFR3Δ7-9, which lacks exons encoding the immunoglobulin-like III domain; this isoform exerted potent oncogenic functions, enhancing migration and lung metastatic capacity in hepatocellular carcinoma via activation of the MAPK and PI3K/AKT pathways. 9 The high incidence of AS events in cancer increases the complexity of mechanisms of cancer initiation and progression, and focused evaluation of AS events may not comprehensively describe how AS can regulate cancer biology. Fortunately, with the development of next-generation sequencing technology, we can perform in-depth and comprehensive analysis of AS events in cancer. 10 Comprehensive evaluation of AS events has been performed in several cancer types, including lung, 11 ovarian 12 and bladder carcinomas, 13 as well as in gastrointestinal adenocarcinomas. 14 Integrated analysis of gene expression and AS may facilitate better understanding of mechanisms of cancer progression, [15][16][17] because gene expression and AS events could interact in complex mutual regulation. [18][19][20] In this study, we performed integrated analysis of gene expression and AS in GC. We identified genes with prognostic mRNA expression or/and AS events, and we performed GO and KEGG pathway analysis to identify mechanisms of gene expression-and AS-mediated GC progression. We further established a Cox regression model to compare the prognostic ability between AS events and gene expression. By constructing co-expression networks between gene expression and AS events, we identified the hub AS events associated with cancer progression and hub splicing factors or/and spliceosome genes that regulated expression of AS events.
By analysing the relationships between AS events and TNM staging, Lauren classification and cancer stromal score, we revealed associations between hub AS events and GC invasion. Finally, further in vivo validation was performed on a cohort of 53 GC tissue samples. In summary, our study identified and validated hub AS events involved in GC progression, which may be potential therapeutic targets for GC.

| Data extraction
We downloaded RNA-seq row count data for 374 GC

| Cox univariate and multivariate analysis
Cox univariate regression analysis was performed on genes and all seven types of AS events using the "survival" package in R.
Genes whose AS events or mRNA levels associated with prognosis (P < 0.01) were further used to conduct Cox multivariate analysis and construct Cox regression model. Finally, the risk score was calculated for every case.

| GO and KEGG pathway analysis
Genes with prognostic AS events or prognosis-related mRNA expression (P < 0.05) were utilized to performed GO and KEGG pathway analysis in the Metascape database (http://metas cape.org/gp/ index.html#/main/step1). The enriched GO terms or KEGG pathways with P < 0.01 were displayed.

| Co-expression network analysis
In order to explore the relationships between gene expression and AS events, Spearman correlation coefficients were calculated.
Genes with paired gene expression and AS events with correlation P < 0.01 and R > 0.3 were imported into Cytoscape software (version 3.6.0, Palo Alto, CA, USA). Hub genes or AS events were identified according to the degree of interaction. Splicing factor genes were downloaded from SpliceAid 2 database (http://193.206.120.249/ splic ing_tissue.html), and spliceosome genes were downloaded from KEGG database (https://www.kegg.jp/kegg/).

| Patient characteristics and GC tissue specimens
We acquired a total of 53 fresh GC specimens for this study, including T1 GCs (n = 12), T2 GCs (n = 16), T3 GCs (n = 12) and T4 GCs (n = 13). The GC specimens were obtained from GC patients who underwent gastrectomy from 2013 to 2014 in Ruijin Hospital of Shanghai Jiaotong University. Patients were diagnosed with GC by histopathology and did not receive any adjuvant treatment prior to surgery. These GC tissues were stored at −80°C. All patients provided signed informed consent. The stage of GC was determined in accordance with the 8th edition of the UICC/AJCC cancer staging manual. 21   WT/(WT + AS). 22 To evaluate the AT type of AS, we designed primers located at the terminated exon and the near exon preceding the terminated exon; the relative PSI value of AS events was calculated as the ratio of grey value of the terminated exon and the near exon ahead of the terminated exon. In order to quantify the total mRNA level of genes amplified by the primers, we also calculated the grey value of all brands (including WT and AS) by normalizing to GAPDH.

| RT-PCR and quantitative analysis for AS
The primer sequences for AS are illustrated in Table S1.

| Quantitative RT-PCR analysis for genes encoding splicing factors
Total RNA extraction and cDNA synthesis were performed as described above. The mRNA expression of genes encoding splicing factors were measured using AceQ ® Universal SYBR qPCR Master Mix (Vazyme) on an Applied Biosystems 7900HT sequence detection system (Applied Biosystems, Waltham, MA, USA). The procedures for quantitative RT-PCR were conducted as described above.
GAPDH was used as an internal control, and relative expression levels of mRNA were evaluated using the 2 −ΔΔCt method.

| Western blot analysis
Total protein was extracted from 13 selected GC tissues, and Western blot performed as previously described. 23 The

| Statistical analysis
Heatmaps were plotted using the "pheatmap" package in R. The stromal score of every GC sample was calculated by using the "estimate" F I G U R E 1 Prognosis-related alternative splicing (AS) events and gene mRNA expression levels in gastric cancer (GC). A, Percentage of AS events of specific types significantly related to poor or good prognosis. AA, alternate acceptor sites; AD, alternate donor sites; AP, alternate promoter; AT, alternate terminator; ES, exon skipping; ME, mutually exclusive exons; RI, retained intron. B, Percentage of genes whose mRNA levels significantly related to poor (Gene-P) or good prognosis (Gene-G). C, Venn diagram showing the overlapping genes with both prognosis-related mRNA levels and AS events. AS_poor, genes with AS events involved in poor prognosis; AS_good, genes with AS events involved in good prognosis; Gene_poor, genes whose mRNA levels associated with poor prognosis; Gene_good, genes whose mRNA levels associated with good prognosis. D, By constructing Cox regression model based on 60 AS events, all GC cases were divided into high-risk (n = 174) and low-risk (n = 174) groups, according to the median value of the risk scores ( splicing factors and stromal scores, an optimal cut-off value was calculated to divide all GC cases into high and low groups. In order to assess the prognostic significance of the Cox regression model, a median value was set to determine GC cases as high and low groups, according to the risk scores. P values of P < 0.05 were considered statistically significant.

| Prognostic AS and gene expression events in GC
We obtained a total of 31 911 AS events from the TCGASpliceSeq  Figure 1A; Table S2), but most gene expression-related prognostic events were significantly associated with poor prognosis (P < 0.05, Figure 1B; Table S3). Venn diagrams were plotted to visualize the interactions between genes with prognostic AS events and genes with prognosis-related mRNA expression ( Figure 1C). There were only a few genes with both prognostic AS events and prognosis-related mRNA levels in GC.

| Prognostic prediction of Cox regression model based on AS events is superior to the model using gene expression
In order to compare the prognostic predictive ability of AS events and gene expression, prognostic AS events or prognosis-related gene expression events were utilized to develop Cox regression model. Using prognostic AS events with P value <0.01, a Cox regression model based on 60 AS events were established (Table S4).
Then, we used top 100 genes whose mRNA levels associated with prognosis to construct another Cox regression model. In order to make the two Cox model comparable, the Cox regression model was also developed based on 60-gene expression data (Table S5).
Risk scores based on these 60 AS events were significantly re-  Figure 1J) was more powerful than the model based on gene expression (green line: Figure 1J).

| GO and KEGG pathways analysis of prognostic AS events and genes
In order to study the different functions of AS events and gene expression in GC progression, genes with prognostic AS events and mRNA expression were used to perform GO and KEGG pathways analysis (Figures S1 and S2; Figure 2A,B). Genes with poor prognostic AS events were most significantly enriched in the T-cell receptor signalling pathway, and genes with poor prognostic mRNA expression levels were most significantly enriched in the PI3K/AKT signalling pathway (P < 0.001). Interestingly, we noticed that genes whose AS events (Figure 2A) or mRNA levels ( Figure 2B) were associated with poor prognosis were simultaneously enriched in the PI3K/AKT signalling pathway, the phospholipase D signalling pathway and in pathways in cancer. However, AKT3 was the only overlapping gene among the three pathways ( Figure 2C). These results suggested that AS events and gene expression promoted GC progression via the same mechanisms, but that the genes involved are distinct.

| Screening and validation of hub genes with AS events significantly associated with tumour progression, stromal proportion and Lauren classification in GC
To screen for critical AS events involved in GC progression, we established co-expression networks between genes with prognostic mRNA levels and genes with prognostic AS events. We identified 5 hub genes with 8 favourable prognostic AS events ( Figure 3A) and 5 hub genes with 9 unfavourable prognostic AS events ( Figure 3B; Table S6). We found that the hub prognostic AS events mainly correlated with expression of genes whose RNA levels were positively related to poor prognosis in GC ( Figure 3B; Table S6). Between the two networks, there were 350 overlapping genes whose RNA levels were significantly related to prognosis ( Figure 3C). RNA levels of 297 of 350 overlapping genes were positively related to poor prognosis.
Then, GO and KEGG pathway analysis were performed with the 297 overlapping genes whose RNA levels were positively related to poor prognosis. The results showed that these genes were significantly enriched in the PI3K/AKT signalling pathway, phospholipase D signalling pathway, pathways in cancer, and in the GO term extracellular structure organization ( Figure 3D-F).
In order to explore whether these hub AS events are involved in GC progression, we further analysed the relationship between  Figure S4C,D). Favourable prognostic AS events were negatively correlated with stromal score ( Figure   S4C), whereas unfavourable prognostic AS events were positively correlated with stromal score ( Figure S4D).
In order to explore whether AS events were significantly associated with GC invasion, we initially compared expression of the AS events in T1 cancers (n = 6) and T4 cancers (n = 6). RT-PCR was performed to verify AS expression in GC tissue ( Figure 3G). Among the 17 hub AS events, we confirmed that 10 were differentially expressed in T4 GC tissue ( Figure 3H). AS events in CD47, SORBS1 and MAP3K7 were decreased in T4 cancers, and AS events in CAST, EVI5L, CLSTN1 and SEC16A were increased in T4 cancers. We further examined expression of these AS events in 41 GC tissues ( Figure   S5). AS events in CD47, SORBS2 and MAP3K7 were down-regulated in invasive, diffuse and lymph node metastatic GC tissue, and AS events in CAST, EVI5L, CLSTN1 and SEC16A were up-regulated in invasive, diffuse and lymph node metastatic GC tissue ( Figure 4A).
F I G U R E 2 KEGG pathway analysis of genes whose expression and alternative splicing (AS) events associated with poor prognosis in gastric cancer. A, KEGG pathways analysis of genes with AS events associated with poor prognosis. B, KEGG pathways analysis of genes whose mRNA expression levels associated with poor prognosis. C, Genes with both poor prognostic AS events and poor prognosis-related mRNA levels was significantly enriched in the Phospholipase D signalling pathway, the PI3K/AKT signalling pathway and pathways in cancer, but venn diagrams showed only one overlapping gene (AKT3)

F I G U R E 3
Identification and validation of hub survival-associated alternative splicing (AS) events in gastric cancer (GC). A, Co-expression network analysis between 363 genes with prognosis-related mRNA levels and 5 hub genes with favourable prognostic AS events (R > 0.3, P < 0.001). B, Co-expression network analysis between 375 genes with prognosis-related mRNA levels and 5 hub genes with unfavourable prognostic AS events (R > 0.3, P < 0.001). The diamonds represent AS events. Purple nodes indicate genes with poor prognosis, and green nodes indicate genes with favourable prognosis. Red lines represent positive correlations, and green lines represent negative correlations. C, Between the above two networks (A and B), there were 350 overlapping genes whose RNA levels were significantly related to prognosis. AS_good: the genes whose mRNA levels were negatively related to the 8 favourable prognostic AS events of 5 hub genes (A); AS_poor: the genes whose RNA level were positively related to the 9 unfavourable prognostic AS events of 5 hub genes (B). D, KEGG pathway analysis of 297 of 350 overlapping genes whose mRNA levels were associated with poor prognosis. GO analysis of 297 poor prognostic genes of the overlapping 350 genes (E, biological process; F, cell component). G, Expression of 17 hub AS events of 10 genes in T1 (n = 6) and T4 (n = 6) GC tissues. H, Comparison of the 17 hub AS events expression between T1 and T4 GC tissues. *P < 0.05, **P < 0.01, ***P < 0.001 High expression of AS events in CD47 and SORBS2 was significantly associated with longer DFS, and high expression of AS events in CLSTN1 and SEC16A was significantly associated with shorter DFS ( Figure 4B). High expression of AS events in CAST, EVI5L, CLSTN1 and SEC16A was positively associated with shorter overall survival time ( Figure 4B).
Simultaneously, we also calculated the total mRNA levels of genes by normalizing to GAPDH, including brands of WT and AS.
Then, we also analysed the relationships between their RNA levels and invasion, lymph node metastasis and Lauren classification.
Unlike AS event, we only observed decreased SEC16A in T2 GC tissues and decreased CLSTN1 in lymph node metastatic GC tissues ( Figure S6A). Other genes did not show positive or negative correlations with invasion, lymph node metastasis and Lauren classification ( Figure S6A). Besides, as for the above 7 genes with prognostic AS events, we only found CAST mRNA level was significantly related poor prognosis, and SEC16A mRNA level was significantly associated with good prognosis ( Figure S6B) in TCGA database. In our validation cohort (n = 41), MAP3K7 mRNA level was significantly shorter DFS, and high expression of CLSTN1 and CAST was positively related to longer DFS ( Figure S6C). SORBS2 mRNA level was significantly poor OS, and high expression of CD47 was positively related to better OS ( Figure S6D). Our results indicated that the genes whose RNA levels were associated with clinical parameters and prognosis were different from the genes whose AS events were involved in clini- . C, According to the expression of AS events in GC tissues, 5 GC tissues with high expression of AS events associated with good prognosis were named "Group A", and 8 GC tissues with high expression of AS events associated poor prognosis were named "Group B". Western blot was used to examine expression of p-AKT, p-ERK and p-STAT3 in fresh GC tissues. The grey values of p-AKT, p-ERK and p-STAT3 were calculated with respect to the internal control GAPDH and were compared between Group A and Group B. *P < 0.05, **P < 0.01, ***P < 0.001 Western blot to examine expression of p-AKT, p-ERK and p-STAT3 in these tissues. We found that phosphorylated AKT was up-regulated in Group B, but p-ERK and p-STAT3 were not ( Figure 4C). Our results suggested that some hub AS events played critical roles in GC progression, especially AS events (ES) in CLSTN1 and SEC16A, and that these AS events might promote GC progression via activation of the PI3K/AKT pathway.

| Identification of prognostic splicing factors and spliceosome genes correlated with AS events in GC
Splicing factors and spliceosome genes participate in regulation of AS events. 24 were significantly associated with favourable prognosis, and NOVA1 and QKI were significantly associated with poor prognosis (Table S7). The two poor prognostic splicing factor genes, NOVA1 and QKI, showed the strongest correlations with AS events ( Figure 5C). In addition, other splicing factor or/and spliceosome genes showed fewer and weaker correlations with AS events, including the splicing factor genes DAZAP1, HNRNPL, the splicing factor or/and spliceosome genes HNRNPM and HNRNPK, and the spliceosome gene CHERP.
We observed increased mRNA levels of NOVA1 and QKI in invasive, lymph node metastatic, and diffuse GC tissues ( Figure 5F).
Kaplan-Meier survival analysis also indicated that high mRNA levels of NOVA1 and QKI were positively related to poor prognosis ( Figure 5G). We performed correlation analysis among the 10 AS events and splicing factors, and selected paired gene AS events and gene expressions with Spearman coefficient >0.3 and P value <0.05 to construct a co-expression network. We found that NOVA1 and QKI showed positive correlations with AS events involved in poor prognosis and negative correlations with AS events associated with good prognosis ( Figure 5H). Our results indicated that splicing factor genes, but not spliceosome genes, widely participated in regulating AS events in GC.

| D ISCUSS I ON
Alternative mRNA splicing is a major source of protein diversity, and abnormal AS events are closely associated with tumour initiation and tumour progression. 26 We therefore undertook simultaneous evaluation of AS events and mRNA expression to further uncover hub AS events involved in GC progression.
In this study, we evaluated 348 GC cases with gene expression, AS events and clinical data. By integrating gene expression and AS events, we found that most of the prognostic AS events were associated with good outcomes, and most of the prognostic mRNA expression was associated with poor outcome. There were few genes whose RNA levels and AS events were both significantly Gene expression can regulate AS events, and AS events can also regulate gene expression. [18][19][20] In order to identify critical AS events involved in GC progression, we performed co-expression network analysis among the prognostic AS events and prognosis-related genes with correlation coefficient >0.3 and P value <0.001. Ultimately, 17 hub AS events were identified, and we confirmed the expression of these events in GC tissue. Consistent with the above KEGG pathway analysis results, the 17 hub AS events were involved in activation of PI3K/AKT signalling pathway and pathways in cancer. Furthermore, we found that GC with high expression of AS events associated with poor prognosis and low expression of AS events associated with good prognosis exhibited activation of the PI3K/AKT pathway. In addition, these 17 hub genes were also significantly related to extracellular structure organization and ECM-receptor interaction, which are also significantly related to activation of the PI3K/AKT pathway, the integrin-mediated signalling pathway and the MAPK pathway. [36][37][38] In GC, extracellular structure organization and ECM-receptor interactions promote cancer invasion. 39,40 In this study, we found that the 17 hub AS events were significantly related to tumour invasion in GC in the TCGA database. Of import, our data indicated a tumour suppressive role for AS events in CD47 (ES) and KIF1B (AT). KIF1B functions as a tumour suppressor in neuroblastoma, but promotes GC invasion. 41 were significantly related to extracellular matrix and tumour stromal score. Intratumour stromal proportion is closely associated with aggressive phenotypes in gastric signet ring cell carcinomas. 51 We found that the stromal score of GC tumours was also significantly associated with poor prognosis. In GC, gene variants in genes such as CD44, E-cadherin and RHOA are significantly related to invasive and diffuse cancer invasion. [52][53][54] Finally, we confirmed that 10 of the 17 hub AS events of within 10 genes were significantly related to invasion, lymph node metastasis and Lauren classification in GC, especially AS events in CLSTN1 and SEC16A. Therefore, in this study, we not only presented novel potential markers to distinguish and diagnose early GCs, but also revealed some possible therapeutic targets for GC. Though these two genes have not been thoroughly studied, some previous reports indicate that they serve crucial roles in cancer progression. 50 , an essential RNA-binding protein with roles in RNA splicing, F I G U R E 5 Co-expression network analysis of survival-associated splicing factors and spliceosome genes and alternative splicing (AS) events in gastric cancer (GC). A, Venn diagram showed the overlapping 21 gene playing dual roles of splicing factor and spliceosome. B and C, Co-expression network analysis of prognostic splicing factor/spliceosome genes and all genes with prognostic AS events (B, R > 0.3, P < 0.001) Co-expression network analysis of prognostic splicing factor/spliceosome genes and 10 hub genes with 17 prognostic AS events (C, R > 0.3, P < 0.001). Red circles: genes with poor prognostic AS events; Green circles: genes with favourable prognostic AS events; Red diamonds: splicing factor genes associated with poor prognosis; Green diamonds: splicing factor genes associated with good prognosis; Green triangles: spliceosome genes associated with good prognosis; Green rectangles: splicing factor/spliceosome genes associated with good prognosis; Red lines: positive correlations; green lines: negative correlations. The thicker the line, the stronger the correlations. D, NOVA1 and QKI, were significantly associated with poor prognosis, and DAZAP1 and HNRNPL were significantly related to good prognosis by K-M analysis. E, QKI was negatively related to CD47 AS event (ES, R = −0.75, P < 0.001); NOVA1 was positively correlated to CAST AS event (ES, R = 0.66, P < 0.001); DAZAP1 was negatively associated with CAST AS event (ES, R = −0.36, P < 0.001); HNRNPL was negatively correlated to SORBS1 AS event (ES, R = −0.40, P < 0.001). F, Expression of DAZAP1, HNRNPL, NOVA1 and QKI mRNA levels in 53 GC tissues with different T stages, N stages and Lauren classification. G, Forest plots showing the relationships between DAZAP1, HNRNPL, NOVA1 and QKI mRNA levels and patient prognosis (DFS and OS). H, Co-expression network of prognostic genes and AS events (R > 0.3, P < 0.05) were constructed in 53 GC tissues. The diamonds represent splicing factors. Red circles indicate genes associated with poor prognosis; green circles indicate genes associated with favourable prognosis. Red lines represent positive correlations; green lines represent negative correlations. The thicker the line, the stronger the correlation. *P < 0.05, **P < 0.01, ***P < 0.001 promotes GC metastasis by mediating AS of CAV1. 60  factor genes showed stronger correlations with AS events than spliceosome genes. Our results indicated that splicing factor genes likely played critical roles in gene AS events in GC, especially QKI and NOVA1, but spliceosome genes do not. However, the functions of these splicing factors in GC need to be further elucidated.
In conclusion, by integrated analysis of AS events and gene expression, we identified hub AS events in CLSTN1 and SEC16A, and splicing factors QKI and NOVA1, which were significantly associated with GC progression. We validated expression of these AS events in GC tissues. AS events may accelerate GC progression via the PI3K/ AKT pathway, and these AS events may be potential therapeutic targets in GC. Studies focused on gene AS event and gene expression help to revealed more key genes in GC progression.

CO N FLI C T O F I NTE R E S T
The authors declare that there is no conflict of interests. Ren Zhao: Funding acquisition (equal); project administration (equal); supervision (equal); writing -review and editing (equal).

DATA AVA I L A B I L I T Y S TAT E M E N T
The data used to support the findings of this study are included within the article.