Prediction of a competing endogenous RNA co‐expression network as a prognostic marker in glioblastoma

Abstract Due to its high proliferation capacity and rapid intracranial spread, glioblastoma (GBM) has become one of the least curable malignant cancers. Recently, the competing endogenous RNAs (ceRNAs) hypothesis has become a focus in the researches of molecular biological mechanisms of cancer occurrence and progression. However, there is a lack of correlation studies on GBM, as well as a lack of comprehensive analyses of GBM molecular mechanisms based on high‐throughput sequencing and large‐scale sample sizes. We obtained RNA‐seq data from The Cancer Genome Atlas (TCGA) and Genotype‐Tissue Expression (GTEx) databases. Further, differentially expressed mRNAs were identified from normal brain tissue and GBM tissue. The similarities between the mRNA modules with clinical traits were subjected to weighted correlation network analysis (WGCNA). With the mRNAs from clinical‐related modules, a survival model was constructed by univariate and multivariate Cox proportional hazard regression analyses. Thereafter, we carried out Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses. Finally, we predicted interactions between lncRNAs, miRNAs and mRNAs by TargetScan, miRDB, miRTarBase and starBase. We identified 2 lncRNAs (NORAD, XIST), 5 miRNAs (hsa‐miR‐3613, hsa‐miR‐371, hsa‐miR‐373, hsa‐miR‐32, hsa‐miR‐92) and 2 mRNAs (LYZ, PIK3AP1) for the construction of a ceRNA network, which might act as a prognostic biomarker of GBM. Combined with previous studies and our enrichment analysis results, we hypothesized that this ceRNA network affects immune activities and tumour microenvironment variations. Our research provides novel aspects to study GBM development and treatment.


| INTRODUC TI ON
Glioblastoma (GBM) ranks the most prevalent malignant cancer of the central nervous system and has a mortality rate of approximately 3.19 per 100 000. 1 Intensive therapy is considered to be the standard treatment of GBM. 2 However, to date, although the standard treatment has been applied, its high proliferative capacity and fast intracranial dissemination make it the least curable cancer, with a median overall survival of approximately 15 months. 3 In recent years, vast quantities of valuable information have accumulated by expanding big data analyses in molecular biology and developing targeted therapy techniques, which has laid a solid foundation for cancer research. 4 However, the detailed molecular mechanisms of GBM still remain unclear, which brings difficulties to its diagnosis and treatment. Thus, finding novel molecular mechanisms and biomarkers of GBM to enable the early diagnosis and treatment precision has become a hotspot in GBM research.
Currently, studies on competing endogenous RNA (ceRNA) co-expression networks are providing new perspectives on cancer pathogenesis at the molecular level. Acting as the key link in ceRNA co-expression networks, lncRNAs regulate gene expression through competitive binding with specific miRNAs, sequestering RNAbinding proteins and influencing nuclear transcription. 5,6 Through their roles in ceRNA co-expression networks, lncRNAs significantly affect the biological processes of brain cancer. For example, Zhang found GBM cells remodel the tumour microenvironment to promote tumour chemotherapy-resistance by secreting oncogenic lncSBF2-AS1-enriched exosomes. 7 LncRNA AC016405.3 was found to suppress proliferation and metastasis through modulating TET2 by sponging of miR-19a-5p in GBM cells. 8 The sponging of miR-885-3p by lncRNA HOXB-AS1 could further affect the expression of HOXB2, and this process regulates the proliferation and migration of GBM. 9 Other evidence showed that LINC01579 could competitively bind with miR-139-5p to regulate EIF4G2 and thus lead to cell proliferation and apoptosis in GBM. 10 Therefore, these studies have shown that the lncRNA-miRNA-mRNA ceRNA co-expression network is implicated in the development of GBM. However, publications on GBM are limited, and a comprehensive analysis of GBM molecular mechanisms based on high-throughput sequencing and on a large-scale sample size is lack.
In the present study, we extracted RNA-seq data from Genotype-Tissue Expression (GTEx) and The Cancer Genome Atlas (TCGA).
Furthermore, differentially expressed mRNAs were identified and applied to weighted correlation network analysis (WGCNA) for screening modules related to clinical traits. Among these modules, we set up a survival model by Cox proportional hazard regression analysis to predict GBM outcomes. Finally, combining information from multiple databases, we predicted key lncRNAs and miRNAs to weave a ceRNA network for explaining the molecular mechanism of GBM.

| Data processing and differential expression analysis
The RNA sequence data of 5 normal brain tissues and 168 GBM tissues were obtained from TCGA database (https://portal.gdc.cancer.gov/). Clinical data such as patient age, gender, overall survival time and overall survival state were also downloaded from TCGA.
Expression data of 105 normal brain tissues were obtained from the GTEx (https://gtexp ortal.org/home/datasets) database. Complete descriptions of the donor's age, gender, biospecimen procurement methods and sample fixation were presented in the GTEx official annotation. With the assistance of the R package (limma), differently expressed mRNAs were retrieved according to P < .05 and absolute log2-fold change (FC) > 2. 11 For excluding non-coding mRNA interference, Ensembl ID was used to identify and obtain protein-coding mRNA information for further study. To visualize the differentially expressed mRNAs, volcano plots were generated using the ggplot2 package for the R platform.

| GO and KEGG pathway enrichment analyses of differentially expressed mRNAs
We performed the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses and Gene Ontology (GO) with the R package (clusterProfiler). [12][13][14] The biological processes,

| Weighted correlation network analysis
We integrated the data from the differentially expressed mRNAs to perform weighted correlation network analysis (WGCNA). The R package 'WGCNA' was adopted to detect traits-related modules. 15 Herein, we set the soft-thresholding power as 6 and scale-free R2 as >0.85 to figure out key modules. The modules were then applied to analyse their relationship with GBM clinical traits using Pearson's correlation test, and adjusted P < .05 was considered significant. GO enrichment analysis was applied in order to explain the biological role of the modules.
TargetScan operates by searching for conserved sites and ranks the results by predicted targeting efficacy. 16,17 By analysing thousands of miRNA-target interactions (MTIs) from high-throughput sequencing experiments, miRDB can predict biologically relevant interactions between miRNAs and genes. 18 The miRTarBase contains more than three hundred and sixty thousand MTIs that have been validated experimentally by reporter assays, Western blots, microarrays and next-generation sequencing experiments. 19 We merged these predicted miRNAs to improve the reliability of the results. Millions of F I G U R E 3 KEGG pathway enrichment analyses of differentially expressed mRNAs RNA-binding sites from 108 CLIP-seq data sets have been analysed and deposited in starBase (https://starb ase.sysu.edu.cn/), which provides tools to predict relevant lncRNA-miRNA interactions. 20 Ultimately, a ceRNA co-expression network containing lncRNAs, miRNAs and mRNAs was constructed and visualized by Cytoscape software.

| Identification of differentially expressed mRNAs in GBM
The general condition and clinical characteristics of the GBM patients are presented in Table 1. The expression of mRNAs level in 110 normal brain tissue samples and 168 GBM cases was explored for further study. When P < .05 and |log2 FC|>2 were used as cutoff criteria, 1347 differentially expressed mRNAs were standardized and identified via the limma R package, which included 516 up-regulated and 831 down-regulated mRNAs. A volcano plot was applied to illustrate the down-regulated and up-regulated mRNAs ( Figure 1).

| GO, KEGG pathway enrichment analyses
To explore the biological characteristics of the differentially expressed mRNAs, GO enrichment analyses were performed with a cut-off criterion of P < .05. As shown in Figure

| WGCNA analysis applied to differentially expressed mRNAs
In the present work, mRNA modules among the 1347 differentially expressed mRNAs were analysed using the WGCNA R package. As shown in Figure 4A, softpower 6 and scale-free R2 as > 0.85 were chosen as the thresholds to identify co-expressed mRNA modules. Five mRNA colour modules were identified and the heat maps of topological overlap matrix (TOM) are presented in Figure 4B and Figure 4C. Then, mRNAs in the 5 different coloured modules were continuously used to analyse their correlation with GBM clinical traits using Pearson's correlation test and P < .05 was considered significant. The green module and turquoise module, which included 83 mRNAs, displayed strong relationships with the overall survival state of the GBM cases ( Figure 4D). These 83 mRNAs were further subjected to GO enrichment analyses for explaining their biological roles. As shown in Figure 4E, the enrichment analysis revealed that the mRNAs from the modules were most related to 'phagocytosis', 'neutrophil mediated immunity' and 'immune response-regulating cell surface receptor signalling pathway'.

| Identification of prognosis-related genes
Next, we randomly selected 80% of all of the GBM samples  Figure 5B. Moreover, the expression heat map of the 13 mRNAs in the high-risk or low-risk groups is shown in Figure 5C. We further estimated the accuracy of the 13-mRNA signature for predicting survival.
Kaplan-Meier survival curves depicted that, compared with lowrisk patients, those predicted high risk had significantly shorter OS (P = 1.94e − 07, Figure 5D). Receiver operating characteristic (ROC) analysis to compare the sensitivity and specificity of the survival prediction of our models was subsequently carried out.
TCGA data set indicated that AUC of the 13-mRNAs signature was 0.751, showing high sensitivity and specificity for prognostication ( Figure 5E).

| Construction of the ceRNA coexpression network
Combining the above research results, we predicted targeting miR- intersections in the three databases were discarded ( Figure 6A).
Targeted lncRNAs of these predicted miRNAs were screened by starBase and overlapped for seeking co-regulatory pathways ( Figure 6B). We merged these predicted results, found LYZ was related to hsa-miR-3613, hsa-miR-371, hsa-miR-373 and hsa-miR-32 and found that hsa-miR-92 regulated PIK3AP1. Moreover, lncRNAs XIST and NORAD were targeted to all of the 5 predicted miRNAs.
Ultimately, a ceRNA co-expression network containing lncRNAs, miRNAs and mRNAs was constructed and visualized by Cytoscape software ( Figure 6C). promoting invasion. 29 The crosstalk between synaptic vesicles and F I G U R E 6 Construction of the ceRNA co-expression network. A, The relationship between the mRNAs and their corresponding miRNAs is shown. B, Overlapping lncRNAs were analysed by the predicted lncRNAs of 5 miRNAs. C, The lncRNA-miRNA-mRNA ceRNA co-expression network was constructed via 2 lncRNAs, 5 miRNAs and 2 mRNAs for GBM prognosis the tumour microenvironment has a significant effect on the invasion and proliferation of high-grade gliomas. For example, studies have reported that synaptic molecule neuroligin-3 (NLGN3) could promote glioma proliferation by the PI3K-mTOR pathway. 30 Under the bridge of the ion channel, the interaction of GBM with reactive astrocytes was found to be closely related to the decrease in sensitivity to TMZ and the enhancement of cancer progression and aggression. 31,32 The phenomenon of neutrophil degranulation has been reported in GBM and several other human cancers. [33][34][35] Furthermore, neutrophil degranulation mediated T cell functional inhibition was found to promote the growth of GBM and that immunosuppression could be blocked through arginine supplementation. 33 In the ceRNA co-expression network, LYZ was found to be differently expressed in several cancers. 36

| CON CLUS IONS
In conclusion, in our ceRNA co-expression network, the interaction of lncRNAs and miRNAs leads to the differential expression of LYZ and PIK3AP1, which then leads to a worse prognosis of GBM. We suspect the poor prognosis is mainly related to immune activities and tumour microenvironment variations. Our findings will shed light to understanding the underlying molecular mechanism of GBM and will provide new biomarkers for clinical diagnosis and treatment, and our results can be used to guide future in-depth studies of GBM. However, its practical application value, such as sensitivity, specificity and price, has yet to be verified by laboratory studies and large-scale clinical studies.

ACK N OWLED G EM ENTS
This study was supported by grants from the Hunan Education Department general project (18C1024).

CO N FLI C T O F I NTE R E S T
The authors declare no competing financial interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
All the data were available upon request.