To construct a ceRNA regulatory network as prognostic biomarkers for bladder cancer

Abstract Emerging evidence demonstrates that competing endogenous RNA (ceRNA) hypothesis has played a role in molecular biological mechanisms of cancer occurrence and development. But the effect of ceRNA network in bladder cancer (BC), especially lncRNA‐miRNA‐mRNA regulatory network of BC, was not completely expounded. By means of The Cancer Genome Atlas (TCGA) database, we compared the expression of RNA sequencing (RNA‐Seq) data between 19 normal bladder tissue and 414 primary bladder tumours. Then, weighted gene co‐expression network analysis (WGCNA) was conducted to analyse the correlation between two sets of genes with traits. Interactions between miRNAs, lncRNAs and target mRNAs were predicted by MiRcode, miRDB, starBase, miRTarBase and TargetScan. Next, by univariate Cox regression and LASSO regression analysis, the 86 mRNAs obtained by prediction were used to construct a prognostic model which contained 4 mRNAs (ACTC1 + FAM129A + OSBPL10 + EPHA2). Then, by the 4 mRNAs in the prognostic model, a ceRNA regulatory network with 48 lncRNAs, 14 miRNAs and 4 mRNAs was constructed. To sum up, the ceRNA network can further explore gene regulation and predict the prognosis of BC patients.

increasing evidence that lncRNA, miRNA and mRNA are involved in the biological processes of cancer. Among the ceRNA, lncRNAs and mRNAs might harbour the same MREs. Therefore, when miRNA binds to MRE on lncRNA, mRNA expression is not inhibited which may contribute to disease pathogenesis. 8 Currently, the ceRNA network has been shown to play an important role in the occurrence and progression of different types of cancers.
But the roles of ceRNA network in BC, especially the analysis of ceRNA regulatory network based on large sample size and next-generation sequencing, were not completely expounded.
So, this research aims to construct a ceRNA regulatory network in BC.

| Identification of differentially expressed mRNAs and lncRNAs
We transformed the ensemble IDs of BC samples into gene symbols based on GENCODE (https://www.genco degen es.org/human /). 9 We identified differentially expressed mRNAs (DEmRNAs) and differentially expressed lncRNAs (DElncRNAs) between bladder urothelial carcinoma and normal bladder tissue by means of EdgeR R package. 10 And when they met these criteria (Adjust P < .05, and |log2 FC| ≥ 1), we considered mRNAs and lncRNAs as DEmRNAs and DElncRNAs.

| Analysis of gene function and pathway enrichment
We performed Gene Ontology (GO) 11 and Kyoto Encyclopedia of genes and Genomes (KEGG) 12 and Gene Set Enrichment Analysis (GSEA) 13 for DEmRNAs by means of clusterProfiler R package. 14 We take advantage of GO to describe gene functions, including three aspects: biological process (BP), molecular function (MF) and cellular component (CC). KEGG-GSEA was used to get possible pathways, and the cut-off criteria for gene sets were set at P < .05.
F I G U R E 2 Identification of DEmRNAs from TCGA data. A, Volcano map of DEmRNAs displaying 841 up-regulated and 1408 downregulated mRNAs. B, GO analysis of DEmRNAs showed they may act in muscle system process, muscle contraction and muscle organ development. C, The linkages of DEmRNAs and biological processes. D, KEGG-GSEA pathway enrichment of DEmRNAs

| Construction of a co-expression network
WGCNA R package 15 was conducted to construct a co-expression network. First, unqualified genes were removed by goodSamplesgenes. Then, construct a sample network based on squared Euclidean distance, and samples with Z.k < −2.5 which were considered as outlying samples were excluded from subsequent studies.
Then, the variances of mRNAs and lncRNAs across the left samples were calculated, and we chose the top 60% mRNAs and lncRNAs with highest variances for WGCNA. Next based on the scale-free topology criterion, we chose an appropriate soft-thresholding power (β) by pickSoftThreshold to set up a weighted adjacency matrix (WAM). Given the function of the topological overlap matrix (TOM), 16 we transformed adjacency into TOM. After calculating module eigengenes, densely correlated and highly co-expressed genes with BC patients are grouped into the same module.
F I G U R E 3 Analysis of key modules highly related to traits. A, Cluster dendrogram of genes with top 60% variances in the co-expression network. B, The correlation between modules and traits was displayed, and the ME of the yellow and lightcyan modules are most positively related to traits. C-D, The correlation between GS and MM in the lightcyan and yellow modules. E, TOM plot of selected genes. Modules are sorted by cluster dendrogram. Red colour represents high topological overlap and light colour represents lower overlap. This TOM plot showed that there is no high topological overlap among 25 modules, indicating that the modules have a high degree of scale independence. F, GO-GSEA showed the linkages of genes and BP. G, KEGG showed the pathway enrichment of genes in yellow and lightcyan modules

| Construction of BC prognostic signatures
We collected the clinical data of 186 BC patients with both OS and DFS. At the same time, for verification, we randomly divided patients into two groups based on the ratio of 5:4 (the training set = 103 and internal test set = 83). We conducted the chi-square test to analyse the differences in clinical features including age, gender, tumour subtype, differentiation grade, T stage, TNM stage and survival status between training set and internal testing set(the N stage and M stage are discarded because of patients with DFS data only being N0 or M0 F I G U R E 4 Identification of lncRNAs modules highly related to traits. A, Cluster dendrogram of lncRNAs with top 60% variances in the co-expression network. B, The correlation between modules and traits were displayed. C, The correlation between GS and MM in the pink module. D, TOM plot of selected lncRNAs status). By means of the 'survival' package in R, univariate Cox regression method was conducted to explore the correlation between the mRNAs obtained by prediction and overall survival (OS) of BC patients.
By means of LASSO regression analysis, we set up a survival model.
And we used the following formula to calculate the prognostic risk score of patients: Risk score = β mRNA1 * exp mRNA1 + β mRNA2 * exp mRNA2 + … + β mRNAn * exp mRNAn (the lower the risk score, the lower the risk of recurrence, where 'β' is the regression coefficient of mRNAs obtained by LASSO regression analysis, and 'exp' is the expression of corresponding mRNAs). Based on the survivalROC R package, a receiver operating characteristic (ROC) curve was drawn with risk score against survival status. At the same time, the cut-off point of risk score was picked by survminer R package which divided patients into high-and low-risk groups. Then, chi-square test was employed to investigate the correlations between the risk score of the 4-gene-based classifier with OS and clinicopathological characteristics.

| Analysis of DEmRNAs of TCGA data
We identified the DEmRNAs in the TCGA database of BC which contained 841 up-regulated and 1408 down-regulated mRNAs.
The distribution of the DEmRNAs is shown in Figure 2A. To explore the function of DEmRNAs and their molecular mechanisms in cell cycle progression, we performed GO enrichment analysis on DEmRNAs. GO analysis results showed that DEmRNAs were enriched in muscle system process, muscle contraction and muscle organ development in BP ( Figure 2B). Figure 2C depicts the linkages of DEmRNAs and biological terms as a network. In addition, KEGG-GSEA was used to visualize the distribution of the gene set and the enrichment score. Figure 2D shows DEmRNAs might act through adrenergic signalling in cardiomyocytes, calcium signalling pathway and hypertrophic cardiomyopathy in the development of BC.  Figure 3A). Among the 25 modules, the module eigengenes (ME) of the yellow and lightcyan modules are most positively related to traits (BC and normal) ( Figure 3B). So, the yellow and lightcyan modules were considered as the key module because of high correlation with trait. Figure 3C-D shows the correlation between gene significance (GS) and module membership (MM) in the lightcyan and yellow modules. Besides, the heatmap plot of selected genes is shown in Figure 3E.

| Identification of key modules by WGCNA
As shown in Figure 3F, we performed GO-GSEA on the genes in yellow and lightcyan modules to explore their functions in BP. The genes in yellow and lightcyan modules were most related to extracellular matrix and structure organization, and muscle contraction.
In addition, genes were highly enriched in focal adhesion, PI3K-Akt and MAPK signalling pathway, and proteoglycans in cancer by KEGG analysis ( Figure 3G), indicating that they might be involved in the process of BC.

| Construction of a lncRNA coexpression network
Then, the construction of a lncRNA co-expression network was aimed to identify the key modules. The top 60% variance lncR-NAs (8604 lncRNAs) were selected for WGCNA. Soft power 4 was selected as the soft thresholding to set up a weighted adjacency matrix (scale-free R 2 = .89). And a total of 33 modules were generated ( Figure 4A). Furthermore, Figure 4B-C shows that pink module which contained 357 lncRNAs was regarded as key module based on correlation analysis. The network heatmap is shown in Figure 4D.

| Overlapped mRNAs of predicted mRNAs, WGCNA mRNAs and DEmRNAs
By construction of lncRNA co-expression network, we obtained 357 lncRNAs in the pink module. Then, by miRcode online tool, the 357 lncRNAs were used to predict miRNAs which lncRNAs can bind to them through MREs. At the same time, we obtained the top 400 miRNAs with the highest expression through TCGA miRNA-Seq. After that, we selected the overlapped miRNAs (55) among the predicted miRNAs (208) and the highest expressed miRNAs (400). And by the means of miRDB, TargetScan, miRTarBase and the starBase dataset, we took advantage of these 55 overlapped miRNAs for the second prediction and obtained 2266 mRNAs.  Figure 5C shows the expression heatmap of the 86 genes in 433 samples. The GO analysis was further used to display biological function of 86 mRNAs. Figure 5D shows that the mRNAs were mostly related to ameboidal-type cell migration, muscle organ  Figure 5E).

| Construction of the 4-gene-based prognostic model
A total of 186 patients from TCGA were randomly divided into training set (n = 103) and internal testing set (n = 83) separately.
Demographic and clinical data for the training and internal testing set are summarized in Table 1. As shown in Table 1, there is no significant difference in the clinicopathological characteristics. Then, we conducted univariate regression analysis for the 86 genes to explore the genes relevant to the OS of BC patients from the training set; then, a 7-gene-based model was constructed with the cut-off for relevance set at P < .05. All the seven genes were then performed with the LASSO regression, and a prognostic model for 1,3,5 years of OS with 4 genes was constructed: ACTC1 + FAM129A + OSBPL1 0 + EPHA2 (Table 2). Figure 6A shows LASSO coefficient of 4-gene.
After the risk score was calculated for each BC patient, according to the risk scores, we classified the patients into the high-risk group (n = 23) and low-risk group (n = 80) with the cut-off for risk score set at '0.3073'. ( Figure 6B). Besides, Figure 6C shows the heatmap of the 4 genes expression between the high-risk and low-risk group.
Kaplan-Meier survival curves of OS and DFS in the whole sets (n = 186) showed that compared to the low-risk group, the high-risk group was tended to lower OS survival rate (P < .001), and DFS survival rate (P = .04) ( Figure 6D,E).

| Validation of survival prediction in the integrated test sets
To validate the ability of this prognostic model to predict survival of BC patients, we draw Kaplan-Meier survival curves of OS and DFS in the training and integrated test sets. In the training set, compared to the low-risk group, the high-risk group was tended to lower OS survival rate (P < .001) and DFS survival rate (P = .045) ( Figure 7A,B).  Figure 7C). In addition, we did the same analyses in the internal testing set ( Figure 7D,E,F). By the validation, we showed the high accuracy prediction ability of this prognostic model.

| Correlation between 4 gene classifier and clinicopathologic characteristics in the training set
Based on TCGA data, we obtained the clinical information of 186 BC patients. Chi-square test was conducted on 6 clinical factors between the high-risk and low-risk group. As shown in Table 3, the 2 clinical characteristics (T stage and TNM stage) showed significant differences between the two groups. The patients in the high-risk group tended to have higher staging.  Figure 8D). Next, 48 overlapped lncRNAs were obtained between the 788 differentially expressed lncRNAs and the 649 lncRNAs predicted from 14 miRNAs. Finally, Figure 8E shows that we constructed a lncRNA-miRNA-mRNA ceRNA network including 4 mRNAs, 14 miRNAs and 48 lncRNAs. combing plasma BCYRN1 with alpha-fetoprotein, the diagnosis of HCC was remarkably improved. 21 Therefore, the study of RNA interactions at the molecular level will lead to a better understanding of the gene regulatory network in cancer.

| D ISCUSS I ON
In the present study, EdgeR was conducted to identify the Molecularly, mRNA expression is regulated by miRNA and lncRNA.
Based on this, miRNAs and lncRNAs might regulate mRNAs and thus affect the biological process and staging of BC.
EphA2, a member of the family of Ephrin receptor tyrosine kinases, is a functional signalling receptor for progranulin. Interaction of progranulin with EphA2 may act in cancer progression and tumour angiogenesis. 23 Angiogenesis plays an important role in BC. A study found the immunosuppressive drug leflunomide inhibited angiogenesis in bladder carcinogenesis model via significant inhibition of the sEphrin-A1/EphA2 system and might be a potential biomarker for treating BC beyond immunosuppressive therapy. 24 In addition, EphA2 had been demonstrated to be essential in ceRNA network. In gastric cancer, miR-302b exhibited anti-tumour activity by reversing EphA2 regulation. And this modulation of EphA2 also had distinct effects on cell proliferation and migration in GC. 25 Besides, ephA2/ ephirnA1 signalling regulates miR-302b expression and attenuates malignant pleural mesothelioma cell growth. 26 Further study had shown that MIAT/miR-520d-3p/ EphA2 might be a new target for HCC therapy. Other studies showed the interaction between miR-520d and EphA2 could enhance tumour suppression in ovarian cancer and gastric cancer. 27,28 MiR-26a and miR-26b play important roles in BC as tumour suppressors. Overexpression of miR-26a/miR-26b inhibited the proliferation and migration of BC cells. 29,30 Importantly, a study found that blood miR-26b detected the presence of invasive bladder tumours with 94% specificity and 65% sensitivity. 31 And with increasing tumour-nodes-metastasis staging in bladder cancer, miR-26b showed a moderate decreasing trend. 32 As for interaction between EphA2 and miR-26b in cancers, EphA2 was verified as the target of miR-26b. MiR-26b could enhance radiosensitivity of hepatocellular carcinoma cells by targeting EphA2. 33 Inevitably, our research had some innate limitations which need to be addressed. Firstly, although this bioinformatics study was designed well, and a strict threshold for WGCNA analysis was set up, the major drawback of our study was the lack of in vivo and vitro validation. Secondly, the current study was of a retrospective nature, as it was based on data from TCGA dataset without validating it in a prospective clinical trial. Despite these drawbacks, the results of our study demonstrate that our survival model with 4 genes could be used as reliable prognostic predictors of BC, as well as the ceRNA network plays a role in the molecular regulatory network of BC.

CO N FLI C T S O F I NTE R E S T
The authors declare that there are no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data generated and analysed during the current study are available from the corresponding author on reasonable request. Public data and data repositories are referenced within the manuscript.