Genome‐wide screening of abberant methylated drivers combined with relative risk loci in bladder cancer

Abstract Background To explore important methylation‐driven genes (MDGs) and risk loci to construct risk model for prognosis of bladder cancer (BCa). Methods We utilized TCGA‐Assembler package to download 450K methylation data and corresponding transcriptome profiles. MethylMix package was used for identifying methylation‐driven genes and functional analysis was mainly performed based on ConsensusPathDB database. Then, Cox regression method was utilized to find prognostic MDGs, and we selected 17 hub genes via stepwise regression and multivariate Cox models. Kruskal‐Wallis test was implemented for comparisons between risk with other clinical variables. Moreover, we constructed the risk model and validated it in http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13507. Gene set enrichment analysis was performed using the levels of risk score as the phenotype. Additionally, we further screened out the relative methylation sites associated with the 17 hub genes. Cox regression and Survival analysis were conducted to find the specifically prognostic sites. Results Two hundred and twenty‐eight MDGs were chosen by ConsensusPathDB database. Results revealed that most conspicuous pathways were transcriptional mis‐regulation pathways in cancer and EMT. After Cox regression analysis, 17 hub epigenetic MDGs were identified. We calculated the risk score and found satisfactory predictive efficiency by ROC curve (AUC = 0.762). In the validation group from http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE13507, 17 hub genes remained higher predictive value with AUC = 0.723 and patients in high‐risk group. Meanwhile, Kruskal‐Wallis test revealed that higher risk score correlated with a higher level of TNM stage, tumor grade, and advanced pathological stages. Then, identified 38 risk methylated loci that highly associated with prognosis. Last, gene set enrichment analysis revealed that high‐risk level of MDGs may correlate with several important pathways, including MAPK signaling pathway and so on. Conclusion Our study indicated several hub‐MDGs, calculated novel risk score and explored the prognostic value in BCa, which provided a promising approach to BCA prognosis assessment.


| INTRODUCTION
Bladder cancer (BCa) is the 2nd most common form of malignancy in urological system worldwide. 1 In 2019, the estimated respective new cases and new death are 80 470 and 17 670 in the United States reported by the American Cancer Society. 2 With incidence and mortality rates of 9.6 and 3.2 per 100 000, respectively, BCa is approximately four times common in males than in females. 1 According to the extent of tumor invasion, about three-quarters of BCa patients classified as nonmuscle-invasive BCa (NMIBC) and the remainder had muscle-invasive BCa (MIBC) when initial diagnosed. 3 Despite the comprehensive treatments of surgical treatment, radiotherapy, chemotherapy, and immunotherapy, BCa is still a serious challenge due to high recurrence and mortality. 4,5 Apart from certain occupational chemical explosion, water pollution, and tobacco smoking, genetic material mutation is one of the main risk factors for BCa. [6][7][8] However, the molecular regulation mechanism of BCa remains confused. Both DNA and histone methylation are an essential epigenetic regulation mechanism that has been studied intensively in recent decades. [9][10][11][12][13] Tumor suppressor genes might be hypermethylated and ultimately silenced by abnormal methylation genes in CpG islands located in or near the genomic promoter region. It is noteworthy that DNA methylation shows the enormous difference between NMIBC and MIBC. For instance, in NMIBC hypomethylation in non CpG islands is more common, while widespread promoter hypermethylation is identified more in MIBC. 14,15 This proves in some way that DNA methylation changes might lead to different clinical and pathological characteristics in BCa. However, the DNA methylation regulation effect and mechanism in BCa have not been fully elucidated.
Recently, the accelerated development of bioinformatics has been applied in malignant tumor research benefit from the improvement of high-throughput sequencing technology, emergence of new statistical algorithms and the consummation function of public databases. The Cancer Genome Atlas (TCGA) database and Gene Expression Omnibus (GEO) database provide great convenience for systematic collection of clinical, pathological, and biological data from patients with malignancies. 16,17 In addition, Olivier Gevaert et al invented Methylmix, which is an original computational algorithm applied in correlation analysis between abnormality of DNA methylation and predict transcription. 18 This provides a new technical means for researching DNA methylation in malignant tumors.
In this study, we filtrated methylation-driven genes (MDGs) extracted from gene methylation profiling datasets and gene expression profiling datasets. Then the analyzed data of BCa patients from TCGA and GEO database were integrated in order to identify hub MDGs using biological algorithm. Finally, we combined selected MDGs with relative risk loci for predicting prognosis in BCa.

| Data acquisitions and integrative analysis
We obtained RNA-seq data of transcriptome profiles of BCA from TCGA database (https ://portal.gdc.cancer.gov/), including 414 tumor tissues and 10 normal tissues. We used edgeR package 19 to normalize the raw data. Moreover, we utilized the tool of TCGA-Assembler to download the DNA methylation data (419 BCA and corresponding 21 normal samples) based on the Illumina Infinium Human Methylation 450 platform, which detecting over 480 K human genome methylated sites. Limma package was applied to do the normalization. 20 Level three methylation data were in form of β value, representing the ratio of the methylation probe data vs total probe intensities. Then, the average DNA methylation value for all CpG sites correlated with a gene was calculated and merged into a matrix with the function of TCGA-Assembler. What is more, we subsequently extracted corresponding clinical features from 412 BCA patients via TCGA portal including age, gender, pathological stage, tumor grade, TNM stage, and survival data. Specific baseline characteristics of BCA patients are shown in Table 1 and Table S1.
MethylMix package was designed to deal with analysis of methylation data with RNA-seq profiles, especially detecting methylation alterations that were associated with gene expression. 18 We experienced three steps during the algorithm as following: firstly, methylation events that lead to alterations of gene expression were selected that only the genes passed the correlation cutoff value of correlation coefficients = −0.3 and adjust P value = .05. Second, the methylated state of one gene was defined across multiple samples using a β mixture model. Last, the comparison of methylation levels in BCA and corresponding normal samples was conducted using the Wilcoxon rank sum test ( Figure 1).

| Cluster analyses and enriched pathways of methylated drivers
Differentially expressed divers with methylated alterations were defined by MethylMix package and performed the cluster analysis between tumor vs normal tissues using pheatmap package. In addition, we applied the ConsensusPathDB database to further assess the potential enriched pathways using the imputed driven-genes list. ConsensusPathDB was publicly comprehensive biological resources mainly integrating interaction networks and we selected Humancyc, Reactome, Kegg, Smpdb, Wikipathways, Signalink, and Biocarta for subsequent analysis with P value cutoff of .05. 21

| Construction and evaluation of epigenetic risk score based on MDGs
We conducted the univariate Cox regression analysis to select prognostic MDGs with P < .05 as the threshold. Stepwise regression analysis was performed to select 17 hub MDGs with the minimum Akaike information criterion (AIC) value. Meanwhile, we utilized the multivariate Cox method to obtain the β value of hub genes, which was the coefficient of each variable and represented the respective weight. Hazard ratio and 95% confidence interval (95% CI) were calculated and shown in forest plot by survminer package. Then, risk score based on the epigenetic MDGs could be calculated as the follows: Risk score = Ʃ (β i × Exp i ) (i = 17), where Exp represented the expression data of 17 hub genes. Based on this, 405 BCA patients from TCGA cohort can be divided into high-and low-risk groups using the median risk score as the cutoff. Importantly, we further used the receiver operating characteristic curve (ROC) to evaluate the predictive efficiency of MDGs with survivalROC package. 22 Significant difference between high-and low-risk groups was assessed by Kaplan-Merier analysis using survival package.

MDGs and clinical characteristics
Since BCA patients in high-risk group revealed worse survival prognosis than that in low-risk group, we invented to further investigated the potential associations between MDGs signature with traditional clinical characteristics. We downloaded and collated clinical variables from 405 patients in TCGA cohort with complete information consisted of AJCC-TNM stage, tumor grade, as well as pathological stage, which was all independent clinical risk factors. We merged the risk score with other clinical variables with the same id numbers for subsequent analysis (Table S4). Kruskal-Wallis (K-W) test was utilized to assess the significant difference between level of risk score across multiple groups in respective clinical features with statistical cutoff of P < .05.

| Validation of MDGs signature in an independent population
To evaluate the predictive power of 17 MDGs signatures, we obtained 165 BCA patients with corresponding expression data and clinical information in GSE13507 from GEO database (https ://www.ncbi.nlm.nih.gov/geo/). We selected the same 17 hub genes and constructed the risk formula based on multivariate Cox model using survival package. Importantly, the cutoff was set as the same as the TCGA-BLCA cohort and we classified the patients into low and high subgroups. The AUC of the ROC curve was

| Analysis of genes and methylated loci associated with OS
To further detect specific methylated sites in 17 hub MGDs, we extracted the all methylation data correlated with 17 MDGs in 440 files from TCGA using the Perl scripts. We merged the value of 207 methylated sites into one matrix and conducted univariate Cox analysis to select potential risk methylated loci (Table S3). Survival analysis was then performed to assess the difference between hypo-and hypermethylation state of specific sites ( Figure S3). Additionally, a joint survival analysis was conducted to further evaluate key MDGs signature associated with prognosis in BCA patients, where we combined the methylation levels and expression data of one gene. The conjoint analysis was also performed by the survival package.

| Statistical analyses
The

| Identification of methylated driven genes
Raw transcriptome data of 414 BCA with 10 normal samples were normalized via edgeR package, and we identified 8898 differentially expressed genes with cutoff of FDR <0.05. Meanwhile, we prepared and normalized the methylation profiles of 419 BCA with corresponding 21 normal samples through limma package. Then, the results from the MethylMix package revealed the correlations between methylation state with genes expression, mixture models, as well as the methylation difference between BCA and normal tissues across multiple samples. A total of 228 genes were defined as the epigenetic drivers with |logFC| > 0, |Cor| > 0.3, and P < .05 (Table S2). We showed the top hypo-and hypermethylation of drivers in Figure 1. We selected top 100 genes to show the difference of methylated level between tumor and normal tissues, which were illustrated in heatmap with pheatmap package in Figure 2. 20 F I G U R E 3 Functional pathway analysis for 228 MDGs based on ConsensusPathDB database. Transcriptional misregulation in cancer, the MAPK signaling pathway, the Wnt signaling pathway, cell cycle, as well as other cancer-related pathways enriched significantly in our analysis

| Functional enriched pathways
Since we obtained 228 MDGs found to be statistically significant via MethylMix package, we intended to further explore the potential pathways that the divers might be involved in. We conducted the functional enriched analysis across multiple database in ConsensusPathDB and selected several significant biological pathways. The most conspicuous pathways were transcriptional misregulation pathways in cancer and epithelial to mesenchymal transition (EMT). In addition, there were other notable pathways including Wnt signaling crosstalk, glutathione metabolism, transcriptional cascade regulating pathway, as well as Bcell receptor signaling pathways. We selected several vital pathways to exhibit in Figure 3, and the other top 40 pathways were shown in Table 2.

| Construction and validation of risk scores based on MDGs signatures
We utilized the univariate Cox regression analysis to identify 64 genes associated with survival outcomes with P < .05 (Table S5). Then, the stepwise regression model was performed to select a robust signature that was the most frequent combinations, where 17 epigenetic divers were screened out. We showed the hazard ratio with 95% CI for each hub MDGs based on the multivariate Cox regression results in a forest plot ( Figure 4). Therefore, a methylationbased signature was established with the 17 hub MDGs. Meanwhile, the association analysis revealed that the methylation state all correlated negatively with the expression level of genes from the Pearson correlation coefficients in  Figure 5A,B. Moreover, the expression level of 17 genes in two groups was exhibited in heatmap ( Figure 5C). The AUC of the ROC curve was 0.762 which indicating the satisfactory predictive efficiency of the risk signature ( Figure  5D). The patients in high-risk group revealed the wore prognosis in OS compared with that in low-risk group from the Kaplan-Merier plot with P < .0001 in Figure 5E. Besides, we obtained the expression profiles with corresponding clinical information of 165 BCA patients from GSE13507 as an independent external validation. Using the same genes identified from the TCGA cohort,

F I G U R E 4 Forest plot of 17 hub
MDGs in TCGA cohort. The Concordance Index and the Minimum of AIC were shown at the left bottom of the picture the 17-MDGs based signature successfully stratified the 165 BCA patients into low-(n = 82) and high-risk group (n = 83). Furthermore, the ROC curve was conducted and the AUC was 0.723. The log-rank test showed the significant difference of OS in high-and low-risk groups with P = 8e-05, in accordance with the results from TCGA cohort ( Figure 6A,B). We calculated and merged the risk score with other clinical variables in Table S4.

| Correlation of methylation-based signature with clinical characteristics
With the result of risk score found to be statistically significant with the OS for BCA patients, we further invented to explore the potential relationships of risk score and other independent clinical risk features. The Kruskal-Wallis test showed that higher risk score correlated with a higher level of TNM stage, high tumor grade, as well as advanced pathological stage (Stage III and IV). The specific statistical difference in distribution of risk score across multiple groups of clinical variables was shown in Figure S2A-E.

| Screening of risk methylation loci with survival analysis, joint survival analysis
Since we identified 17 hub MDGs associated highly with survival prognosis of BCA, we wanted to detect the significant methylation loci harboring these risk methylation genes. We extracted the β value of methylation data according to the 17 gene names from the 440 files in Table S3. A list of 207 methylation sites were screened out and the univariate Cox analysis revealed that 38 key methylation loci were associated with survival outcomes in Table 3 (P < .05). Meanwhile, we conducted the Kaplan-Merier analysis and chose the plots of top 12 methylated sites in Figure S3. Additionally, a joint survival analysis found that combination of the methylation state and expression level of the 17 hub MDGs signature also correlated tightly with survival outcomes in 405 BCA patients (Figure 7).

| Enrichment of cancer-related pathways associated with MDGs signature
The GSEA was exploited to investigate potential biological pathways that the 17-methyalted drivers might be involved in, and we found a total of 79 items were enriched significantly with FDR < 0.25. The level of MDGs signature-based risk score was defined as the phenotypes, and the result suggested that high-risk level of MGDs may correlated highly with several important crosstalk, consisting of MAPK signaling pathway, Wnt signaling pathway, cell cycle, as well as other cancer-related pathways (Figure 8).

| DISCUSSION
Epigenetics, which focus on the features and modification of the genome, contains cytosine modifications of DNA, post-translational modifications of histones, nucleosome positioning and interactions of spatial conformation between genomic regions and accessible genomic loci. 24,25 As a crucial part of epigenetic regulation, DNA methylation was reported to participate in malignant progression. 26,27 In BCa, Ahlén Bergman et al reported that high methylation level at the IFNG −4229 bp locus was associated with more advanced post-cystectomy tumor stages, which might lead to a worse prognosis. 28 Methylation also participated in the regulation of tumor suppressor mechanisms. For instance, methylation status of CpG islands was observed in the regulation of miR-1-3p expression, causing inhibition of the BCa cell proliferation, migration, and invasion. 29 Although there are increasing studies concerning the methylation in BCa, limited research has explored the prognosis value on the genome-wide screening of MDGs.
In this study, we analyzed the raw transcriptome data and screened out 228 MDGs. Then the ConsensusPathDB database was used for the analysis of biological functions of MDGs. The results revealed that most conspicuous pathways were transcriptional mis-regulation pathways in cancer and EMT. After univariate Cox regression analysis and the establishment of the stepwise regression model, 17 epigenetic hub-MDGs were identified. Furthermore, we calculated the risk score and assessed the predictive value of methylation signature in OS prediction, and found the satisfactory predictive efficiency of the risk signature by ROC. To further verificate the results above, we extracted high-throughput sequencing data from the GEO database. The Kruskal-Wallis test showed that higher risk score correlated with a higher level of TNM stage, high tumor grade, as well as advanced pathological stage. After screening of risk methylation loci with survival analysis and joint survival analysis, we examined enrichment of cancer-related pathways associated with MDGs signature. The results above suggested that high-risk level of MGDs may be correlated highly with several important crosstalk, consisting of the MAPK signaling pathway, the Wnt signaling pathway, cell cycle, as well as other cancer-related pathways.
Keratins 8 (KRT8) is a key participation factor of keratinization, which exists abnormal performance in basal cell tumors. As representative of simple columnar epithelia, KRT8 is integral to epithelial differentiation. 30 It is worth noting that abnormal epithelial-mesenchymal transition is an important part of the malignant progression. 31 Study has revealed that the high expression of KRT8 in gastric cancer is related to the upregulation of EMT pathway and might promote the tumor progression and metastasis. 32 However, limited evidence has presented regulation mechanism of KRT8 and EMT signaling pathway in BCa. In our study, we identified KRT8 as a hub-MDG, which might influence the BCa progression through EMT pathway analyzed by ConsensusPathDB database. S100A16 belongs to S100 family, which is a superfamily of calcium-binding proteins. S100A16, combining with S100A1-S100A14 and S100A7, encodes in two tandem clusters on chromosome 1q21, which plays a powerful role in epidermal differentiation complex (EDC). 33 As a novel detected S100 family member, S100A16 was observed over-expressed in many malignancies. 34 Nariaki et al demonstrated that S100A16 might increase the expression level of Oct4 and Nanog in cancer stem-like cells in Yumoto human cervical carcinoma cells. 35 According to our findings, S100A16 is highly expressed in bladder cancer, consistent with previous studies above. TPM1 and TPM2 are both members of tropomyosin (TPM) family, which discovered as a category of actin binding proteins acting as inhibitors of cellular transformation. 36 Previous research reported that overexpression of TPM1 was related to a larger tumor size and a higher tumor grade, and TPM1 was proved as a potential biomarker of renal cell carcinoma prognosis. 37 Unlike TPM1, poorly TPM2 were identified associated with malignant progression prostate tumors. 38 As filtrated MDGs, whether TPM1 and TPM2 interact in the malignant progression of bladder cancer remains further study.
To further explore the regulation of MDGs in bladder cancer, functional pathway analysis was performed and indicated that the 17 MDGs signatures were mainly associated with MAPK, Wnt and cell cycle signaling pathway, which is unique to our study. Studies indicated that methylation in malignant tumors might exert further regulatory functions through the above signaling pathways. In gastric cancer, SPG20 might be suppressed by methylation and result in activation of cell proliferation by upregulated the EGFR/MAPK pathway. 39 Xiang et al demonstrated that zinc-finger protein 545 is silenced F I G U R E 8 Gene set enrichment analysis for identification of the underlying pathways using risk score as the phenotype. A-I, GSEA results revealed the top 9 MDGs-related pathways, including MAPK signaling pathway, the Wnt signaling pathway, cell cycle, as well as other cancerrelated pathways by promoter methylation and acted as an inhibiting factor in colorectal cancer through the Wnt/β-catenin, PI3K/AKT and MAPK/ERK signaling pathways. 40 In colorectal cancer, it was also found that aberrant DNA methylation of WNT pathway genes might regulate tumor progression. 41 In BCa, this study initially screened the signaling pathways associated with MDGs. Nevertheless, the specific mechanisms of action and regulation of these signaling pathways and methylation need to be further studied.
Finally, the risk model calculated based on 17 MDGs signatures has certain accuracy in assessing the prognosis of BCa patients, which is unique in our study. The AUC of ROC curve revealed the satisfactory predictive efficiency of the risk signature. What's more, Kruskal-Wallis test showed that risk score had positive correlation with TNM stage, tumor grade and pathological stage (Stage III and IV), which functioned reasonably from clinical perspective and indicated its potential practical significance for BCa diagnosis and prognosis prediction.
Risk scores were calculated based on hub genes for evaluating the prognostic value, which was an advantage of our research. The AUC of the ROC based on risk score was 0.762 in our study. In previous studies, Zeng et al 42 developed a novel nomogram with readily available clinicopathological information including grade, stage, age, lymph node, location, and histology for predicting cancer-specific survival of upper tract urothelial carcinomas. However, the AUC of ROC was 0.74. In addition, Yin et al 43 developed a nomogram based on 21-miRNA signature. The AUC of ROC was only 0.663. The above results indicated that the risk score calculated based on the hub genes identified in our study might have a well-prognostic value.
Despite the remarkable sense, it is inevitable that limitations also existed in our research. First, no clinical samples but only screening and verification were performed to extract target data through biological algorithm approaches. Secondly, data sources of BCa patients and validation are based on TCGA database and GEO database, which might result in selection bias due to ignorance of clinical and pathological data from BCa patients in other databases or clinical centers. Finally, these 17 hub-MDGs should be further studied and validated to expound its specific regulatory function and mechanisms in BCa.

| CONCLUSION
Our study indicated 17 hub-MDGs by genome-wide screening from public databases, calculated novel risk score and explored the prognostic value in BCa, which provided a promising approach to BCA prognosis assessment.