Identification of methylation‐driven genes related to prognosis in clear‐cell renal cell carcinoma

Abstract Despite existing treatment methods, the prognosis of advanced clear‐cell renal cell carcinoma (ccRCC) remains poor. Growing evidence indicates the presence of methylation in ccRCC cancer cells, but studies on methylation‐driven genes in ccRCC are lacking. We analyzed the open ccRCC data in The Cancer Genome Atlas database to obtain ccRCC‐related methylation‐driven genes, and then carried out pathway enrichment, survival, and joint survival analyses. More importantly, we explored in depth the correlation between differential methylation sites and the expression of these driver genes. We screened 29 methylation‐driven genes via MethylMix, of which six were significantly associated with the survival of ccRCC patients. This study demonstrated that hypermethylation and hypomethylation affect prognosis differently, and that the methylation level of key methylation sites is associated with gene expression. We identified methylation‐driven genes that independently predict prognosis in ccRCC, which offers bioinformatic support for the study of methylation in ccRCC and a new perspective for its epigenetic study.

2016). Although immunotherapy improves the overall response rate, the incidence of serious adverse events also increases (Rini et al., 2019).
Following the clinical efficacy of immunotherapy, combined immunotherapy regimens have been explored. At present, however, there are no reliable biomarkers for the early screening and prognostic assessment of ccRCC, which limits progress in its treatment.
Although early studies found that mutated genes such as VHL may serve as prognostic markers of ccRCC (Gulati et al., 2014), the heterogeneity of ccRCC itself calls for more effective biomarkers to evaluate prognosis. For this reason, there is an urgent need to identify valuable molecular targets in the in-depth study of ccRCC.
Advances in research techniques have deepened our understanding of the diseases in our field. For example, high expression of the CD36 (cluster of differentiation 36) transcriptional group (Xu, Qu, Wang, Zhang, & Ye, 2019) and of the P21 (RAC1) activated kinase 1 (PAK1) protein (Qu, Liu, Bai, Xu, & Guo, 2019) is a sign of poor prognosis in ccRCC. In contrast, high expression of BCL2 associated athanogene 1 and NOP56 ribonucleoprotein indicates a better prognosis (Giridhar et al., 2017). The role of gene methylation in the development of cancer has gradually been recognized. Researchers have successfully identified methylated or methylation-driven genes that predict the prognosis of esophageal squamous cell carcinoma (ESCC; Roy et al., 2019), lung squamous cell carcinoma (Gao et al., 2019), lung adenocarcinoma (C. Su et al., 2019), and hepatocellular carcinoma (G.-X. Li et al., 2019).
Moreover, the role of epigenetic abnormalities in ccRCC has been confirmed, and bioinformatics analysis of microarray data has broad prospects and clinical significance. The clinical application of methylated and methylation-driven genes is therefore likely to come. However, little previous evidence exists on methylation-driven genes in ccRCC.
The opening of large databases such as The Cancer Genome Atlas (TCGA; Tomczak, Czerwińska, & Wiznerowicz, 2015) makes deeper and more accurate exploration of disease possible. TCGA contains the genetic data researchers need, such as human gene methylation data, together with a variety of clinical prognostic information. Combined with bioinformatics technology, it continues to yield new understanding of the occurrence and development of cancer and to promote the rapid development of the field. Our team collected the open TCGA data and analyzed them with the MethylMix software package developed by Gevaert et al. (Gevaert, 2015) to filter out the differentially methylated genes connected with the prognosis of ccRCC. The gene enrichment pathways were then visualized, and survival and joint survival analyses were carried out to ascertain the relationship between the methylation-driven genes and patient survival.
The purpose of this study was to evaluate the relationship between methylation-driven genes, gene loci, and messenger RNA (mRNA) data, to further understand the cancer mechanisms in which methylation is involved and to provide valuable medical evidence for the treatment and prognosis of ccRCC.
DNA methylation data from patients with ccRCC had been assayed using the Illumina Methylation 450k BeadChip (Walker et al., 2015), a technical platform for large-scale study of DNA methylation. We then used the R-based LIMMA (https://bioconductor.org/packages/release/bioc/html/limma.html, RRID:SCR_010943) software package (Ritchie et al., 2015) and the EdgeR (https://bioconductor.org/packages/release/bioc/html/edgeR.html, RRID:SCR_012802) package (Robinson, McCarthy, & Smyth, 2010) to standardize the downloaded data and acquire differentially methylated genes (p = .05, log fold change (log FC) = 0.5) and differentially expressed genes (FC = 5, adjusted p = .01). LIMMA and EdgeR are R software packages used here to analyze differential methylation and differential expression, respectively. The workflow was as follows: the ccRCC data were imported into R, and the differentially methylated genes and differentially expressed genes were obtained and plotted after processing, filtering, and standardization with the LIMMA and EdgeR packages. Note that differentially methylated genes and differentially expressed genes are genes that differ between normal and tumor tissues in their degree of methylation and in their degree of expression, respectively, whereas methylation-driven genes differ between normal and tumor tissues in both methylation and expression. Therefore, the MethylMix algorithm was run in R to calculate the relationship between gene methylation level and gene expression, with |logFC| ≥ 0, adjusted p < .05, and correlation coefficient (Cor) < −0.3 as screening conditions.
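The screening conditions just stated can be illustrated with a minimal Python sketch. It is not the MethylMix implementation: the gene names, values, and the helper names `pearson` and `screen_genes` are hypothetical, and Pearson correlation is assumed as the correlation measure. The |logFC| ≥ 0 condition holds for any value, so it is noted in a comment rather than tested.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def screen_genes(candidates, cor_cutoff=-0.3, p_cutoff=0.05):
    """Keep genes with adjusted p < p_cutoff and Cor < cor_cutoff.

    |logFC| >= 0 is always true, so it does not filter anything here.
    """
    kept = []
    for gene, rec in candidates.items():
        cor = pearson(rec["methylation"], rec["expression"])
        if rec["adj_p"] < p_cutoff and cor < cor_cutoff:
            kept.append((gene, cor))
    return kept

# Hypothetical toy data: per-sample methylation beta values and expression.
TOY_GENES = {
    "GENE_A": {"methylation": [0.1, 0.2, 0.3, 0.4, 0.5],
               "expression": [9, 8, 6, 5, 3], "adj_p": 0.01},
    "GENE_B": {"methylation": [0.1, 0.2, 0.3, 0.4, 0.5],
               "expression": [3, 5, 4, 6, 5], "adj_p": 0.01},
    "GENE_C": {"methylation": [0.1, 0.2, 0.3, 0.4, 0.5],
               "expression": [7, 6, 5, 4, 2], "adj_p": 0.20},
}

if __name__ == "__main__":
    for gene, cor in screen_genes(TOY_GENES):
        print(gene, round(cor, 3))
```

In this toy set only GENE_A passes: GENE_B is positively correlated, and GENE_C fails the adjusted p threshold despite its negative correlation.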
MethylMix (https://bioconductor.riken.jp/packages/3.1/bioc/html/MethylMix.html) is an algorithm based on a beta mixture model that identifies methylation states and compares them with the normal DNA methylation state. It uses the differential methylation value, that is, the difference between the tumor methylation state and the normal methylation state, to identify disease-related methylation-driven genes. Running the MethylMix package requires three specific data sets as input: the methylation data of the tumor samples, the methylation data of the normal tissue samples, and the corresponding gene expression data of the tumor samples. We therefore submitted the data to MethylMix according to the requirements of the algorithm and screened out methylation-driven genes for survival analysis. All the data we downloaded are openly available on the TCGA platform, so approval from the local ethics committee was not needed.
WANG ET AL.
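As a rough illustration of the differential methylation value described above, the sketch below takes the difference between mean tumor and mean normal beta values for one gene. This is a deliberate simplification: MethylMix fits a beta mixture model and derives the value per mixture component, which is not reproduced here. The function names and the `tol` threshold are hypothetical.

```python
from statistics import mean

def differential_methylation(tumor_betas, normal_betas):
    """Simplified DM value: mean tumor beta minus mean normal beta.

    Assumption: a single tumor methylation state; the real algorithm
    estimates one state per beta mixture component.
    """
    return mean(tumor_betas) - mean(normal_betas)

def methylation_state(dm_value, tol=0.1):
    """Label a gene by the sign of its DM value (tol is a hypothetical cutoff)."""
    if dm_value > tol:
        return "hypermethylated"
    if dm_value < -tol:
        return "hypomethylated"
    return "unchanged"

if __name__ == "__main__":
    dm = differential_methylation([0.7, 0.8, 0.9], [0.2, 0.3, 0.4])
    print(round(dm, 3), methylation_state(dm))
```

A positive value marks a gene that is more methylated in tumor than in normal tissue, and a negative value marks the opposite.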

| Pathway enrichment analysis of methylation-driven genes
To further understand the biological functions of these methylation-driven genes, we applied ConsensusPathDB (http://cpdb.molgen.mpg.de/; Herwig, Hardt, Lienhard, & Kamburov, 2016) to visualize the pathway analysis of the methylation-driven genes, with p = .05 as the cut-off. ConsensusPathDB integrates the interaction networks of Homo sapiens, covering signal transduction, gene regulation, drug-target interactions, and biochemical metabolism, and currently integrates 32 public databases. Because it consolidates these sources and avoids redundancy when matched against the genetic information we collected, it has become one of the preferred tools for visual pathway enrichment analysis of methylation-driven genes.
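Pathway over-representation is commonly scored with a hypergeometric test, and the sketch below shows that calculation in Python. The text does not state which statistic ConsensusPathDB applies internally, so treating it as hypergeometric is an assumption here. The function name and the example numbers (a 20,000-gene background and a 200-gene pathway) are hypothetical; only the input size of 29 genes comes from this study.

```python
from math import comb

def enrichment_p(universe, pathway_size, n_input, overlap):
    """Upper-tail hypergeometric p value: P(X >= overlap).

    universe     : number of background genes
    pathway_size : genes annotated to the pathway
    n_input      : size of the submitted gene list
    overlap      : submitted genes that fall in the pathway
    """
    total = comb(universe, n_input)
    upper = min(pathway_size, n_input)
    tail = sum(
        comb(pathway_size, k) * comb(universe - pathway_size, n_input - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

if __name__ == "__main__":
    # Hypothetical example: 4 of 29 input genes hit a 200-gene pathway.
    p = enrichment_p(universe=20000, pathway_size=200, n_input=29, overlap=4)
    print(p, p < 0.05)
```

A pathway would be reported as enriched when this p value falls below the chosen cut-off.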

| Survival analysis and joint survival analysis
To explore the effect and significance of methylation-driven genes on the prognosis of patients with ccRCC, we used overall survival (OS; Schultz, Peterson, & Breslau, 2002) to analyze and evaluate the relationship between methylation-driven genes and survival rate, and verified this correlation with the log-rank test (Koletsi & Pandis, 2017) in the survival R software package (Singh & Mukhopadhyay, 2011), so as to screen methylation-driven genes for potential independent prognostic value. p < .05 was considered statistically significant. The survival R package was then used to obtain joint survival curves in the joint survival analysis, to analyze the relationship between gene methylation level, expression, and the prognosis of ccRCC. Based on the downloaded clinical prognostic information and the information on the sites related to the methylation-driven genes, we combined the key prognostic genes obtained from the survival analysis and the joint analysis to determine the correlation between the expression of the methylation-driven genes and the methylated sites of the key genes, using Kaplan-Meier curves drawn with the survival R package (p < .05 and |Cor| > 0.5 as the screening conditions).
TABLE 1 Twenty-nine methylation-driven genes in ccRCC
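The Kaplan-Meier estimate and the log-rank test used in this section can be sketched in plain Python as follows. The study itself used the survival R package, so this is only an illustration of the underlying arithmetic. The function names and the toy follow-up times are hypothetical, and an event flag of 1 is assumed to mean death and 0 censoring.

```python
from math import erfc, sqrt

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored cases both leave the risk set
    return curve

def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value, 1 df)."""
    pooled = list(zip(times_a, events_a)) + list(zip(times_b, events_b))
    event_times = sorted({t for t, e in pooled if e})
    obs_minus_exp, variance = 0.0, 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, erfc(sqrt(chi2 / 2))  # chi-square upper tail for 1 df

if __name__ == "__main__":
    # Hypothetical groups, e.g. high versus low methylation of one gene.
    high = ([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
    low = ([10, 11, 12, 13, 14], [1, 1, 1, 1, 1])
    print(kaplan_meier(*high))
    print(logrank(*high, *low))
```

For a joint survival analysis, the same functions could be applied to groups defined by both methylation level and expression level, for example high methylation/low expression versus low methylation/high expression.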
FIGURE 1 Heat map of methylation-driven genes associated with ccRCC. The color change from green to red illustrates the trend from low to high methylation (|log FC| ≥ 0, adjusted p < .05, and Cor < −0.3). ccRCC, clear-cell renal cell carcinoma; FC, fold change

3 | RESULTS

| Acquisition of methylation-driven genes
To study the methylation-driven genes associated with ccRCC, we downloaded methylation data from 160 normal specimens and 325 cancer samples, as well as mRNA expression data from 72 normal specimens and 539 ccRCC samples, from the TCGA database.
First, we used the LIMMA and EdgeR packages to extract abnormal methylation data and gene expression data, and obtained 105 differentially methylated genes and 257 differentially expressed genes, respectively. Second, the data were integrated and grouped, and 29 methylation-driven genes were screened using the MethylMix R package (Table 1). The connection between gene methylation and gene expression was then visualized in R (Figure 1). Six of the most representative of these genes (

| Pathway enrichment analysis of methylation-driven genes
We used the ConsensusPathDB online software for pathway enrichment analysis to further study the mechanism of methylation-driven genes in the occurrence and development of ccRCC. The results showed that the methylation-driven genes were mainly enriched in the universal transcriptional pathway, RNA polymerase II transcription, and gene expression, and were significantly related to the enriched events (Figure 4).

| Prognostic evaluation and survival analysis
The prognostic value of the 29 methylation-driven genes was assessed with the survival R software package, and nine genes were found to be independent prognostic indicators of ccRCC (Figure 5). Among them, the
On the basis of the above, by taking the intersection of the survival analysis and the joint survival analysis, we selected six prognostic genes, C11orf21, EVI2A, PRR15L, RIPK4, SSX1, and ZNF418, as independent prognostic factors or biomarkers, and explored the correlation between the expression of each gene and its corresponding methylated sites. We found that not all methylation sites were associated with the expression of the driver genes. We obtained the sites significantly related to C11orf21, EVI2A, PRR15L, RIPK4, SSX1, and ZNF418, respectively.
The details are as follows: zero sites in C11orf21, one site in EVI2A
However, there is no report on abnormal methylation-driven genes in ccRCC. So far, only one methylation-driven gene, Cytohesin 1 Interacting protein, has been reported to have an effect on ccRCC in previous studies, and that paper concludes that hypermethylation of this driver gene may be a feature of good prognosis in KIRC (Gevaert, Tibshirani, & Plevritis, 2015). Thus, there is a pressing need to search for new prognosis-related driver genes in ccRCC for more accurate individualized treatment and prognosis evaluation.
FIGURE 4 Pathway enrichment analysis of ccRCC-related aberrant methylation-driven genes using ConsensusPathDB (only the pathways with p < .01 are shown here). Node size: the number of genes; node color: p value; edge width: percentage of shared genes; edge color: genes from input; ccRCC, clear-cell renal cell carcinoma
In this study, we detected TCGA data to explore the relationship
The ZNF418 gene encodes a zinc finger protein that plays a negative regulatory role in the mitogen-activated protein kinase signaling pathway (Y. Li et al., 2008). Its low expression is associated with the occurrence, development, and poor prognosis of gastric cancer (Hui et al., 2018), and it can be used as a biomarker for the diagnosis of ESCC (Pu et al., 2017). Overexpression of the receptor-interacting protein kinase RIPK4 is involved in the growth of some tumor cells (X. Huang et al., 2013). Some studies have shown that RIPK4 can facilitate the differentiation of the epidermis through phosphorylation, participate in skin carcinogenesis (Lee et al., 2017), and promote the occurrence of nasopharyngeal carcinoma (Gong, Luo, Yang, Jiang, & Liu, 2018). Furthermore, the protein kinase RIPK4 promotes the invasiveness of bladder urothelial carcinoma  and pancreatic cancer (Qi et al., 2018) through the nuclear factor-κB pathway and the RAF1/MEK/ERK pathway, respectively.
Equally important, it is closely correlated with bone metastasis of breast cancer (Zhang, He, & Zhang, 2019) and lymph node metastasis of cervical cancer (Azizmohammadi et al., 2017). We concluded that patients with high methylation levels of PRR15L, ZNF418, and RIPK4 had a poorer prognosis than patients with low methylation levels. Moreover, the joint survival analysis further showed that high methylation/low expression of these genes was a feature of low survival rate. For the other three genes, EVI2A, C11orf21, and SSX1, the lower the methylation level, the worse the survival, and the joint survival analysis showed that patients with low methylation/high expression had a worse prognosis than those with high methylation/low expression. EVI2A is a protein-coding gene that can be employed to diagnose diseases (Lo, Shen, Baumgarner, Cramer, & Lossie, 2012). In addition, some studies have revealed that it is also a lymphocyte-specific tumor suppressor (X.-W. Li et al., 2014), and that upregulation of EVI2A expression may increase the malignant risk of malignant peripheral nerve sheath tumor (MPNST; Pasmant et al., 2011). In our study, patients with hypomethylation/high expression of the EVI2A gene had a low survival rate, which is consistent with the MPNST study. C11orf21 is located on chromosome 11p15.5. In a meta-analysis of chronic lymphocytic leukemia (CLL), Berndt et al. found that C11orf21 is related to the occurrence of CLL (Berndt et al., 2013), but this needs further verification. The expression of the SSX1 gene is related to the poor prognosis of patients with colon cancer (Hilal, Novikov, Novikov, & Karaulov, 2017
In a word, the development of bioinformatics provides more and more evidence for the important role of DNA methylation in tumors.
Except for PRR15L, the six genes obtained in this study have all been studied to some extent; however, their role in ccRCC remains unexplored. Our study discussed the role of PRR15L, C11orf21, ZNF418, EVI2A, RIPK4, and SSX1 in ccRCC for the first time. We used R software to analyze the methylation-driven genes of ccRCC in TCGA data, and then used survival analysis to verify the relationship between the degree of gene methylation and patient prognosis. In addition, to further support practical application, we combined gene methylation level, gene expression, and patient survival information in a joint survival analysis, and finally carried out a more precise site analysis. This study found potential methylation-driven genes that can be used as biomarkers of ccRCC, and the exploration of related sites further refined our research, giving a new understanding of ccRCC from the perspective of bioinformatics.
However, these ccRCC-related differential driver genes undeniably need further rigorous experimental verification.

| CONCLUSION
To sum up, we used R software packages and the MethylMix algorithm to analyze the data obtained from TCGA, and identified 29 methylation-driven genes closely related to ccRCC.
Afterwards, we explored the connection between driver gene methylation, gene expression, and patient prognosis via survival analysis and joint survival analysis on the basis of the available data, and identified the methylation-driven genes EVI2A, C11orf21, SSX1, PRR15L, ZNF418, and RIPK4, which can be used as prognostic markers of ccRCC. Interestingly, the expression levels of the five methylation-driven genes other than C11orf21 are negatively correlated with the degree of methylation of their sites. Therefore, the development and evolution of ccRCC may be due in part to the methylation of these driver genes. In conclusion, our results provide new ideas for the diagnosis, therapy, and prognosis of ccRCC, identify methylation-driven genes that contribute to ccRCC, and pave the way for the future clinical application of methylation-driven genes.

ACKNOWLEDGMENTS
We sincerely thank The Cancer Genome Atlas database for its data sharing.

CONFLICT OF INTERESTS
The authors declare that there is no conflict of interests.

AUTHOR CONTRIBUTIONS
J.W. and Q.J.Z. contributed to the design of the study; J.W., Q.Q.Z., and C.X.L. performed data analysis; X.L.N. and F.X.W. collected research data; L.H.F. and J.L. provided analytical tools; Q.J.Z. and B.S. were involved in drafting the manuscript; C.X. and S.F. revised the manuscript critically. All the authors were fully involved in the writing of the paper and gave final approval to the submitted version.