A prognostic nomogram based on competing endogenous RNA network for clear‐cell renal cell carcinoma

Abstract Background Clear‐cell renal cell carcinoma (ccRCC) is stubborn to traditional chemotherapy and radiation treatment, which makes its clinical management a major challenge. Recently, we have made efforts in understanding the etiology of ccRCC. Increasing evidence revealed that the competing endogenous RNA (ceRNA) was involved in the development of varied tumors. However, a comprehensive analysis of the prognostic model based on lncRNA‐miRNA‐mRNA ceRNA regulatory network of ccRCC with large‐scale sample size and RNA‐sequencing expression data is still limited. Methods RNA‐sequencing expression data were taken out from GTEx database and TCGA database, a total of 354 samples with ccRCC and 157 normal controlled samples were included in our study. The ccRCC‐specific genes were obtained by WGCNA and differential expression analysis. Following, the communication of mRNAs and lncRNAs with targeted miRNAs were predicted by MiRcode, starBase, miRTarBase, and TargetScan. A gene signature of eight genes was further constructed by univariate Cox regression, Lasso methods, and multivariate Cox regression analysis. Results A total of 2191 mRNAs and 1377 lncRNAs was identified, and a dysregulated ceRNA network for ccRCC was established using 7 mRNAs, 363 lncRNAs, and 3 miRNAs. Further, a gene signature including eight genes based on this ceRNA was determined followed by the development of a nomogram predicting 1‐, 3‐, and 5‐year survival probability for ccRCC. Conclusion It could contribute to a better understanding of ccRCC tumorigenesis mechanism and guide clinicians to make a more accurate treatment decision.


| Gene expression and clinical data
The high-throughput RNA sequencing data of 539 kidney samples with ccRCC and 72 normal controlled samples were obtained from TCGA data repository using TCGAbiolinks 9 R package. miRNA sequencing data were also retrieved from TCGA. 8 Samples with tumor purity below 0.6, 10 Formalinfixed paraffin-embedded (FFPE) tissue, and duplicate samples were excluded in downstream analysis. Meanwhile, a total of 85 normal kidney cortex samples were downloaded from GTEx 7 (version V8). No further approval was required from the Ethics Committee as the data comes from the TCGA and GTEx database. lncRNAs and mRNAs were recognized by the Ensembl 11 database (version GRCh38.98). lncRNAs and mRNAs that were not included in the database were excluded in this study. We mainly used the R program (version 3.6.1) 12 for the analysis in our study.

| ccRCC-specific mRNAs and lncRNAs
WGCNA, which can determine genes most related to a sample trait by clustering highly correlated genes to several modules and combining modules with external traits, was used to identify co-expression network in lncRNAs or mRNAs expression profiles. 6 The biweight midcorrelation analysis 13 was efficiently used to assess weighted coexpression relationship. Gene significance (GS), module significance (MS), and module membership (MM) were explained by biweight midcorrelation coefficients. Genes in modules with max MS were defined with ccRCC importance, moreover, we only incorporated modules with a significant biweight midcorrelation coefficient between GS and MM. In this study, we obtained mRNAs or lncRNAs most related to ccRCC patients using WGCNA. Differentially expressed genes between ccRCC and normal samples were identified by DESeq2 with a threshold of |log2 fold change| >1 and adjusted p-value < 0.01. Further, to obtain ccRCC-specific mRNAs and lncRNAs, we intersected genes most positively or negatively correlated with ccRCC in WGCNA with upregulated or downregulated genes in differential expression analysis, respectively.

| Construction of a dysregulated ceRNA network
Interactions between lncRNAs and miRNAs were identified by MiRcode. 14,15 The interactions between mRNAs and miRNAs were explored by StarBase (version 3.9), 16,17  TargetScan (version 7.2), 18 and miRTarBase (version 8.0) 19 databases, meanwhile, miRNA sequencing data from TCGA was employed to review the top 10% expressed miRNA, since the implementation of ceRNA function depends on abundant of miRNAs, we only included triple lncRNAs-miRNAs-mRNAs with miRNA in above miRNAs. We used Cytoscape (version 3.7.1) to depict the ceRNA network.

| Establishment of prognostic gene signature
To construct a risk assessment gene signature, a whole of 344 TCGA cases whose follow-up >30 days with all clinical characteristics were randomly divided into a discovery group and a validation group, which was used to identify the gene signature and validate the efficacy of the gene signature, respectively. Both mRNA and lncRNA in the ceRNA network were employed to univariable Cox proportional-hazards model. Then, we selected genes meeting the statistical significance (p-value <0.01) to conduct Lasso penalized Cox regression analysis, which allows for the variable selection by constraining the variable regression coefficients, even to zero, and for declining the risk of overfitting, 20 thus finding a prognostic gene signature for patients with ccRCC based on the Lasso-penalized Cox regression model coefficients (β) and gene expression levels (risk scores = β1*gene 1 + β 2*gene 2 +…..+ βn*gene n ). To assess the prognostic gene signature, we conducted a time-dependent receiver operating characteristic (ROC) curve analysis and calculated Harrell's concordance index (C-index) among discovery group, validation group, and the entire group separately.

| Construction of prognostic nomogram
A prognostic nomogram predicting 1-, 3-, and 5-year survival probability for ccRCC patients in the entire group was constructed by applying prognostic gene signature and relevant clinical characteristics to a stepwise Cox proportionalhazards model. Further, we tested the discrimination and the calibration of the nomogram by Harrell's concordance index (C-index) analysis and the calibration curves analysis.

| Validation of gene signature and prognostic nomogram
The Human Protein Atlas and GEO database were firstly searched to validate the differential expression of genes in the gene signature. Bootstrap, which could adjust overfitting and provide nearly unbiased estimates for model performance, was employed to validate the gene signature and prognostic nomogram. 21,22 Let C-index(orig) indicated C-index based on the original data and gene signature or nomogram. First, we generated a bootstrap sample from our original data with replacement, and we developed a model based on the same process constructing the gene signature or prognostic nomogram using the upper bootstrap sample, we calculated C-index based on this model using the bootstrap sample (C-index(training)) and original data (C-index(test)), optimism denoted to C-index(training)-C-index(test) which was regarded as the evaluation of overfitting. The corrected C-index was C-index(orig) subtracted optimism.

| Functional enrichment analysis
We only applied functional enrichment analysis to mRNA as mRNA is the main functional molecule in ceRNA network. The clusterProfiler 23 package was used to investigate both biological process (BP) in Gene Ontology 24,25 (GO) and functional pathways in Kyoto Encyclopedia of Genes and Genomes 26 (KEGG).

| mRNA modules correlated with ccRCC
WGCNA was employed to analyze gene modules among the top 10000 mRNAs with maximum median absolute deviation (MAD) using softpower 14, minModuleSize 25, and mergeCutHeight 0.20 as the threshold. Consequently, we identified 17 gene color modules ( Figures S1A-B). The biological process for each module was determined by GSEA ( Table 1). As shown in Figure 2A, the association between gene co-expression modules and ccRCC was explored, a total of 4005 mRNAs, which showed the highest relationship with ccRCC, was found in brown module, black module, and turquoise module. The relationships between MM and GS in three modules were then analyzed ( Figure 2B). We then conducted GO analysis to reveal these mRNAs functions in BP, which found that these genes were most related to angiogenesis, extracellular matrix organization, and response to hypoxia ( Figure S1C). Besides, genes were highly enriched in HIF-1 signaling pathway, MAPK signaling pathway, and PI3K-Akt signaling pathway by KEGG analysis ( Figure S1D).

| Determination of differential expression mRNAs (DEmRNAs)
A total of 3679 significantly upregulated and 1944 significantly downregulated mRNAs, which were depicted in volcano map ( Figure 2C), were identified by DESeq2. We used Gene Set Enrichment Analysis (GSEA) to demonstrate the biological function behind these identified differential expression genes. DEmRNAs were enriched in neutrophilmediated immunity, immune response-activating signal transduction, and neutrophil activation in biological process (BP) ( Figure S2A). Meanwhile, cytokine-cytokine receptor interaction, human T-cell leukemia virus 1 infection, and viral carcinogenesis-related genes were found upregulated in DEmRNAs, while collecting duct acid secretion, proximal tubule bicarbonate reclamation, and glyoxylate and dicarboxylate metabolism pathways were downregulated in DEmRNAs ( Figure S2B). Finally, we obtained 2191 ccRCCspecific mRNAs by intersecting mRNAs from WGNCA modules and differential expression analysis.

WGCNA and DESeq2
We investigated the co-expression network of 8332 lncR-NAs by WGCNA with softpower 7, minModuleSize 15, and mergeCutHeight 0.20 as the threshold after filtering lncRNAs with median absolute deviation (MAD) zero. Finally, we identified a total of seven coexpression modules ( Figures S3A-B) and found that brown module with 744 lncRNAs was most positively related to ccRCC and blue module including 1424 lncR-NAs presented the highest negative relationship with ccRCC ( Figure 2D and Table 2), Further, both modules possessed a significant relationship between MM and GS ( Figure 2E). At the same time, we identified 3654 upregulated lncRNAs and   Figure 2F). By intersecting them with lncRNAs in brown and blue modules, respectively, we obtained 610 lncRNAs positively correlated with ccRCC, and 767 lncRNAs with a negative relation to ccRCC.

| Development of dysregulated ceRNA network
A total of 2191 ccRCC-specific mRNAs and 1377 ccRCCspecific lncRNAs was included in the construction of dysregulated ceRNA network. Then the interaction between lncRNAs and miRNAs was identified using miRcode, following StarBase, miRTarBase, and TargetScan databases were applied to demonstrate the targeted miRNA of cancer-specific mRNAs. At the same time, miRNA sequencing data from TCGA was employed to review the top 10% expressed miRNA, we only included triple lncRNAs-miRNAs-mRNAs with miRNA in the above miRNAs. Consequently, a dysregulated ceRNA network for ccRCC was established using 7 mRNAs, 363 lncRNAs, and 3 miRNAs (Figure 3). 3.6 | Construction of gene signature using genes in the ceRNA network A total of 344 ccRCC patients with the expression levels of genes in the dysregulated ceRNA network were included in the construction of the gene signature. We randomly classified patients into two groups: a discovery group (n = 210) and a validation group (n = 134). Meanwhile, the univariate Cox proportional hazards model was applied to both lncRNAs and mRNAs in the dysregulated ceRNA network to screen for genes as biomarkers which significantly influence overall survival and prognosis in discovery group. Consequently, we obtained 21 genes including 1 mRNA and 20 lncRNAs with the threshold of p-value <0.01, subsequently these genes were applied into Lasso with a lambda based on Cross-Validation ( Figure 4A) using glmnet. 27 Finally, we obtained eight genes, namely MPP5, WT1-AS, AC114316.1, AC103719.1, AL162377.1, HS1BP3-IT1, LINC02657, and AC015909.1. The expression level of these genes associated with clinical characteristics was also depicted in a heatmap ( Figure 4B). Furthermore, we also compared the expression levels between ccRCC and normal control samples ( Figure 4C). A risk score was constructed based on Lasso Cox model coefficients and the gene expression levels (risk  Figure 5A-C, and the clinical characteristics between groups determined by median risk scores were shown in Table 3.

| Building and validation of a prognostic nomogram
Univariate Cox regression analysis showed that gene signature, ages, T stage, M stage, N stage, and AJCC stage were significantly related to the overall survival (p < 0.05), and multivariate Cox analysis further revealed that ages, N stage, and gene signature were independent risk factors. Meanwhile, a stepwise Cox regression model was employed to develop a nomogram predicting the 1-, 3-, and 5-year OS for ccRCC patients ( Figure 8A) with a C-index 0.809 (95% CI, 0.696-0.887), we also depicted the calibration curve for 1-, 3-, and 5-year survival probability ( Figure 8B) which collectively indicated a good accuracy of the prognostic nomogram. Bootstrap validation further authenticated the performance of this nomogram (corrected C-index: 0.803) ( Table 4).

| DISCUSSION
Kidney cancer accounted for 338,000 new cases and 144,000 deaths each year. 1 The most common subtype of kidney cancer ccRCC, is a complex tumor with different clinical and pathological features, genetic variation, DNA methylation profiles, and RNA and proteomic signatures. 28 Nevertheless, the TNM staging system, the most used risk assessment system for ccRCC patients, failed to consider these genomic variation of ccRCC which makes it not perfect for accurately predicting the prognosis of ccRCC patients. 29 The novel hypothesis of gene expression regulation has been confirmed to be related to the mechanism of varied diseases, especially cancer. The disturbance of the equipoise of ceRNA network was of vital importance for tumorigenesis. In gallbladder cancer, the lncRNA PVT1 which was positively related to malignancies and worse overall survival time was upregulated in gallbladder cells. PVT1 and HK2 act as a ceRNA of miR-143, which could regulate aerobic glucose metabolism in gallbladder cancer cells, and promote cell proliferation and metastasis. 30 PTAR acts as a ceRNA of miR-101 which promotes tumorigenicity and metastasis of ovarian cancer in vivo. 31 LncRNA DANCR functions as a ceRNA in osteosarcoma which could promote cell proliferation and metastasis. 32 MT1JP, which severs as a ceRNA regulating FBXW7 expression, could influence the progression of gastric cancer. 33 Thus, ceRNA network containing crucial biomarkers was of vital importance in tumorigenesis. Importantly, lncRNA-miRNA-mRNA dysregulated ceRNA network played a vital role in predicting disease prognosis. For example, in pancreatic cancer, 11 lncRNAs have been found and validated to function well in predicting prognosis. 34 T A B L E 3 Relationship between clinicopathological characteristics and risk score calculated by using the eight-gene signature The group of low risk and high risk was based on a cut point of median risk scores. Categorical variables were tested by Chi-squared and continuous variables were tested by analysis of variance.
Seven genes (LPP-AS2, MUC1, GAB2, hsa-let-7i-5p, hsa-let-7f-5p, hsa-miR-101-3p, and hsa-miR-1226-3p) in a recurrent soft tissue sarcoma-specific ceRNA network associated with recurrence and survival were identified based on the TCGA database. 35 Although there were many studies on ceRNA networks conducted in numerous cancers, nevertheless, few of them were related to ccRCC. In this study, ccRCC-specific mRNAs and lncRNAs, including 2191 mRNAs and 1377 lncRNAs, were identified by WGCNA and DESeq2. Functional enrichment analysis revealed these mRNAs involved in MAPK signaling pathway and PI3K-Akt signaling pathway both of which have been reported to tightly implicated in the progression and metastasis of malignancy. [36][37][38][39] Sunitinib and axitinib, the most common targeted drugs related to MAPK signaling pathway, were used extensively in renal cancer. 40 GSEA suggested that these mRNAs were highly involved in multiply immune-related function and pathways, like neutrophil-mediated immunity, immune response-activating signal transduction, and neutrophil activation in biological process and Cytokine-cytokine receptor interaction and Human T-cell leukemia virus 1 infection in KEGG, this may imply the importance to ccRCC of immune infiltration, which has been validated to play vital role in tumor growth and progression. 41,42 Importantly, axitinib plus immune checkpoint inhibitor has made great success in treating renal carcinoma. 43,44 Our findings may provide evidence for the combination of tyrosine kinase inhibitors and immunotherapy.
Subsequently, the dysregulated ceRNA network consisting of 363 lncRNAs, 3 miRNAs, and 7 mRNAs was determined by MiRcode, StarBase, miRTarBase, and TargetScan databases. To further explore the relationships with prognosis of these 370 genes (mRNA and lncRNA in the ceRNA network), a gene signature with eight genes, namely MPP5, WT1-AS, AC114316.1, AC103719.1, AL162377.1, HS1BP3-IT1, LINC02657, and AC015909.1, was determined by univariate Cox proportional hazard regression, Lasso and multivariate Cox proportional hazard regression analysis. The discriminations and accuracy of the gene signature were validated with C-index and time-dependent ROC curve, which all suggested that the eight genes in the model could act as biomarkers based on the patients' prognosis. Moreover, bootstrap validations also authenticated the good performance of the gene signature.
Among the eight genes in the gene signature, the exclusive mRNA MPP5, which is associated with the membraneassociated guanylate kinase family helping the construction of cell polarity, had been validated to be associated with the maintenance of cell polarity, invasion, and cell division in prostate cancer, 45 meanwhile, disruption of apical protein MPP5, which could negatively regulate YAP/TAZ abundance and activity, might promote the enrichment of oncogenic YAP and TAZ in hepatocellular carcinoma. 46 The loss of MPP5 is a hallmark of cancer is crucial for tissue organization, corresponded to the downregulated expression in ccRCC. Long noncoding RNA WT1-AS which functioned as a potential tumor suppressor was related to poor survival in cervical squamous cell carcinoma 474849 and triple-negative breast cancer (TNBC). 50 For lncRNA HS1BP3-IT1, it may be a prognosis biomarker for cholangiocarcinoma, 51 laryngeal cancer, 52 respectively. LINC02657 has been named as LASTR (lncRNA associated with SART3 regulation of splicing), which was reported to decline the fitness of cancer cells by inducing intron retention. 53 Therefore, our prediction of the ceRNA network had great confirmation of previous studies.
Nomograms were widely used as prognostic tools in oncology and medicine. By including various prognosisassociated variables and generating survival probability, nomograms can help clinicians make better treatment decisions. 54 In the present study, by including gene signature based on the dysregulated ceRNA network and other related clinical characteristics into a stepwise Cox model, a concise nomogram for the prognostic prediction of ccRCC was  (Table 4) for the nomogram further strengthened the effect in predicting the prognosis of ccRCC comparing with Jiang's nomogram (C-index 0.79; 95% CI 0.75-0.82). 55 However, there are still some limitations in our study. Although the prognostic dysregulated-ceRNA-related gene signature and nomogram were validated by internal dataset and bootstrap, the exact mechanism of the eight genes in the gene signature have not been explored in the current study and an externally validated dataset is necessary for both gene signature and nomogram. Therefore, further efforts to investigate the exact function of eight genes in ccRCC in vitro and in vivo and external validation based on a larger sample size are still required to make our findings more convincible, which is also the direction of our future work.

| Conclusions
In conclusion, a ccRCC-specific dysregulated ceRNA network was developed, followed by the determination of an eight-gene signature, which will help us better understand dysregulated ceRNA network-mediated pathway in ccRCC. Moreover, the development of nomogram, including both clinical characteristics and ccRCC-specific gene signatures, could accurately predict 1-, 3-, and 5-year OS of ccRCC, it could contribute to a better understanding of ccRCC tumorigenesis mechanism and guide clinicians to make a better treatment decision. F I G U R E 8 Building and validation of a prognostic nomogram. (A) Nomogram based on gene signatures, ages, N stage, and AJCC stage for 1-, 3-, and 5-year OS prediction. The number from 1 to 4 in AJCC stage represents AJCC stage I, II, III, and IV, respectively (B) Calibration plot for agreement test between 1-, 3-, and 5-year OS prediction and actual observation