Development and verification of a nomogram for prediction of recurrence‐free survival in clear cell renal cell carcinoma

Abstract Gene expression profiling is now widely used to screen for prognostic biomarkers in many kinds of carcinoma. This study aimed to construct a clinical nomogram that combines a risk gene signature with clinical features to assess individual recurrence risk and support personalized management of clear cell renal cell carcinoma (ccRCC). A total of 580 differentially expressed genes (DEGs) were identified via microarray. Functional analysis indicated that these DEGs play important roles in ccRCC progression and metastasis. We retrospectively analysed 338 ccRCC patients and obtained a risk gene signature composed of five genes from a LASSO Cox regression model. The signature distinguished patients with poor prognosis in the training cohort (hazard ratio [HR] = 3.554, 95% confidence interval [CI] 2.261‐7.472, P < .0001, n = 107). Moreover, the prognostic value of this gene signature was independent of clinical features (P = .002). The efficacy of the risk gene signature was verified in both internal and external cohorts. The area under the receiver operating characteristic curve of the signature was 0.770, 0.765 and 0.774 in the training, testing and external validation cohorts, respectively. Finally, a nomogram was developed for clinicians and performed well in calibration plots. This nomogram, based on the risk gene signature and clinical features, might provide a practical way to predict recurrence and facilitate personalized management of ccRCC patients after surgery.


KEYWORDS
clear cell renal cell carcinoma, nomogram, recurrence-free survival, risk gene signature

Currently, the American Joint Committee on Cancer (AJCC) staging system and the Fuhrman grading system are universally used in clinical cancer management. 4 However, TNM stage and grade alone do not give clinicians sufficiently accurate information to estimate recurrence-free survival (RFS) or overall survival, and therefore to provide personalized treatment for ccRCC patients. This can be ascribed to the biological heterogeneity of cancer; molecular biomarkers that predict recurrence may therefore help clinicians make more precise, risk-stratified treatment decisions for ccRCC patients. 5 Thus, new biomarkers are urgently needed to identify high-risk patients who are more likely to relapse, so that personalized treatment can be offered after surgery.
Clear cell RCC is a highly heterogeneous disease that results from complicated interactions between genetic and environmental factors. 6 Analysing the gene expression profiles of cancer tissues or cells at different tumour stages may help to identify characteristic risk gene signatures. Many researchers have studied the gene expression profiles of ccRCC and tried to illuminate the mechanisms underlying its progression. 7 However, few of the resulting signatures are used clinically. Therefore, a more precise and practical risk gene model for predicting prognosis is urgently needed.
In the present study, we identified the differentially expressed genes between the normal kidney samples and ccRCC tumour tissues by gene expression microarray. A risk gene signature that can reflect the biological heterogeneity of different ccRCC patients and effectively predict clinical RFS was established via integrating gene expression profiles with matched clinical patient information.
Moreover, we combined the genomic and clinical features of patients to construct a nomogram for more accurate recurrence evaluation and to facilitate personalized management of ccRCC patients after surgery.

TABLE 1 Patient characteristics of three cohorts

... were shown in Figure 1. These studies were conducted with approval from the Ethics Committee.

| Microarray data and differentially expressed gene analysis
The ccRCC gene expression data (GSE68417) used in this study are available from GEO (https://www.ncbi.nlm.nih.gov/geo/). 8 All raw CEL files (Affymetrix Human Gene 1.0 ST Array) were generated on the same chip platform. The raw data files were downloaded and normalized with the robust multi-array average method (expresso(data, bgcorrect.method = "rma", normalize.method = "quantiles", pmcorrect.method = "pmonly", summary.method = "medianpolish")). 9 DEGs were identified with a t test, requiring a fold change ≥2; P < .01 was considered statistically significant.
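As a rough illustration of the filtering rule above, the sketch below applies the two thresholds to per-gene summary statistics. It is written in Python rather than the R used in this study, and it assumes that log2 fold changes and t test P values have already been computed. The function name `select_degs`, the gene labels and all numbers are hypothetical.

```python
import math

def select_degs(stats, min_fold=2.0, p_cutoff=0.01):
    """Return gene names that pass both the fold-change and P-value filters.

    stats maps gene name -> (log2_fold_change, p_value).
    """
    min_log2_fc = math.log2(min_fold)  # a twofold change means |log2 FC| >= 1
    selected = []
    for gene, (log2_fc, p_value) in stats.items():
        if abs(log2_fc) >= min_log2_fc and p_value < p_cutoff:
            selected.append(gene)
    return sorted(selected)

# Hypothetical values for illustration only (not data from this study).
example_stats = {
    "GENE_A": (1.8, 0.0004),   # up-regulated and significant
    "GENE_B": (-2.3, 0.0020),  # down-regulated and significant
    "GENE_C": (0.4, 0.0001),   # significant, but the change is too small
    "GENE_D": (3.1, 0.2000),   # large change, but not significant
}
degs = select_degs(example_stats)
```

Taking the absolute value of the log2 fold change keeps both up-regulated and down-regulated genes, which is the usual reading of a symmetric fold-change criterion.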

| Gene ontology analysis and Kyoto Encyclopedia of Genes and Genomes analysis
The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used to conduct the Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses. 10, 11 We used the human genome as the analysis background and considered P < .01 statistically significant.
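Enrichment of an annotation term in a gene list is commonly scored with a hypergeometric (one-sided Fisher exact) test; DAVID reports a more conservative modified version of this test (the EASE score). The Python sketch below shows only the plain hypergeometric tail probability, as an aid to intuition and not as DAVID's exact calculation. The function name and the example counts are hypothetical.

```python
from math import comb

def enrichment_p_value(background_size, term_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement
    from background_size genes, term_size of which carry the annotation."""
    total = comb(background_size, list_size)
    upper = min(term_size, list_size)
    tail = sum(
        comb(term_size, k) * comb(background_size - term_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 20 of 580 listed genes fall in a 200-gene term,
# against a 20 000-gene background.
p_example = enrichment_p_value(20000, 200, 580, 20)
```

A term would be called enriched under the threshold used here if the resulting P value were below .01.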

| Identification and validation of the prognostic gene signature
To screen for the risk gene signature, LASSO Cox regression analysis was performed in the training data set with R software (version 3.2.1) and the 'glmnet' package. The LASSO penalty was used to achieve shrinkage and variable selection simultaneously, and the optimal value of the penalty parameter lambda was determined by 10-fold cross-validation.
Genes significantly correlated with RFS in ccRCC were selected at the optimal lambda value. The risk score of each patient was calculated from the expression level of each prognostic mRNA and its associated coefficient.
Then, the patients in each data set were divided into low-risk and high-risk groups according to the mean risk score. Finally, we used the Kaplan-Meier estimator and the log-rank test to assess RFS differences between the low-risk and high-risk groups.
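The grouping and survival comparison described above can be sketched in a few lines. The Python below is illustrative only (the analysis in this study was run in R): it splits patients at the mean risk score, computes a Kaplan-Meier curve, and runs a two-group log-rank test. All function names and the eight-patient data set are hypothetical.

```python
import math

def split_by_mean(risk_scores):
    """Label each patient 'high' or 'low' relative to the cohort mean score."""
    mean_score = sum(risk_scores) / len(risk_scores)
    return ["high" if s > mean_score else "low" for s in risk_scores]

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    events[i] is 1 for a recurrence and 0 for a censored follow-up.
    """
    survival, curve = 1.0, []
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        recurrences = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - recurrences / at_risk
        curve.append((t, survival))
    return curve

def log_rank_p(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, P value)."""
    observed_minus_expected, variance = 0.0, 0.0
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        n = sum(1 for u in times if u >= t)
        n_high = sum(1 for u, g in zip(times, groups) if u >= t and g == "high")
        d = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        d_high = sum(1 for u, e, g in zip(times, events, groups)
                     if u == t and e == 1 and g == "high")
        observed_minus_expected += d_high - d * n_high / n
        if n > 1:
            variance += d * (n_high / n) * (1 - n_high / n) * (n - d) / (n - 1)
    chi_square = observed_minus_expected ** 2 / variance
    # With 1 degree of freedom, the chi-square upper tail is erfc(sqrt(x / 2)).
    return chi_square, math.erfc(math.sqrt(chi_square / 2))

# Hypothetical follow-up data (months) for eight patients; not from this study.
risk_scores = [0.2, 0.3, 0.1, 0.4, 1.5, 1.8, 2.0, 1.7]
times = [60, 55, 70, 48, 12, 8, 20, 15]
events = [0, 1, 0, 0, 1, 1, 1, 1]

groups = split_by_mean(risk_scores)
high_curve = kaplan_meier(
    [t for t, g in zip(times, groups) if g == "high"],
    [e for e, g in zip(events, groups) if g == "high"],
)
chi_square, p_value = log_rank_p(times, events, groups)
```

The sketch recomputes the risk sets at every event time for clarity; a production implementation would sort once and update the counts incrementally.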

| Validation of hub gene expression via quantitative real-time PCR
The expression of identified hub genes was determined by qRT-

| Statistical analysis
We performed a multivariable Cox regression analysis with backward selection to test the independence of the different indicators; variables with P < .05 were retained in the final model for nomogram construction.
The nomogram was generated with the rms package in R.
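To show how a nomogram turns a fitted Cox model into points, the Python sketch below rescales each predictor's contribution so that the predictor with the largest possible effect spans 0 to 100 points. This is a simplified illustration for linear terms only, not the rms implementation, and it omits the final step that maps total points to a predicted RFS probability. The coefficients, axis ranges and patient values are hypothetical and are not the fitted model from this study.

```python
def nomogram_points(coefficients, ranges, patient):
    """Convert one patient's predictor values into nomogram points.

    coefficients: predictor -> Cox regression coefficient (log hazard ratio)
    ranges:       predictor -> (minimum, maximum) value shown on its axis
    patient:      predictor -> observed value
    """
    # Largest contribution any single predictor can make to the linear predictor.
    max_effect = max(
        abs(coefficients[name]) * (high - low) for name, (low, high) in ranges.items()
    )
    points = {}
    for name, (low, high) in ranges.items():
        beta = coefficients[name]
        # Zero points sit at the end of the axis with the lowest risk.
        reference = low if beta >= 0 else high
        points[name] = 100.0 * beta * (patient[name] - reference) / max_effect
    points["total"] = sum(points.values())
    return points

# Hypothetical model and patient, for illustration only.
coefficients = {"risk_score": 1.2, "stage": 0.5, "grade": 0.3}
ranges = {"risk_score": (0, 4), "stage": (1, 4), "grade": (1, 4)}
patient = {"risk_score": 2.0, "stage": 3, "grade": 2}

patient_points = nomogram_points(coefficients, ranges, patient)
```

Because every axis is scaled by the same constant, the total points remain a linear function of the Cox linear predictor, which is what allows a single "total points" axis to be read off against predicted recurrence probability.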

| Identification of DEGs between normal kidney samples and ccRCC tumour samples
In the microarray analysis, using the criteria P < .01 and fold change (FC) ≥1.5, 580 genes were identified as differentially expressed between 14 normal kidney samples and 29 ccRCC tumour samples. The volcano plot and heatmap are presented in Figure 2A and 2.

| Functional enrichment analysis of DEGs and selection of risk gene signature
The 580 DEGs were then submitted to DAVID for functional analysis. The GO analysis, covering molecular function, cellular component (CC) and biological process (BP), showed that these genes were primarily involved in cell adhesion, positive regulation of cell motility and the WNT signalling pathway (Figure 2C). To further elucidate the potential functional pathways of the DEGs, we conducted KEGG pathway enrichment analysis. The B cell receptor, NF-kappa B and WNT signalling pathways were the most significantly enriched (Figure 2D). LASSO Cox regression was used for further analysis, and 5 of the DEGs were found to be significantly related to RFS of ccRCC (Table 2). The risk score of each patient was calculated as the sum of the expression levels of the five risk genes, each weighted by its regression coefficient.
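The risk score described here is a weighted sum of expression values. A minimal Python sketch follows; the gene labels, coefficients and expression values are placeholders chosen for illustration, and the fitted coefficients of the actual signature are those reported in Table 2.

```python
def risk_score(expression, coefficients):
    """Weighted sum of expression levels: sum of coefficient_i * expression_i."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)

# Placeholder LASSO Cox coefficients for a five-gene signature (hypothetical).
coefficients = {
    "gene_1": 0.30,
    "gene_2": -0.20,
    "gene_3": 0.25,
    "gene_4": 0.15,
    "gene_5": -0.10,
}
# Placeholder normalized expression values for one patient (hypothetical).
expression = {
    "gene_1": 6.0,
    "gene_2": 8.0,
    "gene_3": 7.0,
    "gene_4": 9.0,
    "gene_5": 10.0,
}

score = risk_score(expression, coefficients)
```

A positive coefficient raises the score as expression increases, and a negative coefficient lowers it, so the score orders patients by their estimated relative hazard of recurrence.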

| Validation of risk gene expression via RT-qPCR
To further confirm the expression of identified risk genes from network-based analysis, RT-QPCR assay of five hub genes (CD19, FGF2,

| Construction and validation of risk gene signature score model for predicting RFS of ccRCC patients
The 215 patients were randomly divided into training data set

| Independence of 5-gene signature risk score model for RFS prediction from clinical features
To determine whether the prognostic value of the five-gene signature was independent of patient clinical features, we performed univariable and multivariable Cox regression analyses using RFS as the dependent variable and the five-gene signature score, age, gender, tumour stage, grade, lymph node invasion and necrosis as covariates (Tables 3-5).

TABLE 2 mRNAs significantly associated with recurrence-free survival in the training data set. T/N: expression in ccRCC samples/expression in normal kidney samples
In addition, we stratified the analysis by tumour stage.
ccRCC patients were divided into two subgroups: AJCC stages I and II were defined as the early-stage stratum and AJCC stages III and IV as the late-stage stratum. Figure 5A-F shows that, in both the early-stage and late-stage strata, patients with a high risk score had markedly worse outcomes than patients with a low risk score.
Receiver operating characteristic analysis was also performed to assess the specificity and sensitivity of RFS prediction in each data set (Figure 6A
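For readers unfamiliar with the area under the ROC curve, the Python sketch below computes it as the probability that a randomly chosen patient who recurred has a higher risk score than a randomly chosen patient who did not. It treats recurrence as a simple binary outcome and ignores censoring, so it is a simplification; whether a time-dependent ROC method was used in this study is not stated here. The function name and the four-patient example are hypothetical.

```python
def roc_auc(scores, recurred):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores:   risk score per patient (higher means higher predicted risk)
    recurred: 1 if the patient recurred, 0 otherwise
    """
    positives = [s for s, y in zip(scores, recurred) if y == 1]
    negatives = [s for s, y in zip(scores, recurred) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient in each outcome class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Hypothetical scores and outcomes for four patients.
auc = roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

An AUC of 0.5 corresponds to a score with no discriminative ability and 1.0 to perfect separation of the two outcome groups.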

| Construction of a nomogram combining the 5-gene signature with other clinical features for personalized prediction
To come up with a useful approach to predict the risk of recurrence so as to facilitate personalized management of ccRCC patients, we

| DISCUSSION
In this retrospective study, the risk gene signature model that we developed categorized patients with significantly different RFS into low-risk and high-risk groups. Cancers are recognized as heterogeneous diseases. 13 Thus, identifying the genes dysregulated in carcinogenesis and tumour progression could help to improve prognostic and therapeutic strategies. 14 Developments in microarray technology have contributed to the acquisition of large amounts of data, which are useful for exploring molecular mechanisms, stratifying risk and guiding clinical therapy in different cancers. 15-17 In our study, microarray analysis was performed to identify genes differentially expressed between normal kidney and ccRCC. A risk gene signature classifier was generated to predict recurrence risk, and its prognostic value was verified in both internal and external validation cohorts.
Moreover, this indicator was independent of clinical features and had predictive power similar to that of widely used indicators for ccRCC such as AJCC stage and tumour grade.
Among these genes, MAP4K1 was previously known to positively regulate cell motility and thereby to influence tumour cell invasion in medulloblastoma and colon carcinoma. 18-20 Besides, Lourdes and Wang indicated that MAP4K1 was related to the progression of bladder cancer 21,22; STAT6 was found to promote intestinal tumorigenesis in a mouse model via inhibition of the cytotoxic CD8 response 23 and was involved in lymphoma. 24,25 Activation of FGFR1 by its ligand fibroblast growth factor 2 (FGF2) has been reported to promote cell proliferation, epithelial-mesenchymal transition (EMT) and invasion in lung cancer. 26 Overexpression of FGF2 could also induce EMT in malignant pleural mesothelioma cells via MAPK/MMP1 signalling and confer a poor prognosis. 27 DCN was identified as a novel biomarker for the diagnosis of colon cancer by iTRAQ tagging and 2D-LC-MS/MS. 28 T cells with chimeric antigen receptors (CAR T cells) that target human CD19 (hCD19) have shown great efficacy against B cell malignancies. 29 Therefore, our risk gene signature could potentially serve as a predictive tool for personalized treatment, and its genes might also be potential therapeutic targets in ccRCC.

[Figure caption: Nomogram-predicted probability of recurrence is plotted on the x-axis; actual recurrence is plotted on the y-axis. The red line represents the predictive efficacy of our nomogram]

... ratios in our study), these factors are unable to guide personalized treatment. 30,31 Importantly, when patients present with the same stage or grade, these traditional factors cannot predict an individual's risk. Therefore, our nomogram combines an individual gene signature, which reflects the biological heterogeneity of different ccRCC patients, with traditional prognostic factors, which provide insight into a patient's clinicopathologic features, thereby improving the accuracy of individual RFS prediction. 32,33 We also demonstrated the performance of our nomogram in validation cohorts. However, this research is retrospective and our sample size is still limited. Thus, our risk gene model and nomogram still require further validation in multicenter clinical trials. Besides, we will validate the performance of our nomogram in other ccRCC patient cohorts in future studies.

| CONCLUSIONS
This is the first study to combine gene expression profiles with clinical information for predicting the clinical prognosis of ccRCC patients. Our results show that the risk gene signature can effectively classify ccRCC patients into high-risk and low-risk groups. Moreover, this nomogram might help clinicians make accurate, individualized predictions of prognosis for patients with ccRCC after nephrectomy.

CONFLICT OF INTEREST
The authors declare no conflict of interest.

AUTHOR CONTRIBUTIONS
YLC, SJJ, LQX, JYL and GHL conceptualized the study; WH, YLC and SJJ contributed to methodology; LQX, LYL and DWX provided software; YLC, SJJ and ZYL performed the investigation; LQX and LYL provided resources; YLC and SJJ curated the data; YLC wrote the original draft; YLC reviewed and edited the manuscript; ZYL and LWX visualized the data; GHL supervised the study; GHL administered the project; GHL acquired funding.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in GEO (https://www.ncbi.nlm.nih.gov/geo/) and TCGA (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga).