Identification of small molecule drugs and development of a novel autophagy‐related prognostic signature for kidney renal clear cell carcinoma

Abstract Abnormal autophagic levels have been implicated in the pathogenesis of multiple cancers, however, its role in tumors is complex and has not yet been explored clearly. Hence, we aimed to explore the prognostic values of autophagy‐related genes (ARGs) for kidney renal clear cell carcinoma (KIRC). Differentially expressed ARGs and transcription factors (TFs) were identified in KIRC patients obtaining from the The Cancer Genome Atlas (TCGA) database. Then, networks between TFs and ARGs, gene ontology functional annotations and Kyoto Encyclopedia of Genes and Genomes pathway enrichment analysis were conducted. Next, we performed consensus clustering, COX regression analysis and Lasso regression analysis to identify the prognostic ARGs. Finally, an individual prognostic index (PI, riskScore) was established. Based on TCGA cohort and ArrayExpress cohort, Survival analysis, ROC curve, independent prognostic analysis, and clinical correlation analysis were also performed to evaluate this PI. Based on differentially expressed ARGs, KIRC patients were successfully divided into two clusters (P = 5.916e‐04). AS for PI, it was constructed based on 11 ARGs and significantly classified KIRC patients into high‐risk group and low‐risk group in terms of OS (P = 4.885e‐15 for TCGA cohort, P = 6.366e‐03 for ArrayExpress cohort). AUC of its ROC curve reached 0.747 for TCGA cohort and 0.779 for ArrayExpress cohort. What's more, this PI was proven to be a valuable independent prognostic factor in both univariate and multivariate COX regression analysis (P < .001). Prognostic nomograms were also performed to visualize the relationship between individual predictors and survival rates in patients with KIRC. By means of connectivity map database, emetine, cephaeline and co‐dergocrine mesilate related to ARGs were found to be negatively correlated with KIRC. This study provided an effective PI for KIRC and also displayed networks between TFs and ARGs. KIRC patients were successfully divided into two clusters based on differentially expressed ARGs. Besides, small molecule drugs related to ARGs were also identified for KIRC.


| INTRODUCTION
Autophagy as a conservative metabolic way to maintain the homeostasis of the cell environment in the body, it can be expressed in all cells and selectively recover necrotic, injured and genetically deficient cells or tissues, providing energy for the body or regulating the stability of organ function. 1 However, the definite role of autophagy in tumors is complex and has not yet been explored clearly. Autophagy has different mechanisms in different types of cancer, tumor microenvironments, tumor stages and so on. 2,3 Jan Karlseder of the Salk Institute in the United States in a recent study further explained the mechanism by which autophagy inhibits cancer in the early stages of cancer that it was closely related to the 'replication crisis' caused by telomere fusion during cell division. 4 With the development of cancer, the role of autophagy in tumor would change from inhibiting cancer to promote cancer survival. 5,6 Even in established tumors, autophagy could also help tumor cells resist a variety of drug treatments. 7,8 Globally, there were 404,300 new cases of renal cell cancer and 175,100 new deaths of this tumor in 2018, ranking 16th in morbidity and mortality among all cancers, making it the third most common tumor in the urinary system. 9 Although most renal cell carcinoma (RCC) developed slowly, many patients would already have metastasis at the time of detection due to the lack of typical symptoms. 10 Moreover, renal cancer cells lacked sensitivity to radiotherapy and chemotherapy. Surgery remained the mainstay of therapy, however there were still 1/4 of these patients having a recurrence or metastasis after surgical treatment. 11,12 Due to the lack of accurate predictive markers for the prognosis of patients with RCC, the establishment of an effective prognosis prediction model is of great significance for the management of patients in the whole course of the disease.
Thanks to the availability of high-throughput expression data nowadays, it has become feasible for us to use public database data for analyzing the associations between autophagy-related genes (ARGs) and the clinical outcomes of kidney renal clear cell carcinoma (KIRC) patients. Here, we explored the associations between ARGs and transcription factors (TFs) and constructed prognosis prediction indexes for KIRC, based on related transcriptome profiling data and clinical information downloaded from the The Cancer Genome Atlas (TCGA) database. Our study was anticipated to provide new insights of autophagy for future work.

| Acquisition and preparation of data
Transcriptome profiling data and related clinical information of KIRC were downloaded from TCGA Data Portal (https://tcga-data.nci.nih.gov/tcga/; accessed August 2019) and ArrayExpress (https://www.ebi.ac.uk/array expre ss/; accessed March 2020). The Human Autophagy Database (HADb, http://autop hagy.lu/clust ering/ index.html) is a dedicated database reserving human ARGs. We did an overlap by comparing the obtained RNA-seq data with the HADb database. Then, the RNA-seq data were background corrected and standardized by the R programming language.

| Identification and enrichment analysis of differently expressed ARGs
Differently expressed ARGs were carried out by using "Lima" package in R statistical software between KIRC and solid tissue normal samples. The threshold for identification of ARGs was set as adjusted P-value (FDR) < .05 and |log-2fold changes (FC)| > 1. Gene functional enrichment analyses, including gene ontology (GO) functional annotations and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis, were conducted to analyze the biological functions, cellular localization, and signaling pathways of targeted genes. In this study, we performed GO and KEGG enrichment analysis on differentially expressed ARGs by using the "clusterProfiler" R package.

TFs and construction of a network between TFs and ARGs
The Cistrome database (http://www.cistr ome.org/) is a comprehensive resource for predicted TF targets and enhancer profiles in cancers. The prediction was from integrative analysis of TCGA expression profiles and public ChIP-seq profiles. Differently expressed TFs were carried out by using "Lima" package in R statistical software between KIRC and solid tissue normal samples. Correlation test between differently expressed TFs and ARGs was performed by R programming language. Moreover, correlation coefficient at least 0.4 corresponding to a P < .01 were selected as the significantly correlated.

| Cluster analysis
In order to show whether autophagy has an important impact on the overall prognosis of patients with KIRC, consensus clustering was performed to divide patients into clusters based on the differently expressed ARGs. "ConsensusClusterPlus" package in R statistical software was adopted to perform the Consensus clustering.

| Establishment of an independent prognostic index (PI, riskScore) based on ARGs
In order to identify the key ARGs, a univariate COX regression analysis was firstly performed by us to exclude some ARGs with little prognostic value. Subsequently, a multivariate Cox regression analysis was utilized to remove the genes that might not be an independent indicator in prognosis monitoring. In addition, in order to prevent the occurrence of overfitting, we also used Lasso regression to remove key ARGs highly correlated with one other. According to the weight of each gene in Lasso regression analysis, we finally obtained the correlation coefficient in the model formula for predicting the prognosis of patients. Combined with the expression of various prognosisrelated genes, we established an independent prognostic model. The PI (riskScore) was calculated using the following formula β1 × gene1 expression + β2 × gene2 expression + ⋯ + βn × genen expression, where β corresponded to the correlation coefficient.

TCGA cohort and ArrayExpress cohort
According to our prognostic model, each patient in TCGA cohort and ArrayExpress cohort will get a risk score. In each cohort, we set the median risk score as the cutoff value for dividing KIRC patients into a high-risk group and a low-risk group, respectively. Kaplan-Meier (K-M) method was utilized to plot the survival curves, and the log-rank test was performed to assess differences in the survival rates between high-risk group and low-risk group. The receiver operating characteristic curves (ROC) were created by the "survival-ROC" package, and the area under the curve (AUC) values was calculated to evaluate the specificity and sensitivity of the model. The riskScore distribution of patients in different risk groups, the number of censored patients, and the heatmap of prognosis-related ARGs were also displayed. A prognostic nomogram was also performed to visualize the relationship between individual predictors and survival rates in patients with KIRC based on the Cox proportional hazard regression model by means of "rms" package of R software. C-index and the Calibration curves were used to evaluated the performance of the prognostic nomogram.
To further evaluate whether our model can be used as an independent prognostic factor, we included age, gender, stage, race, grade, T, M, N, and PI as independent variables. And then we did univariate cox regression analysis and multivariate cox regression analysis on the changes of survival time and survival outcome. Multivariate ROC curves were also made to evaluate the prognostic value of each variable. Finally, we combined various clinical variables and riskScore to make a new nomogram to predict the survival outcome of patients in different cohorts.
In addition, we also made a clinical correlation analysis to analyze the correlation between PI and clinical features such as age, gender, stage, race, grade, T, M, N. Besides, the correlation between each prognosis-related ARGs and clinical features such as age, gender, stage, race, grade, T, M, N were also analyzed.

| Identification of candidate small molecule drugs
Connectivity map (cMap), as a gene expression profiles database led by Todd Golub and Eric Lander, it facilitated researchers to quickly identify molecule drugs highly correlated with diseases and discover its possible mechanism. 13 Up-regulated and down-regulated ARGs related to KIRC were uploaded and then functional connection between genes and bioactive chemicals was explored. Connectivity scores ranging from −1 to 1 were utilized to estimate how closely a compound is connected to the query signature. Positive score indicated that the query signature could be promoted by a drug, while a negative score could be repressed by a drug in cMap.

| Statistical analysis
Statistical analyses of all data utilized in this article were completed by R software (version 3.4.1, https://www.rproje ct.org/). When the difference met a joint satisfaction of FDR < 0.05 and |log2FC| > 1, it was regarded to be statistically significant. "ConsensusClusterPlus" package was adopted to perform the Consensus clustering. The univariate and multivariate COX regression analysis were used to evaluate the relationship between ARGs expression and survival data to establish a prognostic model. "rms" package of R software was used to create the nomogram. The receiver operating characteristic curves were created by the "surviv-alROC" package of R and AUC values were also calculated by this package too. All statistical tests were two-sided and P < .05 was considered to be statistically significant.

| Differentially expressed ARGs
The flow diagram for this study was displayed in Figure S1. Through the online TCGA database, we obtained the RNA sequences and clinical information of 539 KIRC samples and matched 72 solid tissue normal samples. By comparing autophagy-related genes from HADb, we finally obtained the expression of 232 relevant genes. In order to further screen out valuable differentially expressed ARGs, we set the joint satisfaction of FDR < 0.05 and |log2FC| > 1 to the filtration condition. Heatmap of differently expressed ARGs was presented in Figure 1A. Figure 1B was a volcano map showing 9 down-regulated and 36 up-regulated differentially expressed ARGs. Boxplot of these ARGs was detailed in Figure 1C.

| Functional annotation of differentially expressed ARGs
In order to better understand the functions and mechanisms of these ARGs, we analyzed the enrichment of GO terms function and KEGG pathway. The results of the functional enrichment analysis are summarized in Figure 2. Table 1 lists the top 10 main GO entries and the KEGG pathways. In terms of biological processes, these differential genes are mainly concentrated in autophagy, regulation of peptidase, and endopeptidase activity and so on. Separately, autophagosome is the highest enrichment level in GO terms for cellular components, protein heterodimerization activity, and peptidase regulator activity were most enriched GO terms for molecular function ( Figure 2A; Table 1). In addition, the results of KEGG pathway enrichment analysis were shown in Figure 2B, which shows that F I G U R E 1 Differentially expressed autophagy-related genes (ARGs); A, Heatmap of differentially expressed ARGs; B, Volcano map of differentially expressed ARGs; C, Boxplot of differentially expressed ARGs these differentially expressed ARGs were closely related to Human cytomegalovirus infection, Autophagy animal, HIF-1 signaling pathway, and other functional pathways. We also show the correlation between these differentially expressed ARGs and the related pathways in the form of heatmap ( Figure 2C).

| Differentially expressed TFs
By comparing genetic sequences data with TFs from Cistrome, we finally obtained the expression of 317 relevant TFs. In order to further screen out valuable differentially expressed TFs, we set the joint satisfaction of FDR < 0.05 and |log2FC| > 1 to the filtration condition. Heatmap of differently expressed ARGs was presented in Figure 3A. Figure 3B was a volcano map showing 19 down-regulated and 41 up-regulated differentially expressed TFs. Networks between TFs and ARGs were detailed in Figure 3C.

| Identification of clusters for KIRC based on ARGs
Furthermore, we did consensus clustering for patients with KIRC based on ARGs. Figure 4A-C suggested that satisfactory clustering effect could be obtained when k = 3. However, Figure 4D-E suggested that k = 2 is the best option. Finally, patients with KIRC were divided into two groups (cluster 1 and cluster 2). Then, the clinical characteristics and survival curves of these two groups were analyzed. From Figure 4G, we found that there was a significant correlation between tumor stage, grade, age, fustat, and clustering. According to Kaplan-Meier analysis, we noticed that the survival of patients in cluster 2 was worse than that in cluster 1 ( Figure 4H). Considering the distribution of clinical features and survival curve, this clustering method had a certain significance.
After obtaining the PI (riskScore) based on ARGs for predicting the prognosis of KIRC patients, we got the riskScore of each patient in TCGA cohort. Then, we divided patients into two groups (high-risk group and low-risk group) according the median riskScore. Next, we evaluated this model in TCGA cohort from the following aspects: clinical characteristics, survival curve, ROC curve, and prognostic nomogram ( Figures 5 and 6). Figure 5E showed that the higher the risk scores, the higher the patients in high-risk group, and the higher the numbers of dead persons. The heatmap of these 11 key genes expression profiles in the TCGA dataset was also detailed in this figure. Kaplan-Meier plot represents that patients in the high-risk group had significantly shorter overall survival time than those in the low-risk group (P = 4.885e-15, Figure 5A). From the ROC curve of Figure 5C, AUC of this model for predicting prognosis reached 0.747, having a moderate prediction accuracy. In addition, a prognostic nomogram was created to quantify the relationship between these risk genes and survival. From this nomogram, we could obtain the total points and estimate the 1-year, 2-year, and 3-year survival rate of each patient ( Figure 6A). Table 2 showed the evaluation results for this nomogram (the C-index and the AUC). The Calibration curves ( Figure 6C-D) further clarified the accuracy of this nomogram.

| Verification of the model in external cohort
In order to verify whether our model was reliable, we used it to analyze the external cohort from ArrayExpress database (E-MTAB-1980). The external cohort contained 101 KIRC patients. Similarly, we calculated the  riskScore of each patient based on PI, and divided the patients into high-risk group and low-risk group according to the cut-off value we obtained in the TCGA cohort. A Kaplan-Meier curve based on the log-rank test and the ROC curve were created to visualize the prognostic value of our established prognostic model in external cohort ( Figure 5B,D). The areas under the ROC (AUC) values of PI was 0.779. Figure 5F showed that the higher the risk scores, the higher the patients in high-risk group, and the higher the numbers of dead persons. The heatmap of these 11 key genes expression profiles in the external cohort was also detailed in this figure. In addition, a prognostic nomogram was also created to quantify the relationship between these risk genes and survival in external cohort ( Figure 6B). The corresponding evaluation results of this nomogram are shown in Table 2 and Figure 6E, F.

| Independent prognostic factor evaluation and correlation with clinical characteristics
To further evaluate whether our model could be used as an independent prognostic factor, we included age, gender, stage, race, grade, T, M, N and riskScore as independent variables. By means of univariate and multivariate cox regression analysis, our established PI (riskScore) remained significant (both P < .001, Figure 7A,B, 3). Figure 7C presented the multiple ROC curves according to riskScore, age, gender, race, grade, stage, T, N, M. The AUC of the ROC curve made by riskScore and stage was among the largest two (0.747 and 0.800 respectively). We next made a prognostic nomogram to quantify the relationship between clinical traits and survival in TCGA cohort and in external cohort, respectively, and its evaluation ( Figure 8; Table 2). In order to further evaluate the relationship between 11 prognostic ARGs, riskScore, and clinical characteristics, we further made the independent t tests. Table 4 detailed the riskScore is significantly related to stage, T, M, and grade (P < .05). Besides, the correlation between 11 prognosis-related ARGs and clinical features such as age, gender, stage, race, grade, T, M, N was also analyzed. Bold fonts represented that P value was <.05.

| Identification of relevant small molecule drugs
cMap database was utilized to screen out candidate small molecule drugs related to ARGs of KIRC. Based on differently expressed ARGs of KIRC, the most significantly related small molecule drugs were identified. P < .05, |mean| > 0.5 and n ≥ 4 were set as the threshold. As detailed in Table 5, 10 small molecule drugs were negatively correlated with KIRC containing emetine, cephaeline, co-dergocrine mesilate, tobramycin, fluvastatin, piribedil, pivampicillin, saquinavir, methylprednisolone, and ifenprodil, indicating the potential to repress this disease. Four small molecule drugs were positively correlated with KIRC containing thioproperazine, copper sulfate, carbachol, and bambuterol, indicating the potential to promote this disease.

| DISCUSSION
The concept of transformational medicine was first put forward in < lancet>, emphasizing clinical application as the center, to transform the results of basic scientific research into valuable clinical applications. 14 Currently, surgical resection remained the main method for the treatment of renal cell carcinoma, however these postoperative patients still had a high possibility of recurrence and their survival status varied differently. 15,16 Hence, an effective way to predict the prognosis of renal cell carcinoma was of great significance to guide the whole process managing patients with renal cell carcinoma.
At present, TMN staging, UISS risk grading system and SSIGN scoring system provided a certain reference value for evaluating the prognosis of renal cell carcinoma patients [17][18][19]. Moreover, some prognostic molecular markers such as p53 and PTEN had also been further explored by researchers, 20 but their efficiency was not so satisfactory. Due to the development of high-throughput sequencing, it had become feasible for us to use public database (TCGA, GEO or other databases) data for analyzing the associations between different key genes and the clinical outcomes of KIRC. Not long ago, a study by Wang et al have successfully established an autophagy-clinical prognostic index in bladder cancer patients. 21 In this article, we not only constructed a prognosis prediction index for KIRC in both TCGA and ArrayExpress databases, but also explored associations between ARGs and TFs. Moreover, we successfully divided KIRC patients into two clusters based on differentially expressed ARGs. Our study was anticipated to provide new insights of autophagy for future work.
We made full use of the RNAseq data in the TCGA database to find autophagy related genes with high correlation with KIRC survival. Finally, we obtained 45 differentially expressed ARGs, and analyzed their functions including GO analysis and KEEG pathways. Preliminary analysis showed that the expression of these ARGs is mostly up-regulated in some of the most important pathways, which provided us with a reference that autophagy might play a role in promoting tumor development. However, the expressions of some ARGs were down-regulated, which might be related to the complex mechanism of autophagy in tumors. 22 Interestingly, we also performed a consensus clustering analysis of existing renal cancer patients based on these ARGs, and KIRC patients were successfully divided into two clusters with significant differences in overall survival, indicating that ARGs might play an important role in the prognosis of patients with KIRC. By means of cMap database, 10 small molecule drugs were negatively correlated with KIRC containing emetine, cephaeline, co-dergocrine mesilate, tobramycin, fluvastatin, piribedil, pivampicillin, saquinavir, methylprednisolone, and ifenprodil, indicating the potential to repress this disease. Four small molecule drugs were positively correlated with KIRC containing thioproperazine, copper sulfate, carbachol, and bambuterol, indicating the potential to promote this disease.

CX3CL1
, PRKCQ, and ATG16L2). Most of these ARGs had been reported and were consistent with the role in our study. Therein, BNIP3 was an interacting protein of BCL2, which was considered to be a surface receptor of mitochondria, regulating cell death and promoting survival in some diseases. 23,24 BID played a similar role by activating BAX/ BAK 25 and ERBB2, also known as HER2, was a member of the human epidermal growth factor receptor family, promoting cell proliferation, survival, and playing an important role in the occurrence and development of tumor.
Clinically, ERBB2 mutation had become an important target for cancer therapy. CX3CL1 was the ligand of chemokine CX3CR1, which could promote tumor infiltrating cells into tumor microenvironment and play the role of immunotherapy. 26 As for IFNG, it was believed that IFNG could affect the blocking of immune checkpoints. 27 Eukaryotic initiation factor 4e binding protein (EIF4EBP1), as an important gene regulating autophagy, had been found to be highly expressed in many cancers with poor prognosis of tumor. 28 PRKCQ was an important kinase in the activation

of T cells. Its phosphorylation induced the activation of
Fra-1 and played an important role in tumor recurrence and invasion. 29,30 CASP4, as a kind of human apoptotic protease, was considered to be related to inflammation, immune activity and apoptosis. 31,32 It has also been found to be related to the poor prognosis of esophageal cancer, colorectal cancer and breast cancer. [33][34][35] Contrary with the reported results, the HR value of BAG1 was <1 suggesting that it was associated with a better prognosis,. 36,37 ATG16L2 had been found to be associated with positive prognosis. [38][39][40] However, the results of our multivariate COX regression analysis suggested that ATG16L2 played an opposite role in KIRC patients. This also brought us new thinking about the roles of BAG1 and ATG16L2 in kidney cancer. Further, based on these 11 prognostic ARGs and clinical characteristics in both TCGA and ArrayExpress databases, an individualized KIRC PI (riskScore) was established. The strength of this article was that we performed a systematic analysis of the roles of autophagy in KIRC with a robust statistical approach. The KIRC PI was successfully established and carefully evaluated in TCGA cohort and ArrayExpress cohort. Moreover, networks between ARGs and TFs were constructed for future basic research. Last but not least, tumor clustering based on ARGs was effective, indicating that ARGs played a vital role in KIRC. However, the limitations of the present article should not be ignored. On the one hand, we only discussed the relationship between ARGs and the prognosis of KIRC patients, without further clarifying the specific mechanism. On the other hand, the results of our study were only validated in the KIRC patient data in the TCGA database and ArrayExpress database. Retrospective data analysis made our prediction model valuable in the training set. Whether it had real application value or not, required more data support from clinical patients.

| CONCLUSIONS
Taken together, an individualized KIRC PI (riskScore) was successfully established in both TCGA and ArrayExpress databases. Based on clinical characteristics and 11 key ARGs (CASP4, IFNG, BAG1, BNIP3, ERBB2, RGS19, BID, EIF4EBP1, CX3CL1, PRKCQ, and ATG16L2), our study realized the transformation of a large number of sequencing data and clinical features to the clinical diagnosis and treatment methods. Besides, networks between TFs and ARGs were also displayed and KIRC patients were successfully divided into two clusters based on differentially expressed ARGs. Last but not least, small molecule drugs related to ARGs were also identified for KIRC. Our findings were anticipated to provide new insights of autophagy for future work.

ACKNOWLEDGMENTS
We thank the researchers and study participants for their contributions.

CONFLICT OF INTEREST
None declared.