Construction of a robust prognostic model for adult adrenocortical carcinoma: Results from bioinformatics and real‐world data

Abstract This study aimed to construct a robust prognostic model for adult adrenocortical carcinoma (ACC) using large‐scale multiomics analysis and real‐world data. Reverse‐phase protein array (RPPA) data, gene expression profiles and clinical information of adult ACC patients were obtained from The Cancer Proteome Atlas (TCPA), Gene Expression Omnibus (GEO) and The Cancer Genome Atlas (TCGA), and an integrated prognosis‐related proteins (IPRPs) model was constructed. Immunohistochemistry was used to validate the prognostic value of the IPRPs model in a Fudan University Shanghai Cancer Center (FUSCC) cohort. In addition, 76 ACC cases from TCGA and 22 ACC cases from GSE10927 in NCBI's GEO database with complete clinical and gene expression data were used to validate the effectiveness of the IPRPs model. Higher expression of FASN (P = .039), FIBRONECTIN (P < .001), TFRC (P < .001) and TSC1 (P < .001) indicated significantly worse overall survival in adult ACC patients. Risk assessment indicated that the IPRPs model had a strong capacity to predict poor overall survival (P < .05). In the FUSCC cohort, the IPRPs model showed a slightly stronger ability to predict prognosis than Ki‐67 protein (IPRPs: P = .003, HR = 3.947; Ki‐67: P = .005, HR = 3.787). In external validation using gene expression data, the IPRPs model showed a strong ability to predict prognosis in the TCGA cohort (P = .005, HR = 3.061) and the best predictive ability among the compared markers in the GSE10927 cohort (P = .0898, HR = 2.318), although the latter did not reach statistical significance. This study constructed the IPRPs model for predicting the prognosis of adult ACC patients using proteomic, gene expression and real‐world data, and the model showed stronger predictive value than other biomarkers (Ki‐67, Beta‐catenin, etc) across multiple cohorts.


| INTRODUCTION
Adrenocortical carcinoma (ACC) is a rare and aggressive endocrine malignancy with a high risk of relapse, poor survival and limited treatment options. According to the Surveillance, Epidemiology and End Results (SEER) database, the annual incidence of ACC is approximately 0.72 per million population, and ACC accounts for 0.2% of all cancer deaths in the United States. 1 Despite its rarity, ACC shows highly aggressive biological behaviour, with fewer than 35% of patients surviving 5 years after initial diagnosis. 2 Appropriate treatment is therefore extremely important. Surgical resection of the primary tumour is currently the preferred treatment and is usually the first and most effective therapeutic strategy. [3][4][5] Very few drugs are available for this disease, and mitotane remains the only medication approved by the US Food and Drug Administration for ACC treatment. 6 New treatment options and drug targets are thus urgently needed, especially for the clinical management of patients with ACC who are resistant to mitotane.
Proteomics is a powerful tool for detecting unknown protein species, quantifying absolute protein abundance and identifying biomarkers of pathogenic processes. 7,8 Proteomics has been widely used to explore biomarkers for various diseases, 9 and recent developments in proteomics have made more comprehensive examinations of protein biomarkers in various cancers possible. 10 For instance, Bouchal et al 11 used transcriptomic and proteomic analyses to identify potential biomarkers associated with metastatic breast cancer, and several proteomic studies have focused on identifying new diagnostic biomarkers in patients with prostate cancer. 12,13 Thus far, however, very few studies have used a large-scale proteomic approach to identify potential protein biomarkers for ACC. 14 Bioinformatics, which combines computer science, information technology and biology, has generated large amounts of complex biological data. For example, The Cancer Proteome Atlas (TCPA) database provides researchers with reverse-phase protein array (RPPA) data. 15 RPPA is a powerful proteomic technique for the economical, sensitive and high-throughput evaluation of large numbers of selected protein markers, which makes it possible to explore protein biomarkers using bioinformatics. 16,17 Because ACC differs substantially between adult and paediatric patients, this study focused only on adult patients. This study constitutes the first large-scale proteomic analysis combined with transcriptome data to describe the protein landscape of ACC in adult patients.
To explore novel protein biomarkers of potential prognostic value and develop a protein-derived predictive model for adult patients with ACC, we analysed the association between protein abundance and survival and constructed an integrated prognosis-related proteins (IPRPs) model for risk assessment.
Gene expression profiles were also analysed to reveal the underlying biological interaction networks. The goal of this study was to identify potential novel therapeutic targets and provide a high-performing prognostic model for the clinical management of adult ACC.

| Data downloading and processing
The RPPA data (level 4) of adult ACC were obtained from The Cancer Proteome Atlas (TCPA), and the gene expression profiles and clinical information of patients with ACC were downloaded from The Cancer Genome Atlas (TCGA). The raw data were preprocessed and normalized in R to remove noise and ensure data integrity. By matching sample IDs, we obtained 46 ACC cases with complete clinical information, protein abundance and gene expression data. We also obtained 76 ACC cases from TCGA (Table 1) and 22 ACC cases from GSE10927 18 (Table 2) in NCBI's GEO database with complete clinical information and gene expression data. All patients were over 18 years old.
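The sample-matching step can be illustrated with a minimal Python sketch. It assumes that every data table is keyed by a TCGA-style barcode and that the first three dash-separated fields identify the patient (the TCGA participant convention); the function names and example barcodes are hypothetical, and the sketch is not the R pipeline used in the study.

```python
from typing import Iterable, List


def patient_id(barcode: str) -> str:
    """Return the participant part of a TCGA barcode, e.g. 'TCGA-OR-A5J1'."""
    return "-".join(barcode.split("-")[:3])


def match_samples(*barcode_lists: Iterable[str]) -> List[str]:
    """Return the sorted patient IDs present in every supplied barcode list.

    Hypothetical helper: each argument holds the barcodes of one data type
    (e.g. RPPA, gene expression, clinical table).
    """
    id_sets = [{patient_id(b) for b in barcodes} for barcodes in barcode_lists]
    return sorted(set.intersection(*id_sets))


# Illustrative barcodes only, not the study's actual samples.
protein = ["TCGA-OR-A5J1-01A-21-A39K-20", "TCGA-OR-A5J2-01A-21-A39K-20"]
expression = ["TCGA-OR-A5J1-01A-11R-A29S-07", "TCGA-OR-A5J3-01A-11R-A29S-07"]
clinical = ["TCGA-OR-A5J1", "TCGA-OR-A5J2", "TCGA-OR-A5J3"]
print(match_samples(protein, expression, clinical))  # ['TCGA-OR-A5J1']
```

Working at the patient level means that aliquot-specific suffixes, which differ between assays, do not prevent a match; an age filter on the clinical table would be applied before intersecting.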

| Survival analysis of candidate proteins
Patients were dichotomised at the median protein abundance for Kaplan-Meier analysis, and univariate Cox regression was used to evaluate the prognostic value of candidate proteins. For both analyses, P-values < .05 were considered significant. The volcano plot was generated using the ggplot2 package in R. 19 Red indicates a negative association between protein abundance and survival, green indicates a positive association, and black indicates no statistical significance. Survival curves were drawn using the survival package in R, with red indicating the high-risk group and blue the low-risk group. 20
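The median split and Kaplan-Meier comparison can be sketched in plain Python as follows. This is an illustrative re-implementation, not the R survival package used in the study. It assumes that samples whose abundance equals the median fall into the low group, and it compares the two groups with a standard two-group log-rank test (chi-square with one degree of freedom), the usual companion to Kaplan-Meier curves, although the text does not name the test. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_at_median(abundance: Sequence[float]) -> List[bool]:
    """Label each sample True (high group) if its abundance exceeds the median."""
    s = sorted(abundance)
    n = len(s)
    cut = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [a > cut for a in abundance]


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) at each distinct event time."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        # Count all subjects (events and censored) sharing this time point.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def logrank_test(times: Sequence[float], events: Sequence[int],
                 high: Sequence[bool]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    o_minus_e = 0.0
    variance = 0.0
    for t in sorted({x for x, e in zip(times, events) if e}):
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, high) if x >= t and g)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, high) if x == t and e and g)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths, high group
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    stat = o_minus_e ** 2 / variance
    # Upper tail of chi-square with 1 df equals erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


# Toy data: high-abundance patients die early, low-abundance patients are censored.
times = [1, 2, 3, 10, 11, 12]
events = [1, 1, 1, 0, 0, 0]
high = split_at_median([9, 8, 7, 1, 2, 3])
stat, p = logrank_test(times, events, high)  # p falls below .05 for this toy data
```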

| Screening of candidate proteins and construction of a predictive multivariate Cox model
Lasso Cox regression was performed with the glmnet package in R to further narrow down the proteins with prognostic significance. 21 Multivariate Cox proportional hazards regression was then used to identify candidate proteins and derive a risk score from candidate protein abundance and survival. Finally, an integrated prognosis-related proteins (IPRPs) model was constructed (Risk score = 2.743 × fibronectin abundance (ref.
The median risk score of the IPRPs model was used as the cut-off value to classify patients into high-risk and low-risk groups.
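The risk score of a multivariate Cox model is its linear predictor, that is, a weighted sum of protein abundances, and the median of that score splits the cohort. The minimal Python sketch below illustrates both steps. The text gives the coefficient for fibronectin (2.743); coefficients for the other model proteins are not reproduced here, so the usage example uses fibronectin only. Function names and abundance values are hypothetical.

```python
from typing import Dict, List, Sequence


def risk_score(abundance: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Cox linear predictor: sum of coefficient x abundance over model proteins."""
    return sum(coef * abundance[protein] for protein, coef in coefficients.items())


def assign_risk_groups(scores: Sequence[float]) -> List[str]:
    """Label scores above the cohort median 'high' and the rest 'low'."""
    s = sorted(scores)
    n = len(s)
    cut = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return ["high" if x > cut else "low" for x in scores]


# Only the fibronectin coefficient is taken from the text; other terms are omitted.
coefficients = {"FIBRONECTIN": 2.743}
patients = [{"FIBRONECTIN": v} for v in (0.2, -0.5, 1.1, 0.4)]
scores = [risk_score(p, coefficients) for p in patients]
print(assign_risk_groups(scores))  # ['low', 'low', 'high', 'high']
```

Because the cut-off is the cohort's own median, the groups are always close to equal in size, which keeps both survival curves estimable in small cohorts.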

| IPRPs model in the TCPA cohort
Besides the IPRPs risk score, the covariables in the univariate and multivariate Cox regression models included age, gender, pT stage, pN stage, pM stage and pathological stage. A receiver operating characteristic (ROC) curve was constructed and the area under the curve (AUC) was calculated to assess the predictive accuracy of the IPRPs model. Co-abundance analysis was performed using Pearson correlation to identify proteins associated with the candidate proteins of the IPRPs model, with a correlation coefficient cut-off of 0.4. Survival curves and a scatter diagram were used to explore the relationship between risk score and patient prognosis, and a heat map of candidate protein abundance in the high-risk and low-risk groups was drawn.
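Two of these steps, the AUC and the Pearson co-abundance screen, are illustrated in the minimal Python sketch below. For the AUC it assumes a binary outcome (for example, death during follow-up) and uses the equivalence between the AUC and the Mann-Whitney statistic; for the co-abundance screen it assumes the 0.4 cut-off applies to the absolute correlation, so negatively correlated proteins are also retained. Neither detail is specified in the text, and the function names are hypothetical.

```python
import math
from typing import Dict, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the probability that a positive case outscores a negative one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def co_abundant(target: Sequence[float], others: Dict[str, Sequence[float]],
                cutoff: float = 0.4) -> Dict[str, float]:
    """Return proteins whose |r| with the target protein exceeds the cut-off."""
    result = {}
    for name, values in others.items():
        r = pearson(target, values)
        if abs(r) > cutoff:
            result[name] = r
    return result
```

A ROC curve at a fixed follow-up time, rather than a binary status, would require a time-dependent AUC, which this sketch does not implement.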

| Validation of the IPRPs model in a cohort from the Fudan University Shanghai Cancer Center (FUSCC) in China
Real-world data were collected to validate the prognostic value of the IPRPs model. The cohort included 39 adult patients with ACC (Table 3) from FUSCC between 2013 and 2019, and tumour specimens were […]. The Kaplan-Meier method was applied to validate the prognostic value of the model, with the median risk score as the cut-off value.

| Comparing the IPRPs model with other biomarkers using gene expression data
Because few patients had proteomic data, we used gene expression data for prognostic validation. The IPRPs model was compared with other biomarkers in the TCGA cohort (76 cases) and GSE10927 (22 cases). Survival analyses were carried out using the Kaplan-Meier method, with median gene expression as the cut-off value. The AUC, C-index and net reclassification improvement (NRI) were calculated to compare the IPRPs model with other biomarkers.
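Of the three comparison metrics, the C-index is the most direct to sketch. The minimal Python version below implements Harrell's concordance index, assuming that a higher risk score means a higher predicted hazard and, for simplicity, skipping pairs with tied survival times; it is an illustration rather than the implementation used in the study, and the function name is hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the earlier-failing patient has the higher risk score.

    A pair (i, j) is comparable when patient i had an event and times[i] < times[j].
    Tied risk scores count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1.0
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of patients by survival time.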

FIGURE 1
Survival analysis and screening of proteins. In the volcano plot (A), red and green represent high- and low-risk candidate protein biomarkers, respectively. Forty-two proteins with P-values < .05 in both the Kaplan-Meier analysis and the univariate Cox regression analysis were selected and are listed in Table 1. (B, C) Lasso Cox regression model.

| Gene set enrichment analysis (GSEA)
To explore potentially associated signalling pathways, the TCGA datasets of the high-risk and low-risk groups (defined by the IPRPs risk score) were analysed using GSEA software (version 3.0) with 1000 permutations. False discovery rate-adjusted P-values were obtained using the Benjamini-Hochberg method. 22 Significant enrichment was defined as an adjusted P-value < .01 and a false discovery rate < 0.25.
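The Benjamini-Hochberg adjustment cited above can be written in a few lines. The Python sketch below is a generic implementation of the step-up procedure: for the i-th smallest of m P-values, the adjusted value is the running minimum of p × m / i taken from the largest P-value downwards, capped at 1. The GSEA software reports its own permutation-based FDR q-values, so this sketch illustrates the cited method rather than reproducing GSEA output; the function name is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0  # enforces monotonicity and caps values at 1
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```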

| Identification of differentially expressed genes (DEGs) related to risk score of the IPRPs model
DEGs between the high-risk and low-risk groups (adjusted P-value < .01; fold change ≥ 2) were identified using the limma package. 23  Functional enrichment analysis of the hub genes was performed using the clusterProfiler package. 27
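The DEG thresholds can be expressed as a simple filter. The Python sketch below assumes that per-gene results have already been produced (for example by limma, whose topTable output reports a log2 fold change in `logFC` and an adjusted P-value in `adj.P.Val`), so a two-fold change corresponds to |log2 fold change| ≥ 1. The dictionary layout, gene names and function name are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def select_degs(results: Dict[str, Tuple[float, float]],
                min_fold_change: float = 2.0,
                max_adj_p: float = 0.01) -> List[str]:
    """Return genes with adjusted P < max_adj_p and |log2 FC| >= log2(min_fold_change).

    results maps gene name -> (log2 fold change high vs low risk, adjusted P-value).
    """
    threshold = math.log2(min_fold_change)
    return sorted(
        gene for gene, (log2fc, adj_p) in results.items()
        if adj_p < max_adj_p and abs(log2fc) >= threshold
    )


# Illustrative values only.
example = {"GENE_A": (1.5, 0.001), "GENE_B": (-1.2, 0.005),
           "GENE_C": (0.5, 0.0001), "GENE_D": (2.0, 0.02)}
print(select_degs(example))  # ['GENE_A', 'GENE_B']
```

Using the absolute log2 fold change keeps genes that are down-regulated in the high-risk group as well as those that are up-regulated.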

| RESULTS
In this work, we aimed to explore new prognostic biomarkers for adult patients with ACC using proteomics and transcriptomics data.
A flow chart of the methods used in this study is given in Figure S1.

| Selection for candidate proteins with significant prognostic value
From the volcano plot (Figure 1A), 42 candidate protein biomarkers with P-values < .05 in both the Kaplan-Meier analysis and the univariate Cox regression analysis were selected; they are listed in Table 4.
The Lasso Cox regression results for the selected proteins are shown in Figure 1B, C.

| Construction of the IPRPs model
In the univariate Cox regression analysis (Figure 2A), pathological stage (P < .001), pT stage (P < .001), pM stage (P = .001) and the IPRPs risk score (P < .01) were associated with shorter overall survival. However, in the multivariate Cox regression analysis, only the risk score (P < .05) was significantly associated with worse outcome (Figure 2B), indicating that the IPRPs model has independent prognostic significance. The C-index (0.939, 95% CI: 0.916-0.962) and NRI (0.235, 95% CI: 0-0.597) further supported the predictive performance of the model. The risk score achieved an AUC of 0.933 (Figure 2C), indicating high predictive accuracy of the IPRPs model.

| Validation of the prognostic value of the IPRPs model in the FUSCC cohort
Representative IHC plots for the ACC samples are displayed in

| External validation of the IPRPs model and comparison with other biomarkers using gene expression data
In the TCGA cohort ( Figure 5A-F Table 5, and it indicated that IPRPs model may act better than other biomarkers in RPPA data and IHC.

| Significantly involved pathways of the IPRPs
The top 100 genes most significantly positively and negatively correlated with the risk score are depicted in a heat map (Figure 6A). Besides an ACC progressive phenotype, the GSEA in-

| Identification of DEGs associated with the IPRPs
A significant difference in gene expression between the high-risk and low-risk groups was detected, as shown in the heat map (Figure 7A).
A protein-protein interaction (PPI) network of the DEGs was constructed and the identified hub  (Table 6 and Figure 7D, E).

| Correlation analysis between the IPRPs and other potential signatures
Various types of proteins may be associated with the candidate proteins as shown in Figure 2D. were correlated with the TFRC abundance; among them, CYCLINB1 abundance was highly positively correlated with TFRC abundance (correlation coefficient = 0.607) ( Figure S2D).

| DISCUSSION
The prognosis of ACC is poor because most patients present with locally advanced or metastatic disease and cannot be treated surgically. Approximately 66% of patients with localized disease experience recurrence and usually require systemic treatment. 28,29 Although molecular markers for the diagnosis and prognosis of ACC exist, including IGF2, p53, and the Wnt/β-catenin and PI3K signalling pathways, they have not been widely applied in morphological evaluation, auxiliary diagnosis or prognostic modelling

FN is a large extracellular matrix protein in bones that can bind to itself and to collagen to form a network. 38 Studies have shown that FN abundance is higher in breast cancer than in normal tissues and is significantly related to the invasiveness of the disease. 39 Knowles et al 40

Iron is an essential trace element involved in cell metabolism, division and proliferation, and it has also been considered an important factor in the development of cancer. 41 TFRC is a cell surface receptor responsible for transferrin-mediated iron uptake; thus, TFRC may play a key role in the energy supply of cancer cells. 42 Shpyleva et al 43 found high TFRC abundance in breast cancer, and TFRC antibodies have been used to inhibit tumour growth. 44 We found a negative correlation between TFRC and SMAC abundance, and the abundances of X1433ZETA, ERK2 and CYCLINB1 were positively correlated with TFRC abundance. Modulation of PPIs is a promising new strategy in drug development 45,46 ; thus, designing TFRC inhibitors based on these interaction modes may yield new therapeutic drugs.
TSC1, in a complex with TSC2 (tuberous sclerosis 2), inhibits nutrient-mediated or growth factor-stimulated phosphorylation of S6K1 and EIF4EBP1 by negatively regulating mTORC1 signal transduction. 47,48 We also found correlations between various proteins and TSC1. Among them, PARP1 abundance showed the highest correlation with TSC1 (correlation coefficient = 0.742) and attracted our attention because of its key role in DNA repair. 49 Maintaining genome integrity is the basis of cell survival, and PARP inhibi-

They focused only on gene expression data, which is usually considered less stable than protein data. Moreover, they identified only 9 hub genes with prognostic value, without further validation. In our study, we

| CONCLUSION
We constructed an IPRPs model for predicting the prognosis of adult patients with ACC using proteomic data, gene expression data and real-world data. The model showed stronger predictive value for prognosis than other biomarkers (eg Ki-67 and beta-catenin) across multiple cohorts. Our results singled out FASN, FN, TFRC and TSC1 from among previously identified tumour promoters and yielded a novel prediction model, IPRPs, that outperformed currently established prognostic parameters in anticipating disease course and supporting the clinical management of adult ACC.

ACKNOWLEDGEMENTS
We thank the TCPA, TCGA and GEO databases for providing RPPA data and gene expression profiles of ACC.

CONFLICTS OF INTEREST
The authors declare no competing interests.

ETHICAL APPROVAL
Ethics approval for the current study, including consent to participate, was granted by the ethics committee of Fudan University Shanghai Cancer Center.

DATA AVAILABILITY STATEMENT
The datasets analysed during the current study are available from the corresponding author on reasonable request.