Comprehensive analysis of autophagy‐related prognostic genes in breast cancer

Abstract Accumulating evidence revealed that autophagy played vital roles in breast cancer (BC) progression. Thus, the aim of this study was to investigate the prognostic value of autophagy‐related genes (ARGs) and develop a ARG‐based model to evaluate 5‐year overall survival (OS) in BC patients. We acquired ARG expression profiling in a large BC cohort (N = 1007) from The Cancer Genome Atlas (TCGA) database. The correlation between ARGs and OS was confirmed by the LASSO and Cox regression analyses. A predictive model was established based on independent prognostic variables. Thus, time‐dependent receiver operating curve (ROC), calibration plot, decision curve and subgroup analysis were conducted to determine the predictive performance of ARG‐based model. Four ARGs (ATG4A, IFNG, NRG1 and SERPINA1) were identified using the LASSO and multivariate Cox regression analyses. A ARG‐based model was constructed based on the four ARGs and two clinicopathological risk factors (age and TNM stage), dividing patients into high‐risk and low‐risk groups. The 5‐year OS of patients in the low‐risk group was higher than that in the high‐risk group (P < 0.0001). Time‐dependent ROC at 5 years indicated that the four ARG–based tool had better prognostic accuracy than TNM stage in the training cohort (AUC: 0.731 vs 0.640, P < 0.01) and validation cohort (AUC: 0.804 vs 0.671, P < 0.01). The mutation frequencies of the four ARGs (ATG4A, IFNG, NRG1 and SERPINA1) were 0.9%, 2.8%, 8% and 1.3%, respectively. We built and verified a novel four ARG–based nomogram, a credible approach to predict 5‐year OS in BC, which can assist oncologists in determining effective therapeutic strategies.


| INTRODUC TI ON
Breast cancer (BC) is the major health burden because of the high rates of morbidity and cancer death among women. [1][2][3] Although great medical advances in BC screening, diagnosis, and comprehensive therapy, its survival outcome is not entirely satisfactory. Thus, individual treatment strategies of breast cancer still present huge challenges. Currently, therapeutic schemes and prognostic assessment for BC patients are mainly based on the tumour-node-metastasis (TNM) staging system and molecular subtypes. [4][5][6] The TNM staging system has been widely applied for cancer management, but it is unable to achieve individualized survival prediction. [7][8][9] In addition, some cases at the same TNM stage have distinct survival outcome. Then, many BC patients did not receive optimal treatment strategies and had to face with unfavourable survival prognosis. 10 Therefore, the identification of novel signatures to improve prognostic assessment and clinically guide treatment decisions is indispensable in routine practice.
During the past decade, genome expression profiling can effectively provide detailed information for survival prediction in cancer patients. [11][12][13] Moreover, gene biomarkers have been considered as an important tool to predict the prognostic outcome and treatment responses for cancer patients. [14][15][16] Autophagy, also recognizing as type II programmed cell death, is a vital biological process that keeps the stability of the intracellular environment. 17,18 Besides, autophagy can degrade and recycle components of aged or damaged organelles to promote rapid growth of cancer cells. 19 Thus, autophagy-related proteins are able to regulate tumour growth and progression. 20 Several studies have investigated the prognostic roles of autophagy-related gene (ARG) signatures in various tumours. [21][22][23][24][25] However, little attention has been paid to identify the association between ARGs and overall survival (OS) in BC via high-throughput expression profiles.
The high-throughput platform made contributions to genomic analysis in medical oncology with huge clinical significances. Thus, identifying a novel ARG signature using high-throughput expression profile to predict 5-year OS for BC patients is urgently needed.
Hence, in this study, to improve prognostic evaluation in BC patients, we sought to identify ARG signature from The Cancer Genome Atlas (TCGA) database. A nomogram integrating the prognostic ARGs and clinicopathological risk factors was established to predict 5-year OS and achieve effective risk stratification for BC patients.

| Patients and study design
The gene expression profiles and clinical information for BC patients were obtained from the TCGA database. Inclusion criteria were included: (a) patients with invasive BC; (b) complete gene expression profiles and follow-up information; and (c) survival time more than 1 month. Eventually, 1007 BC patients (training cohort) satisfying the inclusion criteria were selected in this study. Based on the computer allocation numbers, 504 cases as validation set were randomly confirmed from the training set.
We acquired TCGA data from public database (https://cance rgeno me.nih.gov/), and hence, ethical approval and patient's informed consent were waived in this study.

| Development of risk score and the ARGbased model
The 232 ARGs were acquired from the HADb (Human Autophagy Database, http://autop hagy.lu/clust ering/ index.html). Firstly, univariate Cox proportional hazards regression analysis (CPHRA) was conducted to investigate the association between the expression level of ARGs and 5-year OS. Next, the ARGs were recognized as significant variables when the P value was <0.05 in the univariate CPHRA. Subsequently, the least absolute shrinkage and selection operator (LASSO) regression analysis was performed to screen out ARGs as candidate genes. LASSO regression analysis is able to shrink regression coefficients towards zero. And on the basis of the regulation weight λ, ARGs with the regression coefficients (equal to zero) were excluded. Finally, the multivariate CPHRA was implemented to identify the independent prognostic ARGs for 5-year OS in BC patients (P < 0.05). In the light of the expression values of selected ARGs, the risk score was constructed and weighted by multivariate Cox regression coefficients. The risk score = sum of regression coefficients × expression level of ARGs.
Finally, a ARG-based model, integrating the prognostic ARGs and clinicopathological risk factors, was formulated. According to the ARG-based model, the prognostic nomogram score was calculated for each patient.

| Statistical analysis
Distribution differences in variables in the two independent cohorts were evaluated using the chi-square and the Mann-Whitney U test.
Univariate, LASSO and multivariate Cox regression analyses were performed to identify the independent indicators of OS (P < 0.05).
Then, Cox regression coefficients were used to build a risk score and four ARG-based nomogram. Time-dependent ROC curve analysis is extensively carried out to assess the predictive performance of the four ARG-based model. Survival curves were drawn using the Kaplan-Meier method and contrasted via the log-rank test. OS (primary end-point) was calculated as the interval from surgery to the time of death or last follow-up. We confirmed the optimal cut-off value of the ARG-based nomogram using X-tile plot. 32 P < 0.05 was considered as statistically significant.

| Identification of the four ARG signatures
The detailed flow chart of the study procedure is presented in Figure 1.
And detailed baseline characteristics of included patients are shown in Table 1. There were no distinct differences in patient characteristics between two independent cohorts in Table 1 (P > 0.05). In the light of univariate CPHRA, 39 ARGs were preliminary screen out in the training cohort (P < 0.05). Subsequently, LASSO Cox regression analysis was carried out to determine 17 ARGs as candidate genes in the primary data set ( Figure S1). To assess the potential function of the 17 prognostic ARGs, functional enrichment analysis was performed via  differentiation, regulation of cellular response to heat and establishment of protein localization to organelle ( Figure S2). Next, multivariate CPHRA was applied to further identify four ARGs (ATG4A, IFNG, NRG1 and SERPINA1) as independent prognostic indicators in the training cohort. Moreover, we also found that the mutation frequencies of the four ARGs (ATG4A, IFNG, NRG1 and SERPINA1) were 0.9%, 2.8%, 8% and 1.3% in the TCGA database, respectively. The common mutation type of the four ARGs was copy-number amplification ( Figure S3).
Some studies have revealed that abnormal DNA methylation is a vital mechanism for epigenetic silencing of gene expression in variety of tumours. Thus, we explored this relationship between the four ARGs and DNA methylation in the TCGA database via the MEXPRESS tool (http://mexpr ess.be/). 35 For example, as shown in Figure S4, there was a distinct inverse association between DNA methylation and ATG4A These results suggested that the multiple DNA methylation regions may play important roles in regulating the four ARG expression.

| Establishment of four ARG-based risk score and prognostic model
Using the multivariate Cox regression coefficients, we computed a risk score for each patient; risk score = (0.903 × expression level of  Table 2. Finally, the risk score and two clinicopathological risk factors (age and TNM stage) were verified as independent predictors for 5-year OS (P < 0.05). Thus, to provide the oncologists with a quantitative technique for predicting the 5-year OS, we developed a ARG-based nomogram, which incorporated the four ARG-based risk score and two clinicopathological risk factors (age and TNM stage) ( Figure 2).

| Assessment of the four ARG-based prognostic model
The scores and survival status are manifested in Figure 4, which demonstrated that patients with higher scores had worse OS than that of those with lower scores (P < 0.001). Stratified analyses were applied for BC patients in T stage, N stage and TNM stage, suggesting that the four ARG-based model had distinct risk stratification ability ( Figure 5). The patients with high-risk scores had distinctly worse OS than patients with low-risk scores in T1 (P = 0.00022), T2 (P < 0.0001), T3/4 (P < 0.0001), N0 (P < 0.0001), N1 (P < 0.0001), N2 (P = 0.00011) and N3 (P = 0.015), TNM stage I/II (P < 0.0001) and TNM stage III/ IV (P < 0.0001) ( Figure 5).

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data sets used and analysed during the current study are available from the corresponding author on reasonable request.