Identification of autophagy‐related long non‐coding RNA prognostic signature for breast cancer

Abstract Dysregulation of autophagy‐related long non‐coding RNAs (lncRNAs) is associated with the occurrence and development of breast cancer. The purpose of this study was to explore whether autophagy‐related lncRNAs can predict the prognosis of breast cancer patients. An autophagy‐related lncRNA prognostic signature was constructed using Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression. We identified five autophagy‐related lncRNAs with prognostic value (MAPT‐AS1, LINC01871, AL122010.1, AC090912.1 and AC061992.1) and used them to construct an autophagy‐related lncRNA prognostic signature (ALPS) model. The ALPS model provided independent prognostic value (HR = 1.664, 95% CI 1.381‐2.006), and its risk score was significantly associated with TNM stage and with ER, PR and HER2 status in breast cancer patients. A nomogram could be used to predict survival for patients with breast cancer. Principal component analysis and a Sankey diagram indicated that the distribution of the five lncRNAs from the ALPS model tends towards low risk. Gene set enrichment analysis showed that the high‐risk group was enriched in autophagy and cancer‐related pathways, whereas the low‐risk group was enriched in immune‐related regulatory pathways. These results indicate that the ALPS model composed of five autophagy‐related lncRNAs can predict the prognosis of breast cancer patients.

more attention to the prognostic markers of autophagy-related genes in different types of cancer. 7,8 Recent studies have shown that the regulation of autophagy is involved in the resistance of breast tumours to chemotherapy drugs. 9 Moreover, long non-coding RNAs (lncRNAs) are non-coding RNAs longer than 200 nucleotides. LncRNAs are widely involved in the biological behaviour of breast cancer, such as proliferation, apoptosis, invasion and metastasis. [10][11][12] Interestingly, lncRNAs also play a vital role in regulating autophagy. 13 Studies have shown that lncRNA-mediated autophagy plays an important role in breast cancer resistance to tamoxifen or trastuzumab. 14,15 In addition, increasing evidence supports the use of autophagy-related lncRNAs to predict the outcomes of patients with cancer. 16 In this study, we hypothesized that a multivariable model composed of multiple autophagy-related lncRNAs could be used to predict the prognosis of breast cancer patients. The lncRNA and mRNA expression datasets and clinicopathological features of breast cancer from The Cancer Genome Atlas (TCGA) were used to assess the prognostic value of autophagy-related lncRNAs. Finally, we established an autophagy-related lncRNA prognostic signature (ALPS) model to effectively predict the prognosis of breast cancer patients.

| Patient data sets
RNA-seq expression data and clinical information for 1,108 breast cancer patients were obtained from The Cancer Genome Atlas (TCGA) data portal (https://cancergenome.nih.gov/). The Ensembl human genome browser, GRCh38.p13 (http://asia.ensembl.org/index.html), was used to annotate and classify 14,142 lncRNAs and 19,658 protein-coding genes. After excluding male patients and patients with an overall survival (OS) of less than 30 days, 1,027 breast cancer patients were included in the present study.
The patients were randomly divided into training and testing groups.
After excluding patients with incomplete clinicopathological data, 569 patients were enrolled for subsequent analysis.
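The split ratio and random seed are not reported in this section. As a minimal sketch, assuming a 1:1 split and a fixed seed (both hypothetical choices), a reproducible random assignment could look like this in Python:

```python
import random

def split_patients(patient_ids, train_fraction=0.5, seed=2021):
    """Randomly assign patients to training and testing groups.
    train_fraction and seed are hypothetical; the study does not report them."""
    rng = random.Random(seed)          # local generator: reproducible split
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

ids = [f"PATIENT_{i:03d}" for i in range(10)]  # hypothetical identifiers
train, test = split_patients(ids)
print(len(train), len(test))
```

Using a local `random.Random` instance keeps the split reproducible without touching the global random state.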

| Identification of autophagy-related lncRNAs in breast cancer
A total of 232 autophagy-related genes were obtained from the Human Autophagy Database (HADB; http://www.autophagy.lu/index.html); Moussay et al provided detailed descriptions of these human autophagy-related genes. 17 Autophagy-related lncRNAs were screened with the Pearson correlation coefficient, using |R| > 0.3 and P < 0.001 as the criteria.

| Construction of autophagy-related lncRNA prognostic signatures for breast cancer
Univariate Cox regression was used to analyse the relationship between the expression level of each autophagy-related lncRNA and OS in breast cancer patients (P < 0.05). Least Absolute Shrinkage and Selection Operator (LASSO) Cox regression analysis of the prognosis-related autophagy-associated lncRNAs was then performed in R using the 'glmnet' package. To evaluate their independent prognostic effect on survival, the candidate autophagy-related lncRNAs were analysed by multivariate Cox regression, and the best lncRNA prognostic markers were selected on the basis of the lowest Akaike information criterion (AIC) value. This yielded an ALPS model composed of five autophagy-related lncRNAs. The risk score of each patient was calculated as
$$\text{Risk score} = \sum_{k=1}^{n} \text{coef}(k) \times x(k),$$
where n is the number of lncRNAs in the signature, and coef(k) and x(k) are the regression coefficient and the expression value of the k-th autophagy-related lncRNA, respectively. 18

| Independent prognostic analysis and ROC curve plotting
Patients were divided into high- and low-risk groups according to the cut-off value of the risk score, and the OS of the two groups was compared using Kaplan-Meier survival curves and the log-rank test. Cox proportional hazards models were fitted to estimate crude and multivariable-adjusted hazard ratios (HRs) with 95% confidence intervals (CIs); potential covariates included age, TNM stage, tumour size (T), lymph node metastasis (N), distant metastasis (M), risk score, ER status, PR status and HER2 status. The accuracy of each clinicopathological feature and of the risk score in predicting survival was evaluated with receiver operating characteristic (ROC) curves.
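The Kaplan-Meier estimate and the log-rank comparison of the high- and low-risk groups can be sketched in plain Python. The function names, the `"high"`/`"low"` labels and the toy data are illustrative assumptions; the P-value uses the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math

def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (event time, survival probability)."""
    surv, curve = 1.0, []
    for t in sorted({u for u, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= 1 - deaths / at_risk
        curve.append((t, surv))
    return curve

def log_rank(times, events, groups):
    """Two-group log-rank test; groups holds "high" or "low" per patient.
    Returns (chi-square statistic, P-value with 1 degree of freedom)."""
    o_minus_e = var = 0.0
    for t in sorted({u for u, e in zip(times, events) if e}):
        n = sum(1 for u in times if u >= t)
        n1 = sum(1 for u, g in zip(times, groups) if u >= t and g == "high")
        d = sum(1 for u, e in zip(times, events) if u == t and e)
        d1 = sum(1 for u, e, g in zip(times, events, groups)
                 if u == t and e and g == "high")
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths, high group
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)  # hypergeometric variance
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)

# Toy cohort: the "high" group dies earlier (hypothetical data)
print(kaplan_meier([1, 2, 3, 4], [1, 1, 1, 1]))
print(log_rank(list(range(1, 9)), [1] * 8, ["high"] * 4 + ["low"] * 4))
```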

| Nomogram
A nomogram was used to predict the probability of 1-, 3- and 5-year survival of breast cancer patients. It was constructed by integrating clinicopathological variables (age, stage, T stage, N stage, M stage, ER status, PR status and HER2 status) with the risk score derived from the prognostic signature.
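A Cox-based nomogram maps each covariate to points and converts the linear predictor lp into survival probabilities through S(t | x) = S0(t)^exp(lp). The sketch below assumes this standard construction. The two log hazard ratios are derived from the multivariate HRs reported in the results (age 1.061, risk score 1.664); the reference patient, covariate ranges and baseline survival values are hypothetical, and the other nomogram covariates are omitted for brevity.

```python
import math

# Log hazard ratios from the reported multivariate HRs (age 1.061, risk score 1.664)
COEFS = {"age": math.log(1.061), "risk_score": math.log(1.664)}
# HYPOTHETICAL values: reference patient, variable ranges and baseline survival
REFERENCE = {"age": 55, "risk_score": 1.0}
RANGES = {"age": (25, 90), "risk_score": (0.0, 6.0)}
BASELINE_SURVIVAL = {1: 0.99, 3: 0.95, 5: 0.90}  # S0(t) for the reference patient, t in years

def points(var, value):
    """Nomogram points: the covariate with the widest effect range spans 0-100."""
    spans = {v: abs(b) * (RANGES[v][1] - RANGES[v][0]) for v, b in COEFS.items()}
    lo, hi = RANGES[var]
    low_end = min(COEFS[var] * lo, COEFS[var] * hi)
    return 100 * (COEFS[var] * value - low_end) / max(spans.values())

def predicted_survival(patient):
    """S(t | x) = S0(t) ** exp(lp), with lp the linear predictor relative to REFERENCE."""
    lp = sum(b * (patient[v] - REFERENCE[v]) for v, b in COEFS.items())
    return {t: s0 ** math.exp(lp) for t, s0 in BASELINE_SURVIVAL.items()}

patient = {"age": 62, "risk_score": 2.5}  # hypothetical patient
total = sum(points(v, patient[v]) for v in COEFS)
print(round(total, 1), predicted_survival(patient))
```

Reading a printed nomogram follows the same path: sum the points, convert the total back to the linear predictor, then read off the 1-, 3- and 5-year survival scales.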

| Principal component analysis (PCA) and Gene set enrichment analysis (GSEA)
PCA was used to investigate the distribution of patients with different risk scores. GSEA version 4.0.3 (Broad Institute, USA) was used to analyse the genes differentially expressed between the high- and low-risk groups. One thousand permutations were performed, and Affymetrix was selected as the chip platform for calculating the normalized enrichment score (NES). A nominal P-value < 0.05 and a false discovery rate (FDR q-value) < 0.25 were considered to indicate significant enrichment. 19
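The enrichment score at the core of GSEA is a weighted running sum over the ranked gene list. The sketch below assumes the weighted scheme (weight 1) with genes ranked by a high- versus low-risk metric; the gene names and metrics are hypothetical, and the normalization to NES and the 1,000-permutation significance test performed by GSEA 4.0.3 are not reproduced.

```python
def enrichment_score(ranked, gene_set, weight=1.0):
    """GSEA-style running-sum enrichment score.
    ranked: list of (gene, metric) sorted from most up- to most down-regulated
    in the high-risk group; gene_set: set of gene names."""
    n_miss = sum(1 for g, _ in ranked if g not in gene_set)
    hit_norm = sum(abs(m) ** weight for g, m in ranked if g in gene_set)
    running = es = 0.0
    for gene, metric in ranked:
        if gene in gene_set:
            running += abs(metric) ** weight / hit_norm  # step up on a hit
        else:
            running -= 1.0 / n_miss                      # step down on a miss
        if abs(running) > abs(es):
            es = running                                 # keep the maximum deviation
    return es

ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0), ("F", -3.0)]
print(enrichment_score(ranked, {"A", "B"}), enrichment_score(ranked, {"E", "F"}))
```

A positive score means the set concentrates at the top of the ranking (enriched in the high-risk group); a negative score means it concentrates at the bottom.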

| Construction of the LncRNA-mRNA co-expression network
The correlation between the autophagy-related lncRNAs and their co-expressed mRNAs was analysed using a co-expression network and a Sankey diagram, which were visualized with Cytoscape software (version 3.7.1, http://www.cytoscape.org/) and the ggalluvial R package, respectively. 20
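Cytoscape can import a network from a simple interaction format (SIF) file with one `source<TAB>interaction<TAB>target` line per edge. Assuming the lncRNA-mRNA correlations have already been computed, and reusing the |R| > 0.3 threshold from the lncRNA screen as an assumption, an edge list could be prepared like this (names and values are hypothetical):

```python
def coexpression_edges(correlations, r_cut=0.3):
    """correlations: {(lncRNA, mRNA): r}. Keep pairs with |r| > r_cut and label
    each edge by the sign of the correlation."""
    edges = []
    for (lnc, mrna), r in sorted(correlations.items()):
        if abs(r) > r_cut:
            edges.append((lnc, "positive" if r > 0 else "negative", mrna))
    return edges

def to_sif(edges):
    """One tab-separated 'source interaction target' line per edge."""
    return "".join("\t".join(edge) + "\n" for edge in edges)

corr = {("LNC_1", "MRNA_A"): 0.45, ("LNC_1", "MRNA_B"): -0.62, ("LNC_2", "MRNA_C"): 0.10}
print(to_sif(coexpression_edges(corr)), end="")
```

Encoding the sign as the interaction type lets Cytoscape style positive and negative co-expression edges differently.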

| Statistical analysis
All statistical analyses were performed using R software (version 4.0.3, https://www.r-project.org/). P < 0.05 was regarded as statistically significant.

| Identification of prognostically significant autophagy-related lncRNAs in breast cancer patient tissue samples
A total of 1,270 autophagy-related lncRNAs were identified from the 14,142 lncRNAs and 232 autophagy-related genes using the criteria |R| > 0.3 and P < 0.001. Univariate Cox proportional hazards analysis showed that 41 autophagy-related lncRNAs were significantly associated with the survival of breast cancer patients (Table S1). In the training group, LASSO Cox regression with ten-fold cross-validation repeated 1,000 times was used to screen prognostic autophagy-related lncRNAs (Figure 1A,B). Multivariate Cox analysis further identified five lncRNAs with prognostic significance, namely MAPT-AS1, LINC01871, AL122010.1, AC090912.1 and AC061992.1. These five lncRNAs were used to construct the ALPS model (Figure 1C).

| Evaluation of the ALPS model consisting of five autophagy-related lncRNAs
Breast cancer patients were divided into high-risk and low-risk groups according to the median risk score of the ALPS model. Risk curves and scatter plots were drawn to illustrate the risk scores and corresponding survival status of breast cancer patients. The results showed that mortality increased with the risk score in the training, testing and combined groups.
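The risk score formula from the Methods and the median split can be sketched as follows. The lncRNA names and coefficients are placeholders, not the study's fitted values, and assigning patients exactly at the median to the low-risk group is an assumption.

```python
from statistics import median

# PLACEHOLDER coefficients for illustration only (not the fitted ALPS values)
COEF = {"LNC_1": 0.25, "LNC_2": -0.40, "LNC_3": 0.15}

def risk_score(expression):
    """Risk score = sum over k of coef(k) * x(k) for the signature lncRNAs."""
    return sum(COEF[k] * expression[k] for k in COEF)

def split_by_median(patients):
    """patients: {patient_id: {lncRNA: expression}} -> {patient_id: "high" or "low"}."""
    scores = {pid: risk_score(expr) for pid, expr in patients.items()}
    cut = median(scores.values())
    # patients exactly at the median are assigned to the low-risk group (assumption)
    return {pid: "high" if s > cut else "low" for pid, s in scores.items()}

patients = {
    "P1": {"LNC_1": 4.0, "LNC_2": 0.0, "LNC_3": 0.0},
    "P2": {"LNC_1": 0.0, "LNC_2": 1.0, "LNC_3": 0.0},
    "P3": {"LNC_1": 2.0, "LNC_2": 0.0, "LNC_3": 0.0},
    "P4": {"LNC_1": 0.0, "LNC_2": 0.0, "LNC_3": 0.0},
}
print(split_by_median(patients))
```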

| Correlation of the risk score of ALPS model with clinicopathological factors
To further explore whether the ALPS model was associated with the characteristics of breast cancer, we evaluated the relationship between the risk score of the ALPS model and clinical characteristics.
The risk score of the ALPS model was significantly correlated with stage, ER status, PR status and HER2 status (Table 1). Figure S2 visualizes the co-expression network of the five autophagy-related lncRNAs from the ALPS model and their co-expressed mRNAs.
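The test behind Table 1 is not stated in this section. Assuming a Pearson chi-square test on a 2 × 2 table (risk group by, for example, ER status), the computation is short; the counts below are hypothetical. Note that R's `chisq.test` applies Yates' continuity correction to 2 × 2 tables by default, which gives a smaller statistic.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2 x 2 table:
                    marker+  marker-
        high risk      a        b
        low risk       c        d
    Returns (chi-square statistic, P-value with 1 degree of freedom)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# Hypothetical counts, e.g. ER-positive/negative patients by risk group
print(chi_square_2x2(20, 5, 5, 20))
```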

| The ALPS model is an independent prognostic factor for patients with breast cancer
Next, we performed univariate and multivariate Cox regression analyses to determine whether the ALPS model could be used as an independent risk factor for breast cancer patients. Multivariate Cox regression analysis showed that age (HR = 1.061, 95% CI 1.040-1.082, P < 0.001) and the risk score of the ALPS model (HR = 1.664, 95% CI 1.381-2.006, P < 0.001) were independently associated with OS (Figure 6B). These data indicate that the ALPS model is an independent prognostic factor in breast cancer patients. A nomogram was constructed to predict the survival of breast cancer patients (Figure 6C).

FIGURE 3 Prognostic significance analysis of the ALPS. A-C, Kaplan-Meier survival curve analysis shows that the survival time of patients with high risk scores based on the autophagy-related lncRNA prognostic signature is significantly shorter than that of patients with low risk scores in the training, testing and combined groups. D-F, The AUC of the risk model score and clinical features according to the ROC curves in the training, testing and combined groups. Clinical features: age, stage, T, N and M stage, and ER, PR and HER2 status.
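The AUC values in Figure 3D-F summarize ROC curves. Survival ROC analysis normally accounts for censoring (for example with time-dependent ROC methods); the sketch below is a deliberately naive version that treats death by a fixed horizon as the outcome and drops patients censored before it. The function name, horizon and data are hypothetical.

```python
def auc_at_horizon(scores, times, events, horizon):
    """Naive AUC for 'death by horizon' versus 'alive beyond horizon'.
    Patients censored before the horizon are dropped (a simplification)."""
    cases = [s for s, t, d in zip(scores, times, events) if t <= horizon and d]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    wins = 0.0
    for c in cases:
        for k in controls:
            wins += 1.0 if c > k else (0.5 if c == k else 0.0)  # ties count one half
    return wins / (len(cases) * len(controls))

# Hypothetical risk scores and follow-up in years; 3-year horizon
scores = [3.1, 2.4, 1.2, 0.8, 2.0, 0.5]
times = [1.0, 2.0, 4.5, 6.0, 2.5, 5.0]
events = [1, 0, 1, 0, 1, 0]
print(auc_at_horizon(scores, times, events, horizon=3))
```

The pairwise count is the Mann-Whitney form of the AUC: the probability that a randomly chosen patient who died scores higher than one who survived past the horizon.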

| Principal component analysis and Gene set enrichment analysis
We used PCA to visualize the distribution of patients based on the whole genome, autophagy-related gene sets, autophagy-related lncRNAs and the five lncRNAs from the ALPS model (Figure 7A-D). The results showed that, unlike the other gene sets, the five lncRNAs from the ALPS model tended towards a low-risk distribution. GSEA results showed that the genes enriched in high-risk breast cancer patients were related to positive regulation of the TGF-beta signalling pathway and the p53 signalling pathway (Figure 7E,F).
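PCA on a given feature set (whole genome, autophagy-related genes, autophagy-related lncRNAs or the five ALPS lncRNAs) amounts to projecting centred patient profiles onto the leading eigenvectors of the feature covariance matrix. The sketch below uses power iteration with deflation on a toy matrix; it is illustrative only, and a linear-algebra library (or R's `prcomp`) would be used on real expression data.

```python
import math
import random

def pca_scores(matrix, n_components=2, n_iter=500, seed=0):
    """Project patients (rows) onto the top principal components of the
    features (columns) using power iteration with deflation."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in matrix]  # centre each feature
    cov = [[sum(X[i][a] * X[i][b] for i in range(n)) / (n - 1) for b in range(p)]
           for a in range(p)]
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.random() for _ in range(p)]
        for _ in range(n_iter):
            w = [sum(cov[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(p) for b in range(p))  # eigenvalue
        components.append(v)
        # remove the found component so the next pass finds the next one
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return [[sum(row[j] * c[j] for j in range(p)) for c in components] for row in X]

# Toy matrix: 4 patients x 3 features (hypothetical values)
data = [[5, 1, 0], [-5, -1, 0], [0, 0, 1], [0, 0, -1]]
print(pca_scores(data))
```

Component signs are arbitrary, so plots from different tools may appear mirrored.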
Anti-cancer immunomodulatory pathways were significantly enriched in the low-risk group, including antigen processing and presentation and T cell receptor signalling (Figure 7G,H).

| DISCUSSION
In this study, based on autophagy-related lncRNAs and clinical data from TCGA, we found that an autophagy-related lncRNA prognostic … (Figure S2). For example, FAS and CASP1, which are co-expressed with LINC01871 in this study, are associated with promoting apoptosis. 31,32 We therefore speculate that LINC01871 may be a protective factor in breast cancer.

ACKNOWLEDGEMENTS
We thank The Cancer Genome Atlas (TCGA) network for sharing large amounts of data.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT
All data utilized in this study are included in this article, and all data supporting the findings of this study are available on reasonable request from the corresponding author.