A novel prognostic model associated with the overall survival in patients with breast cancer based on lipid metabolism‐related long noncoding RNAs

Abstract

Background: Lipid metabolism is closely related to the occurrence and development of breast cancer. Our purpose was to establish a novel model based on lipid metabolism‐related long noncoding RNAs (lncRNAs) and to evaluate its potential clinical value in predicting prognosis for patients with breast cancer.

Methods: RNA data and clinical information for breast cancer were obtained from The Cancer Genome Atlas (TCGA) database. Lipid metabolism‐related lncRNAs were identified using the criteria of correlation coefficient |R²| > 0.4 and p < 0.001, and prognostic lncRNAs were identified through Cox regression analysis to establish the model. A training set and a validation set were established to verify feasibility, and all samples were separated into a high‐risk group or a low‐risk group. Gene Ontology (GO) and Gene Set Enrichment Analysis (GSEA) were conducted to evaluate the potential biological functions, and immune infiltration levels were explored through the CIBERSORTx database.

Results: A total of 24 prognostic lncRNAs were selected to construct the model. Of these, 14 lncRNAs were identified as protective genes (AC022150.4, AC061992.1, AC090948.3, AC092794.1, AC107464.3, AL021707.8, AL451085.2, AL606834.2, FLJ42351, LINC00926, LINC01871, TNFRSF14−AS1, U73166.1 and USP30−AS1) with HRs < 1, while 10 lncRNAs (AC022150.2, AC090948.1, AC243960.1, AL021707.6, ITGB2−AS1, OTUD6B−AS1, SP2−AS1, TOLLIP−AS1, Z68871.1 and ZNF337−AS1) were associated with increased risk with HRs > 1. Patients in the low‐risk group had a better prognosis in both the training set (p < 0.001) and the validation set (p < 0.001). Univariate and multivariate Cox regression analyses revealed that the risk score was an independent prognostic factor in both the training set (p < 0.001) and the validation set (p < 0.001). GO and GSEA analyses revealed that these lncRNAs were related to metabolism‐related signaling pathways and immune cell signaling pathways. The risk score was negatively correlated with B cells (r = −0.097, p = 0.002), NK cells (r = −0.097, p = 0.002), plasma cells (r = −0.111, p = 3.329e‐04), CD4 T cells (r = −0.064, p = 0.039) and CD8 T cells (r = −0.322, p = 2.357e‐26), and positively correlated with dendritic cells (r = 0.077, p = 0.013) and monocytes (r = 0.228, p = 1.107e‐13).

Conclusion: The prognostic model based on lipid metabolism lncRNAs possessed an important value in survival prediction of breast cancer patients.


| INTRODUCTION
As the most commonly diagnosed cancer in women, breast cancer may occur in one in eight women during their lifetimes. 1,2 Although cancer treatment has improved significantly in recent decades, its mortality remains high, accounting for approximately 6.4% of mortality. 3 In recent decades, metabolic changes have been widely observed in a variety of cancer cells. 4 Because nutrient availability in the tumor microenvironment changes constantly, cancer cells rely on lipid metabolism to maintain rapid proliferation, survival, migration, invasion and metastasis. 5 Lipid accumulation is recognized as a signature of cancers, 6 and reducing lipid accumulation could suppress tumor growth. 7 Epidemiological studies have also shown that fatty acid synthase, which plays a vital role in lipid metabolism, is associated with the molecular subtypes and prognosis of breast cancer. [8][9][10]

Long noncoding RNAs (lncRNAs) are defined as RNAs more than 200 nucleotides in length that lack the capacity to encode protein. LncRNAs participate in many significant biological processes and are closely related to breast cancer diagnosis and prognosis. 11,12 However, the mechanism of lncRNAs in transcription is still poorly understood. Our analysis was conducted to determine whether lipid metabolism-related lncRNAs could accurately predict prognosis in breast cancer.

| Gene expression and clinical information of breast cancer patients
The RNA-seq data and corresponding clinical information of 1053 breast cancer tissues and 111 normal tissues were downloaded from the TCGA database (http://www.cancergenome.nih.gov/). Only data with complete clinical information were retained.

| Constructing prognostic model
All samples were randomly separated into a training set and a validation set. The predictive prognostic model was constructed following a previously published approach, 13 in which a risk score is calculated for each sample from the prognostic lncRNAs. All patients were then separated into two groups based on the risk score. Kaplan-Meier curves, survival status plots and risk score distribution plots were drawn to compare survival between the groups.
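The formula for the risk score is not reproduced in this section. As a minimal sketch, assuming the commonly used linear form in which each sample's score is the sum of each lncRNA's Cox regression coefficient multiplied by its expression level, and assuming a median cutoff for the two groups (neither detail is stated here), the calculation could look as follows. The lncRNA names, coefficients and expression values are hypothetical.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Sum of (coefficient * expression) over the lncRNAs in the model."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def assign_groups(scores, cutoff=None):
    """Label each sample 'high' or 'low'; the median cutoff is an assumption."""
    if cutoff is None:
        cutoff = median(scores.values())
    return {s: ("high" if v > cutoff else "low") for s, v in scores.items()}


# Hypothetical coefficients and expression values, for illustration only.
coefs = {"lncA": 0.5, "lncB": -0.25}
samples = {
    "P1": {"lncA": 2.0, "lncB": 4.0},
    "P2": {"lncA": 4.0, "lncB": 0.0},
    "P3": {"lncA": 0.0, "lncB": 4.0},
}
scores = {sid: risk_score(expr, coefs) for sid, expr in samples.items()}
groups = assign_groups(scores)
```

A negative coefficient (HR < 1) lowers the score as expression rises, which matches the notion of a protective lncRNA.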

| Evaluating signature of clinicopathological variables
Clinicopathological variables (primary tumor status, lymph node status, age and stage) are associated with the prognosis of breast cancer. These variables and the risk score of the prognostic model were compared through univariate and multivariate Cox analyses. Receiver operating characteristic (ROC) curves were drawn to evaluate the accuracy of our model in predicting patient prognosis.
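To illustrate what the area under the ROC curve (AUC) measures, the sketch below computes it for a binary outcome as the probability that a randomly chosen patient with an event has a higher risk score than a randomly chosen patient without one. This is a simplification: it ignores follow-up time and censoring, and it is not necessarily the procedure used in this study. The scores and outcomes are hypothetical.

```python
def roc_auc(scores, events):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    Ties between an event and a non-event score count as half a correct pair.
    """
    pos = [s for s, e in zip(scores, events) if e == 1]
    neg = [s for s, e in zip(scores, events) if e == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical risk scores and outcomes (1 = event, 0 = no event).
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
example_events = [1, 0, 1, 0, 0]
auc = roc_auc(example_scores, example_events)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two outcome classes.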

| Gene Ontology and GSEA analyses
The "limma" package was used to identify the differentially expressed genes with the cut-off criteria of false discovery rate (FDR) < 0.05 and |fold change (FC)| > 2. Differentially expressed lncRNAs were identified to perform Gene Ontology (GO) analysis. GSEA was performed 1000 times to explore the potential functions by using "c2.cp.kegg.v7.2.symbols.gmt" as the gene sets database. The p value and normalized enrichment score (NES) were applied to evaluate the potential pathways.

KEYWORDS
bioinformatic analysis, biomarkers, breast cancer, lipid metabolism, lncRNA

| Evaluating the tumor-infiltrating immune cells
The

| Identification of lipid metabolism-related lncRNAs and prognostic genes
A total of 14,142 lncRNAs were included in the TCGA database, and 728 lipid metabolism-related lncRNAs met the selection criteria (|R²| > 0.4 and p < 0.001). Among the 1053 breast cancer samples in the TCGA database, 77 prognostic lncRNAs associated with overall survival (p < 0.05, Figure 1A) were identified. These were narrowed down to 24 prognostic lncRNAs via the step function. Among the 24 lncRNAs, 14 were associated with better outcome, while 10 were associated with worse outcome (Figure 1B). A co-expression network is shown in Figure 1C.
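The correlation filter can be sketched as below. This is an illustration under stated assumptions, not the study's code: it uses the Pearson coefficient, applies the 0.4 threshold to |r| (whether the criterion refers to r or its square is ambiguous here), and approximates the two-sided p value with the Fisher z-transform rather than an exact t-test. The function names and the toy expression vectors are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p value from the Fisher z-transform (normal approximation)."""
    if abs(r) >= 1.0:
        return 0.0
    z = atanh(r) * sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def is_related(lnc_expr, gene_expr, r_cut=0.4, p_cut=0.001):
    """True when the lncRNA passes both the correlation and p value thresholds."""
    r = pearson_r(lnc_expr, gene_expr)
    p = approx_p_value(r, len(lnc_expr))
    return abs(r) > r_cut and p < p_cut


# Toy expression vectors across 30 hypothetical samples.
gene = [float(v) for v in range(30)]
lnc_correlated = [2.0 * v + (v % 3) for v in range(30)]
lnc_unrelated = [1.0 if v % 2 == 0 else -1.0 for v in range(30)]
```

An lncRNA would be kept if it passes the filter against at least one lipid metabolism gene; that aggregation rule is also an assumption.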

| Constructing prognostic model
All samples were randomly divided into a training set and a validation set at a 3:2 ratio. The characteristics of the training and validation sets are provided in Appendix S1. Each prognostic gene was assigned a score, and the risk score of each sample was calculated via the formula. According to the risk score, each sample was then assigned to the high-risk group or the low-risk group. High-risk patients had a worse prognosis in both the training set (p < 0.001, Figure 2A) and the validation set (p < 0.001, Figure 3A). Univariate and multivariate regression, used to assess whether the risk score was an independent prognostic factor for breast cancer, revealed that the risk score (p < 0.001) was an independent prognostic factor in both the training set (Figure 2D,E) and the validation set (Figure 3D,E). Multi-parameter ROC curves revealed that the AUC values for the risk score in the training set (Figure 2F) and the validation set (Figure 3F) were 0.834 and 0.962, respectively.
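A random 3:2 split can be sketched as follows. The fixed seed, the rounding rule and the sample identifiers are assumptions made for reproducibility of the illustration; the study does not state how the random assignment was implemented.

```python
import random


def split_samples(sample_ids, train_fraction=0.6, seed=0):
    """Shuffle sample IDs and split them into training and validation lists."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # local generator, global state untouched
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# Ten hypothetical sample identifiers split at 3:2.
all_ids = [f"S{i}" for i in range(10)]
train_ids, valid_ids = split_samples(all_ids)
```

Fixing the seed makes the partition repeatable, so the same training and validation sets can be regenerated later.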

| GO and GSEA analyses
The enrichment analysis of GO revealed that these lncRNAs were re-

| The infiltrating status of immune cells
We found that the risk score was negatively correlated with B cells

| DISCUSSION
In this study, a novel prognostic model was identified based on lipid metabolism-related genes. First, 77 prognostic lncRNAs were identified and narrowed down to 24 genes via the step function.
The risk score was calculated to divide each sample into high-risk [...] which leads to the proliferation of CD8+ T cells via enhancing T-cell receptor aggregation and signal transduction. 29 In our results, the risk score was clearly negatively correlated with CD8+ T cells. GSEA also showed down-regulation of the T-cell receptor signaling pathway in relation to the risk score, which may help provide a better understanding of immune cell functions in the lipid metabolism signaling pathway.
There are several limitations in our study. All breast cancer information was obtained from the TCGA database, and the patients were primarily Americans. The findings require confirmation with additional evidence from breast cancer patients in other regions. Bias is inevitable in this study because the validation set was also drawn from the TCGA database.

| CONCLUSION
In summary, a novel prognostic model that could predict the prognosis of breast cancer patients based on 24 lipid metabolism-related lncRNAs was identified. This prognostic model may not only inform the understanding of breast cancer occurrence but could also provide evidence of the response to immunotherapy.

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
All data analyzed in this study can be obtained from the TCGA and CIBERSORTx databases.