A robust signature of immune‐related long non‐coding RNA to predict the prognosis of bladder cancer

Abstract Background Bladder cancer is the second most common malignant tumor in the urogenital system. The research investigated the prognostic role of immune‐related long non‐coding RNA (lncRNA) in bladder cancer. Methods We extracted 411 bladder cancer samples from The Cancer Genome Atlas database. Single‐sample gene set enrichment analysis was employed to assess the immune cell infiltration of these samples. We recognized differentially expressed lncRNAs between tumors and paracancerous tissues, and differentially expressed lncRNAs between the high and low immune cell infiltration groups. Venn diagram analysis detected differentially expressed lncRNAs that intersected the above groups. LncRNAs with prognostic significance were identified by regression analysis. Multivariate Cox analysis was used to establish the risk score model. Then we established and evaluated the nomogram. Additionally, we performed gene set enrichment analysis to explore the potential functions of the screened lncRNAs in tumor pathogenesis. Results Three hundred and twenty differentially expressed lncRNAs were recognized. We randomly divided patients into the training data set and the testing data set at a 2: 1 ratio. In the training data set, 9 immune‐related lncRNAs with prognostic significance were identified. The risk score model was constructed to classify patients as high‐ and low‐risk cohorts. Patients in the low‐risk cohort had better survival outcomes than those in the high‐risk cohort. The nomogram was established based on the indicators including age, gender, tumor‐node‐metastases stage, and risk score. The model's predictive performance was confirmed by the receiver operating characteristic curve analysis, concordance index method, calibration curve, and decision curve analysis. The testing data set also achieved similar results. Bioinformatics analysis suggested that the 9‐lncRNA signature was involved in the modulation of various immune responses, antigen processing and presentation, and T cell receptor signaling pathway. Conclusions Our study uncovered the prognostic value of immune‐related lncRNAs for bladder cancer and showed that they may regulate tumor pathogenesis in various ways.


| INTRODUCTION
Bladder cancer is the second most common urological malignancy in the world with approximately 573,000 new cases and 213,000 deaths in 2020. 1 It was reported that the prognosis of bladder cancer patients was strictly related to the immune microenvironment of tumor tissues. 2 Accumulated evidence has verified the therapeutic role of immune checkpoint inhibitors in bladder cancer, including atezolizumab, avelumab, durvalumab, nivolumab, and pembrolizumab. 3 A recent research demonstrated that pembrolizumab could prolong the progression-free survival of patients with high RNA-based immune signature scores, 4 which suggested that we might identify immune-related prognostic indicators to improve the prognosis of bladder cancer patients and guide their treatment.
Long non-coding RNAs (lncRNAs) are a group of RNAs that participate in the human physiological and pathological processes by interacting with specific RNAs and proteins. In recent years, it has been discovered that lncRNAs were involved in tumor growth and progression. 5 In bladder cancer, lncRNA plays a vital role in lymphatic metastasis, epithelial-mesenchymal transformation, proliferation, migration, and apoptosis of tumor cells. 6,7 LncRNA SOX2OT could maintain the stemness phenotype of bladder cancer stem cells and serve as an adverse indicator of clinical outcomes and prognosis. 8 Furthermore, the exosomal lncRNA LNMAT2 could stimulate the formation and migration of lymphatic endothelial cells tube, and intensify the cancer lymphangiogenesis and lymphatic metastasis in bladder cancer. 9 Therefore, lncRNA, as a novel biological marker, offers broad prospects for the early diagnosis and prognosis prediction of bladder cancer.
Studies have demonstrated that immune-related ln-cRNAs have a unique value in the prognosis of several cancers. The heterogeneous expression of lncRNAs was identified among different immune-infiltrating groups in muscle-invasive bladder cancer. 10 Shen et al. 11 recognized 11 immune-related lncRNAs as prognostic markers for breast cancer, whose signature was related to the infiltration of immune cell subtypes. Li et al. 12 screened seven immune-related lncRNAs in low-grade glioma and confirmed that these lncRNAs have prognostic value in patients. Cao et al. 13 have screened five immune-related lncRNAs in bladder cancer but the signature's area under the curve (AUC) was relatively low (AUC = 0.666). Tong et al. 14 have raised an epithelialmesenchymal transition-related lncRNA signature in bladder cancer, which has included too many lncRNAs. Therefore, we aimed to propose a novel signature of immune-related lncRNA to predict the prognosis of bladder cancer.
In the study, we analyzed the data set of lncRNAs and corresponding clinical information from the Cancer Genome Atlas (TCGA) and screened for immune-related lncRNAs by single-sample gene set enrichment analysis (ssGSEA). Furthermore, we established a prognostic model based on these lncRNAs and explored their potential biological functions in bladder cancer.

| Bladder cancer sample sources and grouping
Gene expression data (RNA-Seq), lncRNA sequencing data, and corresponding clinical data of bladder cancer were downloaded from the TCGA database (https:// portal.gdc.cancer.gov). Twenty-nine immune cell data sets were applied to evaluate the infiltration level of immune cells through the ssGSEA method (Table S1). 15 After that, patients were classified as the high and low immune cell infiltration groups using the hclust package. The stromal score, immune score, and tumor purity score were calculated by the ESTIMATE algorithm to verify the effectiveness of ssGSEA groupings. 16 In addition, we assessed the difference between the two groups by analyzing the expression of the human leukocyte antigen (HLA) gene. CIBERSORT algorithm was employed to determine the infiltration of various immune cells in the tumor sample and verify the potency of the immune groupings again. 17 was involved in the modulation of various immune responses, antigen processing and presentation, and T cell receptor signaling pathway.

Conclusions:
Our study uncovered the prognostic value of immune-related lncRNAs for bladder cancer and showed that they may regulate tumor pathogenesis in various ways.

K E Y W O R D S
bladder cancer, immune, lncRNA, prognostic model, TCGA 2.2 | Screening of immunerelated lncRNA |log 2 Fold Change (FC)| > 0.5 and p < 0.05 were set as the standard to recognize the differentially expressed lncR-NAs between the high and low immune cell infiltration groups by edgeR package. Differentially expressed lncR-NAs between bladder cancer and adjacent tissues were also identified by the same method. Venn diagram analysis was used to screen out immune-related lncRNAs in bladder cancer from the above two sets.

| Construction of the risk score model
Patients with a follow-up of more than 30 days were randomly divided into the training and testing sets at a ratio of 2:1 (cross-validation method). In the training set, univariate Cox regression was performed on immune-related lncRNAs and clinical data to identify prognosis-related lncRNAs. We conducted the least absolute shrinkage and selection operator (LASSO) regression analysis to screen crucial lncRNAs which were tightly associated with overall survival (OS). Survival analyses were performed on the lncRNAs, respectively, to further screen lncRNAs with prognostic significance. The multivariate Cox regression model was utilized to calculate the respective coefficients (β i ) of selected lncRNAs. Then, a risk score model consisting of β i and lncRNA expression levels (Expi) was established to appraise the risk score of each patient. We set the median risk score as a cutoff value and divided patients into high-risk and low-risk groups. Kaplan-Meier survival analysis was performed to compare the OS between the two groups. Receiver operating characteristic (ROC) curve analysis was utilized to evaluate the predictive efficacy of the model. The risk curve and scatter plot were generated to illustrate the risk score and survival status of each sample. The heatmap showed the expression profiles of the signature in the two groups. The correlation between risk scores and immune infiltration subtypes was analyzed by the Pearson correlation. The testing data set was applied to validate the above results.

| Establishment and evaluation of the nomogram
We evaluated the prognostic significance of risk score and clinical variables such as age, gender, and tumor-nodemetastases (TNM) stage by univariate and multivariate Cox regression analyses. The nomogram was established according to the results of multivariate Cox regression to predict each patient's 3-and 5-year OS. We conducted the ROC curve analysis, concordance index (C-index) method, calibration curve method, and decision curve analysis (DCA) to assess the model's accuracy. Finally, the testing set data were used to evaluate the above results.

| Gene set enrichment analysis
We executed Gene Ontology (GO) enrichment analysis and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis to investigate the potential pathways in which the immune-related lncRNAs may participate.

| Statistical analysis
All statistical analysis was accomplished by R version 3.6.2 (Institute for Statistics and Mathematics, Vienna, Austria; https://www.r-proje ct.org). The correlation was determined by Pearson correlation analysis. Chi-square test and t-test were utilized to compare clinical variables. Survival status was assessed by the Cox regression analysis. OS was generated by the Kaplan-Meier method and evaluated by the log-rank test. Two-tailed p < 0.05 was considered statistically significant.

| Grouping and identification of bladder cancer samples
The flowchart of our research is shown in Figure 1. Information on 411 bladder cancer tissues and 19 adjacent tissues were obtained from the TCGA database. Three hundred and ninety-three patients with a follow-up of more than 30 days were included in the study (Table S2). The transcriptome data of bladder cancer samples were analyzed by the ssGSEA method to assess the level of immune cell infiltration level. An unsupervised hierarchical clustering algorithm was employed to divide patients into the high immune cell infiltration group (n = 85) and the low immune cell infiltration group (n = 326) ( Figure 2A). The ESTIMATE algorithm was used to calculate the ESTIMATE score, immune score, stromal score, and tumor purity of all samples. Compared with the low immune cell infiltration group, the high immune cell infiltration group presented higher ESTIMATE score, higher immune score, higher stromal score, and lower tumor purity (p < 0.001) ( Figure 2B-E). The expression of HLA family genes in the high immune cell infiltration group was higher than that in the low immune cell infiltration group (p < 0.001) ( Figure 2F). In addition, the CIBERSORT method revealed that the high immune cell infiltration group had a higher density of immune cells ( Figure 2G). Overall, our results indicated that the bladder cancer grouping was feasible.

| Identification of the immunerelated lncRNAs
We recognized 2067 differentially expressed lncRNAs between tumors and adjacent tissues ( Figure 3A) and 1076 differentially expressed lncRNAs between the high and low immune cell infiltration groups ( Figure 3B). The Venn diagram analysis detected 320 differentially expressed lncRNAs that intersected the above groups ( Figure 3C). Taking together, we screened 320 immune-related lncR-NAs in bladder cancer.

| Construction and assessment of the risk score model
In the training data set, univariate Cox regression was performed on immune-related lncRNAs to identify 38 prognosis-associated lncRNAs ( Figure 4A). LASSO  Figure S1). β i was calculated (Table 1) to establish the risk score model: Risk score = ∑ 9 i = 1 ( i * exp i ). We set the median risk score as a cutoff value and divided 411 patients into high-risk and low-risk groups. The Kaplan-Meier curve disclosed that the OS in the low-risk group was significantly better than that in the high-risk group (p = 7.542e-05) ( Figure 5A). The risk curve and scatter plot indicated that the risk coefficient and mortality of patients in the high-risk group were F I G U R E 2 Establishment and verification of bladder cancer grouping. (A) The heatmap showed the unsupervised clustering of 29 immune cells in the high immune cell infiltration group (Immunity_H) and the low immune cell infiltration group (Immunity_L). Parameters including the tumor purity, ESTIMATE score, immune score, and stromal score were also displayed. (B-E) The box plots revealed the statistical differences in tumor purity, ESTIMATE score, immune score, and stromal score between the high and the low immune cell infiltration groups. (F) The expression of human leukocyte antigen (HLA) family genes in the high immune cell infiltration group was higher than that in the low immune cell infiltration group. (G) The CIBERSORT method demonstrated that a higher density of immune cells was found in the high immune cell infiltration group compared to the low immune cell infiltration group. *p < 0.05; **p < 0.01; ***p < 0.001 higher than those in the low-risk group ( Figure 5B,C). The heatmap exhibited the expression profiles of the 9-lncRNAs signature in the two groups ( Figure 5D). The correlation status of B cells, CD4 + T cells, CD8 + T cells, dendritic cells, macrophages, and neutrophils with risk score were plotted in Figure S2. The correlation values (infiltration status) of B cells, CD4 + T cells, CD8 + T cells, dendritic cells, macrophages, and neutrophils with risk score were −0.157, −0.200, 0.167, −0.005, 0.428, and −0.046, respectively, in the training data set. And the correlation values of B cells, CD4 + T cells, CD8 + T cells, dendritic cells, macrophages, and neutrophils with risk score were −0.186, −0.009, 0.106, −0.061, 0.239, and 0.036, respectively, in the testing data set. Similar results were obtained using the same method on the testing data set ( Figure 5E-H).

| Construction and evaluation of the prognostic model
Univariate Cox regression showed that the risk score and clinical indicators including age, gender, and TNM stage were firmly related to OS ( Figure 6A). We further conducted the multivariate Cox analysis and found that the 9-lncRNAs signature was an independent prognostic factor for bladder cancer (p < 0.001) ( Figure 6B). ROC curve analysis validated the predictive performance of the signature ( Figure 6C). We then established a nomogram including age, gender, TNM stage, and risk score ( Figure 7A). The AUCs for 3-, 5-year OS predicted by the model were 0.784 and 0.790, respectively ( Figure 7B). The C-index of the nomogram was 0.751. The calibration curves and DCAs of the prognostic model showed that the model had

| Gene set enrichment analysis
We performed GO enrichment analysis and KEGG pathway analysis on the differentially expressed genes between the high-risk and low-risk groups. GO enrichment analysis indicated that the genes were enriched in the ephrin receptor signaling pathway, epidermal growth factor receptor (EGFR) signaling pathway, ERBB signaling pathway, mRNA splicing site selection, DNA adenosine diphosphate (ADP) ribosyltransferase activity, and T cell selection ( Figure 8A). KEGG pathway analysis showed that these genes were involved in amino sugar and nucleotide sugar metabolism, antigen processing and presentation, extracellular matrix (ECM) receptor interaction, focal adhesion, primary immunodeficiency, and T cell receptor signaling pathway ( Figure 8B).

| DISCUSSION
Bladder cancer has the characteristics of a high recurrence rate and poor prognosis. 18 Accurately predicting the prognosis of bladder cancer patients is of great importance for guiding their treatment. Prognostic models based on immune-related genes have been developed and proved to have excellent predictive efficacy in bladder cancer patients. 19,20 Using lncRNAs to construct a prognostic model may be an important supplement to predict the prognosis of bladder cancer.
We analyzed the lncRNAs data set from the TCGA database and screened 320 immune-related lncRNAs. Nine immune-related lncRNAs with prognostic significance were ultimately identified. Furthermore, our findings indicated that AL136084.3 was an adverse prognostic factor for bladder cancer, whereas the other lncRNAs were favorable prognostic factors. Multivariate Cox analysis was used to construct the risk score model. We found that patients in the low-risk group had longer OS than that of the high-risk group. In addition, we noticed that the infiltration of B cells was significantly negatively correlated with the prognosis of bladder cancer, while that of macrophages was on the contrary. B cells have long been recognized as effector cells of humoral immunity that inhibited tumor progression through secreting immunoglobulins, activating T cells, and killing tumor cells directly. 24 A high density of CD20 + B cells was independently correlated with a prolonged time to recurrence in bladder cancer. 25 Moreover, macrophages could contribute to tumor progression by accelerating genetic instability, promoting metastasis, nurturing cancer stem cells, and taming protective immunity. 26 In bladder cancer, a higher expression of M2 macrophage is associated with a worse grade and stage of the tumor. 27 Subsequently, we established a nomogram including age, gender, TNM stage, and risk score. The ROC curve analysis, C-index, calibration curves, and DCA confirmed the model's predictive power. Compared to models based on other sequencing data, the prognostic model constructed by immune-related lncRNAs presented better efficacy according to the ROC curve method. [28][29][30][31] Furthermore, we performed GO enrichment analysis and KEGG pathway analysis to explore the potential functions of the 9-lncRNAs signature in bladder cancer. The results showed that these lncRNAs were involved in various immune responses, antigen processing, and presentation, T cell receptor signaling pathway, EGFR signaling pathway, ERBB signaling pathway, ECM receptor interaction, focal adhesion, and primary immunodeficiency. Epidermal growth factor was reported to activate the androgen receptor and increase the expression of TRIP13 to promote bladder cancer progression. 32,33 Notably, ECM modification could not only promote tumor cells to escape, but also help generate and maintain the cancer stem cell niche. 34 Moreover, high infiltration of memory-activated CD4 + T cell subsets were associated with prolonged OS and reduced risk of tumor recurrence in bladder cancer. 35 Chobrutskiy et al. 36 demonstrated that lower CDR3 region isoelectric point in T cell receptor was associated with better survival outcomes in bladder cancer.
However, there are some limitations to our research. It is a retrospective study whose data were obtained from the TCGA database that lacked information including treatment and recurrence records. In vivo or in vitro experiments and prospective clinical researches are needed to validate our conclusions.

| CONCLUSION
In summary, the present study identified a 9-lncRNAs signature that possessed prognostic value for bladder cancer patients. The immune-related lncRNAs may regulate tumor pathogenesis through the modulation of various immune responses, antigen processing and presentation, and T cell receptor signaling pathway. Our research proposes a predictive model and biomarkers for bladder cancer patients.

ACKNOWLEDGMENTS
We thank the patients and investigators who participated in the TCGA database for providing data. We thank the ResearchSquare for providing a prepublication platform to communicate with other researchers. This work was funded by grants from the National Natural Science Foundation of China (grant numbers: 81572511, 81702525, 81702528), Guangzhou Science and Technology Program key projects (grant number: 201803010029), Natural Science Foundation of Guangdong Province (grant numbers: 2016A030313317), Medical Scientific Research Foundation of Guangdong Province (grant number: C2018060), and Yixian Clinical F I G U R E 8 Gene set enrichment analysis on the differentially expressed genes between the high-risk and low-risk groups. (A) Gene Ontology (GO) enrichment analysis indicated that the genes were enriched in the ephrin receptor signaling pathway, epidermal growth factor receptor signaling pathway, ERBB signaling pathway, mRNA splice site selection, DNA ADP ribosyltransferase activity, and T cell selection. (B) Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis showed that these genes were involved in amino sugar and nucleotide sugar metabolism, antigen processing and presentation, extracellular matrix receptor interaction, focal adhesion, primary immunodeficiency, and T cell receptor signaling pathway Research Project of Sun Yat-sen Memorial Hospital (grant number: sys-c-201802).