Five key lncRNAs considered as prognostic targets for predicting pancreatic ductal adenocarcinoma

Abstract Pancreatic ductal adenocarcinoma (PDAC) has a poor prognosis, and the 5‐year survival rate was only 7.7%. To improve prognosis, a screening biomarker for early diagnosis of pancreatic cancer is in urgent need. Long non‐coding RNA (lncRNA) expression profiles as potential cancer prognostic biomarkers play critical roles in development of tumorigenesis and metastasis of cancer. However, lncRNA signatures in predicting the survival of a patient with PDAC remain unknown. In the current study, we try to identify potential lncRNA biomarkers and their prognostic values in PDAC. LncRNAs expression profiles and corresponding clinical information for 182 cases with PDAC were acquired from The Cancer Genome Atlas (TCGA). A total of 14 470 lncRNA were identified in the cohort, and 175 PDAC patients had clinical variables. We obtained 108 differential expressed lncRNA via R packages. Univariate and multivariate Cox proportional hazards regression, lasso regression was performed to screen the potential prognostic lncRNA. Five lncRNAs have been recognized to significantly correlate with OS. We established a linear prognostic model of five lncRNA (C9orf139, MIR600HG, RP5‐965G21.4, RP11‐436K8.1, and CTC‐327F10.4) and divided patients into high‐ and low‐risk group according to the prognostic index. The five lncRNAs played independent prognostic biomarkers of OS of PDAC patients and the AUC of the ROC curve for the five lncRNAs signatures prediction 5‐year survival was 0.742. In addition, targeted genes of MIR600HG, C9orf139, and CTC‐327F10.4 were explored and functional enrichment was also conducted. These results suggested that this five‐lncRNAs signature could act as potential prognostic biomarkers in the prediction of PDAC patient's survival.


| INTRODUCTION
Pancreatic ductal adenocarcinoma (PDAC) remains the most common cause of cancer-related mortality worldwide, with a 5-year survival rate of 7.7% and average survival time of fewer than 6 months. 1 PDAC effects 56 670 new patients each year in the United States, and is the fourth leading cause of cancer death in the United States. 2,3 The poor survival rates FIGURE 1 The heatmap of DElncRNAs expression in PDCA. 108 DElncRNAs were detected. Among these DElncRNAs, 3 DElncRNAs was upregulated genes and 105 DElncRNAs were down-regulated genes. The color from blue to red shows a trend from low expression to high expression were due to inability to diagnosed PDAC at an early stage and to the poorly effective therapeutic methods currently available. 4 In addition, pancreatic cancer is largely resistant to radiotherapy and chemotherapy, and little progress has been made concerning its treatment in past decades. Therefore, to decrease mortality and improve the management of PDAC, detection and risk stratification of PDAC is urgent to identify new early diagnostic biomarkers and therapeutic targets.
Long non-coding RNA (lncRNA) is defined as RNA transcript of ≥200 bp with little or no protein-coding capacity. [5][6][7][8][9] At present, more and more evidence demonstrated that lncRNA played an important molecular role in apoptosis, proliferation, progression, metastasis, invasion, and relapse of the various tumor. [10][11][12][13][14] Recent studies indicated that the dysregulated lncRNA expression profiles were associated with the development and survival in patients with various cancers, including PDCA, which uncovers the potential of lncRNA as prognostic cancer biomarker. [15][16][17][18][19][20] In an attempt to improve prognosis, a molecular screening biomarker at an early stage of pancreatic cancer is in urgent need. In this study, we aimed to explore the difference of lncRNA expression profiles between PDAC and adjacent pancreas aiming at identifying some potential lncRNA biomarkers using The Cancer Genome Atlas data (TCGA), which could help us to predict the prognosis of patients with PDAC. Moreover, these results could provide new insight into the molecular mechanism based on lncRNA for PDAC.

| Patient datasets
The mRNA expression and corresponding clinical information of PDAC patients were obtained from TCGA data portal (https://tcga-data.nci.nih.gov/tcga/), which was imputed on IlluminaHiSeq RNA-Seq platform, containing 178 PDAC tissues and four adjacent non-tumor pancreatic tissues. Both mRNA profiles data and clinical characteristics of PDAC are publicly available and open-access. Therefore, approval by a local ethics committee was not needed.

| lncRNA differential expression profiles
Firstly, the raw counts of PDAC mRNA expression profiles (level 3 data) were downloaded from the TCGA databases, and we acquired the lncRNA expression data by repurposing the probes in the mRNA expression profiles to lncRNA based on the annotation from the GENCODE project (http://www. gencodegenes.org). 21 The transformed data (antisense, lincRNA, and sense_intronic) was considered as lncRNA. The RNA-Seq data of PDCA covered 14,470 lncRNA expression profiles. Next, the differentially expressed lncRNA profiles were imputed using R/Bioconductor package of edgeR. 22 The differentially expressed genes (DEGs) of data set with |log2 fold change| ≥ 1 and P-value less than 0.05 was considered selection criteria for subsequent analysis.

ROC curve
A univariate Cox model was used to calculate the association between the expression level of each lncRNA and patient's overall survival (OS). When the P-values were less than 0.05, those lncRNAs were considered to be statistically significant in univariate Cox analysis. Next, multivariate Cox analysis was employed to evaluate the contribution of lncRNA as independent prognosis factors of patient survival. The backward stepwise method was conducted to further select the best model. Then, the selected lncRNAs were screened and confirmed by the Lasso regression. The lncRNA based prognosis risk score was established based on a linear combination of the expression level multiplied regression model (β) with the following formula. The Prognosis Index = (β* expression level of C9orf139) + (β* expression level of MIR600HG) + (β* expression level of RP5- Based on the cut-off of the median PI, PDAC patients were divided into high-and low-risk groups. The Kaplan-Meier survival curves for the cases predicted to have low or high risk were produced. To further validate whether the prediction of the five-lncRNA biomarkers was independent of other clinical variables, univariate and multivariate Cox regression, stratified analyses were conducted. The prognostic performance was evaluated using time-dependent receive operating characteristic (ROC) curves within 5 years by comparing the sensitivity and specificity of the survival prediction based on the risk score. All reported P values were two-sided. All analyses were performed using R/BioConductor (version 3.3.2).

| Weighted co-expression network construction with WGCNA and target prediction
We analyzed the incorporated network using weighted gene co-expression network analysis (WGCNA), which can enable to describe the correlation patterns gene expression profiles. The WGCNA R package was employed to evaluate the significance of the five lncRNA and their module membership. We assessed the weighted co-expression relation between all data set subjects in an adjacency matrix using the pairwise Pearson correlation. The appropriate soft threshold power was automatically calculated and generated as described for the standard scale-free network. In the study, the soft threshold was set at β = 7 (scale free R 2 = 0.85). Following the identification of weighted correlation, characteristics of the network were presented by Cytoscape 3.4.0. We also predicted the target genes of five lncRNA by the mRNA and lncRNA network.

| Functional enrichment analysis
The target genes of lncRNAs were selected from weighted coexpression network. The enrichment analysis of those coexpression protein-coding genes was conducted using Cytoscape plug-in ClueGO and DAVID Bioinformatics Tool (version 6.7, https://david.nciferf.gov/). 23 Go enrichment analysis was based on the threshold of P-value < 0.05 and enrichment score >1.0. Significant enrichment results were also visualized using Cytoscape software.

| Validation of the differentially expressed lncRNA with GEO data
To verify the differentially expressed lncRNA from TCGA database, we attempt to screen the mRNA-Sep datasets of PDCA from GEO database. To identify eligible studies, we employed the following search strategies: "pancreatic ductal adenocarcinoma" or "PDCA" or "pancreatic cancers." The lncRNA expression level was also extracted for further analysis. The differentially expressed genes were also imputed using R package of Limma.

| Differentially expression lncRNA profiles in PDAC
The lncRNA expression profiles (level 3 data) in PDAC patients tissues (n = 178) compared with adjacent non-tumor tissues (n = 4) was obtained from the TCGA database. A total of 109 differentially expression lncRNA was identified. Among these differentially expressed lncRNA, three lncRNAs were over-expressed, and 106 lncRNAs were down-expressed. Of this over-expressed lncRNAs, three  CTD-2527I21.15; CH507-513H4.5; FAM53B-AS1; and RP11-430C7.5) was significantly associated with OS when the P-value was less than 0.05. The multivariate Cox proportional regression was applied to confirm the results above, and we found that the five lncRNA (C9orf139, MIR600HG, RP5-965G21.4, RP11-436K8.1, and CTC-327F10.4) were proved to be an independent prognostic indicator for PDAC (Table 1). Next, we employed the Lasso regression to verify further variables, and the identical variables were observed (  (Figure 4). Kaplan-Meier curves for the high-and low-risk groups are shown in Figure 5. Patients with high-risk score showed poorer OS than patients with those who have low risk score (median OS of 15.8 months vs 14.2 months). The HR of the risk score produced by the univariate Cox proportional hazards regression method was 2.11 (95%CI, 1.37-3.24) and multivariate Cox proportional hazards regression method also show the consistent result (HR = 1.91, 95%CI: 1.22-2.98) adjusted for the clinical covariate. We employed time-dependent ROC curves to evaluate the prognostic power of five-lncRNAs biomarkers. The AUC for the six-lncRNA biomarkers prognostic model was 0.727 at 5 years of OS ( Figure 6).
Subsequently, the univariate Cox regression between clinical features and PDAC was performed to confirm the prognostic significance of the clinical characteristics. After analysis, clinical covariates of N stage and Person neoplasm cancer status, Primary therapy outcome success were significantly correlated with OS. However, clinical covariates of gender, tumor grade, T stage, M stage, pathologic disease stage, cancer status, new tumor event after initial treatment, and primary therapy outcome success were not correlated with OS (Table 2).
In the meantime, the prognostic value of different clinicopathological characteristics was also examined. The K-M curves revealed that tumor status and grade classification, new tumor event after initial treatment, primary therapy outcome success could manifest the outcome between highrisk and low-risk groups (Figure 7). Since patients with the early tumor stage may benefit significantly from a prognostic biomarker signature, we also assess the prognostic power of the five-lncRNAs in stage I and II PDAC tumors (n = 160). The prediction also demonstrated good performance on early tumors (AUC = 0.700, Figure 8). the target of lncRNA biomarkers. It revealed enrichment of 301 gene ontology categories. The enrichment biological process was shown in Figure 10. A total of 21 KEGG pathways were enriched by the target genes ( Figure 11), including Primary immunodeficiency, natural killer cell mediated cytotoxicity, Rap1 signaling pathway, and regulation of actin cytoskeleton.

| Validation of the five lncRNA expression with GEO database
Five studies were considered eligible from GEO, including GSE15471, GSE28735, GSE32676, GSE41368, and GSE71989. However, only two key lncRNA expression (C9orf139 and MIR600HG) could find in two datasets (GSE28735 and GSE41368).

| DISCUSSION
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most aggressive and lethal commonly diagnosed cancers worldwide by complex molecular and cellular heterogeneity. 24 In the past, great efforts have been made to provide novel insights into the molecular mechanisms underlying PDCA, but the focus has been on protein-coding genes or miRNA. [25][26][27][28][29] PDAC is the most common form of pancreatic cancer and its incidence is rising every year. 30

FIGURE 9
The network of lncRNA MIR600HG (A), C9orf139 (B), and CTC-327F10.4 (C) with co-expression genes by WGCNA  15,16,[31][32][33] However, none specific biomarkers have been found to display the therapeutic efficiency, and prognosis factor is of great importance for the treatment of PDAC patients.
Recently several studies have reported that long noncoding RNA as early prediction and diagnostic biomarkers of PDAC had been identified. Experimental evidence revealed that the upregulated MIR31HG acts as an oncogenic lncRNA that promotes tumor progression. 34 Based on the microarray data, Li et al identified 5250 differentially expressed lncRNA in PDAC, including 1881 upregulated lncRNAs and downregulated 3369 lncRNAs. The high expression level of lncRNA BC008363 had significantly better survival rates. 35 A more recent study reported that the lncRNA HOTTIP/ HOXA13 played a potential therapeutic target and molecular biomarker for PDAC. 36 Therefore, in an attempt to decrease mortality and improve prognosis of PDCA, a molecular screening biomarker at an early stage of pancreatic cancer is in urgent need. In this study, we aimed to explore the FIGURE 10 GO terms shows as an interaction network using Cytoscape plug-in ClueGO SONG ET AL. difference of genes expression between PDAC patients and adjacent non-tumor pancreatic tissues aiming at identifying some potential lncRNA biomarkers using a database of The Cancer Genome Atlas. The differentially expressed lncRNAs were screened, and we conducted univariate and multivariate Cox analysis to verify these lncRNAs for prediction of PDAC prognosis further. Lastly, we have recognized five lncRNA consisting of C9orf139, MIR600HG, RP5-965G21.4, RP11-436K8.1, and CTC-327F10.4. It was then validated to be an independent prognosis predictor for patients with PDAC. The AUC of the ROC curve for the five lncRNA signature predicting 3-year survival was 0.742. The five lncRNA signatures have a good performance for survival prediction. The result also demonstrated that the five lncRNA was independent of other clinical factors in PDAC. Among these lncRNAs, low expression (CTD-2527I21.15 and CH507-513H4.5) and overexpression of lncRNA (MIR600HG and CC9orf139, RP11-489O18.1) were associated with poor prognosis in patients with PDAC. Although the target gene of these lncRNAs has not been previously studied in cancers, we figured out that these lncRNAs may be involved PDAC tumorigenesis and many studies are needed to validate the findings in the future. Through Weighted Co-expression network using WGCNA, we found that lncRNA MIR600HG, CTC-327F10.4, and C9orf139 are correlated with 333 protein-coding genes. Using the enrichment and functional analysis of Cytoscape plug-in ClueGO and DAVID Bioinformatics Tool, we found that the Gene ontology of targeted genes were involved in immunological synapse formation, regulation of cell morphogenesis, and lymphocyte activation involved in immune response. Aberrant activation of Primary immunodeficiency, natural killer cell mediated cytotoxicity, Rap1 signaling pathway, and regulation of actin cytoskeleton pathway may inhibit tumor cells growth and proliferation, progression. To our knowledge, the five-lncRNA biomarkers have not been previously studied, and further understanding the function of the five lncRNAs will help the clinician to early diagnose the patients and bring some clinical indications in the development of novel prognostic factors in PDAC.
Several limitations of the current study should be considered. First, the prognostic power of five lncRNAs signature was only analyzed and validated in the TCGA data set, and no other PDAC lncRNAs expression profile can be used for further validation. Second, the TCGA data was from a single central source and ethics of population in TCGA database was mainly confined to white and black, the findings in the work cannot extrapolated to other ethics. Third, no experimental data on the potential mechanisms of the lncRNA have been reported, and further experimental studies on these aspects can enhance our understanding of the functional role in PDAC.
In this work, our findings demonstrated that the five lncRNAs molecular biomarkers were identified to predict 5-year survival in patients with PDAC, which could become a novel prognostic indicator for predicting the clinical outcome. However, the biological function of these five lncRNAs needs to be further validated with more experiment.