Long noncoding RNAs predict the survival of patients with colorectal cancer as revealed by constructing an endogenous RNA network using bioinformation analysis

Abstract Long noncoding RNAs (lncRNAs) are aberrantly expressed in various cancers types and can function as competing endogenous RNAs (ceRNAs), which promote and maintain tumor initiation and progression. In this study, we explored the functional roles and regulatory mechanisms of lncRNAs as ceRNAs in colorectal cancer and their clinical potential as biomarkers. The RNA sequencing profiles of patients with colorectal cancer were downloaded from TCGA database, and 62 lncRNAs, 30miRNAs, and 59 mRNAs were identified to comprise the ceRNA network (fold change > 2, P < 0.01). Functional enrichment analysis suggested that the target genes of the ceRNA network may be involved in the pathways related to cancer, including the signaling pathway that regulates the pluripotency of stem cells, wnt signaling pathway, hippo signaling pathway, basal cell carcinoma, and colorectal cancer. Univariate and multivariate Cox's proportional hazard regression model revealed that five (H19, MIR31HG, HOTAIR, WT1‐AS, and LINC00488) out of 62 lncRNAs were closely related to the overall survival (OS) (P < 0.05). Furthermore, the five‐lncRNA model could be an independent prognostic model in colorectal cancer. We computed for the risk function and constructed a risk score based on the five lncRNAs. Results showed that patients with high‐risk scores have poor survival rates. Additionally, combing the risk score and other clinicopathological features, we can better predict the patient's survival probabilities. Furthermore, we validate our model in the GSE38832 dataset. Collectively, our study has provided a deeper understanding of the lncRNA‐related ceRNA regulatory mechanism in CRC and identified five‐lncRNA model, which could be considered as candidate prognostic biomarkers and therapeutic targets.


| INTRODUCTION
Colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the fourth leading cause of cancerrelated mortality. 1 Early diagnosis is crucial, and radical resection is the only curative therapy for cases where the cancer has not yet metastasized. 2 Although the accumulation of molecules plays important roles in cancer processes, the heterogeneity of CRC renders difficulty in diagnosing and determining patient prognosis based on one factor. 3 Thus, novel molecular networks that greatly optimize the use of therapies and benefit patients must be identified.
The multifaceted role of lncRNAs in CRC development has been extensively studied. LncRNAs participate in CRC development through the following ways: (a) as precursor of miRNAs or ceRNAs, (b) by interacting with proteins, (c) affecting gene transcription, and epigenetic mechanisms. 4 The pathogenesis of lncRNAs in tumorigenesis and cancer development is further explained by the emergence of competing endogenous RNAs (ceRNAs) as an important class of posttranscriptional regulators that alter the expression of key tumorigenic or tumor suppressive genes through a mi-croRNA-mediated mechanism. [5][6][7] Hence, the ceRNA network may serve as a biomarker for the diagnosis, prognosis, and prediction of therapeutic responses in CRC. The lncRNAs acting as ceRNAs have diverse biological functions that deserve further exploration. 8 In this study, we collected the RNA sequencing (RNA-Seq) data of 647 colorectal tumors and 51 adjacent nontumor samples from the TCGA database. The lncRNA expression profiles were combined with the clinical features, and a lncRNA-miRNA-mRNA ceRNA network was constructed. We then identified five-lncRNA model with the potential to predict survival based on the ceRNA network and used these lncRNAs as novel candidate biomarkers for CRC.

| Patient information and data processing
The RNA-Seq data and clinical information of CRC patients were completely downloaded from The Cancer Genome Atlas (TCGA) database (https://cancergenome.nih.gov/). The exclusion criteria were as follows: (a) without clinical or prognostic information, and (b) other malignancies in addition to CRC. Finally, 698 CRC samples including 647 tumor tissues and 51 matched normal tissues were enrolled for comprehensive integrated analysis. The data processing met the TCGA publication guidelines (https://cancergenome. nih.gov/publications/publicationguidelines).

RNAs (DERNAs)
The DERNA (DElncRNAs, DEmRNAs, and DEmiRNAs) data between tumor and normal samples were analyzed using "edgeR" package in R. Expression differences were characterized as fold change (FC) and associated P-values. A log2|FC| > 2.0 and P < 0.01 were considered significant. The DERNA profiles were normalized by log2 transformation.
To further enhance the bioinformatics analysis reliability and facilitate subsequent verification, we screened the overlapping lncRNAs between the DElncRNA and GSE38832 RNA-Seq data in the Gene Expression Omnibus (GEO) Datasets of NCBI dataset using Venn diagram for further study.

| Functional enrichment analysis
To further elucidate the biological function of ceRNA co-expression, we performed Gene Ontology (GO) biological enrichment analysis through DAVID bioinformatics database and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways through the KOBAS database (https://kobas.cbi. pku.edu.cn). A significance level of P < 0.05 was set as the cutoff criteria.

| Building a predictive model for prognosis and survival
The DElncRNAs were evaluated using univariate Cox's proportional hazard regression model in R to identify the prognostic signature. Only those with P value <0.05 were considered as candidate variables and entered into a stepwise multivariate Cox regression analysis. Then we construct a prognostic predictive model and obtained a combined prognosis score system (risk score) based on those DElncRNAs. The risk score was calculated as follows: Risk score = ex- NAn × β lncRNAn ), where, exp is the expression level; and β is the regression coefficient derived from the multivariate Cox regression model. 10 The patients with CRC were categorized into high-and low-risk groups according to its median. 11 The differences in patients' overall survival (OS) and disease-free survival (DFS) between the two groups were evaluated by Kaplan-Meier survival curve and log-rank test analysis. The receiver operating characteristic (ROC) curve was used to assess the sensitivity and specificity of the lncRNA signatures in predicting survival.
Furthermore, using risk score and clinical information as covariates, univariate and multivariable Cox proportional hazards models were fitted to infer whether the risk score is an independent predictive factor. Hazard ratio (HR) and 95% confidence interval (CI) were also assessed.

| Survival prognosis validation of the prognostic lncRNA model in GEO dataset
To verify the predictive performance of the prognostic predictive model, we further validated it in the GSE38832 dataset. Using the same model derived above, we formed high-and low-risk group, respectively in the GSE38832. Kaplan-Meier survival curve was drawn, and the ROC curve analysis was performed to evaluate the predictive power of the predictive model.

| Combining risk score with clinical significance prognostic prediction for CRC
We then evaluated the prognostic value of different clinical features including American Joint Committee on Cancer (AJCC) TNM stage, invasive degree (T stage), lymph node status (N stage), metastasis (M stage). Moreover, we investigated the potential prediction ability of prognosis in CRC by combining risk score and clinical characteristics (low/high-risk score + high/ low stage) using a Kaplan-Meier estimator and log-rank test.

| Statistical analysis
The relationship between the prognostic lncRNAs and clinical features was examined using Pearson correlation analysis. Difference of lncRNA expression between two groups was compared by Independent samples t test. All the hypothesis testing is two-sided, and P value <0.05 was considered as statistically significant. Other analysis in this article was conducted in R version 3.4.2 (https://www.r-project.org/) with the following packages: "edgeR," "pheatmap," "forestplot," "rms," "ggplot2," "survivalROC," "survival." 3 | RESULTS

| Identification of DERNAs in CRC
A total of 698 samples, including 647 CRC tumor tissue samples and 51 adjacent non-tumor tissue samples, were collected for this study. A total of 1143 DElncRNAs, including 888 up-regulated and 255 down-regulated lncRNAs, were identified in the CRC tissues and matched normal tissues according to the cutoff criteria (P < 0.01 and |log2FC| > 2.0). The volcano plot is presented in Figure 1A. We further obtained 276 DEmiRNAs (180 upregulated and 96 down-regulated miRNAs) and 2151 mRNAs (1204 up-regulated and 947 down-regulated RNAs) from the TCGA database, and the results are shown in Figure 1B,C.
Furthermore, by combining 1143 DElncRNAs derived from TCGA with GSE38832 dataset, we get 158 overlapping DElncRNAs for the subsequent study ( Figure 2A).

| lncRNA-miRNA-mRNA ceRNA network
To construct the ceRNA network, we assessed the target relationship between miRNAs and lncRNAs by using the miRcode among the aberrantly expressed lncRNAs and miRNAs. Furthermore, we predicted the target mRNAs of miRNAs F I G U R E 1 Volcano plot of differentially expressed RNAs in CRC patients. A, DElncRNAs; B, DEmRNAs; C, DEmiRNAs. The red dot represents up-regulated RNA, and green dot represents down-regulated RNA. log2|FC| > 2.0 and P < 0.01 as the selection criteria through miRcode, miRDB, and miRarBase, and the overlapping genes were selected. The result showed co-expression from 30 out of 276 miRNAs, 62 out of 158 lncRNAs, and 59 out of 2151 mRNAs ( Figure 2B). The visualization of co-expression was built using Cytoscape 3.6.0.

| Function analysis
To further elucidate the biological function of the ceRNAs, we performed the biological enrichment analysis through DAVID GO terms and KOBAS KEGG pathways related     to the target gene. The KEGG pathways were significantly enriched in "signaling pathway regulating pluripotency of stem cells," "wnt signaling pathway," "hippo signaling pathway," "basal cell carcinoma," and "colorectal cancer". The top five GO terms were "negative regulation of translation," "extracellular space," "transcription from RNA polymerase II promoter," "odontogenesis," and "negative regulation of fibroblast proliferation" (Figure 3).

| Building a predictive model for prognosis and survival
We evaluated the association between lncRNAs expression and patients' survival using univariate and multivariate Cox proportional hazard regression. Among the 62 DElncRNAs, five lncRNAs, namely, H19, MIR31HG, HOTAIR, WT1-AS, and LINC00488 were screened out (Table 1, Figure 4A) and a predictive lncRNA model was constructed ( Figure  4B-D). The risk score which based on the five lncRNAs by their relative coefficient in multivariate Cox regression was calculated as: risk score = 0.0761 × exp H19 + 0.1690 × exp-MIR31HG + 0.0901 × exp HOTAIR + 0.0976 × exp WT1-AS − 0.1551 × exp LINC00488 . We then calculated the risk score for each patient and ranked them by increasing scores. Out of the 567 patients, 283 were classified in the high-risk group and 284 in the low-risk group based on the median score. Survival analysis was performed using the Kaplan-Meier method with a log-rank statistical test. The results showed that patients with high-risk scores have significantly worse OS (P < 0.001, Figure 4E) and DFS than those with low-risk scores (P = 0.021, Figure 4G). By calculating the area under the ROC curve (AUC) of risk score, we could predict the 5-year survival of patients with colon adenocarcinoma (0.675 for OS and 0.690 for DFS) ( Figure 4F,H).

| Survival prognosis validation of the prognostic lncRNAs model in GEO dataset
To evaluate the survival predictive power of the five-lncRNA model in CRC patients, this model was further tested in the GSE38832 dataset (n = 122). Using the same predictive model derived from the ceRNA network, 122 patients were classified into a high-risk group (n = 61) and a low-risk group (n = 61). As shown in Figure 5A, Patients in the lowrisk group had significantly longer overall survival time than    < 0.001 P < 0.001 P < 0.001 P < 0.001 P those in the high-risk group (P = 0.011). The AUC was 0.695 in the GSE38832 dataset ( Figure 5B).
As expected, American Joint Committee on Cancer (AJCC) stage which was wildly applied in various tumors including CRC could predict the prognosis of patients effectively ( Figure 7A-H). Additionally, Kaplan-Meier curves also showed that the patient's prognosis separated by risk score and TNM staging have significantly different (P < 0.001, Figure 7I-L). Patients with lower risk score and tumor grade have obvious better prognosis.
Therefore, we constructed a nomogram that integrated the risk score of five-lncRNA model and clinicopathological features to predict survival probability of patients who had undergone surgical resection ( Figure 8). Based on the risk score and clinicopathological features, we can better predict the patient's 1, 3, 5-year survival probabilities ( Figure 9).

| DISCUSSION
Early diagnosis and radical resection are critical for CRC. The 5-year survival rate is 90% when the localized disease is detected at an early stage. However, survival rates dramatically decrease to only 11% for patients with distant metastasis. 12 CRC is a molecularly heterogeneous disease, and no single genetic "driver" is known to be superior in identifying aggressive disease. 13 Thus, identifying novel molecular network biomarkers is needed to stratify patients for earlier detection and to improve targeted treatment options.
To date, the potential diagnostic and therapeutic targets of colorectal cancer research have focused primarily on the deregulation of protein-coding genes. However, most biological characteristics, including tumorigenesis, arise from complex interactions of the cells with numerous constituents (eg, proteins, DNAs, RNAs, and small molecules) rather than with individual molecules. 14 Thus, the regulatory networks of tumorigenesis must be clearly understood.
Accumulating evidence reveals that lncRNAs contain miRNA-response elements and can compete with mRNAs for miRNAs. Hence, lncRNAs can act as ceRNAs and are implicated in multiple biological processes and tumorigenesis. 15 Compared with protein-coding genes, lncRNAs have significant advantages as diagnostic and prognostic biomarkers. 16 Several studies have confirmed that the differentially expressed lncRNAs are closely related to the pathogenesis and prognosis of tumors and can be used as tumor-associated predictors. 17,18 With the development of molecular techniques, new ln-cRNAs might prove to be vital components in the ceRNA network, which modulates other RNA transcripts. 19 Tsang et al 20 confirmed that the oncogenic functions of lncRNA H19 in CRC could be attributed to its ceRNA activity of sequestering miR-675 and downregulating the expression of its target RB. FER1L4 could exert a tumor suppressive effect on colon cancer and partially acts as a ceRNA suppressing miR-106a-5p expression. 21 LncRNA CCAT1 functions as a ceRNA participating in proliferation and apoptosis of human HCT-116 and HCT-8 cells. 22 Zhou et al 23 found that lincRNA-ROR promotes the progression of colon cancer and holds prognostic value due to its association with miR-145. However, comprehensive analysis of large-scale samples for calculating the prognostic value of the differentially expressed lncRNAs in patients with CRC has not yet been conducted. Here, we constructed a lncRNA-miRNA-mRNA ceRNA network in the TCGA database. We then identified five-lncRNA (H19, MIR31HG, HOTAIR, WT1-AS, and LINC00488) model in this network that was associated with the clinical outcome of CRC according to univariate and multivariate Cox proportional regression analyses. The aberrant expression of HOTAIR and H19 has been reported in various types of human cancer 24,25 and have been revealed as a negative prognostic factor for patients with colorectal cancer. 26 H19 was reported to be the primary miRNA precursor of miR-675 and could serve as the potential target for cancer therapy. 20 HOTAIR also participates in gastric cancer as a ceRNA regulatory network. 27 WT1-AS is related to various cancers and may function as a tumor suppressor. 28,29 However, two novel ln-cRNAs (MIR31HG and LINC00488) have not been previously investigated and could be new prognostic indicators for patients with colon adenocarcinoma.
The AJCC TNM staging system is the preferred staging system for the management of CRC and could provide essential information for surgical solutions. By combining risk core and TNM staging, we could effectively predict the prognosis of patients, which further suggest that it may be responsible approach in predicting tumor occurrence and development.
Furthermore, the prediction model can be combined with other markers such as CEA to further improve the diagnostic efficiency of colon cancer. Additionally, based on the model and clinicopathological features, we can give patients a comprehensive score which better predicts the patient's 1, 3, 5-year survival probabilities.
The results of the KEGG pathway involved in ceRNA network analysis showed that targeted genes were mainly enriched F I G U R E 9 Flowchart of bioinformatics analysis in "signaling pathway regulating pluripotency of stem cells," "wnt signaling pathway," "hippo signaling pathway," "basal cell carcinoma," and "colorectal cancer." Self-renewal and differentiation of stem cells are regulated by morphogenic pathways such as Wnt and Notch signaling. 30 High Wnt pathway activity is important in determining the fate of cancer stem cells in CRC. 31 Dysregulation of the Hippo pathway exerts a significant impact on cancer development including CRC. 32 In summary, we successfully constructed a lncRNAassociated ceRNA network in a large-scale assembly of CRC samples and confirmed that the deregulation of the ceRNA network can lead to tumorigenesis. Furthermore, we constructed an independent survival prognostic model by analyzing the genome-wide lncRNA expression profiles using a ceRNA network and discusses its clinical application value. The five-lncRNA model could serve as potential prognostic indicator alone or in combination with other clinicopathological for patients with CRC. Compared with the previous literature, 33,34 we validate the prognostic model in the GEO database, which increases the reliability of the results. However, our study was limited by shortage of our clinical validation cohort. Besides, future functional investigations and molecular experiment are still required to explore the mechanisms underlying the roles of these lncRNAs in CRC.