A four‐long noncoding RNA signature predicts survival of hepatocellular carcinoma patients

Abstract Background Hepatocellular carcinoma (HCC) is a common neoplasm located in the liver. Accumulating evidence has highlighted that long noncoding RNAs (lncRNAs) are correlated with the survival of HCC patients. This study aimed to find a lncRNA signature to predict the prognostic risk of HCC patients. Methods Statistical and machine learning analyses were conducted on the lncRNA expression data and corresponding clinical data of 180 HCC patients collected from the public online Tanric and The Cancer Genome Atlas (TCGA) databases. Results From the training dataset, we obtained a four‐lncRNA model comprising RP11‐495K9.6, RP11‐96O20.2, RP11‐359K18.3, and LINC00556, which divided HCC patients into two groups with significantly different prognosis (n = 90, median 1.81 years, 95% confidence interval [CI]: 1.50‐4.91 vs 8.56 years, 95% CI: 6.96‐9.97, log‐rank test P < .001). The test dataset confirmed the prognostic ability of the signature (n = 90, median 1.95 years, 95% CI: 1.14‐4.08 vs 5.80 years, 95% CI: 3.11‐6.82, log‐rank test P = .007). Receiver operating characteristic curve analysis showed that the four‐lncRNA signature had better prediction efficiency than the tumor/node/metastasis stage. Cox analysis showed that the four‐lncRNA signature was an independent predictor of HCC prognosis. Conclusion The four‐lncRNA signature can be used as an independent biomarker to predict the prognostic risk of HCC patients.


| INTRODUCTION
Hepatocellular carcinoma (HCC) is a refractory tumor that kills 746 000 people every year, 1,2 ranking as the third leading cause of cancer-related death. Two main factors account for the high mortality of HCC. First, the disease is insidious and difficult to detect early; thus, most HCC patients are diagnosed at advanced stages, when they are in poor physical condition and have missed the opportunity for surgery. Second, there are few effective treatments for patients with advanced HCC, who are not only insensitive to radiotherapy but also respond poorly to conventional chemotherapy drugs. 3 In recent years, it has been recognized that molecular characteristics are closely related to the prognosis and therapeutic effectiveness of HCC patients. 4 Therefore, identifying molecular indicators should enable the more accurate prognostic judgments and improved treatments that HCC patients urgently need.
Long noncoding RNAs (lncRNAs) are a group of noncoding RNAs longer than 200 bp. 5,6 Recent studies have found that lncRNAs play important roles in regulating key biological processes in various types of cancer, acting in particular as oncogenes or tumor suppressors, 7,8 which implies the potential of lncRNAs as biomarkers and therapeutic targets for cancer. 9,10 In addition, the prognostic role of lncRNAs in HCC has been reported in many studies. For instance, lncRNA PTTG3P was found to be associated with short survival in HCC patients and could be used as an unfavorable prognostic predictor. 11 LncRNA ASB16-AS1 was demonstrated to promote the malignant behavior of HCC by regulating the miR-1827/FZD4/Wnt/β-catenin pathway and to have prognostic value. 12 CTC-297N7.9 was observed to be highly expressed in HCC patients with good prognosis, indicating a protective role. 13 Subsequently, because they predict better than a single lncRNA molecule, lncRNA signatures for HCC prognosis prediction are being discovered. 14 In the present study, we aimed to identify lncRNAs that could predict the outcomes of HCC patients and to construct a prognostic lncRNA signature based on lncRNA expression profile data of HCC from The Cancer Genome Atlas (TCGA) and Tanric databases.

| Construction process of the lncRNA risk score model
LncRNA transcriptome expression data of 180 HCC patients were downloaded from the Tanric database (https://www.tanric.org/home). 17 Corresponding clinical information of the 180 HCC patients was downloaded from the TCGA database (https://xenabrowser.net/datapages/). We omitted lncRNAs whose expression values had a coefficient of variance >0.1 and selected survival-related lncRNAs from the training samples by performing Cox analysis (P < .05). Then, we used the random survival forests-variable hunting algorithm to further filter nodes until nine lncRNAs were selected. 18 We developed risk score models to estimate prognostic risk as follows 16,19 :

$$\text{Risk score} = \sum_{i=1}^{N} \left( \text{lncRNAexp}_i \times \text{coefficientCOX}_i \right)$$

where N represents the number of lncRNAs in the model, lncRNAexp_i is the expression value of the i-th lncRNA, and coefficientCOX_i is the coefficient of that lncRNA in the Cox analysis. From all 2^9 − 1 = 511 signatures, we selected those that predicted the overall survival (OS) of HCC patients with AUC > 0.7 and log-rank P < .05.
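The risk score and the exhaustive enumeration of candidate signatures can be illustrated with a short sketch. The authors worked in R; the Python below is only an illustration, and the lncRNA names, coefficients, and expression values in it are hypothetical placeholders rather than values from this study.

```python
from itertools import combinations


def risk_score(expression, coefficients):
    """Sum of expression value times Cox coefficient over the lncRNAs in a model."""
    return sum(expression[name] * coef for name, coef in coefficients.items())


def candidate_signatures(lncrnas):
    """Yield every non-empty subset of the candidate lncRNAs."""
    for size in range(1, len(lncrnas) + 1):
        yield from combinations(lncrnas, size)


# Hypothetical Cox coefficients and expression values, for illustration only.
coefficients = {"lncA": 0.5, "lncB": 0.25, "lncC": 1.0}
patient = {"lncA": 2.0, "lncB": 4.0, "lncC": 1.0, "lncD": 3.0}

# Only lncRNAs that belong to the model contribute to the score.
score = risk_score(patient, coefficients)

# Nine candidate lncRNAs give 2**9 - 1 non-empty subsets.
nine = [f"lnc{i}" for i in range(1, 10)]
n_signatures = sum(1 for _ in candidate_signatures(nine))
```

In a full pipeline, each of the 511 subsets would be scored on the training samples and kept only if it met the AUC and log-rank thresholds stated above.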

| Statistical analysis
We used R packages, including pROC, TimeROC, Survival, and RandomForestSRC (from Bioconductor: http://www.bioconductor.org/), to perform the statistical and machine learning analyses. Using the receiver operating characteristic (ROC) and the Time ROC analysis, 20,21 we compared the prognostic performance of tumor/node/metastasis (TNM) stage and the lncRNA signature. Cox analysis was performed to identify prognostic factors, with significance defined as P < .05. Pearson's test with P < .05 and an absolute Pearson coefficient >0.2 was used to select protein-coding genes co-expressed with the lncRNAs, which were visualized by Cytoscape (3.2.3). 22 We performed Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) enrichment analysis with the R package clusterProfiler. 23
Then, we used univariate Cox regression analysis and obtained a set of 642 lncRNAs associated with the OS of HCC patients (Figure 1A, P < .05). Finally, through random survival forests analysis, we obtained 9 prognostic lncRNAs according to importance score (Figure 1A,B).
Kaplan-Meier analysis revealed that the outcome of high-risk patients differed significantly from that of low-risk patients (median survival time: 1.95 years, 95% CI: 1.14-4.08 vs 5.80 years, 95% CI: 3.11-6.82, P = .007; Figure 2B). Finally, we tested the risk identification ability of the signature in the entire TCGA dataset (n = 180), and the Kaplan-Meier analysis showed that the HCC patients in the low-risk group (n = 90) outlived those in the high-risk group (n = 90) (Figure 2C, log-rank P < .001).
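The median survival times reported above come from Kaplan-Meier curves. As a minimal sketch of how such a median is obtained (the study itself used the R Survival package), the product-limit estimate can be written as follows. The follow-up times and event indicators are invented for illustration, and the confidence intervals and log-rank test are not reproduced here.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    times: follow-up in years; events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival is 0.5 or lower, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Invented follow-up data for two hypothetical risk groups.
high_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
low_curve = kaplan_meier([5, 6, 7, 8], [0, 1, 0, 0])
```

With these toy inputs the high-risk curve crosses 0.5 at year 3, while the low-risk curve never does, so its median is reported as not reached.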

| Prognostic independence test of the four-lncRNA signature
A chi-square test found no correlation between the signature and other clinical features (Table 3). We further performed univariable and multivariable Cox analysis to evaluate the prognostic independence of the four-lncRNA signature. As shown in

| Comparison of the lncRNA signature with TNM stage system
Receiver operating characteristic analyses found that the AUC value of the lncRNA signature was greater than that of the

| Stratified analysis for TNM stage
Combining the TNM stage with the lncRNA signature risk scores, we stratified the HCC patients into different subgroups. HCC patients with TNM stage I + II were stratified into high-risk and low-risk subgroups. Kaplan-Meier analysis showed a significant difference in survival time between the two subgroups (log-rank test P < .001, Figure 4A). HCC patients with TNM stage III + IV were also divided into two risk subgroups with different survival (log-rank test P = .0043, Figure 4B).
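The stratification described above amounts to grouping patients by stage stratum and risk group before comparing survival within each stratum. The sketch below is illustrative only: the patient records are invented, and the risk-score cutoff is a hypothetical parameter, since the threshold used to separate high-risk from low-risk patients is not given in this section.

```python
from collections import defaultdict


def stage_group(stage):
    """Collapse TNM stages into the two strata used in the text."""
    return "I+II" if stage in ("I", "II") else "III+IV"


def stratify(patients, cutoff):
    """Group patient IDs by (stage stratum, risk group).

    cutoff is a hypothetical risk-score threshold chosen by the caller.
    """
    strata = defaultdict(list)
    for p in patients:
        risk = "high" if p["score"] > cutoff else "low"
        strata[(stage_group(p["stage"]), risk)].append(p["id"])
    return dict(strata)


# Invented patient records, for illustration only.
patients = [
    {"id": "P1", "stage": "I", "score": 0.2},
    {"id": "P2", "stage": "II", "score": 1.4},
    {"id": "P3", "stage": "III", "score": 0.9},
    {"id": "P4", "stage": "IV", "score": 0.1},
]
strata = stratify(patients, cutoff=0.5)
```

Each stage stratum then yields one high-risk and one low-risk subgroup, which is the pair compared by the log-rank test in Figure 4A and 4B.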

| Function prediction of the four lncRNAs in the signature
First, we used Pearson's test to compute the co-expressed mRNAs with the four lncRNAs in the entire TCGA dataset (n = 180). A total of 749 mRNAs were selected which were co-expressed with at least one of the four lncRNAs (coefficient >0.2/<−0.2, P < .05, Table S1, Figure 5A). Then, we used those co-expressed genes to predict the biological function of the four lncRNAs. We found the four lncRNAs were enriched in 27 GO terms and KEGG pathways and the top 20

| DISCUSSION
A vast amount of research suggests that lncRNAs might serve as biomarkers in the diagnosis and prognosis of various tumors, including HCC. In addition, lncRNAs are advantageous as markers because they are easy to detect in body fluids. 24 Thus, there have been many articles on prognostic lncRNA markers of HCC. Based on high-throughput sequencing data, lncRNAs associated with HCC prognosis have been identified, such as ASB16-AS1, LINC01138, and CTC-297N7.9. 12,13,25 These lncRNAs were found to play important roles in HCC carcinogenesis by regulating tumor proliferation and migration. Because of their better predictive efficacy, lncRNA signatures have been developed for prognostic prediction in many cancers, such as esophageal squamous cell carcinoma, glioblastoma, lung adenocarcinoma, and pancreatic ductal adenocarcinoma, among others. 19,26-28 In this study, we downloaded the expression data and clinical information of an HCC cohort from Tanric and TCGA. Using statistical and machine learning analysis, we found 642 lncRNAs significantly correlated with overall survival and constructed a four-lncRNA signature, which proved to be a reliable indicator of HCC survival in 180 samples. The independence test showed that the survival prediction ability of the four-lncRNA signature in HCC was not affected by age, gender, or TNM stage. In addition, stratification analysis showed that the four-lncRNA signature, or the four-lncRNA-based risk score model, can further subdivide HCC patients at the same TNM stage into risk groups with significantly different outcomes, suggesting that the four-lncRNA signature can be used as a prognostic model complementary to TNM stage in HCC. Moreover, we found that high expression of RP11-495K9.6, RP11-96O20.2, RP11-359K18.3, and LINC00556 was correlated with poor prognosis of HCC patients (HR > 1, P < .05).
Since the function of these four lncRNAs has not been reported yet, we performed GO and KEGG analysis and found that the coding genes co-expressed with the four lncRNAs were enriched in terms related to DNA replication and repair, indicating that the four lncRNAs in the signature may participate in HCC progression through DNA replication and repair related pathways. The specific mechanism by which these lncRNAs influence the prognosis of HCC remains to be elucidated.
In summary, using statistical and machine learning analyses, we constructed a four-lncRNA signature comprising RP11-495K9.6, RP11-96O20.2, RP11-359K18.3, and LINC00556, which could be used effectively to predict the clinical outcome of HCC patients. The four-lncRNA signature has considerable potential value in prognosis prediction, therapy selection, and disease recognition.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

AUTHORS' CONTRIBUTIONS
Haitao Jiang contributed to data analysis, interpretation, and drafting. Lianhe Zhao contributed to data collection. Yunjie Chen and Liang Sun were involved in study design, study supervision, and final approval of the article. All authors read and approved the final article.

DATA AVAILABILITY STATEMENT
LncRNA transcriptome expression data of patients were downloaded from the Tanric database (https://www.tanric.org/home).