Comprehensive analysis of a ceRNA network reveals potential prognostic cytoplasmic lncRNAs involved in HCC progression

Abstract The aberrant expression of long noncoding RNAs (lncRNAs) has drawn increasing attention in the field of hepatocellular carcinoma (HCC) biology. In the present study, we obtained the expression profiles of lncRNAs, microRNAs (miRNAs), and messenger RNAs (mRNAs) in 371 HCC tissues and 50 normal tissues from The Cancer Genome Atlas (TCGA) and identified hepatocarcinogenesis‐specific differentially expressed genes (DEGs, log fold change ≥ 2, FDR < 0.01), including 753 lncRNAs, 97 miRNAs, and 1,535 mRNAs. Because the specific functions of lncRNAs are closely related to their intracellular localizations and because the cytoplasm is the main location for competitive endogenous RNA (ceRNA) action, we analyzed not only the interactions among these DEGs but also the distributions of lncRNAs (cytoplasmic, nuclear or both). Then, an HCC‐associated deregulated ceRNA network consisting of 37 lncRNAs, 10 miRNAs, and 26 mRNAs was constructed after excluding those lncRNAs located only in the nucleus. Survival analysis of this network demonstrated that 15 lncRNAs, 3 miRNAs, and 16 mRNAs were significantly correlated with the overall survival of HCC patients (p < 0.01). Through multivariate Cox regression and lasso analysis, a risk score system based on 13 lncRNAs was constructed, which showed good discrimination and predictive ability for HCC patient survival time. This ceRNA network‐construction approach, based on lncRNA distribution, not only narrowed the scope of target lncRNAs but also provided specific candidate molecular biomarkers for evaluating the prognosis of HCC, which will help expand our understanding of the ceRNA mechanisms involved in the early development of HCC.


| INTRODUCTION
Liver cancer is the sixth most common type of malignant tumor in the world and is currently the second most common cause of tumor-related death (Bray et al., 2018). According to 2018 epidemiological data from the United States, the mortality of patients with liver cancer increased by 2.7% per year for women and 1.6% per year for men from 2011 to 2015 (Siegel, Miller, & Jemal, 2018). Given the lack of specific clinical manifestations of early hepatocellular carcinoma (HCC) in patients, 70-80% of patients are in advanced stages when they present symptoms and miss the opportunity to receive radical resection (C. ). In addition, although there has been great progress in the development of treatment approaches for HCC, including radiotherapy, chemotherapy, transcatheter arterial chemoembolization (TACE) therapy, radiofrequency ablation, targeted therapy, and immunotherapy, the overall 5-year survival rate of HCC patients after curative intent surgical treatment has been reduced by only 1-3%, and the 5-year recurrence rate postoperation can reach 70%. Notably, the median survival time of patients with advanced liver cancer who do not receive treatment is only 7.1 months (Kulik & El-Serag, 2019). Therefore, identifying the molecular mechanisms underlying the initiation, development and metastasis of HCC is essential for early diagnosis, the selection of therapeutic approaches, the determination of follow-up schedules, and the assessment of prognosis to help increase patient life expectancy and clinical benefits.
As types of noncoding RNA (ncRNA) without protein coding ability, long noncoding RNAs (lncRNAs) were originally considered transcriptional noise (Quinn & Chang, 2016). However, accumulating evidence has demonstrated that the differential expression of lncRNAs plays pivotal roles in hepatocarcinogenesis, vascular invasion and distant metastasis through dose compensation, epigenetic regulation, cell cycle regulation and cell differentiation regulation (He et al., 2014;Schmitt & Chang, 2016). lncRNAs are usually more than 200 nucleotides in length and exhibit greater species, tissue, and cell specificity than do shorterlength microRNAs (miRNAs) and messenger RNAs (mRNAs) due to their evolutionary unconserved characteristics. In addition, lncRNAs perform different regulatory functions based on their subcellular localizations. In general, in the nucleus, lncRNAs mainly function in chromatin regulation, transcriptional regulation, and variable splicing regulation. In the cytoplasm, lncRNAs affect mRNA stability and translational regulation, largely through the competitive endogenous RNA (ceRNA) regulation mechanism of adsorbed miRNA (Cao, Pan, Yang, Huang, & Shen, 2018).
The ceRNA hypothesis was first proposed by Salmena andcolleagues in 2011 (Salmena, Poliseno, Tay, Kats, &Pandolfi, 2011). In the ceRNA gene interaction network, which includes lncRNAs, miRNAs, and mRNA, lncRNAs can act as endogenous molecular sponges that competitively bind miRNAs via shared microRNA response elements with reverse complementary binding seed regions to indirectly regulate mRNA expression levels. In recent years, numerous experiments have validated the hypothesis that this type of indirect regulatory mechanism is involved in carcinoma initiation, progression, and invasion. For example, DSCR8 promotes HCC cell progression by sponging miR-485-5p to activate frizzled-7, which is associated with the Wnt/β-catenin srignaling pathway (Y. Wang, Sun, et al., 2018). HOXD-AS1 can prevent SOX4 from undergoing miRNA-mediated degradation via binding miR-130a-3p, thereby promoting HCC metastasis (H. Wang, Huo, et al., 2017).
However, integrated and comprehensive analyses of the regulatory functions of the lncRNA-miRNA-mRNA ceRNA network in tumor pathogenesis have been hindered by a lack of available databases and research approaches. The Cancer Genome Atlas (TCGA) platform is an open-source sequencing database covering more than 30 human cancer types and contains information on clinical pathology and corresponding bioinformatics data (Hutter & Zenklusen, 2018). This database is an ideal resource for biological discovery and data mining. ceRNA networks have been constructed for most tumor types, such as head and neck squamous cell carcinoma (HNSCC; Fang et al., 2018), gastric cancer (GC; C. Y. Li et al., 2016), and cutaneous melanoma . These networks are useful for gaining insight into complicated gene interactions and for identifying potential biomarkers for cancer diagnosis, treatment, and prognosis.
In the current study, in a first step, we compared differentially expressed lncRNAs, miRNAs, and mRNAs between well/moderately differentiated (G1/G2) tissues and normal tissues and differentially expressed genes (DEGs) between poorly differentiated (G3 and G4) and normal tissues. Subsequently, the DEGs intersecting with 753 lncRNAs, 97 miRNAs, and 1,535 mRNAs were identified as candidate genes to construct a ceRNA regulatory network for HCC. Then, the locations and putative interactions of the lncRNAs among lncRNA-miRNAs-mRNAs were determined based on the miRcode, TargetScan, miRDB, and miRTarBase databases. Thirty-seven lncRNAs, 10 miRNAs, and 26 mRNAs were selected to build the ceRNA network associated with HCC occurrence. Finally, 13 lncRNAs significantly affecting HCC patient prognosis were used to develop a risk score system after lasso-penalized Cox regression analysis. This novel ceRNA network-construction method, which considers lncRNA distribution, might aid in the screening of significant genes in HCCassociated ceRNAs with a narrower scope and higher accuracy.

| Identification of DEGs
For the normalized gene expression profile data, we used the edgeR package of R software to analyze significantly aberrantly expressed lncRNAs, miRNAs, and mRNAs at two levels: moderately to well differentiated (G1-G2 stage) HCC samples versus normal samples and poorly differentiated (G3-G4 stage) HCC samples versus normal samples (Robinson, McCarthy, & Smyth, 2010). We selected a log fold change ≥ 2 and FDR＜0.01 as significant cutoff values based on the Benjamini-Hochberg method (Madar & Batista, 2016). Then, the differentially expressed lncRNAs, miRNAs, and mRNAs meeting the criteria were displayed in volcano plots. We generated Venn diagrams to visualize the intersecting DEGs between the results of the two comparisons for further analysis.

| Construction of the ceRNA network
The ability of the lncRNAs to sequester and bind miRNAs was predicted using the miRcode database (http://www.mircode.org/; Jeggari, Marks, & Larsson, 2012). The target mRNAs of miRNAs were retrieved from the miRDB (Wong & Wang, 2015), miRTarBase (Chou et al., 2016), and TargetScan (Fromm et al., 2015) databases. To increase the reliability of the results, only those miRNA-mRNA relationship pairs found in all 3 databases were selected as candidate genes for constructing the ceRNA

| Function and pathway analyses of DEmRNAs
The Database for Annotation, Visualization, and Integrated Discovery (DAVID) online functional annotation tool (https://david.ncifcrf.gov/) was used to analyze the Gene Ontology (GO) molecular function enrichment of the differentially expressed, intersecting mRNAs, as

| Survival analysis
Kaplan-Meier (K-M) survival analyses of the intersecting DElncRNAs, DEmiRNAs, and DEmRNAs in the ceRNA network were performed using the survival package in R. The optimal cutoff value was calculated according to the X-tile method (Camp, Dolled-Filhart, & Rimm, 2004).
p＜0.01 was considered to indicate statistical significance.

| Construction of the risk score system
DElncRNAs associated with HCC patient survival were analyzed through lasso-penalized Cox regression to remove confounding factors and reduce the number of genes. A Cox model was initially generated by applying the penalized maximum likelihood method.
Ten-fold cross-validation was used to derive the best lambda to minimize the mean cross-validated error and predict the regression coefficients (β) of the multivariate Cox regression model. exceeded the cutoff value, the expression level of the correlated gene was considered "1", whereas when the expression value was less than or equal to the cutoff value, the expression level was considered "0." According to optimal cutoff value, all 365 HCC patients were divided into low-and high-risk groups. To estimate the distinguishing and predictive abilities of the risk score system, K-M survival curves and time-dependent receiver operating characteristic (ROC) curves were constructed.

| Univariate and multivariate cox regression analyses
To detect whether the clinical characteristics, including age, gender, body mass index (weight/height 2 ), pathologic stage, histologic grade, alpha-fetoprotein, inflammation extent, vascular invasion, and family history, were significantly associated with overall survival in HCC patients, univariate Cox regression analysis was performed. Risk score level, pathologic stage, and vascular invasion, as candidate variables, were included in the multivariate Cox regression analysis.
p＜0.05 was considered to indicate statistical significance. The hazard ratio and 95% confidence intervals for each variable were calculated.

| Regression analysis of DElncRNAs and DEmRNAs
Regression analysis of the relative expression levels of DElncR-NAs and DEmRNAs was performed and the results visualized using R software and the ggpubr, tidyverse, Hmisc, and corrplot packages. p＜0.05 and r＞0.3 were considered statistically significant. BAI ET AL.  We first predicted potential miRNAs that interacted with 753 lncRNAs using the miRcode database. Then, the intersecting genes between the predicted miRNAs and 97 DEmiRNAs were obtained.

| Prediction of lncRNAs targeted by miRNAs
Finally, we identified 53 lncRNAs and 13 miRNAs with mutual interaction ability (Table S1).

| Prediction of mRNAs targeted by miRNAs
To improve the reliability of the bioinformatics prediction, we identified target genes of the above-mentioned 13 miRNAs by selecting mRNAs shared by all three databases (miRDB, miRTarBase, and TargetScan). We then compared candidate target mRNAs with 1,535 differentially expressed mRNAs. Finally, miRNA-mRNA interaction pairs involving 10 miRNAs and 26 mRNAs were confirmed to establish the ceRNA network (Table S2).

| Intracellular localization of lncRNAs and the ceRNA network
Inspecting the cytoplasmic-nuclear localization of lncRNAs is a vital step in studying the complicated but precise regulatory mechanisms of these lncRNAs because the endogenous competition role of lncRNAs is mainly exhibited in the cytoplasm. Hence, we excluded 13 lncRNAs that were located only in the nucleus from the 53 DElncRNAs identified with the lncATLAS database. The distribution information for all of the differentially expressed lncRNAs was visualized with Cytoscape ( Figure 3a; Table S3). After considering the interactions among the remaining DEGs, 37 DElncRNAs, 10 DEmiRNAs, and 26 DEmRNAs were incorporated into a final HCC ceRNA regulatory network composed of 73 nodes and 142 interactions (Figure 3b).

| GO and KEGG enrichment analysis
Next, we studied the potential biological processes and pathways of the 26 DEmRNAs in the newly formed ceRNA network. Using the DAVID database, we performed GO functional enrichment analysis and identified 13 significant GO terms (p＜0.01; Table S4). Among these terms, "nucleoplasm," "core promoter binding," "transcription factor complex," "spermatogenesis," and "protein binding," in decreasing order of p value, were the top 5 GO terms. The relationships between the DEmRNAs and GO terms were visualized with Cytoscape software (Figure 3c). The KOBAS database was subsequently utilized to identify the KEGG pathway enrichment of the 26 DEmRNAs. Eight KEGG pathways were identified as statistically significant at p < 0.001, and the most significant pathway was "microRNAs in cancer" (Figure 3d).

| Survival analysis of ceRNA network-associated genes
To identify the potential DEGs with strong correlations with the prognostic characteristics of patients with HCC, K-M survival analyses and log-rank tests for each gene were performed to evaluate the contributions of 37 DElncRNAs, 10 DEmiRNAs, and 26 DEmRNAs. As a result, 13 lncRNAs, 3 miRNAs, and 15 mRNAs were identified as oncogenes because high expression levels of these RNAs were correlated with short survival time (p＜0.01). Additionally, the expression levels of 2 lncRNAs, CLLU1 and HTR2A-AS1, and the mRNA PROK2 were positively correlated with the overall survival of patients with HCC (Table S5), suggesting protective roles of these RNAs in HCC development. K-M survival curves of the top 3 lncRNAs, miRNAs, and mRNAs, as ranked based on the association between expression level and the prognosis of HCC patient are shown in Figure 4a, b, and c, respectively. 3.7 | Construction of the lncRNA-associated risk score system lncRNAs dominate the upstream portion of the ceRNA network and function as primary effectors of miRNAs and mRNAs. In addition, the expression and distribution of lncRNAs are highly specific, which makes them optimal biomarkers for HCC diagnosis and prognostic assessment. Hence, based on 15 lncRNAs that were significantly correlated with overall survival, lasso-penalized Cox regression and multivariate Cox regression analyses were applied to select potential prognosis-related lncRNAs, and their contributions were weighted by their relative coefficients (Figure 5a,b). Then, DSCR8 and AC006305.1 were excluded, and the final risk score formula was as follows: Notably, the designation of these two groups yielded improved

| DISCUSSION
HCC is the most common pathological type of liver cancer. The mortality rate of HCC patients ranks second among all cancers worldwide. In East Asia, Southeast Asia, Africa and southern Europe in particular, the incidences of liver cancer and associated mortality continue to rise (Bertuccio et al., 2017). Traditional surgical treatment can significantly improve the prognosis of some patients with HCC, but a large number of patients are intolerant to current treatments and experience recurrence and progression. Therefore, HCC-related regulatory factors have become the focus of current research; such research is pivotal for achieving effective HCC treatment in the future. Furthermore, due to the development of high-throughput sequencing technologies, lncRNAs have been found to play roles in transcriptional interference and to be indispensable for gene regulation, especially in ceRNA networks (Guttman & Rinn, 2012). Accumulating evidence has shown that ceRNA-related genes greatly influence the occurrence, development and prognosis of most types of cancer (Schmitt & Chang, 2016).
The abnormal differentiation and development of liver cells lay the foundation for HCC genesis. To better understand the molecular mechanisms involved in the early occurrence of liver cancer, in an initial step, we identified shared aberrant lncRNAs, miRNAs, and mRNAs by comparing normal samples with HCC samples that showed varying degrees of differentiation, as determined using the TCGA database. After predicting the lncRNA-miRNA interactions and miRNA-mRNA interactions and excluding those lncRNAs that were distributed only in the nucleus, we constructed an HCCassociated ceRNA regulatory network and performed GO and KEGG pathway enrichment analyses of the mRNAs in this network. K-M survival analysis revealed that a high percentage of genes was significantly correlated with overall survival in this network. Furthermore, we selected 13 of 15 survival-related lncRNAs to calculate risk scores via multivariate Cox regression and lasso analysis. Finally, correlation analysis of lncRNAs and mRNAs was performed.
lncRNAs have multiple regulatory functions that vary based on the specificity of their subcellular localization. In the nucleus, lncRNAs can participate in chromatin interactions, transcriptional regulation and RNA processing, whereas within the cytoplasm, lncRNAs mainly function as regulatory factors of transcription products, translation and signaling pathways. Notably, the correlated lncRNAs, miRNAs, and mRNAs in the ceRNA regulatory network mainly interact with each other in the cytoplasm (Cao et al., 2018). To the best of our knowledge, our study is the first to consider not only the interactions of candidate genes but also the lncRNA distribution within the cell. To determine the expression localization of lncRNAs, we used ENSEMBL gene IDs to assess the lncRNAs from lncATLAS, which is an easy-to-use web-based visualization tool. We did not incorporate the lncRNAs that existed only in the nucleus into the ceRNA network because the functioning of lncRNAs, miRNAs, and mRNAs as competitive endogenous RNAs mainly occurs in the cytoplasm.
The GO terms of the dysregulated mRNAs in the ceRNA network belonged predominantly to the following categories: "nucleoplasm," "core promoter binding," "transcription factor complex," "spermatogenesis," and "protein binding," suggesting that HCC may be a metabolism-related disease. "MicroRNAs in cancer," "small cell lung cancer," "cell cycle," "gastric cancer," "cushing syndrome," "cellular senescence," "p53 signaling pathway," and "prostate cancer" were found to be the eight most enriched KEGG pathways in the KOBAS database analysis, indicating common abnormal signaling pathways in several cancer types, consistent with previous reports (X. L. Li, Zhou, Chen, & Chng, 2015;Mattioni et al., 2015). Strikingly, K-M survival analysis of the ceRNA-correlated genes demonstrated that 15 of the 37 lncRNAs, 3 of the 10 miRNAs, and 16 of the 26 mRNAs had statistically significant influences on prognosis (p＜0.01). When we evaluated significance at p < 0.05, 24 of the 37 lncRNAs, 5 of the 10 miRNAs, and 20 of the 26 mRNAs were strongly correlated with HCC patient prognosis (Table S5). Therefore, many genes significantly F I G U R E 5 Risk score system. Lasso-penalized Cox regression analysis of 15 DElncRNAs. The coefficient values at varying levels of penalty (a). Each curve represents an lncRNA. Ten-fold cross-validation was used to calculate best lambda which leads to minimum mean cross-validated error (b). Red dots represent partial likelihood deviance; solid vertical lines indicate their corresponding 95% CI; the left dotted vertical line is the value of lambda that gives minimum cvm, named lambda. min; the right dotted vertical line is the largest value such that error is within 1 standard error of the minimum, named lambda. 1se. The selection of the optimal cutoff and survival curve based on risk score. Risk scorerelated standardized log-rank statistics was shown in (c). Maximally statistic was defined as the optimal cutoff value. Distribution of densities for low-and high-risk score HCC patients was shown in (d). Kaplan-Meier survival curve of two groups were displayed in (e). Time-dependent ROC curves based on risk score level were shown in (f). Risk score analysis of 13 DElncRNAs (g). The above scatter plot displays the risk score of 13 DElncRNAs, and the below heatmap exhibits the DElncRNA expression profiles in each HCC patients with survival data. Red is defined as high expression, and blue is defined as low expression. Univariate and multivariate analyses of clinical parameters associated with overall survival (h). The middle point represents the HR, and the length of the line represents the 95% confidence intervals for each indicator. Red represents statistical significance, and blue has no statistical significance. CI: confidence interval; HCC: hepatocellular carcinoma; HR: hazard ratio; lncRNA: long noncoding RNA; ROC: receiver operating characteristic [Color figure can be viewed at wileyonlinelibrary.com] influence the overall survival time of HCC patients, demonstrating that an HCC-associated ceRNA regulatory network can identify potential candidate biomarkers for predicting HCC patient prognosis.
In addition, in analyzing the 15 lncRNAs related to prognosis, lasso-penalized Cox regression analysis excluded two lncRNAs, "DSCR8" and "AC006305.1." Among the remaining lncRNAs, CRNDE is one of the best studied oncogenic lncRNAs. In most cancer types, such as glioma, pancreatic cancer, cervical cancer, colorectal carcinoma, GC, clear cell renal cell carcinoma, and hepatocellular carcinoma, CRNDE can facilitate the proliferation, migration, invasion and metastasis of cancer cells, leading to a poor prognosis for patients with the above cancer types (Ding et al., 2018;Hu, Du, Zhang, & Huang, 2017;Jiang et al., 2017;Meng, Li, Li, & Ma, 2017;Tang, Zheng, & Zhang, 2018;G. Wang, Pan, Zhang, Wei, & Wang, 2017;Zheng et al., 2016). Furthermore, high expression of the lncRNA PART1 indicates the probability of recurrence in non-smallcell lung cancer and hepatocellular carcinoma after curative resection (M. Li, Zhang, Zhang, Wang, & Lin, 2017;Lv et al., 2018). In addition, a previous study showed that PART1 can promote prostate cell proliferation and apoptosis by inhibiting TLR pathway activation (Sun, Geng, Li, Chen, & Zhao, 2018). LINC00462 is another confirmed oncogene that can enhance the progression of HCC and pancreatic cancer (Gong et al., 2017;Zhou, Guo, Sun, Zhang, & Zheng, 2018). CLLU1 mainly acts as a stable and inherent diagnostic biomarker of chronic lymphocytic leukemia and is significantly associated with poor clinical outcomes (Buhl et al., 2009;Gonzalez et al., 2013). In addition, as an independent prognostic factor, the good predictive power of the risk score prognostic model were proved via timedependent ROC curve analysis. Therefore, our ceRNA network identified not only a series of lncRNAs with unequivocal functions F I G U R E 6 Correlation analysis of DEGs linear regression analysis between lncRNAs and mRNAs. LncRNAs versus protein coding genes as indicated (n = 370). The gray area around the blue line represent 95% confidence interval (a). Identified lncRNA-miRNA-mRNA axis are integrated into a module map (b). Left bar: lncRNA; middle bar: miRNA; right bar: mRNA. DEGs: differentially expressed genes; lncRNA: long noncoding RNA; miRNA: microRNA; mRNA: messenger RNA [Color figure can be viewed at wileyonlinelibrary.com] but also potential unexplored lncRNAs, including AL163952.

| CONCLUSIONS
In conclusion, we introduced a novel strategy for constructing an lncRNA-miRNA-mRNA ceRNA regulatory network according to the subcellular distributions of lncRNAs and interactions among genes. In addition to providing a comprehensive analysis network, this approach narrows the scope of research and enhances the prediction accuracy for target lncRNAs with great potential to serve as candidate biomarkers for the diagnosis, prognosis, and therapeutic targets of HCC patients.

ACKNOWLEDGMENTS
We thank Tian Du and Siqi Wang for assistance with the bioinformatics algorithm. This work was supported by the Interna-