Identification and development of long non‐coding RNA‐associated regulatory network in colorectal cancer

Abstract Colorectal cancer (CRC) is one of the leading causes of cancer‐associated death globally. Long non‐coding RNAs (lncRNAs) have been identified as micro RNA (miRNA) sponges in a competing endogenous RNA (ceRNA) network and are involved in the regulation of mRNA expression. This study aims to construct a lncRNA‐associated ceRNA network and investigate the prognostic biomarkers in CRC. A total of 38 differentially expressed (DE) lncRNAs, 23 DEmiRNAs and 27 DEmRNAs were identified by analysing the expression profiles of CRC obtained from The Cancer Genome Atlas (TCGA). These RNAs were chosen to develop a ceRNA regulatory network of CRC, which comprised 125 edges. Survival analysis showed that four lncRNAs, six miRNAs and five mRNAs were significantly associated with overall survival. A potential regulatory axis of ADAMTS9‐AS2/miR‐32/PHLPP2 was identified from the network. Experimental validation was performed using clinical samples by quantitative real‐time PCR (qRT‐PCR), which showed that expression of the genes in the axis was associated with clinicopathological features and the correlation among them perfectly conformed to the ‘ceRNA theory’. Overexpression of ADAMTS9‐AS2 in colon cancer cell lines significantly inhibited the miR‐32 expression and promoted PHLPP2 expression, while ADAMTS9‐AS2 knockdown had the opposite effects. The constructed novel ceRNA network may provide a comprehensive understanding of the mechanisms of CRC carcinogenesis. The ADAMTS9‐AS2/miR‐32/PHLPP2 regulatory axis may serve as a potential therapeutic target for CRC.

Although researchers have made great efforts to elucidate the underlying tumourigenesis mechanism of the development of colorectal cancer, much of it is still unclear. Therefore, it is critical to explore the regulatory mechanisms of colorectal tumourigenesis to promote identification of promising diagnostic biomarkers as well as development of optimal therapeutic strategies.
Long non-coding RNAs (lncRNA) are recently discovered type of non-coding RNAs (ncRNA) with transcript more than 200 bp in length. 3 lncRNAs have been recognized to play important roles in tumourigenesis via regulating the expression of genes. Increasing number of evidence has illustrated that lncRNAs expression profiling may become useful tool for the diagnosis of cancers. 4,5 Although a great amount of lncRNAs has been annotated, more efforts are needed to identify the function of them. 6 The hypothesis of competing endogenous RNA (ceRNA) described that by sharing miRNA response elements (MREs), lncRNAs and mRNAs compete for binding to miRNAs, which exert the role of regulating each other's expression 7 and affect the tumourigenesis and development of tumours. 8,9 For the past few years, the widespread application of microarray and high throughput sequencing has greatly promoted the process of development of useful biomarkers for cancer diagnosis and management. Thanks to the publicly available cancer genomic databases, such as The Cancer Genome Atlas (TCGA), the comprehensive bioinformatics analysis has been employed in cancer research, which facilitates the discovery of a vast range of valuable biological information. Based on the ceRNA theory, researchers tried to construct lncRNA-miRNA-mRNA ceRNA networks in several kinds of malignance. [10][11][12][13][14] However, the data mining and analysis methods used in these reports were discrepant and the results were ambiguous and unconvincing. Out study first analysed the dysregulated expression of mRNA, lncRNA and miRNA between CRC tumour and normal tissues from TCGA database. Subsequently, the lncRNA-mediated ceRNA network was developed by bioinformatics prediction and correlation analysis. The RNAs functioning as prognostic biomarkers for patients with CRC were further identified by survival analysis. This study was conducted to investigate how the lncRNAs regulate target genes in the ceRNA regulatory network in CRC and predict novel lncRNAs as therapeutic targets and potential diagnostic biomarkers.  (Table S1). Replicated cases and cases without complete transcriptome profiling data were excluded. The expression profiling was performed by Illumina HiSeq RNASeq and Illumina HiSeq_miR-NASeq platforms respectively. mRNAs and lncRNAs were identified and annotated by using the Ensembl database, 15 RNAs which were not annotated in the database were excluded. This study was performed following the TCGA publication guidelines. As the data were all retrieved from TCGA, approval from a local Ethics Committee were unnecessary.

| Identification of differentially expressed lncRNAs, mRNAs and miRNAs
The lncRNA, mRNA and miRNA expression data of tumour and normal samples were merged for COAD and READ respectively. Rows of RNA data with no expression or a mean count of ≤1 were deleted.
To obtain the differentially expressed lncRNAs, mRNAs and miRNAs between normal tissue and CRC, the count data were processed with the Bioconductor package edgeR 16 in R software. All RNA expression levels were standardized to the sample mean. The P value was corrected with a false discovery rate (FDR). Fold changes of expression levels (log2 absolute) ≥ 2 and FDR < 0.01 were considered as a statistically significant difference. Venn diagrams were made to select intersected RNAs that differentially expressed in both COAD and READ databases.

| Functional enrichment analysis
To elucidate potential biological processes and to explore promising signalling pathways associated with the differentially expressed mRNAs (DEmRNAs), we conducted the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis and the gene ontology (GO) enrichment analysis using the Database for Annotation, Visualization and Integrated Discovery (DAVID, version 6.8). The biological processes and pathways with P < 0.05 were considered as significant functional categories.

| Development of a ceRNA network
The ceRNA network of CRC was then developed step-by-step according to the following procedures. Firstly, the file of potential interactions of lncRNA-miRNA was downloaded from StarBase V3.0 17 and miRcode. 18 StarBase provides the most comprehensive CLIP-Seq experimentally supported miRNA-mRNA and miRNA-lncRNA interaction networks to date. About 10 000 ceRNA pairs from CLIP-supported miRNA target sites were identified in StarBase.
MiRcode provides 'whole transcriptome' human microRNA target predictions based on the comprehensive GENCODE gene annotation, including >10 000 long non-coding RNA genes. We filtered out the miRNAs that were not differentially expressed between tumour tissues and normal tissues. Next, three highly reliable online miRNA reference databases, miRTarBase, 19 miRDB, 20 and TargetScan, 21 were used to retrieve experimentally validated or predictive miRNA targeted mRNAs. To improve the reliability of prediction, only the mRNAs presented in all three databases were defined as the miRNAtargeted mRNAs. The targeted mRNAs were further compared with differentially expressed RNAs that identified between tumour tissues and normal tissues, the intersected mRNAs were retained to develop the ceRNA network. The ceRNA network was constructed and visualized by Cytoscape v3.6.1. 22 Figure 1 shows a flow chart for the development of the ceRNA network. For the lncRNAs, mRNAs and miRNAs included in the ceRNA network of CRC, we generated heat maps using the 'pheatmap' packages in the r software. Key lncRNA was identified by calculating the degree of edges between lncRNA nodes and first stage miRNA nodes in the network. The expanded secondary miRNA-mRNA interactomes were collected.
F I G U R E 1 Flow chart of comprehensive bioinformatics analysis in the construction of competing endogenous RNA (ceRNA) regulatory network

| Clinical samples
A total of 50 primary CRC tissues and adjacent normal tissues were collected from Department of Gastrointestinal Surgery, Beijing Hospital.
All tissues were frozen immediately in liquid nitrogen after surgical excision and stored at −80℃. Clinicopathological information was retrieved from the hospital database. Written informed consent was obtained from all patients and the study was approved by the ethics committee.

| Cell culture and transfection
Human colon epithelial cell line (NCM460) and colon cancer cell lines (HT29, SW480 and SW620) were obtained from American Type Culture Collection (ATCC, Manassas). All cells were cultured in Dulbecco's modified Eagle's medium (DMEM) supplemented with 10% foetal bovine serum (FBS) in 5% CO 2 atmosphere at 37°C.

| Quantitative real-time PCR
The expression levels of the key genes of the ceRNA regulatory axis were measured by quantitative real-time PCR (qRT-PCR). Total RNAs isolated from CRC tissues and cell lines by using TRIzol reagent (Invitrogen) and reverse transcription was performed with a Prime Script RT reagent kit (Takara Biotechnology, China). qRT-PCR reactions were carried out on Applied Biosystems 7500 Real-time PCR Systems (Thermo Fisher Scientific). Small RNA RNU6 (U6) was used as endogenous control to normalize miRNA, while GAPDH was used for lncRNA and mRNA expression. The relative expression level of the target RNA was calculated by 2 −ΔΔCt .

| Survival and statistical analysis
Survival analyses for all RNAs in the ceRNA network were performed by using the 'survival' package in r software. Kaplan-Meier curve analysis with log-rank test was performed for comparison of the survival differences between groups. Co-expression analysis on the expression levels of RNAs was carried out by Pearson's correlation test. A P < 0.05 was considered statistically significant. Of these, 313 (62.2%) lncRNAs, 632 (44.5%) mRNAs and 108 (57.4%) miRNAs were up-regulated and 190 (37.8%) lncRNAs, 787 (55.5%) mRNAs and 80 (42.6%) miRNAs were down-regulated in CRC patients compared with normal tissue samples ( Figure S1).

| Development of a ceRNA regulatory network in CRC
To elucidate the regulatory mechanism of CRC, a lncRNA-miRNA-mRNA related ceRNA network of CRC was developed according

| Experimental validation
The expression levels of ADAMTS9-AS2, miR-32 and PHLPP2 were  Table 1). Overexpression of miR-32 was associated with larger tumour size (P = 0.012) and down-regulated PHLPP2 was correlated with the advanced TNM stage (P = 0.002).
Furthermore, ADAMTS9-AS2 was underexpressed in three colon cancer cell lines (HT29, SW480 and SW620), when compared to the normal colon cell line (NCM460) ( Figure 6G). HT29 cells demonstrated a relatively higher expression level of ADAMTS9-AS2, contrary to the SW620 cells which exhibited a relatively lower expression level. Therefore, HT29 cells were selected for ADAMTS9-AS2 knockdown and SW620 cells were chosen for overexpression. The effects of ADAMTS9-AS2 knockdown and overexpression were determined by qRT-PCR ( Figure 6H-I).
Then the expression levels of miR-32, PHLPP2 and ADAMTS9 mRNA after ADAMTS9-AS2 knockdown or overexpression were measured by qRT-PCR. The miR-32 expression significantly increased after ADAMTS9-AS2 knockdown in HT29 cells and decreased after ADAMTS9-AS2 overexpression in SW620 cells ( Figure 6J). On the contrary, the PHLPP2 expression was inhibited in HT29 cells with ADAMTS9-AS2 knockdown and was promoted in SW620 cells with ADAMTS9-AS2 overexpression ( Figure 6K). Notably, the mRNA expression level of ADAMTS9 was significantly increased after ADAMTS9-AS2 knockdown and decreased after ADAMTS9-AS2 overexpression ( Figure 6L).

| D ISCUSS I ON
As shown in the present study, differentially expressed mRNAs, lncRNAs and miRNAs were identified in tumour samples compared with adjacent non-tumour samples from TCGA-COAD and READ F I G U R E 3 A, The overall lncRNA-miRNA-mRNA ceRNA network in colorectal cancer. B, The sub-network centre on ADAMTS9-AS2. C, The ADAMTS9-AS2/miR-32/PHLPP2 ceRNA regulatory axis. The red nodes represent high expression, while the blue nodes represent low expression. miRNAs, lncRNAs and mRNA are represented by ellipse, round rectangle and diamonds respectively. Purple borders surrounding the nodes indicate prognostic significance. Green borders surrounding the nodes indicate good correlation with ADAMTS9-AS2. Grey edges indicate interactions between RNAs database respectively. To improve the reliability, the CRC-specific RNAs were defined as the intersected DERNAs of the two databases. Then, according to ceRNA theory, we constructed a lncRNA-miRNA-mRNA regulatory network by integrated bioinformatics analysis. The functional annotations of targeted genes were studied by GO and Pathway analysis. Furthermore, these RNAs were investigated for their relationship with overall survival. Subsequently, the correlation between expression levels and the clinicopathological features were validated in clinical samples by qRT-PCR. The regulatory effects in the ADAMTS9-AS2/miR-32/PHLPP2 axis was proved by knockdown or overexpression of ADAMTS9-AS2 in colon cancer cell lines. Our study may provide a comprehensive perspective of the potential mechanisms of gene interaction and regulation in CRC.
Among the ceRNA network, the lncRNA ADAMTS9-AS2 is the most notable gene because it has the highest degree of connectivity to other nodes, although its relationship with survival was not significant, we assumed that it was a hub gene that plays critical roles in the network. Then we extracted a sub-network that centred on ADAMTS9-AS2 for further analysis. The results showed that ADAMTS9-AS2 directly interacted with 13 miRNAs and indirectly interacted with 21 miRNA-targeted mRNAs in this network ( Figure 3B).
By investigating the sub-network, we found that overexpression of three miRNAs (hsa-mir-32, hsa-mir-217 and hsa-mir-144) was associated with unfavourable prognosis. Further analysis indicated that there was an interaction between hsa-mir-32 and mRNA PHLPP2.
Our results revealed that PHLPP2 was significantly down-regulated in the CRC tissues in comparison with adjacent non-tumour tissue and the low expression of PHLPP2 was significantly related to poor survival.
To further explore the interaction, Pearson's correlation analysis was conducted among those RNAs. It revealed that there were negative correlations between the expression of miR-32 and the expression of ADAMTS9-AS2(r = −0.316, P < 0.01) as well as PHLPP2 (r = −0.298, P < 0.01) in CRC samples. Furthermore, a potential positive interaction between ADAMTS9-AS2 and PHLPP2 expression levels was detected (r = 0.758, P < 0.01). ADAMTS9-AS2 and PHLPP2 expressions were significantly down-regulated in CRC specimens, while miR-32 was up-regulated in the CRC tissues. These findings indicate that lncRNA ADAMTS9-AS2 may regulate the expression of PHLPP2 by negatively interacting with hsa-mir-32. These regulatory patterns perfectly conformed to the 'ceRNA theory' and these findings were validated in clinical samples and cell lines by qRT-PCR. Therefore, we have been suggested that the ceRNA axis ADAMTS9-AS2/miR-32/PHLPP2 plays important roles in molecular regulation mechanism of CRC ( Figure 3C).
The lncRNA ADAMTS9-AS2 is a novel tumour suppressor and is the antisense transcript of ADAMTS9. However, relevant research on ADAMTS9-AS2 in cancer is still limited. Previous researches have reported that ADAMTS9-AS2 was down-regulated in colorectal cancer, clear cell renal cell carcinoma, lung cancer and glioma. [23][24][25][26][27] Moreover, ADAMTS9-AS2 expression was negatively correlated with the survival of these cancers. miR-32 is a tumour-associated microRNA, but its role in cancer is contradictory. Fu et al found that up-regulated miR-32-5p activated the PI3K/Akt pathway and led to drug resistance by promoting epithelialmesenchymal transition and angiogenesis. 28 It has been reported that high expression of miR-32-5p was positively associated with poor survival outcome of hepatocellular carcinoma 28 and gastric cancer. 29  To summary, a ceRNA regulatory network was successfully developed by identification of cancer-specific lncRNAs, miRNAs and mRNAs from large-scale CRC samples in TCGA database. We propose that the regulatory network centred on ADAMTS9-AS2 may play a critical part in the carcinogenesis of CRC. Our study highlighted a novel ceRNA mechanism in which ADAMTS9-AS2 sponging miR-32 to regulate the expression of PHLPP2. However, the ADAMTS9-AS2/miR-32/ PHLPP2 regulatory axis requires further studies to fully elucidate their biological functions and to confirm the molecular mechanisms.

ACK N OWLED G EM ENTS
This study was supported by Beijing Natural Science Foundation (7184240). The authors are grateful to Mrs. Xuan Zhang and Prof.
Zhiyi Pan for providing help in data analysis.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests.