Identification of AP002498.1 and LINC01871 as prognostic biomarkers and therapeutic targets for distant metastasis of colorectal adenocarcinoma

Abstract
Background: Increasing evidence suggests that long non-coding RNA (lncRNA)-mediated competing endogenous RNA (ceRNA) networks are involved in the occurrence and progression of colorectal cancer (CRC). However, the roles of the lncRNA-miRNA-mRNA ceRNA network in distant metastasis of CRC are still unclear.
Methods: In this study, we constructed a specific ceRNA network to identify potential biomarkers and therapeutic targets for distant metastasis of CRC. Specifically, RNA-Seq data from The Cancer Genome Atlas (TCGA) were used to screen for differentially expressed lncRNAs (DElncRNAs) and mRNAs (DEmRNAs) related to metastasis. After validation and selection by qRT-PCR and univariate and multivariate analysis of the metastasis- and prognosis-related lncRNAs, the regulated microRNAs (miRNAs) and coexpressed mRNAs were used to construct a ceRNA network for distant metastasis of CRC.
Results: Two key distant metastasis-related DElncRNAs, AP002498.1 and LINC01871, were identified by univariate and multivariate analysis in combination with analyses of clinical data and expression levels. Furthermore, lncRNA-associated ceRNA subnetworks were constructed from the predicted miRNAs and 13 coexpressed DEmRNAs (SERPINA1, ITLN1, REG4, L1TD1, IGFALS, MUC5B, CIITA, CXCL9, CXCL10, GBP4, GNLY, IDO1, and NOS2). Through the associated miRNAs, the AP002498.1-associated ceRNA subnetwork regulated the expression of the target genes SERPINA1 and MUC5B, and the LINC01871-associated subnetwork regulated the expression of GNLY.
Conclusion: The DElncRNA AP002498.1 and the LINC01871/miR-4644 and miR-185-5p/GNLY axes were identified as being closely associated with distant metastasis and could represent independent prognostic biomarkers or therapeutic targets in colorectal adenocarcinoma.


| INTRODUCTION
Colorectal cancer (CRC) is one of the most common malignant tumors worldwide and has the third highest annual incidence and mortality among all malignant tumors. 1 In China, the incidence of CRC has increased rapidly in recent years; it currently ranks second among all malignancies in incidence and fifth in mortality. 2 Metastasis is the leading cause of CRC-related death. 1,2 Approximately 20% of CRC patients have distant metastases at diagnosis, 3 frequently in the liver, and most CRC patients eventually develop distant metastases. [4][5][6][7][8][9] Although chemotherapy and targeted drugs have greatly improved the efficacy of CRC therapy, 35% of CRC patients still develop liver metastases. 3 Therefore, it is imperative to understand the molecular mechanism(s) underlying the development of distant metastasis in patients with CRC and to identify novel biomarkers and therapeutic targets to improve the prognosis of CRC patients. 4,5,7,9
Long noncoding RNAs (lncRNAs) (containing >200 nucleotides) 10,11 are involved in the initiation and progression of malignant tumors, including CRC. [12][13][14] Studies have highlighted that lncRNAs can act as competing endogenous RNAs (ceRNAs) by binding microRNAs (miRNAs), thereby regulating the expression of the mRNAs targeted by those miRNAs. These lncRNA-associated ceRNAs have been suggested to play an important role in cancer initiation and progression. 15 However, the regulatory mechanisms and prognostic roles of lncRNA-mediated ceRNA networks in distant metastasis of CRC have not been elucidated.
In the present study, we identified two key differentially expressed lncRNAs (DElncRNAs) and the associated regulated mRNAs (DEmRNAs) related to CRC metastasis using The Cancer Genome Atlas (TCGA) database. The CRC metastasis- and prognosis-related lncRNAs and the coexpressed mRNAs were validated by quantitative reverse transcription-polymerase chain reaction (qRT-PCR). Univariate and multivariate analyses were performed in combination with analyses of clinical data and the expression levels of lncRNAs and the associated mRNAs, and the results were incorporated for construction of the ceRNA network. Finally, the DElncRNA AP002498.1 and the LINC01871/miR-4644 and miR-185-5p/GNLY axes were identified as being closely associated with distant metastasis and could be promising independent prognostic biomarkers or therapeutic targets for distant metastasis of CRC (Figure 1).

| Patient characteristics and sample selection
Colon tumor tissues from 46 CRC patients were collected at Peking University People's Hospital. This research project was approved by the Research Ethics Committee of Peking University People's Hospital (Approval No: 2019PHB028-01). All of the subjects signed informed consent forms prior to participation. The clinicopathological details of the patients are listed in Table S1. The tumor-node-metastasis (TNM) stage was determined according to the Union for International Cancer Control (UICC) staging system (2016 version). The patients included were 25 men (54.3%) and 21 women (45.7%), with an average age of 71 years (range 31-99 years), who had a confirmed diagnosis of colon carcinoma based on the histopathology of the resected material. Twenty patients had lymph node metastases, and six had distant metastases. The lncRNA expression levels in the tumor tissues were measured by RT-PCR.

| Complementary DNA (cDNA) preparation
A single core tumor biopsy sample with a diameter greater than 5 cm was collected from each CRC patient. Total RNA was isolated using the QIAGEN miRNeasy Mini Kit (QIAGEN, Germany) according to the manufacturer's instructions. Total RNA was reverse-transcribed into cDNA using the PrimeScript™ RT Reagent Kit (Takara, Japan). Each reaction mixture (10 μL) contained 500 ng of RNA, 5× master mix, and reverse transcriptase.

| TCGA data
The raw RNA-seq data and clinical information associated with 548 CRC samples were downloaded from the TCGA database (https://cancergenome.nih.gov/). Both the read-count file and the FPKM file were downloaded from TCGA. We then used the "edgeR" package in R to standardize the data and screen for differentially expressed RNAs. The Ensembl IDs in the standardized data were mapped to gene symbols, which were used in subsequent data analysis. The clinicopathological details for the represented patients are presented in Table S2. The GENCODE database (https://www.gencodegenes.org/) 17 was used to annotate the lncRNAs and protein-coding RNAs. The data were extracted from the TCGA database in strict accordance with the publication guidelines approved by TCGA.

| Identification and correlation analysis of metastasis-related DElncRNAs and DEmRNAs
The lncRNA and mRNA expression profiles were divided into the metastasis and nonmetastasis groups based on the distant metastasis status (M stage) of the associated patient. The R package "edgeR" was used to identify the DElncRNAs and DEmRNAs between the metastasis and nonmetastasis groups. 18 The inclusion criteria were |log2 fold change| > 1 and adjusted p < 0.05. Six metastasis-related DElncRNAs were identified, and the hub metastasis-related DElncRNAs were selected based on the qRT-PCR results and the chi-squared test, which was used to determine potential correlations between the clinicopathological characteristics and the lncRNA expression levels. The correlation coefficients between the metastasis-related lncRNAs and mRNAs were calculated from the lncRNA and mRNA expression levels using Pearson correlation analysis. The lncRNA-mRNA pairs with Pearson correlation coefficients >0.4 were considered to have coexpression relationships.
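The two selection rules stated above (the fold-change and adjusted p-value thresholds, and the Pearson cutoff of 0.4) can be illustrated with a minimal sketch. It is not the authors' pipeline, which used edgeR in R; the function names and all numeric values below are hypothetical and serve only to show how the thresholds act.

```python
import math

def is_differentially_expressed(log2_fc, adj_p, fc_cutoff=1.0, p_cutoff=0.05):
    """Apply the stated criteria: |log2 fold change| > 1 and adjusted p < 0.05."""
    return abs(log2_fc) > fc_cutoff and adj_p < p_cutoff

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)

def coexpressed_pairs(lnc_expr, mrna_expr, r_cutoff=0.4):
    """Return (lncRNA, mRNA, r) for every pair whose Pearson r exceeds r_cutoff."""
    pairs = []
    for lnc, x in lnc_expr.items():
        for gene, y in mrna_expr.items():
            r = pearson_r(x, y)
            if r > r_cutoff:
                pairs.append((lnc, gene, round(r, 3)))
    return pairs

# Made-up values for illustration only (not data from the study).
# Each entry: name -> (log2 fold change, adjusted p-value).
de_table = {"lncA": (-1.8, 0.003), "lncB": (0.6, 0.001), "lncC": (2.1, 0.20)}
selected = [name for name, (fc, p) in de_table.items()
            if is_differentially_expressed(fc, p)]

lnc_expr = {"lncA": [1.0, 2.0, 3.0, 4.0, 5.0]}
mrna_expr = {"geneX": [1.1, 1.9, 3.2, 3.9, 5.1],
             "geneY": [5.0, 1.0, 4.0, 2.0, 3.0]}
pairs = coexpressed_pairs(lnc_expr, mrna_expr)
```

In this toy input only `lncA` passes both differential expression thresholds, and only `geneX` exceeds the coexpression cutoff with it.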

| Prediction of lncRNA-miRNA and mRNA-miRNA pairs
We screened metastasis-related mRNAs for positive correlations with the lncRNAs. Survival and univariate analyses (chi-squared test) were further performed to validate the key mRNAs. These validated mRNAs were analyzed with TargetScan (http://www.targetscan.org/) to predict possible binding to miRNAs. In parallel, the miRDB (http://www.mirdb.org/) 19 database was used to predict lncRNA-miRNA pairs. The prominent miRNAs that could regulate both lncRNAs and mRNAs, as identified by correlation analysis, were selected to construct the ceRNA network.
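The selection of miRNAs that are predicted to bind both a lncRNA and a coexpressed mRNA amounts to a set intersection. The sketch below assumes the TargetScan and miRDB predictions have already been exported into plain dictionaries; the function name and every identifier in the toy data are hypothetical placeholders, not predictions from either database.

```python
def build_cerna_triplets(lnc_to_mirnas, mrna_to_mirnas, coexpressed):
    """Return sorted (lncRNA, miRNA, mRNA) triplets.

    A triplet is kept when the miRNA is predicted to bind both members
    of a coexpressed lncRNA-mRNA pair.
    """
    triplets = []
    for lnc, gene in coexpressed:
        shared = lnc_to_mirnas.get(lnc, set()) & mrna_to_mirnas.get(gene, set())
        for mirna in sorted(shared):
            triplets.append((lnc, mirna, gene))
    return sorted(triplets)

# Hypothetical predictions for illustration only.
lnc_to_mirnas = {"lncA": {"miR-a", "miR-b"}, "lncB": {"miR-c"}}
mrna_to_mirnas = {
    "geneX": {"miR-b", "miR-d"},
    "geneY": {"miR-a", "miR-b"},
    "geneZ": {"miR-d"},
}
coexpressed = [("lncA", "geneX"), ("lncA", "geneY"), ("lncB", "geneZ")]

triplets = build_cerna_triplets(lnc_to_mirnas, mrna_to_mirnas, coexpressed)
```

Pairs without any shared miRNA, such as `lncB` and `geneZ` here, contribute no edge to the network.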

| Biotinylated microRNA pull-down assay
The miRNA of interest labeled with biotin at the 3′ end and a scrambled control miRNA were commercially synthesized. Cells were seeded in a 10 cm tissue culture dish 1 day before transfection. Forty-eight hours after transfection with the control miRNA or the 3′ biotin-labeled miRNA, cell lysates were harvested, and pull-down, RNA isolation, and RT-PCR were then performed as previously described. 20,21

| Statistical analysis
SPSS version 22.0 (SPSS, Chicago, IL, USA) and GraphPad Prism 8.0 (La Jolla, CA, USA) software were used for statistical analysis. Data from all quantitative assays are presented as the means ± standard deviations (SDs). Comparisons between study groups were performed using the two-tailed Student's t-test or the chi-squared test. The Kaplan-Meier plots display the proportions of surviving patients (overall survival [OS] and disease-free survival [DFS]) with respect to the length of follow-up in months. For the survival analysis and the comparisons between clinicopathological parameters and lncRNA/mRNA expression, 325 samples with complete and reliable clinical information (age, sex, TNM stage, OS time in months, OS status, DFS time in months, and DFS status) were included (Table S6). Values lower than the median were considered to indicate low expression, and those greater than the median were considered to indicate high expression. p < 0.05 was considered to indicate a statistically significant difference.
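As a minimal sketch of two steps described above, the following code dichotomizes expression at the median and computes a Kaplan-Meier survival estimate. It is not the SPSS or GraphPad procedure used in the study. The treatment of values exactly equal to the median is not stated in the text, so leaving them unassigned is an assumption, and all sample names and numbers are invented.

```python
import statistics

def split_by_median(expr_by_sample):
    """Label each sample 'low' or 'high' relative to the median expression."""
    median = statistics.median(expr_by_sample.values())
    groups = {}
    for sample, value in expr_by_sample.items():
        if value < median:
            groups[sample] = "low"
        elif value > median:
            groups[sample] = "high"
        else:
            groups[sample] = None  # equal to the median: handling is an assumption
    return groups

def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up in months
    events : 1 if the event (death or recurrence) occurred, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Invented values for illustration only.
groups = split_by_median({"s1": 2.0, "s2": 5.0, "s3": 3.0, "s4": 8.0})
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

Running the estimator separately on the low and high groups gives the two curves that a Kaplan-Meier plot compares.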

| DElncRNAs between patients with and without liver metastasis
The expression profiles of 14,166 lncRNAs were obtained from the transcriptome profiling data of CRC patients in the TCGA dataset. Differentially expressed RNAs were identified between patients with and without distant metastasis. The clinicopathological characteristics of the 548 patients represented in the TCGA dataset are shown in Table S2. We identified six DElncRNAs based on the M stage (Figure 2A). AP002498.1, LINC01871, BX322234.2, and LINC00261 were downregulated in patients with distant metastasis, while H19 and AC026336.3 were upregulated.

| Evaluation and verification of lncRNA expression and clinicopathological characteristics
To verify the findings from the TCGA dataset analysis, the expression levels of the six DElncRNAs were evaluated in 46 colon tumor tissues from Chinese CRC patients using qRT-PCR (Figure 2B). The detailed clinical characteristics of the 46 CRC patients recruited from Peking University People's Hospital are shown in Table S1. The expression levels of the lncRNAs AP002498.1 and BX322234.2 were significantly decreased in patients with distant metastasis compared to those without distant metastasis. There was also a trend toward a decreased LINC01871 level (p = 0.086).
The correlations between the six DElncRNAs and various clinicopathological parameters (e.g., age, sex, TNM stage, tumor stage, lymph node metastasis status, and distant metastasis status) were explored using the chi-squared test (Table 1). The lncRNA AP002498.1 level was significantly correlated with lymph node metastasis (p < 0.05), whereas its association with distant metastasis did not reach statistical significance (p = 0.189). The LINC01871 and BX322234.2 levels were significantly correlated with distant metastasis (p = 0.029 for both). Low levels of AP002498.1 and LINC01871 were associated with high probabilities of lymph node and distant metastasis, indicating a poor outcome for this group of patients in the CRC cohort. The levels of H19, LINC00261, and AC026336.3 did not correlate with any of the clinicopathological characteristics.
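To make the chi-squared comparison concrete, the sketch below computes the Pearson chi-squared statistic and its p-value for a 2 × 2 table (expression level by metastasis status). The counts are invented and are not taken from Table 1; no continuity correction is applied, which is an assumption about the settings used. When expected cell counts are small, as can happen with only a few metastatic cases, an exact test is generally preferable to this approximation.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom) for the table [[a, b], [c, d]].

    Returns (statistic, p_value) without Yates' continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Invented counts for illustration only:
# rows = low / high lncRNA expression, columns = metastasis present / absent.
statistic, p_value = chi_squared_2x2(18, 5, 10, 13)
```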

| Analysis of independent prognostic factors and correlations with clinical characteristics in the TCGA dataset
Univariate analysis was performed to further validate the correlations between the two prognostic risk-related lncRNAs (AP002498.1 and LINC01871) and clinicopathological characteristics in the TCGA dataset (Tables 2 and 3). LINC01871 had a significant protective effect on TNM stage, tumor stage, and distant metastasis in the TCGA CRC dataset (Table 3).
Kaplan-Meier survival analysis was also performed based on the six DElncRNAs. The results of the OS and disease-free survival analyses based on the six DElncRNAs are presented in Figure 3A,B, respectively. Among the six DElncRNAs, AP002498.1 was significantly linked with the prognosis of CRC patients (p < 0.001). Specifically, patients with low AP002498.1 levels had shorter OS times, suggesting a potential protective effect of AP002498.1 against distant metastasis. In addition, reduced LINC01871 levels were associated with a decreased disease-free survival time (p = 0.009). Considering these results collectively with the RT-PCR results, these two DElncRNAs were identified as key lncRNAs in patients with distant metastasis.
Multivariate analysis was performed using a Cox regression model with the inclusion of all factors impacting patient prognosis (e.g., key lncRNA levels, age, sex, TNM stage, tumor stage, lymph node metastasis status, and distant metastasis status). In addition to age and TNM stage, AP002498.1 was an independent indicator of prognosis (p = 0.016; regression coefficient = −0.823, corresponding to a hazard ratio of exp(−0.823) ≈ 0.44; 95% CI = 0.224-0.860).
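The reported Cox result can be checked for internal consistency. A ratio cannot be negative, so the sketch below assumes that −0.823 is the regression coefficient and that 0.224-0.860 is the 95% confidence interval of the hazard ratio; under that assumption the standard error and Wald p-value can be recovered. The function name is hypothetical.

```python
import math

def hazard_ratio_summary(coef, ci_low, ci_high, z_crit=1.959964):
    """Recover the hazard ratio, standard error, Wald z and two-sided p-value.

    coef            : Cox regression coefficient (log hazard ratio)
    ci_low, ci_high : 95% confidence limits of the hazard ratio
    """
    hazard_ratio = math.exp(coef)
    # The interval is symmetric on the log scale: log(HR) +/- z_crit * SE.
    std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = coef / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return hazard_ratio, std_err, z, p_value

hazard_ratio, std_err, z, p_value = hazard_ratio_summary(-0.823, 0.224, 0.860)
log_midpoint = (math.log(0.224) + math.log(0.860)) / 2
```

Under these assumptions the hazard ratio is about 0.44, the midpoint of the interval on the log scale matches the coefficient, and the recovered p-value agrees with the reported p = 0.016 to rounding.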

| Coexpression relationships between metastasis-related lncRNAs and mRNAs
The expression profiles of 19,645 mRNAs were acquired from the TCGA database, and 47 DEmRNAs (11 upregulated and 36 downregulated) were identified based on the distant metastasis stage (Figure S1). Pearson correlation analysis was then performed to evaluate the coexpression relationships between the metastasis-related mRNAs and the two key DElncRNAs. The expression of AP002498.1 was positively correlated with that of six DEmRNAs (SERPINA1, ITLN1, REG4, L1TD1, IGFALS, and MUC5B) (Figures 4A and S2A), which were downregulated in patients with liver metastases (Figures 4B and S2B). Based on the survival analysis, SERPINA1, ITLN1, and REG4 might be significant protective prognostic factors for CRC, because high expression levels of these genes were significantly associated with longer OS times (Figure 4C). Moreover, we analyzed the associations between mRNA expression levels and the clinicopathological variables used to describe CRC progression in the 325 CRC patients with full and reliable clinical data from the cohort of 548 patients represented in the TCGA dataset. By the chi-squared test, SERPINA1 expression was found to be highly negatively correlated with TNM stage, tumor stage, and distant metastasis (p = 0.020, 0.027, and 0.007, respectively), indicating that lower SERPINA1 mRNA expression levels were associated with poorer outcomes for CRC patients in the TCGA dataset (Table 2). Of interest, ITLN1, REG4, and MUC5B were also identified as protective factors by univariate analysis. However, L1TD1 and IGFALS had no significant independent prognostic effects, based on this same analysis (Table S4). Based on the survival analysis and chi [...]
No DEmRNAs were positively correlated with BX322234.2, whereas LINC01871 was positively correlated with seven mRNAs (IDO1, CXCL10, GBP4, CIITA, NOS2, CXCL9, and GNLY) (Figures 5A and S3A). These mRNAs were downregulated in patients with distant metastasis compared to those without distant metastases (Figures 5B and S3B). Low IDO1, CXCL10, and GBP4 levels were statistically associated with an unfavorable disease-free survival probability (Figure 5C). There were no statistically significant differences in survival between the low and high mRNA expression groups for CIITA, NOS2, CXCL9, and GNLY (Figure S3C). We also found that the expression of the three LINC01871-related mRNAs, namely, IDO1, CXCL10, and GBP4, was highly negatively correlated with TNM stage, lymph node metastasis, and liver metastasis (Table 3). In addition, CIITA and GNLY were identified as significant prognostic factors based on the chi-squared test, as shown in Table S5, and GNLY expression was highly negatively correlated with TNM stage, lymph node metastasis, and distant metastases (p = 0.026, 0.042, and 0.005, respectively), indicating that lower GNLY expression was related to a poorer outcome for CRC patients in the TCGA dataset. Taken together, these findings identified IDO1, CXCL10, GBP4, CIITA, and GNLY as hub LINC01871-related mRNAs based on the DFS analysis and chi-squared test results. Based on these findings, a reduction in or loss of the expression of the four AP002498.1-related mRNAs and five LINC01871-related mRNAs was determined to be associated with a decreased survival time.

| DISCUSSION
[...][24] As key upstream nodes in ceRNA networks, lncRNAs may be exploited to clarify the mechanism of CRC carcinogenesis and development and serve as molecular markers and therapeutic targets for CRC. 15 However, there is a lack of ceRNA network research directly related to distant metastasis of CRC. In this study, we observed a very interesting phenomenon in our stratified differential expression analysis based on distant metastasis status.
Using TCGA RNA-seq data for CRC patients, we constructed a distant metastasis-related ceRNA network to explore the potential role of lncRNAs and their related genes in CRC diagnosis and prognosis. AP002498.1 and LINC01871 were both downregulated in CRC tissues from patients with distant metastasis. AP002498.1 is a novel OS-related lncRNA that has not been previously reported (Figure 3 and Table S6). We identified it as an independent protective prognostic factor for CRC patients. In contrast, LINC01871 expression was significantly associated with DFS but was not an independent factor: CRC patients with lower expression of LINC01871 had shorter disease-free survival times. Based on DEmRNA coexpression analysis and miRNA predictions, we identified the ceRNA regulatory subnetworks for LINC01871. [...][31][32][33][34][35][36][37][38] Based on the lncRNA-mRNA coexpression relationships, AP002498.1 could positively regulate SERPINA1, ITLN1, REG4, L1TD1, IGFALS, and MUC5B mRNA expression by acting as a miRNA sponge. As shown in Figure 4 and Table 2, SERPINA1, ITLN1, and REG4 upregulation in primary tumor tissues was negatively correlated with distant metastasis, indicating a protective effect on prognosis in patients with CRC. This result is consistent with the previously reported role of these three factors in other solid tumors, 25,26,28,[39][40][41][42][43] in which they were reported to affect distant metastasis and prognosis by modulating either the stemness or autophagy of tumor cells or by regulating the immune microenvironment. In addition, miR-4443, which is regulated by AP002498.1, promotes liver metastasis of breast cancer via microenvironment-induced TIMP2 loss, 44 and highly expressed lncRNA AP002498.1 may serve as an endogenous sponge to downregulate the expression and inhibit the function of miR-4443.
CIITA, CXCL9, CXCL10, GBP4, GNLY, IDO1, and NOS2 were predicted to be coexpressed genes in the LINC01871 regulatory network. Studies have demonstrated that LINC01871 is involved in stemness, autophagy, ferroptosis, and the immune microenvironment in breast cancer cells; it was reported to be associated with a good prognosis and a lower incidence of distant metastasis in gastric, cervical, and endometrial cancers. 27,29,32,37,45 In this study, LINC01871 and its coexpressed mRNAs (CIITA, CXCL10, GBP4, and IDO1) were found to be closely related to DFS rather than OS in CRC patients. Integrating these results with previous reports in the literature 27,30,[37][38][39][40][41],[43][44][45][46][47] suggested that the LINC01871-mediated ceRNA network could affect the distant metastasis of CRC by regulating mainly the immune microenvironment rather than the stemness, autophagy, focal death, or radiosensitivity of CRC cells. Specifically, the expression of GNLY, which is regulated by LINC01871, has been associated with a reduced incidence of distant metastasis. Many studies have shown the involvement of GNLY in the targeting of tumors by cytotoxic immune cells and have revealed a correlation between the presence of granulysin and a more positive cancer prognosis. 48,49 In addition, Lei et al. 31 found that miR-4644 could accelerate CRC cell proliferation and migration, while Sun et al. 50 demonstrated that miR-185-5p is involved in chemotherapy resistance in gastric cancer patients, suggesting a protective role of LINC01871 as a sponge of miR-4644 and miR-185-5p.
Taken together, these results indicated that the lncRNA AP002498.1- and LINC01871-associated ceRNA subnetworks affected the OS and DFS prognoses of CRC, respectively, and provided new insights for identifying diagnostic markers and therapeutic approaches for distant metastasis of CRC. Among these insights, the reversibility of the tumor immune microenvironment 47 suggests that LINC01871-based targeted therapy may be more suitable for preventing and interfering with distant metastasis of CRC during the DFS window.
However, there are still some limitations to the present study. First, the number of patients with distant metastasis was relatively small in both the TCGA dataset and our validation cohort. Thus, a larger number of samples is needed for further analysis. Second, in vitro and in vivo experiments are needed to verify our hypotheses and develop novel targeted therapy for the distant metastasis of CRC.

| CONCLUSIONS
In conclusion, based on the two downregulated lncRNAs in CRC tissues with distant metastasis, we identified one distant metastasis-related lncRNA signature that could independently predict the OS of CRC patients and one that could independently predict the disease-free survival of CRC patients. Furthermore, a potential ceRNA regulatory network was constructed, pointing to promising predictive factors for distant metastasis and prognosis in CRC and possibly providing new insight into the mechanism underlying distant metastasis of CRC.

FIGURE 2
Distant metastasis feature analysis and validation of the six DElncRNAs. (A) Correlations of lncRNA expression with pathological M stage. (B) qRT-PCR validation of the six DElncRNAs in 46 clinical CRC tissue samples. *p < 0.05, **p < 0.01.

TABLE 1 Correlations between clinicopathological parameters and the expression of six DElncRNAs in 46 colon cancer patients.
TABLE 2 Correlations between clinicopathological parameters and the three DEmRNAs coexpressed with AP002498.1 in the TCGA dataset.
TABLE 3 Correlations between clinicopathological parameters and the three DEmRNAs coexpressed with LINC01871 in the TCGA dataset.
[...] and LINC01871. Moreover, correlation analysis was performed [...]