Function and regulation annotation of up-regulated long non-coding RNA LINC01234 in gastric cancer.

Abstract Background Accumulating evidence indicates that long non‐coding RNAs (lncRNAs) participate in many biological processes and act as essential regulators in various human diseases, including gastric cancer (GC). Nevertheless, the regulatory roles and clinical significance of most lncRNAs in GC are not fully understood. Methods We investigated the underlying mechanism of the lncRNA LINC01234 in GC. First, qRT‐PCR was used to establish the expression pattern of LINC01234 in GC tissues. Next, statistical tests were applied to analyze the relation between its expression level and clinicopathological factors. Finally, the potential functions and regulatory network of LINC01234 were inferred via GSEA and a series of bioinformatics tools and databases, respectively. Results LINC01234 is up‐regulated in GC tissues compared with adjacent normal tissues, and its expression level is correlated with tumor differentiation in patients with GC. Bioinformatics analysis revealed that LINC01234 is involved in cancer‐associated pathways such as cell cycle and mismatch repair. In addition, the regulatory network of LINC01234 suggests that it may be involved in tumorigenesis through regulating cancer‐associated genes. Conclusion Our results suggest that LINC01234 may play a crucial role in GC.

a large number of lncRNAs with unknown functions and regulatory mechanisms in GC. Owing to advances in sequencing technology, more and more high-throughput transcriptome data for GC have been generated. The Cancer Genome Atlas (TCGA) collects genome, transcriptome, and epigenome sequencing data from many patients with various kinds of cancer, including stomach adenocarcinoma (STAD). It provides an opportunity to uncover uncharacterized genes, especially lncRNAs, in GC.
In this research, we first analyzed gene expression profiles of STAD patients in TCGA and found a number of lncRNAs differentially expressed in cancerous tissues compared with adjacent non-cancerous tissues. Then, we verified the up-regulation of one of these lncRNAs, LINC01234, in GC tissues compared with adjacent non-cancerous tissues by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR). We also analyzed the association between the expression level of LINC01234 and clinicopathological factors.
Subsequently, we annotated the functions of LINC01234 using the Gene Set Enrichment Analysis (GSEA) method and constructed the LINC01234 regulatory network to better interpret the regulatory mechanism of LINC01234 in GC.

| Differential expression analysis of lncRNAs in STAD from TCGA
(Table S1). Long intergenic non-coding RNAs (lincRNAs) and antisense RNAs were selected as lncRNAs and were analyzed by t test. The false discovery rate (FDR) method was used to correct P values.
Those with FDR < 0.05 and fold change larger than 1.5 were considered differentially expressed lncRNAs.
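
The selection rule in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline actually used: it assumes the per-lncRNA t-test P values are already available, it uses the Benjamini-Hochberg procedure as one common FDR method (the text does not state which variant was applied), and it reads "fold change larger than 1.5" as two-sided (ratio above 1.5 or below 1/1.5). The function names and the example records are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def call_differential(records, fdr_cutoff=0.05, fc_cutoff=1.5):
    """Label each gene as 'up', 'down', or 'ns' (not significant).

    records: list of (name, mean_tumor, mean_normal, p_value) tuples.
    mean_normal must be positive so that the ratio is defined.
    """
    fdr = benjamini_hochberg([rec[3] for rec in records])
    calls = {}
    for (name, tumor, normal, _), q in zip(records, fdr):
        fold = tumor / normal
        if q < fdr_cutoff and fold > fc_cutoff:
            calls[name] = "up"
        elif q < fdr_cutoff and fold < 1.0 / fc_cutoff:
            calls[name] = "down"
        else:
            calls[name] = "ns"
    return calls


# Hypothetical records, for illustration only.
example_records = [
    ("lncA", 6.0, 2.0, 0.001),
    ("lncB", 1.0, 4.0, 0.002),
    ("lncC", 3.0, 2.9, 0.9),
    ("lncD", 5.0, 2.0, 0.2),
]
print(call_differential(example_records))
```

Correcting the P values before applying the fold-change filter keeps the FDR computed over all tested lncRNAs, not only over those that already pass the fold-change threshold.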

| Collection of GC samples and patients' clinical information
Paired cancerous and adjacent normal tissues from 83 GC patients were collected during surgery between 2010 and 2015 at Zhejiang Cancer Hospital. Adjacent normal tissues were defined as tissues located 5 cm away from the edge of the tumor. All samples, each around 0.1 cm³ in size, were immediately preserved in RNA fixer (BioTeke) and stored at −80°C until use. For each GC patient, the clinical information consisted of age, gender, invasion depth, differentiation, lymphatic metastasis, distant metastasis, and TNM stage. No patient had undergone preoperative radiotherapy or chemotherapy. Each patient provided signed written informed consent to participate in this research, and the ethics committee of Ningbo University approved the study.

| Total RNA extraction and qRT-PCR
The methods for total RNA preparation and qRT-PCR were analogous to those of our previous study. 18 Briefly, total RNA was extracted from each cancer tissue and adjacent normal tissue using TRIzol reagent (Thermo Fisher Scientific). The quality of the total RNA was then assessed with a protein-nucleic acid spectrophotometer according to the A260/A280 ratio. Next, 2 μg RNA was reverse-transcribed into cDNA with GoTaq qPCR Master Mix (Promega), and qRT-PCR was performed on a LightCycler 480 (Roche). The sequences of the PCR primers for β-actin were 5′-CATGTACGTTGCTATCCAGGC-3′ (forward) and 5′-CTCCTTAATGTCACGCACGAT-3′ (reverse). The sequences of the PCR primers for LINC01234 were 5′-TCTACTAGAGCCTCCAGAAGG-3′ (forward) and 5′-CTACTCTTCACGCAGAGGA-3′ (reverse). Importantly, the con-

| Gene set enrichment analysis of LINC01234
Using the median expression level of LINC01234 as the cutoff, STAD patients were divided into two groups, with low and high expression of LINC01234, respectively. Subsequently, the FPKM expression profiles of the STAD patients and the group labels of the samples were supplied to the GSEA software. 19 Gene Ontology (GO) Biological Process (BP) terms and KEGG pathway datasets were selected to calculate the enriched functions and pathways associated with LINC01234. An adjusted P-value < .05 was considered statistically significant.
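
The median-based grouping that produces the phenotype labels for GSEA can be illustrated as follows. This is a sketch under stated assumptions: the sample identifiers and FPKM values are hypothetical, and samples whose expression equals the median are placed in the low group, a tie-breaking choice the text does not specify.

```python
from statistics import median


def split_by_median(expression):
    """Split samples into 'high' and 'low' groups around the median.

    expression: dict mapping sample ID -> expression value (e.g. FPKM).
    Samples exactly at the median go to 'low' (an assumption).
    """
    cutoff = median(expression.values())
    groups = {"high": [], "low": []}
    for sample, value in expression.items():
        groups["high" if value > cutoff else "low"].append(sample)
    return cutoff, groups


# Hypothetical expression values, for illustration only.
fpkm = {"S1": 1.0, "S2": 5.0, "S3": 3.0, "S4": 8.0}
cutoff, groups = split_by_median(fpkm)
print(cutoff, groups)
```

The resulting group labels, in the same order as the columns of the expression matrix, are what a phenotype-label input to GSEA would encode.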

| TF-LINC01234 regulation
We downloaded the genomic locations of transcription factor (TF) binding peaks from the Cistrome database, 20 which reprocessed ChIP-seq datasets for TFs and histone modifications from the GEO database. 21 Next, we compared the chromosomal positions of these binding regions with that of LINC01234. Only TFs with binding sites located in the promoter of LINC01234 were considered to have a TF-LINC01234 regulatory relationship.
Then, the TFs were filtered to retain those encoded by protein-coding genes that were differentially expressed in GC, as identified from the STAD expression profiles from TCGA by t test. The false discovery rate (FDR) method was used to correct P-values for multiple comparisons, and .05 was set as the cutoff.
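
The peak-to-promoter comparison can be sketched as a simple interval overlap test. The text does not define the promoter window, so the sketch assumes 2000 bp upstream and 500 bp downstream of the transcription start site (TSS). The coordinates, TF names, and function names are hypothetical, and intervals are treated as half-open, as in BED files.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive
    end: int    # exclusive


def promoter_window(chrom, tss, strand, upstream=2000, downstream=500):
    """Return a strand-aware promoter interval around a TSS.

    The window sizes are assumptions, not values taken from the study.
    """
    if strand == "+":
        return Interval(chrom, max(0, tss - upstream), tss + downstream)
    if strand == "-":
        return Interval(chrom, max(0, tss - downstream), tss + upstream)
    raise ValueError("strand must be '+' or '-'")


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def tfs_binding_promoter(peaks, promoter):
    """Return the sorted names of TFs with at least one peak in the promoter.

    peaks: iterable of (tf_name, Interval) pairs.
    """
    return sorted({tf for tf, peak in peaks if overlaps(peak, promoter)})


# Hypothetical TSS and peaks, for illustration only.
promoter = promoter_window("chr1", 10000, "-")
peaks = [
    ("TF_A", Interval("chr1", 11900, 12100)),  # overlaps the window
    ("TF_B", Interval("chr1", 12000, 12300)),  # starts where the window ends
    ("TF_C", Interval("chr2", 10000, 10100)),  # different chromosome
]
print(tfs_binding_promoter(peaks, promoter))
```

The TF names returned by such a step would then be intersected with the list of differentially expressed protein-coding genes.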

| miRNA-LINC01234 interactions
miRNA-LINC01234 interactions were predicted with the miRNA target prediction tool miRanda using default parameters. 22 Because LINC01234 is up-regulated in GC, only miRNAs recorded as down-regulated in GC in the miRCancer database were retained. 23
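
The filtering logic amounts to intersecting two lists: miRNAs predicted to target LINC01234 and miRNAs annotated as down-regulated in GC. A minimal sketch follows. It assumes both lists have already been exported to plain names (it does not call miRanda or query miRCancer), and the miRNA names and the function name are placeholders.

```python
def keep_down_regulated(predicted, down_regulated):
    """Keep predicted miRNAs that also appear in the down-regulated list.

    Names are compared case-insensitively because databases differ in
    capitalisation (e.g. 'miR-' versus 'mir-').
    """
    down = {name.lower() for name in down_regulated}
    return sorted(m for m in set(predicted) if m.lower() in down)


# Placeholder names, for illustration only.
predicted = ["miR-A", "miR-B", "miR-C"]
down_in_gc = ["mir-b", "miR-C", "miR-Z"]
print(keep_down_regulated(predicted, down_in_gc))
```

The same intersection pattern applies to the expression-based filters used for TFs and RBPs, with gene symbols in place of miRNA names.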

| RBP-LINC01234 interactions
Likewise, RBP-LINC01234 interactions were predicted with the lncPro model 24 using sequence information downloaded from UniProt. 25 The RBPs were then filtered by removing those encoded by protein-coding genes that were not differentially expressed in STAD from TCGA.

| Statistical analysis
Statistical analysis was performed with IBM SPSS 21.0 software (SPSS) and R 3.3.3. Expression values were compared among three or more groups by one-way analysis of variance (ANOVA) and between two groups by Student's t test. Significance levels are indicated as *P < .05, **P < .01, and ***P < .001; P < .05 was considered statistically significant.
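
As an illustration of the two-group comparison and the significance notation, the sketch below computes the pooled-variance Student's t statistic and maps a P value to the star labels defined in this section. It is not the SPSS or R code used in the study: the input values are hypothetical, the "ns" label for non-significant results is an assumption, and turning the t statistic into a P value requires the t distribution, which is left to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def student_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a two-sample Student's t test.

    Uses the pooled-variance form, which assumes equal variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    if pooled == 0:
        raise ValueError("pooled variance is zero; t is undefined")
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


def significance_stars(p_value):
    """Map a P value to the star notation used in the figures."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"


# Hypothetical expression values, for illustration only.
t, df = student_t_statistic([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df, significance_stars(0.004))
```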

| Experimental verification of LINC01234 up-regulation in GC tissues
First, we downloaded the expression profiles of STAD from TCGA and investigated the differentially expressed lncRNAs. In total, 1016 up-regulated and 140 down-regulated lncRNAs were found in GC compared with non-cancer tissues (Figure 1A, Table S2).
Among them, one of the up-regulated lncRNAs, LINC01234, was selected for further study because little is known about it in GC. We then verified the dysregulated expression pattern of LINC01234 using qRT-PCR in 83 GC tissues and adjacent normal tissues (Figure 1B).
Compared with the adjacent non-cancerous tissues, LINC01234 was significantly up-regulated in 61 of 83 GC tissues (73.5%, Figure 1C, P < .001).
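
The proportion reported above can be reproduced from paired ΔCt values with a short sketch. It assumes the common convention ΔCt = Ct(LINC01234) − Ct(β-actin), under which a lower ΔCt means higher expression, and it uses the 2^−ΔΔCt formula for relative fold change; the text does not state that this exact calculation was used. The Ct values and function names are hypothetical.

```python
def delta_ct(ct_target, ct_reference):
    """ΔCt of the target gene relative to the reference gene."""
    return ct_target - ct_reference


def relative_fold_change(tumor_dct, normal_dct):
    """Tumor-versus-normal fold change by the 2^-ΔΔCt formula."""
    return 2.0 ** (normal_dct - tumor_dct)


def fraction_up_regulated(pairs):
    """Count pairs in which the tumor ΔCt is lower than the normal ΔCt.

    pairs: list of (tumor_dct, normal_dct) tuples, one per patient.
    Returns (number up-regulated, total pairs, percentage).
    """
    if not pairs:
        raise ValueError("at least one tissue pair is required")
    up = sum(1 for tumor, normal in pairs if tumor < normal)
    return up, len(pairs), 100.0 * up / len(pairs)


# Hypothetical ΔCt pairs arranged to match the reported counts (61 of 83).
pairs = [(8.0, 10.0)] * 61 + [(10.0, 9.0)] * 22
up, total, percent = fraction_up_regulated(pairs)
print(up, total, round(percent, 1))
```

A count of this kind only describes the direction of change per patient; the reported P value comes from a statistical test on the paired values.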

| Association analysis between expression level of LINC01234 and clinicopathological factors in GC patients
In a previous study, LINC01234 was considered to be a potential diagnostic marker in GC based on TCGA data. 26 We therefore evaluated the likely diagnostic value of LINC01234 based on our own dataset. Initially, we performed a statistical analysis to examine the relationship between the clinicopathological factors and the expression level of LINC01234.
As a result, we found that the differentiation of GC was associated with LINC01234 expression: lower LINC01234 expression was associated with a higher likelihood of poor differentiation of GC tissues (P < .05, Table 1). In addition, the P value for the association between distant metastasis and LINC01234 expression was < .05. However, the number of GC patients with M1 stage was too small (n = 5) for this result to be considered reliable.

| Potential functions of LINC01234
To explore the potential functions of LINC01234 in GC, we first divided the STAD patients from TCGA into two groups, with low and high expression of LINC01234 in cancer tissues, respectively. GSEA was then performed to investigate the biological processes or pathways associated with LINC01234.
The results showed that LINC01234 may be involved in cancer- and immune-related pathways such as cell cycle (Figure 3A), mismatch repair (Figure 3B), intestinal immune network for IgA production (Figure 3C), and B-cell receptor signaling pathway (Figure 3D). Among the GO BP terms, cancer-associated functions were found, such as negative regulation of tumor necrosis factor-mediated signaling pathway (Figure 3E) and positive regulation of cell migration involved in sprouting angiogenesis (Figure 3F). These findings suggest that LINC01234 may play an important role in GC development.  Figure 4).
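
For readers unfamiliar with GSEA, the enrichment score behind plots of this kind is a running-sum statistic over a ranked gene list. The sketch below implements that statistic in its generally described form; it is not the GSEA software itself and omits the permutation test and multiple-testing correction that produce the adjusted P values. The gene names, ranking scores, and function name are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Return the signed maximum deviation of the GSEA running sum.

    ranked_genes: genes ordered from most positively to most negatively
                  associated with the phenotype.
    scores:       ranking metric for each gene, in the same order.
    gene_set:     set of gene names to test.
    p:            weight exponent; p=0 gives the unweighted form.
    """
    in_set = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(in_set)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some but not all ranked genes")
    weights = [abs(s) ** p if hit else 0.0 for s, hit in zip(scores, in_set)]
    total = sum(weights)
    if total == 0:
        raise ValueError("all gene-set members have zero weight")
    running, best = 0.0, 0.0
    for weight, hit in zip(weights, in_set):
        # Step up at gene-set members, step down otherwise.
        running += weight / total if hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranked list, for illustration only.
genes = ["G1", "G2", "G3", "G4"]
metric = [4.0, 3.0, 2.0, 1.0]
print(enrichment_score(genes, metric, {"G1", "G2"}, p=0.0))
print(enrichment_score(genes, metric, {"G3", "G4"}, p=0.0))
```

A positive score indicates that the gene set is concentrated at the top of the ranking (here, toward the high-expression group), and a negative score indicates concentration at the bottom.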

| TF-LINC01234 regulation
Some of the 31 TFs are known to participate in GC development. For example, FOXK2 inhibited the proliferation, invasion, and migration of GC cells, and its down-regulation is related to poor prognosis in GC patients. 27 In addition, HDAC2 was significantly up-regulated across various histopathologic grades of human GC, and the inactivation of HDAC2 has been confirmed to reduce cell motility, cell invasion, clonal expansion, and tumor growth. 28 Specifically, 2 TFs, ELK1 and ZNF664, were found to be co-expressed with LINC01234 according to the co-expression network we previously constructed (Table S4). 29
FIGURE 1 A, Differentially expressed lncRNAs in STAD. B, Expression level (FPKM) of LINC01234 in STAD tissues compared with adjacent normal tissues. C, Expression level (ΔCt value) of LINC01234 in GC tissues compared with adjacent normal tissues

| miRNA-LINC01234 regulation
A total of 49 miRNAs were predicted to regulate LINC01234 in GC. Several of them have already been reported to be associated with GC progression. For example, overexpression of miR-1284 was reported to suppress GC by regulating cell proliferation and apoptosis. 30 A prior study also showed that miR-1284 might modulate the multidrug resistance of GC cells by targeting specific genes. 31 In addition, miR-1297 expression was found to be remarkably lower in GC tissue, and miR-1297 suppressed GC cell growth by inhibiting the expression of CREB1. 32

| RBP-LINC01234 regulation
A total of 138 RBPs were predicted to interact with LINC01234.
Among them, some RBPs have already been shown to be related to GC.