Temporal expression and functional analysis of long non‐coding RNAs in colorectal cancer initiation

Abstract Long non‐coding RNAs (lncRNAs) have potential applications in clinical diagnosis and targeted cancer therapies. However, the expression profile of lncRNAs in colorectal cancer (CRC) initiation is still unclear. In this study, the expression profiles of lncRNAs and mRNAs were determined by microarray at specific tumour stages in an AOM/DSS‐induced primary colon cancer model. The temporal expression of lncRNAs was analysed by K‐means clustering. Additionally, weighted correlation network analysis (WGCNA) and gene ontology analysis were performed to construct co‐expression networks and establish functions of the identified lncRNAs and mRNAs. Our results suggested that 4307 lncRNAs and 5798 mRNAs are deregulated during CRC initiation. These differential expression genes (DEGs) exhibited a clear correlation with the differential stage of tumour initiation. WGCNA results suggested that a series of hub lncRNAs are involved in regulating cell stemness, colon inflammation, oxidative stress response and cell death at each stage. Among them, lncRNA H19 was up‐regulated in colon tumours and correlated with poor patient prognosis. Collectively, we have been the first to demonstrate the temporal expression and function of lncRNAs in CRC initiation. These results provide novel diagnosis and therapy targets for CRC.


| INTRODUC TI ON
Colorectal cancer (CRC) is the third most commonly diagnosed cancer in men and the second in women worldwide, resulting in over 690 000 deaths annually. 1 Although the survival of CRC patients has doubled in the last 40 years, the 5-year survival rate remains low at only 65%. 2 The overwhelming morbidity and mortality caused by CRC affects the global economy through loss of productivity and burden on the healthcare system. 3 Various studies have demonstrated that early diagnosis and targeted therapy are efficient strategies for reducing mortality. 4,5 Consequently, researchers have expended considerable resources to understand the underlying molecular mechanisms responsible for CRC tumorigenesis and to develop novel diagnosis and therapy targets for CRC at the coding RNA, microRNAs (miRNA) and epigenetic level. 6,7 Non-coding RNAs (ncRNAs) are RNA transcripts that do not code for proteins. Recently, numerous ncRNAs, including long ncRNAs (lncRNAs), miRNA and circular RNAs, have been discovered owing to advances in next generation RNA sequencing (RNA-Seq) in animals. LncRNAs are more than 200 nucleotides in length and are differentially expressed in particular tissues as well as in different tumour types. 8 LncRNAs and mRNAs share many features, as both are transcribed by RNA polymerase II (Pol II), 5ʹ-capped, spliced, polyadenylated and are biochemically identical. However, lncRNAs usually are shorter in length, have fewer but longer exons and have lower primary sequence conservation compared with mRNAs. 9 Although lncRNAs do not encode proteins, they still have many functions. In both the innate and adaptive immune system, lncRNAs play an important role in regulating the expression of cytokine genes and the development of immune cells, including myeloid cells, dendritic cells, T cells and B lymphocytes. 10 The functions of lncRNAs are also crucial in both developmental and differentiation processes in mammals via controlling specific gene expressions. 11 Moreover, lncRNAs have strong relationships with tumorigenesis. LncRNAs can influence the transcription of tumour-related genes by altering epigenetic modifications or regulating enhancers. 12 A few lncRNAs are revealed to act as tumour suppressors that regulate cancer signalling pathways such as p53 tumour suppressor signalling. 13 Therefore, lncRNAs may be great biomarkers in clinical diagnosis and provide effective potential targets in future cancer therapies.
Several lncRNAs were demonstrated to be involved in regulating proliferation, migration and chemotherapy resistance and predicting prognosis of patients in colon cancer. 14,15 However, the temporal expression profile of lncRNAs in CRC initiation was still unclear. In this study, we aimed to investigate the expression and function of ln-cRNAs at each stage of CRC initiation. The genome-wide annotation of temporally deregulated lncRNAs may improve our understanding of molecular mechanisms that underpin colon cancer initiation and provide a large number of candidate prognostic and therapeutic targets for CRC.

| Animals
C57BL/6J mice (20 g weight, male) were purchased from Beijing HFK Bioscience Co. (Beijing, China) and housed in a specific-pathogen free room (25°C) under a 12-h light/dark cycle with ad libitum access to food and water. All mouse care and experiments were carried out in accordance with Sichuan University guidelines concerning animal use and care.

| Mouse model for primary colon cancer
Mice were injected intraperitoneally with 10 mg/kg azoxymethane (AOM; Sigma-Aldrich, Germany). After 7 days, the mice received drinking water containing 2% dextran sodium sulfate (DSS; MP Biomedicals, CA, USA) for 7 days followed by normal drinking water for 14 days. 16 Totally, three DSS cycles were performed. The degree of colitis-associated cancer (CAC) formation was observed by high resolution mini-endoscopy (STOKE, Germany) and then, the mice were killed at 9 weeks post-AOM injection. 16 After this, whole colons were removed and flushed with phospate-buffered saline and portions of the distal colons were either frozen in liquid nitrogen for total RNA extraction or fixed with 4% formalin and paraffin-embedded for histological analyses.

| Total RNA extraction
Total RNA was extracted from colon tissues using TRIzol reagent (Invitrogen, MA, USA), following the manufacturer's instructions.
The concentration of total RNA was quantified by the NanoDrop ND-2000 (Thermo Scientific, MA, USA) and the RNA integrity was assessed using Agilent Bioanalyser 2100 (Agilent Technologies, Palo Alto, California, USA). The RNA with 28S/18S > 0.7 and 2100 RIN > 7.0 was used for further microarray analysis. Agilent Technologies) was used to analyse array images to get raw data. Genespring (version 14.8; Agilent Technologies) was employed to complete the basic analysis using the raw data. First, the raw data were normalized with the quantile algorithm. The probes that met at least 1 of 2 conditions with flags in 'P' were chosen for further data analysis. Differentially expressed genes or lncR-NAs were then identified through fold change as well as P values calculated with a t test. The threshold set for up-and down-regulated genes was a fold change ≥2.0 and a unadjusted P value ≤0.05.

| Microarray analysis
Three biological replicates are included at each time-point in the microarray analysis.

| Quantitative PCR analysis
TRIzol reagent (Invitrogen) was used to isolate total RNA from mouse intestinal tissues and then the PrimeScript RT Reagent Kit (Perfect Real Time, Takara, Tokyo, Japan) was used for reverse transcription, following the manufacturer's instructions. Using TB Green Premix Ex TaqTM II (Tli RNaseH Plus, Takara, Tokyo, Japan), cDNA samples were amplified to set up a standard curve for calculating the relative expression of RNAs. In addition, we used a housekeeping gene, β-actin, as an internal reference. The sequences of primers used for quantitative real-time PCR (qPCR) are listed in as in Table 1. The reaction of qPCR was set at 95°C (30 seconds) for pre-denaturation, then 95°C (5 seconds) and 58°C (30 seconds) for a total of 42 cycles.
A final dissociation curve was obtained.

| K-means analysis
K-means clustering of unions of the differential expression genes (DEGs) was performed using R package. 17 based on the fuzzy K-means algorithm. The expression of genes within the same cluster was similar and may have common function. Total 6-15 K-means clusters were divided based on the gene expression values and Kmeans = 12 was selected as the optimal one. Enrichment analysis was conducted on every cluster of genes.

| Weighted correlation network analysis
LncRNA and mRNA co-expression networks were constructed using the weighted correlation network analysis (WGCNA) package in R language 18 and Cytoscape were used for data visualization. The gene co-expression network is a scale-free weighted gene network. A significant feature of scale-free networks is that most nodes have only a few connections, with a few nodes having a large number of connections. To satisfy the precondition of scale-free network distribution, the adjacency matrix weight parameter β value needed to be determined. In this study, we evaluated β values from 1 to 20, and the corresponding correlation coefficient and mean value of the adjacent gene were calculated for each. A higher correlation coefficient (maximum = 1) indicates that the network is closer to the network size distribution.
At a certain degree of gene connectivity, the β value should be as small as possible when the correlation coefficient is sufficiently large. Therefore, we selected β = 11 to construct the co-expression networks. Based on the above analysis, we constructed a WGCNA to subdivide thousands of genes into several modules.

| TCGA analysis
Read counts of both mRNA and lncRNA in colon adenocarcinoma tissue samples were downloaded from Genomic Data Commons (GDC, https://portal.gdc.cancer.gov). Differential

Primer names Sequences
Scgn sense 5ʹ-ATCAGTGGTGTGGATCTGGA-3ʹ expression analysis was performed to identify DEGs using the DESeq pipeline. Significant DEGs were defined as genes with abs(LogFC) ≥ 1 and adjusted P value <0.01. Normalized expression values of lncRNAs in normal tissue and tumour tissue were visualized as boxplots.

| Statistical analysis
Numerical continuous data were presented as the mean ± standard deviation and were analysed using Student's t tests for comparing two groups and analysis of variance for multiple group comparisons. Statistical analysis was performed using SPSS version 19.0 (SPSS, Inc., Chicago, IL, USA). Kaplan-Meier curves and the log-rank test were used to compare survival times among the groups. P < 0.05 was considered to indicate a statistically significant difference.

| Induction and identification of primary colon tumours in mice
To

| Time-course analysis of lncRNAs expression patterns in colon tumour initiation
To investigate the temporal expression of DEGs, the common 4307 differential expressed lncRNAs and 5798 differential expressed mRNAs expression values were used as input data to analyse patterns in gene expression by K-means analysis. The expression of gene within the same cluster was similarly and may have common function and 12 K-means clusters were defined. As shown in Figure 3A, 1344 DEGs in cluster 4 were gradually raised and kept at a high level at T3. Meanwhile, 1874 DEGs in cluster 6 were raised at T1 and stabilized at a high level from T1 to T3 ( Figure 3A), whereas 792 DEGs in cluster 5 were kept at a base level from T0 to T2 and only raised at T3 ( Figure 3A). Moreover, DEGs in cluster 2 (1337 DEGs), 9 (1082 DEGs) and 11 (1103 DEGs) reduced at T2, T1 and T3 separately ( Figure 3A). To further understanding the potential functional role of DEGs with obvious temporal correlation, GO-BP analysis of cluster 2, 4, 5, 6, 9 and 11 were performed. As shown in Figure 3B, Wnt signalling pathway was enriched in cluster 5, which only raised at T3, while immune-related processes and chemokine/cytokine expression-related processes were enriched in cluster 9 and 11 separately. Collectively, these results suggested that the DEGs that have similar expression trends may play analogical functional role in CRC formation.

| Weighted correlation network analysis of DEGs
To identify genes expressed together as modules on a higher systems level, a WGCNA was carried out to associate lncRNAs with mRNAs and predict their functions. All DEGs were divided into 53 gene clusters based on the predicted functions ( Figure 4A). To infer the dynamic biological process through time-point samples, we also introduced temporal correlation as an important evaluation index for selecting the functional modules. As shown in Figure 4B, eight mod-

| Functions of lncRNAs in colon tumour initiation
Next, these genes in blue, brown, green and grey60 modules were subjected to GO significant enrichment analyses to identify the biological functions and metabolic pathways in which these modules participate. Biological process analysis showed that the blue module was enriched with the Wnt signalling pathway, which plays a crucial role in regulating stemness of cells and promoting cancer formation ( Figure 5A). Meanwhile, the cell death regulation pathway and Notch signalling pathway were both involved in green module ( Figure 5A).
Genes in the grey60 module were gradually up-regulated at the specific stage of colon inflammation appearance. Additionally, biological process analysis indicated that immune-related processes (monocyte aggregation, leucocyte migration, negative regulation of thymocyte aggregation, positive regulation of monocyte chemotaxis, positive regulation of NF-κB import into nucleus and positive regulation of IL-6 production) are enriched associated with the colitis phenotype ( Figure 5A). Furthermore, G-protein coupled receptor-related signalling was associated with the function of these genes in brown ( Figure 5A). To define the hub genes during CRC formation, lncRNA-mRNA co-expression in the above four modules was established and the lncRNAs that associated with more than five mRNAs were defined as hub genes ( Figure 5B) and Ywhaz were defined as the hub lncRNAs in the grey60 module ( Figure 5B). Meanwhile, ten hub lncRNAs and 12 hub lncRNAs were defined in the green and brown modules, respectively ( Figure 5B).
These results shown that deregulated lncRNAs function in central regulatory roles in CAC formation.

| LncRNA H19 predicts the prognosis of colon cancer patients
LncRNA H19 has been demonstrated to be an oncogene and is upregulated in several cancers. [19][20][21] We also demonstrated that H19 expression was gradually increased along the colon cancer initiation  Meanwhile, H19 was required for intestinal epithelial proliferation, mucosal healing via binding to p53 and microRNAs, 19 and tendon healing through targeting miR-29b-3p. 36 Notably, co-expression analysis in our study suggested that H19 was positively correlated with Wnt10a and MMP7 expression, which play a central role in maintaining cell stemness and tissue repair. [37][38][39] The identified hub lncRNAs in our study may play an important role in regulating various biological processes, but further experimental studies should be performed.

| D ISCUSS I ON
This study first clarified the temporal expression and function of lncRNAs at specific stages of CRC initiation. The defined hub ln-cRNAs that are involved in regulating cell stemness, colon inflammation, oxidative stress responses and cell death would be a series of novel and efficient diagnostic and therapeutic targets for CRC.
Of course, due to the small number of samples, there do exist some limitations in the reported analysis. And further experimental studies should be performed to investigate the functions and underlying mechanism of these hub lncRNAs in CRC initiation.

ACK N OWLED G EM ENTS
Thanks to Chengdu Basebiotech Co., Ltd for providing assistance on bioinformatic analysis.